Fennec Memory Module
Production-Grade Intelligent Memory System for LLM Applications
Table of Contents
- Overview
- System Architecture
- Core Concepts
- Quick Start Guide
- Public API Reference
- Configuration System
- Security Model
- Storage Backends
- Observability & Metrics
- Edge Cases & Failure Handling
- Advanced Usage
1. Overview
The Fennec Memory Module is a multi-layer intelligent memory system designed to give LLM applications and AI agents persistent, semantically searchable, and privacy-aware memory across conversations and sessions.
It mirrors human cognitive memory architecture — short-term recall, active working memory, and long-term persistence — and ties them together with vector similarity search, biologically-inspired forgetting, LLM-assisted compression, and per-tenant isolation.
Why It Exists
Standard LLM applications are stateless. Each request starts from zero, forcing developers to manually manage conversation history, hit context limits, and lose important context between sessions. The Fennec Memory Module solves this by:
- Automatically promoting important interactions to long-term persistent storage
- Retrieving the most relevant memories for any query using semantic search
- Keeping context within token budgets via intelligent selection and compression
- Isolating memory per tenant in multi-user systems
- Scrubbing PII before storage with an optional encryption layer
Real-World Use Cases
| Use Case | How the Module Helps |
|---|---|
| Multi-session chatbots | Remembers past preferences, decisions, and context across sessions |
| AI agents & planners | Working memory holds the active reasoning context for the current turn |
| RAG pipelines | Combines retrieved documents with relevant personal memories in a single context |
| Multi-tenant SaaS | Complete tenant isolation; each organization's memory is fully private |
| Personalized assistants | User profiles adapt over time to improve retrieval and response quality |
| Long conversation summarization | LLM-based compressor condenses hundreds of turns into compact summaries |
2. System Architecture
Pipeline Overview
```
WRITE PATH

  user_input + assistant_output
    │
    ▼
  [SensitiveDataMasker]   ← optional PII redaction
    │
    ▼
  [MemoryEncryptor]       ← optional Fernet encryption
    │
    ├──► [ShortTermMemory]   ← in-memory sliding window
    │     │
    │     ├── background thread ──► [SemanticMemory]   FAISS / NumPy vector index
    │     │
    │     └── if importance ≥ threshold ──► [LongTermMemory]   SQLite persistence
    │
    └──► [UserProfileManager]   ← updated with topics & importance
```

```
READ PATH

  query string
    │
    ├──► [SemanticMemory.search]      cosine similarity top-K
    ├──► [LongTermMemory.get_top_*]   importance-ranked
    └──► [ShortTermMemory.get_all]    recent window

  ───── merge + deduplicate ─────
    │
    ▼
  [MemorySelector]   ← composite score: relevance × recency × importance
    │
    ▼
  [WorkingMemory]    ← token-budgeted active context
    │
    ▼
  [ContextBuilder]   ← assembles: profile + memories + docs + history
    │
    ▼
  BuiltContext.full_text → inject into LLM prompt
```
Component Responsibilities
| Component | Responsibility |
|---|---|
| `ShortTermMemory` | Sliding deque of the most recent N interactions. Fast, in-memory, FIFO eviction. |
| `WorkingMemory` | Active-context store for the current reasoning turn. Rebuilt on each call by `MemorySelector`. Enforces a token budget. |
| `LongTermMemory` | SQLite-backed persistent store. Holds high-importance entries promoted from short-term memory so they persist across sessions. Supports global decay. |
| `SemanticMemory` | FAISS (or NumPy fallback) vector index. Embeds content with sentence-transformers and retrieves by cosine similarity. |
| `MemorySelector` | Scores every candidate memory on three axes (relevance, recency, importance) and returns the top-K within a token budget. |
| `MemoryCompressor` | Noise removal, near-duplicate merging (cosine threshold), and LLM-based summarisation. Runs during maintenance. |
| `ForgettingMechanism` | Applies Ebbinghaus-inspired exponential decay modulated by access frequency. Marks low-importance entries for deletion. |
| `ContextBuilder` | Assembles the final context string from profile, memories, RAG documents, and conversation history. Enforces per-section token budgets. |
| `UserProfileManager` | Tracks per-user preferences, topic frequency, and a `memory_retention_boost` factor. Persists profiles as JSON. |
| `SensitiveDataMasker` | Regex-based PII redaction (email, credit card, SSN, phone, IBAN, IP, JWT, API keys, passwords). |
| `MemoryEncryptor` | Fernet (AES-128-CBC + HMAC-SHA256) symmetric encryption. Degrades to base64 if `cryptography` is absent. Base64 is an encoding, not encryption, so content stored in that mode is effectively plaintext. |
| `AccessController` | RBAC system with built-in `reader`, `writer`, `owner`, and `admin` roles per `(user_id, tenant_id)` pair. |
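The table describes the masker only as regex-based. As a rough illustration, the sketch below redacts two of the listed categories: email addresses and 13–16 digit card-like numbers. The patterns and the `[EMAIL]`/`[CARD]` placeholder format are assumptions, not the module's actual rules.

```python
import re

# Hypothetical patterns; the real SensitiveDataMasker covers more categories.
_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    # 13-16 digits, optionally separated by single spaces or hyphens.
    ("CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),
]

def mask_pii(text: str) -> str:
    """Replace every match with a [LABEL] placeholder, applying patterns in order."""
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Mail bob@example.com, card 4111 1111 1111 1111."))
# Mail [EMAIL], card [CARD].
```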
Design Philosophy
- Layered degradation. Every component falls back gracefully: FAISS → NumPy, real Fernet → base64, sentence-transformers → deterministic stub embeddings. The system never hard-fails at import time.
- Async-friendly. The primary operations have `async` wrappers (`asave_interaction`, `abuild_context`, `arun_maintenance`) backed by a thread pool, so they integrate cleanly into `asyncio` applications.
- Tenant-first. Memory is namespaced at every layer. Cross-tenant data leakage is architecturally prevented.
- Importance-driven. A single `importance` float (0–1) controls promotion to LTM, ordering in context, and eviction priority. It decays over time and is boosted by access.
3. Core Concepts
3.1 Memory Types (MemoryType)
```python
from memory.core import MemoryType
```

| Value | String Aliases | Description |
|---|---|---|
| `MemoryType.SHORT_TERM` | `"short_term"`, `"short"`, `"st"` | Temporary; limited to `window_size` recent entries. |
| `MemoryType.LONG_TERM` | `"long_term"`, `"long"`, `"lt"` | Persistent across sessions via SQLite. |
| `MemoryType.WORKING` | `"working"`, `"work"`, `"wm"` | Active-context; rebuilt each turn. |
| `MemoryType.EPISODIC` | `"episodic"`, `"episode"`, `"ep"` | Event-bound memories with temporal context. |
| `MemoryType.SEMANTIC` | `"semantic"`, `"sem"` | General facts without temporal binding. |
| `MemoryType.PROCEDURAL` | `"procedural"`, `"proc"` | Skills and repeatable procedures. |
3.2 MemoryEntry
Every piece of stored information is a MemoryEntry dataclass.
```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

from memory.core import MemoryType  # see 3.1

@dataclass
class MemoryEntry:
    content: Any                    # The stored data (str, dict, or any serialisable type)
    timestamp: float                # Unix creation time
    memory_type: MemoryType
    importance: float = 0.5         # 0.0–1.0; controls promotion, ordering, eviction
    access_count: int = 0           # Incremented on every read
    tags: List[str] = field(default_factory=list)           # mutable defaults need a factory
    metadata: Dict[str, Any] = field(default_factory=dict)
    embedding: Optional[List[float]] = None                 # Set by SemanticMemory
    last_access: float = field(default_factory=time.time)   # evaluated per instance
    decay_factor: float = 1.0       # Multiplied into importance on decay cycles
    original_importance: Optional[float] = None             # Preserved from creation for audit/reset
```

Key computed properties:
| Property | Type | Description |
|---|---|---|
| `entry.id` | `str` | MD5-based 16-char hex ID derived from content + timestamp + type. |
| `entry.effective_importance` | `float` | `importance × decay_factor`. Used for all ranking decisions. |
| `entry.priority` | `MemoryPriority` | `MemoryPriority` enum derived from `effective_importance` (see table below). |
| `entry.age_seconds` | `float` | Floating-point age since creation in seconds. |
| `entry.age_days` | `float` | Floating-point age since creation in days. |
| `entry.time_since_access_seconds` | `float` | Elapsed seconds since the last `entry.access()` call. |
| `entry.has_embedding` | `bool` | `True` if a vector embedding has been computed. |
MemoryPriority Thresholds
| `effective_importance` | `MemoryPriority` |
|---|---|
| ≥ 0.9 | CRITICAL |
| ≥ 0.7 | HIGH |
| ≥ 0.5 | MEDIUM |
| ≥ 0.3 | LOW |
| < 0.3 | MINIMAL |
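The threshold table reduces to a chain of comparisons on `effective_importance` (`importance × decay_factor`). The sketch below uses a stand-in `MemoryPriority` enum because the real enum's member values are not documented here.

```python
from enum import Enum

class MemoryPriority(Enum):  # stand-in; the real member values may differ
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    MINIMAL = "minimal"

# (inclusive lower bound, priority), checked from highest to lowest.
_THRESHOLDS = [
    (0.9, MemoryPriority.CRITICAL),
    (0.7, MemoryPriority.HIGH),
    (0.5, MemoryPriority.MEDIUM),
    (0.3, MemoryPriority.LOW),
]

def priority_for(importance: float, decay_factor: float = 1.0) -> MemoryPriority:
    effective = importance * decay_factor  # effective_importance
    for bound, priority in _THRESHOLDS:
        if effective >= bound:
            return priority
    return MemoryPriority.MINIMAL
```

For example, an entry created at importance 0.8 (HIGH) whose `decay_factor` has fallen to 0.5 has an effective importance of 0.4 and ranks as LOW.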
Key instance methods:
```python
entry.access()                                          # Increments access_count, updates last_access,
                                                        # and applies a small importance boost ∝ (1 − importance).
entry.apply_decay(decay_rate: float = 0.1)              # Manually apply one decay cycle in place.
entry.add_tag(tag: str)                                 # Append a tag if not already present.
entry.has_tag(tag: str) -> bool                         # Check for tag membership.
entry.to_dict(include_embedding: bool = False) -> dict  # Serialise to plain dict.
entry.to_json() -> str                                  # Serialise to JSON string.
```
3.3 Importance & Decay
Each `MemoryEntry` starts with a user-supplied `importance` value (0–1). Over time:
- `ForgettingMechanism` computes an effective decay rate: `base_decay_rate / (1 + recency_boost_per_access × access_count)`. More accesses → slower decay.
- It applies exponential decay, `I(t) = I₀ × exp(−λ × hours_since_last_access)`, where λ is derived from that effective decay rate.
- Entries whose `effective_importance` falls below `deletion_threshold` are marked for removal.
This mirrors the Ebbinghaus forgetting curve modulated by spaced repetition.
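The two formulas can be sketched as plain functions. This illustrates the arithmetic only and is not `ForgettingMechanism` itself. The configuration comments describe the rate as importance lost per day, while the formula is written in hours, so the sketch assumes a per-day rate and converts hours to days. It also ignores the `max_recency_boost` cap.

```python
import math

def effective_decay_rate(base_decay_rate: float,
                         recency_boost_per_access: float,
                         access_count: int) -> float:
    """base / (1 + boost * accesses): every access shrinks the rate."""
    return base_decay_rate / (1.0 + recency_boost_per_access * access_count)

def decayed_importance(i0: float, rate_per_day: float, hours_since_last_access: float) -> float:
    """I(t) = I0 * exp(-lambda * t); assumes a per-day rate, so hours are converted."""
    return i0 * math.exp(-rate_per_day * hours_since_last_access / 24.0)

def is_forgotten(importance: float, deletion_threshold: float) -> bool:
    return importance < deletion_threshold

# Same starting importance, 30 days idle: a never-read entry vs. one read 20 times.
idle_hours = 30 * 24
cold = decayed_importance(0.6, effective_decay_rate(0.1, 0.05, 0), idle_hours)
warm = decayed_importance(0.6, effective_decay_rate(0.1, 0.05, 20), idle_hours)
print(f"cold={cold:.3f} warm={warm:.3f}")                   # cold=0.030 warm=0.134
print(is_forgotten(cold, 0.05), is_forgotten(warm, 0.05))  # True False
```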
3.4 Semantic Search
SemanticMemory uses sentence-transformers (default model: all-MiniLM-L6-v2) to convert text to L2-normalised float32 vectors. All vectors are stored in a FAISS IndexFlatIP (inner-product index, equivalent to cosine similarity on normalised vectors). When FAISS is unavailable, a pure-NumPy brute-force fallback is used transparently.
Retrieval: SemanticMemory.search(query, k) embeds the query and returns the top-K entries with their cosine similarity scores.
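On L2-normalised vectors the inner product equals cosine similarity, which is why an inner-product index works here. The pure-Python sketch below shows the brute-force version of that idea. Tiny hand-made vectors stand in for sentence-transformers embeddings, and the code is not the module's NumPy fallback.

```python
import math
from typing import List, Sequence, Tuple

def l2_normalise(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]

def search(index: List[List[float]], query: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (row, score) pairs for the k highest inner products (= cosine here)."""
    q = l2_normalise(query)
    scores = [(i, sum(a * b for a, b in zip(row, q))) for i, row in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Rows are normalised once at insert time, as with an inner-product index.
index = [l2_normalise(v) for v in ([1, 0, 0], [0, 1, 0], [1, 1, 0])]
print(search(index, [2, 0, 0], k=2))
```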
3.5 Multi-Tenancy
Every memory layer accepts a tenant_id. Entries are namespaced so a query for tenant_id="acme" can never surface entries belonging to tenant_id="startup". The AIMemoryManager maintains a _TenantBundle per tenant containing independent STM, WorkingMemory, LTM, and SemanticMemory instances.
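The sketch below illustrates per-tenant namespacing, assuming a dictionary of independent bundles keyed by `tenant_id`. `TenantBundle` and `TenantRegistry` are hypothetical stand-ins for the internal `_TenantBundle`, and plain lists replace the real memory layers.

```python
from dataclasses import dataclass, field
from threading import Lock
from typing import Dict, List

@dataclass
class TenantBundle:  # hypothetical stand-in for _TenantBundle
    stm: List[str] = field(default_factory=list)
    ltm: List[str] = field(default_factory=list)

class TenantRegistry:
    def __init__(self) -> None:
        self._bundles: Dict[str, TenantBundle] = {}
        self._lock = Lock()

    def bundle(self, tenant_id: str) -> TenantBundle:
        """Create the tenant's bundle on first use; bundles are never shared."""
        with self._lock:
            if tenant_id not in self._bundles:
                self._bundles[tenant_id] = TenantBundle()
            return self._bundles[tenant_id]

registry = TenantRegistry()
registry.bundle("acme").stm.append("acme secret")
print(registry.bundle("startup").stm)  # []
```

Every read and write goes through `tenant_id`, so a lookup can only ever reach the bundle it names.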
3.6 Memory Selection (Composite Scoring)
MemorySelector scores each candidate on three axes and combines them linearly:
```
composite = w_relevance  × cosine_sim
          + w_recency    × exp(−λ × hours_old)
          + w_importance × entry.effective_importance
```

Default weights: relevance = 0.50, recency = 0.20, importance = 0.30. All weights must sum to 1.0. Only entries above `min_composite_score` (default 0.10) are selected, and the set is capped by `top_k` and a token budget.
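The composite formula can be sketched as follows. Two details are assumptions, not documented behaviour. First, λ is derived from `recency_half_life_hours` as ln 2 / half-life, so the recency term halves once per half-life. Second, tokens are approximated as whitespace-separated words. `Candidate` is a hypothetical stand-in for a `MemoryEntry` plus its similarity score.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:  # hypothetical stand-in for MemoryEntry + similarity score
    text: str
    cosine_sim: float
    hours_old: float
    effective_importance: float

def select(candidates: List[Candidate], *, w_rel: float = 0.50, w_rec: float = 0.20,
           w_imp: float = 0.30, half_life_hours: float = 24.0, min_score: float = 0.10,
           top_k: int = 10, token_budget: int = 3000) -> List[Tuple[Candidate, float]]:
    if abs(w_rel + w_rec + w_imp - 1.0) > 1e-3:  # ±0.1% tolerance
        raise ValueError("weights must sum to 1.0")
    lam = math.log(2) / half_life_hours  # assumed half-life parameterisation
    scored = []
    for c in candidates:
        recency = math.exp(-lam * c.hours_old)
        composite = w_rel * c.cosine_sim + w_rec * recency + w_imp * c.effective_importance
        if composite > min_score:
            scored.append((c, composite))
    scored.sort(key=lambda pair: pair[1], reverse=True)

    selected: List[Tuple[Candidate, float]] = []
    used = 0
    for c, score in scored[:top_k]:
        cost = len(c.text.split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            continue  # skip entries that would overflow the budget
        selected.append((c, score))
        used += cost
    return selected
```

Skipping an entry that would overflow the budget, rather than stopping at it, still lets smaller lower-ranked entries fit. The documentation does not say whether the real selector behaves this way.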
3.7 Context Assembly
ContextBuilder assembles a structured context string from four prioritised sections:
| Section | Default Token Fraction | Content |
|---|---|---|
| `user_profile` | 5% | Compact user preferences and interests |
| `memories` | 30% | Selected `MemoryEntry` objects formatted as text |
| `documents` | 45% | RAG-retrieved `Document` chunks |
| `history` | 20% | Recent conversation turns from STM |
Each section is individually token-budgeted and truncated (never dropped) if it exceeds its allocation.
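The sketch below shows one way the per-section budgets could be derived and enforced. The default fractions come from the table. Counting tokens as whitespace-separated words and truncating word by word are simplifying assumptions, because the builder's real token estimator is not described here.

```python
from typing import Dict, Tuple

DEFAULT_FRACTIONS = {"user_profile": 0.05, "memories": 0.30, "documents": 0.45, "history": 0.20}

def section_budgets(total_tokens: int,
                    fractions: Dict[str, float] = DEFAULT_FRACTIONS) -> Dict[str, int]:
    """Split the total budget across sections by fraction."""
    return {name: round(total_tokens * frac) for name, frac in fractions.items()}

def fit_section(text: str, budget: int) -> Tuple[str, bool]:
    """Truncate (never drop) text to `budget` words; report whether it was cut."""
    words = text.split()
    if len(words) <= budget:
        return text, False
    return " ".join(words[:budget]), True

print(section_budgets(4000))
# {'user_profile': 200, 'memories': 1200, 'documents': 1800, 'history': 800}
```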
4. Quick Start Guide
Installation Dependencies
```bash
pip install fennec-memory
pip install sentence-transformers  # embeddings (without it, deterministic stub embeddings are used)
pip install faiss-cpu              # fast vector search (optional, falls back to NumPy)
pip install cryptography           # Fernet encryption (optional)
```
Minimal Working Example
```python
from fennec_memory.memory import AIMemoryManager, MemoryConfig

# 1. Configure
config = MemoryConfig(
    persistence_path="./my_memory",
    importance_threshold=0.6,
    max_tokens=4000,
)

# 2. Initialise
mgr = AIMemoryManager(config=config)

# 3. Save an interaction
mgr.save_interaction(
    user_id="alice",
    tenant_id="acme",
    user_input="How do Python decorators work?",
    assistant_output="Decorators are higher-order functions that wrap another function.",
    importance=0.8,
    topics=["python"],
)

# 4. Build context for the next query
ctx = mgr.build_context(
    user_id="alice",
    tenant_id="acme",
    query="Show me a decorator example.",
)

# 5. Inject into your LLM
prompt = ctx.full_text + "\n\nUser: Show me a decorator example.\nAssistant:"
```
Get → Fallback → Put Pattern
```python
def get_response(mgr, user_id, tenant_id, query):
    # Build memory-enriched context
    ctx = mgr.build_context(user_id, tenant_id, query)

    # Call your LLM (my_llm is a placeholder for your own client)
    response = my_llm.generate(ctx.full_text + "\n\n" + query)

    # Store the interaction back
    mgr.save_interaction(
        user_id=user_id,
        tenant_id=tenant_id,
        user_input=query,
        assistant_output=response,
        importance=0.7,
    )
    return response
```
5. Public API Reference
5.1 AIMemoryManager
The central orchestrator. Instantiate once per application and share across request handlers.
```python
from fennec_memory.memory import AIMemoryManager
```
Constructor
```python
AIMemoryManager(
    config: Optional[MemoryConfig] = None,
    llm: Optional[LLMProtocol] = None,
    *,
    encryption_key: Optional[str] = None,
    enable_privacy_masking: bool = True,
    max_workers: int = 4,
)
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `config` | `MemoryConfig` | No | System configuration. If `None`, defaults are used. |
| `llm` | `LLMProtocol` | No | Any object with `.generate(prompt: str, max_tokens: int) -> str`. Required for LLM-based compression and summarisation. |
| `encryption_key` | `str` | No | Secret key for Fernet symmetric encryption of stored content. If `None`, content is stored in plaintext. |
| `enable_privacy_masking` | `bool` | No | When `True` (default), PII is redacted before storage. |
| `max_workers` | `int` | No | Thread pool size for background semantic indexing and async wrappers (default: 4). |
save_interaction
Persists one conversation turn through all memory layers.
```python
def save_interaction(
    self,
    user_id: str,
    tenant_id: str,
    user_input: str,
    assistant_output: str,
    *,
    importance: float = 0.5,
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    topics: Optional[List[str]] = None,
) -> MemoryEntry
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `user_id` | `str` | Yes | Identifies the user within the tenant. |
| `tenant_id` | `str` | Yes | Tenant namespace for isolation. |
| `user_input` | `str` | Yes | The user's message text. |
| `assistant_output` | `str` | Yes | The assistant's response text. |
| `importance` | `float` | No | Initial importance score 0–1 (default: 0.5). Values ≥ `config.importance_threshold` trigger LTM promotion. |
| `tags` | `List[str]` | No | Searchable tags attached to the entry. |
| `metadata` | `Dict[str, Any]` | No | Arbitrary key-value metadata (e.g., session ID, request ID). |
| `topics` | `List[str]` | No | Topic labels used to update the user's profile topic frequency map. |
Returns: MemoryEntry — the stored entry with its generated id.
Internal steps:
- Access check (write permission for `user_id` on `tenant_id`).
- PII masking of both `user_input` and `assistant_output`.
- Optional Fernet encryption.
- User profile `memory_retention_boost` applied to `importance`.
- Entry added to `ShortTermMemory`.
- Semantic embedding and indexing submitted to the background thread pool.
- If `importance ≥ config.importance_threshold`, the entry is promoted to `LongTermMemory`.
- `UserProfileManager` updated with `topics` and `importance`.
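Two of the steps above, applying the profile's `memory_retention_boost` to `importance` and the threshold check for LTM promotion, together decide whether a turn reaches long-term storage. The documentation does not say how the boost is combined with `importance`. The sketch below therefore assumes a multiplier clamped to 1.0, and `boosted_importance` and `promote_to_ltm` are hypothetical helpers.

```python
def boosted_importance(importance: float, memory_retention_boost: float) -> float:
    """Hypothetical: scale by the user's boost and clamp to the valid 0-1 range."""
    return max(0.0, min(1.0, importance * memory_retention_boost))

def promote_to_ltm(importance: float, memory_retention_boost: float,
                   importance_threshold: float) -> bool:
    return boosted_importance(importance, memory_retention_boost) >= importance_threshold

# With the Quick Start threshold of 0.6, a boost can tip a borderline turn over.
print(promote_to_ltm(0.55, 1.0, 0.6), promote_to_ltm(0.55, 1.2, 0.6))  # False True
```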
```python
entry = mgr.save_interaction(
    user_id="alice",
    tenant_id="acme",
    user_input="What is RAG?",
    assistant_output="RAG is Retrieval-Augmented Generation ...",
    importance=0.9,
    tags=["AI", "RAG"],
    topics=["AI", "LLM"],
    metadata={"session_id": "session_001"},
)
print(entry.id)  # e.g. "3f2a1b9c7e4d5a6f"
```
build_context
Retrieves relevant memories and assembles a BuiltContext ready to inject into an LLM prompt.
```python
def build_context(
    self,
    user_id: str,
    tenant_id: str,
    query: str,
    *,
    documents: Optional[List[Document]] = None,
    top_k_semantic: int = 8,
    top_k_ltm: int = 10,
    include_profile: bool = True,
) -> BuiltContext
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `user_id` | `str` | Yes | User whose profile and access rights are applied. |
| `tenant_id` | `str` | Yes | Tenant to query memories from. |
| `query` | `str` | Yes | The current user question or task description. |
| `documents` | `List[Document]` | No | External RAG-retrieved documents to include in context. |
| `top_k_semantic` | `int` | No | Number of results from semantic search (default: 8). |
| `top_k_ltm` | `int` | No | Number of top-importance entries from LTM (default: 10). |
| `include_profile` | `bool` | No | Whether to inject the user profile section (default: `True`). |
Returns: BuiltContext with fields:
| Field | Type | Description |
|---|---|---|
| `full_text` | `str` | The complete assembled context string. Inject directly into your prompt. |
| `token_estimate` | `int` | Estimated token count of `full_text`. |
| `sections` | `Dict[str, str]` | Individual sections: `user_profile`, `memories`, `documents`, `history`. |
| `truncated` | `bool` | `True` if any section was truncated to fit its budget. |
| `document_count` | `int` | Number of documents included. |
| `memory_count` | `int` | Number of memory entries included. |
Internal steps:
- Semantic search over STM + LTM embeddings.
- Top-K retrieval from LTM by importance.
- Merge and deduplicate all candidates.
- `MemorySelector` scores and filters by composite score and token budget.
- Selected entries loaded into `WorkingMemory`.
- User profile text retrieved.
- `ContextBuilder` assembles the final `BuiltContext`.
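The merge-and-deduplicate step can be sketched as a keyed merge. The same entry may come back from semantic search, the LTM top-K, and the STM window, so candidates are collapsed by `entry.id`. Keeping the highest known similarity score per id is an assumption, and `(entry_id, score)` tuples stand in for real `MemoryEntry` objects.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Scored = Tuple[str, Optional[float]]  # (entry_id, similarity or None)

def merge_candidates(*sources: Iterable[Scored]) -> List[Scored]:
    """Deduplicate by id, preserving first-seen order and the best known score."""
    best: Dict[str, Optional[float]] = {}
    for source in sources:
        for entry_id, score in source:
            if entry_id not in best:
                best[entry_id] = score
            elif score is not None and (best[entry_id] is None or score > best[entry_id]):
                best[entry_id] = score
    return list(best.items())

semantic = [("a", 0.82), ("b", 0.40)]
ltm_top = [("c", None), ("a", None)]      # importance-ranked, no similarity score
stm_window = [("b", None), ("d", None)]   # recency window, no similarity score
print(merge_candidates(semantic, ltm_top, stm_window))
# [('a', 0.82), ('b', 0.4), ('c', None), ('d', None)]
```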
```python
from fennec_memory.memory import Document

ctx = mgr.build_context(
    user_id="alice",
    tenant_id="acme",
    query="How do I implement a caching layer?",
    documents=[
        Document(
            page_content="Redis supports LRU eviction ...",
            metadata={"source": "redis_docs"},
            doc_id="doc_001",
        )
    ],
    top_k_semantic=5,
)

# Use ctx.full_text as the LLM context block
print(ctx.full_text)
print(f"Memories used: {ctx.memory_count}, truncated: {ctx.truncated}")
```
build_prompt
Convenience wrapper that returns a complete, ready-to-send prompt string.
```python
def build_prompt(
    self,
    user_id: str,
    tenant_id: str,
    query: str,
    *,
    documents: Optional[List[Document]] = None,
    system_instruction: str = "You are a helpful, knowledgeable assistant.",
) -> str
```

```python
prompt = mgr.build_prompt(
    user_id="alice",
    tenant_id="acme",
    query="Explain async/await in Python.",
    system_instruction="You are an expert Python engineer.",
)
response = my_llm.generate(prompt)
```
run_maintenance
Executes a full maintenance pass for a tenant: decay, eviction, compression, forgetting.
```python
def run_maintenance(self, tenant_id: str) -> Dict[str, Any]
```

Returns a report dict with keys:

| Key | Description |
|---|---|
| `stm_evicted` | STM entries removed for falling below `min_importance`. |
| `ltm_decay_updated` | LTM entries that had their importance updated. |
| `ltm_deleted` | LTM entries deleted for falling below `min_importance`. |
| `working_forgotten` | `WorkingMemory` entries marked forgotten. |
| `ltm_compression` | Dict with `merges` and `noise_removed` counts (only present when LTM is more than 80% full). |
Schedule this to run periodically (e.g., once per hour via APScheduler or a background task):
```python
import asyncio

async def maintenance_loop(mgr, tenant_id, interval_seconds=3600):
    while True:
        report = await mgr.arun_maintenance(tenant_id)
        print(f"Maintenance: {report}")
        await asyncio.sleep(interval_seconds)
```
grant / revoke
Manage user access roles within a tenant.
```python
mgr.grant(user_id: str, tenant_id: str, role: str = "writer") -> None
mgr.revoke(user_id: str, tenant_id: str, role: str) -> None
```

Built-in roles:

| Role | Permissions |
|---|---|
| `reader` | READ |
| `writer` | READ, WRITE |
| `owner` | READ, WRITE, DELETE |
| `admin` | READ, WRITE, DELETE, ADMIN |
```python
mgr.grant("alice", "acme", role="owner")
mgr.grant("bob", "acme", role="writer")
mgr.grant("guest", "acme", role="reader")
```

Default behaviour: If no roles have been assigned in a tenant, all write operations are permitted. The moment `grant()` is called for any user in that tenant, strict enforcement activates for every user in it, and users without a role are no longer permitted to write. If you need fine-grained control, grant roles to every user who needs access together (for example, during tenant setup) rather than one at a time as users appear.
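The default-open-then-strict rule described above can be modelled in a few lines. `TenantACL` is a hypothetical model that only mirrors the role table and that rule. It is not the module's `AccessController`, and details such as what happens after every role is revoked are not documented.

```python
from typing import Dict, Set, Tuple

ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "reader": {"READ"},
    "writer": {"READ", "WRITE"},
    "owner": {"READ", "WRITE", "DELETE"},
    "admin": {"READ", "WRITE", "DELETE", "ADMIN"},
}

class TenantACL:  # hypothetical model, not the real AccessController
    def __init__(self) -> None:
        self._roles: Dict[Tuple[str, str], Set[str]] = {}

    def grant(self, user_id: str, tenant_id: str, role: str = "writer") -> None:
        self._roles.setdefault((user_id, tenant_id), set()).add(role)

    def revoke(self, user_id: str, tenant_id: str, role: str) -> None:
        self._roles.get((user_id, tenant_id), set()).discard(role)

    def can_write(self, user_id: str, tenant_id: str) -> bool:
        # No roles assigned anywhere in this tenant -> permissive default.
        if not any(t == tenant_id for (_, t) in self._roles):
            return True
        roles = self._roles.get((user_id, tenant_id), set())
        return any("WRITE" in ROLE_PERMISSIONS[r] for r in roles)

acl = TenantACL()
print(acl.can_write("bob", "acme"))  # True  (no grants yet)
acl.grant("alice", "acme", role="owner")
print(acl.can_write("bob", "acme"))  # False (enforcement is now active)
```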
stats
Returns a comprehensive statistics snapshot for a tenant.
```python
def stats(self, tenant_id: str) -> Dict[str, Any]
```

```python
s = mgr.stats("acme")
# s["stm"]                    → ShortTermMemory stats
# s["working"]                → WorkingMemory stats
# s["ltm"]                    → LongTermMemory stats
# s["semantic"]               → SemanticMemory stats
# s["profile_manager"]        → aggregate user profile stats
# s["encryption_active"]      → bool
# s["privacy_masking_active"] → bool
```
clear_tenant
Wipes all memory for a tenant across every layer. Irreversible.
```python
def clear_tenant(self, tenant_id: str) -> None
```
Async API
All three primary operations have async counterparts:
```python
entry = await mgr.asave_interaction(user_id, tenant_id, user_input, assistant_output, **kwargs)
ctx = await mgr.abuild_context(user_id, tenant_id, query, **kwargs)
report = await mgr.arun_maintenance(tenant_id)
```

These run the synchronous methods in the internal `ThreadPoolExecutor` and are safe to await from any async context.

Note: The async methods do not perform non-blocking I/O themselves; they dispatch the synchronous work to a `ThreadPoolExecutor`. Under high concurrency with a low `max_workers`, blocking I/O inside the pool can become a bottleneck, so increase `max_workers` accordingly.
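A minimal sketch of the dispatch pattern described above: a synchronous method handed to a `ThreadPoolExecutor` through `loop.run_in_executor`. `SyncStore` and its methods are hypothetical. Only the executor mechanics are the point.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Any, List

class SyncStore:  # hypothetical stand-in for AIMemoryManager
    def __init__(self, max_workers: int = 4) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def save(self, text: str, *, importance: float = 0.5) -> str:
        time.sleep(0.05)  # simulate blocking work (SQLite write, embedding)
        return f"{text}:{importance}"

    async def asave(self, text: str, **kwargs: Any) -> str:
        loop = asyncio.get_running_loop()
        # run_in_executor forwards positional args only, so bind kwargs with partial.
        return await loop.run_in_executor(self._executor, partial(self.save, text, **kwargs))

    def close(self) -> None:
        self._executor.shutdown(wait=True)

async def main() -> List[str]:
    store = SyncStore(max_workers=4)
    try:
        # The four blocking calls overlap because each runs on its own worker thread.
        return await asyncio.gather(*(store.asave(f"turn{i}", importance=0.7) for i in range(4)))
    finally:
        store.close()

print(asyncio.run(main()))  # ['turn0:0.7', 'turn1:0.7', 'turn2:0.7', 'turn3:0.7']
```

With `max_workers=1`, the same four calls would run one after another. That is the bottleneck the note warns about.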
5.2 MemoryConfig
The top-level configuration dataclass.
```python
from fennec_memory.memory import MemoryConfig

config = MemoryConfig(
    max_short_term=1000,
    max_long_term=1000,
    max_working=1000,
    importance_threshold=0.7,
    similarity_threshold=0.85,
    retrieval_limit=10,
    embedding_model="all-MiniLM-L6-v2",
    embedding_batch_size=32,
    embedding_cache_size=1000,
    normalize_text=False,
    preserve_case=True,
    enable_persistence=True,
    persistence_path="./memory_storage",
    auto_save_interval=300,
    enable_decay=True,
    decay_rate=0.1,
    min_importance=0.1,
    enable_consolidation=True,
    consolidation_interval=3600,
    max_tokens=2000,
    window_size=5,
    log_level="INFO",
    enable_stats=True,
)
```

Full parameter reference:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_short_term` | `int` | `1000` | Maximum entries in `ShortTermMemory` (sliding window). |
| `max_long_term` | `int` | `1000` | Maximum entries tracked in LTM (the database has no hard cap). |
| `max_working` | `int` | `1000` | Maximum entries in `WorkingMemory` per turn. |
| `max_episodic` | `int` | `1000` | Capacity hint for episodic memory. |
| `importance_threshold` | `float` | `0.7` | Minimum importance for LTM promotion. Must be 0–1. |
| `similarity_threshold` | `float` | `0.85` | Cosine similarity threshold for duplicate detection. |
| `retrieval_limit` | `int` | `10` | Default `top_k` for `MemorySelector`. |
| `embedding_model` | `str` | `"all-MiniLM-L6-v2"` | sentence-transformers model name. |
| `embedding_batch_size` | `int` | `32` | Batch size for bulk embedding generation. |
| `embedding_cache_size` | `int` | `1000` | LRU cache size for the embedding model. |
| `normalize_text` | `bool` | `False` | Lowercase text before storage. |
| `preserve_case` | `bool` | `True` | Keep original casing. |
| `enable_persistence` | `bool` | `True` | Enable SQLite LTM and JSON profile persistence. |
| `persistence_path` | `str` | `"./memory_storage"` | Base directory for all on-disk data. |
| `auto_save_interval` | `int` | `300` | Auto-save interval in seconds (used by the profile manager). |
| `enable_decay` | `bool` | `True` | Enable time-based importance decay. |
| `decay_rate` | `float` | `0.1` | Base importance loss per day (at zero access count). |
| `min_importance` | `float` | `0.1` | Entries below this floor are evicted during maintenance. |
| `enable_consolidation` | `bool` | `True` | Enable periodic memory consolidation. |
| `consolidation_interval` | `int` | `3600` | Consolidation interval in seconds. |
| `max_tokens` | `int` | `2000` | Token budget for `ContextBuilder` and `WorkingMemory`. |
| `window_size` | `int` | `5` | `ShortTermMemory` sliding window size (number of turns). |
| `log_level` | `str` | `"INFO"` | Python logging level. |
Environment Variable Bootstrap
```python
config = MemoryConfig.from_env()
```

Supported environment variables:

| Variable | Corresponds To |
|---|---|
| `MEMORY_MAX_SHORT_TERM` | `max_short_term` |
| `MEMORY_MAX_LONG_TERM` | `max_long_term` |
| `MEMORY_EMBEDDING_MODEL` | `embedding_model` |
| `MEMORY_PERSISTENCE_PATH` | `persistence_path` |
| `MEMORY_ENABLE_PERSISTENCE` | `enable_persistence` (any truthy string) |
to_dict() — Serialisation
```python
d = config.to_dict()
```

Converts the full `MemoryConfig` to a plain dictionary. Useful for logging, auditing, or persisting configuration state.
5.3 Conversation Memory Classes
These are lightweight, LangChain-compatible memory classes for direct conversational use. They implement BaseMemory with save_context / load_memory_variables / clear, and expose async variants (asave_context, aload_memory_variables).
ConversationBufferMemory
Stores the full conversation history without any truncation.
```python
from fennec_memory.memory import ConversationBufferMemory

mem = ConversationBufferMemory(
    return_messages=False,  # True → returns List[dict], False → returns text string
    input_key="input",
    output_key="output",
    memory_key="history",
)
mem.save_context({"input": "Hello"}, {"output": "Hi there!"})
variables = mem.load_memory_variables({})
# variables["history"] → "user: Hello\n assistant: Hi there!"
```

Best for: short conversations where the full context fits in the LLM's window.
ConversationBufferWindowMemory
Keeps only the last k interactions.
```python
from fennec_memory.memory import ConversationBufferWindowMemory

mem = ConversationBufferWindowMemory(k=5)  # remembers last 5 turns
# ... after three save_context() calls:
info = mem.get_window_info()
# {"current_size": 3, "max_size": 5, "available_slots": 2, "is_full": False}
```

Best for: long conversations where only recent context matters.
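The window behaviour and the `get_window_info()` fields map naturally onto a bounded `collections.deque`. The class below is a hypothetical sketch built on that assumption, not the library's implementation. It reproduces the example output after three turns with `k=5`.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple

class WindowSketch:  # hypothetical; not ConversationBufferWindowMemory itself
    def __init__(self, k: int = 5) -> None:
        self.k = k
        # maxlen makes the deque drop the oldest turn automatically.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        self.turns.append((inputs["input"], outputs["output"]))

    def get_window_info(self) -> Dict[str, Any]:
        size = len(self.turns)
        return {"current_size": size, "max_size": self.k,
                "available_slots": self.k - size, "is_full": size == self.k}

mem = WindowSketch(k=5)
for i in range(3):
    mem.save_context({"input": f"q{i}"}, {"output": f"a{i}"})
print(mem.get_window_info())
# {'current_size': 3, 'max_size': 5, 'available_slots': 2, 'is_full': False}
```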
ConversationSummaryMemory
Automatically summarises old turns using an LLM when the estimated token count exceeds max_token_limit.
```python
from fennec_memory.memory import ConversationSummaryMemory

mem = ConversationSummaryMemory(
    llm=my_llm,
    max_token_limit=2000,
)
mem.save_context({"input": "..."}, {"output": "..."})  # triggers summarisation when needed
summary = mem.get_summary()  # retrieve the current running summary
mem.force_summarize()        # trigger summarisation immediately
```

The LLM is called with a `summary_prompt_template` that includes the previous summary and the new turns. On failure, the last 5 turns are preserved as a fallback.
Best for: very long conversations that must stay within a strict token budget.
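The sketch below traces the trigger-and-fallback flow described above. Only the overall behaviour comes from the description: summarise once the estimate exceeds `max_token_limit`, and keep the last 5 turns if the LLM call fails. Everything else is an assumption, including the 4-characters-per-token estimate, the prompt wording, plain-string arguments in place of input/output dicts, and `SummarySketch` itself.

```python
from typing import List

class SummarySketch:  # hypothetical; not ConversationSummaryMemory itself
    def __init__(self, llm, max_token_limit: int = 2000) -> None:
        self.llm = llm  # any object with .generate(prompt, max_tokens) -> str
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.turns: List[str] = []

    def _estimate_tokens(self) -> int:
        return len(self.summary + "".join(self.turns)) // 4  # rough heuristic

    def save_context(self, user: str, assistant: str) -> None:
        self.turns.append(f"user: {user}\nassistant: {assistant}\n")
        if self._estimate_tokens() > self.max_token_limit:
            self.force_summarize()

    def force_summarize(self) -> None:
        prompt = (f"Previous summary:\n{self.summary}\n\nNew lines:\n"
                  + "".join(self.turns) + "\nUpdated summary:")
        try:
            self.summary = self.llm.generate(prompt, max_tokens=256)
            self.turns = []  # folded into the summary
        except Exception:
            self.turns = self.turns[-5:]  # fallback: keep only the last 5 turns
```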
ConversationEntityMemory
Tracks named entities (people, places, organisations, products) mentioned in conversation and provides relevant entity context on retrieval.
```python
from fennec_memory.memory import ConversationEntityMemory

mem = ConversationEntityMemory(
    llm=my_llm,                # used for entity extraction; falls back to Stanza NLP
    memory_key="entity_info",
    lang="en",                 # Stanza language code (used in fallback)
)
mem.save_context(
    {"input": "I was talking to Dr. Smith at OpenAI."},
    {"output": "That's interesting."},
)
entities = mem.list_entities()          # ["Dr. Smith", "OpenAI"]
info = mem.get_entity_info("OpenAI")    # {"name": "OpenAI", "contexts_count": 1, ...}
```

Entity extraction uses the LLM if available, otherwise falls back to Stanza NLP for NER.
Best for: conversations involving multiple referenced people, organisations, or places.
BaseMemory Interface
All memory classes implement:
```python
from abc import ABC
from typing import Any, Dict

class BaseMemory(ABC):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None: ...
    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...
    def clear(self) -> None: ...
    def get_memory_stats(self) -> Dict[str, Any]: ...

    # Async variants (via asyncio.to_thread)
    async def asave_context(self, inputs, outputs): ...
    async def aload_memory_variables(self, inputs): ...
```
5.4 SemanticMemory
FAISS-backed (or NumPy-fallback) vector store for similarity search.
```python
from fennec_memory.memory import SemanticMemory

sem = SemanticMemory(
    model_name="all-MiniLM-L6-v2",
    cache_size=1000,
    tenant_id="acme",
)
```

| Method | Signature | Description |
|---|---|---|
| `add` | `(entry: MemoryEntry) -> None` | Embed and index one entry. Sets `entry.embedding` in place. |
| `add_batch` | `(entries: List[MemoryEntry]) -> None` | Batch embed for performance. |
| `remove` | `(entry_id: str) -> None` | Remove entry from index and store. |
| `search` | `(query: str, k: int = 5, min_score: float = 0.0) -> List[Tuple[MemoryEntry, float]]` | Return top-K entries with cosine scores. Calls `entry.access()` on each result. |
| `search_entries` | `(query: str, k: int = 5, min_score: float = 0.0) -> List[MemoryEntry]` | Returns entries only (no scores). |
| `clear` | `() -> None` | Wipe the entire index and store. |
| `stats` | `() -> Dict[str, Any]` | Returns backend (`faiss` or `numpy`), model name, entry count, tenant ID. |
```python
results = sem.search("Python async patterns", k=5, min_score=0.3)
for entry, score in results:
    print(f"Score: {score:.3f}  Content: {str(entry.content)[:60]}")
```
5.5 LongTermMemory (Storage)
SQLite-backed persistent store. One database file per tenant, located at {persistence_path}/{tenant_id}_ltm.db.
```python
from fennec_memory.memory import LongTermMemory

ltm = LongTermMemory(
    db_path="./memory_storage/acme_ltm.db",
    tenant_id="acme",
)
```

| Method | Signature | Description |
|---|---|---|
| `store` | `(entry: MemoryEntry) -> None` | Upsert one entry (insert, or update the existing row if the ID already exists). |
| `store_many` | `(entries: List[MemoryEntry]) -> None` | Batch upsert for efficiency. |
| `get` | `(entry_id: str) -> Optional[MemoryEntry]` | Fetch by ID; calls `entry.access()` and persists updated access stats. |
| `get_top_by_importance` | `(limit: int = 20, min_importance: float = 0.0) -> List[MemoryEntry]` | Ordered by importance DESC. |
| `get_recent` | `(limit: int = 20) -> List[MemoryEntry]` | Ordered by timestamp DESC. |
| `count` | `() -> int` | Total entry count for this tenant. |
| `delete` | `(entry_id: str) -> None` | Remove a single entry. |
| `delete_below_importance` | `(threshold: float) -> int` | Bulk delete; returns row count. |
| `apply_global_decay` | `(decay_rate: float = 0.1) -> int` | Apply time-based decay to all entries; returns updated count. |
| `clear` | `() -> None` | Delete all entries for this tenant. |
| `stats` | `() -> Dict[str, Any]` | Returns `db_path`, `tenant_id`, `total_entries`. |
The SQLite schema includes indices on (tenant_id), (tenant_id, importance DESC), and (tenant_id, timestamp DESC) for efficient queries. All writes are protected by a threading.Lock.
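The storage pattern can be sketched with Python's built-in `sqlite3`. The sketch shows an upsert through `INSERT ... ON CONFLICT DO UPDATE` (SQLite 3.24+), the three documented indices, and a lock around writes. The table name and columns are hypothetical, and the module's actual schema may differ. Because each tenant has its own database file, the sketch uses a plain `id` primary key.

```python
import sqlite3
import threading
from typing import List, Tuple

class LTMSketch:  # hypothetical; not LongTermMemory itself
    def __init__(self, db_path: str, tenant_id: str) -> None:
        self.tenant_id = tenant_id
        self._lock = threading.Lock()
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._conn.executescript("""
            CREATE TABLE IF NOT EXISTS memories (
                id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL, content TEXT,
                importance REAL, timestamp REAL);
            CREATE INDEX IF NOT EXISTS ix_tenant ON memories (tenant_id);
            CREATE INDEX IF NOT EXISTS ix_tenant_imp ON memories (tenant_id, importance DESC);
            CREATE INDEX IF NOT EXISTS ix_tenant_ts ON memories (tenant_id, timestamp DESC);
        """)

    def store(self, entry_id: str, content: str, importance: float, timestamp: float) -> None:
        with self._lock, self._conn:  # serialise writers; commit or roll back
            self._conn.execute(
                """INSERT INTO memories (id, tenant_id, content, importance, timestamp)
                   VALUES (?, ?, ?, ?, ?)
                   ON CONFLICT(id) DO UPDATE SET
                       content = excluded.content, importance = excluded.importance""",
                (entry_id, self.tenant_id, content, importance, timestamp))

    def get_top_by_importance(self, limit: int = 20) -> List[Tuple[str, float]]:
        cur = self._conn.execute(
            "SELECT id, importance FROM memories WHERE tenant_id = ? "
            "ORDER BY importance DESC LIMIT ?", (self.tenant_id, limit))
        return cur.fetchall()
```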
5.6 MemorySelector
Scores and ranks candidate memories for context injection.
```python
from fennec_memory.memory import MemorySelector, SelectionConfig

config = SelectionConfig(
    relevance_weight=0.50,
    recency_weight=0.20,
    importance_weight=0.30,
    recency_half_life_hours=24.0,
    min_composite_score=0.10,
    top_k=10,
    token_budget=3000,
)
selector = MemorySelector(config=config)
```

Constraint: `relevance_weight + recency_weight + importance_weight` must equal 1.0 (±0.1%).
| Method | Signature | Description |
|---|---|---|
| `select` | `(query, candidates, similarity_scores=None) -> List[ScoredMemory]` | Return scored and filtered entries. |
| `select_entries` | `(query, candidates, similarity_scores=None) -> List[MemoryEntry]` | Same as `select` but returns unwrapped entries. |
| `explain` | `(query, candidates, similarity_scores=None) -> List[Dict]` | Human-readable scoring breakdown (useful for debugging). |
```python
explanation = selector.explain(
    query="Python async patterns",
    candidates=stm.get_all(),  # stm: a ShortTermMemory instance
)
for row in explanation:
    print(f"id={row['id']} composite={row['composite']:.3f} "
          f"relevance={row['relevance']:.3f} recency={row['recency']:.3f}")
```

`ScoredMemory` fields: `entry`, `relevance_score`, `recency_score`, `importance_score`, `composite_score`.
5.7 MemoryCompressor
LLM-based noise removal, duplicate merging, and summarisation.
```python
from fennec_memory.memory import MemoryCompressor

compressor = MemoryCompressor(
    llm=my_llm,
    merge_similarity_threshold=0.92,
    tenant_id="acme",
)
```

Note: `merge_duplicates` and `compress_batch` (when `merge=True`) require entries to have a populated `embedding` field. LLM-based summarisation requires `llm` to be supplied in the constructor.
| Method | Signature | Description |
|---|---|---|
| `remove_noise` | `(entries) -> Tuple[List[MemoryEntry], int]` | Drop entries with fewer than 10 chars, fewer than 3 words, or less than 30% alphanumeric content. Returns `(cleaned, n_removed)`. |
| `merge_duplicates` | `(entries) -> Tuple[List[MemoryEntry], int]` | Merge entry pairs with cosine similarity ≥ `merge_similarity_threshold`. Returns `(merged_list, n_merges)`. |
| `summarise` | `(entries, max_tokens=256, target_type=LONG_TERM) -> Optional[MemoryEntry]` | LLM-summarise a list into one compact entry. Falls back to concatenation if the LLM is unavailable. |
| `compress_batch` | `(entries, *, remove_noise=True, merge=True, summarise=False, summarise_threshold=20) -> Dict` | Full pipeline. Returns a dict with keys `entries`, `noise_removed`, `merges`, `summarised`. |
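The `remove_noise` criteria in the table are concrete enough to sketch directly. The version below works on plain strings instead of `MemoryEntry` objects. It also assumes that the alphanumeric share is measured over non-whitespace characters, a detail the table does not specify.

```python
from typing import List, Tuple

def is_noise(text: str) -> bool:
    stripped = text.strip()
    if len(stripped) < 10 or len(stripped.split()) < 3:
        return True
    visible = [ch for ch in stripped if not ch.isspace()]  # non-empty after strip()
    alnum_ratio = sum(ch.isalnum() for ch in visible) / len(visible)
    return alnum_ratio < 0.30

def remove_noise(texts: List[str]) -> Tuple[List[str], int]:
    """Return (cleaned, n_removed), mirroring the documented return shape."""
    cleaned = [t for t in texts if not is_noise(t)]
    return cleaned, len(texts) - len(cleaned)

print(remove_noise(["ok", "!!! ??? ### $$$", "Use Redis for LRU caching."]))
# (['Use Redis for LRU caching.'], 2)
```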
```python
result = compressor.compress_batch(
    candidates,
    remove_noise=True,
    merge=True,
    summarise=True,
    summarise_threshold=5,
)
print(f"Reduced {len(candidates)} → {len(result['entries'])} entries")
print(f"Noise removed: {result['noise_removed']}, Merges: {result['merges']}")
```
5.8 ForgettingMechanism
Biologically-inspired forgetting based on the Ebbinghaus curve.
```python
from fennec_memory.memory import ForgettingMechanism, ForgettingConfig

forgetter = ForgettingMechanism(ForgettingConfig(
    base_decay_rate=0.15,            # importance lost per day at zero accesses
    recency_boost_per_access=0.05,   # each access multiplies effective half-life
    max_recency_boost=2.0,           # cap on recency boost multiplier
    min_importance=0.05,             # minimum floor before removal
    deletion_threshold=0.05,         # entries at or below this are "forgotten"
    high_frequency_threshold=5,      # ≥5 accesses → "well-rehearsed"
    apply_decay_on_read=False,       # if True, decay runs on every load
))
```

| Method | Signature | Description |
|---|---|---|
| apply | (entries: List[MemoryEntry]) -> Tuple[List, List] | Returns (alive, forgotten). Mutates importance and decay_factor in place. |
| apply_to_single | (entry: MemoryEntry) -> bool | Decays one entry; returns True if it should be kept. |
| score_retention | (entry: MemoryEntry) -> float | Non-mutating retention score in [0, 1]. Use it to preview at-risk entries. |
| report | (entries: List[MemoryEntry]) -> Dict | Diagnostic dict: total, at_risk_count, at_risk_ids. |
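An Ebbinghaus-style model treats retention as exponential decay over time, slowed by rehearsal. The sketch below shows that idea using the ForgettingConfig parameter names. It assumes these rules, and the exact formula inside ForgettingMechanism may differ:

- Importance decays as exp(-base_decay_rate × days / boost).
- The boost starts at 1.0 and grows by recency_boost_per_access per access, capped at max_recency_boost.
- Decayed importance is floored at min_importance.

retained_importance and is_forgotten are hypothetical names.

```python
import math


def retained_importance(importance: float, days_since_access: float, access_count: int,
                        base_decay_rate: float = 0.15,
                        recency_boost_per_access: float = 0.05,
                        max_recency_boost: float = 2.0,
                        min_importance: float = 0.05) -> float:
    """Exponential (Ebbinghaus-style) decay; frequent access stretches the curve."""
    boost = min(1.0 + recency_boost_per_access * access_count, max_recency_boost)
    decayed = importance * math.exp(-base_decay_rate * days_since_access / boost)
    return max(decayed, min_importance)


def is_forgotten(importance: float, deletion_threshold: float = 0.05) -> bool:
    """Entries at or below the deletion threshold are dropped."""
    return importance <= deletion_threshold
```

With min_importance equal to deletion_threshold, as in the configuration above, an entry that decays all the way to the floor is forgotten on the next pass.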
```python
# Preview at-risk entries without mutating anything
report = forgetter.report(entries)
print(f"At risk: {report['at_risk_count']} of {report['total']}")

# Apply decay (mutates entries); guard against the empty-list quirk described below
if entries:
    alive, forgotten = forgetter.apply(entries)
    print(f"{len(alive)} entries retained, {len(forgotten)} forgotten")
```

Known behaviour: When entries is an empty list, apply() returns the string "entries is empty" instead of a tuple. Always guard with if entries: before calling. See Section 10 for the recommended pattern.
5.9 ContextBuilder
Assembles a structured, token-budgeted context string from multiple sources.
```python
from fennec_memory.memory import ContextBuilder, Document

builder = ContextBuilder(
    total_token_budget=4000,
    profile_budget_fraction=0.05,
    memory_budget_fraction=0.30,
    document_budget_fraction=0.45,
    history_budget_fraction=0.20,
)
```

build
```python
def build(
    self,
    *,
    documents: Optional[List[Document]] = None,
    memories: Optional[List[MemoryEntry]] = None,
    history: Optional[List[Dict[str, str]]] = None,
    user_profile_text: Optional[str] = None,
    query: Optional[str] = None,
) -> BuiltContext: ...
```

Returns a BuiltContext. Sections are added in priority order: user_profile → memories → documents → history. Each section is token-capped individually and marked truncated=True if its content was cut.
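With the constructor values shown above, each section's budget is its fraction of total_token_budget: 200 tokens for the profile, 1,200 for memories, 1,800 for documents and 800 for history. The sketch below reproduces that split and the per-section cap in priority order. It assumes a rough 4-characters-per-token estimate and character-level truncation; ContextBuilder may count tokens and truncate differently. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Priority order documented for ContextBuilder.build
SECTION_ORDER = ["user_profile", "memories", "documents", "history"]
CHARS_PER_TOKEN = 4  # crude estimate, an assumption for this sketch


def split_budget(total: int, fractions: Dict[str, float]) -> Dict[str, int]:
    return {name: round(total * fractions[name]) for name in SECTION_ORDER}


def cap_section(text: str, budget_tokens: int) -> Tuple[str, bool]:
    """Return (possibly truncated text, truncated flag)."""
    max_chars = budget_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text, False
    return text[:max_chars], True


def build_sections(total: int, fractions: Dict[str, float],
                   contents: Dict[str, str]) -> List[Tuple[str, str, bool]]:
    """Emit non-empty sections in priority order, each capped at its own budget."""
    budgets = split_budget(total, fractions)
    sections = []
    for name in SECTION_ORDER:
        if contents.get(name):
            body, truncated = cap_section(contents[name], budgets[name])
            sections.append((name, body, truncated))
    return sections
```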
build_prompt
```python
def build_prompt(
    self,
    query: str,
    *,
    documents=None,
    memories=None,
    history=None,
    user_profile_text=None,
    system_instruction="You are a helpful, knowledgeable assistant.",
) -> str: ...
```

Returns a fully assembled prompt string containing the system instruction, the context, and the current query.
Document
```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Document:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None
```

5.10 UserProfileManager
Manages user profiles across all tenants, with optional JSON persistence.
```python
from fennec_memory.memory import UserProfileManager

manager = UserProfileManager(
    persist_dir="./memory_storage/profiles",
    auto_save=True,
)
```

| Method | Signature | Description |
|---|---|---|
| get_or_create | (user_id: str) -> UserProfile | Load the profile from disk, or create a new default profile. |
| update_from_interaction | (user_id, query, topics=None, importance=0.5) -> UserProfile | Record one interaction; updates topic frequencies and the importance average. |
| set_preference | (user_id, key, value) -> None | Set an explicit user preference. |
| delete | (user_id: str) -> bool | Remove the profile from memory and disk. |
| list_users | () -> List[str] | All known user IDs (in memory and on disk). |
| aggregate_stats | () -> Dict | Cross-user stats: user count, total interactions, top global topics. |
UserProfile key properties and methods:
| Member | Type | Description |
|---|---|---|
| user_id | str | User identifier. |
| preferred_language | str | Default "en". |
| verbosity | str | "concise", "medium", or "detailed". |
| topics_of_interest | List[str] | Top 10 topics, inferred from topic frequency. |
| memory_retention_boost | float | Range 0–0.5. Added to importance on every save_interaction call. |
| get_preference(key, default=None) | Any | Retrieve a custom preference value. |
| set_preference(key, value) | None | Set a custom preference on the profile object. |
| to_context_string() | str | Compact profile text for injection into prompts. |
| start_session() | None | Record the start of a new session (increments the session counter). |
| update_retention_boost(boost: float) | None | Update memory_retention_boost; clamped to [0, 0.5]. |
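The tables above describe three pieces of profile bookkeeping:

- topic frequencies that feed topics_of_interest
- a running importance average
- a retention boost, clamped to [0, 0.5], that is added to importance when an interaction is saved

Below is a minimal stand-in for that bookkeeping. It assumes an incremental mean for the average and a final cap of importance at 1.0. MiniProfile and its methods are hypothetical and are not the real UserProfile class.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class MiniProfile:
    user_id: str
    interaction_count: int = 0
    avg_importance: float = 0.0
    memory_retention_boost: float = 0.0
    topic_counts: Counter = field(default_factory=Counter)

    def record(self, importance: float, topics: Iterable[str] = ()) -> None:
        """Update topic frequencies and the running importance average."""
        self.interaction_count += 1
        # Incremental mean: no need to keep every past importance value
        self.avg_importance += (importance - self.avg_importance) / self.interaction_count
        self.topic_counts.update(topics)

    @property
    def topics_of_interest(self) -> List[str]:
        return [topic for topic, _ in self.topic_counts.most_common(10)]

    def update_retention_boost(self, boost: float) -> None:
        self.memory_retention_boost = min(max(boost, 0.0), 0.5)

    def boosted(self, importance: float) -> float:
        """Importance after adding the retention boost, capped at 1.0 (assumed cap)."""
        return min(importance + self.memory_retention_boost, 1.0)
```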
```python
profile = manager.get_or_create("alice")
manager.set_preference("alice", "verbosity", "detailed")
manager.set_preference("alice", "preferred_language", "en")

# Inject into prompt
profile_text = profile.to_context_string()
```

6. Configuration System
Full MemoryConfig with All Defaults
```python
from fennec_memory.memory import MemoryConfig

config = MemoryConfig(
    # Memory layer capacities
    max_short_term=1000,
    max_long_term=1000,
    max_working=1000,
    max_episodic=1000,
    max_semantic=1000,
    max_procedral=1000,

    # Retrieval
    importance_threshold=0.7,      # LTM promotion cutoff
    similarity_threshold=0.85,     # duplicate detection
    retrieval_limit=10,            # default top_k

    # Embeddings
    embedding_model="all-MiniLM-L6-v2",
    embedding_batch_size=32,
    embedding_cache_size=1000,

    # Text processing
    normalize_text=False,
    preserve_case=True,
    remove_duplicates=True,

    # Persistence
    enable_persistence=True,
    persistence_path="./memory_storage",
    auto_save_interval=300,

    # LangChain-style keys
    return_messages=False,
    input_key="input",
    output_key="output",
    memory_key="history",
    max_token_limit=2000,
    window_size=5,

    # Decay
    enable_decay=True,
    decay_rate=0.1,                # 10% per day at zero accesses
    min_importance=0.1,

    # Consolidation
    enable_consolidation=True,
    consolidation_interval=3600,

    # LLM token budget
    max_tokens=2000,

    # Logging
    log_level="INFO",
    enable_stats=True,
)
```

SelectionConfig
Controls the MemorySelector scoring weights:
```python
from fennec_memory.memory import SelectionConfig

selection_config = SelectionConfig(
    relevance_weight=0.50,          # cosine similarity fraction
    recency_weight=0.20,            # recency exponential decay fraction
    importance_weight=0.30,         # effective_importance fraction
    recency_half_life_hours=24.0,   # score halves every 24 hours
    min_composite_score=0.10,       # minimum score to be included
    top_k=10,                       # maximum entries to select
    token_budget=3000,              # maximum tokens across all selected entries
)
```

ForgettingConfig
```python
from fennec_memory.memory import ForgettingConfig

forgetting_config = ForgettingConfig(
    base_decay_rate=0.15,
    recency_boost_per_access=0.05,
    max_recency_boost=2.0,
    min_importance=0.05,
    deletion_threshold=0.05,
    high_frequency_threshold=5,
    apply_decay_on_read=False,
)
```

Environment Variable Reference
```bash
export MEMORY_MAX_SHORT_TERM=500
export MEMORY_MAX_LONG_TERM=5000
export MEMORY_EMBEDDING_MODEL="all-MiniLM-L6-v2"
export MEMORY_PERSISTENCE_PATH="/var/data/memory"
export MEMORY_ENABLE_PERSISTENCE="true"
```

Load with:

```python
config = MemoryConfig.from_env()
```

7. Security Model
7.1 PII Detection and Masking
SensitiveDataMasker applies regex patterns to redact sensitive data before any storage operation.
Built-in patterns:
| Label | What It Matches |
|---|---|
| EMAIL | Standard email addresses |
| CREDIT_CARD | 13–16 digit card numbers |
| SSN | US Social Security Numbers (###-##-####) |
| PHONE_INTL | International phone numbers |
| IBAN | International Bank Account Numbers |
| IP_V4 | IPv4 addresses |
| URL_WITH_AUTH | URLs containing credentials (user:pass@host) |
| JWT | JSON Web Tokens (eyJ...) |
| API_KEY | Keys starting with sk-, pk-, rk-, ak-, or token- |
| PASSWORD_KV | password=, passwd:, and pwd= key-value pairs |
```python
from fennec_memory.memory import SensitiveDataMasker

masker = SensitiveDataMasker(
    custom_patterns=[
        ("INTERNAL_ID", r"INT-\d{6}"),  # add your own (label, regex) pairs
    ],
    placeholder_fmt="[{label}]",
)

cleaned, report = masker.mask("Contact me at ceo@company.com, card: 4111111111111111")
# cleaned → "Contact me at [EMAIL], card: [CREDIT_CARD]"
# report  → {"EMAIL": 1, "CREDIT_CARD": 1}

# Check for sensitive data without masking
has_pii = masker.has_sensitive_data("Contact me at ceo@company.com")
# → True

# Recursively mask nested dicts or lists
cleaned_content, report = masker.mask_entry_content({"input": "my SSN is 123-45-6789"})
```

When enable_privacy_masking=True is set on AIMemoryManager, both user_input and assistant_output are masked before any storage or encryption step.
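Regex masking comes down to one substitution per pattern plus a count of the matches replaced. The sketch below shows that technique with re.subn. The three patterns are simplified illustrations written for this example, not the built-in SensitiveDataMasker expressions, and mask_text is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

# Simplified illustrative patterns (not the library's built-in regexes).
# Order matters: more specific patterns should run before broader ones.
PATTERNS: List[Tuple[str, str]] = [
    ("EMAIL", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    ("SSN", r"\b\d{3}-\d{2}-\d{4}\b"),
    ("CREDIT_CARD", r"\b\d{13,16}\b"),
]


def mask_text(text: str, patterns: List[Tuple[str, str]] = PATTERNS,
              placeholder_fmt: str = "[{label}]") -> Tuple[str, Dict[str, int]]:
    """Replace every match with a labelled placeholder and count replacements per label."""
    report: Dict[str, int] = {}
    for label, pattern in patterns:
        text, n = re.subn(pattern, placeholder_fmt.format(label=label), text)
        if n:
            report[label] = n
    return text, report
```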
7.2 Encryption
MemoryEncryptor uses Fernet symmetric encryption (AES-128-CBC + HMAC-SHA256). The user-supplied secret key is hashed with SHA-256 to derive a 32-byte Fernet key.
In practice, you do not interact with MemoryEncryptor directly — pass encryption_key to the AIMemoryManager constructor and encryption is applied automatically on every save_interaction call.
```python
from fennec_memory.memory import MemoryEncryptor

enc = MemoryEncryptor(secret_key="my-production-secret")
token = enc.encrypt({"input": "sensitive data", "output": "sensitive answer"})
data = enc.decrypt(token)

print(enc.is_real_encryption)  # True if the cryptography package is installed
print(enc.key_fingerprint)     # SHA-256 fingerprint of the key (first 16 hex chars)
```

If the cryptography package is not installed, the encryptor falls back to base64 encoding (not secure) and logs a warning. In this fallback mode, is_real_encryption is False.

Warning: If the encryption_key is lost, all encrypted entries are permanently unrecoverable.
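Fernet expects a 32-byte key encoded as URL-safe base64, and a SHA-256 digest of the secret is exactly 32 bytes. Below is a standard-library sketch of that derivation. It assumes the digest is base64-encoded directly. It also assumes the fingerprint is the first 16 hex characters of a SHA-256 hash of the derived key; the library may fingerprint a different value. derive_fernet_key and key_fingerprint are hypothetical names.

```python
import base64
import hashlib


def derive_fernet_key(secret_key: str) -> bytes:
    """SHA-256 the secret (32 bytes), then URL-safe base64-encode it as Fernet requires."""
    digest = hashlib.sha256(secret_key.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest)


def key_fingerprint(fernet_key: bytes) -> str:
    """Short, non-reversible identifier that is safe to log; never log the key itself."""
    return hashlib.sha256(fernet_key).hexdigest()[:16]


# With the cryptography package installed, the derived key plugs straight into Fernet:
#   from cryptography.fernet import Fernet
#   f = Fernet(derive_fernet_key("my-production-secret"))
```

Because the derivation is deterministic, the same secret always yields the same key. This is why losing the secret makes existing entries unrecoverable.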
7.3 Role-Based Access Control
AccessController implements per-tenant RBAC with built-in and custom roles.
```python
from fennec_memory.memory import AccessController, Permission

ac = AccessController()

# Built-in role assignment
ac.assign_role("alice", "acme", "owner")
ac.assign_role("bob", "acme", "writer")
ac.assign_role("guest", "acme", "reader")

# Custom role
ac.define_role("analyst", {Permission.READ})
ac.assign_role("carol", "acme", "analyst")

# Check and enforce
if ac.has_permission("bob", "acme", Permission.WRITE):
    pass  # proceed with the write

try:
    ac.require("guest", "acme", Permission.DELETE)  # raises PermissionError
except PermissionError:
    print("guest may not delete in acme")

# Inspect
perms = ac.list_permissions("alice", "acme")  # {Permission.READ, Permission.WRITE, Permission.DELETE}

# Revoke
ac.revoke_role("bob", "acme", "writer")
```

7.4 Tenant Isolation
Memory namespacing is enforced at every layer:
- ShortTermMemory tags each entry with tenant_id in its metadata.
- LongTermMemory includes tenant_id in all SQL WHERE clauses and indices.
- SemanticMemory prefixes every vector index ID as "{tenant_id}::{entry_id}".
- UserProfileManager is tenant-agnostic but user-specific; user data is never cross-contaminated between tenants.
There is no "global" query. You always specify tenant_id explicitly.
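Per-tenant roles (7.3) and tenant-prefixed storage keys both reduce to keying everything by tenant_id. The sketch below illustrates that pattern with an in-memory store. The role names match the examples in 7.3, but the permission set given to each role is an assumption. TenantStore is a hypothetical class and is not part of fennec_memory.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Permission(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


# Assumed role-to-permission mapping, for illustration only
ROLES: Dict[str, Set[Permission]] = {
    "owner": {Permission.READ, Permission.WRITE, Permission.DELETE},
    "writer": {Permission.READ, Permission.WRITE},
    "reader": {Permission.READ},
}


class TenantStore:
    def __init__(self) -> None:
        self._roles: Dict[Tuple[str, str], str] = {}  # (user_id, tenant_id) -> role
        self._data: Dict[str, str] = {}               # "{tenant_id}::{entry_id}" -> content

    def assign_role(self, user_id: str, tenant_id: str, role: str) -> None:
        self._roles[(user_id, tenant_id)] = role

    def _require(self, user_id: str, tenant_id: str, perm: Permission) -> None:
        role = self._roles.get((user_id, tenant_id))
        if role is None or perm not in ROLES[role]:
            raise PermissionError(f"{user_id} lacks {perm.value} on {tenant_id}")

    def put(self, user_id: str, tenant_id: str, entry_id: str, content: str) -> None:
        self._require(user_id, tenant_id, Permission.WRITE)
        self._data[f"{tenant_id}::{entry_id}"] = content

    def search(self, user_id: str, tenant_id: str, needle: str) -> List[str]:
        """Only keys carrying this tenant's prefix are ever scanned."""
        self._require(user_id, tenant_id, Permission.READ)
        prefix = f"{tenant_id}::"  # the "::" separator keeps "tenant_a" from matching "tenant_ab"
        return [v for k, v in self._data.items() if k.startswith(prefix) and needle in v]
```

In this sketch, the same entry ID can exist in two tenants without colliding, and a user with a role in one tenant has no access at all in another.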
8. Storage Backends
ShortTermMemory (in-process deque)
When to use: Always. Present in every deployment as the primary write target.
```python
from fennec_memory.memory import ShortTermMemory

stm = ShortTermMemory(window_size=20, tenant_id="acme")
```

Backed by collections.deque(maxlen=window_size). The oldest entries are evicted automatically when the window is full (FIFO). Supported methods:

| Method | Signature | Description |
|---|---|---|
| add | (content, *, importance=0.5, tags=None, metadata=None) -> MemoryEntry | Add one entry; returns the created MemoryEntry. |
| get_all | () -> List[MemoryEntry] | All current entries. |
| get_recent | (n: int) -> List[MemoryEntry] | The last n entries. |
| apply_decay | (decay_rate: float = 0.1) -> None | Apply one decay cycle to all entries in place. |
| evict_below | (min_importance: float) -> List[MemoryEntry] | Remove low-importance entries; returns the evicted list. |
| clear | () -> None | Empty the window. |
No persistence. Lost on process restart. Use LTM for durability.
WorkingMemory (in-process dict)
When to use: Always. Manages the active-context entries for the current reasoning turn.
```python
from fennec_memory.memory import WorkingMemory

wm = WorkingMemory(capacity=10, token_budget=3000, tenant_id="acme")
```

| Method | Signature | Description |
|---|---|---|
| load | (entries: List[MemoryEntry]) -> None | Replace current contents with new entries, sorted by importance, respecting token_budget. |
| add | (content, *, importance=0.6, tags=None, metadata=None) -> MemoryEntry | Add one entry; evicts the lowest-importance entry if capacity is exceeded. |
| remove | (entry_id: str) -> Optional[MemoryEntry] | Remove and return an entry by ID. |
| get_all | () -> List[MemoryEntry] | All entries, sorted by effective_importance in descending order. |
| get_as_text | () -> str | Serialise all entries to a newline-delimited string, ready for prompt injection. |
| clear | () -> None | Empty the working memory. |
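load() and the overflow rules in Section 10 describe a greedy selection: sort by importance, then admit entries while both capacity and token_budget allow, silently skipping any entry that would overflow the budget. Below is a sketch under those rules. It assumes a 4-characters-per-token estimate, and it assumes a skipped entry does not stop later, smaller entries from being admitted. greedy_load is a hypothetical helper.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, an assumption


def greedy_load(entries: List[Tuple[str, float]], capacity: int,
                token_budget: int) -> List[Tuple[str, float]]:
    """entries are (content, importance) pairs; returns the admitted subset, highest importance first."""
    admitted: List[Tuple[str, float]] = []
    used = 0
    for content, importance in sorted(entries, key=lambda e: e[1], reverse=True):
        if len(admitted) >= capacity:
            break
        cost = estimate_tokens(content)
        if used + cost > token_budget:
            continue  # skip entries that would overflow the budget
        admitted.append((content, importance))
        used += cost
    return admitted
```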
LongTermMemory (SQLite)
When to use: Any deployment that needs memory to survive process restarts or scale across sessions. Default persistence layer.
One .db file per tenant, stored at {persistence_path}/{tenant_id}_ltm.db. Thread-safe via threading.Lock. Supports full CRUD, bulk inserts, importance filtering, timestamp ordering, and global decay.
Suitable for single-instance deployments, development, and moderate production workloads.
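The LTM design described above can be sketched with the standard sqlite3 module. That design has three parts: one SQLite file per tenant, a lock around every operation, and rollback-and-re-raise on failure (as noted in Section 10). The schema, the class name and the methods below are assumptions for illustration, not the library's actual table layout. Only the file path follows the documented {persistence_path}/{tenant_id}_ltm.db pattern.

```python
import os
import sqlite3
import threading
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple


class TinyLTM:
    def __init__(self, persistence_path: str, tenant_id: str) -> None:
        if persistence_path == ":memory:":
            db_path = ":memory:"  # handy for tests
        else:
            os.makedirs(persistence_path, exist_ok=True)
            db_path = os.path.join(persistence_path, f"{tenant_id}_ltm.db")
        self.tenant_id = tenant_id
        self._lock = threading.Lock()
        # check_same_thread=False is safe here because every access goes through the lock
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        with self._tx() as cur:
            cur.execute(
                "CREATE TABLE IF NOT EXISTS memories ("
                "id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL, content TEXT NOT NULL, "
                "importance REAL NOT NULL, created_at REAL NOT NULL)"
            )
            cur.execute("CREATE INDEX IF NOT EXISTS idx_tenant_imp ON memories (tenant_id, importance)")

    @contextmanager
    def _tx(self) -> Iterator[sqlite3.Cursor]:
        """Serialise access, commit on success, roll back and re-raise on error."""
        with self._lock:
            cur = self._conn.cursor()
            try:
                yield cur
                self._conn.commit()
            except Exception:
                self._conn.rollback()
                raise
            finally:
                cur.close()

    def store(self, entry_id: str, content: str, importance: float) -> None:
        with self._tx() as cur:
            cur.execute(
                "INSERT OR REPLACE INTO memories VALUES (?, ?, ?, ?, ?)",
                (entry_id, self.tenant_id, content, importance, time.time()),
            )

    def get_top_by_importance(self, n: int) -> List[Tuple[str, str, float]]:
        with self._tx() as cur:
            cur.execute(
                "SELECT id, content, importance FROM memories WHERE tenant_id = ? "
                "ORDER BY importance DESC LIMIT ?",
                (self.tenant_id, n),
            )
            return cur.fetchall()
```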
SemanticMemory (FAISS / NumPy)
When to use: Always. Provides the semantic search capability that distinguishes this system from simple history-based retrieval.
FAISS IndexFlatIP is the default when faiss-cpu (or faiss-gpu) is installed. When FAISS is absent, SemanticMemory automatically falls back to a pure-NumPy brute-force O(n) cosine search.
FAISS is recommended for production with more than ~1,000 entries per tenant. NumPy fallback is suitable for testing and small datasets.
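The NumPy fallback is exact brute-force search. Every stored vector is normalised, and each query then costs one matrix-vector product over all n entries, which is why search slows down as a tenant grows. The sketch below shows that technique. It assumes vectors are L2-normalised on insert so that the inner product equals cosine similarity, the same trick that lets IndexFlatIP produce cosine scores. BruteForceIndex is a hypothetical class, not the library's _NumpyIndex.

```python
from typing import List, Tuple

import numpy as np


class BruteForceIndex:
    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vecs = np.empty((0, dim), dtype=np.float32)
        self._ids: List[str] = []

    def add(self, item_id: str, vec) -> None:
        v = np.asarray(vec, dtype=np.float32).reshape(1, self.dim)
        norm = np.linalg.norm(v)
        if norm > 0:
            v = v / norm  # unit length, so dot product == cosine similarity
        self._vecs = np.vstack([self._vecs, v])
        self._ids.append(item_id)

    def search(self, query, k: int = 5) -> List[Tuple[str, float]]:
        if not self._ids:
            return []
        q = np.asarray(query, dtype=np.float32).reshape(self.dim)
        norm = np.linalg.norm(q)
        if norm > 0:
            q = q / norm
        scores = self._vecs @ q  # O(n * dim) work for every query
        top = np.argsort(-scores)[:k]
        return [(self._ids[i], float(scores[i])) for i in top]
```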
Backend Comparison
| Aspect | ShortTermMemory | LongTermMemory | SemanticMemory |
|---|---|---|---|
| Persistence | None (in-memory) | SQLite on disk | In-memory (rebuilt on restart) |
| Capacity | window_size entries | Unbounded | Unbounded |
| Query type | Recency (FIFO) | Importance / timestamp | Semantic similarity |
| Concurrency | Single-threaded | Thread-safe (Lock) | Single-threaded |
| Startup cost | None | Schema init | Model load on first embed |
| Recommended use | Recent context | Long-lived facts | Semantic retrieval |
For production Redis-backed LTM, implement the LongTermMemory interface (same store, get, get_top_by_importance contract) and swap the instance in the _TenantBundle.
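The swap relies on structural compatibility: any object that exposes the same store, get and get_top_by_importance methods can stand in for LongTermMemory. A typing.Protocol makes that contract explicit. The method signatures below are assumptions; the real ones take MemoryEntry objects and may accept more arguments. DictLTM is an in-memory stand-in that shows where a Redis client's calls would go, not a Redis implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, runtime_checkable


@dataclass
class Entry:
    # Minimal stand-in for MemoryEntry
    id: str
    content: str
    importance: float


@runtime_checkable
class LTMBackend(Protocol):
    def store(self, entry: Entry) -> None: ...
    def get(self, entry_id: str) -> Optional[Entry]: ...
    def get_top_by_importance(self, n: int) -> List[Entry]: ...


class DictLTM:
    """In-memory backend; a Redis version would issue HSET / HGET / ZREVRANGE calls here."""

    def __init__(self) -> None:
        self._rows: Dict[str, Entry] = {}

    def store(self, entry: Entry) -> None:
        self._rows[entry.id] = entry

    def get(self, entry_id: str) -> Optional[Entry]:
        return self._rows.get(entry_id)

    def get_top_by_importance(self, n: int) -> List[Entry]:
        return sorted(self._rows.values(), key=lambda e: e.importance, reverse=True)[:n]
```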
9. Observability & Metrics
Per-Tenant Stats
```python
s = mgr.stats("acme")
```

Returns:

```json
{
  "tenant_id": "acme",
  "stm": {
    "type": "ShortTermMemory",
    "window_size": 20,
    "current_size": 14,
    "is_full": false,
    "tenant_id": "acme"
  },
  "working": {
    "type": "WorkingMemory",
    "capacity": 10,
    "token_budget": 4000,
    "entries": 6,
    "estimated_tokens_used": 850,
    "tenant_id": "acme"
  },
  "ltm": {
    "type": "LongTermMemory",
    "db_path": "./memory_storage/acme_ltm.db",
    "tenant_id": "acme",
    "total_entries": 342
  },
  "semantic": {
    "type": "SemanticMemory",
    "backend": "faiss",
    "model": "all-MiniLM-L6-v2",
    "entries": 356,
    "tenant_id": "acme"
  },
  "profile_manager": {
    "user_count": 3,
    "total_interactions": 1204,
    "avg_interactions_per_user": 401.3,
    "top_global_topics": [["python", 340], ["AI", 210]]
  },
  "encryption_active": true,
  "privacy_masking_active": true
}
```

Maintenance Reports
```python
report = mgr.run_maintenance("acme")
# {
#   "tenant_id": "acme",
#   "timestamp": 1714000000.0,
#   "stm_evicted": 2,
#   "ltm_decay_updated": 120,
#   "ltm_deleted": 5,
#   "working_forgotten": 1,
#   "ltm_compression": {"merges": 3, "noise_removed": 7}
# }
```

Forgetting Diagnostics
Use ForgettingMechanism.report for a non-mutating at-risk preview before committing a maintenance cycle:
```python
bundle = mgr._bundle("acme")  # internal per-tenant container (private API)
report = mgr._forgetter.report(bundle.stm.get_all())
print(f"At-risk: {report['at_risk_count']} / {report['total']}")
print(f"At-risk IDs: {report['at_risk_ids']}")
```

Selector Scoring Breakdown
Use MemorySelector.explain to understand why specific memories were or were not selected:
```python
bundle = mgr._bundle("acme")  # internal per-tenant container (private API)
explanation = mgr._selector.explain(
    query="Python caching strategies",
    candidates=bundle.stm.get_all() + bundle.ltm.get_top_by_importance(20),
)
for row in explanation:
    print(f"{row['id'][:10]} composite={row['composite']:.3f} "
          f"rel={row['relevance']:.3f} rec={row['recency']:.3f} "
          f"imp={row['importance']:.3f} | {row['content'][:60]}")
```

User Profile Analytics
```python
mgr._profile_manager.aggregate_stats()
# {
#   "user_count": 15,
#   "total_interactions": 5230,
#   "avg_interactions_per_user": 348.7,
#   "top_global_topics": [["python", 1200], ...]
# }
```

10. Edge Cases & Failure Handling
Embedding Service Failure
If sentence-transformers is not installed or the model files are unavailable, _EmbeddingModel falls back to deterministic stub embeddings: hash-seeded random 384-dimensional vectors. These are consistent across calls for the same text (same hash → same seed → same vector), so similarity search still functions — but results will not be semantically meaningful.
Log warning: "sentence-transformers unavailable — using deterministic stub embeddings."
Mitigation: Pre-install the model locally and set TRANSFORMERS_OFFLINE=1 to prevent accidental remote downloads in production.
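The stub fallback relies on a simple property: seeding a random generator from a hash of the text makes the same text always map to the same vector. The sketch below shows that technique. It assumes a SHA-256-derived seed and a unit-normalised Gaussian vector; the library's hash function and distribution may differ, and stub_embedding is a hypothetical name. Python's built-in hash() would not work here, because string hashing is randomised per process.

```python
import hashlib

import numpy as np

DIM = 384  # output size of all-MiniLM-L6-v2


def stub_embedding(text: str, dim: int = DIM) -> np.ndarray:
    """Deterministic pseudo-embedding: identical text gives an identical vector, across processes."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    vec = rng.standard_normal(dim).astype(np.float32)
    return vec / np.linalg.norm(vec)
```

Vectors built this way are effectively random with respect to meaning. Near-identical texts such as "hello" and "hello!" end up almost orthogonal, which is why search still runs but returns semantically meaningless results.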
FAISS Unavailable
If faiss is not installed, SemanticMemory silently switches to _NumpyIndex (brute-force O(n) cosine search). This is transparent to callers but significantly slower at scale.
Log warning: "FAISS not installed — using pure-numpy fallback index."
Mitigation: Install faiss-cpu in production. The NumPy fallback is acceptable for fewer than ~1,000 entries.
SQLite / LTM Failure
LongTermMemory wraps all operations in a context manager that calls conn.rollback() on exception and re-raises. If the SQLite file is corrupted or the path is inaccessible:
- store() and get() will raise an exception.
- The AIMemoryManager thread pool submits LTM writes asynchronously; failures are logged but do not crash the main thread.
Mitigation: Ensure persistence_path is writable before instantiation. Back up the .db files regularly.
Quota / Capacity Exceeded
ShortTermMemory uses a deque(maxlen=window_size) — the oldest entry is automatically evicted when full. No exception is raised.
WorkingMemory evicts the lowest-effective_importance entry when capacity is exceeded. Entries that would exceed token_budget are silently skipped during load().
LongTermMemory has no hard cap. Use run_maintenance() and delete_below_importance() to manage growth.
Corrupted or Missing Entries
LongTermMemory._row_to_entry uses json.loads on stored content; a corrupted JSON string will raise json.JSONDecodeError. Individual corrupted rows will surface as exceptions from get() or get_top_by_importance(). The calling code in build_context does not catch these — add a try/except wrapper around build_context if corruption is a concern.
Missing Tenant
AIMemoryManager._bundle(tenant_id) creates a new _TenantBundle on first access for any unknown tenant_id. There is no explicit register_tenant step required. The bundle contains fresh, empty memory layers.
Async/Sync Mismatch
asave_interaction, abuild_context, and arun_maintenance use asyncio.get_event_loop().run_in_executor internally. They must be awaited from within a running asyncio event loop. Do not call them from synchronous code without asyncio.run(...):
```python
# Correct in async context
entry = await mgr.asave_interaction(...)

# Correct in sync context
entry = asyncio.run(mgr.asave_interaction(...))

# Wrong: returns a coroutine object, not the result
entry = mgr.asave_interaction(...)  # ← missing await
```

ForgettingMechanism.apply on Empty List
When entries is empty, apply() returns the string "entries is empty" instead of a tuple. This is a known implementation inconsistency. Always guard:
```python
if entries:
    alive, forgotten = forgetter.apply(entries)
else:
    alive, forgotten = [], []
```

LLM Unavailable for Compression/Summarisation
If llm=None or the LLM call raises an exception:
- MemoryCompressor.summarise() falls back to simple string concatenation (truncated to 500 chars).
- MemoryCompressor._merge_pair() falls back to concatenation with a | separator.
- ConversationSummaryMemory._summarize() logs the error and retains only the last 5 messages.
The system never hard-fails due to an absent LLM.
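All of these fallbacks follow one pattern: try the LLM, and if it is absent or raises, degrade to string concatenation. The sketch below applies that pattern to summarisation. It assumes the LLM object exposes a generate(prompt) method, as in the examples in Section 11, and that the 500-character truncation applies to the newline-joined text. summarise_with_fallback is a hypothetical helper.

```python
import logging
from typing import List, Optional, Protocol

logger = logging.getLogger(__name__)
FALLBACK_MAX_CHARS = 500


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


def summarise_with_fallback(texts: List[str], llm: Optional[LLM] = None) -> str:
    """Use the LLM when available; otherwise (or on any error) return truncated concatenation."""
    joined = "\n".join(texts)
    if llm is not None:
        try:
            return llm.generate("Summarise the following notes concisely:\n" + joined)
        except Exception:  # any LLM failure degrades to the fallback below
            logger.exception("LLM summarisation failed; using concatenation fallback")
    return joined[:FALLBACK_MAX_CHARS]
```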
Encryption: cryptography Package Missing
MemoryEncryptor degrades to base64 encoding with a logged warning. is_real_encryption returns False. Never use the base64 fallback for sensitive data in production — it provides no security.
11. Advanced Usage
Multi-Tenant Setup
```python
import os

from fennec_memory.memory import AIMemoryManager, MemoryConfig

config = MemoryConfig(
    persistence_path="/var/data/memory",
    importance_threshold=0.6,
    max_tokens=4000,
    enable_persistence=True,
)

mgr = AIMemoryManager(
    config=config,
    llm=my_llm,  # your LLM client
    enable_privacy_masking=True,
    encryption_key=os.environ["MEMORY_ENCRYPTION_KEY"],
)

# Tenant A: full team roles
mgr.grant("admin_user", "tenant_a", role="admin")
mgr.grant("agent_1", "tenant_a", role="writer")
mgr.grant("readonly", "tenant_a", role="reader")

# Tenant B: separate namespace, no data overlap
mgr.grant("agent_2", "tenant_b", role="owner")

# Tenant A and B memories are fully isolated
mgr.save_interaction("agent_1", "tenant_a", "secret A", "answer A", importance=0.9)
mgr.save_interaction("agent_2", "tenant_b", "secret B", "answer B", importance=0.9)

ctx_a = mgr.build_context("agent_1", "tenant_a", "secret B")
# ctx_a.full_text will NOT contain tenant_b's "secret B" memory; tenant_b data is invisible
```

RL Feedback Loop (Importance Signals)
The system does not include an explicit RL module, but the importance parameter functions as the feedback signal. Close the loop by updating importance based on downstream signals such as user upvotes, follow-up questions, or task success:
```python
# High-quality responses: store with elevated importance
mgr.save_interaction(..., importance=0.9)

# Routine exchanges: store with lower importance
mgr.save_interaction(..., importance=0.4)

# Profile-based boost: users whose memories are consistently useful
# get a memory_retention_boost applied automatically
mgr._profile_manager.get_or_create("alice").update_retention_boost(0.15)
```

Combining with a RAG Document Retriever
```python
from fennec_memory.memory import Document


def answer_with_memory_and_rag(mgr, retriever, user_id, tenant_id, query):
    # 1. Retrieve documents from your vector / BM25 store (retriever is your own object)
    raw_docs = retriever.search(query, k=5)
    documents = [
        # Document fields are page_content, metadata and doc_id (see 5.9)
        Document(page_content=d.text, metadata={"source": d.url, "score": d.score})
        for d in raw_docs
    ]

    # 2. Build memory-enriched context
    ctx = mgr.build_context(
        user_id=user_id,
        tenant_id=tenant_id,
        query=query,
        documents=documents,
        top_k_semantic=5,
    )

    # 3. Generate response (my_llm is your LLM client)
    response = my_llm.generate(ctx.full_text + f"\n\nUser: {query}\nAssistant:")

    # 4. Persist the interaction
    mgr.save_interaction(user_id, tenant_id, query, response, importance=0.7)
    return response
```

Custom User Preferences
```python
mgr._profile_manager.set_preference("alice", "verbosity", "concise")
mgr._profile_manager.set_preference("alice", "preferred_language", "fr")
mgr._profile_manager.set_preference("alice", "domain", "finance")

profile = mgr._profile_manager.get_or_create("alice")
print(profile.to_context_string())
# User language: fr
# Response style: concise / neutral
# Interests: finance, python, ...
# domain: finance
```

Production Deployment Notes
Memory isolation: Run one AIMemoryManager instance per application process. The internal ThreadPoolExecutor is not shared across processes. For multi-process deployments, consider a shared SQLite file (acceptable for moderate load) or implement a Redis-backed LongTermMemory.
Embedding model cold start: The first save_interaction call triggers model loading (1–3 seconds). Pre-warm by calling mgr._bundle("default").semantic._embedder._load() at startup.
Maintenance scheduling: Run run_maintenance(tenant_id) on a background schedule (e.g., APScheduler or a cron job). Recommended interval: once per hour per active tenant.
Disk growth: LongTermMemory has no automatic size cap. Monitor ltm.count() and configure maintenance to delete entries below a target min_importance. For long-running production systems, set enable_consolidation=True and tune decay_rate to match your data retention requirements.
Thread safety: LongTermMemory is thread-safe via threading.Lock. ShortTermMemory and WorkingMemory are not thread-safe — use one AIMemoryManager per request thread or protect shared instances with an external lock.
Encryption key rotation: There is no built-in key rotation. To rotate keys: decrypt all existing entries with the old key, re-encrypt with the new key, and update encryption_key in the constructor.
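The maintenance-scheduling note above suggests APScheduler or cron. When neither is available, a daemon thread with a stoppable wait loop is enough. Below is a stdlib-only sketch. It assumes maintenance is a callable such as mgr.run_maintenance, and that a failure for one tenant should be logged without stopping the loop. MaintenanceLoop is a hypothetical name; interval_s would be 3600 for the recommended hourly cadence.

```python
import logging
import threading
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


class MaintenanceLoop:
    def __init__(self, run: Callable[[str], object], tenants: Iterable[str],
                 interval_s: float = 3600.0) -> None:
        self._run = run
        self._tenants = list(tenants)
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self) -> None:
        # Event.wait returns True as soon as stop() is called, ending the loop promptly
        while not self._stop.wait(self._interval_s):
            for tenant_id in self._tenants:
                try:
                    self._run(tenant_id)
                except Exception:
                    logger.exception("maintenance failed for tenant %s", tenant_id)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Usage would be along the lines of MaintenanceLoop(mgr.run_maintenance, ["acme"]).start() at application startup, with stop() called during shutdown.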
Async FastAPI Integration Example
```python
from contextlib import asynccontextmanager
from typing import Optional

from fastapi import FastAPI

from fennec_memory.memory import AIMemoryManager, MemoryConfig

memory_manager: Optional[AIMemoryManager] = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    global memory_manager
    config = MemoryConfig(persistence_path="/var/data/memory", max_tokens=4000)
    memory_manager = AIMemoryManager(config=config, llm=my_llm)  # my_llm: your LLM client
    yield
    # Cleanup if needed


app = FastAPI(lifespan=lifespan)


@app.post("/chat")
async def chat(user_id: str, tenant_id: str, message: str):
    ctx = await memory_manager.abuild_context(user_id, tenant_id, message)
    # my_async_llm: your async LLM client
    response = await my_async_llm.generate(ctx.full_text + "\n\n" + message)
    await memory_manager.asave_interaction(
        user_id, tenant_id, message, response, importance=0.7
    )
    return {"response": response}
```

Example With RAG
```python
from fennec_community.llm import GeminiInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_memory.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryMemory,
    ConversationEntityMemory,
)

# Index an Arabic FAQ file
loader_1 = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm = GeminiInterface(api_key=llm_api)  # llm_api: your Gemini API key
context_manager = ContextManager()

rag_system = RAGSystem(llm=llm, vector_db=vector_db, chunker=chunker, context_manager=context_manager)
rag_system.add_documents(loader_1)
rag = rag_system


def conversational_rag_query(
    query: str,
    memory,
    memory_key: str = "history",
) -> str:
    # Load prior turns from the memory object
    mem_vars = memory.load_memory_variables({"input": query})
    history = mem_vars.get(memory_key, "")

    # Retrieve supporting chunks and assemble them into a context block
    retrieved = rag.retrieve(query, top_k=2)
    context = rag.context_manager.build(query, retrieved)

    # Prompt labels stay in Arabic to match the knowledge base:
    # "Previous conversation context", "Retrieved information",
    # "Current question" / "Question", and "Answer"
    if history:
        full_prompt = (
            f"سياق المحادثة السابقة:\n{str(history)[:400]}\n\n"
            f"معلومات مسترجعة:\n{context}\n\n"
            f"السؤال الحالي: {query}\nالإجابة:"
        )
    else:
        full_prompt = f"معلومات مسترجعة:\n{context}\n\nالسؤال: {query}\nالإجابة:"

    answer = rag.llm.generate(full_prompt)
    # Persist the turn so the next query sees it as history
    memory.save_context({"input": query}, {"output": answer})
    return answer


print("\n [5a] ConversationBufferMemory ")
buffer_mem = ConversationBufferMemory(
    return_messages=False, input_key="input", output_key="output", memory_key="history"
)
conversation = [
    "ما هي طرق الدفع المتاحه ",  # "What payment methods are available?"
    "ما هي اعدادهم",              # "How many of them are there?" (relies on history)
]
print("\n 💬 Conversation with RAG (Buffer Memory):")
for turn, q in enumerate(conversation, 1):
    answer = conversational_rag_query(q, buffer_mem)
    print(f" [{turn}] 👤 {q}")
    print(f" 🤖 {answer[:80]}...")
    print()
print(f" 📝 Buffer memory: {len(buffer_mem.chat_memory)} saved messages")

# ── 5b: Window Memory ──────────────── #
print("\n [5b] ConversationBufferWindowMemory: window K=2")
window_mem = ConversationBufferWindowMemory(k=2)
long_conversation = [
    "ما هو طرق التواصل مع فريق الدعم ",  # "What are the ways to contact the support team?"
    "اعطني مثال عليها",                  # "Give me an example of one."
]
for q in long_conversation:
    conversational_rag_query(q, window_mem)
print(f" 📝 Window memory (k=2): {len(window_mem.chat_memory)} saved messages (last 2)")
if window_mem.chat_memory:
    print(f" Last question: {window_mem.chat_memory[-1].get('input', '')}")
```