
Fennec Memory Module

Production-Grade Intelligent Memory System for LLM Applications

Overview
System Architecture
Core Concepts
Quick Start Guide
Public API Reference
Configuration System
Security Model
Storage Backends
Observability & Metrics
Edge Cases & Failure Handling
Advanced Usage

1. Overview

The Fennec Memory Module is a multi-layer intelligent memory system designed to give LLM applications and AI agents persistent, semantically searchable, and privacy-aware memory across conversations and sessions.

It mirrors human cognitive memory architecture — short-term recall, active working memory, and long-term persistence — and ties them together with vector similarity search, biologically-inspired forgetting, LLM-assisted compression, and per-tenant isolation.

Why It Exists

Standard LLM applications are stateless. Each request starts from zero, so developers must manage conversation history by hand, prompts run into context limits, and important context is lost between sessions. The Fennec Memory Module solves this by:

Automatically promoting important interactions to long-term persistent storage
Retrieving the most relevant memories for any query using semantic search
Keeping context within token budgets via intelligent selection and compression
Isolating memory per tenant in multi-user systems
Scrubbing PII before storage, with optional encryption of stored content

Real-World Use Cases

Use Case	How the Module Helps
Multi-session chatbots	Remembers past preferences, decisions, and context across sessions
AI agents & planners	Working memory holds the active reasoning context for the current turn
RAG pipelines	Combines retrieved documents with relevant personal memories in a single context
Multi-tenant SaaS	Complete tenant isolation; each organization's memory is fully private
Personalized assistants	User profiles adapt over time to improve retrieval and response quality
Long conversation summarization	LLM-based compressor condenses hundreds of turns into compact summaries

2. System Architecture

Pipeline Overview

┌──────────────────────────────────────────────────────────────────┐
│                          WRITE PATH                              │
│                                                                  │
│  user_input + assistant_output                                   │
│        │                                                         │
│        ▼                                                         │
│  [SensitiveDataMasker]  ← optional PII redaction                │
│        │                                                         │
│        ▼                                                         │
│  [MemoryEncryptor]      ← optional Fernet encryption            │
│        │                                                         │
│        ├──────────────────────────────────────────────────────┐  │
│        ▼                                                      │  │
│  [ShortTermMemory]      ← in-memory sliding window            │  │
│        │                                                      │  │
│        ├─── background thread ──► [SemanticMemory]            │  │
│        │                          FAISS / NumPy vector index  │  │
│        │                                                      │  │
│        └─── if importance ≥ threshold ──► [LongTermMemory]    │  │
│                                           SQLite persistence  │  │
│                                                               │  │
│  [UserProfileManager]   ← updated with topics & importance   ◄─┘ │
└──────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│                          READ PATH                               │
│                                                                  │
│  query string                                                    │
│        │                                                         │
│        ├──► [SemanticMemory.search]     cosine similarity top-K │
│        ├──► [LongTermMemory.get_top_*]  importance-ranked       │
│        └──► [ShortTermMemory.get_all]   recent window           │
│                                                                  │
│  ───── merge + deduplicate ─────                                 │
│        │                                                         │
│        ▼                                                         │
│  [MemorySelector]   ← weighted relevance + recency + importance │
│        │                                                         │
│        ▼                                                         │
│  [WorkingMemory]    ← token-budgeted active context             │
│        │                                                         │
│        ▼                                                         │
│  [ContextBuilder]   ← profile + memories + docs + history       │
│        │                                                         │
│        ▼                                                         │
│  BuiltContext.full_text  → inject into LLM prompt               │
└──────────────────────────────────────────────────────────────────┘

Component Responsibilities

Component	Responsibility
`ShortTermMemory`	Sliding deque of the most recent N interactions. Fast, in-memory, FIFO eviction.
`WorkingMemory`	Active-context store for the current reasoning turn. Rebuilt each call by `MemorySelector`. Enforces a token budget.
`LongTermMemory`	SQLite-backed persistent store. It receives high-importance entries promoted from short-term memory and keeps them across sessions. Supports global decay.
`SemanticMemory`	FAISS (or NumPy fallback) vector index. Embeds content with `sentence-transformers` and retrieves by cosine similarity.
`MemorySelector`	Scores every candidate memory on three axes (relevance, recency, importance) and returns the top-K within a token budget.
`MemoryCompressor`	Noise removal, near-duplicate merging (cosine threshold), and LLM-based summarisation. Runs during maintenance.
`ForgettingMechanism`	Applies Ebbinghaus-inspired exponential decay modulated by access frequency. Marks low-importance entries for deletion.
`ContextBuilder`	Assembles the final context string from profile, memories, RAG documents, and conversation history. Enforces per-section token budgets.
`UserProfileManager`	Tracks per-user preferences, topic frequency, and a `memory_retention_boost` factor. Persists profiles as JSON.
`SensitiveDataMasker`	Regex-based PII redaction (email, credit card, SSN, phone, IBAN, IP, JWT, API keys, passwords).
`MemoryEncryptor`	Fernet (AES-128-CBC + HMAC-SHA256) symmetric encryption. Degrades to plain base64 encoding if `cryptography` is absent. Base64 provides no confidentiality.
`AccessController`	RBAC system with built-in `reader`, `writer`, `owner`, `admin` roles per `(user_id, tenant_id)` pair.

Design Philosophy

Layered degradation. Every component falls back gracefully: FAISS → NumPy, real Fernet → base64, sentence-transformers → deterministic stub embeddings. The system never hard-fails at import time.
Async-friendly. The primary operations have async wrappers (asave_interaction, abuild_context, arun_maintenance). These are backed by a thread pool, so they integrate cleanly into asyncio applications.
Tenant-first. Memory is namespaced at every layer. Cross-tenant data leakage is architecturally prevented.
Importance-driven. A single importance float (0–1) controls promotion to LTM, ordering in context, and eviction priority. It decays over time and is boosted by access.

3. Core Concepts

3.1 Memory Types (`MemoryType`)

from memory.core import MemoryType

Value	String Aliases	Description
`MemoryType.SHORT_TERM`	`"short_term"`, `"short"`, `"st"`	Temporary; limited to `window_size` recent entries.
`MemoryType.LONG_TERM`	`"long_term"`, `"long"`, `"lt"`	Persistent across sessions via SQLite.
`MemoryType.WORKING`	`"working"`, `"work"`, `"wm"`	Active-context; rebuilt each turn.
`MemoryType.EPISODIC`	`"episodic"`, `"episode"`, `"ep"`	Event-bound memories with temporal context.
`MemoryType.SEMANTIC`	`"semantic"`, `"sem"`	General facts without temporal binding.
`MemoryType.PROCEDURAL`	`"procedural"`, `"proc"`	Skills and repeatable procedures.

3.2 MemoryEntry

Every piece of stored information is a MemoryEntry dataclass.

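import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

from memory.core import MemoryType  # the enum from section 3.1

@dataclass
class MemoryEntry:
    content: Any                    # The stored data (str, dict, or any serialisable type)
    timestamp: float                # Unix creation time
    memory_type: MemoryType
    importance: float = 0.5         # 0.0–1.0; controls promotion, ordering, eviction
    access_count: int = 0           # Incremented on every read
    tags: List[str] = field(default_factory=list)           # fresh list per entry
    metadata: Dict[str, Any] = field(default_factory=dict)  # fresh dict per entry
    embedding: Optional[List[float]] = None  # Set by SemanticMemory
    last_access: float = field(default_factory=time.time)   # evaluated per instance
    decay_factor: float = 1.0       # Multiplied into importance on decay cycles
    original_importance: Optional[float] = None  # Preserved from creation for audit/reset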

Key computed properties:

Property	Type	Description
`entry.id`	`str`	MD5-based 16-char hex ID derived from content + timestamp + type.
`entry.effective_importance`	`float`	`importance × decay_factor`. Used for all ranking decisions.
`entry.priority`	`MemoryPriority`	`MemoryPriority` enum derived from `effective_importance` (see table below).
`entry.age_seconds`	`float`	Floating-point age since creation in seconds.
`entry.age_days`	`float`	Floating-point age since creation in days.
`entry.time_since_access_seconds`	`float`	Elapsed seconds since the last `entry.access()` call.
`entry.has_embedding`	`bool`	`True` if a vector embedding has been computed.

MemoryPriority Thresholds

`effective_importance`	`MemoryPriority`
≥ 0.9	`CRITICAL`
≥ 0.7	`HIGH`
≥ 0.5	`MEDIUM`
≥ 0.3	`LOW`
< 0.3	`MINIMAL`
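The threshold table can be read as a descending lookup. The following is a minimal sketch, not the module's code. The enum member names come from the table. The string values and the `priority_for` helper are hypothetical.

```python
from enum import Enum


class MemoryPriority(Enum):
    # Member names follow the threshold table; the string values are illustrative.
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    MINIMAL = "minimal"


# (inclusive lower bound, priority), checked from the highest bucket down.
_THRESHOLDS = [
    (0.9, MemoryPriority.CRITICAL),
    (0.7, MemoryPriority.HIGH),
    (0.5, MemoryPriority.MEDIUM),
    (0.3, MemoryPriority.LOW),
]


def priority_for(effective_importance: float) -> MemoryPriority:
    """Map an effective importance in [0, 1] to its priority bucket."""
    for lower_bound, priority in _THRESHOLDS:
        if effective_importance >= lower_bound:
            return priority
    return MemoryPriority.MINIMAL


# effective_importance = importance * decay_factor
print(priority_for(0.8 * 1.0))  # MemoryPriority.HIGH
print(priority_for(0.8 * 0.5))  # MemoryPriority.LOW
```

An entry stored at importance 0.8 therefore drops from HIGH to LOW once its decay_factor halves.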

Key instance methods:

entry.access()                          # Increments access_count, updates last_access,
                                        # and applies a small importance boost ∝ (1 − importance).
entry.apply_decay(decay_rate: float = 0.1)   # Manually apply one decay cycle in place.
entry.add_tag(tag: str)                 # Append a tag if not already present.
entry.has_tag(tag: str) -> bool         # Check for tag membership.
entry.to_dict(include_embedding: bool = False) -> dict  # Serialise to plain dict.
entry.to_json() -> str                  # Serialise to JSON string.

3.3 Importance & Decay

Each MemoryEntry starts with a user-supplied importance value (0–1). Over time:

ForgettingMechanism computes an effective decay rate: base_decay_rate / (1 + recency_boost_per_access × access_count). More accesses → slower decay.
It applies exponential decay: I(t) = I₀ × exp(−λ × hours_since_last_access), where λ is the effective decay rate from the previous step.
Entries whose effective_importance falls below deletion_threshold are marked for removal.

This mirrors the Ebbinghaus forgetting curve modulated by spaced repetition.
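The steps above can be sketched in a few lines. This sketch follows the formulas as written, with time measured in hours as in the decay formula. The parameter names come from the text. The numeric values are illustrative assumptions, not documented defaults. The functions are hypothetical helpers, not the ForgettingMechanism API.

```python
import math


def effective_decay_rate(base_decay_rate: float,
                         recency_boost_per_access: float,
                         access_count: int) -> float:
    """More accesses -> smaller effective rate -> slower decay."""
    return base_decay_rate / (1.0 + recency_boost_per_access * access_count)


def decayed_importance(initial_importance: float,
                       hours_since_last_access: float,
                       rate: float) -> float:
    """Exponential decay: I(t) = I0 * exp(-rate * t), t in hours."""
    return initial_importance * math.exp(-rate * hours_since_last_access)


def should_forget(importance: float, deletion_threshold: float) -> bool:
    """Entries below the threshold are candidates for removal."""
    return importance < deletion_threshold


# Illustrative values only; these are not documented defaults.
base_rate, boost, threshold = 0.1, 0.5, 0.1
for accesses in (0, 4):
    rate = effective_decay_rate(base_rate, boost, accesses)
    i_t = decayed_importance(0.8, hours_since_last_access=24, rate=rate)
    print(accesses, round(rate, 3), round(i_t, 3), should_forget(i_t, threshold))
```

With these illustrative numbers, an entry left untouched for 24 hours falls below the threshold. The same entry with four recorded accesses decays more slowly and survives.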

3.4 Semantic Search

SemanticMemory uses sentence-transformers (default model: all-MiniLM-L6-v2) to convert text to L2-normalised float32 vectors. All vectors are stored in a FAISS IndexFlatIP (inner-product index, equivalent to cosine similarity on normalised vectors). When FAISS is unavailable, a pure-NumPy brute-force fallback is used transparently.

Retrieval: SemanticMemory.search(query, k) embeds the query and returns the top-K entries with their cosine similarity scores.
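The NumPy fallback amounts to a brute-force inner-product scan over normalised rows, which is the same exact search IndexFlatIP performs. The sketch below is illustrative only. `l2_normalise` and `search_top_k` are hypothetical helpers, and the toy 3-dimensional vectors stand in for model embeddings.

```python
from typing import Tuple

import numpy as np


def l2_normalise(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    return (vectors / norms).astype(np.float32)


def search_top_k(index: np.ndarray, query: np.ndarray,
                 k: int) -> Tuple[np.ndarray, np.ndarray]:
    """Brute-force top-K over pre-normalised rows; best match first."""
    q = l2_normalise(query.reshape(1, -1))[0]
    scores = index @ q                 # cosine similarity for every stored row
    k = min(k, len(scores))
    top = np.argsort(-scores)[:k]      # indices sorted by descending score
    return top, scores[top]


# Toy embeddings; real ones come from sentence-transformers.
stored = l2_normalise(np.array([[1.0, 0.0, 0.0],
                                [0.7, 0.7, 0.0],
                                [0.0, 0.0, 1.0]]))
ids, sims = search_top_k(stored, np.array([1.0, 0.1, 0.0]), k=2)
print(ids, np.round(sims, 3))
```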

3.5 Multi-Tenancy

Every memory layer accepts a tenant_id. Entries are namespaced so a query for tenant_id="acme" can never surface entries belonging to tenant_id="startup". The AIMemoryManager maintains a _TenantBundle per tenant containing independent STM, WorkingMemory, LTM, and SemanticMemory instances.
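Leakage is prevented because each tenant ID resolves to its own layer objects. As a result, no query path can reach another tenant's data. The sketch below is a simplified assumption of how such a per-tenant registry might look. `TenantBundle` and `TenantRegistry` are hypothetical names. Plain lists stand in for the real STM, WorkingMemory, LTM, and SemanticMemory instances.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TenantBundle:
    """Hypothetical stand-in for _TenantBundle: one independent store per layer."""
    stm: List[str] = field(default_factory=list)
    working: List[str] = field(default_factory=list)
    ltm: List[str] = field(default_factory=list)
    semantic: List[str] = field(default_factory=list)


class TenantRegistry:
    """Lazily creates one bundle per tenant_id; bundles never share state."""

    def __init__(self) -> None:
        self._bundles: Dict[str, TenantBundle] = {}
        self._lock = threading.Lock()  # guards concurrent first access

    def get(self, tenant_id: str) -> TenantBundle:
        with self._lock:
            if tenant_id not in self._bundles:
                self._bundles[tenant_id] = TenantBundle()
            return self._bundles[tenant_id]


registry = TenantRegistry()
registry.get("acme").stm.append("acme-only interaction")
print(registry.get("startup").stm)  # []
```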

3.6 Memory Selection (Composite Scoring)

MemorySelector scores each candidate on three axes and combines them linearly:

composite = w_relevance × cosine_sim
          + w_recency   × exp(−λ × hours_old)
          + w_importance × entry.effective_importance

Default weights: relevance=0.50, recency=0.20, importance=0.30. All weights must sum to 1.0. Only entries above min_composite_score (default 0.10) are selected, and the set is capped by top_k and a token budget.
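The following is a minimal sketch of the scoring and filtering rules above. It assumes λ is derived from a half-life (λ = ln 2 / half-life), so that a memory's recency score halves every `half_life_hours`; compare the `recency_half_life_hours` option of `SelectionConfig`. That derivation, the `Candidate` type, and the helper names are assumptions. The token-budget cap is omitted for brevity.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical stand-in for a MemoryEntry plus its query similarity."""
    text: str
    cosine_sim: float
    hours_old: float
    effective_importance: float


def composite_score(c: Candidate, w_relevance: float = 0.50,
                    w_recency: float = 0.20, w_importance: float = 0.30,
                    half_life_hours: float = 24.0) -> float:
    # Weights must sum to 1.0 (tolerance here: 0.1% of 1.0).
    if abs(w_relevance + w_recency + w_importance - 1.0) > 1e-3:
        raise ValueError("weights must sum to 1.0")
    lam = math.log(2) / half_life_hours  # assumption: recency halves per half-life
    recency = math.exp(-lam * c.hours_old)
    return (w_relevance * c.cosine_sim
            + w_recency * recency
            + w_importance * c.effective_importance)


def select(candidates: List[Candidate], top_k: int = 10,
           min_composite_score: float = 0.10) -> List[Candidate]:
    """Score, drop entries not above the floor, sort best-first, cap at top_k."""
    scored = [(composite_score(c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s > min_composite_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_k]]


pool = [
    Candidate("fresh, relevant", cosine_sim=0.9, hours_old=1.0, effective_importance=0.5),
    Candidate("stale, important", cosine_sim=0.2, hours_old=72.0, effective_importance=0.9),
    Candidate("noise", cosine_sim=0.0, hours_old=500.0, effective_importance=0.05),
]
print([c.text for c in select(pool)])  # ['fresh, relevant', 'stale, important']
```

In this example, the stale entry still makes the cut on importance alone, scoring 0.1 + 0.2 × 0.125 + 0.27 = 0.395. The "noise" entry scores about 0.015 and falls under the 0.10 floor.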

3.7 Context Assembly

ContextBuilder assembles a structured context string from four prioritised sections:

Section	Default Token Fraction	Content
`user_profile`	5%	Compact user preferences and interests
`memories`	30%	Selected `MemoryEntry` objects formatted as text
`documents`	45%	RAG-retrieved `Document` chunks
`history`	20%	Recent conversation turns from STM

Each section is individually token-budgeted and truncated (never dropped) if it exceeds its allocation.
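For the default `max_tokens=2000`, these fractions give 100, 600, 900, and 400 tokens. The sketch below computes the split and truncates a section to its share. The 4-characters-per-token estimate and the helper names are illustrative assumptions, not the module's token estimator.

```python
from typing import Dict

# Default fractions from the section table.
SECTION_FRACTIONS: Dict[str, float] = {
    "user_profile": 0.05,
    "memories": 0.30,
    "documents": 0.45,
    "history": 0.20,
}

CHARS_PER_TOKEN = 4  # rough heuristic; the module's estimator may differ


def section_budgets(max_tokens: int) -> Dict[str, int]:
    """Split the overall token budget across sections by fraction."""
    return {name: round(max_tokens * frac) for name, frac in SECTION_FRACTIONS.items()}


def truncate_to_budget(text: str, token_budget: int) -> str:
    """Trim (never drop) a section so its estimated size fits its budget."""
    max_chars = token_budget * CHARS_PER_TOKEN
    return text if len(text) <= max_chars else text[:max_chars]


print(section_budgets(2000))
# {'user_profile': 100, 'memories': 600, 'documents': 900, 'history': 400}
```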

4. Quick Start Guide

Installation Dependencies

pip install fennec-memory
pip install sentence-transformers  # embeddings
pip install faiss-cpu              # fast vector search (optional, falls back to NumPy)
pip install cryptography           # Fernet encryption (optional)

Minimal Working Example

from fennec_memory.memory import AIMemoryManager, MemoryConfig

# 1. Configure
config = MemoryConfig(
    persistence_path="./my_memory",
    importance_threshold=0.6,
    max_tokens=4000,
)

# 2. Initialise
mgr = AIMemoryManager(config=config)

# 3. Save an interaction
mgr.save_interaction(
    user_id="alice",
    tenant_id="acme",
    user_input="How do Python decorators work?",
    assistant_output="Decorators are higher-order functions that wrap another function.",
    importance=0.8,
    topics=["python"],
)

# 4. Build context for the next query
ctx = mgr.build_context(
    user_id="alice",
    tenant_id="acme",
    query="Show me a decorator example.",
)

# 5. Inject into your LLM
prompt = ctx.full_text + "\n\nUser: Show me a decorator example.\nAssistant:"

Build → Generate → Save Pattern

def get_response(mgr, user_id, tenant_id, query):
    # Build memory-enriched context
    ctx = mgr.build_context(user_id, tenant_id, query)
    
    # Call your LLM
    response = my_llm.generate(ctx.full_text + "\n\n" + query)
    
    # Store the interaction back
    mgr.save_interaction(
        user_id=user_id,
        tenant_id=tenant_id,
        user_input=query,
        assistant_output=response,
        importance=0.7,
    )
    return response

5. Public API Reference

5.1 `AIMemoryManager`

The central orchestrator. Instantiate once per application and share across request handlers.

from fennec_memory.memory import AIMemoryManager

Constructor

AIMemoryManager(
    config: Optional[MemoryConfig] = None,
    llm: Optional[LLMProtocol] = None,
    *,
    encryption_key: Optional[str] = None,
    enable_privacy_masking: bool = True,
    max_workers: int = 4,
)

Parameter	Type	Required	Description
`config`	`MemoryConfig`	No	System configuration. If `None`, defaults are used.
`llm`	`LLMProtocol`	No	Any object with `.generate(prompt: str, max_tokens: int) -> str`. Required for LLM-based compression and summarisation.
`encryption_key`	`str`	No	Secret key for Fernet symmetric encryption of stored content. If `None`, content is stored in plaintext.
`enable_privacy_masking`	`bool`	No	When `True` (default), PII is redacted before storage.
`max_workers`	`int`	No	Thread pool size for background semantic indexing and async wrappers (default: 4).
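For tests or offline development, any object with a matching `generate` method satisfies the `llm` parameter. The sketch below re-declares the protocol locally with `typing.Protocol`, and this local version may differ from the module's own `LLMProtocol`. `StubLLM` is a hypothetical name.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMProtocol(Protocol):
    """Structural type matching the documented generate(prompt, max_tokens) shape."""

    def generate(self, prompt: str, max_tokens: int) -> str: ...


class StubLLM:
    """Hypothetical deterministic stub for tests; no model is called."""

    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        # max_tokens is accepted for protocol compatibility but unused here.
        return f"[stub reply to a {len(prompt)}-char prompt]"


llm = StubLLM()
print(isinstance(llm, LLMProtocol))  # True
```

Passing `llm=StubLLM()` to the constructor lets code paths that call `.generate` run without a real model. The resulting summaries are only placeholders.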

`save_interaction`

Persists one conversation turn through all memory layers.

def save_interaction(
    self,
    user_id: str,
    tenant_id: str,
    user_input: str,
    assistant_output: str,
    *,
    importance: float = 0.5,
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    topics: Optional[List[str]] = None,
) -> MemoryEntry

Parameter	Type	Required	Description
`user_id`	`str`	Yes	Identifies the user within the tenant.
`tenant_id`	`str`	Yes	Tenant namespace for isolation.
`user_input`	`str`	Yes	The user's message text.
`assistant_output`	`str`	Yes	The assistant's response text.
`importance`	`float`	No	Initial importance score 0–1 (default: 0.5). Values ≥ `config.importance_threshold` trigger LTM promotion.
`tags`	`List[str]`	No	Searchable tags attached to the entry.
`metadata`	`Dict[str, Any]`	No	Arbitrary key-value metadata (e.g., session ID, request ID).
`topics`	`List[str]`	No	Topic labels used to update the user's profile topic frequency map.

Returns: MemoryEntry — the stored entry with its generated id.

Internal steps:

Access check (write permission for user_id on tenant_id).
PII masking of both user_input and assistant_output.
Optional Fernet encryption.
User profile memory_retention_boost applied to importance.
Entry added to ShortTermMemory.
Semantic embedding and indexing submitted to background thread pool.
If importance ≥ config.importance_threshold, entry promoted to LongTermMemory.
UserProfileManager updated with topics and importance.

entry = mgr.save_interaction(
    user_id="alice",
    tenant_id="acme",
    user_input="What is RAG?",
    assistant_output="RAG is Retrieval-Augmented Generation ...",
    importance=0.9,
    tags=["AI", "RAG"],
    topics=["AI", "LLM"],
    metadata={"session_id": "session_001"},
)
print(entry.id)  # e.g. "3f2a1b9c7e4d5a6f"

`build_context`

Retrieves relevant memories and assembles a BuiltContext ready to inject into an LLM prompt.

def build_context(
    self,
    user_id: str,
    tenant_id: str,
    query: str,
    *,
    documents: Optional[List[Document]] = None,
    top_k_semantic: int = 8,
    top_k_ltm: int = 10,
    include_profile: bool = True,
) -> BuiltContext

Parameter	Type	Required	Description
`user_id`	`str`	Yes	User whose profile and access rights are applied.
`tenant_id`	`str`	Yes	Tenant to query memories from.
`query`	`str`	Yes	The current user question or task description.
`documents`	`List[Document]`	No	External RAG-retrieved documents to include in context.
`top_k_semantic`	`int`	No	Number of results from semantic search (default: 8).
`top_k_ltm`	`int`	No	Number of top-importance entries from LTM (default: 10).
`include_profile`	`bool`	No	Whether to inject the user profile section (default: `True`).

Returns: BuiltContext with fields:

Field	Type	Description
`full_text`	`str`	The complete assembled context string. Inject directly into your prompt.
`token_estimate`	`int`	Estimated token count of `full_text`.
`sections`	`Dict[str, str]`	Individual sections: `user_profile`, `memories`, `documents`, `history`.
`truncated`	`bool`	`True` if any section was truncated to fit its budget.
`document_count`	`int`	Number of documents included.
`memory_count`	`int`	Number of memory entries included.

Internal steps:

Semantic search over STM + LTM embeddings.
Top-K retrieval from LTM by importance.
Merge and deduplicate all candidates.
MemorySelector scores and filters by composite score and token budget.
Selected entries loaded into WorkingMemory.
User profile text retrieved.
ContextBuilder assembles the final BuiltContext.

from fennec_memory.memory import Document

ctx = mgr.build_context(
    user_id="alice",
    tenant_id="acme",
    query="How do I implement a caching layer?",
    documents=[
        Document(page_content="Redis supports LRU eviction ...", metadata={"source":"redis_docs"}, doc_id="doc_001")
    ],
    top_k_semantic=5,
)

# Use ctx.full_text as the LLM context block
print(ctx.full_text)
print(f"Memories used: {ctx.memory_count}, truncated: {ctx.truncated}")
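The merge-and-deduplicate step is where the same memory can arrive from both semantic search and the LTM importance query. The following is a minimal sketch under a few assumptions: candidates are keyed by `entry.id`, and the highest similarity wins when an ID appears twice. The function name and the `None` convention for LTM hits without a similarity score are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def merge_candidates(
    semantic_hits: List[Tuple[str, float]],  # (entry_id, cosine score)
    ltm_top_ids: List[str],                  # entry ids ranked by importance
) -> Dict[str, Optional[float]]:
    """Union both result sets by entry id, keeping the best similarity seen.

    LTM-only entries have no similarity yet (None); a selector could
    embed them or treat missing relevance as 0.
    """
    merged: Dict[str, Optional[float]] = {}
    for entry_id, score in semantic_hits:
        prev = merged.get(entry_id)
        merged[entry_id] = score if prev is None else max(prev, score)
    for entry_id in ltm_top_ids:
        merged.setdefault(entry_id, None)  # keep an existing score if present
    return merged


merged = merge_candidates([("a", 0.82), ("b", 0.40), ("a", 0.75)], ["b", "c"])
print(merged)  # {'a': 0.82, 'b': 0.4, 'c': None}
```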

`build_prompt`

Convenience wrapper that returns a complete, ready-to-send prompt string.

def build_prompt(
    self,
    user_id: str,
    tenant_id: str,
    query: str,
    *,
    documents: Optional[List[Document]] = None,
    system_instruction: str = "You are a helpful, knowledgeable assistant.",
) -> str

prompt = mgr.build_prompt(
    user_id="alice",
    tenant_id="acme",
    query="Explain async/await in Python.",
    system_instruction="You are an expert Python engineer.",
)
response = my_llm.generate(prompt)

`run_maintenance`

Executes a full maintenance pass for a tenant: decay, eviction, compression, forgetting.

def run_maintenance(self, tenant_id: str) -> Dict[str, Any]

Returns a report dict with keys:

Key	Description
`stm_evicted`	STM entries removed for falling below `min_importance`.
`ltm_decay_updated`	LTM entries that had their importance updated.
`ltm_deleted`	LTM entries deleted for falling below `min_importance`.
`working_forgotten`	WorkingMemory entries marked forgotten.
`ltm_compression`	Dict with `merges` and `noise_removed` counts (only present when LTM is >80% full).

Schedule this to run periodically (e.g., once per hour via APScheduler or a background task):

import asyncio

async def maintenance_loop(mgr, tenant_id, interval_seconds=3600):
    while True:
        report = await mgr.arun_maintenance(tenant_id)
        print(f"Maintenance: {report}")
        await asyncio.sleep(interval_seconds)

`grant` / `revoke`

Manage user access roles within a tenant.

mgr.grant(user_id: str, tenant_id: str, role: str = "writer") -> None
mgr.revoke(user_id: str, tenant_id: str, role: str) -> None

Built-in roles:

Role	Permissions
`reader`	`READ`
`writer`	`READ`, `WRITE`
`owner`	`READ`, `WRITE`, `DELETE`
`admin`	`READ`, `WRITE`, `DELETE`, `ADMIN`

mgr.grant("alice", "acme", role="owner")
mgr.grant("bob",   "acme", role="writer")
mgr.grant("guest", "acme", role="reader")

Default behaviour: If no roles have been assigned in a tenant, all write operations in that tenant are permitted. The first grant() call for any user in a tenant switches that tenant to strict enforcement for every user. From that moment, any user without a role loses write access. If you need fine-grained control, grant roles to every user who needs access in a single setup step, before the tenant starts serving traffic.
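The switch from permissive to strict mode is easy to miss, so a small model of it helps. This is an illustrative sketch of the described behaviour, not the `AccessController` implementation. The class name and the per-tenant "enforced" flag are assumptions.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "reader": {"READ"},
    "writer": {"READ", "WRITE"},
    "owner": {"READ", "WRITE", "DELETE"},
    "admin": {"READ", "WRITE", "DELETE", "ADMIN"},
}


class AccessSketch:
    """Hypothetical model: a tenant is permissive until its first grant."""

    def __init__(self) -> None:
        self._roles: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._enforced_tenants: Set[str] = set()

    def grant(self, user_id: str, tenant_id: str, role: str = "writer") -> None:
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._roles[(user_id, tenant_id)].add(role)
        self._enforced_tenants.add(tenant_id)  # strict mode from now on

    def can_write(self, user_id: str, tenant_id: str) -> bool:
        if tenant_id not in self._enforced_tenants:
            return True  # no grants in this tenant yet: writes allowed
        roles = self._roles.get((user_id, tenant_id), set())
        return any("WRITE" in ROLE_PERMISSIONS[r] for r in roles)


acl = AccessSketch()
print(acl.can_write("bob", "acme"))  # True: no grants in "acme" yet
acl.grant("alice", "acme", role="owner")
print(acl.can_write("bob", "acme"))  # False: enforcement is now active
```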

`stats`

Returns a comprehensive statistics snapshot for a tenant.

def stats(self, tenant_id: str) -> Dict[str, Any]

s = mgr.stats("acme")
# s["stm"]     → ShortTermMemory stats
# s["working"] → WorkingMemory stats
# s["ltm"]     → LongTermMemory stats
# s["semantic"]→ SemanticMemory stats
# s["profile_manager"] → aggregate user profile stats
# s["encryption_active"] → bool
# s["privacy_masking_active"] → bool

`clear_tenant`

Wipes all memory for a tenant across every layer. Irreversible.

def clear_tenant(self, tenant_id: str) -> None

Async API

All three primary operations have async counterparts:

entry = await mgr.asave_interaction(user_id, tenant_id, user_input, assistant_output, **kwargs)
ctx   = await mgr.abuild_context(user_id, tenant_id, query, **kwargs)
report = await mgr.arun_maintenance(tenant_id)

These run the synchronous methods in the internal ThreadPoolExecutor and are safe to await from any async context.

Note: The async methods are awaitable, but the work itself is not natively asynchronous. Each call occupies a ThreadPoolExecutor worker until the synchronous method returns. Because max_workers also governs background semantic indexing, a low value can become a bottleneck under high concurrency. Increase max_workers accordingly.
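The wrapping pattern described above can be approximated with the standard `loop.run_in_executor`. `SyncService` is a hypothetical stand-in, and the module's own wrappers may differ in detail. Because every pending call holds a pool thread until it finishes, `max_workers` bounds how many calls make progress at once.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class SyncService:
    """Hypothetical class with a blocking method and an async wrapper."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def build_context(self, user_id: str, tenant_id: str, query: str) -> str:
        return f"{tenant_id}/{user_id}: {query}"  # pretend this blocks on I/O

    async def abuild_context(self, user_id: str, tenant_id: str, query: str) -> str:
        # Awaitable, but the work runs on a pool thread, not the event loop.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self._pool, partial(self.build_context, user_id, tenant_id, query)
        )


async def main() -> None:
    svc = SyncService(max_workers=2)
    results = await asyncio.gather(
        *(svc.abuild_context("alice", "acme", f"q{i}") for i in range(3))
    )
    print(results)  # ['acme/alice: q0', 'acme/alice: q1', 'acme/alice: q2']


asyncio.run(main())
```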

5.2 `MemoryConfig`

The top-level configuration dataclass.

from fennec_memory.memory import MemoryConfig

config = MemoryConfig(
    max_short_term=1000,
    max_long_term=1000,
    max_working=1000,
    importance_threshold=0.7,
    similarity_threshold=0.85,
    retrieval_limit=10,
    embedding_model="all-MiniLM-L6-v2",
    embedding_batch_size=32,
    embedding_cache_size=1000,
    normalize_text=False,
    preserve_case=True,
    enable_persistence=True,
    persistence_path="./memory_storage",
    auto_save_interval=300,
    enable_decay=True,
    decay_rate=0.1,
    min_importance=0.1,
    enable_consolidation=True,
    consolidation_interval=3600,
    max_tokens=2000,
    window_size=5,
    log_level="INFO",
    enable_stats=True,
)

Full parameter reference:

Parameter	Type	Default	Description
`max_short_term`	`int`	`1000`	Maximum entries in ShortTermMemory (sliding window).
`max_long_term`	`int`	`1000`	Maximum entries tracked in LTM (database has no hard cap).
`max_working`	`int`	`1000`	Maximum entries in WorkingMemory per turn.
`max_episodic`	`int`	`1000`	Capacity hint for episodic memory.
`importance_threshold`	`float`	`0.7`	Minimum importance for LTM promotion. Must be 0–1.
`similarity_threshold`	`float`	`0.85`	Cosine similarity threshold for duplicate detection.
`retrieval_limit`	`int`	`10`	Default `top_k` for `MemorySelector`.
`embedding_model`	`str`	`"all-MiniLM-L6-v2"`	`sentence-transformers` model name.
`embedding_batch_size`	`int`	`32`	Batch size for bulk embedding generation.
`embedding_cache_size`	`int`	`1000`	LRU cache size for the embedding model.
`normalize_text`	`bool`	`False`	Lowercase text before storage.
`preserve_case`	`bool`	`True`	Keep original casing.
`enable_persistence`	`bool`	`True`	Enable SQLite LTM and JSON profile persistence.
`persistence_path`	`str`	`"./memory_storage"`	Base directory for all on-disk data.
`auto_save_interval`	`int`	`300`	Auto-save interval in seconds (used by profile manager).
`enable_decay`	`bool`	`True`	Enable time-based importance decay.
`decay_rate`	`float`	`0.1`	Base importance loss per day (at zero access count).
`min_importance`	`float`	`0.1`	Entries below this floor are evicted during maintenance.
`enable_consolidation`	`bool`	`True`	Enable periodic memory consolidation.
`consolidation_interval`	`int`	`3600`	Consolidation interval in seconds.
`max_tokens`	`int`	`2000`	Token budget for `ContextBuilder` and `WorkingMemory`.
`window_size`	`int`	`5`	`ShortTermMemory` sliding window size (number of turns).
`log_level`	`str`	`"INFO"`	Python logging level.

Environment Variable Bootstrap

config = MemoryConfig.from_env()

Supported environment variables:

Variable	Corresponds To
`MEMORY_MAX_SHORT_TERM`	`max_short_term`
`MEMORY_MAX_LONG_TERM`	`max_long_term`
`MEMORY_EMBEDDING_MODEL`	`embedding_model`
`MEMORY_PERSISTENCE_PATH`	`persistence_path`
`MEMORY_ENABLE_PERSISTENCE`	`enable_persistence` (any truthy string)

`to_dict()` — Serialisation

d = config.to_dict()

Converts the full MemoryConfig to a plain dictionary. Useful for logging, auditing, or persisting configuration state.

5.3 Conversation Memory Classes

These are lightweight, LangChain-compatible memory classes for direct conversational use. They implement BaseMemory with save_context / load_memory_variables / clear, and expose async variants (asave_context, aload_memory_variables).

`ConversationBufferMemory`

Stores the full conversation history without any truncation.

from fennec_memory.memory import ConversationBufferMemory

mem = ConversationBufferMemory(
    return_messages=False,   # True → returns List[dict], False → returns text string
    input_key="input",
    output_key="output",
    memory_key="history",
)

mem.save_context({"input": "Hello"}, {"output": "Hi there!"})
variables = mem.load_memory_variables({})
# variables["history"] → "user: Hello\nassistant: Hi there!"

Best for: short conversations where full context fits in the LLM's window.

`ConversationBufferWindowMemory`

Keeps only the last k interactions.

from fennec_memory.memory import ConversationBufferWindowMemory

mem = ConversationBufferWindowMemory(k=5)  # remembers last 5 turns
info = mem.get_window_info()
# {"current_size": 3, "max_size": 5, "available_slots": 2, "is_full": False}

Best for: long conversations where only recent context matters.

`ConversationSummaryMemory`

Automatically summarises old turns using an LLM when the estimated token count exceeds max_token_limit.

from fennec_memory.memory import ConversationSummaryMemory

mem = ConversationSummaryMemory(
    llm=my_llm,
    max_token_limit=2000,
)

mem.save_context({"input": "..."}, {"output": "..."})  # triggers summarisation when needed
summary = mem.get_summary()       # retrieve the current running summary
mem.force_summarize()             # trigger summarisation immediately

The LLM is called with a summary_prompt_template that includes the previous summary and the new turns. On failure, the last 5 turns are preserved as a fallback.

Best for: very long conversations that must stay within a strict token budget.

`ConversationEntityMemory`

Tracks named entities (people, places, organisations, products) mentioned in conversation and provides relevant entity context on retrieval.

from fennec_memory.memory import ConversationEntityMemory

mem = ConversationEntityMemory(
    llm=my_llm,         # used for entity extraction; falls back to stanza NLP
    memory_key="entity_info",
    lang="en",          # stanza language code (used in fallback)
)

mem.save_context(
    {"input": "I was talking to Dr. Smith at OpenAI."},
    {"output": "That's interesting."},
)

entities = mem.list_entities()           # ["Dr. Smith", "OpenAI"]
info = mem.get_entity_info("OpenAI")     # {"name": "OpenAI", "contexts_count": 1, ...}

Entity extraction uses the LLM if available, otherwise falls back to Stanza NLP for NER.

Best for: conversations involving multiple referenced people, organisations, or places.

`BaseMemory` Interface

All memory classes implement:

from abc import ABC
from typing import Any, Dict

class BaseMemory(ABC):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None: ...
    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...
    def clear(self) -> None: ...
    def get_memory_stats(self) -> Dict[str, Any]: ...

    # Async variants (via asyncio.to_thread)
    async def asave_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None: ...
    async def aload_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...

5.4 `SemanticMemory`

FAISS-backed (or NumPy-fallback) vector store for similarity search.

from fennec_memory.memory import SemanticMemory

sem = SemanticMemory(
    model_name="all-MiniLM-L6-v2",
    cache_size=1000,
    tenant_id="acme",
)

Method	Signature	Description
`add`	`(entry: MemoryEntry) -> None`	Embed and index one entry. Sets `entry.embedding` in place.
`add_batch`	`(entries: List[MemoryEntry]) -> None`	Batch embed for performance.
`remove`	`(entry_id: str) -> None`	Remove entry from index and store.
`search`	`(query: str, k: int = 5, min_score: float = 0.0) -> List[Tuple[MemoryEntry, float]]`	Return top-K entries with cosine scores. Calls `entry.access()` on each result.
`search_entries`	`(query: str, k: int = 5, min_score: float = 0.0) -> List[MemoryEntry]`	Returns entries only (no scores).
`clear`	`() -> None`	Wipe the entire index and store.
`stats`	`() -> Dict[str, Any]`	Returns backend (`faiss` or `numpy`), model name, entry count, tenant ID.

results = sem.search("Python async patterns", k=5, min_score=0.3)
for entry, score in results:
    print(f"Score: {score:.3f}  Content: {str(entry.content)[:60]}")

5.5 `LongTermMemory` (Storage)

SQLite-backed persistent store. One database file per tenant, located at {persistence_path}/{tenant_id}_ltm.db.

from fennec_memory.memory import LongTermMemory

ltm = LongTermMemory(
    db_path="./memory_storage/acme_ltm.db",
    tenant_id="acme",
)

Method	Signature	Description
`store`	`(entry: MemoryEntry) -> None`	Upsert one entry: insert it, or update the existing row if the ID already exists.
`store_many`	`(entries: List[MemoryEntry]) -> None`	Batch upsert for efficiency.
`get`	`(entry_id: str) -> Optional[MemoryEntry]`	Fetch by ID; calls `entry.access()` and persists updated access stats.
`get_top_by_importance`	`(limit: int = 20, min_importance: float = 0.0) -> List[MemoryEntry]`	Ordered by `importance DESC`.
`get_recent`	`(limit: int = 20) -> List[MemoryEntry]`	Ordered by `timestamp DESC`.
`count`	`() -> int`	Total entry count for this tenant.
`delete`	`(entry_id: str) -> None`	Remove a single entry.
`delete_below_importance`	`(threshold: float) -> int`	Bulk delete; returns row count.
`apply_global_decay`	`(decay_rate: float = 0.1) -> int`	Apply time-based decay to all entries; returns updated count.
`clear`	`() -> None`	Delete all entries for this tenant.
`stats`	`() -> Dict[str, Any]`	Returns db_path, tenant_id, total_entries.

The SQLite schema includes indices on (tenant_id), (tenant_id, importance DESC), and (tenant_id, timestamp DESC) for efficient queries. All writes are protected by a threading.Lock.
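The following is a minimal sketch of a schema with those three indices and an upsert, using the standard-library `sqlite3` module. Table and column names are assumptions, not the module's actual schema. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE IF NOT EXISTS memories (
        id          TEXT PRIMARY KEY,
        tenant_id   TEXT NOT NULL,
        content     TEXT NOT NULL,
        importance  REAL NOT NULL,
        timestamp   REAL NOT NULL
    );
    -- One index per access pattern: by tenant, by importance, by recency.
    CREATE INDEX IF NOT EXISTS idx_tenant ON memories (tenant_id);
    CREATE INDEX IF NOT EXISTS idx_tenant_importance
        ON memories (tenant_id, importance DESC);
    CREATE INDEX IF NOT EXISTS idx_tenant_timestamp
        ON memories (tenant_id, timestamp DESC);
    """
)

UPSERT = """
INSERT INTO memories (id, tenant_id, content, importance, timestamp)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    content = excluded.content,
    importance = excluded.importance
"""

now = time.time()
with conn:  # commits on success, rolls back on error
    conn.execute(UPSERT, ("e1", "acme", "likes Python", 0.8, now))
    conn.execute(UPSERT, ("e1", "acme", "likes Python", 0.9, now))  # same id: update
    conn.execute(UPSERT, ("e2", "acme", "uses Redis", 0.6, now - 60))

rows = conn.execute(
    "SELECT id, importance FROM memories WHERE tenant_id = ? "
    "ORDER BY importance DESC LIMIT ?",
    ("acme", 20),
).fetchall()
print(rows)  # [('e1', 0.9), ('e2', 0.6)]
```

A multi-threaded caller would additionally wrap the write calls in a lock, as the module does with threading.Lock.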

5.6 `MemorySelector`

Scores and ranks candidate memories for context injection.

from fennec_memory.memory import MemorySelector, SelectionConfig

config = SelectionConfig(
    relevance_weight=0.50,
    recency_weight=0.20,
    importance_weight=0.30,
    recency_half_life_hours=24.0,
    min_composite_score=0.10,
    top_k=10,
    token_budget=3000,
)
selector = MemorySelector(config=config)

Constraint: relevance_weight + recency_weight + importance_weight must equal 1.0 (±0.1%).

Method	Signature	Description
`select`	`(query, candidates, similarity_scores=None) -> List[ScoredMemory]`	Return scored and filtered entries.
`select_entries`	`(query, candidates, similarity_scores=None) -> List[MemoryEntry]`	Same as `select` but returns unwrapped entries.
`explain`	`(query, candidates, similarity_scores=None) -> List[Dict]`	Human-readable scoring breakdown (useful for debugging).

explanation = selector.explain(
    query="Python async patterns",
    candidates=stm.get_all(),
)
for row in explanation:
    print(f"id={row['id']}  composite={row['composite']:.3f}  "
          f"relevance={row['relevance']:.3f}  recency={row['recency']:.3f}")

ScoredMemory fields: entry, relevance_score, recency_score, importance_score, composite_score.
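One way to read the composite score is as a weighted sum of the three signals, with recency modelled as exponential decay that halves every `recency_half_life_hours`. The function below illustrates that reading and is not `MemorySelector`'s actual code. The `Candidate` type, and the assumption that relevance arrives as a precomputed similarity in [0, 1], are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class Candidate:  # hypothetical stand-in for a MemoryEntry plus its similarity score
    entry_id: str
    relevance: float   # e.g. cosine similarity to the query, assumed in [0, 1]
    importance: float  # assumed in [0, 1]
    timestamp: float   # UNIX seconds


def composite_scores(candidates, *, relevance_weight=0.50, recency_weight=0.20,
                     importance_weight=0.30, recency_half_life_hours=24.0,
                     min_composite_score=0.10, top_k=10, now=None):
    total = relevance_weight + recency_weight + importance_weight
    if abs(total - 1.0) > 1e-3:  # the documented +/-0.1% tolerance
        raise ValueError(f"weights must sum to 1.0, got {total}")
    now = time.time() if now is None else now
    scored = []
    for c in candidates:
        age_hours = max(0.0, (now - c.timestamp) / 3600.0)
        recency = 0.5 ** (age_hours / recency_half_life_hours)  # halves every half-life
        composite = (relevance_weight * c.relevance
                     + recency_weight * recency
                     + importance_weight * c.importance)
        if composite >= min_composite_score:
            scored.append((composite, c.entry_id))
    scored.sort(reverse=True)  # highest composite first
    return scored[:top_k]


now = 1_000_000.0
cands = [
    Candidate("fresh", relevance=0.8, importance=0.5, timestamp=now),            # 0 h old
    Candidate("day_old", relevance=0.8, importance=0.5, timestamp=now - 86_400),  # 24 h old
]
ranked = composite_scores(cands, now=now)
print(ranked)  # "fresh" scores about 0.75, "day_old" about 0.65
```

The two candidates differ only in age, so the whole gap (0.20 × (1 − 0.5) = 0.10) comes from the recency term losing half its value after one half-life.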

5.7 `MemoryCompressor`

LLM-based noise removal, duplicate merging, and summarisation.

from fennec_memory.memory import MemoryCompressor

compressor = MemoryCompressor(
    llm=my_llm,
    merge_similarity_threshold=0.92,
    tenant_id="acme",
)

Note: merge_duplicates and compress_batch (when merge=True) require entries to have a populated embedding field. LLM-based summarisation requires llm to be supplied in the constructor.

Method	Signature	Description
`remove_noise`	`(entries) -> Tuple[List[MemoryEntry], int]`	Drop entries with fewer than 10 chars, fewer than 3 words, or <30% alphanumeric content. Returns `(cleaned, n_removed)`.
`merge_duplicates`	`(entries) -> Tuple[List[MemoryEntry], int]`	Merge entry pairs with cosine similarity ≥ `merge_similarity_threshold`. Returns `(merged_list, n_merges)`.
`summarise`	`(entries, max_tokens=256, target_type=LONG_TERM) -> Optional[MemoryEntry]`	LLM-summarise a list into one compact entry. Falls back to concatenation if LLM is unavailable.
`compress_batch`	`(entries, *, remove_noise=True, merge=True, summarise=False, summarise_threshold=20) -> Dict`	Full pipeline. Returns dict with keys `entries`, `noise_removed`, `merges`, `summarised`.

result = compressor.compress_batch(
    candidates,
    remove_noise=True,
    merge=True,
    summarise=True,
    summarise_threshold=5,
)
print(f"Reduced {len(candidates)} → {len(result['entries'])} entries")
print(f"Noise removed: {result['noise_removed']}, Merges: {result['merges']}")
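The predicate below approximates the three `remove_noise` rules from the table above. Two details are assumptions: the alphanumeric ratio is computed over all characters, including spaces, and the helper names `is_noise` and `remove_noise` are illustrative.

```python
def is_noise(text: str, min_chars: int = 10, min_words: int = 3,
             min_alnum_ratio: float = 0.30) -> bool:
    """Hypothetical re-creation of the documented noise rules."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return True
    if len(stripped.split()) < min_words:
        return True
    alnum = sum(ch.isalnum() for ch in stripped)
    return alnum / len(stripped) < min_alnum_ratio


def remove_noise(texts):
    """Return (cleaned, n_removed), mirroring the documented return shape."""
    cleaned = [t for t in texts if not is_noise(t)]
    return cleaned, len(texts) - len(cleaned)


cleaned, n_removed = remove_noise([
    "ok",                                  # fewer than 10 chars
    "thanks a lot",                        # 12 chars, 3 words, mostly letters: kept
    "!!! ??? ... --- ###",                 # no alphanumeric content
    "User prefers async Python examples",  # kept
])
print(n_removed)  # 2
```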

5.8 `ForgettingMechanism`

Biologically inspired forgetting based on the Ebbinghaus forgetting curve.

from fennec_memory.memory import ForgettingMechanism, ForgettingConfig

forgetter = ForgettingMechanism(ForgettingConfig(
    base_decay_rate=0.15,           # importance lost per day at zero accesses
    recency_boost_per_access=0.05,  # each access multiplies effective half-life
    max_recency_boost=2.0,          # cap on recency boost multiplier
    min_importance=0.05,            # minimum floor before removal
    deletion_threshold=0.05,        # entries at or below this are "forgotten"
    high_frequency_threshold=5,     # ≥5 accesses → "well-rehearsed"
    apply_decay_on_read=False,      # if True, decay runs on every load
))

Method	Signature	Description
`apply`	`(entries: List[MemoryEntry]) -> Tuple[List, List]`	Returns `(alive, forgotten)`. Mutates `importance` and `decay_factor` in place.
`apply_to_single`	`(entry: MemoryEntry) -> bool`	Decays one entry; returns `True` if it should be kept.
`score_retention`	`(entry: MemoryEntry) -> float`	Non-mutating retention score 0–1. Use for previewing at-risk entries.
`report`	`(entries: List[MemoryEntry]) -> Dict`	Diagnostic dict: `total`, `at_risk_count`, `at_risk_ids`.

alive, forgotten = forgetter.apply(entries)
print(f"{len(alive)} entries retained, {len(forgotten)} forgotten")

# Preview without mutating
report = forgetter.report(entries)
print(f"At risk: {report['at_risk_count']} of {report['total']}")

Known behaviour: When entries is an empty list, apply() returns the string "entries is empty" instead of a tuple. Always guard with if entries: before calling. See Section 10 for the recommended pattern.
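`ForgettingMechanism`'s exact decay formula is not specified here. The sketch below assumes an Ebbinghaus-style exponential curve, `importance * exp(-base_decay_rate * age_days / boost)`, in which each access enlarges a boost factor (capped at `max_recency_boost`) that slows decay. Entries at or below `deletion_threshold` are treated as forgotten. The `Entry` type and helper names are hypothetical. Unlike the behaviour described above, this version returns `([], [])` for an empty list.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:  # hypothetical, minimal stand-in for MemoryEntry
    entry_id: str
    importance: float
    access_count: int
    age_days: float


def decayed_importance(e, base_decay_rate=0.15, recency_boost_per_access=0.05,
                       max_recency_boost=2.0):
    # Each access adds to the boost, which divides the effective decay rate (capped).
    boost = min(1.0 + recency_boost_per_access * e.access_count, max_recency_boost)
    return e.importance * math.exp(-base_decay_rate * e.age_days / boost)


def apply_forgetting(entries, deletion_threshold=0.05):
    alive, forgotten = [], []  # an empty input simply yields ([], [])
    for e in entries:
        e.importance = decayed_importance(e)  # mutates in place, like apply()
        (alive if e.importance > deletion_threshold else forgotten).append(e)
    return alive, forgotten


entries = [
    Entry("rehearsed", importance=0.6, access_count=20, age_days=10),
    Entry("stale", importance=0.3, access_count=0, age_days=30),
]
alive, forgotten = apply_forgetting(entries)
print([e.entry_id for e in alive], [e.entry_id for e in forgotten])  # ['rehearsed'] ['stale']
```

The well-rehearsed entry hits the 2.0 boost cap and keeps about 0.28 importance after 10 days. The never-accessed entry falls to about 0.003 after 30 days.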

5.9 `ContextBuilder`

Assembles a structured, token-budgeted context string from multiple sources.

from fennec_memory.memory import ContextBuilder, Document

builder = ContextBuilder(
    total_token_budget=4000,
    profile_budget_fraction=0.05,
    memory_budget_fraction=0.30,
    document_budget_fraction=0.45,
    history_budget_fraction=0.20,
)

`build`

def build(
    self,
    *,
    documents: Optional[List[Document]] = None,
    memories: Optional[List[MemoryEntry]] = None,
    history: Optional[List[Dict[str, str]]] = None,
    user_profile_text: Optional[str] = None,
    query: Optional[str] = None,
) -> BuiltContext

Returns a BuiltContext. Sections are added in priority order: user_profile → memories → documents → history. Each section is individually token-capped and marked truncated=True if content was cut.
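Assume each section's cap is `total_token_budget` multiplied by its fraction. With the constructor above, the profile, memory, document, and history sections then get 200, 1,200, 1,800, and 800 tokens. The sketch shows that allocation and per-section truncation with a `truncated` flag. Counting whitespace-separated words as tokens is a simplifying assumption (a real tokenizer counts differently), and the function names are hypothetical.

```python
def section_budgets(total: int, fractions: dict) -> dict:
    """Split a total token budget by fraction, keeping the given section order."""
    return {name: round(total * frac) for name, frac in fractions.items()}


def cap_section(text: str, max_tokens: int):
    """Truncate text to max_tokens words; return (text, truncated_flag)."""
    words = text.split()
    if len(words) <= max_tokens:
        return text, False
    return " ".join(words[:max_tokens]), True


FRACTIONS = {  # priority order: user_profile -> memories -> documents -> history
    "user_profile": 0.05,
    "memories": 0.30,
    "documents": 0.45,
    "history": 0.20,
}

budgets = section_budgets(4000, FRACTIONS)
print(budgets)  # {'user_profile': 200, 'memories': 1200, 'documents': 1800, 'history': 800}

text, truncated = cap_section("one two three four five", 3)
print(text, truncated)  # one two three True
```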

`build_prompt`

def build_prompt(
    self,
    query: str,
    *,
    documents=None,
    memories=None,
    history=None,
    user_profile_text=None,
    system_instruction="You are a helpful, knowledgeable assistant.",
) -> str

Returns a fully assembled prompt string including the system instruction, context, and the current query.

`Document`

@dataclass
class Document:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None

5.10 `UserProfileManager`

Manages user profiles across all tenants, with optional JSON persistence.

from fennec_memory.memory import UserProfileManager

manager = UserProfileManager(
    persist_dir="./memory_storage/profiles",
    auto_save=True,
)

Method	Signature	Description
`get_or_create`	`(user_id: str) -> UserProfile`	Load from disk or create a new default profile.
`update_from_interaction`	`(user_id, query, topics=None, importance=0.5) -> UserProfile`	Record one interaction; updates topic frequency and importance average.
`set_preference`	`(user_id, key, value) -> None`	Set an explicit user preference.
`delete`	`(user_id: str) -> bool`	Remove from memory and disk.
`list_users`	`() -> List[str]`	All known user IDs (in-memory + on-disk).
`aggregate_stats`	`() -> Dict`	Cross-user stats: user count, total interactions, top global topics.

UserProfile key properties and methods:

Member	Type	Description
`user_id`	`str`	—
`preferred_language`	`str`	Default `"en"`.
`verbosity`	`str`	`"concise"`, `"medium"`, or `"detailed"`.
`topics_of_interest`	`List[str]`	Top-10 inferred from topic frequency.
`memory_retention_boost`	`float`	0–0.5. Added to `importance` on every `save_interaction`.
`get_preference(key, default=None)`	`Any`	Retrieve a custom preference value.
`set_preference(key, value)`	`None`	Set a custom preference on the profile object.
`to_context_string()`	`str`	Compact profile text for injection into prompts.
`start_session()`	`None`	Record the start of a new session (increments session counter).
`update_retention_boost(boost: float)`	`None`	Update `memory_retention_boost`; clamped to [0, 0.5].

profile = manager.get_or_create("alice")
manager.set_preference("alice", "verbosity", "detailed")
manager.set_preference("alice", "preferred_language", "en")

# Inject into prompt
profile_text = profile.to_context_string()

6. Configuration System

Full `MemoryConfig` with All Defaults

from fennec_memory.memory import MemoryConfig

config = MemoryConfig(
    # Memory layer capacities
    max_short_term=1000,
    max_long_term=1000,
    max_working=1000,
    max_episodic=1000,
    max_semantic=1000,
    max_procedral=1000,

    # Retrieval
    importance_threshold=0.7,    # LTM promotion cutoff
    similarity_threshold=0.85,   # duplicate detection
    retrieval_limit=10,          # default top_k

    # Embeddings
    embedding_model="all-MiniLM-L6-v2",
    embedding_batch_size=32,
    embedding_cache_size=1000,

    # Text processing
    normalize_text=False,
    preserve_case=True,
    remove_duplicates=True,

    # Persistence
    enable_persistence=True,
    persistence_path="./memory_storage",
    auto_save_interval=300,

    # LangChain-style keys
    return_messages=False,
    input_key="input",
    output_key="output",
    memory_key="history",
    max_token_limit=2000,
    window_size=5,

    # Decay
    enable_decay=True,
    decay_rate=0.1,        # 10% per day at zero accesses
    min_importance=0.1,

    # Consolidation
    enable_consolidation=True,
    consolidation_interval=3600,

    # LLM token budget
    max_tokens=2000,

    # Logging
    log_level="INFO",
    enable_stats=True,
)

`SelectionConfig`

Controls the MemorySelector scoring weights:

from fennec_memory.memory import SelectionConfig

selection_config = SelectionConfig(
    relevance_weight=0.50,         # cosine similarity fraction
    recency_weight=0.20,           # recency exponential decay fraction
    importance_weight=0.30,        # effective_importance fraction
    recency_half_life_hours=24.0,  # score halves every 24 hours
    min_composite_score=0.10,      # minimum score to be included
    top_k=10,                      # maximum entries to select
    token_budget=3000,             # maximum tokens across all selected entries
)

`ForgettingConfig`

from fennec_memory.memory import ForgettingConfig

forgetting_config = ForgettingConfig(
    base_decay_rate=0.15,
    recency_boost_per_access=0.05,
    max_recency_boost=2.0,
    min_importance=0.05,
    deletion_threshold=0.05,
    high_frequency_threshold=5,
    apply_decay_on_read=False,
)

Environment Variable Reference

export MEMORY_MAX_SHORT_TERM=500
export MEMORY_MAX_LONG_TERM=5000
export MEMORY_EMBEDDING_MODEL="all-MiniLM-L6-v2"
export MEMORY_PERSISTENCE_PATH="/var/data/memory"
export MEMORY_ENABLE_PERSISTENCE="true"

Load with:

config = MemoryConfig.from_env()

7. Security Model

7.1 PII Detection and Masking

SensitiveDataMasker applies regex patterns to redact sensitive data before any storage operation.

Built-in patterns:

Label	What It Matches
`EMAIL`	Standard email addresses
`CREDIT_CARD`	13–16 digit card numbers
`SSN`	US Social Security Numbers (###-##-####)
`PHONE_INTL`	International phone numbers
`IBAN`	International Bank Account Numbers
`IP_V4`	IPv4 addresses
`URL_WITH_AUTH`	URLs containing credentials (`user:pass@host`)
`JWT`	JSON Web Tokens (`eyJ...`)
`API_KEY`	Keys starting with `sk-`, `pk-`, `rk-`, `ak-`, `token-`
`PASSWORD_KV`	`password=`, `passwd:`, `pwd=` key-value pairs

from fennec_memory.memory import SensitiveDataMasker

masker = SensitiveDataMasker(
    custom_patterns=[
        ("INTERNAL_ID", r"INT-\d{6}"),  # add your own
    ],
    placeholder_fmt="[{label}]",
)

cleaned, report = masker.mask("Contact me at ceo@company.com, card: 4111111111111111")
# cleaned → "Contact me at [EMAIL], card: [CREDIT_CARD]"
# report  → {"EMAIL": 1, "CREDIT_CARD": 1}

# Check for sensitive data without masking
has_pii = masker.has_sensitive_data("Contact me at ceo@company.com")
# → True

# Recursively mask nested dicts or lists
cleaned_content, report = masker.mask_entry_content({"input": "my SSN is 123-45-6789"})

When enable_privacy_masking=True on AIMemoryManager, both user_input and assistant_output are masked before any storage or encryption step.
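Masking amounts to running each labelled regex over the text and counting substitutions. The sketch reproduces the documented example using its own simplified EMAIL and CREDIT_CARD patterns plus the custom `INTERNAL_ID` pattern. These regexes are illustrative assumptions, not the built-in patterns `SensitiveDataMasker` ships with.

```python
import re

# Simplified, illustrative patterns; order matters when patterns could overlap.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("CREDIT_CARD", re.compile(r"\b\d{13,16}\b")),
    ("INTERNAL_ID", re.compile(r"INT-\d{6}")),
]


def mask(text, placeholder_fmt="[{label}]"):
    """Replace every match with a labelled placeholder; report counts per label."""
    report = {}
    for label, pattern in PATTERNS:
        text, n = pattern.subn(placeholder_fmt.format(label=label), text)
        if n:
            report[label] = n
    return text, report


cleaned, report = mask("Contact me at ceo@company.com, card: 4111111111111111")
print(cleaned)  # Contact me at [EMAIL], card: [CREDIT_CARD]
print(report)   # {'EMAIL': 1, 'CREDIT_CARD': 1}
```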

7.2 Encryption

MemoryEncryptor uses Fernet symmetric encryption (AES-128-CBC + HMAC-SHA256). The user-supplied secret key is hashed with SHA-256 to derive a 32-byte Fernet key.

In practice, you do not interact with MemoryEncryptor directly — pass encryption_key to the AIMemoryManager constructor and encryption is applied automatically on every save_interaction call.

from fennec_memory.memory import MemoryEncryptor

enc = MemoryEncryptor(secret_key="my-production-secret")

token = enc.encrypt({"input": "sensitive data", "output": "sensitive answer"})
data  = enc.decrypt(token)

print(enc.is_real_encryption)   # True if cryptography package is installed
print(enc.key_fingerprint)      # SHA-256 fingerprint of the key (first 16 hex chars)

If the cryptography package is not installed, the encryptor falls back to base64 encoding (not secure) and logs a warning. In this fallback mode, is_real_encryption is False.

Warning: If the encryption_key is lost, all encrypted entries are permanently unrecoverable.
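Fernet expects a 32-byte key encoded as URL-safe base64. Deriving one from an arbitrary secret with SHA-256 therefore looks roughly like the sketch below. This is an assumption about how `MemoryEncryptor` derives its key, including what the fingerprint is computed over, not its actual code. A single SHA-256 pass is not a password-hardening KDF, so the secret itself should be high-entropy. The round trip at the end runs only if the `cryptography` package is installed.

```python
import base64
import hashlib


def derive_fernet_key(secret: str) -> bytes:
    """SHA-256 the secret (32 bytes), then URL-safe base64-encode it as Fernet requires."""
    digest = hashlib.sha256(secret.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest)


def key_fingerprint(secret: str) -> str:
    # Assumption: fingerprint = first 16 hex chars of SHA-256 over the secret.
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:16]


key = derive_fernet_key("my-production-secret")
print(len(key), len(base64.urlsafe_b64decode(key)))  # 44 32

try:
    from cryptography.fernet import Fernet  # optional dependency
except ImportError:
    Fernet = None

if Fernet is not None:
    f = Fernet(key)
    token = f.encrypt(b"sensitive data")
    assert f.decrypt(token) == b"sensitive data"
```

Because the derivation is deterministic, the same secret always yields the same key. This is also why losing the secret makes existing entries unrecoverable.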

7.3 Role-Based Access Control

AccessController implements per-tenant RBAC with built-in and custom roles.

from fennec_memory.memory import AccessController, Permission

ac = AccessController()

# Built-in role assignment
ac.assign_role("alice", "acme", "owner")
ac.assign_role("bob",   "acme", "writer")
ac.assign_role("guest", "acme", "reader")

# Custom role
ac.define_role("analyst", {Permission.READ})
ac.assign_role("carol", "acme", "analyst")

# Check and enforce
if ac.has_permission("bob", "acme", Permission.WRITE):
    # proceed
    pass

ac.require("guest", "acme", Permission.DELETE)  # raises PermissionError

# Inspect
perms = ac.list_permissions("alice", "acme")  # {Permission.READ, Permission.WRITE, Permission.DELETE}

# Revoke
ac.revoke_role("bob", "acme", "writer")
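RBAC of this kind reduces to a role-to-permission table plus a set of roles per (user, tenant) pair. The sketch below is a hypothetical minimal model, not `AccessController`'s source. The permission sets for `owner`, `writer`, and `reader` are assumptions chosen to match the example above: owner has READ, WRITE, and DELETE, and reader cannot DELETE.

```python
from collections import defaultdict
from enum import Enum, auto


class Permission(Enum):  # hypothetical mirror of the documented enum
    READ = auto()
    WRITE = auto()
    DELETE = auto()


class MiniAccessController:
    def __init__(self):
        # Assumed built-in role table.
        self.roles = {
            "owner": {Permission.READ, Permission.WRITE, Permission.DELETE},
            "writer": {Permission.READ, Permission.WRITE},
            "reader": {Permission.READ},
        }
        self.assignments = defaultdict(set)  # (user_id, tenant_id) -> {role names}

    def define_role(self, name, permissions):
        self.roles[name] = set(permissions)

    def assign_role(self, user_id, tenant_id, role):
        if role not in self.roles:
            raise ValueError(f"unknown role: {role}")
        self.assignments[(user_id, tenant_id)].add(role)

    def revoke_role(self, user_id, tenant_id, role):
        self.assignments[(user_id, tenant_id)].discard(role)

    def list_permissions(self, user_id, tenant_id):
        perms = set()
        for role in self.assignments.get((user_id, tenant_id), ()):
            perms |= self.roles[role]
        return perms

    def has_permission(self, user_id, tenant_id, permission):
        return permission in self.list_permissions(user_id, tenant_id)

    def require(self, user_id, tenant_id, permission):
        if not self.has_permission(user_id, tenant_id, permission):
            raise PermissionError(f"{user_id} lacks {permission.name} on {tenant_id}")


ac = MiniAccessController()
ac.assign_role("guest", "acme", "reader")
print(ac.has_permission("guest", "acme", Permission.READ))   # True
print(ac.has_permission("guest", "other", Permission.READ))  # False: roles are per tenant
```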

7.4 Tenant Isolation

Memory namespacing is enforced at every layer:

ShortTermMemory tags each entry with tenant_id in its metadata.
LongTermMemory includes tenant_id in all SQL WHERE clauses and indices.
SemanticMemory prefixes every vector index ID as "{tenant_id}::{entry_id}".
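UserProfileManager is tenant-agnostic: its methods take only a user_id, so profiles are separated per user, not per tenant. If the same user_id is used in several tenants, those tenants share one profile; use tenant-unique user IDs where that matters.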

There is no "global" query. You always specify tenant_id explicitly.

8. Storage Backends

`ShortTermMemory` (in-process `deque`)

When to use: Always. Present in every deployment as the primary write target.

from fennec_memory.memory import ShortTermMemory

stm = ShortTermMemory(window_size=20, tenant_id="acme")

Backed by collections.deque(maxlen=window_size). Oldest entries are automatically evicted when full (FIFO). Supports:

Method	Signature	Description
`add`	`(content, *, importance=0.5, tags=None, metadata=None) -> MemoryEntry`	Add one entry; returns the created `MemoryEntry`.
`get_all`	`() -> List[MemoryEntry]`	All current entries.
`get_recent`	`(n: int) -> List[MemoryEntry]`	Last N entries.
`apply_decay`	`(decay_rate: float = 0.1) -> None`	Apply one decay cycle to all entries in place.
`evict_below`	`(min_importance: float) -> List[MemoryEntry]`	Remove low-importance entries; returns evicted list.
`clear`	`() -> None`	Empty the window.

No persistence. Lost on process restart. Use LTM for durability.

`WorkingMemory` (in-process `dict`)

When to use: Always. Manages the active-context entries for the current reasoning turn.

from fennec_memory.memory import WorkingMemory

wm = WorkingMemory(capacity=10, token_budget=3000, tenant_id="acme")

Method	Signature	Description
`load`	`(entries: List[MemoryEntry]) -> None`	Replace current contents with new entries, sorted by importance, respecting `token_budget`.
`add`	`(content, *, importance=0.6, tags=None, metadata=None) -> MemoryEntry`	Add one entry; evicts the lowest-importance entry if `capacity` is exceeded.
`remove`	`(entry_id: str) -> Optional[MemoryEntry]`	Remove and return an entry by ID.
`get_all`	`() -> List[MemoryEntry]`	All entries, sorted descending by `effective_importance`.
`get_as_text`	`() -> str`	Serialise all entries to a newline-delimited text string, ready for prompt injection.
`clear`	`() -> None`	Empty the working memory.
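The `load()` behaviour described here is a greedy budgeted selection. Entries are sorted by importance, then admitted while they fit within `token_budget`; any entry that would overflow is skipped silently. A minimal sketch follows. The `(id, importance, tokens)` tuples and the check of `capacity` before `token_budget` are assumptions, and token estimation is left to the caller.

```python
from typing import List, Tuple

Item = Tuple[str, float, int]  # hypothetical: (entry_id, importance, estimated_tokens)


def load_within_budget(items: List[Item], capacity: int, token_budget: int) -> List[Item]:
    """Greedy: highest importance first; skip (don't stop at) items that would overflow."""
    selected, used = [], 0
    for item in sorted(items, key=lambda it: it[1], reverse=True):
        if len(selected) >= capacity:
            break
        _, _, tokens = item
        if used + tokens > token_budget:
            continue  # silently skipped; a smaller, less important item may still fit
        selected.append(item)
        used += tokens
    return selected


items = [("a", 0.9, 2500), ("b", 0.8, 1000), ("c", 0.5, 400), ("d", 0.2, 100)]
print([i[0] for i in load_within_budget(items, capacity=10, token_budget=3000)])
# ['a', 'c', 'd']
```

Skipping rather than stopping means a large, moderately important entry (`b` above) can be displaced by several smaller ones that together fill the remaining budget.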

`LongTermMemory` (SQLite)

When to use: Any deployment that needs memory to survive process restarts or scale across sessions. Default persistence layer.

One .db file per tenant, stored at {persistence_path}/{tenant_id}_ltm.db. Thread-safe via threading.Lock. Supports full CRUD, bulk inserts, importance filtering, timestamp ordering, and global decay.

Suitable for single-instance deployments, development, and moderate production workloads.

`SemanticMemory` (FAISS / NumPy)

When to use: Always. Provides the semantic search capability that distinguishes this system from simple history-based retrieval.

FAISS IndexFlatIP is the default when faiss-cpu (or faiss-gpu) is installed. Automatically falls back to a pure-NumPy brute-force O(n) cosine search when FAISS is absent.

FAISS is recommended for production with more than ~1,000 entries per tenant. NumPy fallback is suitable for testing and small datasets.
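The NumPy fallback is conceptually a brute-force scan. Every stored vector is normalised once, and a single matrix-vector product then gives the cosine similarity between the query and every entry, which is O(n) per query. The class below illustrates the idea and is not the module's `_NumpyIndex`. FAISS `IndexFlatIP` computes the same inner products exactly, and they equal cosine similarity when the vectors are L2-normalised.

```python
import numpy as np


class BruteForceCosineIndex:
    """Hypothetical exact cosine-similarity index over L2-normalised vectors."""

    def __init__(self, dim: int):
        self.ids: list = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    @staticmethod
    def _normalise(v: np.ndarray) -> np.ndarray:
        norms = np.linalg.norm(v, axis=-1, keepdims=True)
        return v / np.maximum(norms, 1e-12)  # avoid division by zero

    def add(self, entry_id: str, vector) -> None:
        v = self._normalise(np.asarray(vector, dtype=np.float32).reshape(1, -1))
        self.vectors = np.vstack([self.vectors, v])  # O(n) copy; fine for a sketch
        self.ids.append(entry_id)

    def search(self, query, k: int = 5, min_score: float = 0.0):
        if not self.ids:
            return []
        q = self._normalise(np.asarray(query, dtype=np.float32).reshape(-1))
        scores = self.vectors @ q           # cosine similarity to every entry
        order = np.argsort(-scores)[:k]     # highest scores first
        return [(self.ids[i], float(scores[i])) for i in order if scores[i] >= min_score]


index = BruteForceCosineIndex(dim=3)
index.add("x_axis", [1.0, 0.0, 0.0])
index.add("diagonal", [1.0, 1.0, 0.0])
index.add("z_axis", [0.0, 0.0, 1.0])
print(index.search([1.0, 0.1, 0.0], k=2))  # x_axis first, then diagonal
```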

Backend Comparison

Aspect	ShortTermMemory	LongTermMemory	SemanticMemory
Persistence	None (in-memory)	SQLite on disk	In-memory (rebuilt on restart)
Capacity	`window_size` entries	Unbounded	Unbounded
Query type	Recency (FIFO)	Importance / timestamp	Semantic similarity
Concurrency	Single-threaded	Thread-safe (`Lock`)	Single-threaded
Startup cost	None	Schema init	Model load on first embed
Recommended use	Recent context	Long-lived facts	Semantic retrieval

For production Redis-backed LTM, implement the LongTermMemory interface (same store, get, get_top_by_importance contract) and swap the instance in the _TenantBundle.
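One way to pin down that contract is a `typing.Protocol` listing the three methods named above. Any class with matching methods, whether Redis-backed or the in-memory stand-in below, then satisfies it structurally. The protocol name, the minimal `Entry` type, and the dict-based backend are all hypothetical. The real `MemoryEntry` has more fields, and a drop-in replacement would likely need LongTermMemory's other methods too (`count`, `delete`, `apply_global_decay`, and so on).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, runtime_checkable


@dataclass
class Entry:  # hypothetical minimal stand-in for MemoryEntry
    id: str
    content: str
    importance: float


@runtime_checkable
class LongTermStore(Protocol):
    def store(self, entry: Entry) -> None: ...
    def get(self, entry_id: str) -> Optional[Entry]: ...
    def get_top_by_importance(self, limit: int = 20,
                              min_importance: float = 0.0) -> List[Entry]: ...


class InMemoryLongTermStore:
    """Dict-backed implementation; a Redis version would swap the dict for Redis calls."""

    def __init__(self, tenant_id: str):
        self.tenant_id = tenant_id
        self._rows: Dict[str, Entry] = {}

    def store(self, entry: Entry) -> None:
        self._rows[entry.id] = entry  # upsert semantics

    def get(self, entry_id: str) -> Optional[Entry]:
        return self._rows.get(entry_id)

    def get_top_by_importance(self, limit: int = 20,
                              min_importance: float = 0.0) -> List[Entry]:
        rows = [e for e in self._rows.values() if e.importance >= min_importance]
        return sorted(rows, key=lambda e: e.importance, reverse=True)[:limit]


ltm = InMemoryLongTermStore("acme")
assert isinstance(ltm, LongTermStore)  # structural check: method names only
ltm.store(Entry("e1", "likes Python", 0.9))
ltm.store(Entry("e2", "prefers dark mode", 0.4))
print([e.id for e in ltm.get_top_by_importance(min_importance=0.5)])  # ['e1']
```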

9. Observability & Metrics

Per-Tenant Stats

s = mgr.stats("acme")

Returns:

{
  "tenant_id": "acme",
  "stm": {
    "type": "ShortTermMemory",
    "window_size": 20,
    "current_size": 14,
    "is_full": false,
    "tenant_id": "acme"
  },
  "working": {
    "type": "WorkingMemory",
    "capacity": 10,
    "token_budget": 4000,
    "entries": 6,
    "estimated_tokens_used": 850,
    "tenant_id": "acme"
  },
  "ltm": {
    "type": "LongTermMemory",
    "db_path": "./memory_storage/acme_ltm.db",
    "tenant_id": "acme",
    "total_entries": 342
  },
  "semantic": {
    "type": "SemanticMemory",
    "backend": "faiss",
    "model": "all-MiniLM-L6-v2",
    "entries": 356,
    "tenant_id": "acme"
  },
  "profile_manager": {
    "user_count": 3,
    "total_interactions": 1204,
    "avg_interactions_per_user": 401.3,
    "top_global_topics": [["python", 340], ["AI", 210]]
  },
  "encryption_active": true,
  "privacy_masking_active": true
}

Maintenance Reports

report = mgr.run_maintenance("acme")
# {
#   "tenant_id": "acme",
#   "timestamp": 1714000000.0,
#   "stm_evicted": 2,
#   "ltm_decay_updated": 120,
#   "ltm_deleted": 5,
#   "working_forgotten": 1,
#   "ltm_compression": {"merges": 3, "noise_removed": 7}
# }

Forgetting Diagnostics

Use ForgettingMechanism.report for a non-mutating at-risk preview before committing a maintenance cycle:

bundle = mgr._bundle("acme")  # private API: the tenant's memory layers
report = mgr._forgetter.report(bundle.stm.get_all())
print(f"At-risk: {report['at_risk_count']} / {report['total']}")
print(f"At-risk IDs: {report['at_risk_ids']}")

Selector Scoring Breakdown

Use MemorySelector.explain to understand why specific memories were or were not selected:

bundle = mgr._bundle("acme")  # private API: the tenant's memory layers
explanation = mgr._selector.explain(
    query="Python caching strategies",
    candidates=bundle.stm.get_all() + bundle.ltm.get_top_by_importance(20),
)
for row in explanation:
    print(f"{row['id'][:10]}  composite={row['composite']:.3f}  "
          f"rel={row['relevance']:.3f}  rec={row['recency']:.3f}  "
          f"imp={row['importance']:.3f}  |  {row['content'][:60]}")

User Profile Analytics

mgr._profile_manager.aggregate_stats()
# {
#   "user_count": 15,
#   "total_interactions": 5230,
#   "avg_interactions_per_user": 348.7,
#   "top_global_topics": [["python", 1200], ...]
# }

10. Edge Cases & Failure Handling

Embedding Service Failure

If sentence-transformers is not installed or the model files are unavailable, _EmbeddingModel falls back to deterministic stub embeddings: hash-seeded random 384-dimensional vectors. These are consistent across calls for the same text (same hash → same seed → same vector), so similarity search still functions — but results will not be semantically meaningful.

Log warning: "sentence-transformers unavailable — using deterministic stub embeddings."

Mitigation: Pre-install the model locally and set TRANSFORMERS_OFFLINE=1 to prevent accidental remote downloads in production.
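A deterministic stub embedding can be sketched by seeding a random generator from a hash of the text. This version uses `hashlib` rather than Python's built-in `hash()`, because `hash()` of a `str` is randomised per process unless `PYTHONHASHSEED` is fixed. Whether `_EmbeddingModel` does the same is not documented here, so treat the function name and details as assumptions.

```python
import hashlib
import math
import random
from typing import List


def stub_embedding(text: str, dim: int = 384) -> List[float]:
    """Deterministic pseudo-embedding: same text -> same unit vector, in any process."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit length, so dot product == cosine


a1 = stub_embedding("Python async patterns")
a2 = stub_embedding("Python async patterns")
b = stub_embedding("Python asyncio patterns")
print(a1 == a2)  # True: deterministic
```

Near-identical texts such as `a1` and `b` get unrelated vectors, which is exactly why search still "works" on stubs but its rankings carry no semantic meaning.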

FAISS Unavailable

If faiss is not installed, SemanticMemory silently switches to _NumpyIndex (brute-force O(n) cosine search). This is transparent to callers but significantly slower at scale.

Log warning: "FAISS not installed — using pure-numpy fallback index."

Mitigation: Install faiss-cpu in production. The NumPy fallback is acceptable for fewer than ~1,000 entries.

SQLite / LTM Failure

LongTermMemory wraps all operations in a context manager that calls conn.rollback() on exception and re-raises. If the SQLite file is corrupted or the path is inaccessible:

store() and get() will raise an exception.
The AIMemoryManager thread pool submits LTM writes asynchronously; failures are logged but do not crash the main thread.

Mitigation: Ensure persistence_path is writable before instantiation. Back up the .db files regularly.
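The rollback-and-re-raise behaviour described above corresponds to a small context manager wrapped around each operation. This sketches the pattern with the standard `sqlite3` module and is not LongTermMemory's code. The `transaction` helper and the table are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn: sqlite3.Connection):
    """Commit on success; on any exception roll back, then re-raise it to the caller."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, content TEXT)")

with transaction(conn) as c:
    c.execute("INSERT INTO memories VALUES ('e1', 'kept')")

try:
    with transaction(conn) as c:
        c.execute("INSERT INTO memories VALUES ('e2', 'rolled back')")
        c.execute("INSERT INTO memories VALUES ('e1', 'duplicate id')")  # IntegrityError
except sqlite3.IntegrityError as exc:
    print("write failed and was rolled back:", exc)

print(conn.execute("SELECT id FROM memories").fetchall())  # [('e1',)]
```

The half-finished second transaction leaves no trace (`e2` is gone), and the caller still sees the original exception.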

Quota / Capacity Exceeded

ShortTermMemory uses a deque(maxlen=window_size) — the oldest entry is automatically evicted when full. No exception is raised.

WorkingMemory evicts the lowest-effective_importance entry when capacity is exceeded. Entries that would exceed token_budget are silently skipped during load().

LongTermMemory has no hard cap. Use run_maintenance() and delete_below_importance() to manage growth.

Corrupted or Missing Entries

LongTermMemory._row_to_entry uses json.loads on stored content; a corrupted JSON string will raise json.JSONDecodeError. Individual corrupted rows will surface as exceptions from get() or get_top_by_importance(). The calling code in build_context does not catch these — add a try/except wrapper around build_context if corruption is a concern.

Missing Tenant

AIMemoryManager._bundle(tenant_id) creates a new _TenantBundle on first access for any unknown tenant_id. There is no explicit register_tenant step required. The bundle contains fresh, empty memory layers.

Async/Sync Mismatch

asave_interaction, abuild_context, and arun_maintenance use asyncio.get_event_loop().run_in_executor internally. They must be awaited from within a running asyncio event loop. Do not call them from synchronous code without asyncio.run(...):

import asyncio

# Correct in async context
entry = await mgr.asave_interaction(...)

# Correct in sync context (asyncio.run fails if an event loop is already running)
entry = asyncio.run(mgr.asave_interaction(...))

# Wrong — returns a coroutine object, not the result
entry = mgr.asave_interaction(...)  # ← missing await
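The async methods follow the common pattern of running a blocking call in a thread pool and awaiting the result. Below is a self-contained sketch of that pattern, with a hypothetical blocking `save_interaction` standing in for the real one. It uses `asyncio.get_running_loop()`, the recommended call inside a coroutine, rather than `get_event_loop()`.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial

_executor = ThreadPoolExecutor(max_workers=4)


def save_interaction(user_id: str, text: str, importance: float = 0.5) -> dict:
    """Hypothetical blocking call (e.g. SQLite write + embedding)."""
    time.sleep(0.01)
    return {"user_id": user_id, "text": text, "importance": importance}


async def asave_interaction(user_id: str, text: str, importance: float = 0.5) -> dict:
    loop = asyncio.get_running_loop()
    # run_in_executor passes positional args only, so bind keywords with partial.
    call = partial(save_interaction, user_id, text, importance=importance)
    return await loop.run_in_executor(_executor, call)


async def main():
    # Both blocking saves run in the pool without blocking the event loop.
    return await asyncio.gather(
        asave_interaction("alice", "hi"),
        asave_interaction("bob", "hello", importance=0.9),
    )


results = asyncio.run(main())  # sync entry point; no loop may already be running
print([r["user_id"] for r in results])  # ['alice', 'bob']
```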

`ForgettingMechanism.apply` on Empty List

When entries is empty, apply() returns the string "entries is empty" instead of a tuple. This is a known implementation inconsistency. Always guard:

if entries:
    alive, forgotten = forgetter.apply(entries)
else:
    alive, forgotten = [], []

LLM Unavailable for Compression/Summarisation

If llm=None or the LLM call raises an exception:

MemoryCompressor.summarise() falls back to simple string concatenation (truncated to 500 chars).
MemoryCompressor._merge_pair() falls back to concatenation with a | separator.
ConversationSummaryMemory._summarize() logs the error and retains only the last 5 messages.

The system never hard-fails due to an absent LLM.

Encryption: `cryptography` Package Missing

MemoryEncryptor degrades to base64 encoding with a logged warning. is_real_encryption returns False. Never use the base64 fallback for sensitive data in production — it provides no security.

11. Advanced Usage

Multi-Tenant Setup

import os

from fennec_memory.memory import AIMemoryManager, MemoryConfig

config = MemoryConfig(
    persistence_path="/var/data/memory",
    importance_threshold=0.6,
    max_tokens=4000,
    enable_persistence=True,
)

mgr = AIMemoryManager(
    config=config,
    llm=my_llm,
    enable_privacy_masking=True,
    encryption_key=os.environ["MEMORY_ENCRYPTION_KEY"],
)

# Tenant A — full team roles
mgr.grant("admin_user", "tenant_a", role="admin")
mgr.grant("agent_1",    "tenant_a", role="writer")
mgr.grant("readonly",   "tenant_a", role="reader")

# Tenant B — separate namespace, no data overlap
mgr.grant("agent_2", "tenant_b", role="owner")

# Tenant A and B memories are fully isolated
mgr.save_interaction("agent_1", "tenant_a", "secret A", "answer A", importance=0.9)
mgr.save_interaction("agent_2", "tenant_b", "secret B", "answer B", importance=0.9)

ctx_a = mgr.build_context("agent_1", "tenant_a", "secret B")
# ctx_a.full_text will NOT contain "secret B" — tenant_b data is invisible

RL Feedback Loop (Importance Signals)

The system does not include an explicit RL module, but the importance parameter functions as the feedback signal. Close the loop by updating importance based on downstream signals such as user upvotes, follow-up questions, or task success:

# High-quality responses: store with elevated importance
mgr.save_interaction(..., importance=0.9)

# Routine exchanges: store with lower importance
mgr.save_interaction(..., importance=0.4)

# Profile-based boost: memory_retention_boost (clamped to [0, 0.5]) is added to
# importance on every save_interaction for this user
mgr._profile_manager.get_or_create("alice").update_retention_boost(0.15)

Combining with a RAG Document Retriever

from fennec_memory.memory import Document

def answer_with_memory_and_rag(mgr, retriever, user_id, tenant_id, query):
    # 1. Retrieve documents from your vector / BM25 store
    raw_docs = retriever.search(query, k=5)
    documents = [
        Document(page_content=d.text, metadata={"source": d.url, "score": d.score})
        for d in raw_docs
    ]

    # 2. Build memory-enriched context
    ctx = mgr.build_context(
        user_id=user_id,
        tenant_id=tenant_id,
        query=query,
        documents=documents,
        top_k_semantic=5,
    )

    # 3. Generate response
    response = my_llm.generate(ctx.full_text + f"\n\nUser: {query}\nAssistant:")

    # 4. Persist the interaction
    mgr.save_interaction(user_id, tenant_id, query, response, importance=0.7)

    return response

Custom User Preferences

mgr._profile_manager.set_preference("alice", "verbosity", "concise")
mgr._profile_manager.set_preference("alice", "preferred_language", "fr")
mgr._profile_manager.set_preference("alice", "domain", "finance")

profile = mgr._profile_manager.get_or_create("alice")
print(profile.to_context_string())
# User language: fr
# Response style: concise / neutral
# Interests: finance, python, ...
# domain: finance

Production Deployment Notes

Memory isolation: Run one AIMemoryManager instance per application process. The internal ThreadPoolExecutor is not shared across processes. For multi-process deployments, consider a shared SQLite file (acceptable for moderate load) or implement a Redis-backed LongTermMemory.

Embedding model cold start: The first save_interaction call triggers model loading (1–3 seconds). Pre-warm by calling mgr._bundle("default").semantic._embedder._load() at startup.

Maintenance scheduling: Run run_maintenance(tenant_id) on a background schedule (e.g., APScheduler or a cron job). Recommended interval: once per hour per active tenant.

Disk growth: LongTermMemory has no automatic size cap. Monitor ltm.count() and configure maintenance to delete entries below a target min_importance. For long-running production systems, set enable_consolidation=True and tune decay_rate to match your data retention requirements.

Thread safety: LongTermMemory is thread-safe via threading.Lock. ShortTermMemory and WorkingMemory are not thread-safe — use one AIMemoryManager per request thread or protect shared instances with an external lock.

Encryption key rotation: There is no built-in key rotation. To rotate keys: decrypt all existing entries with the old key, re-encrypt with the new key, and update encryption_key in the constructor.

Async FastAPI Integration Example

from fastapi import FastAPI
from contextlib import asynccontextmanager
from fennec_memory.memory import AIMemoryManager, MemoryConfig

memory_manager: AIMemoryManager = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    global memory_manager
    config = MemoryConfig(persistence_path="/var/data/memory", max_tokens=4000)
    memory_manager = AIMemoryManager(config=config, llm=my_llm)
    yield
    # Cleanup if needed

app = FastAPI(lifespan=lifespan)

@app.post("/chat")
async def chat(user_id: str, tenant_id: str, message: str):
    ctx = await memory_manager.abuild_context(user_id, tenant_id, message)
    response = await my_async_llm.generate(ctx.full_text + "\n\n" + message)
    await memory_manager.asave_interaction(
        user_id, tenant_id, message, response, importance=0.7
    )
    return {"response": response}

Example with RAG

from fennec_community.llm import GeminiInterface
from fennec_community.document_loaders import TextLoader 
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem 
from fennec_memory.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryMemory,
    ConversationEntityMemory,
)

loader_1 = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm_api = "YOUR_GEMINI_API_KEY"  # placeholder: supply your Gemini API key
llm = GeminiInterface(api_key=llm_api)
context_manager = ContextManager()
rag_system = RAGSystem(llm=llm, vector_db=vector_db,chunker=chunker, context_manager=context_manager)
rag_system.add_documents(loader_1)

rag = rag_system  
def conversational_rag_query(
    query: str,
    memory,
    memory_key: str = "history",
) -> str:
    mem_vars  = memory.load_memory_variables({"input": query})
    history   = mem_vars.get(memory_key, "")
    retrieved = rag.retrieve(query, top_k=2)
    context   = rag.context_manager.build(query, retrieved)
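    if history:
        # Arabic prompt: "Previous conversation context: ... Retrieved information: ...
        # Current question: ... Answer:"
        full_prompt = (
            f"سياق المحادثة السابقة:\n{str(history)[:400]}\n\n"
            f"معلومات مسترجعة:\n{context}\n\n"
            f"السؤال الحالي: {query}\nالإجابة:"
        )
    else:
        # Arabic prompt: "Retrieved information: ... Question: ... Answer:"
        full_prompt = f"معلومات مسترجعة:\n{context}\n\nالسؤال: {query}\nالإجابة:"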
    answer = rag.llm.generate(full_prompt)
    memory.save_context({"input": query}, {"output": answer})
    return answer

print("\n  [5a] ConversationBufferMemory ")
buffer_mem = ConversationBufferMemory(
    return_messages=False, input_key="input", output_key="output", memory_key="history"
)
conversation = [
    "ما هي طرق الدفع المتاحه ",  # "What payment methods are available?"
    "ما هي اعدادهم",              # "How many are there?"
]
print("\n  💬  Conversation with RAG (Buffer Memory):")
for turn, q in enumerate(conversation, 1):
    answer = conversational_rag_query(q, buffer_mem)
    print(f"  [{turn}] 👤 {q}")
    print(f"       🤖 {answer[:80]}...")
    print()
print(f"  📝 Buffer memory: {len(buffer_mem.chat_memory)} saved messages")
# ── 5b: Window Memory ──────────────── #
print("\n  [5b] ConversationBufferWindowMemory (window k=2)")
window_mem = ConversationBufferWindowMemory(k=2)
long_conversation = [
    "ما هو  طرق التواصل مع فريق الدعم ",  # "What are the ways to contact the support team?"
    "اعطني مثال عليها",                    # "Give me an example of one."
]
for q in long_conversation:
    conversational_rag_query(q, window_mem)
print(f"  📝 Window memory (k=2): {len(window_mem.chat_memory)} messages saved (last 2)")
if window_mem.chat_memory:
    print(f"  Last question: {window_mem.chat_memory[-1].get('input', '')}")
