
Fennec Memory Module

Production-Grade Intelligent Memory System for LLM Applications


Table of Contents

  1. Overview
  2. System Architecture
  3. Core Concepts
  4. Quick Start Guide
  5. Public API Reference
  6. Configuration System
  7. Security Model
  8. Storage Backends
  9. Observability & Metrics
  10. Edge Cases & Failure Handling
  11. Advanced Usage

1. Overview

The Fennec Memory Module is a multi-layer intelligent memory system designed to give LLM applications and AI agents persistent, semantically searchable, and privacy-aware memory across conversations and sessions.

It mirrors human cognitive memory architecture — short-term recall, active working memory, and long-term persistence — and ties them together with vector similarity search, biologically-inspired forgetting, LLM-assisted compression, and per-tenant isolation.

Why It Exists

Standard LLM applications are stateless. Each request starts from zero, so developers must manage conversation history by hand, run into context limits, and lose important context between sessions. The Fennec Memory Module addresses this by:

  • Automatically promoting important interactions to long-term persistent storage
  • Retrieving the most relevant memories for any query using semantic search
  • Keeping context within token budgets via intelligent selection and compression
  • Isolating memory per tenant in multi-user systems
  • Scrubbing PII before storage, with optional encryption of the stored content

Real-World Use Cases

Use Case | How the Module Helps
Multi-session chatbots | Remembers past preferences, decisions, and context across sessions
AI agents & planners | Working memory holds the active reasoning context for the current turn
RAG pipelines | Combines retrieved documents with relevant personal memories in a single context
Multi-tenant SaaS | Complete tenant isolation; each organization's memory is fully private
Personalized assistants | User profiles adapt over time to improve retrieval and response quality
Long conversation summarization | LLM-based compressor condenses hundreds of turns into compact summaries

2. System Architecture

Pipeline Overview

┌──────────────────────────────────────────────────────────────────┐
│                          WRITE PATH                              │
│                                                                  │
│  user_input + assistant_output                                   │
│        │                                                         │
│        ▼                                                         │
│  [SensitiveDataMasker]  ← optional PII redaction                │
│        │                                                         │
│        ▼                                                         │
│  [MemoryEncryptor]      ← optional Fernet encryption            │
│        │                                                         │
│        ├──────────────────────────────────────────────────────┐  │
│        ▼                                                      │  │
│  [ShortTermMemory]      ← in-memory sliding window            │  │
│        │                                                      │  │
│        ├─── background thread ──► [SemanticMemory]            │  │
│        │                          FAISS / NumPy vector index  │  │
│        │                                                      │  │
│        └─── if importance ≥ threshold ──► [LongTermMemory]    │  │
│                                           SQLite persistence  │  │
│                                                               │  │
│  [UserProfileManager]   ← updated with topics & importance   ◄─┘ │
└──────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│                          READ PATH                               │
│                                                                  │
│  query string                                                    │
│        │                                                         │
│        ├──► [SemanticMemory.search]     cosine similarity top-K │
│        ├──► [LongTermMemory.get_top_*]  importance-ranked       │
│        └──► [ShortTermMemory.get_all]   recent window           │
│                                                                  │
│  ───── merge + deduplicate ─────                                 │
│        │                                                         │
│        ▼                                                         │
│  [MemorySelector]   ← weighted relevance + recency + importance │
│        │                                                         │
│        ▼                                                         │
│  [WorkingMemory]    ← token-budgeted active context             │
│        │                                                         │
│        ▼                                                         │
│  [ContextBuilder]   ← profile + memories + docs + history       │
│        │                                                         │
│        ▼                                                         │
│  BuiltContext.full_text  → inject into LLM prompt               │
└──────────────────────────────────────────────────────────────────┘

Component Responsibilities

Component | Responsibility
ShortTermMemory | Sliding deque of the most recent N interactions. Fast, in-memory, FIFO eviction.
WorkingMemory | Active-context store for the current reasoning turn. Rebuilt each call by MemorySelector. Enforces a token budget.
LongTermMemory | SQLite-backed persistent store. Promotes high-importance entries across sessions. Supports global decay.
SemanticMemory | FAISS (or NumPy fallback) vector index. Embeds content with sentence-transformers and retrieves by cosine similarity.
MemorySelector | Scores every candidate memory on three axes (relevance, recency, importance) and returns the top-K within a token budget.
MemoryCompressor | Noise removal, near-duplicate merging (cosine threshold), and LLM-based summarisation. Runs during maintenance.
ForgettingMechanism | Applies Ebbinghaus-inspired exponential decay modulated by access frequency. Marks low-importance entries for deletion.
ContextBuilder | Assembles the final context string from profile, memories, RAG documents, and conversation history. Enforces per-section token budgets.
UserProfileManager | Tracks per-user preferences, topic frequency, and a memory_retention_boost factor. Persists profiles as JSON.
SensitiveDataMasker | Regex-based PII redaction (email, credit card, SSN, phone, IBAN, IP, JWT, API keys, passwords).
MemoryEncryptor | Fernet (AES-128-CBC + HMAC-SHA256) symmetric encryption. Falls back to base64 encoding, which provides no confidentiality, if the cryptography package is absent.
AccessController | RBAC system with built-in reader, writer, owner, admin roles per (user_id, tenant_id) pair.
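To make the regex-based redaction performed by SensitiveDataMasker more concrete, here is a minimal sketch covering just two of the listed categories (email and SSN). The patterns, the placeholder format such as [EMAIL], and the function name mask are assumptions for illustration; the module's own patterns and replacement tokens are not shown in this document and may differ.

```python
import re

# Hypothetical patterns; the real masker covers more categories.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(mask("Mail bob@example.com, SSN 123-45-6789."))
```

Redaction of this kind is lossy by design: once the placeholder is stored, the original value cannot be recovered from memory.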

Design Philosophy

  • Layered degradation. Every component falls back gracefully: FAISS → NumPy, real Fernet → base64 encoding (which is not encryption), sentence-transformers → deterministic stub embeddings. The system never hard-fails at import time.
  • Async-friendly. All core methods have async wrappers (asave_interaction, abuild_context, arun_maintenance) backed by a thread pool, so they integrate cleanly into asyncio applications.
  • Tenant-first. Memory is namespaced at every layer. Cross-tenant data leakage is architecturally prevented.
  • Importance-driven. A single importance float (0–1) controls promotion to LTM, ordering in context, and eviction priority. It decays over time and is boosted by access.

3. Core Concepts

3.1 Memory Types (MemoryType)

from memory.core import MemoryType

Value | String Aliases | Description
MemoryType.SHORT_TERM | "short_term", "short", "st" | Temporary; limited to window_size recent entries.
MemoryType.LONG_TERM | "long_term", "long", "lt" | Persistent across sessions via SQLite.
MemoryType.WORKING | "working", "work", "wm" | Active-context; rebuilt each turn.
MemoryType.EPISODIC | "episodic", "episode", "ep" | Event-bound memories with temporal context.
MemoryType.SEMANTIC | "semantic", "sem" | General facts without temporal binding.
MemoryType.PROCEDURAL | "procedural", "proc" | Skills and repeatable procedures.

3.2 MemoryEntry

Every piece of stored information is a MemoryEntry dataclass.

import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class MemoryEntry:
    content: Any                    # The stored data (str, dict, or any serialisable type)
    timestamp: float                # Unix creation time
    memory_type: MemoryType
    importance: float = 0.5         # 0.0–1.0; controls promotion, ordering, eviction
    access_count: int = 0           # Incremented on every read
    tags: List[str] = field(default_factory=list)           # Fresh list per instance
    metadata: Dict[str, Any] = field(default_factory=dict)  # Fresh dict per instance
    embedding: Optional[List[float]] = None  # Set by SemanticMemory
    last_access: float = field(default_factory=time.time)   # Evaluated at instantiation
    decay_factor: float = 1.0       # Multiplied into importance on decay cycles
    original_importance: Optional[float] = None  # Preserved from creation for audit/reset

Key computed properties:

Property | Type | Description
entry.id | str | MD5-based 16-char hex ID derived from content + timestamp + type.
entry.effective_importance | float | importance × decay_factor. Used for all ranking decisions.
entry.priority | MemoryPriority | MemoryPriority enum derived from effective_importance (see table below).
entry.age_seconds | float | Floating-point age since creation in seconds.
entry.age_days | float | Floating-point age since creation in days.
entry.time_since_access_seconds | float | Elapsed seconds since the last entry.access() call.
entry.has_embedding | bool | True if a vector embedding has been computed.
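As a rough illustration of how a stable 16-character hex ID could be derived from content, timestamp, and type, the sketch below hashes the three values with MD5 and keeps the first 16 hex digits. The helper name make_entry_id, the JSON serialisation of content, and the "|" separator are assumptions; the module's actual derivation may combine the fields differently.

```python
import hashlib
import json
from typing import Any


def make_entry_id(content: Any, timestamp: float, memory_type: str) -> str:
    """Hypothetical helper: 16-char hex ID from content + timestamp + type."""
    # Serialise deterministically so equal dicts hash identically (assumption).
    payload = json.dumps(content, sort_keys=True, default=str)
    raw = f"{payload}|{timestamp}|{memory_type}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()[:16]


entry_id = make_entry_id("What is RAG?", 1700000000.0, "long_term")
print(entry_id, len(entry_id))
```

Because the timestamp is part of the hash input, storing the same content twice at different times yields two distinct IDs.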

MemoryPriority Thresholds

effective_importance | MemoryPriority
≥ 0.9 | CRITICAL
≥ 0.7 | HIGH
≥ 0.5 | MEDIUM
≥ 0.3 | LOW
< 0.3 | MINIMAL
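The threshold table can be read as a simple cascade of comparisons. The sketch below restates it in code; the enum here is a local stand-in defined for the example (its string values are assumptions), and priority_for is a hypothetical helper rather than the module's API.

```python
from enum import Enum


class MemoryPriority(Enum):
    # Local stand-in for the module's enum; values are illustrative.
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    MINIMAL = "minimal"


def priority_for(effective_importance: float) -> MemoryPriority:
    """Map importance x decay_factor onto the documented thresholds."""
    if effective_importance >= 0.9:
        return MemoryPriority.CRITICAL
    if effective_importance >= 0.7:
        return MemoryPriority.HIGH
    if effective_importance >= 0.5:
        return MemoryPriority.MEDIUM
    if effective_importance >= 0.3:
        return MemoryPriority.LOW
    return MemoryPriority.MINIMAL


# An entry stored at importance 0.8 whose decay_factor has dropped to 0.8
# has an effective importance of about 0.64, which falls in the MEDIUM band.
print(priority_for(0.8 * 0.8))
```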

Key instance methods:

entry.access()                          # Increments access_count, updates last_access,
                                        # and applies a small importance boost ∝ (1 − importance).
entry.apply_decay(decay_rate: float = 0.1)   # Manually apply one decay cycle in place.
entry.add_tag(tag: str)                 # Append a tag if not already present.
entry.has_tag(tag: str) -> bool         # Check for tag membership.
entry.to_dict(include_embedding: bool = False) -> dict  # Serialise to plain dict.
entry.to_json() -> str                  # Serialise to JSON string.

3.3 Importance & Decay

Each MemoryEntry starts with a user-supplied importance value (0–1). Over time:

  1. ForgettingMechanism computes an effective decay rate: base_decay_rate / (1 + recency_boost_per_access × access_count). More accesses → slower decay.
  2. It applies exponential decay with that effective rate as λ: I(t) = I₀ × exp(−λ × hours_since_last_access).
  3. Entries whose effective_importance falls below deletion_threshold are marked for removal.

This mirrors the Ebbinghaus forgetting curve modulated by spaced repetition.
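The decay steps above can be sketched numerically as follows. The two functions transcribe the formulas from the list; the concrete values used in the example (a recency_boost_per_access of 0.5 and a 24-hour gap) are assumptions chosen only to show the effect of access frequency.

```python
import math


def effective_decay_rate(base_decay_rate: float,
                         recency_boost_per_access: float,
                         access_count: int) -> float:
    """Step 1: more accesses shrink the decay rate."""
    return base_decay_rate / (1.0 + recency_boost_per_access * access_count)


def decayed_importance(initial: float, rate: float,
                       hours_since_last_access: float) -> float:
    """Step 2: I(t) = I0 * exp(-rate * hours)."""
    return initial * math.exp(-rate * hours_since_last_access)


# Same starting importance, same idle time, different access history.
never_read = decayed_importance(0.8, effective_decay_rate(0.1, 0.5, 0), 24.0)
read_often = decayed_importance(0.8, effective_decay_rate(0.1, 0.5, 4), 24.0)
print(f"never read: {never_read:.3f}  read 4 times: {read_often:.3f}")
```

With these illustrative numbers the frequently read entry retains far more of its importance, which is the spaced-repetition effect the text describes.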

3.4 Semantic Search

SemanticMemory uses sentence-transformers (default model: all-MiniLM-L6-v2) to convert text to L2-normalised float32 vectors. All vectors are stored in a FAISS IndexFlatIP (inner-product index, equivalent to cosine similarity on normalised vectors). When FAISS is unavailable, a pure-NumPy brute-force fallback is used transparently.

Retrieval: SemanticMemory.search(query, k) embeds the query and returns the top-K entries with their cosine similarity scores.
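To show why an inner-product index gives cosine similarity once vectors are L2-normalised, here is a small brute-force top-K search. It is written in plain Python as a stand-in for the NumPy fallback path, uses toy three-dimensional vectors instead of real embeddings, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_cosine(query: Sequence[float],
                 index: Sequence[Sequence[float]],
                 k: int) -> List[Tuple[int, float]]:
    """Return (row, score) pairs for the k largest inner products."""
    q = l2_normalise(query)
    scores = [(i, sum(a * b for a, b in zip(q, row)))
              for i, row in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Rows are normalised once at insert time, as the text describes.
index = [l2_normalise(v) for v in ([1.0, 0.0, 0.0],
                                   [0.7, 0.7, 0.0],
                                   [0.0, 0.0, 1.0])]
hits = top_k_cosine([1.0, 0.1, 0.0], index, k=2)
for row, score in hits:
    print(f"row={row} score={score:.3f}")
```

A brute-force scan like this is linear in the number of stored vectors, which is why an index such as FAISS is preferred when it is available.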

3.5 Multi-Tenancy

Every memory layer accepts a tenant_id. Entries are namespaced so a query for tenant_id="acme" can never surface entries belonging to tenant_id="startup". The AIMemoryManager maintains a _TenantBundle per tenant containing independent STM, WorkingMemory, LTM, and SemanticMemory instances.

3.6 Memory Selection (Composite Scoring)

MemorySelector scores each candidate on three axes and combines them linearly:

composite = w_relevance × cosine_sim
          + w_recency   × exp(−λ × hours_old)
          + w_importance × entry.effective_importance

Default weights: relevance=0.50, recency=0.20, importance=0.30. The three weights must sum to 1.0. Only entries scoring above min_composite_score (default 0.10) are selected, and the result is capped by top_k and a token budget.
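The composite formula can be worked through with a short sketch. It assumes that λ is derived from a half-life as ln(2) / half_life_hours, which is a common convention but is not confirmed by this document, and the function names are hypothetical.

```python
import math


def recency_score(hours_old: float, half_life_hours: float = 24.0) -> float:
    # Assumption: lambda = ln(2) / half-life, so the score halves every half-life.
    lam = math.log(2) / half_life_hours
    return math.exp(-lam * hours_old)


def composite_score(cosine_sim: float, hours_old: float,
                    effective_importance: float,
                    w_relevance: float = 0.50, w_recency: float = 0.20,
                    w_importance: float = 0.30,
                    half_life_hours: float = 24.0) -> float:
    """Linear combination of the three axes using the documented default weights."""
    if abs(w_relevance + w_recency + w_importance - 1.0) > 1e-3:
        raise ValueError("weights must sum to 1.0")
    return (w_relevance * cosine_sim
            + w_recency * recency_score(hours_old, half_life_hours)
            + w_importance * effective_importance)


# One day old, fairly relevant, moderately important:
# 0.50 * 0.8 + 0.20 * 0.5 + 0.30 * 0.6, which is approximately 0.68
score = composite_score(cosine_sim=0.8, hours_old=24.0, effective_importance=0.6)
print(round(score, 3))
```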

3.7 Context Assembly

ContextBuilder assembles a structured context string from four prioritised sections:

Section | Default Token Fraction | Content
user_profile | 5% | Compact user preferences and interests
memories | 30% | Selected MemoryEntry objects formatted as text
documents | 45% | RAG-retrieved Document chunks
history | 20% | Recent conversation turns from STM

Each section is individually token-budgeted and truncated (never dropped) if it exceeds its allocation.
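The per-section budgeting described above can be sketched as follows. The percentages come from the table; the token estimate of roughly four characters per token and the helper names are assumptions, since the module's tokenizer and truncation strategy are not specified here.

```python
from typing import Dict, Tuple

# Default fractions from the table, as integer percentages.
SECTION_PERCENT = {"user_profile": 5, "memories": 30, "documents": 45, "history": 20}
CHARS_PER_TOKEN = 4  # crude assumption, not the module's tokenizer


def estimate_tokens(text: str) -> int:
    """Round up so a partial token still counts against the budget."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fit_sections(sections: Dict[str, str],
                 max_tokens: int) -> Tuple[Dict[str, str], bool]:
    """Truncate each section to its share; sections are never dropped."""
    fitted, truncated = {}, False
    for name, percent in SECTION_PERCENT.items():
        budget = max_tokens * percent // 100
        text = sections.get(name, "")
        if estimate_tokens(text) > budget:
            text = text[: budget * CHARS_PER_TOKEN]
            truncated = True
        fitted[name] = text
    return fitted, truncated


fitted, truncated = fit_sections(
    {"memories": "m" * 10_000, "history": "short"}, max_tokens=2000
)
print({name: estimate_tokens(text) for name, text in fitted.items()}, truncated)
```

The truncated flag in this sketch plays the same role as the truncated field on BuiltContext: it tells the caller that at least one section did not fit.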


4. Quick Start Guide

Installation and Dependencies

pip install fennec-memory
pip install sentence-transformers  # embeddings
pip install faiss-cpu              # fast vector search (optional, falls back to NumPy)
pip install cryptography           # Fernet encryption (optional)

Minimal Working Example

from fennec_memory.memory import AIMemoryManager, MemoryConfig

# 1. Configure
config = MemoryConfig(
    persistence_path="./my_memory",
    importance_threshold=0.6,
    max_tokens=4000,
)

# 2. Initialise
mgr = AIMemoryManager(config=config)

# 3. Save an interaction
mgr.save_interaction(
    user_id="alice",
    tenant_id="acme",
    user_input="How do Python decorators work?",
    assistant_output="Decorators are higher-order functions that wrap another function.",
    importance=0.8,
    topics=["python"],
)

# 4. Build context for the next query
ctx = mgr.build_context(
    user_id="alice",
    tenant_id="acme",
    query="Show me a decorator example.",
)

# 5. Inject into your LLM
prompt = ctx.full_text + "\n\nUser: Show me a decorator example.\nAssistant:"

Get → Fallback → Put Pattern

def get_response(mgr, user_id, tenant_id, query):
    # Build memory-enriched context
    ctx = mgr.build_context(user_id, tenant_id, query)
    
    # Call your LLM (my_llm stands for your own client exposing .generate)
    response = my_llm.generate(ctx.full_text + "\n\n" + query)
    
    # Store the interaction back
    mgr.save_interaction(
        user_id=user_id,
        tenant_id=tenant_id,
        user_input=query,
        assistant_output=response,
        importance=0.7,
    )
    return response

5. Public API Reference

5.1 AIMemoryManager

The central orchestrator. Instantiate once per application and share across request handlers.

from fennec_memory.memory import AIMemoryManager

Constructor

AIMemoryManager(
    config: Optional[MemoryConfig] = None,
    llm: Optional[LLMProtocol] = None,
    *,
    encryption_key: Optional[str] = None,
    enable_privacy_masking: bool = True,
    max_workers: int = 4,
)

Parameter | Type | Required | Description
config | MemoryConfig | No | System configuration. If None, defaults are used.
llm | LLMProtocol | No | Any object with .generate(prompt: str, max_tokens: int) -> str. Required for LLM-based compression and summarisation.
encryption_key | str | No | Secret key for Fernet symmetric encryption of stored content. If None, content is stored in plaintext.
enable_privacy_masking | bool | No | When True (default), PII is redacted before storage.
max_workers | int | No | Thread pool size for background semantic indexing and async wrappers (default: 4).
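Since the llm parameter is described structurally (any object with a matching generate method), a test double is easy to write. The sketch below restates that shape as a local typing.Protocol and provides a hypothetical stub; neither class is the package's own definition.

```python
from typing import Protocol


class LLMProtocol(Protocol):
    """Local restatement of the shape described in the parameter table."""

    def generate(self, prompt: str, max_tokens: int) -> str: ...


class TruncatingEchoLLM:
    """Hypothetical stand-in for tests; it echoes the first words of the prompt."""

    def generate(self, prompt: str, max_tokens: int) -> str:
        words = prompt.split()
        return " ".join(words[:max_tokens])


llm: LLMProtocol = TruncatingEchoLLM()
print(llm.generate("summarise these three turns please", max_tokens=3))
```

A stub like this could be passed as the llm argument in unit tests so that compression paths run without calling a real model.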

save_interaction

Persists one conversation turn through all memory layers.

def save_interaction(
    self,
    user_id: str,
    tenant_id: str,
    user_input: str,
    assistant_output: str,
    *,
    importance: float = 0.5,
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    topics: Optional[List[str]] = None,
) -> MemoryEntry

Parameter | Type | Required | Description
user_id | str | Yes | Identifies the user within the tenant.
tenant_id | str | Yes | Tenant namespace for isolation.
user_input | str | Yes | The user's message text.
assistant_output | str | Yes | The assistant's response text.
importance | float | No | Initial importance score 0–1 (default: 0.5). Values ≥ config.importance_threshold trigger LTM promotion.
tags | List[str] | No | Searchable tags attached to the entry.
metadata | Dict[str, Any] | No | Arbitrary key-value metadata (e.g., session ID, request ID).
topics | List[str] | No | Topic labels used to update the user's profile topic frequency map.

Returns: MemoryEntry — the stored entry with its generated id.

Internal steps:

  1. Access check (write permission for user_id on tenant_id).
  2. PII masking of both user_input and assistant_output.
  3. Optional Fernet encryption.
  4. User profile memory_retention_boost applied to importance.
  5. Entry added to ShortTermMemory.
  6. Semantic embedding and indexing submitted to background thread pool.
  7. If importance ≥ config.importance_threshold, entry promoted to LongTermMemory.
  8. UserProfileManager updated with topics and importance.

entry = mgr.save_interaction(
    user_id="alice",
    tenant_id="acme",
    user_input="What is RAG?",
    assistant_output="RAG is Retrieval-Augmented Generation ...",
    importance=0.9,
    tags=["AI", "RAG"],
    topics=["AI", "LLM"],
    metadata={"session_id": "session_001"},
)
print(entry.id)  # e.g. "3f2a1b9c7e4d5a6f"

build_context

Retrieves relevant memories and assembles a BuiltContext ready to inject into an LLM prompt.

def build_context(
    self,
    user_id: str,
    tenant_id: str,
    query: str,
    *,
    documents: Optional[List[Document]] = None,
    top_k_semantic: int = 8,
    top_k_ltm: int = 10,
    include_profile: bool = True,
) -> BuiltContext

Parameter | Type | Required | Description
user_id | str | Yes | User whose profile and access rights are applied.
tenant_id | str | Yes | Tenant to query memories from.
query | str | Yes | The current user question or task description.
documents | List[Document] | No | External RAG-retrieved documents to include in context.
top_k_semantic | int | No | Number of results from semantic search (default: 8).
top_k_ltm | int | No | Number of top-importance entries from LTM (default: 10).
include_profile | bool | No | Whether to inject the user profile section (default: True).

Returns: BuiltContext with fields:

Field | Type | Description
full_text | str | The complete assembled context string. Inject directly into your prompt.
token_estimate | int | Estimated token count of full_text.
sections | Dict[str, str] | Individual sections: user_profile, memories, documents, history.
truncated | bool | True if any section was truncated to fit its budget.
document_count | int | Number of documents included.
memory_count | int | Number of memory entries included.

Internal steps:

  1. Semantic search over STM + LTM embeddings.
  2. Top-K retrieval from LTM by importance.
  3. Merge and deduplicate all candidates.
  4. MemorySelector scores and filters by composite score and token budget.
  5. Selected entries loaded into WorkingMemory.
  6. User profile text retrieved.
  7. ContextBuilder assembles the final BuiltContext.

from fennec_memory.memory import Document

ctx = mgr.build_context(
    user_id="alice",
    tenant_id="acme",
    query="How do I implement a caching layer?",
    documents=[
        Document(page_content="Redis supports LRU eviction ...", metadata={"source":"redis_docs"}, doc_id="doc_001")
    ],
    top_k_semantic=5,
)

# Use ctx.full_text as the LLM context block
print(ctx.full_text)
print(f"Memories used: {ctx.memory_count}, truncated: {ctx.truncated}")

build_prompt

Convenience wrapper that returns a complete, ready-to-send prompt string.

def build_prompt(
    self,
    user_id: str,
    tenant_id: str,
    query: str,
    *,
    documents: Optional[List[Document]] = None,
    system_instruction: str = "You are a helpful, knowledgeable assistant.",
) -> str

prompt = mgr.build_prompt(
    user_id="alice",
    tenant_id="acme",
    query="Explain async/await in Python.",
    system_instruction="You are an expert Python engineer.",
)
response = my_llm.generate(prompt)

run_maintenance

Executes a full maintenance pass for a tenant: decay, eviction, compression, forgetting.

def run_maintenance(self, tenant_id: str) -> Dict[str, Any]

Returns a report dict with keys:

Key | Description
stm_evicted | STM entries removed for falling below min_importance.
ltm_decay_updated | LTM entries that had their importance updated.
ltm_deleted | LTM entries deleted for falling below min_importance.
working_forgotten | WorkingMemory entries marked forgotten.
ltm_compression | Dict with merges and noise_removed counts (only present when LTM is >80% full).

Schedule this to run periodically (e.g., once per hour via APScheduler or a background task):

import asyncio

async def maintenance_loop(mgr, tenant_id, interval_seconds=3600):
    while True:
        report = await mgr.arun_maintenance(tenant_id)
        print(f"Maintenance: {report}")
        await asyncio.sleep(interval_seconds)

grant / revoke

Manage user access roles within a tenant.

mgr.grant(user_id: str, tenant_id: str, role: str = "writer") -> None
mgr.revoke(user_id: str, tenant_id: str, role: str) -> None

Built-in roles:

Role | Permissions
reader | READ
writer | READ, WRITE
owner | READ, WRITE, DELETE
admin | READ, WRITE, DELETE, ADMIN

mgr.grant("alice", "acme", role="owner")
mgr.grant("bob",   "acme", role="writer")
mgr.grant("guest", "acme", role="reader")

Default behaviour: If no roles have been assigned for a given tenant, all write operations are permitted. The moment grant() is called for any user in a tenant, strict enforcement activates for every user in that tenant, so users without a role lose write access. If you need fine-grained control, grant roles to all of a tenant's users together, before that tenant starts serving requests.
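The switch from permissive to strict enforcement is easy to overlook, so here is a toy model of the rule. ToyAccessController is hypothetical and simplified: it stores one role per user and treats every permission as open before the first grant, whereas the text above only states this for writes.

```python
from collections import defaultdict

ROLE_PERMISSIONS = {
    "reader": {"READ"},
    "writer": {"READ", "WRITE"},
    "owner": {"READ", "WRITE", "DELETE"},
    "admin": {"READ", "WRITE", "DELETE", "ADMIN"},
}


class ToyAccessController:
    """Hypothetical model of the open-until-first-grant behaviour."""

    def __init__(self) -> None:
        self._roles = defaultdict(dict)  # tenant_id -> {user_id: role}

    def grant(self, user_id: str, tenant_id: str, role: str = "writer") -> None:
        self._roles[tenant_id][user_id] = role

    def can(self, user_id: str, tenant_id: str, permission: str) -> bool:
        tenant_roles = self._roles.get(tenant_id)
        if not tenant_roles:
            return True  # no grants yet for this tenant: permissive default
        role = tenant_roles.get(user_id)
        return role is not None and permission in ROLE_PERMISSIONS[role]


ac = ToyAccessController()
before = ac.can("bob", "acme", "WRITE")   # no grants yet
ac.grant("alice", "acme", role="owner")   # first grant flips the tenant to strict
after = ac.can("bob", "acme", "WRITE")    # bob has no role in "acme"
print(before, after)
```

In this model a single grant for alice is enough to lock out bob, which is why roles are best assigned for a whole tenant in one setup step.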


stats

Returns a comprehensive statistics snapshot for a tenant.

def stats(self, tenant_id: str) -> Dict[str, Any]

s = mgr.stats("acme")
# s["stm"]     → ShortTermMemory stats
# s["working"] → WorkingMemory stats
# s["ltm"]     → LongTermMemory stats
# s["semantic"]→ SemanticMemory stats
# s["profile_manager"] → aggregate user profile stats
# s["encryption_active"] → bool
# s["privacy_masking_active"] → bool

clear_tenant

Wipes all memory for a tenant across every layer. Irreversible.

def clear_tenant(self, tenant_id: str) -> None

Async API

All three primary operations have async counterparts:

entry = await mgr.asave_interaction(user_id, tenant_id, user_input, assistant_output, **kwargs)
ctx   = await mgr.abuild_context(user_id, tenant_id, query, **kwargs)
report = await mgr.arun_maintenance(tenant_id)

These run the synchronous methods in the internal ThreadPoolExecutor and are safe to await from any async context.

Note: Because the work still runs on pool threads rather than natively on the event loop, a small max_workers can become a bottleneck under high concurrency. Raise max_workers to match the number of calls you expect to be in flight at once.
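The wrapper pattern described here can be sketched with the standard library alone. ToyManager is a hypothetical stand-in whose build_context merely sleeps to simulate blocking work; the point is how a synchronous method is dispatched to a ThreadPoolExecutor and awaited.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class ToyManager:
    """Hypothetical stand-in showing a sync method with an async wrapper."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def build_context(self, query: str, *, top_k: int = 3) -> str:
        time.sleep(0.05)  # simulates blocking embedding or SQLite work
        return f"context for {query!r} (top_k={top_k})"

    async def abuild_context(self, query: str, **kwargs) -> str:
        loop = asyncio.get_running_loop()
        # run_in_executor accepts no keyword arguments, so bind them first.
        call = partial(self.build_context, query, **kwargs)
        return await loop.run_in_executor(self._pool, call)


async def main() -> list:
    mgr = ToyManager(max_workers=2)
    # Both calls run concurrently on pool threads; the event loop stays free.
    return await asyncio.gather(
        mgr.abuild_context("a"),
        mgr.abuild_context("b", top_k=5),
    )


results = asyncio.run(main())
print(results)
```

With max_workers=2, a third simultaneous call would wait for a free thread, which is the bottleneck the note above refers to.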


5.2 MemoryConfig

The top-level configuration dataclass.

from fennec_memory.memory import MemoryConfig

config = MemoryConfig(
    max_short_term=1000,
    max_long_term=1000,
    max_working=1000,
    importance_threshold=0.7,
    similarity_threshold=0.85,
    retrieval_limit=10,
    embedding_model="all-MiniLM-L6-v2",
    embedding_batch_size=32,
    embedding_cache_size=1000,
    normalize_text=False,
    preserve_case=True,
    enable_persistence=True,
    persistence_path="./memory_storage",
    auto_save_interval=300,
    enable_decay=True,
    decay_rate=0.1,
    min_importance=0.1,
    enable_consolidation=True,
    consolidation_interval=3600,
    max_tokens=2000,
    window_size=5,
    log_level="INFO",
    enable_stats=True,
)

Full parameter reference:

Parameter | Type | Default | Description
max_short_term | int | 1000 | Maximum entries in ShortTermMemory (sliding window).
max_long_term | int | 1000 | Maximum entries tracked in LTM (database has no hard cap).
max_working | int | 1000 | Maximum entries in WorkingMemory per turn.
max_episodic | int | 1000 | Capacity hint for episodic memory.
importance_threshold | float | 0.7 | Minimum importance for LTM promotion. Must be 0–1.
similarity_threshold | float | 0.85 | Cosine similarity threshold for duplicate detection.
retrieval_limit | int | 10 | Default top_k for MemorySelector.
embedding_model | str | "all-MiniLM-L6-v2" | sentence-transformers model name.
embedding_batch_size | int | 32 | Batch size for bulk embedding generation.
embedding_cache_size | int | 1000 | LRU cache size for the embedding model.
normalize_text | bool | False | Lowercase text before storage.
preserve_case | bool | True | Keep original casing.
enable_persistence | bool | True | Enable SQLite LTM and JSON profile persistence.
persistence_path | str | "./memory_storage" | Base directory for all on-disk data.
auto_save_interval | int | 300 | Auto-save interval in seconds (used by profile manager).
enable_decay | bool | True | Enable time-based importance decay.
decay_rate | float | 0.1 | Base importance loss per day (at zero access count).
min_importance | float | 0.1 | Entries below this floor are evicted during maintenance.
enable_consolidation | bool | True | Enable periodic memory consolidation.
consolidation_interval | int | 3600 | Consolidation interval in seconds.
max_tokens | int | 2000 | Token budget for ContextBuilder and WorkingMemory.
window_size | int | 5 | ShortTermMemory sliding window size (number of turns).
log_level | str | "INFO" | Python logging level.

Environment Variable Bootstrap

config = MemoryConfig.from_env()

Supported environment variables:

Variable | Corresponds To
MEMORY_MAX_SHORT_TERM | max_short_term
MEMORY_MAX_LONG_TERM | max_long_term
MEMORY_EMBEDDING_MODEL | embedding_model
MEMORY_PERSISTENCE_PATH | persistence_path
MEMORY_ENABLE_PERSISTENCE | enable_persistence (any truthy string)

to_dict() — Serialisation

d = config.to_dict()

Converts the full MemoryConfig to a plain dictionary. Useful for logging, auditing, or persisting configuration state.


5.3 Conversation Memory Classes

These are lightweight, LangChain-compatible memory classes for direct conversational use. They implement BaseMemory with save_context / load_memory_variables / clear, and expose async variants (asave_context, aload_memory_variables).

ConversationBufferMemory

Stores the full conversation history without any truncation.

from fennec_memory.memory import ConversationBufferMemory

mem = ConversationBufferMemory(
    return_messages=False,   # True → returns List[dict], False → returns text string
    input_key="input",
    output_key="output",
    memory_key="history",
)

mem.save_context({"input": "Hello"}, {"output": "Hi there!"})
variables = mem.load_memory_variables({})
# variables["history"] → "user: Hello\n assistant: Hi there!"

Best for: short conversations where full context fits in the LLM's window.


ConversationBufferWindowMemory

Keeps only the last k interactions.

from fennec_memory.memory import ConversationBufferWindowMemory

mem = ConversationBufferWindowMemory(k=5)  # remembers last 5 turns
info = mem.get_window_info()
# {"current_size": 3, "max_size": 5, "available_slots": 2, "is_full": False}

Best for: long conversations where only recent context matters.


ConversationSummaryMemory

Automatically summarises old turns using an LLM when the estimated token count exceeds max_token_limit.

from fennec_memory.memory import ConversationSummaryMemory

mem = ConversationSummaryMemory(
    llm=my_llm,
    max_token_limit=2000,
)

mem.save_context({"input": "..."}, {"output": "..."})  # triggers summarisation when needed
summary = mem.get_summary()       # retrieve the current running summary
mem.force_summarize()             # trigger summarisation immediately

The LLM is called with a summary_prompt_template that includes the previous summary and the new turns. On failure, the last 5 turns are preserved as a fallback.

Best for: very long conversations that must stay within a strict token budget.


ConversationEntityMemory

Tracks named entities (people, places, organisations, products) mentioned in conversation and provides relevant entity context on retrieval.

from fennec_memory.memory import ConversationEntityMemory

mem = ConversationEntityMemory(
    llm=my_llm,         # used for entity extraction; falls back to stanza NLP
    memory_key="entity_info",
    lang="en",          # stanza language code (used in fallback)
)

mem.save_context(
    {"input": "I was talking to Dr. Smith at OpenAI."},
    {"output": "That's interesting."},
)

entities = mem.list_entities()           # ["Dr. Smith", "OpenAI"]
info = mem.get_entity_info("OpenAI")     # {"name": "OpenAI", "contexts_count": 1, ...}

Entity extraction uses the LLM if available, otherwise falls back to Stanza NLP for NER.

Best for: conversations involving multiple referenced people, organisations, or places.


BaseMemory Interface

All memory classes implement:

class BaseMemory(ABC):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None: ...
    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...
    def clear(self) -> None: ...
    def get_memory_stats(self) -> Dict[str, Any]: ...

    # Async variants (via asyncio.to_thread)
    async def asave_context(self, inputs, outputs): ...
    async def aload_memory_variables(self, inputs): ...

5.4 SemanticMemory

FAISS-backed (or NumPy-fallback) vector store for similarity search.

from fennec_memory.memory import SemanticMemory

sem = SemanticMemory(
    model_name="all-MiniLM-L6-v2",
    cache_size=1000,
    tenant_id="acme",
)

Method | Signature | Description
add | (entry: MemoryEntry) -> None | Embed and index one entry. Sets entry.embedding in place.
add_batch | (entries: List[MemoryEntry]) -> None | Batch embed for performance.
remove | (entry_id: str) -> None | Remove entry from index and store.
search | (query: str, k: int = 5, min_score: float = 0.0) -> List[Tuple[MemoryEntry, float]] | Return top-K entries with cosine scores. Calls entry.access() on each result.
search_entries | (query: str, k: int = 5, min_score: float = 0.0) -> List[MemoryEntry] | Returns entries only (no scores).
clear | () -> None | Wipe the entire index and store.
stats | () -> Dict[str, Any] | Returns backend (faiss or numpy), model name, entry count, tenant ID.

results = sem.search("Python async patterns", k=5, min_score=0.3)
for entry, score in results:
    print(f"Score: {score:.3f}  Content: {str(entry.content)[:60]}")

5.5 LongTermMemory (Storage)

SQLite-backed persistent store. One database file per tenant, located at {persistence_path}/{tenant_id}_ltm.db.

from fennec_memory.memory import LongTermMemory

ltm = LongTermMemory(
    db_path="./memory_storage/acme_ltm.db",
    tenant_id="acme",
)

Method | Signature | Description
store | (entry: MemoryEntry) -> None | Upsert one entry (insert, or update on ID conflict).
store_many | (entries: List[MemoryEntry]) -> None | Batch upsert for efficiency.
get | (entry_id: str) -> Optional[MemoryEntry] | Fetch by ID; calls entry.access() and persists updated access stats.
get_top_by_importance | (limit: int = 20, min_importance: float = 0.0) -> List[MemoryEntry] | Ordered by importance DESC.
get_recent | (limit: int = 20) -> List[MemoryEntry] | Ordered by timestamp DESC.
count | () -> int | Total entry count for this tenant.
delete | (entry_id: str) -> None | Remove a single entry.
delete_below_importance | (threshold: float) -> int | Bulk delete; returns row count.
apply_global_decay | (decay_rate: float = 0.1) -> int | Apply time-based decay to all entries; returns updated count.
clear | () -> None | Delete all entries for this tenant.
stats | () -> Dict[str, Any] | Returns db_path, tenant_id, total_entries.

The SQLite schema includes indices on (tenant_id), (tenant_id, importance DESC), and (tenant_id, timestamp DESC) for efficient queries. All writes are protected by a threading.Lock.
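As an illustration of the storage pattern just described (three tenant-scoped indices, upsert semantics, and a lock around writes), here is a self-contained sketch using the standard sqlite3 module. The table name, column names, and index names are hypothetical, and the ON CONFLICT ... DO UPDATE form (SQLite 3.24 or newer) is one possible way to express an upsert; this document does not state which form the module uses.

```python
import sqlite3
import threading

# Hypothetical schema; the real table has more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id         TEXT PRIMARY KEY,
    tenant_id  TEXT NOT NULL,
    content    TEXT NOT NULL,
    importance REAL NOT NULL,
    timestamp  REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tenant ON memories (tenant_id);
CREATE INDEX IF NOT EXISTS idx_tenant_importance ON memories (tenant_id, importance DESC);
CREATE INDEX IF NOT EXISTS idx_tenant_timestamp ON memories (tenant_id, timestamp DESC);
"""

UPSERT = """
INSERT INTO memories (id, tenant_id, content, importance, timestamp)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    content = excluded.content,
    importance = excluded.importance
"""

_write_lock = threading.Lock()
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def store(entry_id, tenant_id, content, importance, timestamp):
    # The lock serialises writers; the connection context commits or rolls back.
    with _write_lock, conn:
        conn.execute(UPSERT, (entry_id, tenant_id, content, importance, timestamp))


store("e1", "acme", "first draft", 0.4, 1.0)
store("e1", "acme", "revised", 0.9, 1.0)  # same ID: updates instead of duplicating
rows = conn.execute(
    "SELECT content, importance FROM memories "
    "WHERE tenant_id = ? ORDER BY importance DESC",
    ("acme",),
).fetchall()
print(rows)
```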


5.6 MemorySelector

Scores and ranks candidate memories for context injection.

from fennec_memory.memory import MemorySelector, SelectionConfig

config = SelectionConfig(
    relevance_weight=0.50,
    recency_weight=0.20,
    importance_weight=0.30,
    recency_half_life_hours=24.0,
    min_composite_score=0.10,
    top_k=10,
    token_budget=3000,
)
selector = MemorySelector(config=config)

Constraint: relevance_weight + recency_weight + importance_weight must equal 1.0 (±0.1%).

Method | Signature | Description
select | (query, candidates, similarity_scores=None) -> List[ScoredMemory] | Return scored and filtered entries.
select_entries | (query, candidates, similarity_scores=None) -> List[MemoryEntry] | Same as select but returns unwrapped entries.
explain | (query, candidates, similarity_scores=None) -> List[Dict] | Human-readable scoring breakdown (useful for debugging).

explanation = selector.explain(
    query="Python async patterns",
    candidates=stm.get_all(),
)
for row in explanation:
    print(f"id={row['id']}  composite={row['composite']:.3f}  "
          f"relevance={row['relevance']:.3f}  recency={row['recency']:.3f}")

ScoredMemory fields: entry, relevance_score, recency_score, importance_score, composite_score.
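The weighting scheme can be sketched as a weighted sum of the three component scores, with recency modelled as an exponential half-life. This is an assumption about how the composite is formed, based on the parameter names and the weight-sum constraint; the Weights class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    relevance: float = 0.50
    recency: float = 0.20
    importance: float = 0.30
    half_life_hours: float = 24.0

    def __post_init__(self):
        # Mirror the documented constraint: weights sum to 1.0 within 0.1%.
        total = self.relevance + self.recency + self.importance
        if abs(total - 1.0) > 0.001:
            raise ValueError(f"weights must sum to 1.0, got {total}")


def recency_score(age_hours, half_life_hours):
    # Halves every half_life_hours: 1.0 at age 0, 0.5 at one half-life.
    return 0.5 ** (age_hours / half_life_hours)


def composite_score(similarity, age_hours, importance, w):
    return (
        w.relevance * similarity
        + w.recency * recency_score(age_hours, w.half_life_hours)
        + w.importance * importance
    )


w = Weights()
fresh = composite_score(similarity=0.8, age_hours=0.0, importance=0.5, w=w)
day_old = composite_score(similarity=0.8, age_hours=24.0, importance=0.5, w=w)
```

Under these assumptions, an entry that is otherwise identical loses half of its recency contribution after one half-life, while its relevance and importance contributions are unchanged.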


5.7 MemoryCompressor

LLM-based noise removal, duplicate merging, and summarisation.

from fennec_memory.memory import MemoryCompressor

compressor = MemoryCompressor(
    llm=my_llm,
    merge_similarity_threshold=0.92,
    tenant_id="acme",
)

Note: merge_duplicates and compress_batch (when merge=True) require entries to have a populated embedding field. LLM-based summarisation requires llm to be supplied in the constructor.

| Method | Signature | Description |
|---|---|---|
| remove_noise | (entries) -> Tuple[List[MemoryEntry], int] | Drop entries with fewer than 10 chars, fewer than 3 words, or <30% alphanumeric content. Returns (cleaned, n_removed). |
| merge_duplicates | (entries) -> Tuple[List[MemoryEntry], int] | Merge entry pairs with cosine similarity ≥ merge_similarity_threshold. Returns (merged_list, n_merges). |
| summarise | (entries, max_tokens=256, target_type=LONG_TERM) -> Optional[MemoryEntry] | LLM-summarise a list into one compact entry. Falls back to concatenation if LLM is unavailable. |
| compress_batch | (entries, *, remove_noise=True, merge=True, summarise=False, summarise_threshold=20) -> Dict | Full pipeline. Returns dict with keys entries, noise_removed, merges, summarised. |

result = compressor.compress_batch(
    candidates,
    remove_noise=True,
    merge=True,
    summarise=True,
    summarise_threshold=5,
)
print(f"Reduced {len(candidates)} → {len(result['entries'])} entries")
print(f"Noise removed: {result['noise_removed']}, Merges: {result['merges']}")
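The noise rules listed for remove_noise can be approximated with a few string checks. The sketch below works on plain strings rather than MemoryEntry objects, and it assumes the alphanumeric ratio is measured against the full stripped length; both choices are illustrative rather than taken from the library.

```python
def is_noise(text):
    stripped = text.strip()
    if len(stripped) < 10:          # fewer than 10 characters
        return True
    if len(stripped.split()) < 3:   # fewer than 3 words
        return True
    alnum = sum(ch.isalnum() for ch in stripped)
    return alnum / len(stripped) < 0.30  # less than 30% alphanumeric


def remove_noise_sketch(texts):
    cleaned = [t for t in texts if not is_noise(t)]
    return cleaned, len(texts) - len(cleaned)


samples = [
    "ok",
    "hello world!",
    "!!! ??? ... --- ###",
    "The deploy uses Python 3.11",
]
cleaned, n_removed = remove_noise_sketch(samples)
```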

5.8 ForgettingMechanism

Biologically-inspired forgetting based on the Ebbinghaus curve.

from fennec_memory.memory import ForgettingMechanism, ForgettingConfig

forgetter = ForgettingMechanism(ForgettingConfig(
    base_decay_rate=0.15,           # importance lost per day at zero accesses
    recency_boost_per_access=0.05,  # each access multiplies effective half-life
    max_recency_boost=2.0,          # cap on recency boost multiplier
    min_importance=0.05,            # minimum floor before removal
    deletion_threshold=0.05,        # entries at or below this are "forgotten"
    high_frequency_threshold=5,     # ≥5 accesses → "well-rehearsed"
    apply_decay_on_read=False,      # if True, decay runs on every load
))

| Method | Signature | Description |
|---|---|---|
| apply | (entries: List[MemoryEntry]) -> Tuple[List, List] | Returns (alive, forgotten). Mutates importance and decay_factor in place. |
| apply_to_single | (entry: MemoryEntry) -> bool | Decays one entry; returns True if it should be kept. |
| score_retention | (entry: MemoryEntry) -> float | Non-mutating retention score 0–1. Use for previewing at-risk entries. |
| report | (entries: List[MemoryEntry]) -> Dict | Diagnostic dict: total, at_risk_count, at_risk_ids. |

alive, forgotten = forgetter.apply(entries)
print(f"{len(alive)} entries retained, {len(forgotten)} forgotten")

# Preview without mutating
report = forgetter.report(entries)
print(f"At risk: {report['at_risk_count']} of {report['total']}")

Known behaviour: When entries is an empty list, apply() returns the string "entries is empty" instead of a tuple. Always guard with if entries: before calling. See Section 10 for the recommended pattern.
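To make the decay parameters more concrete, here is one plausible reading of an Ebbinghaus-style curve: importance decays exponentially with elapsed days, and each access slows the decay up to a capped multiplier. The exact formula used by ForgettingMechanism is not given in this document, so the functions below are an assumption, not the library's implementation. The wrapper also always returns a pair of lists, including for empty input.

```python
import math


def decayed_importance(importance, days_elapsed, access_count,
                       base_decay_rate=0.15,
                       recency_boost_per_access=0.05,
                       max_recency_boost=2.0):
    # Assumed model: each access stretches the effective half-life, capped at max_recency_boost.
    boost = min(1.0 + recency_boost_per_access * access_count, max_recency_boost)
    return importance * math.exp(-(base_decay_rate / boost) * days_elapsed)


def apply_sketch(entries, deletion_threshold=0.05):
    """entries: list of (importance, days_elapsed, access_count) tuples."""
    alive, forgotten = [], []
    for importance, days, accesses in entries:
        new_value = decayed_importance(importance, days, accesses)
        if new_value <= deletion_threshold:
            forgotten.append(new_value)
        else:
            alive.append(new_value)
    return alive, forgotten


alive, forgotten = apply_sketch([
    (0.5, 30.0, 0),   # old and never accessed
    (0.5, 1.0, 10),   # recent and well rehearsed
])
empty_result = apply_sketch([])  # always a tuple of two lists
```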


5.9 ContextBuilder

Assembles a structured, token-budgeted context string from multiple sources.

from fennec_memory.memory import ContextBuilder, Document

builder = ContextBuilder(
    total_token_budget=4000,
    profile_budget_fraction=0.05,
    memory_budget_fraction=0.30,
    document_budget_fraction=0.45,
    history_budget_fraction=0.20,
)

build

def build(
    self,
    *,
    documents: Optional[List[Document]] = None,
    memories: Optional[List[MemoryEntry]] = None,
    history: Optional[List[Dict[str, str]]] = None,
    user_profile_text: Optional[str] = None,
    query: Optional[str] = None,
) -> BuiltContext

Returns a BuiltContext. Sections are added in priority order: user_profile → memories → documents → history. Each section is individually token-capped and marked truncated=True if content was cut.
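The per-section caps follow directly from the budget fractions. The sketch below computes them for the constructor values shown earlier and truncates a section that exceeds its cap. Counting tokens as whitespace-separated words is a simplifying assumption, and the helper names are hypothetical.

```python
def section_budgets(total_token_budget, fractions):
    return {name: round(total_token_budget * frac) for name, frac in fractions.items()}


def cap_section(text, budget):
    # Crude token estimate: one token per whitespace-separated word (assumption).
    words = text.split()
    if len(words) <= budget:
        return text, False
    return " ".join(words[:budget]), True  # second value plays the role of truncated=True


budgets = section_budgets(4000, {
    "user_profile": 0.05,
    "memories": 0.30,
    "documents": 0.45,
    "history": 0.20,
})
capped_text, truncated = cap_section("one two three four five", budget=3)
```

With a 4,000-token total, the documents section receives the largest share and the user profile the smallest.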

build_prompt

def build_prompt(
    self,
    query: str,
    *,
    documents=None,
    memories=None,
    history=None,
    user_profile_text=None,
    system_instruction="You are a helpful, knowledgeable assistant.",
) -> str

Returns a fully assembled prompt string including the system instruction, context, and the current query.

Document

@dataclass
class Document:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None

5.10 UserProfileManager

Manages user profiles across all tenants, with optional JSON persistence.

from fennec_memory.memory import UserProfileManager

manager = UserProfileManager(
    persist_dir="./memory_storage/profiles",
    auto_save=True,
)

| Method | Signature | Description |
|---|---|---|
| get_or_create | (user_id: str) -> UserProfile | Load from disk or create a new default profile. |
| update_from_interaction | (user_id, query, topics=None, importance=0.5) -> UserProfile | Record one interaction; updates topic frequency and importance average. |
| set_preference | (user_id, key, value) -> None | Set an explicit user preference. |
| delete | (user_id: str) -> bool | Remove from memory and disk. |
| list_users | () -> List[str] | All known user IDs (in-memory + on-disk). |
| aggregate_stats | () -> Dict | Cross-user stats: user count, total interactions, top global topics. |

UserProfile key properties and methods:

| Member | Type | Description |
|---|---|---|
| user_id | str | |
| preferred_language | str | Default "en". |
| verbosity | str | "concise", "medium", or "detailed". |
| topics_of_interest | List[str] | Top-10 inferred from topic frequency. |
| memory_retention_boost | float | 0–0.5. Added to importance on every save_interaction. |
| get_preference(key, default=None) | Any | Retrieve a custom preference value. |
| set_preference(key, value) | None | Set a custom preference on the profile object. |
| to_context_string() | str | Compact profile text for injection into prompts. |
| start_session() | None | Record the start of a new session (increments session counter). |
| update_retention_boost(boost: float) | None | Update memory_retention_boost; clamped to [0, 0.5]. |

profile = manager.get_or_create("alice")
manager.set_preference("alice", "verbosity", "detailed")
manager.set_preference("alice", "preferred_language", "en")

# Inject into prompt
profile_text = profile.to_context_string()

6. Configuration System

Full MemoryConfig with All Defaults

from fennec_memory.memory import MemoryConfig

config = MemoryConfig(
    # Memory layer capacities
    max_short_term=1000,
    max_long_term=1000,
    max_working=1000,
    max_episodic=1000,
    max_semantic=1000,
    max_procedral=1000,

    # Retrieval
    importance_threshold=0.7,    # LTM promotion cutoff
    similarity_threshold=0.85,   # duplicate detection
    retrieval_limit=10,          # default top_k

    # Embeddings
    embedding_model="all-MiniLM-L6-v2",
    embedding_batch_size=32,
    embedding_cache_size=1000,

    # Text processing
    normalize_text=False,
    preserve_case=True,
    remove_duplicates=True,

    # Persistence
    enable_persistence=True,
    persistence_path="./memory_storage",
    auto_save_interval=300,

    # LangChain-style keys
    return_messages=False,
    input_key="input",
    output_key="output",
    memory_key="history",
    max_token_limit=2000,
    window_size=5,

    # Decay
    enable_decay=True,
    decay_rate=0.1,        # 10% per day at zero accesses
    min_importance=0.1,

    # Consolidation
    enable_consolidation=True,
    consolidation_interval=3600,

    # LLM token budget
    max_tokens=2000,

    # Logging
    log_level="INFO",
    enable_stats=True,
)

SelectionConfig

Controls the MemorySelector scoring weights:

from fennec_memory.memory import SelectionConfig

selection_config = SelectionConfig(
    relevance_weight=0.50,         # cosine similarity fraction
    recency_weight=0.20,           # recency exponential decay fraction
    importance_weight=0.30,        # effective_importance fraction
    recency_half_life_hours=24.0,  # score halves every 24 hours
    min_composite_score=0.10,      # minimum score to be included
    top_k=10,                      # maximum entries to select
    token_budget=3000,             # maximum tokens across all selected entries
)

ForgettingConfig

from fennec_memory.memory import ForgettingConfig

forgetting_config = ForgettingConfig(
    base_decay_rate=0.15,
    recency_boost_per_access=0.05,
    max_recency_boost=2.0,
    min_importance=0.05,
    deletion_threshold=0.05,
    high_frequency_threshold=5,
    apply_decay_on_read=False,
)

Environment Variable Reference

export MEMORY_MAX_SHORT_TERM=500
export MEMORY_MAX_LONG_TERM=5000
export MEMORY_EMBEDDING_MODEL="all-MiniLM-L6-v2"
export MEMORY_PERSISTENCE_PATH="/var/data/memory"
export MEMORY_ENABLE_PERSISTENCE="true"

Load with:

config = MemoryConfig.from_env()
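A loader of this kind typically maps each field name to an upper-cased, prefixed variable and coerces the string to the field's type. The sketch below shows that pattern on a small stand-in dataclass. EnvConfigSketch and its parsing rules are hypothetical, and the real MemoryConfig.from_env may behave differently, for example in which boolean spellings it accepts.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class EnvConfigSketch:
    max_short_term: int = 1000
    max_long_term: int = 1000
    embedding_model: str = "all-MiniLM-L6-v2"
    persistence_path: str = "./memory_storage"
    enable_persistence: bool = True

    @classmethod
    def from_env(cls, environ=None, prefix="MEMORY_"):
        environ = os.environ if environ is None else environ
        overrides = {}
        for f in fields(cls):
            raw = environ.get(prefix + f.name.upper())
            if raw is None:
                continue  # keep the dataclass default
            # Check bool before int, because bool is a subclass of int.
            if isinstance(f.default, bool):
                overrides[f.name] = raw.strip().lower() in {"1", "true", "yes", "on"}
            elif isinstance(f.default, int):
                overrides[f.name] = int(raw)
            else:
                overrides[f.name] = raw
        return cls(**overrides)


cfg = EnvConfigSketch.from_env({
    "MEMORY_MAX_SHORT_TERM": "500",
    "MEMORY_PERSISTENCE_PATH": "/var/data/memory",
    "MEMORY_ENABLE_PERSISTENCE": "false",
})
```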

7. Security Model

7.1 PII Detection and Masking

SensitiveDataMasker applies regex patterns to redact sensitive data before any storage operation.

Built-in patterns:

| Label | What It Matches |
|---|---|
| EMAIL | Standard email addresses |
| CREDIT_CARD | 13–16 digit card numbers |
| SSN | US Social Security Numbers (###-##-####) |
| PHONE_INTL | International phone numbers |
| IBAN | International Bank Account Numbers |
| IP_V4 | IPv4 addresses |
| URL_WITH_AUTH | URLs containing credentials (user:pass@host) |
| JWT | JSON Web Tokens (eyJ...) |
| API_KEY | Keys starting with sk-, pk-, rk-, ak-, token- |
| PASSWORD_KV | password=, passwd:, pwd= key-value pairs |

from fennec_memory.memory import SensitiveDataMasker

masker = SensitiveDataMasker(
    custom_patterns=[
        ("INTERNAL_ID", r"INT-\d{6}"),  # add your own
    ],
    placeholder_fmt="[{label}]",
)

cleaned, report = masker.mask("Contact me at ceo@company.com, card: 4111111111111111")
# cleaned → "Contact me at [EMAIL], card: [CREDIT_CARD]"
# report  → {"EMAIL": 1, "CREDIT_CARD": 1}

# Check for sensitive data without masking
has_pii = masker.has_sensitive_data("Contact me at ceo@company.com")
# → True

# Recursively mask nested dicts or lists
cleaned_content, report = masker.mask_entry_content({"input": "my SSN is 123-45-6789"})

When enable_privacy_masking=True on AIMemoryManager, both user_input and assistant_output are masked before any storage or encryption step.
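For readers who want to see the general technique, the sketch below applies labelled regex patterns in sequence and tallies the replacements, then recurses through nested containers. The two patterns are simplified examples written for this sketch, not the library's built-in expressions, and the function names are hypothetical.

```python
import re

# Simplified illustrative patterns; the real built-in patterns are likely stricter.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def mask_text(text, placeholder_fmt="[{label}]"):
    report = {}
    for label, pattern in PATTERNS:
        text, count = pattern.subn(placeholder_fmt.format(label=label), text)
        if count:
            report[label] = count
    return text, report


def mask_nested(value):
    # Walk dicts and lists, masking every string leaf.
    if isinstance(value, str):
        return mask_text(value)[0]
    if isinstance(value, dict):
        return {key: mask_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_nested(item) for item in value]
    return value


masked, report = mask_text("Contact me at ceo@company.com, SSN 123-45-6789")
nested = mask_nested({"input": ["my SSN is 123-45-6789"], "turn": 3})
```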

7.2 Encryption

MemoryEncryptor uses Fernet symmetric encryption (AES-128-CBC + HMAC-SHA256). The user-supplied secret key is hashed with SHA-256 to derive a 32-byte Fernet key.

In practice, you do not interact with MemoryEncryptor directly — pass encryption_key to the AIMemoryManager constructor and encryption is applied automatically on every save_interaction call.

from fennec_memory.memory import MemoryEncryptor

enc = MemoryEncryptor(secret_key="my-production-secret")

token = enc.encrypt({"input": "sensitive data", "output": "sensitive answer"})
data  = enc.decrypt(token)

print(enc.is_real_encryption)   # True if cryptography package is installed
print(enc.key_fingerprint)      # SHA-256 fingerprint of the key (first 16 hex chars)

If the cryptography package is not installed, the encryptor falls back to base64 encoding (not secure) and logs a warning. In this fallback mode, is_real_encryption is False.

Warning: If the encryption_key is lost, all encrypted entries are permanently unrecoverable.
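The key derivation described above can be reproduced with the standard library alone: Fernet expects a URL-safe base64 encoding of 32 raw bytes, which is exactly the size of a SHA-256 digest. The function names are hypothetical, and computing the fingerprint from the secret (rather than from the derived key) is an assumption.

```python
import base64
import hashlib


def derive_fernet_key(secret_key: str) -> bytes:
    digest = hashlib.sha256(secret_key.encode("utf-8")).digest()  # 32 raw bytes
    return base64.urlsafe_b64encode(digest)  # 44-character key in the format Fernet accepts


def key_fingerprint(secret_key: str) -> str:
    # Assumption: fingerprint = first 16 hex characters of the secret's SHA-256 digest.
    return hashlib.sha256(secret_key.encode("utf-8")).hexdigest()[:16]


key = derive_fernet_key("my-production-secret")
fingerprint = key_fingerprint("my-production-secret")
```

Because a single SHA-256 pass is cheap to compute, the strength of this scheme depends on the secret itself being long and random. The derived key can be passed to cryptography.fernet.Fernet when that package is installed.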

7.3 Role-Based Access Control

AccessController implements per-tenant RBAC with built-in and custom roles.

from fennec_memory.memory import AccessController, Permission

ac = AccessController()

# Built-in role assignment
ac.assign_role("alice", "acme", "owner")
ac.assign_role("bob",   "acme", "writer")
ac.assign_role("guest", "acme", "reader")

# Custom role
ac.define_role("analyst", {Permission.READ})
ac.assign_role("carol", "acme", "analyst")

# Check and enforce
if ac.has_permission("bob", "acme", Permission.WRITE):
    # proceed
    pass

ac.require("guest", "acme", Permission.DELETE)  # raises PermissionError

# Inspect
perms = ac.list_permissions("alice", "acme")  # {Permission.READ, Permission.WRITE, Permission.DELETE}

# Revoke
ac.revoke_role("bob", "acme", "writer")
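A per-tenant RBAC check reduces to a lookup keyed by (user, tenant). The sketch below shows one minimal way to structure it. Perm and TinyAccessController are hypothetical stand-ins, and the permission sets assumed for writer and reader are inferred from the example above rather than confirmed by the library.

```python
from enum import Enum


class Perm(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


BUILT_IN_ROLES = {
    "owner": {Perm.READ, Perm.WRITE, Perm.DELETE},
    "writer": {Perm.READ, Perm.WRITE},  # assumed permission set
    "reader": {Perm.READ},              # assumed permission set
}


class TinyAccessController:
    def __init__(self):
        self._roles = {name: set(perms) for name, perms in BUILT_IN_ROLES.items()}
        self._grants = {}  # (user_id, tenant_id) -> set of role names

    def define_role(self, name, permissions):
        self._roles[name] = set(permissions)

    def assign_role(self, user_id, tenant_id, role):
        if role not in self._roles:
            raise ValueError(f"unknown role: {role}")
        self._grants.setdefault((user_id, tenant_id), set()).add(role)

    def revoke_role(self, user_id, tenant_id, role):
        self._grants.get((user_id, tenant_id), set()).discard(role)

    def list_permissions(self, user_id, tenant_id):
        perms = set()
        for role in self._grants.get((user_id, tenant_id), set()):
            perms |= self._roles[role]
        return perms

    def has_permission(self, user_id, tenant_id, permission):
        return permission in self.list_permissions(user_id, tenant_id)

    def require(self, user_id, tenant_id, permission):
        if not self.has_permission(user_id, tenant_id, permission):
            raise PermissionError(f"{user_id} lacks {permission.name} in {tenant_id}")


rbac = TinyAccessController()
rbac.assign_role("bob", "acme", "writer")
rbac.assign_role("guest", "acme", "reader")
```

Keying grants by the (user, tenant) pair means a role in one tenant confers nothing in another.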

7.4 Tenant Isolation

Memory namespacing is enforced at every layer:

  • ShortTermMemory tags each entry with tenant_id in its metadata.
  • LongTermMemory includes tenant_id in all SQL WHERE clauses and indices.
  • SemanticMemory prefixes every vector index ID as "{tenant_id}::{entry_id}".
  • UserProfileManager is tenant-agnostic but user-specific; user data is never cross-contaminated between tenants.

There is no "global" query. You always specify tenant_id explicitly.


8. Storage Backends

ShortTermMemory (in-process deque)

When to use: Always. Present in every deployment as the primary write target.

from fennec_memory.memory import ShortTermMemory

stm = ShortTermMemory(window_size=20, tenant_id="acme")

Backed by collections.deque(maxlen=window_size). Oldest entries are automatically evicted when full (FIFO). Supports:

| Method | Signature | Description |
|---|---|---|
| add | (content, *, importance=0.5, tags=None, metadata=None) -> MemoryEntry | Add one entry; returns the created MemoryEntry. |
| get_all | () -> List[MemoryEntry] | All current entries. |
| get_recent | (n: int) -> List[MemoryEntry] | Last N entries. |
| apply_decay | (decay_rate: float = 0.1) -> None | Apply one decay cycle to all entries in place. |
| evict_below | (min_importance: float) -> List[MemoryEntry] | Remove low-importance entries; returns evicted list. |
| clear | () -> None | Empty the window. |

No persistence. Lost on process restart. Use LTM for durability.
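The FIFO eviction comes for free from collections.deque: once maxlen is reached, each append silently drops the oldest item. A short standalone demonstration, using plain strings in place of MemoryEntry objects:

```python
from collections import deque

window = deque(maxlen=3)
for i in range(5):
    window.append(f"turn-{i}")  # once full, each append evicts the oldest item

remaining = list(window)
```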


WorkingMemory (in-process dict)

When to use: Always. Manages the active-context entries for the current reasoning turn.

from fennec_memory.memory import WorkingMemory

wm = WorkingMemory(capacity=10, token_budget=3000, tenant_id="acme")

| Method | Signature | Description |
|---|---|---|
| load | (entries: List[MemoryEntry]) -> None | Replace current contents with new entries, sorted by importance, respecting token_budget. |
| add | (content, *, importance=0.6, tags=None, metadata=None) -> MemoryEntry | Add one entry; evicts the lowest-importance entry if capacity is exceeded. |
| remove | (entry_id: str) -> Optional[MemoryEntry] | Remove and return an entry by ID. |
| get_all | () -> List[MemoryEntry] | All entries, sorted descending by effective_importance. |
| get_as_text | () -> str | Serialise all entries to a newline-delimited text string, ready for prompt injection. |
| clear | () -> None | Empty the working memory. |
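The behaviour of load can be pictured as a greedy fill: take entries in descending importance, stop at capacity, and skip anything that would overflow the token budget. The sketch below uses (content, importance) tuples and a four-characters-per-token estimate, both of which are assumptions made for illustration.

```python
def estimate_tokens(text):
    # Rough heuristic (assumption): about four characters per token.
    return max(1, len(text) // 4)


def load_sketch(entries, capacity, token_budget):
    """entries: list of (content, importance) tuples."""
    selected, used = [], 0
    for content, importance in sorted(entries, key=lambda e: e[1], reverse=True):
        if len(selected) >= capacity:
            break
        cost = estimate_tokens(content)
        if used + cost > token_budget:
            continue  # does not fit: skip it and try smaller, lower-ranked entries
        selected.append((content, importance))
        used += cost
    return selected, used


selected, used = load_sketch(
    [("a" * 40, 0.9), ("b" * 400, 0.8), ("c" * 20, 0.3)],
    capacity=10,
    token_budget=20,
)
```

Note that a skipped entry does not stop the loop, so a short low-importance entry can still be loaded after a long one is rejected.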

LongTermMemory (SQLite)

When to use: Any deployment that needs memory to survive process restarts or scale across sessions. Default persistence layer.

One .db file per tenant, stored at {persistence_path}/{tenant_id}_ltm.db. Thread-safe via threading.Lock. Supports full CRUD, bulk inserts, importance filtering, timestamp ordering, and global decay.

Suitable for single-instance deployments, development, and moderate production workloads.


SemanticMemory (FAISS / NumPy)

When to use: Always. Provides the semantic search capability that distinguishes this system from simple history-based retrieval.

FAISS IndexFlatIP is the default when faiss-cpu (or faiss-gpu) is installed. Automatically falls back to a pure-NumPy brute-force O(n) cosine search when FAISS is absent.

FAISS is recommended for production with more than ~1,000 entries per tenant. NumPy fallback is suitable for testing and small datasets.
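A brute-force cosine search of the kind the fallback performs can be written in a few lines. The sketch below uses plain Python lists instead of NumPy arrays so that it runs without dependencies, and it filters on a "{tenant_id}::" key prefix to mimic namespaced index IDs. The function names and the dictionary-based index are hypothetical.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query, index, tenant_id, top_k=2):
    """index: dict mapping '{tenant_id}::{entry_id}' to an embedding vector."""
    prefix = f"{tenant_id}::"
    q = normalize(query)
    scored = []
    for key, vec in index.items():
        if not key.startswith(prefix):
            continue  # entries from other tenants are never scored
        # On unit-length vectors the inner product equals cosine similarity.
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((score, key))
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:top_k]]


index = {
    "acme::e1": [1.0, 0.0],
    "acme::e2": [0.9, 0.1],
    "acme::e3": [0.0, 1.0],
    "globex::e9": [1.0, 0.0],
}
results = search([1.0, 0.0], index, tenant_id="acme")
```

The cost grows linearly with the number of stored vectors, which is why the text recommends FAISS beyond roughly a thousand entries per tenant.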


Backend Comparison

| Aspect | ShortTermMemory | LongTermMemory | SemanticMemory |
|---|---|---|---|
| Persistence | None (in-memory) | SQLite on disk | In-memory (rebuilt on restart) |
| Capacity | window_size entries | Unbounded | Unbounded |
| Query type | Recency (FIFO) | Importance / timestamp | Semantic similarity |
| Concurrency | Single-threaded | Thread-safe (Lock) | Single-threaded |
| Startup cost | None | Schema init | Model load on first embed |
| Recommended use | Recent context | Long-lived facts | Semantic retrieval |

To back LTM with Redis in production, implement the LongTermMemory interface (the same store, get, and get_top_by_importance contract) and swap the instance in the _TenantBundle.


9. Observability & Metrics

Per-Tenant Stats

s = mgr.stats("acme")

Returns:

{
  "tenant_id": "acme",
  "stm": {
    "type": "ShortTermMemory",
    "window_size": 20,
    "current_size": 14,
    "is_full": false,
    "tenant_id": "acme"
  },
  "working": {
    "type": "WorkingMemory",
    "capacity": 10,
    "token_budget": 4000,
    "entries": 6,
    "estimated_tokens_used": 850,
    "tenant_id": "acme"
  },
  "ltm": {
    "type": "LongTermMemory",
    "db_path": "./memory_storage/acme_ltm.db",
    "tenant_id": "acme",
    "total_entries": 342
  },
  "semantic": {
    "type": "SemanticMemory",
    "backend": "faiss",
    "model": "all-MiniLM-L6-v2",
    "entries": 356,
    "tenant_id": "acme"
  },
  "profile_manager": {
    "user_count": 3,
    "total_interactions": 1204,
    "avg_interactions_per_user": 401.3,
    "top_global_topics": [["python", 340], ["AI", 210]]
  },
  "encryption_active": true,
  "privacy_masking_active": true
}

Maintenance Reports

report = mgr.run_maintenance("acme")
# {
#   "tenant_id": "acme",
#   "timestamp": 1714000000.0,
#   "stm_evicted": 2,
#   "ltm_decay_updated": 120,
#   "ltm_deleted": 5,
#   "working_forgotten": 1,
#   "ltm_compression": {"merges": 3, "noise_removed": 7}
# }

Forgetting Diagnostics

Use ForgettingMechanism.report for a non-mutating at-risk preview before committing a maintenance cycle:

bundle = mgr._bundle("acme")  # per-tenant bundle, created on first access
report = mgr._forgetter.report(bundle.stm.get_all())
print(f"At-risk: {report['at_risk_count']} / {report['total']}")
print(f"At-risk IDs: {report['at_risk_ids']}")

Selector Scoring Breakdown

Use MemorySelector.explain to understand why specific memories were or were not selected:

bundle = mgr._bundle("acme")  # per-tenant bundle, created on first access
explanation = mgr._selector.explain(
    query="Python caching strategies",
    candidates=bundle.stm.get_all() + bundle.ltm.get_top_by_importance(20),
)
for row in explanation:
    print(f"{row['id'][:10]}  composite={row['composite']:.3f}  "
          f"rel={row['relevance']:.3f}  rec={row['recency']:.3f}  "
          f"imp={row['importance']:.3f}  |  {row['content'][:60]}")

User Profile Analytics

mgr._profile_manager.aggregate_stats()
# {
#   "user_count": 15,
#   "total_interactions": 5230,
#   "avg_interactions_per_user": 348.7,
#   "top_global_topics": [["python", 1200], ...]
# }

10. Edge Cases & Failure Handling

Embedding Service Failure

If sentence-transformers is not installed or the model files are unavailable, _EmbeddingModel falls back to deterministic stub embeddings: hash-seeded random 384-dimensional vectors. These are consistent across calls for the same text (same hash → same seed → same vector), so similarity search still functions — but results will not be semantically meaningful.

Log warning: "sentence-transformers unavailable — using deterministic stub embeddings."

Mitigation: Pre-install the model locally and set TRANSFORMERS_OFFLINE=1 to prevent accidental remote downloads in production.
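The stub behaviour described above can be imitated with a stable hash and a seeded random generator. The sketch uses hashlib rather than Python's built-in hash(), because the latter is salted per process for strings and would not give the same vector across restarts. The function name and the choice of a Gaussian distribution are assumptions.

```python
import hashlib
import random


def stub_embedding(text, dim=384):
    # Stable seed from the text: same text, same seed, same vector.
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


vec_a = stub_embedding("Python async patterns")
vec_a_again = stub_embedding("Python async patterns")
vec_b = stub_embedding("asyncio event loops")
```

Two different texts get unrelated vectors even when they are close in meaning, which is why search results in this mode carry no semantic signal.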

FAISS Unavailable

If faiss is not installed, SemanticMemory silently switches to _NumpyIndex (brute-force O(n) cosine search). This is transparent to callers but significantly slower at scale.

Log warning: "FAISS not installed — using pure-numpy fallback index."

Mitigation: Install faiss-cpu in production. The NumPy fallback is acceptable for fewer than ~1,000 entries.

SQLite / LTM Failure

LongTermMemory wraps all operations in a context manager that calls conn.rollback() on exception and re-raises. If the SQLite file is corrupted or the path is inaccessible:

  • store() and get() will raise an exception.
  • The AIMemoryManager thread pool submits LTM writes asynchronously; failures are logged but do not crash the main thread.

Mitigation: Ensure persistence_path is writable before instantiation. Back up the .db files regularly.
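The rollback-and-re-raise pattern can be sketched with contextlib and the standard sqlite3 module. The transaction helper below is a hypothetical stand-in for whatever LongTermMemory uses internally; it shows that a failed write leaves no partial rows behind and that the exception still reaches the caller.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()  # discard partial writes
        raise            # let the caller see the original error


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

try:
    with transaction(conn):
        conn.execute("INSERT INTO t VALUES (1)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
rows_after_failure = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

with transaction(conn):
    conn.execute("INSERT INTO t VALUES (2)")
rows_after_success = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```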

Quota / Capacity Exceeded

ShortTermMemory uses a deque(maxlen=window_size) — the oldest entry is automatically evicted when full. No exception is raised.

WorkingMemory evicts the lowest-effective_importance entry when capacity is exceeded. Entries that would exceed token_budget are silently skipped during load().

LongTermMemory has no hard cap. Use run_maintenance() and delete_below_importance() to manage growth.

Corrupted or Missing Entries

LongTermMemory._row_to_entry uses json.loads on stored content; a corrupted JSON string will raise json.JSONDecodeError. Individual corrupted rows will surface as exceptions from get() or get_top_by_importance(). The calling code in build_context does not catch these — add a try/except wrapper around build_context if corruption is a concern.
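One defensive pattern is to decode rows one at a time and set aside the ones that fail, so that a single bad row does not abort the whole retrieval. The sketch below operates on (id, raw_json) pairs; the function name and the row shape are hypothetical and do not reflect the library's internal row format.

```python
import json
import logging

logger = logging.getLogger("memory.sketch")


def decode_rows_safely(raw_rows):
    """raw_rows: iterable of (row_id, raw_json_string) pairs."""
    good, corrupted_ids = [], []
    for row_id, raw in raw_rows:
        try:
            good.append((row_id, json.loads(raw)))
        except json.JSONDecodeError as exc:
            logger.warning("Skipping corrupted row %s: %s", row_id, exc)
            corrupted_ids.append(row_id)
    return good, corrupted_ids


good, corrupted_ids = decode_rows_safely([
    ("e1", '{"input": "hello"}'),
    ("e2", '{"input": "trunc'),  # truncated JSON
    ("e3", '{"input": "world"}'),
])
```

Returning the corrupted IDs alongside the good rows makes it possible to delete or repair them in a later maintenance pass.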

Missing Tenant

AIMemoryManager._bundle(tenant_id) creates a new _TenantBundle on first access for any unknown tenant_id. There is no explicit register_tenant step required. The bundle contains fresh, empty memory layers.

Async/Sync Mismatch

asave_interaction, abuild_context, and arun_maintenance use asyncio.get_event_loop().run_in_executor internally. They must be awaited from within a running asyncio event loop. Do not call them from synchronous code without asyncio.run(...):

# Correct in async context
entry = await mgr.asave_interaction(...)

# Correct in sync context
entry = asyncio.run(mgr.asave_interaction(...))

# Wrong — returns a coroutine object, not the result
entry = mgr.asave_interaction(...)  # ← missing await
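The async wrappers follow a common pattern: a blocking function is handed to a thread pool through run_in_executor, and the coroutine awaits the result. The sketch below reproduces that pattern with hypothetical function names. It uses asyncio.get_running_loop(), which is the currently recommended way to obtain the loop from inside a coroutine; the library itself is described as using get_event_loop().

```python
import asyncio
import time


def save_interaction_blocking(text):
    time.sleep(0.01)  # stands in for SQLite and embedding work
    return f"saved:{text}"


async def asave_interaction_sketch(text):
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, save_interaction_blocking, text)


# From synchronous code, asyncio.run drives the coroutine to completion.
result = asyncio.run(asave_interaction_sketch("hello"))
```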

ForgettingMechanism.apply on Empty List

When entries is empty, apply() returns the string "entries is empty" instead of a tuple. This is a known implementation inconsistency. Always guard:

if entries:
    alive, forgotten = forgetter.apply(entries)
else:
    alive, forgotten = [], []

LLM Unavailable for Compression/Summarisation

If llm=None or the LLM call raises an exception:

  • MemoryCompressor.summarise() falls back to simple string concatenation (truncated to 500 chars).
  • MemoryCompressor._merge_pair() falls back to concatenation with a | separator.
  • ConversationSummaryMemory._summarize() logs the error and retains only the last 5 messages.

The system never hard-fails due to an absent LLM.
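The fallback behaviour for summarisation amounts to a try/except around the LLM call with plain concatenation as the safety net. The sketch below treats the LLM as any callable that takes a prompt string; that interface, the prompt text, and the function name are assumptions, while the 500-character cap comes from the list above.

```python
def summarise_sketch(texts, llm=None, max_chars=500):
    if llm is not None:
        try:
            return llm("Summarise the following notes:\n" + "\n".join(texts))
        except Exception:
            pass  # fall through to the non-LLM path
    return " ".join(texts)[:max_chars]


def broken_llm(prompt):
    raise TimeoutError("LLM backend unreachable")


notes = ["User prefers concise answers.", "User works mostly in Python."]
without_llm = summarise_sketch(notes)
with_broken_llm = summarise_sketch(notes, llm=broken_llm)
with_working_llm = summarise_sketch(notes, llm=lambda prompt: "Concise Python user.")
long_fallback = summarise_sketch(["x" * 400, "y" * 400])
```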

Encryption: cryptography Package Missing

MemoryEncryptor degrades to base64 encoding with a logged warning. is_real_encryption returns False. Never use the base64 fallback for sensitive data in production — it provides no security.


11. Advanced Usage

Multi-Tenant Setup

import os

from fennec_memory.memory import AIMemoryManager, MemoryConfig

config = MemoryConfig(
    persistence_path="/var/data/memory",
    importance_threshold=0.6,
    max_tokens=4000,
    enable_persistence=True,
)

mgr = AIMemoryManager(
    config=config,
    llm=my_llm,
    enable_privacy_masking=True,
    encryption_key=os.environ["MEMORY_ENCRYPTION_KEY"],
)

# Tenant A — full team roles
mgr.grant("admin_user", "tenant_a", role="admin")
mgr.grant("agent_1",    "tenant_a", role="writer")
mgr.grant("readonly",   "tenant_a", role="reader")

# Tenant B — separate namespace, no data overlap
mgr.grant("agent_2", "tenant_b", role="owner")

# Tenant A and B memories are fully isolated
mgr.save_interaction("agent_1", "tenant_a", "secret A", "answer A", importance=0.9)
mgr.save_interaction("agent_2", "tenant_b", "secret B", "answer B", importance=0.9)

ctx_a = mgr.build_context("agent_1", "tenant_a", "secret B")
# ctx_a.full_text will NOT contain "secret B" — tenant_b data is invisible

RL Feedback Loop (Importance Signals)

The system does not include an explicit RL module, but the importance parameter functions as the feedback signal. Close the loop by updating importance based on downstream signals such as user upvotes, follow-up questions, or task success:

# High-quality responses: store with elevated importance
mgr.save_interaction(..., importance=0.9)

# Routine exchanges: store with lower importance
mgr.save_interaction(..., importance=0.4)

# Profile-based boost: users whose memories are consistently useful
# get a `memory_retention_boost` applied automatically
mgr._profile_manager.get_or_create("alice").update_retention_boost(0.15)
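One way to turn downstream signals into an importance value is a clamped additive score. The signal names and the weights in the sketch below are entirely hypothetical and would need tuning against real usage data; the only part taken from the text is that the result feeds the importance argument of save_interaction.

```python
def importance_from_feedback(base=0.5, upvoted=False,
                             had_follow_up=False, task_succeeded=False):
    # Hypothetical weights: adjust to match the signals your application collects.
    score = base
    score += 0.2 if upvoted else 0.0
    score += 0.1 if had_follow_up else 0.0
    score += 0.2 if task_succeeded else 0.0
    return max(0.0, min(1.0, score))  # keep the value inside [0, 1]


routine = importance_from_feedback()
strong = importance_from_feedback(upvoted=True, task_succeeded=True)
clamped = importance_from_feedback(base=0.9, upvoted=True, task_succeeded=True)
```

The returned value could then be passed as importance when the interaction is saved.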

Combining with a RAG Document Retriever

from fennec_memory.memory import Document

def answer_with_memory_and_rag(mgr, retriever, user_id, tenant_id, query):
    # 1. Retrieve documents from your vector / BM25 store
    raw_docs = retriever.search(query, k=5)
    documents = [
        # Document takes page_content, metadata, and doc_id (see Section 5.9)
        Document(page_content=d.text, metadata={"source": d.url, "score": d.score})
        for d in raw_docs
    ]

    # 2. Build memory-enriched context
    ctx = mgr.build_context(
        user_id=user_id,
        tenant_id=tenant_id,
        query=query,
        documents=documents,
        top_k_semantic=5,
    )

    # 3. Generate response
    response = my_llm.generate(ctx.full_text + f"\n\nUser: {query}\nAssistant:")

    # 4. Persist the interaction
    mgr.save_interaction(user_id, tenant_id, query, response, importance=0.7)

    return response

Custom User Preferences

mgr._profile_manager.set_preference("alice", "verbosity", "concise")
mgr._profile_manager.set_preference("alice", "preferred_language", "fr")
mgr._profile_manager.set_preference("alice", "domain", "finance")

profile = mgr._profile_manager.get_or_create("alice")
print(profile.to_context_string())
# User language: fr
# Response style: concise / neutral
# Interests: finance, python, ...
# domain: finance

Production Deployment Notes

Memory isolation: Run one AIMemoryManager instance per application process. The internal ThreadPoolExecutor is not shared across processes. For multi-process deployments, consider a shared SQLite file (acceptable for moderate load) or implement a Redis-backed LongTermMemory.

Embedding model cold start: The first save_interaction call triggers model loading (1–3 seconds). Pre-warm by calling mgr._bundle("default").semantic._embedder._load() at startup.

Maintenance scheduling: Run run_maintenance(tenant_id) on a background schedule (e.g., APScheduler or a cron job). Recommended interval: once per hour per active tenant.

Disk growth: LongTermMemory has no automatic size cap. Monitor ltm.count() and configure maintenance to delete entries below a target min_importance. For long-running production systems, set enable_consolidation=True and tune decay_rate to match your data retention requirements.

Thread safety: LongTermMemory is thread-safe via threading.Lock. ShortTermMemory and WorkingMemory are not thread-safe — use one AIMemoryManager per request thread or protect shared instances with an external lock.

Encryption key rotation: There is no built-in key rotation. To rotate keys: decrypt all existing entries with the old key, re-encrypt with the new key, and update encryption_key in the constructor.

Async FastAPI Integration Example

from contextlib import asynccontextmanager
from typing import Optional

from fastapi import FastAPI
from fennec_memory.memory import AIMemoryManager, MemoryConfig

# Populated at startup by the lifespan handler below.
memory_manager: Optional[AIMemoryManager] = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    global memory_manager
    config = MemoryConfig(persistence_path="/var/data/memory", max_tokens=4000)
    memory_manager = AIMemoryManager(config=config, llm=my_llm)
    yield
    # Cleanup if needed

app = FastAPI(lifespan=lifespan)

@app.post("/chat")
async def chat(user_id: str, tenant_id: str, message: str):
    ctx = await memory_manager.abuild_context(user_id, tenant_id, message)
    response = await my_async_llm.generate(ctx.full_text + "\n\n" + message)
    await memory_manager.asave_interaction(
        user_id, tenant_id, message, response, importance=0.7
    )
    return {"response": response}

Example with RAG

from fennec_community.llm import GeminiInterface
from fennec_community.document_loaders import TextLoader 
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem 
from fennec_memory.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryMemory,
    ConversationEntityMemory,
)

loader_1 = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm = GeminiInterface(api_key=llm_api)  # llm_api: your Gemini API key, defined elsewhere
context_manager = ContextManager()
rag_system = RAGSystem(llm=llm, vector_db=vector_db,chunker=chunker, context_manager=context_manager)
rag_system.add_documents(loader_1)

rag = rag_system  
def conversational_rag_query(
    query: str,
    memory,
    memory_key: str = "history",
) -> str:
    mem_vars  = memory.load_memory_variables({"input": query})
    history   = mem_vars.get(memory_key, "")
    retrieved = rag.retrieve(query, top_k=2)
    context   = rag.context_manager.build(query, retrieved)
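    if history:
        # Arabic prompt labels, in order: "Previous conversation context",
        # "Retrieved information", "Current question", "Answer".
        full_prompt = (
            f"سياق المحادثة السابقة:\n{str(history)[:400]}\n\n"
            f"معلومات مسترجعة:\n{context}\n\n"
            f"السؤال الحالي: {query}\nالإجابة:"
        )
    else:
        # Arabic prompt labels, in order: "Retrieved information", "Question", "Answer".
        full_prompt = f"معلومات مسترجعة:\n{context}\n\nالسؤال: {query}\nالإجابة:"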
    answer = rag.llm.generate(full_prompt)
    memory.save_context({"input": query}, {"output": answer})
    return answer

# ── 5a: Buffer Memory ──────────────── #
print("\n  [5a] ConversationBufferMemory")
buffer_mem = ConversationBufferMemory(
    return_messages=False, input_key="input", output_key="output", memory_key="history"
)
conversation = [
    "ما هي طرق الدفع المتاحه ",  # "What payment methods are available?"
    "ما هي اعدادهم",  # roughly: "How many are there?"
]
print("\n  💬  Conversation with RAG (Buffer Memory):")
for turn, q in enumerate(conversation, 1):
    answer = conversational_rag_query(q, buffer_mem)
    print(f"  [{turn}] 👤 {q}")
    print(f"       🤖 {answer[:80]}...")
    print()
print(f"  📝 Buffer memory: {len(buffer_mem.chat_memory)} saved messages")

# ── 5b: Window Memory ──────────────── #
print("\n  [5b] ConversationBufferWindowMemory (window k=2)")
window_mem = ConversationBufferWindowMemory(k=2)
long_conversation = [
    "ما هو  طرق التواصل مع فريق الدعم ",  # "What are the ways to contact the support team?"
    "اعطني مثال عليها",  # "Give me an example of them."
]
for q in long_conversation:
    conversational_rag_query(q, window_mem)
print(f"  📝 Window memory (k=2): {len(window_mem.chat_memory)} saved messages (last 2)")
if window_mem.chat_memory:
    print(f"  Last question: {window_mem.chat_memory[-1].get('input', '')}")