Context Modular
```
Query → QueryAnalyzer → HybridRetriever → ChunkFilter → CompositeRanker → SmartContextComposer → ContextGuard → AdaptiveLearning → ContextResult
```
Table of Contents
- Quick Start
- ContextEngine — Main Orchestrator
- ContextEngineConfig — Configuration
- Data Models
- HybridRetriever — Hybrid Retrieval
- Pipeline — Filtering, Ranking & Composition
- ContextGuard — Context Validator
- AdaptiveLearningSystem — Adaptive Learning
- ContextCache — Smart Cache
- Budget Allocation
- ConversationHistory — Multi-turn Support
- Utility Functions
- Reference Tables
1. Quick Start
```python
from fennec_community.context import ContextEngine

# Simplest usage: all defaults applied
engine = ContextEngine()
engine.index(chunks=my_chunks, embed_fn=my_embed_fn)
result = engine.run("What is machine learning?")

print(result.context)      # Context ready to feed into the LLM
print(result.get_stats())  # Detailed pipeline statistics
```

```python
# Advanced usage: full customization
from fennec_community.context import (
    ContextEngine, ContextEngineConfig,
    ComposerConfig, RetrieverConfig, CacheConfig,
)

config = ContextEngineConfig(
    composer=ComposerConfig(max_context_length=4000, template="english"),
    retriever=RetrieverConfig(top_k=30, vector_weight=0.7),
    cache=CacheConfig(enabled=True, ttl_seconds=600),
)

engine = ContextEngine(config=config)
engine.index(chunks=my_chunks, embed_fn=my_embed_fn)

result = engine.run(
    query="Compare BERT and GPT architectures",
    metadata_filters={"category": "nlp"},
    max_chunks=10,
)
```

2. ContextEngine — Main Orchestrator
Description: The unified entry point for the entire system. Orchestrates all pipeline stages from query intake to final output.
ContextEngine.__init__
```python
ContextEngine(
    config: Optional[ContextEngineConfig] = None,
    query_analyzer: Optional[QueryAnalyzerStrategy] = None,
    retriever: Optional[HybridRetriever] = None,
    chunk_filter: Optional[ChunkFilterStrategy] = None,
    ranker: Optional[ChunkRankingStrategy] = None,
    composer: Optional[ContextComposerStrategy] = None,
    guard: Optional[ContextGuardStrategy] = None,
    budget_allocator: Optional[BudgetAllocationStrategy] = None,
    length_counter: Optional[LengthCounter] = None,
)
```

Purpose: Instantiate the engine and initialize all internal components. Every parameter is optional; sensible defaults are used when nothing is passed.
| Parameter | Type | Description |
|---|---|---|
| `config` | `ContextEngineConfig` | Master configuration object. Falls back to defaults if omitted. |
| `query_analyzer` | `QueryAnalyzerStrategy` | Custom query analyzer (e.g. LLM-based). Optional. |
| `retriever` | `HybridRetriever` | Custom retrieval system or external vector store adapter. Optional. |
| `chunk_filter` | `ChunkFilterStrategy` | Custom filtering logic. Optional. |
| `ranker` | `ChunkRankingStrategy` | Custom ranking system. Optional. |
| `composer` | `ContextComposerStrategy` | Custom context composer. Optional. |
| `guard` | `ContextGuardStrategy` | Custom context guard (e.g. hallucination detector). Optional. |
| `budget_allocator` | `BudgetAllocationStrategy` | Budget distribution strategy for multi-query runs. Optional. |
| `length_counter` | `LengthCounter` | Length measurement function, either character-based or token-based. Optional. |
Returns: A fully initialized ContextEngine instance ready to use.
ContextEngine.index
```python
engine.index(
    chunks: List[Any],
    embed_fn: Optional[Callable[[str], np.ndarray]] = None,
    vector_store_adapter: Optional[Any] = None,
    source_quality_map: Optional[Dict[str, SourceQuality]] = None,
) -> ContextEngine
```

Purpose: Index a collection of chunks into the system and prepare the retrieval layer. Must be called before `run()`.
| Parameter | Type | Description |
|---|---|---|
| `chunks` | `List[Any]` | List of chunk objects; any type that exposes `text` and `doc_id` attributes works. |
| `embed_fn` | `Callable[[str], np.ndarray]` | Embedding function for a single text string. Example: `lambda t: model.encode(t)` |
| `vector_store_adapter` | `Any` | Alternative to `embed_fn` for integrating with Chroma, FAISS, or Pinecone. Must implement `similarity_search_with_score()`. |
| `source_quality_map` | `Dict[str, SourceQuality]` | Per-source quality hints, e.g. `{"source_name": SourceQuality.HIGH}`, used to boost ranking. |
Returns: self — supports method chaining.
```python
# Method chaining example
result = (
    ContextEngine(config=my_config)
    .index(chunks=docs, embed_fn=embedder)
    .run("How does the BM25 algorithm work?")
)
```

ContextEngine.run
```python
engine.run(
    query: str,
    metadata_filters: Optional[Dict[str, Any]] = None,
    max_chunks: Optional[int] = None,
    override_budget: Optional[int] = None,
    use_history: bool = True,
    record_turn: bool = True,
) -> ContextResult
```

Purpose: Execute the full pipeline for a single query and return the context prepared for the LLM. This is the primary method most developers will use.
| Parameter | Type | Description |
|---|---|---|
| `query` | `str` | The user query: English, Arabic, or mixed. |
| `metadata_filters` | `Dict[str, Any]` | Metadata field filters. Example: `{"category": "tech", "year": 2024}` |
| `max_chunks` | `int` | Override the maximum number of retrieved chunks. |
| `override_budget` | `int` | Override the auto-computed context budget (in characters or tokens). |
| `use_history` | `bool` | Whether to enrich the query with prior conversation context. Default: `True`. |
| `record_turn` | `bool` | Whether to save this query turn in conversation history. Default: `True`. |
Returns: ContextResult — contains the final context string plus detailed per-stage statistics.
```python
result = engine.run(
    query="How do LSTM networks work?",
    metadata_filters={"domain": "deep_learning"},
    override_budget=3000,
)

# Use the output
llm_prompt = f"Context:\n{result.context}\n\nQuestion: {result.query_analysis.original_query}"

# Inspect pipeline statistics
stats = result.get_stats()
print(f"Retrieved : {stats['total_retrieved']} chunks")
print(f"Included  : {stats['total_included']} chunks")
print(f"Latency   : {stats['pipeline_ms']} ms")
print(f"Guard OK  : {stats['guard_passed']}")
```

ContextEngine.run_multi
```python
engine.run_multi(
    queries: List[str],
    global_budget: Optional[int] = None,
    use_weighted: bool = True,
    **kwargs,
) -> str
```

Purpose: Run the full pipeline over multiple queries in a single call (multi-hop RAG). The total budget is split across queries according to their detected complexity (or equally, when `use_weighted=False`), and the resulting contexts are merged into one unified string.
| Parameter | Type | Description |
|---|---|---|
| `queries` | `List[str]` | List of query strings to process. |
| `global_budget` | `int` | Total character/token budget shared across all queries. |
| `use_weighted` | `bool` | `True` allocates budget by complexity; `False` splits it equally. |
| `**kwargs` | `Any` | Additional keyword arguments forwarded to each `run()` call. |
Returns: str — a unified context string that concatenates the results of all queries, each labelled with its query index.
```python
combined_context = engine.run_multi(
    queries=[
        "What is the Transformer architecture?",
        "How does the Attention Mechanism work?",
        "What is the difference between BERT and GPT?",
    ],
    global_budget=6000,
    use_weighted=True,  # More complex queries receive larger budget slices
)
```

ContextEngine.build (Legacy API)
```python
engine.build(
    query: str,
    chunks: List[Tuple[Any, float]],  # [(chunk, score), ...]
    max_chunks: Optional[int] = None,
    max_length: Optional[int] = None,
) -> str
```

Purpose: Backward-compatible interface for projects that used the old `ContextManager` API. Accepts raw `(chunk, score)` tuples instead of `ScoredChunk` objects.
| Parameter | Type | Description |
|---|---|---|
| `query` | `str` | The user query. |
| `chunks` | `List[Tuple[Any, float]]` | List of `(chunk_object, relevance_score)` tuples. |
| `max_chunks` | `int` | Maximum number of chunks to include in the context. |
| `max_length` | `int` | Maximum length of the output context, in characters. |
Returns: str — the composed context string.
ContextEngine.compress_context
```python
engine.compress_context(context: str, target_length: int) -> str
```

Purpose: Compress an existing context string down to a target length while preserving as much meaning as possible. Useful when the LLM's context window is smaller than the assembled context.
| Parameter | Type | Description |
|---|---|---|
| `context` | `str` | The original context string to compress. |
| `target_length` | `int` | The desired maximum length, in characters or tokens. |
Returns: str — the compressed context.
ContextEngine.get_context_stats
```python
engine.get_context_stats(context: str) -> Dict[str, Any]
```

Purpose: Compute analytical statistics about an already-composed context string.
| Parameter | Type | Description |
|---|---|---|
| `context` | `str` | The context string to analyse. |
Returns: Dict containing:
| Key | Description |
|---|---|
| `total_length` | Total length in characters or tokens |
| `chunk_count` | Number of distinct chunks in the context |
| `avg_chunk_length` | Average chunk length |
| `min_chunk_length` / `max_chunk_length` | Shortest / longest chunk length |
| `template` | The template used for formatting |
| `length_unit` | `"chars"` or `"tokens"` |
ContextEngine.get_performance_report
```python
engine.get_performance_report() -> Dict[str, Any]
```

Purpose: Return a comprehensive performance report aggregated over all queries processed by this engine instance so far.
Returns: Dict containing:
| Key | Description |
|---|---|
| `total_queries` | Total number of queries processed |
| `avg_response_ms` | Average end-to-end latency in milliseconds |
| `avg_context_length` | Average length of produced contexts |
| `avg_chunk_count` | Average number of chunks included per query |
| `guard_failure_rate` | Fraction of queries where ContextGuard failed (0.0 to 1.0) |
| `cache` | Cache statistics (present only when the cache is enabled) |
ContextEngine.set_conversation
```python
engine.set_conversation(history: ConversationHistory) -> None
```

Purpose: Restore a previously saved conversation session. Useful for resuming a conversation after an application restart without losing continuity.
| Parameter | Type | Description |
|---|---|---|
| `history` | `ConversationHistory` | A previously saved conversation history object. |
ContextEngine.get_history
```python
engine.get_history() -> Optional[ConversationHistory]
```

Purpose: Retrieve the current conversation history for serialisation, analysis, or handoff.
Returns: ConversationHistory instance, or None if multi_turn is disabled in config.
ContextEngine.clear_cache
```python
engine.clear_cache() -> None
```

Purpose: Flush the entire cache. Call this whenever the underlying chunk data changes (e.g. after re-indexing documents) to prevent stale context from being served.
3. ContextEngineConfig — Configuration
ContextEngineConfig
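```python
from dataclasses import dataclass, field

@dataclass
class ContextEngineConfig:
    # Sub-config defaults use default_factory so each config gets its own instance
    # (a plain instance default of a non-frozen dataclass raises ValueError on Python 3.11+)
    query_analyzer: QueryAnalyzerConfig = field(default_factory=QueryAnalyzerConfig)
    retriever: RetrieverConfig = field(default_factory=RetrieverConfig)
    filter_cfg: FilterConfig = field(default_factory=FilterConfig)
    ranking: RankingConfig = field(default_factory=RankingConfig)
    composer: ComposerConfig = field(default_factory=ComposerConfig)
    guard: GuardConfig = field(default_factory=GuardConfig)
    cache: CacheConfig = field(default_factory=CacheConfig)
    language: str = "auto"        # "auto" | "ar" | "en"
    enable_adaptive: bool = True  # Enable adaptive learning
    multi_turn: bool = True       # Enable multi-turn conversation support
```

Purpose: The unified master configuration object passed to `ContextEngine`. Each sub-component has its own isolated, fully customisable config block.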
QueryAnalyzerConfig — Query Analysis Settings
Controls how queries are analysed and how context budgets are assigned.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `budget_low` | `int` | `800` | Context budget (chars) for simple queries |
| `budget_medium` | `int` | `2000` | Context budget (chars) for medium-complexity queries |
| `budget_high` | `int` | `4000` | Context budget (chars) for complex queries |
| `custom_entities` | `List[str]` | `[]` | Domain-specific entity names to recognise in queries |
| `arabic_threshold` | `float` | `0.3` | Minimum fraction of Arabic characters for a query to be classified as Arabic |
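A minimal sketch of how `arabic_threshold` could be applied. It assumes the fraction is taken over alphabetic characters only and counts the basic Arabic Unicode block (U+0600 to U+06FF) as Arabic; the analyzer's real character classes may differ, and `arabic_fraction` and `is_arabic_query` are hypothetical helpers, not part of the API.

```python
def arabic_fraction(text: str) -> float:
    """Fraction of alphabetic characters that fall in the basic Arabic block (assumption)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def is_arabic_query(text: str, arabic_threshold: float = 0.3) -> bool:
    # Hypothetical helper mirroring the arabic_threshold setting
    return arabic_fraction(text) >= arabic_threshold
```

Under these assumptions, a mixed query such as `"شرح BERT model"` (3 of 12 letters Arabic, i.e. 0.25) falls below the default 0.3 threshold and would be treated as English.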
RetrieverConfig — Hybrid Retrieval Settings
Controls the weights and behaviour of the three retrieval methods.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `top_k` | `int` | `20` | Number of chunks to retrieve before filtering |
| `vector_weight` | `float` | `0.60` | Weight of semantic (vector) search results |
| `keyword_weight` | `float` | `0.30` | Weight of BM25 keyword search results |
| `metadata_weight` | `float` | `0.10` | Weight of metadata filtering results |
| `use_vector` | `bool` | `True` | Enable / disable semantic search |
| `use_keyword` | `bool` | `True` | Enable / disable BM25 search |
| `use_metadata` | `bool` | `False` | Enable / disable metadata-based filtering |
| `min_vector_score` | `float` | `0.0` | Minimum cosine similarity to accept a vector result |
FilterConfig — Filtering Layer Settings
Controls the multi-stage chunk filtering pipeline.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `dedup_method` | `str` | `"hash"` | Deduplication strategy: `"hash"`, `"prefix"`, or `"semantic"` |
| `min_chunk_length` | `int` | `20` | Chunks shorter than this are discarded |
| `max_chunk_length` | `int` | `4000` | Chunks longer than this are discarded |
| `semantic_sim_threshold` | `float` | `0.92` | Cosine similarity threshold for near-duplicate removal |
| `noise_patterns` | `List[str]` | (built-in) | Regex patterns that flag a chunk as noise (page numbers, dividers, etc.) |
RankingConfig — Ranking Score Weights
Controls the weight of each factor in the composite ranking score.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `weight_vector_score` | `float` | `0.40` | Weight of semantic similarity in the composite score |
| `weight_keyword_score` | `float` | `0.25` | Weight of the keyword matching score |
| `weight_source_quality` | `float` | `0.20` | Weight of the source quality signal |
| `weight_recency` | `float` | `0.10` | Weight of the information recency signal |
| `weight_position` | `float` | `0.05` | Weight of the chunk's position within its source document |
| `max_chunks_to_rank` | `int` | `15` | Maximum number of chunks passed to the Composer after ranking |
ComposerConfig — Context Formatting Settings
Controls how the final context string is formatted and truncated.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_context_length` | `int` | `2000` | Maximum length of the output context string |
| `separator` | `str` | `"\n---\n"` | Separator inserted between chunks |
| `template` | `str` | `"arabic"` | Formatting template: `"arabic"`, `"english"`, `"minimal"`, or `"structured"` |
| `include_scores` | `bool` | `False` | Append ranking scores to each chunk in the context |
| `include_metadata` | `bool` | `False` | Append metadata fields to each chunk |
| `include_sources` | `bool` | `True` | Include source attribution for each chunk |
| `group_by_source` | `bool` | `False` | Group chunks from the same source document together |
GuardConfig — Context Guard Settings
Controls the validation checks applied to the assembled context.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `True` | Enable / disable the guard entirely |
| `min_context_length` | `int` | `10` | Contexts shorter than this fail validation |
| `max_context_length` | `int` | `8000` | Contexts longer than this trigger a warning (does not fail) |
| `check_relevance` | `bool` | `False` | Semantic relevance check (requires an `embed_fn`) |
| `relevance_threshold` | `float` | `0.10` | Minimum cosine similarity between query and context |
CacheConfig — LRU Cache Settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `True` | Enable / disable the cache |
| `max_size` | `int` | `256` | Maximum number of cached query results |
| `ttl_seconds` | `int` | `300` | Time-to-live for each cached entry, in seconds |
4. Data Models
ContextResult
The final output object returned by engine.run().
```python
@dataclass
class ContextResult:
    context: str                    # Context string ready for the LLM
    query_analysis: QueryAnalysis   # Full query analysis result
    chunks_used: List[ScoredChunk]  # Chunks actually included in the context
    total_retrieved: int            # Chunks retrieved before filtering
    total_after_filter: int         # Chunks remaining after filtering
    total_after_rank: int           # Chunks remaining after ranking
    total_included: int             # Chunks included in the final context
    context_length: int             # Context length in characters
    pipeline_ms: float              # Total end-to-end latency in milliseconds
    guard_passed: bool              # Whether ContextGuard validation passed
    guard_warnings: List[str]       # Guard warning messages (may be empty)
    metadata: Dict[str, Any]        # Extra metadata (e.g. {"from_cache": True})
```

ContextResult.get_stats
```python
result.get_stats() -> Dict[str, Any]
```

Purpose: Return a structured summary of all pipeline metrics in a single dictionary, convenient for logging, monitoring dashboards, or debugging.
Returns: Dict with keys: pipeline_ms, total_retrieved, total_after_filter, total_after_rank, total_included, context_length, guard_passed, guard_warnings, query_intent, query_complexity, context_budget.
QueryAnalysis
The result of query analysis. Used by every stage of the pipeline to make informed decisions.
```python
@dataclass
class QueryAnalysis:
    original_query: str
    normalized: str              # Cleaned text (diacritics removed, whitespace normalised)
    intent: QueryIntent          # FACTUAL | REASONING | CODE | COMPARATIVE | ...
    complexity: QueryComplexity  # LOW | MEDIUM | HIGH
    keywords: List[str]          # Extracted keywords, ranked by relevance
    entities: List[str]          # Named entities detected in the query
    is_arabic: bool
    language: str                # "ar" | "en"
    question_type: str           # "wh-question" | "yes/no" | "open"
    context_budget: int          # Recommended context size in characters
```

QueryAnalysis.to_dict
```python
analysis.to_dict() -> Dict[str, Any]
```

Purpose: Serialise the analysis result to a JSON-compatible dictionary, useful for logging pipelines, analytics stores, or passing between services.
ScoredChunk
A unified wrapper around any chunk object, enriched with all retrieval and ranking scores.
```python
@dataclass
class ScoredChunk:
    chunk: Any                     # Original chunk object (any type)
    vector_score: float            # Cosine similarity from vector search
    keyword_score: float           # BM25 score (normalised 0-1)
    metadata_score: float          # Metadata filter match score
    source_quality: SourceQuality  # HIGH | MEDIUM | LOW | UNKNOWN
    recency_score: float           # Recency signal (0-1)
    composite_score: float         # Final weighted ranking score
    retrieval_method: RetrievalMethod
    retrieval_rank: int            # Original rank before re-ranking
```

Available properties:
| Property | Type | Description |
|---|---|---|
| `.text` | `str` | Chunk text, read from the `text` or `page_content` attribute |
| `.doc_id` | `str` | Source document identifier |
| `.metadata` | `Dict` | Chunk metadata dictionary |
| `.source` | `str` | Source name or path |
| `.content_hash` | `str` | SHA-256 hash of the normalised content |
| `.char_count` | `int` | Character count of the chunk text |
ScoredChunk.to_dict
```python
chunk.to_dict() -> Dict[str, Any]
```

Purpose: Serialise the scored chunk to a dictionary for logging or diagnostics. Includes a 120-character text preview.
Available Enums
| Enum | Values | Used For |
|---|---|---|
| `QueryIntent` | `FACTUAL`, `REASONING`, `COMPARATIVE`, `PROCEDURAL`, `CODE`, `SUMMARIZE`, `UNKNOWN` | Query intent classification |
| `QueryComplexity` | `LOW`, `MEDIUM`, `HIGH` | Query complexity; drives budget allocation |
| `SourceQuality` | `HIGH=3`, `MEDIUM=2`, `LOW=1`, `UNKNOWN=0` | Source trustworthiness signal for ranking |
| `RetrievalMethod` | `VECTOR`, `KEYWORD`, `METADATA`, `HYBRID` | Which retriever produced a result |
5. HybridRetriever — Hybrid Retrieval
HybridRetriever.retrieve
```python
retriever.retrieve(
    query: str,
    top_k: Optional[int] = None,
    filters: Optional[Dict[str, Any]] = None,
) -> List[ScoredChunk]
```

Purpose: Run hybrid retrieval (vector + BM25 + metadata) and fuse the results using Reciprocal Rank Fusion (RRF). Because RRF sums contributions across result lists, a chunk returned by several retrieval methods automatically scores higher than one found by a single method, promoting diverse, high-quality results.
RRF Formula:
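$$\mathrm{rrf\_score}(c) = \sum_{i} \frac{1}{k + \mathrm{rank}(c, L_i)}, \qquad k = 60$$

where $L_i$ is the ranked result list from retrieval method $i$, and the sum runs over the lists that contain chunk $c$.

| Parameter | Type | Description |
|---|---|---|
| `query` | `str` | The search query |
| `top_k` | `int` | Number of results to return (overrides `config.retriever.top_k`) |
| `filters` | `Dict[str, Any]` | Metadata field filters, e.g. `{"category": "finance"}` |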
Returns: List[ScoredChunk] sorted descending by composite score.
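To make the fusion step concrete, here is a small, library-independent sketch of plain Reciprocal Rank Fusion over ranked ID lists. It assumes 1-based ranks and does not apply the per-method `vector_weight` / `keyword_weight` values, since how (or whether) `HybridRetriever` combines those weights with RRF is not documented here; `reciprocal_rank_fusion` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Tuple[Hashable, float]]:
    """Fuse several ranked ID lists with plain RRF (ranks are 1-based)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            # Each list an item appears in adds 1 / (k + rank) to its score
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order
    return sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
```

For vector results `[a, b, c]` and keyword results `[b, d]`, the fused order is `b, a, d, c`: `b` is the only chunk found by both methods, so it wins even though it is not first in the vector list.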
VectorRetriever.index_chunks
```python
vector_retriever.index_chunks(
    chunks: List[Any],
    embed_fn: Callable[[str], np.ndarray],
) -> None
```

Purpose: Build the in-memory embedding index for semantic search.
Note: This is called internally by `ContextEngine.index()`; direct calls are rarely needed.
KeywordRetriever.index_chunks
```python
keyword_retriever.index_chunks(chunks: List[Any]) -> None
```

Purpose: Build the BM25 index used for keyword-based retrieval. It needs no external dependencies and performs well on technical terminology, where exact term matches matter.
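As an illustration of what a keyword index computes, here is a compact Okapi BM25 scorer. The tokenizer (lowercase `\w+` words), the Lucene-style IDF, and the `k1=1.5`, `b=0.75` defaults are common choices assumed for this sketch; the library's `KeywordRetriever` may tokenize (especially Arabic text) and parameterise differently, and the `BM25` class here is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


class BM25:
    """Okapi BM25 over a small in-memory corpus (a sketch, not the library class)."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tokens = [self._tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in self.tokens]
        self.avgdl = sum(self.doc_len) / len(docs) if docs else 0.0
        self.tf = [Counter(t) for t in self.tokens]
        # Document frequency: how many documents contain each term
        self.df = Counter(term for t in self.tokens for term in set(t))
        self.n = len(docs)

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        return re.findall(r"\w+", text.lower())

    def idf(self, term: str) -> float:
        # Lucene-style IDF, which is never negative
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query: str, i: int) -> float:
        total = 0.0
        for term in self._tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            # Length normalisation: longer-than-average documents are penalised
            denom = f + self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / denom
        return total

    def search(self, query: str, top_k: int = 5) -> List[int]:
        scored = [(self.score(query, i), i) for i in range(self.n)]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [i for s, i in scored[:top_k] if s > 0]
```

The `b` parameter controls length normalisation: with equal term counts, a shorter chunk scores higher than a longer one.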
6. Pipeline — Filtering, Ranking & Composition
ChunkFilter.filter
```python
chunk_filter.filter(chunks: List[ScoredChunk]) -> List[ScoredChunk]
```

Purpose: Apply the multi-stage filtering pipeline to a list of retrieved chunks:
- Noise removal: discards page numbers, blank lines, and separator lines via regex patterns.
- Length filtering: discards chunks shorter than `min_chunk_length` or longer than `max_chunk_length`.
- Hash-based deduplication: removes exact duplicate chunks using SHA-256 content hashes.
| Parameter | Type | Description |
|---|---|---|
| `chunks` | `List[ScoredChunk]` | The raw list of retrieved chunks |
Returns: List[ScoredChunk] — the cleaned, deduplicated subset.
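The three `filter()` stages can be sketched on plain strings as follows. The noise patterns and the normalisation applied before hashing (lowercasing and collapsing whitespace) are assumptions for illustration, not the library's built-in list; `filter_chunks` and `NOISE_PATTERNS` are hypothetical names.

```python
import hashlib
import re
from typing import Iterable, List

# Hypothetical noise patterns; the library's built-in list may differ
NOISE_PATTERNS = [
    r"^\s*(page\s*)?\d+\s*$",  # bare page numbers such as "12" or "Page 12"
    r"^\s*[-=_*]{3,}\s*$",     # divider lines such as "-----"
    r"^\s*$",                  # blank chunks
]


def filter_chunks(texts: Iterable[str], min_len: int = 20, max_len: int = 4000) -> List[str]:
    noise = [re.compile(p, re.IGNORECASE) for p in NOISE_PATTERNS]
    seen_hashes = set()
    kept: List[str] = []
    for text in texts:
        # Stage 1: noise removal
        if any(p.match(text) for p in noise):
            continue
        # Stage 2: length filtering
        if not (min_len <= len(text) <= max_len):
            continue
        # Stage 3: exact deduplication on a SHA-256 hash of normalised text
        normalised = " ".join(text.lower().split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append(text)
    return kept
```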
ChunkFilter.filter_with_embeddings
```python
chunk_filter.filter_with_embeddings(
    chunks: List[ScoredChunk],
    embeddings: List[Optional[np.ndarray]],
) -> List[ScoredChunk]
```

Purpose: Same as `filter()`, plus a semantic near-duplicate detection pass: a chunk is removed when its cosine similarity to an already-kept chunk exceeds `semantic_sim_threshold`.
| Parameter | Type | Description |
|---|---|---|
| `chunks` | `List[ScoredChunk]` | Retrieved chunks to filter |
| `embeddings` | `List[Optional[np.ndarray]]` | Pre-computed embeddings aligned with the chunk list |
Returns: List[ScoredChunk] after all filtering stages.
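A near-duplicate pass like this is typically a greedy scan: walk the chunks in rank order and keep one only if it is not too similar to anything already kept. This sketch assumes that order and keeps chunks whose embedding is `None`, since they cannot be compared; both choices, and the name `semantic_dedup`, are assumptions.

```python
from typing import List, Optional

import numpy as np


def semantic_dedup(embeddings: List[Optional[np.ndarray]], threshold: float = 0.92) -> List[int]:
    """Return indices of items to keep, in input order (assumed best-first)."""
    kept_idx: List[int] = []
    kept_vecs: List[np.ndarray] = []
    for i, emb in enumerate(embeddings):
        if emb is None:
            kept_idx.append(i)  # no embedding: cannot compare, so keep it
            continue
        v = emb / (np.linalg.norm(emb) or 1.0)  # unit vector, so dot product = cosine
        if any(float(v @ k) > threshold for k in kept_vecs):
            continue  # near-duplicate of an already-kept chunk
        kept_idx.append(i)
        kept_vecs.append(v)
    return kept_idx
```

The scan is quadratic in the worst case, which is acceptable for the few dozen chunks typically retrieved per query.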
CompositeRanker.rank
```python
ranker.rank(
    chunks: List[ScoredChunk],
    analysis: QueryAnalysis,
    max_n: Optional[int] = None,
) -> List[ScoredChunk]
```

Purpose: Re-rank chunks by a weighted composite score that combines five signals: semantic similarity, keyword matching, source quality, recency, and position within the source document. The signal weights are set in `RankingConfig`.
| Parameter | Type | Description |
|---|---|---|
| `chunks` | `List[ScoredChunk]` | Filtered chunks from `ChunkFilter` |
| `analysis` | `QueryAnalysis` | Query analysis result (provides keywords for the keyword boost) |
| `max_n` | `int` | Maximum number of chunks to return after ranking |
Returns: List[ScoredChunk] sorted descending by composite_score, capped at max_n.
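A sketch of the composite score using the default `RankingConfig` weights. How each raw signal is normalised is not documented, so the mappings here (source quality divided by 3, position as `1 / (1 + index)`) are assumptions, and `Candidate` is a hypothetical stand-in for `ScoredChunk`.

```python
from dataclasses import dataclass
from typing import Dict, List

# Default weights from the RankingConfig table
DEFAULT_WEIGHTS: Dict[str, float] = {
    "vector": 0.40, "keyword": 0.25, "source_quality": 0.20,
    "recency": 0.10, "position": 0.05,
}


@dataclass
class Candidate:  # hypothetical stand-in for ScoredChunk
    doc_id: str
    vector: float        # cosine similarity, assumed in [0, 1]
    keyword: float       # normalised BM25 score, in [0, 1]
    source_quality: int  # SourceQuality value: 0..3
    recency: float       # recency signal, in [0, 1]
    position: int        # 0-based chunk index within its document
    composite: float = 0.0


def rank(cands: List[Candidate], max_n: int = 15,
         w: Dict[str, float] = DEFAULT_WEIGHTS) -> List[Candidate]:
    for c in cands:
        signals = {
            "vector": c.vector,
            "keyword": c.keyword,
            "source_quality": c.source_quality / 3.0,  # HIGH=3 maps to 1.0
            "recency": c.recency,
            "position": 1.0 / (1 + c.position),        # earlier chunks score higher
        }
        c.composite = sum(w[name] * value for name, value in signals.items())
    return sorted(cands, key=lambda c: c.composite, reverse=True)[:max_n]
```

Because the default weights sum to 1.0 and every signal is mapped into [0, 1], the composite score also stays in [0, 1] under these assumptions.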
SmartContextComposer.compose
```python
composer.compose(
    chunks: List[ScoredChunk],
    analysis: QueryAnalysis,
    budget: int,
) -> str
```

Purpose: Format ranked chunks into a structured context string that never exceeds the allowed budget. Chunks are added greedily in rank order, and binary-search truncation acts as a final safety clamp.
| Parameter | Type | Description |
|---|---|---|
| `chunks` | `List[ScoredChunk]` | Ranked chunks from `CompositeRanker` |
| `analysis` | `QueryAnalysis` | Query analysis (selects the template language) |
| `budget` | `int` | Maximum context length in characters or tokens |
Returns: str — the fully formatted context string ready to be injected into an LLM prompt.
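A sketch of greedy inclusion with a binary-search clamp, assuming chunks arrive best-first and that the length function is monotone (a longer prefix never measures shorter). That holds for character counts and is a reasonable approximation for token counters. A chunk that does not fit is skipped so smaller, lower-ranked chunks can still use the remaining budget; if nothing fits whole, the top chunk is truncated to the longest prefix within budget. The skip-and-continue policy and the name `compose` are assumptions, and templates and source headers are omitted.

```python
from typing import Callable, List


def compose(texts: List[str], budget: int, separator: str = "\n---\n",
            length: Callable[[str], int] = len) -> str:
    """Greedily add whole chunks (best first) while the joined result fits the budget."""
    included: List[str] = []
    for text in texts:
        if length(separator.join(included + [text])) <= budget:
            included.append(text)
    if included:
        return separator.join(included)
    if not texts:
        return ""
    # Nothing fit whole: binary-search the longest prefix of the top chunk that fits
    top = texts[0]
    lo, hi = 0, len(top)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if length(top[:mid]) <= budget:
            lo = mid      # prefix of length mid fits; search longer prefixes
        else:
            hi = mid - 1  # too long; search shorter prefixes
    return top[:lo]
```

Passing a token counter as `length` makes the same routine enforce a token budget instead of a character budget.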
SmartContextComposer.get_stats
```python
composer.get_stats(context: str) -> Dict[str, Any]
```

Purpose: Compute structural statistics about an already-composed context string.
Returns: Dict with: total_length, chunk_count, avg_chunk_length, min_chunk_length, max_chunk_length, template, length_unit.
7. ContextGuard — Context Validator
ContextGuard.validate
```python
guard.validate(
    context: str,
    query: str,
    analysis: QueryAnalysis,
) -> Tuple[bool, List[str]]
```

Purpose: Validate the assembled context before it is delivered to the LLM. Runs up to five sequential checks and returns a pass/fail result along with human-readable warnings.
Checks performed:
| # | Check | Configurable Via |
|---|---|---|
| 1 | Context is not empty | check_empty |
| 2 | Context meets minimum length | min_context_length |
| 3 | Context below maximum length (warning only) | max_context_length |
| 4 | At least one source attribution present | check_source_coverage |
| 5 | Semantic relevance to query (optional) | check_relevance + relevance_threshold |
| Parameter | Type | Description |
|---|---|---|
| `context` | `str` | The assembled context string |
| `query` | `str` | The original user query |
| `analysis` | `QueryAnalysis` | Query analysis result |
Returns: `Tuple[bool, List[str]]`
- `bool`: `True` if the context is valid, `False` if validation failed.
- `List[str]`: warning and failure messages (empty on a full pass).
```python
import logging

logger = logging.getLogger(__name__)

# Standalone usage example
guard = ContextGuard(config)
passed, warnings = guard.validate(context, query, analysis)
if not passed:
    for w in warnings:
        logger.warning("ContextGuard: %s", w)
    # Fallback: retrieve more chunks or reduce strictness
```

8. AdaptiveLearningSystem — Adaptive Learning
AdaptiveLearningSystem.suggest_budget
```python
learning.suggest_budget(analysis: QueryAnalysis) -> int
```

Purpose: Suggest an optimised context budget by blending the rule-based estimate from `QueryAnalysis` with the rolling average of actual context consumption. If the engine consistently uses far less than its budget, this method automatically lowers the allocation (keeping 30% headroom above typical usage), so budget that is never used is no longer reserved.
| Parameter | Type | Description |
|---|---|---|
| `analysis` | `QueryAnalysis` | Current query analysis containing the base budget estimate |
Returns: int — the recommended budget in characters (or tokens if a token counter is in use).
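A sketch of the adaptive rule, assuming it tracks actual context lengths in a sliding window and caps the rule-based budget at average usage plus 30%. The exact blending formula, the `min_samples` warm-up, and the `BudgetAdvisor` class are hypothetical; only the 30% headroom and the 100-query window come from this document.

```python
from collections import deque
from statistics import mean


class BudgetAdvisor:  # hypothetical stand-in for AdaptiveLearningSystem
    def __init__(self, window: int = 100, headroom: float = 0.30, min_samples: int = 5):
        self.used: deque = deque(maxlen=window)  # sliding window of actual context lengths
        self.headroom = headroom
        self.min_samples = min_samples

    def record(self, context_length: int) -> None:
        self.used.append(context_length)

    def suggest_budget(self, base_budget: int) -> int:
        if len(self.used) < self.min_samples:
            return base_budget  # not enough evidence yet: use the rule-based estimate
        target = int(mean(self.used) * (1 + self.headroom))
        # Only ever shrink: never exceed the rule-based estimate
        return min(base_budget, target)
```

In this sketch the cap only ever lowers the budget, so a sudden jump in query complexity still gets the full rule-based estimate.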
AdaptiveLearningSystem.get_performance_report
```python
learning.get_performance_report() -> Dict[str, Any]
```

Purpose: Return a statistical performance summary computed over the most recent sliding window of queries (default window: 100 queries).
Returns: Dict with: total_queries, avg_response_ms, avg_context_length, avg_chunk_count, guard_failure_rate, samples.
9. ContextCache — Smart Cache
Managed automatically by `ContextEngine`; use it directly only when you need explicit control.
ContextCache.get
```python
cache.get(key: str) -> Optional[str]
```

Purpose: Retrieve a cached context string by its key. Automatically checks the TTL and returns `None` for expired or missing entries.
Returns: str (the context) or None.
ContextCache.put
```python
cache.put(key: str, context: str) -> None
```

Purpose: Store a context string under the given key. Applies LRU eviction: when `max_size` is exceeded, the least recently used entry is removed.
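`ContextCache` combines LRU eviction with a per-entry TTL. Here is a minimal sketch of that combination using `collections.OrderedDict`, assuming the TTL is measured from the time of `put()` and that an expired entry counts as a miss; `LRUTTLCache` and its injectable `clock` parameter are hypothetical, the latter added only to make expiry easy to test.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class LRUTTLCache:
    def __init__(self, max_size: int = 256, ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic):
        self.max_size, self.ttl, self.clock = max_size, ttl_seconds, clock
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or self.clock() - item[0] > self.ttl:
            self._data.pop(key, None)  # drop the expired entry, if any
            self.misses += 1
            return None
        self._data.move_to_end(key)    # mark as most recently used
        self.hits += 1
        return item[1]

    def put(self, key: str, context: str) -> None:
        self._data[key] = (self.clock(), context)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```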
ContextCache.make_key
```python
cache.make_key(
    query: str,
    filters: Optional[Dict] = None,
) -> str
```

Purpose: Generate a stable, deterministic cache key from a query string and its optional metadata filters using an MD5 hash. Both the query and the filters contribute to the key, so the same query with different filters produces different keys.
| Parameter | Type | Description |
|---|---|---|
| `query` | `str` | The user query (normalised to lowercase before hashing) |
| `filters` | `Dict` | Optional metadata filters included in the key |
Returns: str — 32-character hex MD5 digest.
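A sketch of a deterministic key builder consistent with the description above. Serialising filters with `json.dumps(..., sort_keys=True)` so that `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` hash identically is an assumption about how the library handles filter ordering, and `make_key` here is a standalone function rather than the cache method.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def make_key(query: str, filters: Optional[Dict[str, Any]] = None) -> str:
    # The query is lowercased before hashing, as documented
    norm_query = query.lower()
    # sort_keys=True makes the key independent of dict insertion order (assumption)
    norm_filters = json.dumps(filters or {}, sort_keys=True, default=str)
    return hashlib.md5(f"{norm_query}|{norm_filters}".encode("utf-8")).hexdigest()
```

MD5 serves here as a fast fingerprint rather than a security measure, which is adequate for cache keys.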
ContextCache.stats (property)
```python
cache.stats -> Dict[str, Any]
```

Purpose: Return live cache health metrics without any side effects.
```python
print(cache.stats)
# {
#   'size': 45, 'max_size': 256,
#   'hits': 120, 'misses': 30,
#   'hit_rate': '80.0%', 'ttl_sec': 300
# }
```

ContextCache.clear
```python
cache.clear() -> None
```

Purpose: Flush all cached entries. Call this after re-indexing documents to prevent stale contexts from being served.
10. Budget Allocation
Used by `engine.run_multi()` to split a global budget across multiple queries.
EqualBudgetAllocation.allocate
```python
EqualBudgetAllocation().allocate(
    total_budget: int,
    n_queries: int,
) -> List[int]
```

Purpose: Divide the total budget equally among all queries. Any remainder from integer division is added to the first query's allocation.
Returns: List[int] — per-query budget allocations that sum to total_budget.
WeightedBudgetAllocation.allocate
```python
WeightedBudgetAllocation(weights=[0.5, 0.3, 0.2]).allocate(
    total_budget: int,
    n_queries: int,
) -> List[int]
```

Purpose: Distribute the budget according to predefined weights. Useful when you know in advance which queries are more important.
| Parameter | Type | Description |
|---|---|---|
| `weights` | `List[float]` | Per-query importance weights (passed to `__init__`) |
| `total_budget` | `int` | Total budget to distribute |
| `n_queries` | `int` | Must equal `len(weights)`; otherwise a `ValueError` is raised |
Returns: List[int] — per-query budgets proportional to the weights.
ComplexityBasedAllocation.allocate
```python
ComplexityBasedAllocation().allocate(
    total_budget: int,
    n_queries: int,
    analyses: Optional[List[QueryAnalysis]] = None,
) -> List[int]
```

Purpose: Automatically distribute the budget based on each query's detected complexity. No manual weight configuration is needed; the weights are derived from `QueryAnalysis.complexity`.
| Complexity | Budget Weight |
|---|---|
| `LOW` | 1.0× |
| `MEDIUM` | 1.5× |
| `HIGH` | 2.5× |
Falls back to `EqualBudgetAllocation` if `analyses` is `None` or its length does not match `n_queries`.
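The three strategies differ only in the weights they feed into a proportional split. The sketch below assumes that rounding leftovers in the weighted case go to the first query (mirroring the documented equal-split behaviour) and passes complexity labels as strings instead of `QueryAnalysis` objects; all function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Budget weights from the complexity table above
COMPLEXITY_WEIGHTS = {"LOW": 1.0, "MEDIUM": 1.5, "HIGH": 2.5}


def allocate_equal(total_budget: int, n_queries: int) -> List[int]:
    base, remainder = divmod(total_budget, n_queries)
    budgets = [base] * n_queries
    budgets[0] += remainder  # remainder goes to the first query
    return budgets


def allocate_weighted(total_budget: int, n_queries: int, weights: Sequence[float]) -> List[int]:
    if n_queries != len(weights):
        raise ValueError("n_queries must equal len(weights)")
    total_w = sum(weights)
    budgets = [int(total_budget * w / total_w) for w in weights]
    budgets[0] += total_budget - sum(budgets)  # assumption: rounding leftovers to the first query
    return budgets


def allocate_by_complexity(total_budget: int, n_queries: int,
                           complexities: Optional[List[str]] = None) -> List[int]:
    if not complexities or len(complexities) != n_queries:
        return allocate_equal(total_budget, n_queries)  # documented fallback
    return allocate_weighted(total_budget, n_queries,
                             [COMPLEXITY_WEIGHTS[c] for c in complexities])
```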
11. ConversationHistory — Multi-turn Support
ConversationHistory.add_turn
```python
history.add_turn(
    query: str,
    answer: str = "",
    context: str = "",
    analysis: Optional[QueryAnalysis] = None,
) -> ConversationTurn
```

Purpose: Append a new conversation turn (question, answer, and the context used) to the history. The `max_turns` limit is enforced automatically by discarding the oldest turn once the limit is reached.
| Parameter | Type | Description |
|---|---|---|
| `query` | `str` | The user's question |
| `answer` | `str` | The LLM's answer (optional; can be added after generation) |
| `context` | `str` | The context string that was fed to the LLM |
| `analysis` | `QueryAnalysis` | The query analysis for this turn |
Returns: The newly created ConversationTurn object.
ConversationHistory.get_recent_context
```python
history.get_recent_context(n: int = 3) -> str
```

Purpose: Extract the last `n` Q&A pairs as a plain-text string, suitable for inclusion in a retrieval query to capture conversational context.
Returns: str formatted as:
```text
Q: First question
A: First answer
Q: Second question
A: Second answer
```

ConversationHistory.get_enriched_query
```python
history.get_enriched_query(
    current_query: str,
    n: int = 2,
) -> str
```

Purpose: Enrich the current query by prepending recent conversation context. This resolves implicit references and pronouns ("it", "that", "the previous one") so the retriever finds the correct chunks even when the query is underspecified.
| Parameter | Type | Description |
|---|---|---|
| `current_query` | `str` | The user's current question |
| `n` | `int` | Number of prior turns to include as context |
Returns: str — the enriched query, or the original query unchanged if there is no prior history.
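The history behaviour described in this section can be sketched with a bounded `deque`, which drops the oldest turn automatically once `max_turns` is reached. The enriched-query layout follows the format illustrated in this section; the `max_turns=10` default and the `History` and `Turn` classes are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Turn:  # hypothetical stand-in for ConversationTurn
    query: str
    answer: str = ""


class History:  # hypothetical stand-in for ConversationHistory
    def __init__(self, max_turns: int = 10):
        # deque(maxlen=...) discards the oldest turn automatically
        self.turns: Deque[Turn] = deque(maxlen=max_turns)

    def add_turn(self, query: str, answer: str = "") -> Turn:
        turn = Turn(query, answer)
        self.turns.append(turn)
        return turn

    def get_recent_context(self, n: int = 3) -> str:
        recent = list(self.turns)[-n:] if n > 0 else []
        return "\n".join(f"Q: {t.query}\nA: {t.answer}" for t in recent)

    def get_enriched_query(self, current_query: str, n: int = 2) -> str:
        if not self.turns:
            return current_query  # no history: return the query unchanged
        return (f"[Previous conversation:\n{self.get_recent_context(n)}]\n"
                f"Current question: {current_query}")
```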
```python
# Turn 1
engine.run("What is the Transformer model?")

# Turn 2: "it" is ambiguous without history
engine.run("How does it handle long sequences?")
# Internally enriched to:
# "[Previous conversation:
#  Q: What is the Transformer model?
#  A: ...]
#  Current question: How does it handle long sequences?"
```

12. Utility Functions
char_counter
```python
from fennec_community.context import char_counter

char_counter(text: str) -> int
```

Purpose: Count the characters in a string. This is the default length measurement function used throughout the engine when no custom `length_counter` is provided.
Returns: int — character count, or 0 for empty / None input.
make_token_counter
```python
from fennec_community.context import make_token_counter

token_counter = make_token_counter(tokenizer)
```

Purpose: Create a token-based length counter from any tokenizer that implements `.encode(text)`. Pass the returned function as `length_counter` to `ContextEngine` to make all budget limits operate in tokens instead of characters, which is essential when targeting a model whose context window is measured in tokens.
| Parameter | Type | Description |
|---|---|---|
| `tokenizer` | `Any` | Any tokenizer exposing an `.encode(text)` method that returns a sequence |
Returns: LengthCounter — a Callable[[str], int] that returns token count.
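A sketch of what a token-counter factory like this can look like, with a toy whitespace tokenizer standing in for a real one. `build_token_counter` and `WhitespaceTokenizer` are hypothetical names, not the library's implementation.

```python
from typing import Any, Callable, List


def build_token_counter(tokenizer: Any) -> Callable[[str], int]:
    """Wrap any object with .encode(text) -> sequence into a length counter."""
    def count_tokens(text: str) -> int:
        if not text:
            return 0
        return len(tokenizer.encode(text))
    return count_tokens


class WhitespaceTokenizer:
    """Toy tokenizer for the example; real code would pass e.g. a Hugging Face tokenizer."""

    def encode(self, text: str) -> List[str]:
        return text.split()
```

Hugging Face tokenizers add special tokens (such as `[CLS]` and `[SEP]` for BERT) in `encode()` by default, so counts from a real tokenizer include them.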
```python
from transformers import AutoTokenizer
from fennec_community.context import make_token_counter, ContextEngine

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
token_counter = make_token_counter(tokenizer)
engine = ContextEngine(length_counter=token_counter)
# All budget values are now interpreted as token counts
```

build_query_analyzer
```python
from fennec_community.context import build_query_analyzer

build_query_analyzer(
    config: ContextEngineConfig,
    strategy: Optional[QueryAnalyzerStrategy] = None,
) -> QueryAnalyzerStrategy
```

Purpose: Factory function that constructs a query analyzer. If a custom `strategy` is provided (e.g. an LLM-powered analyzer), it is returned as-is; otherwise a `RuleBasedQueryAnalyzer` is built from the provided config. Use this to create and test an analyzer independently before wiring it into the engine.
| Parameter | Type | Description |
|---|---|---|
| config | ContextEngineConfig | Engine configuration |
| strategy | Optional[QueryAnalyzerStrategy] | Optional custom analyzer to use instead of the default (default: None) |
Returns: a QueryAnalyzerStrategy instance, ready for .analyze(query) calls.
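The factory's fallback logic is small enough to show directly. The sketch below reproduces the documented behaviour (return a custom strategy unchanged, otherwise build a rule-based default from the config) using hypothetical stand-ins: AnalyzerConfig, RuleBasedAnalyzerSketch, and its stopword and intent rules are invented for illustration and do not reflect how RuleBasedQueryAnalyzer actually works.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol


@dataclass
class QueryAnalysis:
    intent: str
    keywords: List[str] = field(default_factory=list)


class QueryAnalyzerStrategy(Protocol):
    def analyze(self, query: str) -> QueryAnalysis: ...


@dataclass
class AnalyzerConfig:
    """Hypothetical stand-in for ContextEngineConfig."""
    stopwords: frozenset = frozenset({"what", "is", "the", "a", "who", "how"})


class RuleBasedAnalyzerSketch:
    """Toy rule-based analyzer; not the library's RuleBasedQueryAnalyzer."""

    def __init__(self, config: AnalyzerConfig) -> None:
        self.config = config

    def analyze(self, query: str) -> QueryAnalysis:
        words = [w.strip("?.,!").lower() for w in query.split()]
        keywords = [w for w in words if w and w not in self.config.stopwords]
        # Crude intent rule: "what"/"who" questions are treated as definitions
        intent = "definition" if words and words[0] in {"what", "who"} else "general"
        return QueryAnalysis(intent=intent, keywords=keywords)


def build_query_analyzer(
    config: AnalyzerConfig,
    strategy: Optional[QueryAnalyzerStrategy] = None,
) -> QueryAnalyzerStrategy:
    # A caller-supplied strategy wins and is returned untouched
    if strategy is not None:
        return strategy
    # Otherwise fall back to the rule-based default built from the config
    return RuleBasedAnalyzerSketch(config)


analyzer = build_query_analyzer(AnalyzerConfig())
print(analyzer.analyze("What is machine learning?"))
```

Returning the custom strategy as-is, rather than wrapping it, keeps the factory transparent: whatever object you test in isolation is exactly the object the engine will call.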
13. Reference Tables
Context Templates
| Template | Header | Chunk Format | Best For |
|---|---|---|---|
| "arabic" | 📚 المعلومات المسترجعة: ("Retrieved Information:") | [المصدر: X]\nالنص ("[Source: X]\nText") | Arabic-language applications |
| "english" | 📚 Retrieved Information: | [Source: X]\nText | English-language applications |
| "minimal" | (none) | Raw text only | Compact prompts, token-tight scenarios |
| "structured" | === CONTEXT START === | --- [N] Source ---\nText | Systems requiring explicit delimiters |
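Each template amounts to a header string plus a per-chunk format. The sketch below applies those strings to a list of chunks. It is illustrative only, assuming chunks are joined with a blank line, numbered from 1, and that no footer is emitted; SmartContextComposer's real output (including truncation to the budget) may differ, and Chunk and format_context are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Chunk:
    source: str
    text: str


# (header, per-chunk format) pairs taken from the template table;
# {n} is the 1-based chunk number used by the "structured" template
TEMPLATES: Dict[str, Tuple[Optional[str], str]] = {
    "arabic": ("📚 المعلومات المسترجعة:", "[المصدر: {source}]\n{text}"),
    "english": ("📚 Retrieved Information:", "[Source: {source}]\n{text}"),
    "minimal": (None, "{text}"),
    "structured": ("=== CONTEXT START ===", "--- [{n}] {source} ---\n{text}"),
}


def format_context(chunks: List[Chunk], template: str = "english", sep: str = "\n\n") -> str:
    header, chunk_fmt = TEMPLATES[template]
    # str.format ignores keyword arguments that a template does not use
    body = [
        chunk_fmt.format(n=i, source=c.source, text=c.text)
        for i, c in enumerate(chunks, start=1)
    ]
    parts = ([header] if header else []) + body
    return sep.join(parts)


chunks = [
    Chunk("ml_book", "Machine learning learns from data."),
    Chunk("dl_book", "Deep learning uses neural networks."),
]
print(format_context(chunks, "structured"))
```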
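End-to-End Examples
The following script indexes a small Arabic knowledge base with a mock embedding function and walks through four examples: basic usage, a custom configuration, a multi-turn conversation, and cache performance.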
from __future__ import annotations

import re
import time
import zlib
from typing import Dict

import numpy as np

from fennec_community.context import (
    ContextEngine,
    ContextEngineConfig,
    ContextManager,
    QueryAnalyzerConfig,
    RetrieverConfig,
    FilterConfig,
    RankingConfig,
    ComposerConfig,
    GuardConfig,
    CacheConfig,
    SourceQuality,
    ScoredChunk,
    char_counter,
)
from fennec_community.chunks import DocumentChunk

# Per-word random vectors, cached so each word always maps to the same vector
_WORD_EMBEDDINGS: Dict[str, np.ndarray] = {}


def mock_embed(text: str) -> np.ndarray:
    """Mock embedding model for testing.

    Sums a fixed random 64-d vector per word and L2-normalizes the result,
    so texts that share words get similar embeddings.
    """
    words = re.findall(r"\w+", text.lower())
    vec = np.zeros(64)
    for w in words:
        if w not in _WORD_EMBEDDINGS:
            # crc32 gives the same seed in every run, unlike the salted built-in hash()
            rng = np.random.default_rng(zlib.crc32(w.encode("utf-8")))
            _WORD_EMBEDDINGS[w] = rng.standard_normal(64)
        vec += _WORD_EMBEDDINGS[w]
    norm = np.linalg.norm(vec)
    return (vec / norm) if norm > 0 else vec
# Sample knowledge base, kept in Arabic to exercise Arabic queries and the
# "arabic" template. Topics: machine learning, deep learning, NLP, RAG,
# Transformers, vector databases, a deliberate duplicate, LLMs, fine-tuning.
KNOWLEDGE_BASE = [
    DocumentChunk(
        doc_id="ml_001",
        text=(
            "التعلم الآلي هو فرع من فروع الذكاء الاصطناعي يركز على بناء أنظمة تتعلم من البيانات. "
            "تستخدم هذه الأنظمة الخوارزميات لتحليل البيانات وتعلم الأنماط واتخاذ القرارات. "
            "من أبرز تطبيقاته: التعرف على الصور، ومعالجة اللغة الطبيعية، وأنظمة التوصية."
        ),
        metadata={"category": "ml", "lang": "ar", "date": "2024", "source": "كتاب التعلم الآلي"},
    ),
    DocumentChunk(
        doc_id="dl_002",
        text=(
            "التعلم العميق هو مجال فرعي من التعلم الآلي يعتمد على الشبكات العصبية الاصطناعية "
            "ذات الطبقات المتعددة. تستطيع هذه الشبكات تعلم تمثيلات هرمية للبيانات. "
            "تُستخدم في رؤية الكمبيوتر، ومعالجة الكلام، والنصوص."
        ),
        metadata={"category": "dl", "lang": "ar", "date": "2024", "source": "كتاب التعلم العميق"},
    ),
    DocumentChunk(
        doc_id="nlp_003",
        text=(
            "معالجة اللغة الطبيعية (NLP) هي مجال في الذكاء الاصطناعي يُعنى بفهم اللغة البشرية. "
            "يشمل مهام مثل: تصنيف النصوص، واستخراج المعلومات، والترجمة الآلية، "
            "وتوليد النصوص. نماذج مثل BERT وGPT غيّرت هذا المجال جذرياً."
        ),
        metadata={"category": "nlp", "lang": "ar", "date": "2024", "source": "كتاب معالجة اللغة الطبيعية"},
    ),
    DocumentChunk(
        doc_id="rag_004",
        text=(
            "Retrieval-Augmented Generation (RAG) هي تقنية تجمع بين قواعد المعرفة الخارجية "
            "ونماذج اللغة الكبيرة. تُحسِّن دقة الإجابات وتقلل الـ hallucination. "
            "تعتمد على: استرجاع المعلومات ذات الصلة، وتضمينها في سياق الـ LLM."
        ),
        metadata={"category": "rag", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="transformer_005",
        text=(
            "معمارية Transformer ثورت مجال معالجة اللغة الطبيعية منذ ورقة Attention is All You Need. "
            "تعتمد على آلية self-attention التي تُحدد العلاقات بين الكلمات في النص. "
            "أصبحت أساساً لنماذج مثل BERT وGPT وT5."
        ),
        metadata={"category": "transformer", "lang": "ar", "date": "2017"},
    ),
    DocumentChunk(
        doc_id="vector_006",
        text=(
            "قواعد البيانات المتجهية (Vector Databases) تُخزّن embeddings وتتيح البحث بالتشابه الدلالي. "
            "من أبرزها: Pinecone, Chroma, Weaviate, FAISS. "
            "تُستخدم في أنظمة RAG لاسترجاع المعلومات ذات الصلة بالاستعلام."
        ),
        metadata={"category": "infrastructure", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="duplicate_007",  # repeats the first two sentences of ml_001 on purpose, to test deduplication
        text=(
            "التعلم الآلي هو فرع من فروع الذكاء الاصطناعي يركز على بناء أنظمة تتعلم من البيانات. "
            "تستخدم هذه الأنظمة الخوارزميات لتحليل البيانات وتعلم الأنماط واتخاذ القرارات."
        ),
        metadata={"category": "ml", "lang": "ar"},
    ),
    DocumentChunk(
        doc_id="llm_009",
        text=(
            "نماذج اللغة الكبيرة (LLMs) مثل GPT-4 وClaude وGemini تُنتج نصاً يشبه الكتابة البشرية. "
            "تُدرَّب على مليارات الكلمات وتستطيع الإجابة على الأسئلة، والكتابة، والبرمجة. "
            "التحديات الرئيسية تشمل: الـ hallucination، والتحيز، والتكلفة الحسابية."
        ),
        metadata={"category": "llm", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="fine_tuning_010",
        text=(
            "Fine-tuning هي عملية تدريب نموذج مُدرَّب مسبقاً على بيانات خاصة بمجال معين. "
            "تُحسِّن الأداء في مهام محددة دون الحاجة إلى التدريب من الصفر. "
            "LoRA وQLoRA هما تقنيتان شائعتان لـ fine-tuning فعّال من حيث الموارد."
        ),
        metadata={"category": "training", "lang": "ar", "date": "2024"},
    ),
]
def example_basic():
    print("\n" + "=" * 60)
    print("📌 Example 1: basic usage")
    print("=" * 60)

    # Build a context engine with the default configuration
    engine = ContextEngine()
    # Index the chunks for retrieval
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

    # Run the engine with a query ("What is machine learning?")
    query = "ما هو التعلم الآلي؟"
    result = engine.run(query)

    print(f"\n🔍 Query: {query}")
    print(f"📊 Intent: {result.query_analysis.intent.value}")
    print(f"📊 Complexity: {result.query_analysis.complexity.value}")
    print(f"🔑 Keywords: {result.query_analysis.keywords[:5]}")
    print(f"\n📄 Composed context:\n{result.context}")

    print("\n📈 Statistics:")
    stats = result.get_stats()
    for k, v in stats.items():
        print(f"  {k}: {v}")


def example_custom_config():
    print("\n" + "=" * 60)
    print("📌 Example 2: custom configuration")
    print("=" * 60)

    config = ContextEngineConfig(
        query_analyzer=QueryAnalyzerConfig(
            budget_low=600,
            budget_medium=1500,
            budget_high=3500,
        ),
        retriever=RetrieverConfig(
            top_k=15,
            vector_weight=0.65,
            keyword_weight=0.30,
            metadata_weight=0.05,
        ),
        filter_cfg=FilterConfig(
            dedup_method="hash",
            min_chunk_length=30,
        ),
        ranking=RankingConfig(
            weight_vector_score=0.45,
            weight_keyword_score=0.30,
            weight_source_quality=0.15,
            weight_recency=0.10,
            max_chunks_to_rank=5,
        ),
        composer=ComposerConfig(
            max_context_length=2500,
            template="arabic",
            include_scores=True,
            include_metadata=False,
            group_by_source=False,
        ),
        guard=GuardConfig(
            enabled=True,
            min_context_length=50,
            check_source_coverage=True,
        ),
        cache=CacheConfig(
            enabled=True,
            max_size=128,
            ttl_seconds=600,
        ),
    )
    engine = ContextEngine(config=config)

    # Map source names to quality levels. Only the first two keys appear
    # as a "source" value in KNOWLEDGE_BASE; the others are illustrative.
    source_quality = {
        "كتاب التعلم الآلي": SourceQuality.HIGH,   # "Machine learning book"
        "كتاب التعلم العميق": SourceQuality.HIGH,  # "Deep learning book"
        "مجلة NLP": SourceQuality.MEDIUM,          # "NLP journal"
        "بحث RAG": SourceQuality.HIGH,             # "RAG paper"
        "مصدر مكرر": SourceQuality.LOW,            # "Duplicate source"
    }
    engine.index(
        chunks=KNOWLEDGE_BASE,
        embed_fn=mock_embed,
        source_quality_map=source_quality,
    )

    # "Compare machine learning and deep learning: what are the main differences?"
    query = "قارن بين التعلم الآلي والتعلم العميق، وما هي أبرز الفروقات؟"
    result = engine.run(query, metadata_filters={"lang": "ar"})

    print(f"\n🔍 Query: {query}")
    print(f"📊 Intent: {result.query_analysis.intent.value}")
    print(f"📊 Complexity: {result.query_analysis.complexity.value}")
    print(f"💰 Budget: {result.query_analysis.context_budget} chars")
    print(f"🛡️ Guard: {'✓ passed' if result.guard_passed else '✗ failed'}")
    print(f"\n📄 Context:\n{result.context}")
def example_multi_turn():
    print("\n" + "=" * 60)
    print("📌 Example 3: multi-turn conversation")
    print("=" * 60)

    engine = ContextEngine(config=ContextEngineConfig(multi_turn=True))
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

    conversation = [
        "ما هو التعلم الآلي؟",        # "What is machine learning?"
        "وما علاقته بالتعلم العميق؟",  # "And how does it relate to deep learning?"
        "كيف يُستخدم في RAG؟",         # "How is it used in RAG?"
    ]

    for i, query in enumerate(conversation, start=1):
        result = engine.run(query, use_history=True, record_turn=True)

        # Simulate an LLM response and attach it to the turn just recorded
        fake_answer = f"[LLM answer to question {i}]"
        history = engine.get_history()
        if history is not None and history.turns:
            history.turns[-1].answer = fake_answer

        print(f"\n💬 Question {i}: {query}")
        print(f"   📊 Enriched query: {result.query_analysis.normalized[:80]}...")
        print(f"   📄 Context length: {result.context_length} chars")
        print(f"   ⏱️ Time: {result.pipeline_ms:.1f}ms")

    # Conversation history statistics
    history = engine.get_history()
    if history is not None:
        print(f"\n📚 Conversation history: {len(history.turns)} turns")
def example_cache_performance():
    print("\n" + "=" * 60)
    print("📌 Example 4: cache performance")
    print("=" * 60)

    engine = ContextEngine(
        config=ContextEngineConfig(
            cache=CacheConfig(enabled=True, max_size=50, ttl_seconds=60)
        )
    )
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

    query = "ما هو التعلم الآلي؟"  # "What is machine learning?"

    # First run: cache miss, so the full pipeline executes
    t0 = time.perf_counter()
    engine.run(query)
    t1 = (time.perf_counter() - t0) * 1000

    # Same query again: should be served from the cache
    t0 = time.perf_counter()
    engine.run(query)
    t2 = (time.perf_counter() - t0) * 1000

    print(f"\n📊 First query (cache miss): {t1:.1f}ms")
    print(f"📊 Second query (cache hit): {t2:.1f}ms")
    if t2 > 0:
        print(f"⚡ Speedup: {t1 / t2:.1f}x")

    # Performance report, including cache statistics
    perf = engine.get_performance_report()
    print("\n📈 Performance report:")
    for k, v in perf.items():
        if isinstance(v, dict):
            print(f"  {k}:")
            for kk, vv in v.items():
                print(f"    {kk}: {vv}")
        else:
            print(f"  {k}: {v}")


if __name__ == "__main__":
    example_basic()
    example_custom_config()
    example_multi_turn()
    example_cache_performance()
Composite Ranking Formula
composite_score =
      vector_score   × 0.40   (semantic similarity)
    + keyword_boost  × 0.25   (query keyword matches)
    + source_quality × 0.20   (source trustworthiness)
    + recency_score  × 0.10   (information freshness)
    + position_score × 0.05   (chunk position in document)
The weights shown sum to 1.0. All of them are configurable via RankingConfig (Example 2 sets weight_vector_score, weight_keyword_score, weight_source_quality and weight_recency).
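As a worked example, the weighted sum can be computed directly. This sketch hard-codes the weights shown above in a hypothetical RankingWeights dataclass (not the library's RankingConfig) and assumes every input score is already normalized to [0, 1]; the source does not state how each component is scaled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankingWeights:
    """Weights from the composite ranking formula (they sum to 1.0)."""
    vector: float = 0.40
    keyword: float = 0.25
    source_quality: float = 0.20
    recency: float = 0.10
    position: float = 0.05


def composite_score(vector_score: float, keyword_boost: float, source_quality: float,
                    recency_score: float, position_score: float,
                    w: RankingWeights = RankingWeights()) -> float:
    # Plain weighted sum; assumes each component is normalized to [0, 1]
    return (w.vector * vector_score
            + w.keyword * keyword_boost
            + w.source_quality * source_quality
            + w.recency * recency_score
            + w.position * position_score)


# 0.40*0.8 + 0.25*0.5 + 0.20*1.0 + 0.10*0.5 + 0.05*1.0
# = 0.32 + 0.125 + 0.20 + 0.05 + 0.05 = 0.745
score = composite_score(0.8, 0.5, 1.0, 0.5, 1.0)
print(round(score, 3))  # 0.745
```

Because the weights sum to 1.0, the composite score stays in [0, 1] whenever every component does, so scores from different queries remain comparable.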
Query Complexity and Default Budgets
| Complexity | Default Budget | Example Queries |
|---|---|---|
| LOW | 800 chars | "What is Python?", "Who invented the internet?" |
| MEDIUM | 2000 chars | "Explain the concept of deep learning" |
| HIGH | 4000 chars | "Compare Transformer and LSTM architectures in terms of..." |
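The table amounts to a lookup from complexity level to a length budget. The sketch below models it with a hypothetical BudgetConfig dataclass whose field names are borrowed from the QueryAnalyzerConfig arguments in Example 2 (budget_low, budget_medium, budget_high); the enum and the budget_for method are assumptions for illustration, not the library's API.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class BudgetConfig:
    # Defaults match the table; field names mirror QueryAnalyzerConfig in Example 2
    budget_low: int = 800
    budget_medium: int = 2000
    budget_high: int = 4000

    def budget_for(self, complexity: Complexity) -> int:
        # Units follow the engine's length counter: characters by default,
        # tokens when a counter from make_token_counter is supplied
        return {
            Complexity.LOW: self.budget_low,
            Complexity.MEDIUM: self.budget_medium,
            Complexity.HIGH: self.budget_high,
        }[complexity]


print(BudgetConfig().budget_for(Complexity.MEDIUM))               # 2000
print(BudgetConfig(budget_high=3500).budget_for(Complexity.HIGH))  # 3500
```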
Full Pipeline Step-by-Step
1. Cache Lookup: Is this query already cached?
2. History Enrichment: Prepend prior conversation context to the query
3. Query Analysis: Detect intent, complexity, keywords, budget
4. Budget Suggestion: Adaptive learning adjusts the budget from history
5. Retrieval: Vector + BM25 + Metadata → RRF Fusion
6. Filtering: Noise removal + length filter + deduplication
7. Ranking: Composite score re-ranking
8. Composition: Format chunks → truncate to budget
9. Guard Validation: Empty? Too short? Source present? Relevant?
10. Cache Store: Store the result for future identical queries
11. History Record: Save this turn to the conversation history
12. Adaptive Record: Log metrics for continuous self-tuning
Strategy Extension Points
Every major component implements a Strategy interface and can be swapped without touching engine internals:
| Interface | Default Implementation | Swap It When |
|---|---|---|
| QueryAnalyzerStrategy | RuleBasedQueryAnalyzer | You want LLM-powered intent detection |
| HybridRetrieverStrategy | HybridRetriever | You have a custom vector store (Pinecone, Weaviate, etc.) |
| ChunkFilterStrategy | ChunkFilter | You need domain-specific filtering rules |
| ChunkRankingStrategy | CompositeRanker | You want learned reranking (cross-encoder, etc.) |
| ContextComposerStrategy | SmartContextComposer | You need a custom prompt template |
| ContextGuardStrategy | ContextGuard | You want LLM-based hallucination detection |
| BudgetAllocationStrategy | EqualBudgetAllocation | You need custom multi-query budget logic |
Developer note: All components follow the Strategy Pattern via dependency injection. Pass any custom implementation directly to ContextEngine.__init__(); no subclassing of the engine is required.
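The pipeline steps and the strategy table fit together as a fixed sequence of swappable stages. The sketch below is a deliberately tiny, hypothetical engine (MiniEngine, OverlapRanker and LongestFirstRanker are invented names, not fennec_community classes) that runs a subset of the steps in order: cache lookup, a naive keyword match standing in for hybrid retrieval, deduplication, ranking through an injected strategy, budget truncation, a guard, and cache/history recording. Query analysis, history enrichment and adaptive learning are omitted, and the guard substitutes a placeholder string, which is an assumption rather than documented ContextGuard behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class RankingStrategy(Protocol):
    def rank(self, query: str, chunks: List[str]) -> List[str]: ...


class OverlapRanker:
    """Default strategy: chunks sharing the most query words come first."""

    def rank(self, query: str, chunks: List[str]) -> List[str]:
        q = set(query.lower().split())
        return sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)


class LongestFirstRanker:
    """A swapped-in strategy: prefer longer chunks."""

    def rank(self, query: str, chunks: List[str]) -> List[str]:
        return sorted(chunks, key=len, reverse=True)


@dataclass
class MiniEngine:
    chunks: List[str]
    # Dependency injection: any object with a matching rank() method works
    ranker: RankingStrategy = field(default_factory=OverlapRanker)
    max_chunks: int = 2
    budget: int = 200  # character budget for the composed context
    _cache: Dict[str, str] = field(default_factory=dict, init=False)
    history: List[str] = field(default_factory=list, init=False)

    def run(self, query: str) -> str:
        # Step 1: cache lookup (in this sketch a hit skips every later step)
        if query in self._cache:
            return self._cache[query]
        words = set(query.lower().split())
        # Step 5, simplified: keep chunks sharing at least one query word
        candidates = [c for c in self.chunks if words & set(c.lower().split())]
        # Step 6: drop exact duplicates while preserving order
        candidates = list(dict.fromkeys(candidates))
        # Step 7: ranking through the injected strategy
        ranked = self.ranker.rank(query, candidates)[: self.max_chunks]
        # Step 8: compose and truncate to the budget
        context = "\n\n".join(ranked)[: self.budget]
        # Step 9: guard against returning an empty context
        if not context:
            context = "(no relevant context found)"
        # Steps 10 and 11: cache store and history record
        self._cache[query] = context
        self.history.append(query)
        return context


kb = [
    "machine learning learns from data",
    "deep learning uses neural networks",
    "machine learning learns from data",  # exact duplicate, removed in step 6
    "vector databases store embeddings",
]
default = MiniEngine(kb)
swapped = MiniEngine(kb, ranker=LongestFirstRanker())
print(default.run("what is machine learning"))
print(swapped.run("what is machine learning"))
```

Swapping the ranker changes the order of the composed context without touching MiniEngine itself, which is the property the developer note describes.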