Fennec Community community/context.md

Context Module

Query → QueryAnalyzer → HybridRetriever → ChunkFilter → CompositeRanker → SmartContextComposer → ContextGuard → AdaptiveLearning → ContextResult

Quick Start
ContextEngine — Main Orchestrator
ContextEngineConfig — Configuration
Data Models
HybridRetriever — Hybrid Retrieval
Pipeline — Filtering, Ranking & Composition
ContextGuard — Context Validator
AdaptiveLearningSystem — Adaptive Learning
ContextCache — Smart Cache
Budget Allocation
ConversationHistory — Multi-turn Support
Utility Functions
Reference Tables

1. Quick Start

from fennec_community.context import ContextEngine, ContextEngineConfig

# Simplest usage — all defaults applied
engine = ContextEngine()
engine.index(chunks=my_chunks, embed_fn=my_embed_fn)

result = engine.run("What is machine learning?")
print(result.context)       # Context ready to feed into LLM
print(result.get_stats())   # Detailed pipeline statistics

# Advanced usage — full customization
from fennec_community.context import (
    ContextEngine, ContextEngineConfig,
    ComposerConfig, RetrieverConfig, CacheConfig
)

config = ContextEngineConfig(
    composer  = ComposerConfig(max_context_length=4000, template="english"),
    retriever = RetrieverConfig(top_k=30, vector_weight=0.7),
    cache     = CacheConfig(enabled=True, ttl_seconds=600),
)

engine = ContextEngine(config=config)
engine.index(chunks=my_chunks, embed_fn=my_embed_fn)

result = engine.run(
    query            = "Compare BERT and GPT architectures",
    metadata_filters = {"category": "nlp"},
    max_chunks       = 10,
)

2. ContextEngine — Main Orchestrator

Description: The unified entry point for the entire system. Orchestrates all pipeline stages from query intake to final output.

`ContextEngine.__init__`

ContextEngine(
    config:           Optional[ContextEngineConfig]      = None,
    query_analyzer:   Optional[QueryAnalyzerStrategy]    = None,
    retriever:        Optional[HybridRetriever]          = None,
    chunk_filter:     Optional[ChunkFilterStrategy]      = None,
    ranker:           Optional[ChunkRankingStrategy]     = None,
    composer:         Optional[ContextComposerStrategy]  = None,
    guard:            Optional[ContextGuardStrategy]     = None,
    budget_allocator: Optional[BudgetAllocationStrategy] = None,
    length_counter:   Optional[LengthCounter]            = None,
)

Purpose: Instantiate the engine and initialize all internal components. Every parameter is optional — sensible defaults are used when nothing is passed.

Parameter	Type	Description
`config`	`ContextEngineConfig`	Master configuration object. Falls back to defaults if omitted.
`query_analyzer`	`QueryAnalyzerStrategy`	Custom query analyzer (e.g. LLM-based). Optional.
`retriever`	`HybridRetriever`	Custom retrieval system or external vector store adapter. Optional.
`chunk_filter`	`ChunkFilterStrategy`	Custom filtering logic. Optional.
`ranker`	`ChunkRankingStrategy`	Custom ranking system. Optional.
`composer`	`ContextComposerStrategy`	Custom context composer. Optional.
`guard`	`ContextGuardStrategy`	Custom context guard (e.g. hallucination detector). Optional.
`budget_allocator`	`BudgetAllocationStrategy`	Budget distribution strategy for multi-query runs. Optional.
`length_counter`	`LengthCounter`	Length measurement function — character-based or token-based. Optional.

Returns: A fully initialized ContextEngine instance ready to use.

`ContextEngine.index`

engine.index(
    chunks:               List[Any],
    embed_fn:             Optional[Callable[[str], np.ndarray]] = None,
    vector_store_adapter: Optional[Any]                          = None,
    source_quality_map:   Optional[Dict[str, SourceQuality]]    = None,
) -> ContextEngine

Purpose: Index a collection of chunks into the system and prepare the retrieval layer. Must be called before run().

Parameter	Type	Description
`chunks`	`List[Any]`	List of chunk objects of any type that exposes `text` and `doc_id` attributes.
`embed_fn`	`Callable[[str], np.ndarray]`	Embedding function for a single text string. Example: `lambda t: model.encode(t)`
`vector_store_adapter`	`Any`	Alternative to `embed_fn` — integrates with Chroma / FAISS / Pinecone. Must implement `similarity_search_with_score()`.
`source_quality_map`	`Dict[str, SourceQuality]`	Per-source quality hints `{"source_name": SourceQuality.HIGH}` to boost ranking.

Returns: self — supports method chaining.

# Method chaining example
result = (
    ContextEngine(config=my_config)
    .index(chunks=docs, embed_fn=embedder)
    .run("How does the BM25 algorithm work?")
)
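`index()` only requires that each chunk expose `text` and `doc_id` and that `embed_fn` map a single string to a vector. The sketch below shows one way to meet that contract without loading a model. `DemoChunk` and `hash_embed` are hypothetical names. The hashing embedder is a toy stand-in for a real encoder such as `model.encode`: it captures token overlap, not meaning.

```python
from dataclasses import dataclass, field
from typing import Any, Dict
import hashlib

import numpy as np


@dataclass
class DemoChunk:
    """Hypothetical chunk type: index() only needs `text` and `doc_id`."""
    text: str
    doc_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def hash_embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedder: hash each lowercase token into one of `dim` buckets.

    Returns an L2-normalised vector, so cosine similarity is a plain dot product.
    Not a semantic model; swap in a real encoder for actual use.
    """
    vec = np.zeros(dim, dtype=np.float32)
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


my_chunks = [
    DemoChunk("BM25 ranks documents by term frequency.", doc_id="doc-1"),
    DemoChunk("Transformers use self-attention.", doc_id="doc-2"),
]
# engine.index(chunks=my_chunks, embed_fn=hash_embed)
```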

`ContextEngine.run`

engine.run(
    query:            str,
    metadata_filters: Optional[Dict[str, Any]]  = None,
    max_chunks:       Optional[int]              = None,
    override_budget:  Optional[int]              = None,
    use_history:      bool                       = True,
    record_turn:      bool                       = True,
) -> ContextResult

Purpose: Execute the full pipeline for a single query and return the context prepared for the LLM. This is the primary method most developers will use.

Parameter	Type	Description
`query`	`str`	The user query — English, Arabic, or mixed.
`metadata_filters`	`Dict[str, Any]`	Metadata field filters. Example: `{"category": "tech", "year": 2024}`
`max_chunks`	`int`	Override the maximum number of retrieved chunks.
`override_budget`	`int`	Override the auto-computed context budget (in characters or tokens).
`use_history`	`bool`	Whether to enrich the query with prior conversation context. Default: `True`.
`record_turn`	`bool`	Whether to save this query turn in conversation history. Default: `True`.

Returns: ContextResult — contains the final context string plus detailed per-stage statistics.

result = engine.run(
    query            = "How do LSTM networks work?",
    metadata_filters = {"domain": "deep_learning"},
    override_budget  = 3000,
)

# Use the output
llm_prompt = f"Context:\n{result.context}\n\nQuestion: {result.query_analysis.original_query}"

# Inspect pipeline statistics
stats = result.get_stats()
print(f"Retrieved  : {stats['total_retrieved']} chunks")
print(f"Included   : {stats['total_included']} chunks")
print(f"Latency    : {stats['pipeline_ms']} ms")
print(f"Guard OK   : {stats['guard_passed']}")

`ContextEngine.run_multi`

engine.run_multi(
    queries:       List[str],
    global_budget: Optional[int] = None,
    use_weighted:  bool          = True,
    **kwargs,
) -> str

Purpose: Run the full pipeline over multiple queries in a single call (multi-hop RAG). The global budget is split across the queries, either by detected complexity (use_weighted=True) or equally (use_weighted=False). The resulting contexts are then merged into one unified string.

Parameter	Type	Description
`queries`	`List[str]`	List of query strings to process.
`global_budget`	`int`	Total character/token budget shared across all queries.
`use_weighted`	`bool`	`True` = allocate budget by complexity; `False` = split equally.
`**kwargs`		Additional keyword arguments forwarded to each `run()` call.

Returns: str — a unified context string that concatenates the results of all queries, each labelled with its query index.

combined_context = engine.run_multi(
    queries = [
        "What is the Transformer architecture?",
        "How does the Attention Mechanism work?",
        "What is the difference between BERT and GPT?",
    ],
    global_budget = 6000,
    use_weighted  = True,   # More complex queries receive larger budget slices
)

`ContextEngine.build` (Legacy API)

engine.build(
    query:      str,
    chunks:     List[Tuple],   # [(chunk, score), ...]
    max_chunks: Optional[int] = None,
    max_length: Optional[int] = None,
) -> str

Purpose: Backward-compatible interface for projects that used the old ContextManager API. Accepts raw tuples instead of ScoredChunk objects.

Parameter	Type	Description
`query`	`str`	The user query.
`chunks`	`List[Tuple[Any, float]]`	List of `(chunk_object, relevance_score)` tuples.
`max_chunks`	`int`	Maximum number of chunks to include in context.
`max_length`	`int`	Maximum length of the output context in characters.

Returns: str — the composed context string.

`ContextEngine.compress_context`

engine.compress_context(context: str, target_length: int) -> str

Purpose: Compress an existing context string down to a target length while preserving as much meaning as possible. Useful when the LLM's context window is smaller than the assembled context.

Parameter	Type	Description
`context`	`str`	The original context string to compress.
`target_length`	`int`	The desired maximum length in characters or tokens.

Returns: str — the compressed context.

`ContextEngine.get_context_stats`

engine.get_context_stats(context: str) -> Dict[str, Any]

Purpose: Compute analytical statistics about an already-composed context string.

Parameter	Type	Description
`context`	`str`	The context string to analyse.

Returns: Dict containing:

Key	Description
`total_length`	Total length in characters or tokens
`chunk_count`	Number of distinct chunks in the context
`avg_chunk_length`	Average chunk length
`min_chunk_length` / `max_chunk_length`	Shortest / longest chunk
`template`	The template used for formatting
`length_unit`	`"chars"` or `"tokens"`

`ContextEngine.get_performance_report`

engine.get_performance_report() -> Dict[str, Any]

Purpose: Return a comprehensive performance report aggregated over all queries processed by this engine instance so far.

Returns: Dict containing:

Key	Description
`total_queries`	Total number of queries processed
`avg_response_ms`	Average end-to-end latency in milliseconds
`avg_context_length`	Average length of produced contexts
`avg_chunk_count`	Average number of chunks included per query
`guard_failure_rate`	Fraction of queries where ContextGuard failed (0.0 → 1.0)
`cache`	Cache statistics (present only when cache is enabled)

`ContextEngine.set_conversation`

engine.set_conversation(history: ConversationHistory) -> None

Purpose: Restore a previously saved conversation session. Useful for resuming context after an application restart without losing conversational continuity.

Parameter	Type	Description
`history`	`ConversationHistory`	A previously saved conversation history object.

`ContextEngine.get_history`

engine.get_history() -> Optional[ConversationHistory]

Purpose: Retrieve the current conversation history for serialisation, analysis, or handoff.

Returns: ConversationHistory instance, or None if multi_turn is disabled in config.

`ContextEngine.clear_cache`

engine.clear_cache() -> None

Purpose: Flush the entire cache. Should be called whenever the underlying chunk data changes (e.g. after re-indexing documents) to prevent stale context from being served.

3. ContextEngineConfig — Configuration

`ContextEngineConfig`

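from dataclasses import dataclass, field

@dataclass
class ContextEngineConfig:
    # default_factory gives every ContextEngineConfig its own sub-config objects.
    # A bare instance default would be shared between configs, and Python 3.11+
    # rejects it outright when the sub-config is a regular (non-frozen) dataclass.
    query_analyzer:  QueryAnalyzerConfig = field(default_factory=QueryAnalyzerConfig)
    retriever:       RetrieverConfig     = field(default_factory=RetrieverConfig)
    filter_cfg:      FilterConfig        = field(default_factory=FilterConfig)
    ranking:         RankingConfig       = field(default_factory=RankingConfig)
    composer:        ComposerConfig      = field(default_factory=ComposerConfig)
    guard:           GuardConfig         = field(default_factory=GuardConfig)
    cache:           CacheConfig         = field(default_factory=CacheConfig)
    language:        str  = "auto"       # "auto" | "ar" | "en"
    enable_adaptive: bool = True         # Enable adaptive learning
    multi_turn:      bool = True         # Enable multi-turn conversation support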

Purpose: The unified master configuration object passed to ContextEngine. Each sub-component has its own isolated, fully-customisable config block.

`QueryAnalyzerConfig` — Query Analysis Settings

Controls how queries are analysed and how context budgets are assigned.

Parameter	Type	Default	Description
`budget_low`	`int`	`800`	Context budget (chars) for simple queries
`budget_medium`	`int`	`2000`	Context budget for medium-complexity queries
`budget_high`	`int`	`4000`	Context budget for complex queries
`custom_entities`	`List[str]`	`[]`	Domain-specific entity names to recognise in queries
`arabic_threshold`	`float`	`0.3`	Minimum fraction of Arabic characters to classify a query as Arabic
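The sketch below shows how `arabic_threshold` and the three budget tiers could be used. Two details are assumptions, not the library's documented rule: "Arabic characters" is taken to mean code points in the basic Arabic block (U+0600 to U+06FF), and the fraction is computed over letters only. `arabic_ratio`, `detect_language` and `budget_for` are hypothetical helpers.

```python
from typing import Dict

# Default budgets from QueryAnalyzerConfig, in characters.
DEFAULT_BUDGETS: Dict[str, int] = {"LOW": 800, "MEDIUM": 2000, "HIGH": 4000}


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters in the basic Arabic block (assumption)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06ff")
    return arabic / len(letters)


def detect_language(text: str, arabic_threshold: float = 0.3) -> str:
    """Return "ar" when the Arabic share meets the threshold, otherwise "en"."""
    return "ar" if arabic_ratio(text) >= arabic_threshold else "en"


def budget_for(complexity: str, budgets: Dict[str, int] = DEFAULT_BUDGETS) -> int:
    """Map a complexity label (LOW / MEDIUM / HIGH) to its configured budget."""
    return budgets[complexity]
```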

`RetrieverConfig` — Hybrid Retrieval Settings

Controls the weights and behaviour of the three retrieval methods.

Parameter	Type	Default	Description
`top_k`	`int`	`20`	Number of chunks to retrieve before filtering
`vector_weight`	`float`	`0.60`	Weight of semantic (vector) search results
`keyword_weight`	`float`	`0.30`	Weight of BM25 keyword search results
`metadata_weight`	`float`	`0.10`	Weight of metadata filtering results
`use_vector`	`bool`	`True`	Enable / disable semantic search
`use_keyword`	`bool`	`True`	Enable / disable BM25 search
`use_metadata`	`bool`	`False`	Enable / disable metadata-based filtering
`min_vector_score`	`float`	`0.0`	Minimum cosine similarity to accept a vector result

`FilterConfig` — Filtering Layer Settings

Controls the multi-stage chunk filtering pipeline.

Parameter	Type	Default	Description
`dedup_method`	`str`	`"hash"`	Deduplication strategy: `"hash"`, `"prefix"`, or `"semantic"`
`min_chunk_length`	`int`	`20`	Chunks shorter than this are discarded
`max_chunk_length`	`int`	`4000`	Chunks longer than this are discarded
`semantic_sim_threshold`	`float`	`0.92`	Cosine similarity threshold for near-duplicate removal
`noise_patterns`	`List[str]`	(built-in)	Regex patterns that flag a chunk as noise (page numbers, dividers, etc.)

`RankingConfig` — Ranking Score Weights

Controls the weight of each factor in the composite ranking score.

Parameter	Type	Default	Description
`weight_vector_score`	`float`	`0.40`	Weight of semantic similarity in composite score
`weight_keyword_score`	`float`	`0.25`	Weight of keyword matching score
`weight_source_quality`	`float`	`0.20`	Weight of source quality signal
`weight_recency`	`float`	`0.10`	Weight of information recency signal
`weight_position`	`float`	`0.05`	Weight of chunk position within its source document
`max_chunks_to_rank`	`int`	`15`	Maximum number of chunks passed to the Composer after ranking

`ComposerConfig` — Context Formatting Settings

Controls how the final context string is formatted and truncated.

Parameter	Type	Default	Description
`max_context_length`	`int`	`2000`	Maximum length of the output context string
`separator`	`str`	`"\n---\n"`	Separator inserted between chunks
`template`	`str`	`"arabic"`	Formatting template: `"arabic"`, `"english"`, `"minimal"`, or `"structured"`
`include_scores`	`bool`	`False`	Append ranking scores to each chunk in the context
`include_metadata`	`bool`	`False`	Append metadata fields to each chunk
`include_sources`	`bool`	`True`	Include source attribution for each chunk
`group_by_source`	`bool`	`False`	Group chunks from the same source document together

`GuardConfig` — Context Guard Settings

Controls the validation checks applied to the assembled context.

Parameter	Type	Default	Description
`enabled`	`bool`	`True`	Enable / disable the guard entirely
`min_context_length`	`int`	`10`	Contexts shorter than this fail validation
`max_context_length`	`int`	`8000`	Contexts longer than this trigger a warning (does not fail)
`check_relevance`	`bool`	`False`	Semantic relevance check (requires an `embed_fn`)
`relevance_threshold`	`float`	`0.10`	Minimum cosine similarity between query and context

`CacheConfig` — LRU Cache Settings

Parameter	Type	Default	Description
`enabled`	`bool`	`True`	Enable / disable the cache
`max_size`	`int`	`256`	Maximum number of cached query results
`ttl_seconds`	`int`	`300`	Time-to-live for each cached entry, in seconds

4. Data Models

`ContextResult`

The final output object returned by engine.run().

@dataclass
class ContextResult:
    context:            str                  # Context string ready for LLM
    query_analysis:     QueryAnalysis        # Full query analysis result
    chunks_used:        List[ScoredChunk]    # Chunks actually included in context
    total_retrieved:    int                  # Chunks retrieved before filtering
    total_after_filter: int                  # Chunks remaining after filtering
    total_after_rank:   int                  # Chunks remaining after ranking
    total_included:     int                  # Chunks included in the final context
    context_length:     int                  # Context length in characters
    pipeline_ms:        float                # Total end-to-end latency in milliseconds
    guard_passed:       bool                 # Whether ContextGuard validation passed
    guard_warnings:     List[str]            # Guard warning messages (may be empty)
    metadata:           Dict[str, Any]       # Extra metadata (e.g. {"from_cache": True})

`ContextResult.get_stats`

result.get_stats() -> Dict[str, Any]

Purpose: Return a structured summary of all pipeline metrics in a single dictionary — convenient for logging, monitoring dashboards, or debugging.

Returns: Dict with keys: pipeline_ms, total_retrieved, total_after_filter, total_after_rank, total_included, context_length, guard_passed, guard_warnings, query_intent, query_complexity, context_budget.

`QueryAnalysis`

The result of query analysis. Used by every stage of the pipeline to make informed decisions.

@dataclass
class QueryAnalysis:
    original_query:  str
    normalized:      str             # Cleaned text (diacritics removed, whitespace normalised)
    intent:          QueryIntent     # FACTUAL | REASONING | CODE | COMPARATIVE | ...
    complexity:      QueryComplexity # LOW | MEDIUM | HIGH
    keywords:        List[str]       # Extracted keywords, ranked by relevance
    entities:        List[str]       # Named entities detected in the query
    is_arabic:       bool
    language:        str             # "ar" | "en"
    question_type:   str             # "wh-question" | "yes/no" | "open"
    context_budget:  int             # Recommended context size in characters

`QueryAnalysis.to_dict`

analysis.to_dict() -> Dict[str, Any]

Purpose: Serialise the analysis result to a JSON-compatible dictionary — useful for logging pipelines, analytics stores, or passing between services.

`ScoredChunk`

A unified wrapper around any chunk object, enriched with all retrieval and ranking scores.

@dataclass
class ScoredChunk:
    chunk:            Any            # Original chunk object (any type)
    vector_score:     float          # Cosine similarity from vector search
    keyword_score:    float          # BM25 score (normalised 0-1)
    metadata_score:   float          # Metadata filter match score
    source_quality:   SourceQuality  # HIGH | MEDIUM | LOW | UNKNOWN
    recency_score:    float          # Recency signal (0-1)
    composite_score:  float          # Final weighted ranking score
    retrieval_method: RetrievalMethod
    retrieval_rank:   int            # Original rank before re-ranking

Available properties:

Property	Type	Description
`.text`	`str`	Chunk text — reads from `text` or `page_content` attribute
`.doc_id`	`str`	Source document identifier
`.metadata`	`Dict`	Chunk metadata dictionary
`.source`	`str`	Source name or path
`.content_hash`	`str`	SHA-256 hash of normalised content
`.char_count`	`int`	Character count of the chunk text

`ScoredChunk.to_dict`

chunk.to_dict() -> Dict[str, Any]

Purpose: Serialise the scored chunk to a dictionary for logging or diagnostics. Includes a 120-character text preview.

Available Enums

Enum	Values	Used For
`QueryIntent`	`FACTUAL`, `REASONING`, `COMPARATIVE`, `PROCEDURAL`, `CODE`, `SUMMARIZE`, `UNKNOWN`	Query intent classification
`QueryComplexity`	`LOW`, `MEDIUM`, `HIGH`	Query complexity, drives budget allocation
`SourceQuality`	`HIGH=3`, `MEDIUM=2`, `LOW=1`, `UNKNOWN=0`	Source trustworthiness signal for ranking
`RetrievalMethod`	`VECTOR`, `KEYWORD`, `METADATA`, `HYBRID`	Which retriever produced a result

5. HybridRetriever — Hybrid Retrieval

`HybridRetriever.retrieve`

retriever.retrieve(
    query:   str,
    top_k:   Optional[int]           = None,
    filters: Optional[Dict[str, Any]] = None,
) -> List[ScoredChunk]

Purpose: Run hybrid retrieval (Vector + BM25 + Metadata) and fuse results using Reciprocal Rank Fusion (RRF). Chunks that appear across multiple retrieval methods automatically receive a score bonus, promoting diverse high-quality results.

RRF Formula:

rrf_score(chunk) = Σ  1 / (k + rank(chunk, list_i))     where k = 60

Parameter	Type	Description
`query`	`str`	The search query
`top_k`	`int`	Number of results to return (overrides `config.retriever.top_k`)
`filters`	`Dict[str, Any]`	Metadata field filters e.g. `{"category": "finance"}`

Returns: List[ScoredChunk] sorted descending by composite score.
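Here is a minimal sketch of plain Reciprocal Rank Fusion with k = 60, following the formula above. Ranks are 1-based, which is an assumption. The per-method weights in `RetrieverConfig` are left out because this document does not say how they combine with RRF. `rrf_fuse` is a hypothetical helper that works on lists of chunk ids.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(ranked_lists: Sequence[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over every list it appears in (rank is 1-based),
    so ids returned by several retrievers accumulate a larger score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c3", "c1", "c4"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

In this example, `c1` and `c3` each appear in both lists, so they finish ahead of `c2` and `c4`, which were each found by only one retriever.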

`VectorRetriever.index_chunks`

vector_retriever.index_chunks(
    chunks:   List[Any],
    embed_fn: Callable[[str], np.ndarray],
) -> None

Purpose: Build the in-memory embedding index for semantic search.

Note: This is called internally by ContextEngine.index() — direct calls are rarely needed.

`KeywordRetriever.index_chunks`

keyword_retriever.index_chunks(chunks: List[Any]) -> None

Purpose: Build the BM25 index used for keyword-based retrieval. Works without any external dependencies and performs well on technical terminology.
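For reference, this is the standard Okapi BM25 score that a keyword retriever of this kind computes, written in pure Python. k1 = 1.5 and b = 0.75 are common textbook defaults, not values taken from this library. The whitespace tokenizer is also an assumption, as is dividing by the top score to map results into 0-1 (the normalisation that `ScoredChunk.keyword_score` suggests). `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every doc against the query with Okapi BM25, then scale into [0, 1]."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: how many docs contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = query.lower().split()

    raw: List[float] = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # IDF variant with +1 inside the log so it never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalisation: long docs need more matches for the same score.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        raw.append(score)

    top = max(raw, default=0.0)
    return [s / top if top > 0 else 0.0 for s in raw]
```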

6. Pipeline — Filtering, Ranking & Composition

`ChunkFilter.filter`

chunk_filter.filter(chunks: List[ScoredChunk]) -> List[ScoredChunk]

Purpose: Apply the multi-stage filtering pipeline to a list of retrieved chunks:

1. Noise removal: discards page numbers, blank lines, and separator lines matched by regex patterns.
2. Length filtering: discards chunks shorter than min_chunk_length or longer than max_chunk_length.
3. Hash-based deduplication: removes exact duplicate chunks using SHA-256 content hashes.

Parameter	Type	Description
`chunks`	`List[ScoredChunk]`	The raw list of retrieved chunks

Returns: List[ScoredChunk] — the cleaned, deduplicated subset.
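The three filtering stages can be sketched over plain strings. The noise regexes are hypothetical examples, because the library's built-in `noise_patterns` are not listed here. "Normalised content" is assumed to mean lowercased text with collapsed whitespace, hashed with SHA-256. `filter_chunks` is a hypothetical helper.

```python
import hashlib
import re
from typing import List, Optional, Sequence

# Hypothetical noise patterns: bare page numbers, "Page N" lines, divider lines.
NOISE_PATTERNS = [r"^\s*\d+\s*$", r"^\s*page\s+\d+\s*$", r"^\s*[-=_*]{3,}\s*$"]


def filter_chunks(
    texts: List[str],
    min_len: int = 20,
    max_len: int = 4000,
    noise_patterns: Optional[Sequence[str]] = None,
) -> List[str]:
    patterns = [re.compile(p, re.IGNORECASE) for p in (noise_patterns or NOISE_PATTERNS)]
    seen_hashes = set()
    kept: List[str] = []
    for text in texts:
        # 1. Noise removal: drop blank chunks and chunks matching a noise pattern.
        if not text.strip() or any(p.match(text) for p in patterns):
            continue
        # 2. Length filtering.
        if not (min_len <= len(text) <= max_len):
            continue
        # 3. Exact dedup on a hash of normalised content; first occurrence wins.
        normalised = " ".join(text.lower().split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append(text)
    return kept
```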

`ChunkFilter.filter_with_embeddings`

chunk_filter.filter_with_embeddings(
    chunks:     List[ScoredChunk],
    embeddings: List[Optional[np.ndarray]],
) -> List[ScoredChunk]

Purpose: Same as filter() with an additional semantic near-duplicate detection pass. Chunks whose cosine similarity exceeds semantic_sim_threshold relative to an already-kept chunk are removed.

Parameter	Type	Description
`chunks`	`List[ScoredChunk]`	Retrieved chunks to filter
`embeddings`	`List[Optional[np.ndarray]]`	Pre-computed embeddings aligned with the chunk list

Returns: List[ScoredChunk] after all filtering stages.
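Below is a sketch of the semantic near-duplicate pass. It assumes a greedy scan in rank order, where each chunk is compared only against chunks already kept, and where chunks with no embedding (`None`) are kept as-is. `semantic_dedup` is a hypothetical helper that returns the indices to keep.

```python
from typing import List, Optional

import numpy as np


def semantic_dedup(
    embeddings: List[Optional[np.ndarray]],
    threshold: float = 0.92,
) -> List[int]:
    """Return indices of chunks to keep, dropping near-duplicates of earlier kept chunks."""
    kept_idx: List[int] = []
    kept_vecs: List[np.ndarray] = []
    for i, emb in enumerate(embeddings):
        if emb is None:
            kept_idx.append(i)  # No embedding to compare against: keep it.
            continue
        norm = np.linalg.norm(emb)
        unit = emb / norm if norm > 0 else emb
        # Cosine similarity against every vector kept so far.
        if any(float(np.dot(unit, kv)) > threshold for kv in kept_vecs):
            continue
        kept_idx.append(i)
        kept_vecs.append(unit)
    return kept_idx
```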

`CompositeRanker.rank`

ranker.rank(
    chunks:   List[ScoredChunk],
    analysis: QueryAnalysis,
    max_n:    Optional[int] = None,
) -> List[ScoredChunk]

Purpose: Re-rank chunks by a weighted composite score that combines five signals: semantic similarity, keyword matching, source quality, recency, and document position.

Parameter	Type	Description
`chunks`	`List[ScoredChunk]`	Filtered chunks from `ChunkFilter`
`analysis`	`QueryAnalysis`	Query analysis result (provides keywords for keyword boost)
`max_n`	`int`	Maximum number of chunks to return after ranking

Returns: List[ScoredChunk] sorted descending by composite_score, capped at max_n.
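The composite score is described as a weighted sum of five signals. The sketch below uses the `RankingConfig` default weights and makes three assumptions: every signal is already scaled to 0-1, `SourceQuality` is normalised by dividing its integer value by 3, and position is scored as 1 / (1 + index within the source document). The library may normalise differently. `Signals`, `composite_score` and `rank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Default weights from RankingConfig (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "vector": 0.40, "keyword": 0.25, "source_quality": 0.20,
    "recency": 0.10, "position": 0.05,
}


@dataclass
class Signals:
    vector_score: float
    keyword_score: float
    source_quality: int   # SourceQuality value: 0 (UNKNOWN) to 3 (HIGH)
    recency_score: float
    position_index: int   # 0 = first chunk of its source document


def composite_score(s: Signals, w: Dict[str, float] = WEIGHTS) -> float:
    return (
        w["vector"] * s.vector_score
        + w["keyword"] * s.keyword_score
        + w["source_quality"] * (s.source_quality / 3)
        + w["recency"] * s.recency_score
        + w["position"] * (1.0 / (1 + s.position_index))
    )


def rank(chunks: List[Signals], max_n: int = 15) -> List[Signals]:
    """Sort descending by composite score and keep the top max_n."""
    return sorted(chunks, key=composite_score, reverse=True)[:max_n]
```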

`SmartContextComposer.compose`

composer.compose(
    chunks:   List[ScoredChunk],
    analysis: QueryAnalysis,
    budget:   int,
) -> str

Purpose: Format ranked chunks into a structured context string and ensure it never exceeds the allowed budget. Uses greedy inclusion with binary-search truncation as a safety clamp.

Parameter	Type	Description
`chunks`	`List[ScoredChunk]`	Ranked chunks from `CompositeRanker`
`analysis`	`QueryAnalysis`	Query analysis (selects template language)
`budget`	`int`	Maximum context length in characters or tokens

Returns: str — the fully formatted context string ready to be injected into an LLM prompt.
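This sketch shows greedy inclusion followed by a binary-search safety clamp. It assumes chunks arrive already ranked and uses the `"english"` header and `[Source: X]` prefix from the template table, plus the default `"\n---\n"` separator. It skips any chunk that would overflow the budget rather than stopping at the first one, which is also an assumption. The clamp finds the longest prefix whose measured length fits. With `len` as the counter that prefix is trivially `budget` characters, but the same search also works with a token counter, provided counts never decrease as the prefix grows. `compose` and `clamp` are hypothetical stand-ins, not `SmartContextComposer` itself.

```python
from typing import Callable, List, Tuple

HEADER = "📚 Retrieved Information:"
SEPARATOR = "\n---\n"


def clamp(text: str, budget: int, counter: Callable[[str], int] = len) -> str:
    """Binary-search the longest prefix of `text` whose measured length fits `budget`.

    Assumes counter(prefix) never decreases as the prefix grows.
    """
    if counter(text) <= budget:
        return text
    lo, hi = 0, len(text)  # invariant: text[:lo] fits
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if counter(text[:mid]) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo]


def compose(chunks: List[Tuple[str, str]], budget: int,
            counter: Callable[[str], int] = len) -> str:
    """chunks: (source, text) pairs in ranked order."""
    context = HEADER
    for source, text in chunks:
        candidate = context + SEPARATOR + f"[Source: {source}]\n{text}"
        # Greedy: keep the chunk only if the whole context still fits.
        if counter(candidate) <= budget:
            context = candidate
    # Safety clamp, in case the header alone (or a custom counter) overshoots.
    return clamp(context, budget, counter)
```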

`SmartContextComposer.get_stats`

composer.get_stats(context: str) -> Dict[str, Any]

Purpose: Compute structural statistics about an already-composed context string.

Returns: Dict with: total_length, chunk_count, avg_chunk_length, min_chunk_length, max_chunk_length, template, length_unit.

7. ContextGuard — Context Validator

`ContextGuard.validate`

guard.validate(
    context:  str,
    query:    str,
    analysis: QueryAnalysis,
) -> Tuple[bool, List[str]]

Purpose: Validate the assembled context before it is delivered to the LLM. Runs up to five sequential checks and returns a pass/fail result along with human-readable warnings.

Checks performed:

#	Check	Configurable Via
1	Context is not empty	`check_empty`
2	Context meets minimum length	`min_context_length`
3	Context below maximum length (warning only)	`max_context_length`
4	At least one source attribution present	`check_source_coverage`
5	Semantic relevance to query (optional)	`check_relevance` + `relevance_threshold`

Parameter	Type	Description
`context`	`str`	The assembled context string
`query`	`str`	The original user query
`analysis`	`QueryAnalysis`	Query analysis result

Returns: Tuple[bool, List[str]]

bool — True = context is valid, False = validation failed
List[str] — list of warning/failure messages (empty on full pass)

# Standalone usage example
import logging

logger = logging.getLogger(__name__)

guard = ContextGuard(config)
passed, warnings = guard.validate(context, query, analysis)

if not passed:
    for w in warnings:
        logger.warning("ContextGuard: %s", w)
    # Fallback: retrieve more chunks or relax the guard settings

8. AdaptiveLearningSystem — Adaptive Learning

`AdaptiveLearningSystem.suggest_budget`

learning.suggest_budget(analysis: QueryAnalysis) -> int

Purpose: Suggest an optimised context budget by blending the rule-based estimate from QueryAnalysis with the rolling average of actual context consumption. If the engine consistently uses far less than its budget, this method automatically reduces the allocation (with a 30% headroom), improving throughput.

Parameter	Type	Description
`analysis`	`QueryAnalysis`	Current query analysis containing the base budget estimate

Returns: int — the recommended budget in characters (or tokens if a token counter is in use).
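One way to implement the behaviour described above is sketched here. It keeps a sliding window of actual context lengths. When the window's average plus 30% headroom falls below the rule-based budget, it returns that smaller value. The "take the minimum" rule, the window of 100 and the warm-up threshold of 5 samples are all assumptions, and the library may blend the two estimates differently. `BudgetLearner` is a hypothetical class.

```python
from collections import deque
from typing import Deque


class BudgetLearner:
    """Hypothetical sketch: shrink budgets that are consistently under-used."""

    def __init__(self, window: int = 100, headroom: float = 0.30, min_samples: int = 5):
        self.usage: Deque[int] = deque(maxlen=window)  # recent actual context lengths
        self.headroom = headroom
        self.min_samples = min_samples

    def record(self, context_length: int) -> None:
        self.usage.append(context_length)

    def suggest_budget(self, base_budget: int) -> int:
        # Not enough history yet: trust the rule-based estimate.
        if len(self.usage) < self.min_samples:
            return base_budget
        avg = sum(self.usage) / len(self.usage)
        learned = int(round(avg * (1 + self.headroom)))
        # Only ever shrink the allocation; never exceed the rule-based budget.
        return min(base_budget, learned)
```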

`AdaptiveLearningSystem.get_performance_report`

learning.get_performance_report() -> Dict[str, Any]

Purpose: Return a statistical performance summary computed over the most recent sliding window of queries (default window: 100 queries).

Returns: Dict with: total_queries, avg_response_ms, avg_context_length, avg_chunk_count, guard_failure_rate, samples.

9. ContextCache — Smart Cache

Managed automatically by ContextEngine — use directly only when you need explicit control.

`ContextCache.get`

cache.get(key: str) -> Optional[str]

Purpose: Retrieve a cached context string by its key. Automatically checks TTL and returns None for expired or missing entries.

Returns: str (the context) or None.

`ContextCache.put`

cache.put(key: str, context: str) -> None

Purpose: Store a context string under the given key. Applies LRU eviction: when max_size is exceeded, the least recently used entry is removed.
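Here is a minimal sketch of an LRU cache with a per-entry TTL, built on `collections.OrderedDict` and matching the `get`/`put` behaviour described in this section. `SimpleTTLCache` is a hypothetical name, and its internals are an assumption rather than `ContextCache`'s actual implementation. The clock can be injected so that expiry can be tested deterministically.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class SimpleTTLCache:
    def __init__(self, max_size: int = 256, ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic):
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, context = item
        if self._clock() - stored_at > self.ttl:
            del self._data[key]  # Expired: drop it and report a miss.
            return None
        self._data.move_to_end(key)  # Mark as most recently used.
        return context

    def put(self, key: str, context: str) -> None:
        self._data[key] = (self._clock(), context)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # Evict the least recently used entry.
```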

`ContextCache.make_key`

cache.make_key(
    query:   str,
    filters: Optional[Dict] = None,
) -> str

Purpose: Generate a stable, deterministic cache key from a query string and its optional metadata filters using an MD5 hash. Both query and filters contribute to the key, so the same query with different filters produces different keys.

Parameter	Type	Description
`query`	`str`	The user query (normalised to lowercase before hashing)
`filters`	`Dict`	Optional metadata filters included in the key

Returns: str — 32-character hex MD5 digest.
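The key scheme can be sketched with `hashlib.md5` applied to the lowercased query plus a canonical JSON encoding of the filters. Sorting the filter keys (`sort_keys=True`) is what keeps the key stable regardless of dict insertion order. The exact serialisation the library uses is an assumption, and `make_key` here is a standalone stand-in for `ContextCache.make_key`.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def make_key(query: str, filters: Optional[Dict[str, Any]] = None) -> str:
    """Deterministic cache key: MD5 of the lowercased query and sorted filters."""
    payload = {"q": query.strip().lower(), "f": filters or {}}
    raw = json.dumps(payload, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()
```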

`ContextCache.stats` (property)

cache.stats -> Dict[str, Any]

Purpose: Return live cache health metrics without any side effects.

print(cache.stats)
# {
#   'size': 45, 'max_size': 256,
#   'hits': 120, 'misses': 30,
#   'hit_rate': '80.0%', 'ttl_sec': 300
# }

`ContextCache.clear`

cache.clear() -> None

Purpose: Flush all cached entries. Call this after re-indexing documents to prevent stale contexts from being served.

10. Budget Allocation

Used by engine.run_multi() to split a global budget across multiple queries.

`EqualBudgetAllocation.allocate`

EqualBudgetAllocation().allocate(
    total_budget: int,
    n_queries:    int,
) -> List[int]

Purpose: Divide the total budget equally among all queries. Any remainder from integer division is added to the first query's allocation.

Returns: List[int] — per-query budget allocations that sum to total_budget.
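The equal-split rule fits in a few lines. `allocate_equal` is a standalone stand-in for `EqualBudgetAllocation().allocate`. Returning an empty list when `n_queries <= 0` is an assumption.

```python
from typing import List


def allocate_equal(total_budget: int, n_queries: int) -> List[int]:
    """Split total_budget evenly; the integer-division remainder goes to the first query."""
    if n_queries <= 0:
        return []
    share, remainder = divmod(total_budget, n_queries)
    budgets = [share] * n_queries
    budgets[0] += remainder
    return budgets
```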

`WeightedBudgetAllocation.allocate`

WeightedBudgetAllocation(weights=[0.5, 0.3, 0.2]).allocate(
    total_budget: int,
    n_queries:    int,
) -> List[int]

Purpose: Distribute the budget according to pre-defined weights. Useful when you know in advance which queries are more important.

Parameter	Type	Description
`weights`	`List[float]`	Per-query importance weights (passed in `__init__`)
`total_budget`	`int`	Total budget to distribute
`n_queries`	`int`	Must equal `len(weights)`, otherwise raises `ValueError`

Returns: List[int] — per-query budgets proportional to the weights.
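Below is a sketch of proportional allocation with the documented `ValueError` check. Two details are assumptions: weights are normalised by their sum, and each share is rounded down with the leftover units going to the first query (mirroring the equal split) so that the result sums to the total. `allocate_weighted` is a hypothetical stand-in for `WeightedBudgetAllocation(weights).allocate`.

```python
from typing import List, Sequence


def allocate_weighted(weights: Sequence[float], total_budget: int, n_queries: int) -> List[int]:
    """Split total_budget in proportion to weights."""
    if n_queries != len(weights):
        raise ValueError(f"expected {len(weights)} queries, got {n_queries}")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    budgets = [int(total_budget * w / total_weight) for w in weights]
    # Rounding down can leave a few units unassigned; hand them to the first query.
    budgets[0] += total_budget - sum(budgets)
    return budgets
```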

`ComplexityBasedAllocation.allocate`

ComplexityBasedAllocation().allocate(
    total_budget: int,
    n_queries:    int,
    analyses:     Optional[List[QueryAnalysis]] = None,
) -> List[int]

Purpose: Automatically distribute budget based on each query's detected complexity. No manual weight configuration needed — the engine derives weights from QueryAnalysis.complexity.

Complexity	Budget Weight
`LOW`	1.0×
`MEDIUM`	1.5×
`HIGH`	2.5×

Falls back to EqualBudgetAllocation if analyses is None or its length does not match n_queries.
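The complexity weights in the table above can be applied as follows. For brevity, this sketch takes complexity labels as strings instead of `QueryAnalysis` objects, and it gives rounding leftovers to the first query. Both choices are assumptions. `allocate_by_complexity` is a hypothetical stand-in for `ComplexityBasedAllocation().allocate`.

```python
from typing import List, Optional, Sequence

COMPLEXITY_WEIGHTS = {"LOW": 1.0, "MEDIUM": 1.5, "HIGH": 2.5}


def allocate_by_complexity(total_budget: int, n_queries: int,
                           complexities: Optional[Sequence[str]] = None) -> List[int]:
    """Weight each query by its complexity; fall back to an equal split on bad input."""
    if n_queries <= 0:
        return []
    if complexities is None or len(complexities) != n_queries:
        share, remainder = divmod(total_budget, n_queries)
        return [share + remainder] + [share] * (n_queries - 1)
    weights = [COMPLEXITY_WEIGHTS[c] for c in complexities]
    total_weight = sum(weights)
    budgets = [int(total_budget * w / total_weight) for w in weights]
    budgets[0] += total_budget - sum(budgets)  # Hand rounding leftovers to the first query.
    return budgets
```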

11. ConversationHistory — Multi-turn Support

`ConversationHistory.add_turn`

history.add_turn(
    query:    str,
    answer:   str        = "",
    context:  str        = "",
    analysis: Optional[QueryAnalysis] = None,
) -> ConversationTurn

Purpose: Append a new conversation turn (question + answer + context used) to the history. Automatically enforces the max_turns limit by discarding the oldest turn when the limit is reached.

Parameter	Type	Description
`query`	`str`	The user's question
`answer`	`str`	The LLM's answer (optional — can be added after generation)
`context`	`str`	The context string that was fed to the LLM
`analysis`	`QueryAnalysis`	The query analysis for this turn

Returns: The newly created ConversationTurn object.

`ConversationHistory.get_recent_context`

history.get_recent_context(n: int = 3) -> str

Purpose: Extract the last n Q&A pairs as a plain-text string, suitable for inclusion in a retrieval query to capture conversational context.

Returns: str formatted as:

Q: First question
A: First answer

Q: Second question
A: Second answer

`ConversationHistory.get_enriched_query`

history.get_enriched_query(
    current_query: str,
    n:             int = 2,
) -> str

Purpose: Enrich the current query by prepending recent conversation context. This resolves implicit references and pronouns ("it", "that", "the previous one") so the retriever finds the correct chunks even when the query is underspecified.

Parameter	Type	Description
`current_query`	`str`	The user's current question
`n`	`int`	Number of prior turns to include as context

Returns: str — the enriched query, or the original query unchanged if there is no prior history.

# Turn 1
engine.run("What is the Transformer model?")

# Turn 2 — "it" is ambiguous without history
engine.run("How does it handle long sequences?")
# Internally enriched to:
# "[Previous conversation:
#   Q: What is the Transformer model?
#   A: ...]
#  Current question: How does it handle long sequences?"
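`MiniHistory` is a hypothetical, simplified class that demonstrates the three behaviours described in this section: a bounded turn list (via `deque(maxlen=...)`), the `Q:`/`A:` formatting of `get_recent_context`, and query enrichment in the format shown in the example above. The `max_turns` default of 10 and the exact enrichment wording are assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Turn:
    query: str
    answer: str = ""
    context: str = ""


class MiniHistory:
    def __init__(self, max_turns: int = 10):
        # The deque drops the oldest turn automatically once max_turns is reached.
        self.turns: Deque[Turn] = deque(maxlen=max_turns)

    def add_turn(self, query: str, answer: str = "", context: str = "") -> Turn:
        turn = Turn(query, answer, context)
        self.turns.append(turn)
        return turn

    def get_recent_context(self, n: int = 3) -> str:
        if n <= 0:
            return ""
        recent = list(self.turns)[-n:]
        return "\n\n".join(f"Q: {t.query}\nA: {t.answer}" for t in recent)

    def get_enriched_query(self, current_query: str, n: int = 2) -> str:
        history = self.get_recent_context(n)
        if not history:
            return current_query  # No prior turns: leave the query unchanged.
        return f"[Previous conversation:\n{history}]\nCurrent question: {current_query}"
```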

12. Utility Functions

`char_counter`

from fennec_community.context import char_counter

char_counter(text: str) -> int

Purpose: Count the number of characters in a string. This is the default length measurement function used throughout the engine when no custom length_counter is provided.

Returns: int — character count, or 0 for empty / None input.

`make_token_counter`

from fennec_community.context import make_token_counter

token_counter = make_token_counter(tokenizer)

Purpose: Create a token-based length counter from any tokenizer that implements .encode(text). Pass the returned function as length_counter to ContextEngine to make all budget limits operate in tokens instead of characters — essential when targeting a model with a token-count window.

Parameter	Type	Description
`tokenizer`	Any tokenizer	Must expose a `.encode(text)` method that returns a sequence

Returns: LengthCounter — a Callable[[str], int] that returns token count.

from transformers import AutoTokenizer
from fennec_community.context import make_token_counter, ContextEngine

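# Use the tokenizer of the model that will consume the context; BERT is just an example.
# Hugging Face encode() adds special tokens such as [CLS]/[SEP] by default, so counts may include them.
tokenizer     = AutoTokenizer.from_pretrained("bert-base-uncased")
token_counter = make_token_counter(tokenizer)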

engine = ContextEngine(length_counter=token_counter)
# All budget values now interpreted as token counts
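Both utilities have the same shape, a `Callable[[str], int]`, which is why either can be passed as `length_counter`. The sketch below is a hypothetical re-implementation for illustration only. The `_sketch` suffixes and the toy `WhitespaceTokenizer` are not part of the library, and real subword tokenizers split text very differently from whitespace.

```python
from typing import Any, Callable, List, Optional

LengthCounter = Callable[[str], int]


def char_counter_sketch(text: Optional[str]) -> int:
    """Character count, 0 for empty or None input (mirrors the documented behavior)."""
    return len(text) if text else 0


def make_token_counter_sketch(tokenizer: Any) -> LengthCounter:
    """Wrap any object with .encode(text) -> sequence into a token-counting function."""
    def count(text: Optional[str]) -> int:
        # Assumption: empty/None input counts as 0 tokens, like char_counter
        return len(tokenizer.encode(text)) if text else 0
    return count


class WhitespaceTokenizer:
    """Toy tokenizer for this example only: splits on whitespace."""
    def encode(self, text: str) -> List[str]:
        return text.split()


count_tokens = make_token_counter_sketch(WhitespaceTokenizer())
print(char_counter_sketch("What is machine learning?"))  # 25
print(count_tokens("What is machine learning?"))          # 4
```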

`build_query_analyzer`

from fennec_community.context import build_query_analyzer

build_query_analyzer(
    config:   ContextEngineConfig,
    strategy: Optional[QueryAnalyzerStrategy] = None,
) -> QueryAnalyzerStrategy

Purpose: Factory function that constructs a query analyzer. If a custom strategy is provided (e.g. an LLM-powered analyzer), it is returned as-is. Otherwise, a RuleBasedQueryAnalyzer is built from the provided config. Use this when you want to create and test an analyzer independently before wiring it into the engine.

Parameter	Type	Description
`config`	`ContextEngineConfig`	Engine configuration
`strategy`	`QueryAnalyzerStrategy`	Optional custom analyzer to use instead of the default

Returns: QueryAnalyzerStrategy instance ready to call .analyze(query).
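The factory's documented rule can be sketched in a few lines. `AnalyzerLike`, `AnalyzerSettings`, `RuleBasedSketch` and `build_analyzer` are hypothetical names standing in for the real interface, configuration and default analyzer. Only the branching logic follows the description above: a supplied strategy is returned unchanged, and otherwise a rule-based default is built from the configuration.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class AnalyzerLike(Protocol):
    """Hypothetical minimal analyzer interface: anything with .analyze(query)."""
    def analyze(self, query: str) -> dict: ...


@dataclass
class AnalyzerSettings:
    """Hypothetical stand-in for the engine configuration."""
    budget_low: int = 800


class RuleBasedSketch:
    """Hypothetical default analyzer built from the settings."""
    def __init__(self, settings: AnalyzerSettings) -> None:
        self.settings = settings

    def analyze(self, query: str) -> dict:
        return {"keywords": query.lower().split(), "budget": self.settings.budget_low}


def build_analyzer(settings: AnalyzerSettings,
                   strategy: Optional[AnalyzerLike] = None) -> AnalyzerLike:
    # A custom strategy wins and is returned as-is; otherwise build the default.
    if strategy is not None:
        return strategy
    return RuleBasedSketch(settings)


analyzer = build_analyzer(AnalyzerSettings())
print(analyzer.analyze("What is Python?"))
```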

13. Reference Tables

Context Templates

Template	Header	Chunk Format	Best For
`"arabic"`	`📚 المعلومات المسترجعة:`	`[المصدر: X]\nالنص`	Arabic-language applications
`"english"`	`📚 Retrieved Information:`	`[Source: X]\nText`	English-language applications
`"minimal"`	(none)	Raw text only	Compact prompts, token-tight scenarios
`"structured"`	`=== CONTEXT START ===`	`--- [N] Source ---\nText`	Systems requiring explicit delimiters
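The following hypothetical renderer applies each header and chunk format to make the table concrete. Only the header and chunk-format strings come from the table. The blank-line separator, the absence of a closing marker for `"structured"`, and the `render_context` name are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# (header, per-chunk format) pairs taken from the table; None means "no header"
TEMPLATES: Dict[str, Tuple[Optional[str], str]] = {
    "arabic":     ("📚 المعلومات المسترجعة:", "[المصدر: {source}]\n{text}"),
    "english":    ("📚 Retrieved Information:", "[Source: {source}]\n{text}"),
    "minimal":    (None, "{text}"),
    "structured": ("=== CONTEXT START ===", "--- [{n}] {source} ---\n{text}"),
}


def render_context(chunks: List[Tuple[str, str]], template: str = "english") -> str:
    """Render (source, text) pairs; chunks are joined by blank lines (assumption)."""
    header, chunk_fmt = TEMPLATES[template]
    parts = [
        # str.format ignores keyword arguments the format string does not use
        chunk_fmt.format(n=i, source=source, text=text)
        for i, (source, text) in enumerate(chunks, start=1)
    ]
    if header is not None:
        parts.insert(0, header)
    return "\n\n".join(parts)


print(render_context([("ML Book", "Machine learning learns patterns from data.")], "structured"))
```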

End-to-End Examples

from __future__ import annotations
import hashlib
import re
import time
from typing import Dict
import numpy as np
from fennec_community.context import (
    ContextEngine,
    ContextEngineConfig,
    ContextManager,
    QueryAnalyzerConfig,
    RetrieverConfig,
    FilterConfig,
    RankingConfig,
    ComposerConfig,
    GuardConfig,
    CacheConfig,
    SourceQuality,
    ScoredChunk,
    char_counter,
)
from fennec_community.chunks import DocumentChunk

_WORD_EMBEDDINGS: Dict[str, np.ndarray] = {}
def mock_embed(text: str) -> np.ndarray:
    """
    Simulate an embedding model for testing: each word gets a fixed random
    vector, and a text is embedded as the L2-normalized sum of its word vectors.
    """
    words = re.findall(r"\w+", text.lower())
    vec   = np.zeros(64)
    for w in words:
        if w not in _WORD_EMBEDDINGS:
            # Seed from a stable digest: built-in hash() on str is randomized per
            # process, and reseeding the global NumPy RNG would affect other code.
            seed = int.from_bytes(hashlib.md5(w.encode("utf-8")).digest()[:4], "little")
            _WORD_EMBEDDINGS[w] = np.random.default_rng(seed).standard_normal(64)
        vec += _WORD_EMBEDDINGS[w]
    norm = np.linalg.norm(vec)
    return (vec / norm) if norm > 0 else vec


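# Sample knowledge base in Arabic (Example 2 renders it with the "arabic" template).
# Topics: machine learning, deep learning, NLP, RAG, Transformers, vector databases,
# a near-duplicate of ml_001 (for deduplication), LLMs, and fine-tuning.
KNOWLEDGE_BASE = [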
    DocumentChunk(
        doc_id="ml_001",
        text=(
            "التعلم الآلي هو فرع من فروع الذكاء الاصطناعي يركز على بناء أنظمة تتعلم من البيانات. "
            "تستخدم هذه الأنظمة الخوارزميات لتحليل البيانات وتعلم الأنماط واتخاذ القرارات. "
            "من أبرز تطبيقاته: التعرف على الصور، ومعالجة اللغة الطبيعية، وأنظمة التوصية."
        ),
        metadata={"category": "ml", "lang": "ar", "date": "2024","source":"كتاب التعلم الآلي" },
    ),
    DocumentChunk(
        doc_id="dl_002",
        text=(
            "التعلم العميق هو مجال فرعي من التعلم الآلي يعتمد على الشبكات العصبية الاصطناعية "
            "ذات الطبقات المتعددة. تستطيع هذه الشبكات تعلم تمثيلات هرمية للبيانات. "
            "تُستخدم في رؤية الكمبيوتر، ومعالجة الكلام، والنصوص."
        ),
        metadata={"category": "dl", "lang": "ar", "date": "2024","source":"كتاب التعلم العميق" },
    ),
    DocumentChunk(
        doc_id="nlp_003",
        text=(
            "معالجة اللغة الطبيعية (NLP) هي مجال في الذكاء الاصطناعي يُعنى بفهم اللغة البشرية. "
            "يشمل مهام مثل: تصنيف النصوص، واستخراج المعلومات، والترجمة الآلية، "
            "وتوليد النصوص. نماذج مثل BERT وGPT غيّرت هذا المجال جذرياً."
        ),
        metadata={"category": "nlp", "lang": "ar", "date": "2024","source":"كتاب معالجة اللغة الطبيعية" },
    ),
    DocumentChunk(
        doc_id="rag_004",
        text=(
            "Retrieval-Augmented Generation (RAG) هي تقنية تجمع بين قواعد المعرفة الخارجية "
            "ونماذج اللغة الكبيرة. تُحسِّن دقة الإجابات وتقلل الـ hallucination. "
            "تعتمد على: استرجاع المعلومات ذات الصلة، وتضمينها في سياق الـ LLM."
        ),
        metadata={"category": "rag", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="transformer_005",
        text=(
            "معمارية Transformer ثورت مجال معالجة اللغة الطبيعية منذ ورقة Attention is All You Need. "
            "تعتمد على آلية self-attention التي تُحدد العلاقات بين الكلمات في النص. "
            "أصبحت أساساً لنماذج مثل BERT وGPT وT5."
        ),
        metadata={"category": "transformer", "lang": "ar", "date": "2017"},
    ),
    DocumentChunk(
        doc_id="vector_006",
        text=(
            "قواعد البيانات المتجهية (Vector Databases) تُخزّن embeddings وتتيح البحث بالتشابه الدلالي. "
            "من أبرزها: Pinecone, Chroma, Weaviate, FAISS. "
            "تُستخدم في أنظمة RAG لاسترجاع المعلومات ذات الصلة بالاستعلام."
        ),
        metadata={"category": "infrastructure", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="duplicate_007",  # intentional near-duplicate of ml_001, for testing deduplication
        text=(
            "التعلم الآلي هو فرع من فروع الذكاء الاصطناعي يركز على بناء أنظمة تتعلم من البيانات. "
            "تستخدم هذه الأنظمة الخوارزميات لتحليل البيانات وتعلم الأنماط واتخاذ القرارات."
        ),
        metadata={"category": "ml", "lang": "ar"},
    ),

    DocumentChunk(
        doc_id="llm_009",
        text=(
            "نماذج اللغة الكبيرة (LLMs) مثل GPT-4 وClaude وGemini تُنتج نصاً يشبه الكتابة البشرية. "
            "تُدرَّب على مليارات الكلمات وتستطيع الإجابة على الأسئلة، والكتابة، والبرمجة. "
            "التحديات الرئيسية تشمل: الـ hallucination، والتحيز، والتكلفة الحسابية."
        ),
        metadata={"category": "llm", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="fine_tuning_010",
        text=(
            "Fine-tuning هي عملية تدريب نموذج مُدرَّب مسبقاً على بيانات خاصة بمجال معين. "
            "تُحسِّن الأداء في مهام محددة دون الحاجة إلى التدريب من الصفر. "
            "LoRA وQLoRA هما تقنيتان شائعتان لـ fine-tuning فعّال من حيث الموارد."
        ),
        metadata={"category": "training", "lang": "ar", "date": "2024"},
    ),
]

def example_basic():
    print("\n" + "="*60)
    print("📌 Example 1: Basic usage")
    print("="*60)

    engine = ContextEngine()  # build a context engine with the default config


    # Index the chunks for retrieval
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

    # Run the engine with a query ("What is machine learning?")
    query  = "ما هو التعلم الآلي؟"
    result = engine.run(query)

    print(f"\n🔍 Query: {query}")
    print(f"📊 Intent: {result.query_analysis.intent.value}")
    print(f"📊 Complexity: {result.query_analysis.complexity.value}")
    print(f"🔑 Keywords: {result.query_analysis.keywords[:5]}")
    print(f"\n📄 Composed context:\n{result.context}")
    print("\n📈 Statistics:")
    stats = result.get_stats()
    for k, v in stats.items():
        print(f"   {k}: {v}")

def example_custom_config():
    print("\n" + "="*60)
    print("📌 Example 2: Custom configuration")
    print("="*60)

    config = ContextEngineConfig(
        query_analyzer = QueryAnalyzerConfig(
            budget_low    = 600,
            budget_medium = 1500,
            budget_high   = 3500,
        ),
        retriever = RetrieverConfig(
            top_k          = 15,
            vector_weight  = 0.65,
            keyword_weight = 0.30,
            metadata_weight= 0.05,
        ),
        filter_cfg = FilterConfig(
            dedup_method   = "hash",
            min_chunk_length = 30,
        ),
        ranking = RankingConfig(
            weight_vector_score   = 0.45,
            weight_keyword_score  = 0.30,
            weight_source_quality = 0.15,
            weight_recency        = 0.10,
            max_chunks_to_rank    = 5,
        ),
        composer = ComposerConfig(
            max_context_length = 2500,
            template           = "arabic",
            include_scores     = True,
            include_metadata   = False,
            group_by_source    = False,
        ),
        guard = GuardConfig(
            enabled              = True,
            min_context_length   = 50,
            check_source_coverage= True,
        ),
        cache = CacheConfig(
            enabled     = True,
            max_size    = 128,
            ttl_seconds = 600,
        ),
    )

    engine = ContextEngine(config=config)

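    # Quality level per source name (English glosses in the comments)
    source_quality = {
        "كتاب التعلم الآلي":   SourceQuality.HIGH,    # "Machine Learning book"
        "كتاب التعلم العميق":  SourceQuality.HIGH,    # "Deep Learning book"
        "مجلة NLP":            SourceQuality.MEDIUM,  # "NLP journal"
        "بحث RAG":             SourceQuality.HIGH,    # "RAG paper"
        "مصدر مكرر":           SourceQuality.LOW,     # "Duplicate source"
    }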

    engine.index(
        chunks             = KNOWLEDGE_BASE,
        embed_fn           = mock_embed,
        source_quality_map = source_quality,
    )

    # "Compare machine learning and deep learning; what are the main differences?"
    query  = "قارن بين التعلم الآلي والتعلم العميق، وما هي أبرز الفروقات؟"
    result = engine.run(query, metadata_filters={"lang": "ar"})

    print(f"\n🔍 Query: {query}")
    print(f"📊 Intent: {result.query_analysis.intent.value}")
    print(f"📊 Complexity: {result.query_analysis.complexity.value}")
    print(f"💰 Budget: {result.query_analysis.context_budget} chars")
    print(f"🛡️ Guard: {'✓ passed' if result.guard_passed else '✗ failed'}")
    print(f"\n📄 Context:\n{result.context}")


def example_multi_turn():
    print("\n" + "="*60)
    print("📌 Example 3: Multi-turn conversation")
    print("="*60)

    engine = ContextEngine(
        config=ContextEngineConfig(multi_turn=True)
    )
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

    conversation = [
        "ما هو التعلم الآلي؟",          # "What is machine learning?"
        "وما علاقته بالتعلم العميق؟",   # "And how does it relate to deep learning?"
        "كيف يُستخدم في RAG؟",          # "How is it used in RAG?"
    ]

    for i, query in enumerate(conversation, start=1):
        result = engine.run(query, use_history=True, record_turn=True)

        # Simulate an LLM response
        fake_answer = f"[LLM answer to question {i}]"

        # Record the answer on the latest turn of the conversation history
        history = engine.get_history()
        if history and history.turns:
            history.turns[-1].answer = fake_answer

        print(f"\n💬 Question {i}: {query}")
        print(f"   📊 Enriched query: {result.query_analysis.normalized[:80]}...")
        print(f"   📄 Context length: {result.context_length} chars")
        print(f"   ⏱️ Time: {result.pipeline_ms:.1f}ms")

    # Conversation history statistics
    history = engine.get_history()
    if history:
        print(f"\n📚 Conversation history: {len(history.turns)} turns")


def example_cache_performance():
    print("\n" + "="*60)
    print("📌 Example 4: Cache performance")
    print("="*60)

    engine = ContextEngine(
        config=ContextEngineConfig(
            cache=CacheConfig(enabled=True, max_size=50, ttl_seconds=60)
        )
    )
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

    query = "ما هو التعلم الآلي؟"  # "What is machine learning?"

    # First query (cache miss)
    t0 = time.perf_counter()
    r1 = engine.run(query)
    t1 = (time.perf_counter() - t0) * 1000

    # Same query again (cache hit)
    t0 = time.perf_counter()
    r2 = engine.run(query)
    t2 = (time.perf_counter() - t0) * 1000

    print(f"\n📊 First query (cache miss): {t1:.1f}ms")
    print(f"📊 Second query (cache hit):  {t2:.1f}ms")
    if t2 > 0:
        print(f"⚡ Speedup: {t1/t2:.1f}x")

    # Performance report, including cache statistics
    perf = engine.get_performance_report()
    print("\n📈 Performance report:")
    for k, v in perf.items():
        if isinstance(v, dict):
            print(f"   {k}:")
            for kk, vv in v.items():
                print(f"      {kk}: {vv}")
        else:
            print(f"   {k}: {v}")


if __name__ == "__main__":
    example_basic()
    example_custom_config()
    example_multi_turn()
    example_cache_performance()

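Example 4 depends on the cache behaving as an LRU store with a time-to-live (`CacheConfig(max_size=..., ttl_seconds=...)`). The class below is a minimal sketch of those semantics, not the library's `ContextCache`. It assumes entries expire `ttl_seconds` after insertion and that the least recently used entry is evicted once `max_size` is exceeded. The injectable `clock` is a hypothetical addition that makes expiry testable.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLLRUCache:
    """LRU cache whose entries also expire ttl_seconds after they were stored."""

    def __init__(self, max_size: int = 128, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._data = OrderedDict()  # key -> (stored_at, value), oldest use first
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or self.clock() - entry[0] > self.ttl_seconds:
            self._data.pop(key, None)  # drop the expired entry, if any
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```

`OrderedDict` keeps usage order cheaply. `move_to_end` promotes a key on each hit, and `popitem(last=False)` removes the stalest one. Because `get` returns `None` on a miss, this sketch cannot distinguish a miss from a cached `None` value.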
Composite Ranking Formula

composite_score =
    vector_score   × 0.40    (semantic similarity)
  + keyword_boost  × 0.25    (query keyword matches)
  + source_quality × 0.20    (source trustworthiness)
  + recency_score  × 0.10    (information freshness)
  + position_score × 0.05    (chunk position in document)

All weights are configurable via RankingConfig.
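The formula is a plain weighted sum. The sketch below evaluates it for one chunk, assuming every signal is already normalized to [0, 1]. Because the default weights sum to 1.0, the composite score then also stays in [0, 1]. The signal values are made up for illustration. The dictionary keys reuse the names in the formula, not `RankingConfig` field names.

```python
from typing import Dict

# Default weights from the formula above; they sum to 1.0
DEFAULT_WEIGHTS: Dict[str, float] = {
    "vector_score":   0.40,  # semantic similarity
    "keyword_boost":  0.25,  # query keyword matches
    "source_quality": 0.20,  # source trustworthiness
    "recency_score":  0.10,  # information freshness
    "position_score": 0.05,  # chunk position in document
}


def composite_score(signals: Dict[str, float],
                    weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of per-chunk signals; a missing signal counts as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


chunk_signals = {"vector_score": 0.9, "keyword_boost": 0.5, "source_quality": 1.0,
                 "recency_score": 0.8, "position_score": 0.2}
print(round(composite_score(chunk_signals), 3))  # 0.775
```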

Query Complexity and Default Budgets

Complexity	Default Budget	Example Queries
`LOW`	800 chars	"What is Python?", "Who invented the internet?"
`MEDIUM`	2000 chars	"Explain the concept of deep learning"
`HIGH`	4000 chars	"Compare Transformer and LSTM architectures in terms of..."
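The table amounts to a lookup from complexity level to budget. The sketch below is hypothetical. It assumes the `QueryAnalyzerConfig` fields `budget_low`, `budget_medium` and `budget_high` (used in the custom-configuration example) simply replace these defaults. The `Complexity` enum values are illustrative, and any later adaptive-learning adjustment is not modeled.

```python
from enum import Enum
from typing import Dict, Optional


class Complexity(Enum):
    """Hypothetical mirror of the three complexity levels."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Defaults from the table, in characters (tokens if a token length_counter is used)
DEFAULT_BUDGETS: Dict[Complexity, int] = {
    Complexity.LOW: 800,
    Complexity.MEDIUM: 2000,
    Complexity.HIGH: 4000,
}


def budget_for(level: Complexity, overrides: Optional[Dict[Complexity, int]] = None) -> int:
    """Look up the context budget, letting configured values override the defaults."""
    return {**DEFAULT_BUDGETS, **(overrides or {})}[level]


# Same values as Example 2's QueryAnalyzerConfig(budget_low=600, budget_medium=1500, budget_high=3500)
example2 = {Complexity.LOW: 600, Complexity.MEDIUM: 1500, Complexity.HIGH: 3500}
print(budget_for(Complexity.MEDIUM))            # 2000
print(budget_for(Complexity.MEDIUM, example2))  # 1500
```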

Full Pipeline Step-by-Step

 1. Cache Lookup          Is this query already cached?
 2. History Enrichment    Prepend prior conversation context to query
 3. Query Analysis        Detect intent, complexity, keywords, budget
 4. Budget Suggestion     Adaptive learning adjusts budget from history
 5. Retrieval             Vector + BM25 + Metadata  →  RRF Fusion
 6. Filtering             Noise removal + length filter + deduplication
 7. Ranking               Composite score re-ranking
 8. Composition           Format chunks → truncate to budget
 9. Guard Validation      Empty? Too short? Source present? Relevant?
10. Cache Store           Store result for future identical queries
11. History Record        Save this turn to conversation history
12. Adaptive Record       Log metrics for continuous self-tuning
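Step 5 fuses the vector, BM25 and metadata rankings with Reciprocal Rank Fusion (RRF). RRF scores each chunk as the sum of 1 / (k + rank) over every ranking it appears in. A minimal sketch of standard RRF follows. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as is the `rrf_fuse` name.

```python
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.__getitem__, reverse=True)


vector_hits   = ["ml_001", "dl_002", "nlp_003"]
bm25_hits     = ["dl_002", "ml_001", "rag_004"]
metadata_hits = ["ml_001", "rag_004"]
print(rrf_fuse([vector_hits, bm25_hits, metadata_hits]))
# ['ml_001', 'dl_002', 'rag_004', 'nlp_003']
```

RRF uses only rank positions, so cosine similarities and BM25 scores never need to be normalized onto a common scale before fusion.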

Strategy Extension Points

Every major component implements a Strategy interface and can be swapped without touching engine internals:

Interface	Default Implementation	Swap It When
`QueryAnalyzerStrategy`	`RuleBasedQueryAnalyzer`	You want LLM-powered intent detection
`HybridRetrieverStrategy`	`HybridRetriever`	You have a custom vector store (Pinecone, Weaviate, etc.)
`ChunkFilterStrategy`	`ChunkFilter`	You need domain-specific filtering rules
`ChunkRankingStrategy`	`CompositeRanker`	You want learned reranking (cross-encoder, etc.)
`ContextComposerStrategy`	`SmartContextComposer`	You need a custom prompt template
`ContextGuardStrategy`	`ContextGuard`	You want LLM-based hallucination detection
`BudgetAllocationStrategy`	`EqualBudgetAllocation`	You need custom multi-query budget logic

Developer note: All components follow the Strategy Pattern via dependency injection. Pass any custom implementation directly to ContextEngine.__init__() — no subclassing of the engine is required.
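The pattern can be illustrated without the library. In the sketch below, `RankerLike`, `Chunk` and `MiniEngine` are hypothetical stand-ins, because this section does not show the real `ChunkRankingStrategy` method names or the `ContextEngine.__init__` parameter for the ranker. The mechanism matches the one described: the orchestrator receives a strategy object and falls back to a default when none is given.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class Chunk:
    """Hypothetical stand-in for ScoredChunk."""
    doc_id: str
    text: str
    score: float


class RankerLike(Protocol):
    """Hypothetical ranking interface; the real ChunkRankingStrategy may differ."""
    def rank(self, query: str, chunks: List[Chunk]) -> List[Chunk]: ...


class ScoreRanker:
    """Default strategy: sort by the existing score."""
    def rank(self, query: str, chunks: List[Chunk]) -> List[Chunk]:
        return sorted(chunks, key=lambda c: c.score, reverse=True)


class KeywordOverlapRanker:
    """Custom strategy: prefer chunks sharing more words with the query, then by score."""
    def rank(self, query: str, chunks: List[Chunk]) -> List[Chunk]:
        terms = set(query.lower().split())

        def key(c: Chunk):
            return (len(terms & set(c.text.lower().split())), c.score)

        return sorted(chunks, key=key, reverse=True)


class MiniEngine:
    """Toy orchestrator: the ranker is injected, so swapping it needs no subclassing."""
    def __init__(self, ranker: Optional[RankerLike] = None) -> None:
        self.ranker = ranker if ranker is not None else ScoreRanker()

    def top(self, query: str, chunks: List[Chunk], n: int = 1) -> List[str]:
        return [c.doc_id for c in self.ranker.rank(query, chunks)[:n]]


chunks = [Chunk("a", "vector databases store embeddings", 0.9),
          Chunk("b", "rag combines retrieval and generation", 0.4)]
print(MiniEngine().top("what is rag", chunks))                        # ['a']
print(MiniEngine(KeywordOverlapRanker()).top("what is rag", chunks))  # ['b']
```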
