Fennec Community community/context.md

Context Module

Query → QueryAnalyzer → HybridRetriever → ChunkFilter → CompositeRanker → SmartContextComposer → ContextGuard → AdaptiveLearning → ContextResult


Table of Contents

  1. Quick Start
  2. ContextEngine — Main Orchestrator
  3. ContextEngineConfig — Configuration
  4. Data Models
  5. HybridRetriever — Hybrid Retrieval
  6. Pipeline — Filtering, Ranking & Composition
  7. ContextGuard — Context Validator
  8. AdaptiveLearningSystem — Adaptive Learning
  9. ContextCache — Smart Cache
  10. Budget Allocation
  11. ConversationHistory — Multi-turn Support
  12. Utility Functions
  13. Reference Tables

1. Quick Start

from fennec_community.context import ContextEngine, ContextEngineConfig

# Simplest usage — all defaults applied
engine = ContextEngine()
engine.index(chunks=my_chunks, embed_fn=my_embed_function)

result = engine.run("What is machine learning?")
print(result.context)       # Context ready to feed into LLM
print(result.get_stats())   # Detailed pipeline statistics

# Advanced usage: full customization
from fennec_community.context import (
    ContextEngine, ContextEngineConfig,
    ComposerConfig, RetrieverConfig, CacheConfig
)

config = ContextEngineConfig(
    composer  = ComposerConfig(max_context_length=4000, template="english"),
    retriever = RetrieverConfig(top_k=30, vector_weight=0.7),
    cache     = CacheConfig(enabled=True, ttl_seconds=600),
)

engine = ContextEngine(config=config)
engine.index(chunks=my_chunks, embed_fn=my_embed_fn)

result = engine.run(
    query            = "Compare BERT and GPT architectures",
    metadata_filters = {"category": "nlp"},
    max_chunks       = 10,
)

2. ContextEngine — Main Orchestrator

Description: The unified entry point for the entire system. Orchestrates all pipeline stages from query intake to final output.


ContextEngine.__init__

ContextEngine(
    config:           Optional[ContextEngineConfig]      = None,
    query_analyzer:   Optional[QueryAnalyzerStrategy]    = None,
    retriever:        Optional[HybridRetriever]          = None,
    chunk_filter:     Optional[ChunkFilterStrategy]      = None,
    ranker:           Optional[ChunkRankingStrategy]     = None,
    composer:         Optional[ContextComposerStrategy]  = None,
    guard:            Optional[ContextGuardStrategy]     = None,
    budget_allocator: Optional[BudgetAllocationStrategy] = None,
    length_counter:   Optional[LengthCounter]            = None,
)

Purpose: Instantiate the engine and initialize all internal components. Every parameter is optional — sensible defaults are used when nothing is passed.

Parameter Type Description
config ContextEngineConfig Master configuration object. Falls back to defaults if omitted.
query_analyzer QueryAnalyzerStrategy Custom query analyzer (e.g. LLM-based). Optional.
retriever HybridRetriever Custom retrieval system or external vector store adapter. Optional.
chunk_filter ChunkFilterStrategy Custom filtering logic. Optional.
ranker ChunkRankingStrategy Custom ranking system. Optional.
composer ContextComposerStrategy Custom context composer. Optional.
guard ContextGuardStrategy Custom context guard (e.g. hallucination detector). Optional.
budget_allocator BudgetAllocationStrategy Budget distribution strategy for multi-query runs. Optional.
length_counter LengthCounter Length measurement function — character-based or token-based. Optional.

Returns: A fully initialized ContextEngine instance ready to use.


ContextEngine.index

engine.index(
    chunks:               List[Any],
    embed_fn:             Optional[Callable[[str], np.ndarray]] = None,
    vector_store_adapter: Optional[Any]                          = None,
    source_quality_map:   Optional[Dict[str, SourceQuality]]    = None,
) -> ContextEngine

Purpose: Index a collection of chunks into the system and prepare the retrieval layer. Must be called before run().

Parameter Type Description
chunks List[Any] List of chunk objects of any type that exposes text and doc_id attributes.
embed_fn Callable[[str], np.ndarray] Embedding function for a single text string. Example: lambda t: model.encode(t)
vector_store_adapter Any Alternative to embed_fn — integrates with Chroma / FAISS / Pinecone. Must implement similarity_search_with_score().
source_quality_map Dict[str, SourceQuality] Per-source quality hints {"source_name": SourceQuality.HIGH} to boost ranking.

Returns: self — supports method chaining.

# Method chaining example
result = (
    ContextEngine(config=my_config)
    .index(chunks=docs, embed_fn=embedder)
    .run("How does the BM25 algorithm work?")
)
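index() only asks that each chunk expose text and doc_id attributes. The sketch below shows one way to satisfy that duck-typed contract with a plain dataclass and to check it before indexing. ChunkLike, SimpleChunk, and validate_chunks are hypothetical helpers for illustration; they are not part of fennec_community.

```python
# Sketch only: hypothetical helpers, not part of fennec_community.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class ChunkLike(Protocol):
    """Structural type for objects that index() can accept."""
    text: str
    doc_id: str


@dataclass
class SimpleChunk:
    text: str
    doc_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def validate_chunks(chunks: List[Any]) -> List[Any]:
    """Fail early if any chunk lacks the attributes index() relies on."""
    for i, chunk in enumerate(chunks):
        if not isinstance(chunk, ChunkLike):
            raise TypeError(f"chunk {i} must expose 'text' and 'doc_id' attributes")
    return chunks


docs = validate_chunks([
    SimpleChunk("BM25 ranks documents by term frequency.", "doc-1", {"category": "ir"}),
    SimpleChunk("Transformers rely on self-attention.", "doc-2", {"category": "nlp"}),
])
```

The validated list can then be passed as chunks=docs to engine.index().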

ContextEngine.run

engine.run(
    query:            str,
    metadata_filters: Optional[Dict[str, Any]]  = None,
    max_chunks:       Optional[int]              = None,
    override_budget:  Optional[int]              = None,
    use_history:      bool                       = True,
    record_turn:      bool                       = True,
) -> ContextResult

Purpose: Execute the full pipeline for a single query and return the context prepared for the LLM. This is the primary method most developers will use.

Parameter Type Description
query str The user query — English, Arabic, or mixed.
metadata_filters Dict[str, Any] Metadata field filters. Example: {"category": "tech", "year": 2024}
max_chunks int Override the maximum number of retrieved chunks.
override_budget int Override the auto-computed context budget (in characters or tokens).
use_history bool Whether to enrich the query with prior conversation context. Default: True.
record_turn bool Whether to save this query turn in conversation history. Default: True.

Returns: ContextResult — contains the final context string plus detailed per-stage statistics.

result = engine.run(
    query            = "How do LSTM networks work?",
    metadata_filters = {"domain": "deep_learning"},
    override_budget  = 3000,
)

# Use the output
llm_prompt = f"Context:\n{result.context}\n\nQuestion: {result.query_analysis.original_query}"

# Inspect pipeline statistics
stats = result.get_stats()
print(f"Retrieved  : {stats['total_retrieved']} chunks")
print(f"Included   : {stats['total_included']} chunks")
print(f"Latency    : {stats['pipeline_ms']} ms")
print(f"Guard OK   : {stats['guard_passed']}")

ContextEngine.run_multi

engine.run_multi(
    queries:       List[str],
    global_budget: Optional[int] = None,
    use_weighted:  bool          = True,
    **kwargs,
) -> str

Purpose: Run the full pipeline over multiple queries in a single call (Multi-hop RAG). Distributes the total budget intelligently across queries based on their complexity, then merges the resulting contexts into one unified string.

Parameter Type Description
queries List[str] List of query strings to process.
global_budget int Total character/token budget shared across all queries.
use_weighted bool True = allocate budget by complexity; False = split equally.
**kwargs Additional keyword arguments forwarded to each run() call.

Returns: str — a unified context string that concatenates the results of all queries, each labelled with its query index.

combined_context = engine.run_multi(
    queries = [
        "What is the Transformer architecture?",
        "How does the Attention Mechanism work?",
        "What is the difference between BERT and GPT?",
    ],
    global_budget = 6000,
    use_weighted  = True,   # More complex queries receive larger budget slices
)

ContextEngine.build (Legacy API)

engine.build(
    query:      str,
    chunks:     List[Tuple],   # [(chunk, score), ...]
    max_chunks: Optional[int] = None,
    max_length: Optional[int] = None,
) -> str

Purpose: Backward-compatible interface for projects that used the old ContextManager API. Accepts raw tuples instead of ScoredChunk objects.

Parameter Type Description
query str The user query.
chunks List[Tuple[Any, float]] List of (chunk_object, relevance_score) tuples.
max_chunks int Maximum number of chunks to include in context.
max_length int Maximum length of the output context in characters.

Returns: str — the composed context string.


ContextEngine.compress_context

engine.compress_context(context: str, target_length: int) -> str

Purpose: Compress an existing context string down to a target length while preserving as much meaning as possible. Useful when the LLM's context window is smaller than the assembled context.

Parameter Type Description
context str The original context string to compress.
target_length int The desired maximum length in characters or tokens.

Returns: str — the compressed context.


ContextEngine.get_context_stats

engine.get_context_stats(context: str) -> Dict[str, Any]

Purpose: Compute analytical statistics about an already-composed context string.

Parameter Type Description
context str The context string to analyse.

Returns: Dict containing:

Key Description
total_length Total length in characters or tokens
chunk_count Number of distinct chunks in the context
avg_chunk_length Average chunk length
min_chunk_length / max_chunk_length Shortest / longest chunk
template The template used for formatting
length_unit "chars" or "tokens"

ContextEngine.get_performance_report

engine.get_performance_report() -> Dict[str, Any]

Purpose: Return a comprehensive performance report aggregated over all queries processed by this engine instance so far.

Returns: Dict containing:

Key Description
total_queries Total number of queries processed
avg_response_ms Average end-to-end latency in milliseconds
avg_context_length Average length of produced contexts
avg_chunk_count Average number of chunks included per query
guard_failure_rate Fraction of queries where ContextGuard failed (0.0 → 1.0)
cache Cache statistics (present only when cache is enabled)

ContextEngine.set_conversation

engine.set_conversation(history: ConversationHistory) -> None

Purpose: Restore a previously saved conversation session. Useful for resuming context after an application restart without losing conversational continuity.

Parameter Type Description
history ConversationHistory A previously saved conversation history object.

ContextEngine.get_history

engine.get_history() -> Optional[ConversationHistory]

Purpose: Retrieve the current conversation history for serialisation, analysis, or handoff.

Returns: ConversationHistory instance, or None if multi_turn is disabled in config.


ContextEngine.clear_cache

engine.clear_cache() -> None

Purpose: Flush the entire cache. Should be called whenever the underlying chunk data changes (e.g. after re-indexing documents) to prevent stale context from being served.


3. ContextEngineConfig — Configuration

ContextEngineConfig

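from dataclasses import dataclass, field

@dataclass
class ContextEngineConfig:
    # default_factory gives every config its own sub-config instances
    # (a bare instance default is rejected by dataclasses as a mutable default)
    query_analyzer:  QueryAnalyzerConfig = field(default_factory=QueryAnalyzerConfig)
    retriever:       RetrieverConfig     = field(default_factory=RetrieverConfig)
    filter_cfg:      FilterConfig        = field(default_factory=FilterConfig)
    ranking:         RankingConfig       = field(default_factory=RankingConfig)
    composer:        ComposerConfig      = field(default_factory=ComposerConfig)
    guard:           GuardConfig         = field(default_factory=GuardConfig)
    cache:           CacheConfig         = field(default_factory=CacheConfig)
    language:        str  = "auto"       # "auto" | "ar" | "en"
    enable_adaptive: bool = True         # Enable adaptive learning
    multi_turn:      bool = True         # Enable multi-turn conversation support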

Purpose: The unified master configuration object passed to ContextEngine. Each sub-component has its own isolated, fully-customisable config block.


QueryAnalyzerConfig — Query Analysis Settings

Controls how queries are analysed and how context budgets are assigned.

Parameter Type Default Description
budget_low int 800 Context budget (chars) for simple queries
budget_medium int 2000 Context budget for medium-complexity queries
budget_high int 4000 Context budget for complex queries
custom_entities List[str] [] Domain-specific entity names to recognise in queries
arabic_threshold float 0.3 Minimum fraction of Arabic characters to classify a query as Arabic
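A minimal sketch of how arabic_threshold and the three budget tiers could be applied. It assumes that "Arabic characters" means code points in the basic Arabic block (U+0600 to U+06FF) and that the fraction is taken over letters only; the real analyzer may count differently. Complexity, arabic_fraction, detect_language, and budget_for are hypothetical names.

```python
# Sketch only: assumed interpretation of arabic_threshold and budget tiers.
from enum import Enum


class Complexity(Enum):  # stand-in for QueryComplexity
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Defaults from the QueryAnalyzerConfig table above.
BUDGETS = {Complexity.LOW: 800, Complexity.MEDIUM: 2000, Complexity.HIGH: 4000}


def arabic_fraction(text: str) -> float:
    """Share of alphabetic characters that fall in the basic Arabic block."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for c in letters if "\u0600" <= c <= "\u06ff")
    return arabic / len(letters)


def detect_language(text: str, arabic_threshold: float = 0.3) -> str:
    return "ar" if arabic_fraction(text) >= arabic_threshold else "en"


def budget_for(complexity: Complexity) -> int:
    return BUDGETS[complexity]
```

With this rule a mixed query such as "ما هو BERT" (four Arabic and four Latin letters) is classified as Arabic, because 0.5 is above the 0.3 threshold.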

RetrieverConfig — Hybrid Retrieval Settings

Controls the weights and behaviour of the three retrieval methods.

Parameter Type Default Description
top_k int 20 Number of chunks to retrieve before filtering
vector_weight float 0.60 Weight of semantic (vector) search results
keyword_weight float 0.30 Weight of BM25 keyword search results
metadata_weight float 0.10 Weight of metadata filtering results
use_vector bool True Enable / disable semantic search
use_keyword bool True Enable / disable BM25 search
use_metadata bool False Enable / disable metadata-based filtering
min_vector_score float 0.0 Minimum cosine similarity to accept a vector result

FilterConfig — Filtering Layer Settings

Controls the multi-stage chunk filtering pipeline.

Parameter Type Default Description
dedup_method str "hash" Deduplication strategy: "hash" | "prefix" | "semantic"
min_chunk_length int 20 Chunks shorter than this are discarded
max_chunk_length int 4000 Chunks longer than this are discarded
semantic_sim_threshold float 0.92 Cosine similarity threshold for near-duplicate removal
noise_patterns List[str] (built-in) Regex patterns that flag a chunk as noise (page numbers, dividers, etc.)

RankingConfig — Ranking Score Weights

Controls the weight of each factor in the composite ranking score.

Parameter Type Default Description
weight_vector_score float 0.40 Weight of semantic similarity in composite score
weight_keyword_score float 0.25 Weight of keyword matching score
weight_source_quality float 0.20 Weight of source quality signal
weight_recency float 0.10 Weight of information recency signal
weight_position float 0.05 Weight of chunk position within its source document
max_chunks_to_rank int 15 Maximum number of chunks passed to the Composer after ranking

ComposerConfig — Context Formatting Settings

Controls how the final context string is formatted and truncated.

Parameter Type Default Description
max_context_length int 2000 Maximum length of the output context string
separator str "\n---\n" Separator inserted between chunks
template str "arabic" Formatting template: "arabic" | "english" | "minimal" | "structured"
include_scores bool False Append ranking scores to each chunk in the context
include_metadata bool False Append metadata fields to each chunk
include_sources bool True Include source attribution for each chunk
group_by_source bool False Group chunks from the same source document together

GuardConfig — Context Guard Settings

Controls the validation checks applied to the assembled context.

Parameter Type Default Description
enabled bool True Enable / disable the guard entirely
min_context_length int 10 Contexts shorter than this fail validation
max_context_length int 8000 Contexts longer than this trigger a warning (does not fail)
check_relevance bool False Semantic relevance check (requires an embed_fn)
relevance_threshold float 0.10 Minimum cosine similarity between query and context

CacheConfig — LRU Cache Settings

Parameter Type Default Description
enabled bool True Enable / disable the cache
max_size int 256 Maximum number of cached query results
ttl_seconds int 300 Time-to-live for each cached entry, in seconds

4. Data Models

ContextResult

The final output object returned by engine.run().

@dataclass
class ContextResult:
    context:            str                  # Context string ready for LLM
    query_analysis:     QueryAnalysis        # Full query analysis result
    chunks_used:        List[ScoredChunk]    # Chunks actually included in context
    total_retrieved:    int                  # Chunks retrieved before filtering
    total_after_filter: int                  # Chunks remaining after filtering
    total_after_rank:   int                  # Chunks remaining after ranking
    total_included:     int                  # Chunks included in the final context
    context_length:     int                  # Context length in characters
    pipeline_ms:        float                # Total end-to-end latency
    guard_passed:       bool                 # Whether ContextGuard validation passed
    guard_warnings:     List[str]            # Guard warning messages (may be empty)
    metadata:           Dict[str, Any]       # Extra metadata (e.g. {"from_cache": True})

ContextResult.get_stats

result.get_stats() -> Dict[str, Any]

Purpose: Return a structured summary of all pipeline metrics in a single dictionary — convenient for logging, monitoring dashboards, or debugging.

Returns: Dict with keys: pipeline_ms, total_retrieved, total_after_filter, total_after_rank, total_included, context_length, guard_passed, guard_warnings, query_intent, query_complexity, context_budget.


QueryAnalysis

The result of query analysis. Used by every stage of the pipeline to make informed decisions.

@dataclass
class QueryAnalysis:
    original_query:  str
    normalized:      str             # Cleaned text (diacritics removed, whitespace normalised)
    intent:          QueryIntent     # FACTUAL | REASONING | CODE | COMPARATIVE | ...
    complexity:      QueryComplexity # LOW | MEDIUM | HIGH
    keywords:        List[str]       # Extracted keywords, ranked by relevance
    entities:        List[str]       # Named entities detected in the query
    is_arabic:       bool
    language:        str             # "ar" | "en"
    question_type:   str             # "wh-question" | "yes/no" | "open"
    context_budget:  int             # Recommended context size in characters

QueryAnalysis.to_dict

analysis.to_dict() -> Dict[str, Any]

Purpose: Serialise the analysis result to a JSON-compatible dictionary — useful for logging pipelines, analytics stores, or passing between services.


ScoredChunk

A unified wrapper around any chunk object, enriched with all retrieval and ranking scores.

@dataclass
class ScoredChunk:
    chunk:            Any            # Original chunk object (any type)
    vector_score:     float          # Cosine similarity from vector search
    keyword_score:    float          # BM25 score (normalised 0-1)
    metadata_score:   float          # Metadata filter match score
    source_quality:   SourceQuality  # HIGH | MEDIUM | LOW | UNKNOWN
    recency_score:    float          # Recency signal (0-1)
    composite_score:  float          # Final weighted ranking score
    retrieval_method: RetrievalMethod
    retrieval_rank:   int            # Original rank before re-ranking

Available properties:

Property Type Description
.text str Chunk text — reads from text or page_content attribute
.doc_id str Source document identifier
.metadata Dict Chunk metadata dictionary
.source str Source name or path
.content_hash str SHA-256 hash of normalised content
.char_count int Character count of the chunk text

ScoredChunk.to_dict

chunk.to_dict() -> Dict[str, Any]

Purpose: Serialise the scored chunk to a dictionary for logging or diagnostics. Includes a 120-character text preview.


Available Enums

Enum Values Used For
QueryIntent FACTUAL, REASONING, COMPARATIVE, PROCEDURAL, CODE, SUMMARIZE, UNKNOWN Query intent classification
QueryComplexity LOW, MEDIUM, HIGH Query complexity, drives budget allocation
SourceQuality HIGH=3, MEDIUM=2, LOW=1, UNKNOWN=0 Source trustworthiness signal for ranking
RetrievalMethod VECTOR, KEYWORD, METADATA, HYBRID Which retriever produced a result

5. HybridRetriever — Hybrid Retrieval

HybridRetriever.retrieve

retriever.retrieve(
    query:   str,
    top_k:   Optional[int]           = None,
    filters: Optional[Dict[str, Any]] = None,
) -> List[ScoredChunk]

Purpose: Run hybrid retrieval (Vector + BM25 + Metadata) and fuse results using Reciprocal Rank Fusion (RRF). Chunks that appear across multiple retrieval methods automatically receive a score bonus, promoting diverse high-quality results.

RRF Formula:

rrf_score(chunk) = Σ  1 / (k + rank(chunk, list_i))     where k = 60 and the sum runs over every result list list_i that contains the chunk

Parameter Type Description
query str The search query
top_k int Number of results to return (overrides config.retriever.top_k)
filters Dict[str, Any] Metadata field filters e.g. {"category": "finance"}

Returns: List[ScoredChunk] sorted descending by composite score.
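The fusion step can be sketched as a plain Reciprocal Rank Fusion over ranked lists of chunk IDs, using k = 60 and 1-based ranks as in the original RRF formulation. Whether HybridRetriever also scales each list's contribution by vector_weight, keyword_weight, and metadata_weight is not stated here, so the optional weights argument is an assumption, and rrf_fuse is a hypothetical name.

```python
# Sketch only: textbook RRF, not the HybridRetriever source.
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def rrf_fuse(
    ranked_lists: List[List[str]],
    k: int = 60,
    weights: Optional[List[float]] = None,  # assumed per-list weighting
) -> List[Tuple[str, float]]:
    """Fuse several ranked ID lists; returns (id, score) pairs, best first."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores: Dict[str, float] = defaultdict(float)
    for weight, ids in zip(weights, ranked_lists):
        for rank, chunk_id in enumerate(ids, start=1):  # ranks are 1-based
            scores[chunk_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

Here c1 and c3 appear in both lists, so their two contributions add up and they outrank c9 and c7, which is the "score bonus" for multi-method hits described above.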


VectorRetriever.index_chunks

vector_retriever.index_chunks(
    chunks:   List[Any],
    embed_fn: Callable[[str], np.ndarray],
) -> None

Purpose: Build the in-memory embedding index for semantic search.

Note: This is called internally by ContextEngine.index() — direct calls are rarely needed.


KeywordRetriever.index_chunks

keyword_retriever.index_chunks(chunks: List[Any]) -> None

Purpose: Build the BM25 index used for keyword-based retrieval. Works without any external dependencies and performs well on technical terminology.
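For reference, the core of a dependency-free BM25 index can be sketched as follows. It uses the standard Okapi BM25 formula with common defaults (k1 = 1.5, b = 0.75), the non-negative IDF variant used by Lucene, and a naive lowercase word tokenizer. KeywordRetriever's actual tokenizer, parameters, and its 0-1 score normalisation are not documented here, so TinyBM25 is hypothetical.

```python
# Sketch only: a generic Okapi BM25 index, not KeywordRetriever itself.
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class TinyBM25:
    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_freqs = [Counter(toks) for toks in self.doc_tokens]
        self.avgdl = sum(len(t) for t in self.doc_tokens) / max(len(docs), 1)
        # Document frequency: in how many documents each term appears.
        self.df = Counter(term for toks in self.doc_tokens for term in set(toks))
        self.n_docs = len(docs)

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query: str, index: int) -> float:
        freqs = self.doc_freqs[index]
        doc_len = len(self.doc_tokens[index])
        total = 0.0
        for term in tokenize(query):
            tf = freqs.get(term, 0)
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * doc_len / self.avgdl)
            total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 5) -> List[int]:
        ranked = sorted(range(self.n_docs), key=lambda i: self.score(query, i), reverse=True)
        return ranked[:top_k]
```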


6. Pipeline — Filtering, Ranking & Composition

ChunkFilter.filter

chunk_filter.filter(chunks: List[ScoredChunk]) -> List[ScoredChunk]

Purpose: Apply the multi-stage filtering pipeline to a list of retrieved chunks:

  1. Noise removal — discards page numbers, blank lines, and separator lines via regex patterns.
  2. Length filtering — discards chunks shorter than min_chunk_length or longer than max_chunk_length.
  3. Hash-based deduplication — removes exact duplicate chunks using SHA-256 content hashes.

Parameter Type Description
chunks List[ScoredChunk] The raw list of retrieved chunks

Returns: List[ScoredChunk] — the cleaned, deduplicated subset.
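The three stages can be sketched over plain strings as below. The regexes are illustrative stand-ins for the built-in noise_patterns, and normalising by lowercasing and collapsing whitespace before hashing is an assumption about what "normalised content" means. NOISE_PATTERNS, content_hash, and filter_chunks are hypothetical names.

```python
# Sketch only: illustrative patterns and normalisation, not the library defaults.
import hashlib
import re
from typing import List

NOISE_PATTERNS = [
    r"^\s*(page\s*)?\d+\s*$",   # bare page numbers such as "12" or "Page 12"
    r"^\s*[-=_*]{3,}\s*$",      # divider lines such as "-----"
    r"^\s*$",                   # blank chunks
]


def content_hash(text: str) -> str:
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def filter_chunks(texts: List[str], min_len: int = 20, max_len: int = 4000) -> List[str]:
    noise = [re.compile(p, re.IGNORECASE) for p in NOISE_PATTERNS]
    kept: List[str] = []
    seen = set()
    for text in texts:
        if any(p.match(text) for p in noise):          # 1. noise removal
            continue
        if not (min_len <= len(text) <= max_len):      # 2. length filtering
            continue
        h = content_hash(text)
        if h in seen:                                  # 3. exact-duplicate removal
            continue
        seen.add(h)
        kept.append(text)
    return kept
```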


ChunkFilter.filter_with_embeddings

chunk_filter.filter_with_embeddings(
    chunks:     List[ScoredChunk],
    embeddings: List[Optional[np.ndarray]],
) -> List[ScoredChunk]

Purpose: Same as filter() with an additional semantic near-duplicate detection pass. Chunks whose cosine similarity exceeds semantic_sim_threshold relative to an already-kept chunk are removed.

Parameter Type Description
chunks List[ScoredChunk] Retrieved chunks to filter
embeddings List[Optional[np.ndarray]] Pre-computed embeddings aligned with the chunk list

Returns: List[ScoredChunk] after all filtering stages.
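The near-duplicate pass can be sketched as a greedy loop that keeps a chunk only if its cosine similarity to every already-kept chunk stays at or below semantic_sim_threshold. Because chunks arrive in retrieval order, the higher-ranked copy of a near-duplicate pair survives. Embeddings are plain float lists to keep the example dependency-free, and treating a missing (None) embedding as "keep" is an assumption; cosine and semantic_dedup are hypothetical names.

```python
# Sketch only: greedy near-duplicate removal, assumed behaviour for None embeddings.
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_dedup(
    texts: List[str],
    embeddings: List[Optional[Sequence[float]]],
    threshold: float = 0.92,
) -> List[str]:
    kept_texts: List[str] = []
    kept_vecs: List[Sequence[float]] = []
    for text, vec in zip(texts, embeddings):
        if vec is None:                  # no embedding: cannot compare, keep it
            kept_texts.append(text)
            continue
        if any(cosine(vec, k) > threshold for k in kept_vecs):
            continue                     # near-duplicate of a chunk already kept
        kept_texts.append(text)
        kept_vecs.append(vec)
    return kept_texts
```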


CompositeRanker.rank

ranker.rank(
    chunks:   List[ScoredChunk],
    analysis: QueryAnalysis,
    max_n:    Optional[int] = None,
) -> List[ScoredChunk]

Purpose: Re-rank chunks by a weighted composite score that combines five signals: semantic similarity, keyword matching, source quality, recency, and document position.

Parameter Type Description
chunks List[ScoredChunk] Filtered chunks from ChunkFilter
analysis QueryAnalysis Query analysis result (provides keywords for keyword boost)
max_n int Maximum number of chunks to return after ranking

Returns: List[ScoredChunk] sorted descending by composite_score, capped at max_n.
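Assuming every signal has already been scaled to 0-1, the composite score reduces to a weighted sum using the RankingConfig defaults, whose weights add up to 1.0. Mapping SourceQuality (HIGH=3 down to UNKNOWN=0) to value / 3 and scoring position as 1 / (1 + position) are illustrative assumptions, not documented behaviour; Candidate, composite, and rank are hypothetical names.

```python
# Sketch only: assumed signal scaling; weights are the RankingConfig defaults.
from dataclasses import dataclass
from typing import List, Optional

WEIGHTS = {
    "vector": 0.40,
    "keyword": 0.25,
    "source_quality": 0.20,
    "recency": 0.10,
    "position": 0.05,
}


@dataclass
class Candidate:
    doc_id: str
    vector_score: float     # cosine similarity, assumed already in 0-1
    keyword_score: float    # normalised BM25, 0-1
    source_quality: int     # SourceQuality value: 3, 2, 1 or 0
    recency_score: float    # 0-1
    position: int           # 0 = first chunk of its document
    composite_score: float = 0.0


def composite(c: Candidate) -> float:
    return (
        WEIGHTS["vector"] * c.vector_score
        + WEIGHTS["keyword"] * c.keyword_score
        + WEIGHTS["source_quality"] * (c.source_quality / 3)
        + WEIGHTS["recency"] * c.recency_score
        + WEIGHTS["position"] * (1 / (1 + c.position))
    )


def rank(cands: List[Candidate], max_n: Optional[int] = 15) -> List[Candidate]:
    for c in cands:
        c.composite_score = composite(c)
    ordered = sorted(cands, key=lambda c: c.composite_score, reverse=True)
    return ordered if max_n is None else ordered[:max_n]
```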


SmartContextComposer.compose

composer.compose(
    chunks:   List[ScoredChunk],
    analysis: QueryAnalysis,
    budget:   int,
) -> str

Purpose: Format ranked chunks into a structured context string and ensure it never exceeds the allowed budget. Uses greedy inclusion with binary-search truncation as a safety clamp.

Parameter Type Description
chunks List[ScoredChunk] Ranked chunks from CompositeRanker
analysis QueryAnalysis Query analysis (selects template language)
budget int Maximum context length in characters or tokens

Returns: str — the fully formatted context string ready to be injected into an LLM prompt.
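The strategy described above (greedy inclusion, then a binary-search clamp) can be sketched with the "english" template header and chunk format from the Context Templates table and the default "\n---\n" separator. The exact spacing and the choice to skip a chunk that does not fit, rather than stop, are assumptions; the length parameter stands in for the engine's LengthCounter, and compose, clamp, and format_chunk are hypothetical names.

```python
# Sketch only: assumed layout and skip-on-overflow policy.
from typing import Callable, List, Tuple

HEADER = "📚 Retrieved Information:\n"
SEPARATOR = "\n---\n"


def format_chunk(source: str, text: str) -> str:
    return f"[Source: {source}]\n{text}"


def clamp(text: str, budget: int, length: Callable[[str], int]) -> str:
    """Binary-search the longest prefix whose measured length fits the budget."""
    if length(text) <= budget:
        return text
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if length(text[:mid]) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo]


def compose(chunks: List[Tuple[str, str]], budget: int,
            length: Callable[[str], int] = len) -> str:
    context = HEADER
    for source, text in chunks:              # chunks arrive best-first from the ranker
        sep = SEPARATOR if context != HEADER else ""
        candidate = context + sep + format_chunk(source, text)
        if length(candidate) <= budget:
            context = candidate              # greedy: keep it if it still fits
    return clamp(context, budget, length)    # safety clamp, e.g. header alone too long
```

Skipping an oversized chunk lets a smaller, lower-ranked chunk still use the remaining budget, which is why doc-3 can fit after doc-2 is rejected in the usage below.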


SmartContextComposer.get_stats

composer.get_stats(context: str) -> Dict[str, Any]

Purpose: Compute structural statistics about an already-composed context string.

Returns: Dict with: total_length, chunk_count, avg_chunk_length, min_chunk_length, max_chunk_length, template, length_unit.


7. ContextGuard — Context Validator

ContextGuard.validate

guard.validate(
    context:  str,
    query:    str,
    analysis: QueryAnalysis,
) -> Tuple[bool, List[str]]

Purpose: Validate the assembled context before it is delivered to the LLM. Runs up to five sequential checks and returns a pass/fail result along with human-readable warnings.

Checks performed:

# Check Configurable Via
1 Context is not empty check_empty
2 Context meets minimum length min_context_length
3 Context below maximum length (warning only) max_context_length
4 At least one source attribution present check_source_coverage
5 Semantic relevance to query (optional) check_relevance + relevance_threshold

Parameter Type Description
context str The assembled context string
query str The original user query
analysis QueryAnalysis Query analysis result

Returns: Tuple[bool, List[str]]

  • bool: True = context is valid, False = validation failed
  • List[str]: warning/failure messages (empty on a full pass)

# Standalone usage example
import logging

logger = logging.getLogger(__name__)

guard = ContextGuard(config)
passed, warnings = guard.validate(context, query, analysis)

if not passed:
    for w in warnings:
        logger.warning("ContextGuard: %s", w)
    # fallback: retrieve more chunks or reduce strictness
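The five checks can be approximated as below. Message wording, whether a missing source attribution fails validation or only warns, and detecting attribution through a "[Source:" or "[المصدر:" marker (taken from the template table) are all assumptions. The relevance check takes a caller-supplied similarity score instead of computing embeddings; validate_context is a hypothetical name.

```python
# Sketch only: assumed messages and pass/fail policy for each check.
from typing import List, Optional, Tuple


def validate_context(
    context: str,
    min_length: int = 10,
    max_length: int = 8000,
    check_sources: bool = True,
    relevance: Optional[float] = None,     # cosine(query, context), if computed
    relevance_threshold: float = 0.10,
) -> Tuple[bool, List[str]]:
    if not context.strip():                                     # check 1
        return False, ["context is empty"]
    passed = True
    messages: List[str] = []
    if len(context) < min_length:                               # check 2
        passed = False
        messages.append(f"context shorter than {min_length}")
    if len(context) > max_length:                               # check 3: warning only
        messages.append(f"context longer than {max_length}")
    if check_sources and not any(m in context for m in ("[Source:", "[المصدر:")):
        passed = False                                          # check 4
        messages.append("no source attribution found")
    if relevance is not None and relevance < relevance_threshold:
        passed = False                                          # check 5
        messages.append(f"relevance {relevance:.2f} below {relevance_threshold}")
    return passed, messages
```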

8. AdaptiveLearningSystem — Adaptive Learning

AdaptiveLearningSystem.suggest_budget

learning.suggest_budget(analysis: QueryAnalysis) -> int

Purpose: Suggest an optimised context budget by blending the rule-based estimate from QueryAnalysis with the rolling average of actual context consumption. If the engine consistently uses far less than its budget, this method automatically reduces the allocation (with a 30% headroom), improving throughput.

Parameter Type Description
analysis QueryAnalysis Current query analysis containing the base budget estimate

Returns: int — the recommended budget in characters (or tokens if a token counter is in use).
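One way to realise this behaviour is sketched below. It assumes a sliding window of observed context lengths, a minimum sample count before adapting, and a rule that the adaptive value never exceeds the rule-based estimate; the 0.30 headroom corresponds to the 30% mentioned above. BudgetAdvisor and its parameters are hypothetical, and the library's exact blending formula is not documented here.

```python
# Sketch only: assumed blending rule, not AdaptiveLearningSystem's formula.
import math
from collections import deque
from typing import Deque


class BudgetAdvisor:
    def __init__(self, window: int = 100, min_samples: int = 10, headroom: float = 0.30) -> None:
        self.used: Deque[int] = deque(maxlen=window)   # sliding window of actual lengths
        self.min_samples = min_samples
        self.headroom = headroom

    def record(self, context_length: int) -> None:
        self.used.append(context_length)

    def suggest_budget(self, base_budget: int) -> int:
        if len(self.used) < self.min_samples:
            return base_budget                          # not enough evidence yet
        avg_used = sum(self.used) / len(self.used)
        adaptive = math.ceil(avg_used * (1 + self.headroom))
        return min(base_budget, adaptive)               # only ever shrink the allocation
```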


AdaptiveLearningSystem.get_performance_report

learning.get_performance_report() -> Dict[str, Any]

Purpose: Return a statistical performance summary computed over the most recent sliding window of queries (default window: 100 queries).

Returns: Dict with: total_queries, avg_response_ms, avg_context_length, avg_chunk_count, guard_failure_rate, samples.


9. ContextCache — Smart Cache

Managed automatically by ContextEngine — use directly only when you need explicit control.

ContextCache.get

cache.get(key: str) -> Optional[str]

Purpose: Retrieve a cached context string by its key. Automatically checks TTL and returns None for expired or missing entries.

Returns: str (the context) or None.


ContextCache.put

cache.put(key: str, context: str) -> None

Purpose: Store a context string under the given key. Applies LRU eviction: when max_size is exceeded, the least recently used entry is removed.


ContextCache.make_key

cache.make_key(
    query:   str,
    filters: Optional[Dict] = None,
) -> str

Purpose: Generate a stable, deterministic cache key from a query string and its optional metadata filters using an MD5 hash. Both query and filters contribute to the key, so the same query with different filters produces different keys.

Parameter Type Description
query str The user query (normalised to lowercase before hashing)
filters Dict Optional metadata filters included in the key

Returns: str — 32-character hex MD5 digest.
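A deterministic key of this kind can be built by lowercasing the query, serialising the filters with sorted keys so that dictionary order does not matter, and hashing the result with MD5. The library's exact serialisation is not documented, and the extra strip() of surrounding whitespace is an assumption, so make_cache_key is a hypothetical equivalent rather than the real method.

```python
# Sketch only: assumed serialisation; mirrors the documented MD5 behaviour.
import hashlib
import json
from typing import Any, Dict, Optional


def make_cache_key(query: str, filters: Optional[Dict[str, Any]] = None) -> str:
    payload = {
        "q": query.strip().lower(),
        "f": filters or {},
    }
    raw = json.dumps(payload, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()  # 32 hex characters
```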


ContextCache.stats (property)

cache.stats -> Dict[str, Any]

Purpose: Return live cache health metrics without any side effects.

print(cache.stats)
# {
#   'size': 45, 'max_size': 256,
#   'hits': 120, 'misses': 30,
#   'hit_rate': '80.0%', 'ttl_sec': 300
# }

ContextCache.clear

cache.clear() -> None

Purpose: Flush all cached entries. Call this after re-indexing documents to prevent stale contexts from being served.
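The get / put / stats / clear behaviour above can be approximated with an OrderedDict for LRU order plus a stored timestamp per entry for TTL. This is a sketch, not the library's class: the clock parameter is a hypothetical hook that makes expiry testable, and counting an expired lookup as a miss is an assumption. The stats dictionary mirrors the shape of the example output shown earlier.

```python
# Sketch only: hypothetical LRU + TTL cache, not ContextCache itself.
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional, Tuple


class LRUTTLCache:
    def __init__(self, max_size: int = 256, ttl_seconds: int = 300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size, self.ttl, self.clock = max_size, ttl_seconds, clock
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or self.clock() - entry[0] > self.ttl:
            self._data.pop(key, None)       # drop expired entries lazily
            self.misses += 1
            return None
        self._data.move_to_end(key)         # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key: str, context: str) -> None:
        self._data[key] = (self.clock(), context)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry

    def clear(self) -> None:
        self._data.clear()

    @property
    def stats(self) -> Dict[str, Any]:
        total = self.hits + self.misses
        rate = self.hits / total if total else 0.0
        return {"size": len(self._data), "max_size": self.max_size,
                "hits": self.hits, "misses": self.misses,
                "hit_rate": f"{rate * 100:.1f}%", "ttl_sec": self.ttl}
```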


10. Budget Allocation

Used by engine.run_multi() to split a global budget across multiple queries.

EqualBudgetAllocation.allocate

EqualBudgetAllocation().allocate(
    total_budget: int,
    n_queries:    int,
) -> List[int]

Purpose: Divide the total budget equally among all queries. Any remainder from integer division is added to the first query's allocation.

Returns: List[int] — per-query budget allocations that sum to total_budget.
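The equal split with the remainder assigned to the first query is a single divmod. allocate_equal is a hypothetical name, and returning an empty list when n_queries is not positive is an assumption.

```python
# Sketch only: mirrors the documented equal-split rule.
from typing import List


def allocate_equal(total_budget: int, n_queries: int) -> List[int]:
    if n_queries <= 0:
        return []
    share, remainder = divmod(total_budget, n_queries)
    budgets = [share] * n_queries
    budgets[0] += remainder          # remainder goes to the first query
    return budgets
```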


WeightedBudgetAllocation.allocate

WeightedBudgetAllocation(weights=[0.5, 0.3, 0.2]).allocate(
    total_budget: int,
    n_queries:    int,
) -> List[int]

Purpose: Distribute the budget according to pre-defined weights. Useful when you know in advance which queries are more important.

Parameter Type Description
weights List[float] Per-query importance weights (passed in __init__)
total_budget int Total budget to distribute
n_queries int Must equal len(weights), otherwise raises ValueError

Returns: List[int] — per-query budgets proportional to the weights.
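Proportional shares rarely divide evenly, so some rounding rule is needed. This sketch floors each share and hands the leftover units to the largest fractional parts (the largest-remainder method, chosen here for illustration because it keeps the total exact; the library's rounding is not documented). It also raises the documented ValueError when n_queries differs from len(weights). allocate_weighted is a hypothetical name.

```python
# Sketch only: largest-remainder rounding is an assumed choice.
import math
from typing import List


def allocate_weighted(total_budget: int, n_queries: int, weights: List[float]) -> List[int]:
    if n_queries != len(weights):
        raise ValueError("n_queries must equal len(weights)")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    exact = [total_budget * w / total_weight for w in weights]
    budgets = [math.floor(x) for x in exact]
    leftover = total_budget - sum(budgets)
    # Give leftover units to the queries with the biggest fractional parts.
    order = sorted(range(n_queries), key=lambda i: exact[i] - budgets[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets
```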


ComplexityBasedAllocation.allocate

ComplexityBasedAllocation().allocate(
    total_budget: int,
    n_queries:    int,
    analyses:     Optional[List[QueryAnalysis]] = None,
) -> List[int]

Purpose: Automatically distribute budget based on each query's detected complexity. No manual weight configuration needed — the engine derives weights from QueryAnalysis.complexity.

Complexity Budget Weight
LOW 1.0×
MEDIUM 1.5×
HIGH 2.5×

Falls back to EqualBudgetAllocation if analyses is None or its length does not match n_queries.
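A minimal sketch of complexity-based allocation, using the multipliers from the table and the documented equal-split fallback. Complexity is passed as plain strings standing in for QueryAnalysis.complexity, and adding any rounding remainder to the first query is an assumption; allocate_by_complexity is a hypothetical name.

```python
# Sketch only: strings stand in for QueryAnalysis.complexity values.
from typing import List, Optional

COMPLEXITY_WEIGHTS = {"LOW": 1.0, "MEDIUM": 1.5, "HIGH": 2.5}


def allocate_by_complexity(total_budget: int, n_queries: int,
                           complexities: Optional[List[str]] = None) -> List[int]:
    if n_queries <= 0:
        return []
    if not complexities or len(complexities) != n_queries:
        # Documented fallback: equal split, remainder to the first query.
        share, remainder = divmod(total_budget, n_queries)
        return [share + (remainder if i == 0 else 0) for i in range(n_queries)]
    weights = [COMPLEXITY_WEIGHTS[c] for c in complexities]
    total_weight = sum(weights)
    budgets = [int(total_budget * w / total_weight) for w in weights]
    budgets[0] += total_budget - sum(budgets)   # keep the total exact
    return budgets
```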


11. ConversationHistory — Multi-turn Support

ConversationHistory.add_turn

history.add_turn(
    query:    str,
    answer:   str        = "",
    context:  str        = "",
    analysis: Optional[QueryAnalysis] = None,
) -> ConversationTurn

Purpose: Append a new conversation turn (question + answer + context used) to the history. Automatically enforces the max_turns limit by discarding the oldest turn when the limit is reached.

Parameter Type Description
query str The user's question
answer str The LLM's answer (optional — can be added after generation)
context str The context string that was fed to the LLM
analysis QueryAnalysis The query analysis for this turn

Returns: The newly created ConversationTurn object.


ConversationHistory.get_recent_context

history.get_recent_context(n: int = 3) -> str

Purpose: Extract the last n Q&A pairs as a plain-text string, suitable for inclusion in a retrieval query to capture conversational context.

Returns: str formatted as:

Q: First question
A: First answer

Q: Second question
A: Second answer

ConversationHistory.get_enriched_query

history.get_enriched_query(
    current_query: str,
    n:             int = 2,
) -> str

Purpose: Enrich the current query by prepending recent conversation context. This resolves implicit references and pronouns ("it", "that", "the previous one") so the retriever finds the correct chunks even when the query is underspecified.

Parameter Type Description
current_query str The user's current question
n int Number of prior turns to include as context

Returns: str — the enriched query, or the original query unchanged if there is no prior history.

# Turn 1
engine.run("What is the Transformer model?")

# Turn 2 — "it" is ambiguous without history
engine.run("How does it handle long sequences?")
# Internally enriched to:
# "[Previous conversation:
#   Q: What is the Transformer model?
#   A: ...]
#  Current question: How does it handle long sequences?"
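The turn bookkeeping and the two formatting helpers can be sketched as follows, using the Q:/A: layout from get_recent_context and the "[Previous conversation: ...] Current question: ..." shape in the comment above. A bounded deque drops the oldest turn automatically, matching the max_turns behaviour of add_turn. The max_turns default of 10, the exact whitespace, and the names Turn and History are assumptions.

```python
# Sketch only: hypothetical stand-in for ConversationHistory.
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Turn:
    query: str
    answer: str = ""
    context: str = ""


class History:
    def __init__(self, max_turns: int = 10) -> None:
        self.turns: Deque[Turn] = deque(maxlen=max_turns)  # oldest turn drops off

    def add_turn(self, query: str, answer: str = "", context: str = "") -> Turn:
        turn = Turn(query, answer, context)
        self.turns.append(turn)
        return turn

    def get_recent_context(self, n: int = 3) -> str:
        recent = list(self.turns)[-n:] if n > 0 else []
        return "\n\n".join(f"Q: {t.query}\nA: {t.answer}" for t in recent)

    def get_enriched_query(self, current_query: str, n: int = 2) -> str:
        recent = self.get_recent_context(n)
        if not recent:
            return current_query      # no history: query passes through unchanged
        return f"[Previous conversation:\n{recent}]\nCurrent question: {current_query}"
```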

12. Utility Functions

char_counter

from fennec_community.context import char_counter

char_counter(text: str) -> int

Purpose: Count the number of characters in a string. This is the default length measurement function used throughout the engine when no custom length_counter is provided.

Returns: int — character count, or 0 for empty / None input.


make_token_counter

from fennec_community.context import make_token_counter

token_counter = make_token_counter(tokenizer)

Purpose: Create a token-based length counter from any tokenizer that implements .encode(text). Pass the returned function as length_counter to ContextEngine to make all budget limits operate in tokens instead of characters — essential when targeting a model with a token-count window.

Parameter Type Description
tokenizer Any tokenizer Must expose a .encode(text) method that returns a sequence

Returns: LengthCounter — a Callable[[str], int] that returns token count.

from transformers import AutoTokenizer
from fennec_community.context import make_token_counter, ContextEngine

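# Use the tokenizer that matches your target LLM; bert-base-uncased is only an example.
# Hugging Face .encode() adds special tokens ([CLS]/[SEP]) by default,
# so each count includes them.
tokenizer     = AutoTokenizer.from_pretrained("bert-base-uncased")
token_counter = make_token_counter(tokenizer)

engine = ContextEngine(length_counter=token_counter)
# All budget values are now interpreted as token counts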
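Conceptually, such a factory only needs to wrap .encode() in a closure. The sketch below is an assumption about its shape, not the library source: the empty-input handling is assumed to match char_counter, and `WhitespaceTokenizer` is a hypothetical toy used so the example runs without downloading a model.

```python
from typing import Callable, Dict, List, Optional, Protocol


class Encoder(Protocol):
    """Anything with an encode(text) method returning a sequence of token IDs."""
    def encode(self, text: str) -> List[int]: ...


def make_token_counter(tokenizer: Encoder) -> Callable[[str], int]:
    """Wrap a tokenizer into a length counter (sketch)."""
    def count(text: Optional[str]) -> int:
        if not text:
            return 0  # assumed to mirror char_counter's empty-input behavior
        return len(tokenizer.encode(text))
    return count


class WhitespaceTokenizer:
    """Toy stand-in for a real tokenizer: one ID per whitespace-separated word."""
    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def encode(self, text: str) -> List[int]:
        return [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]
```

Because the counter is a closure, the tokenizer is loaded once and reused for every length measurement in the pipeline.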

build_query_analyzer

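from fennec_community.context import build_query_analyzer, ContextEngineConfig

build_query_analyzer(
    config:   ContextEngineConfig,
    strategy: Optional[QueryAnalyzerStrategy] = None,
) -> QueryAnalyzerStrategy

# Usage: no strategy given, so a RuleBasedQueryAnalyzer is built from the config
analyzer = build_query_analyzer(ContextEngineConfig())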

Purpose: Factory function that constructs a query analyzer. If a custom strategy is provided (e.g. an LLM-powered analyzer), it is returned as-is. Otherwise, a RuleBasedQueryAnalyzer is built from the provided config. Use this when you want to create and test an analyzer independently before wiring it into the engine.

| Parameter | Type | Description |
|---|---|---|
| config | ContextEngineConfig | Engine configuration |
| strategy | QueryAnalyzerStrategy | Optional custom analyzer to use instead of the default |

Returns: QueryAnalyzerStrategy instance ready to call .analyze(query).
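The factory logic described above reduces to a single branch. In this self-contained sketch, `EngineConfig`, the placeholder `RuleBasedQueryAnalyzer`, and `KeywordOnlyAnalyzer` are hypothetical stand-ins (the real analyzer detects intent, complexity, keywords, and budget), and analyze() returning a dict is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class QueryAnalyzerStrategy(Protocol):
    def analyze(self, query: str) -> dict: ...


@dataclass
class EngineConfig:
    """Hypothetical stand-in for ContextEngineConfig."""
    budget_low: int = 800


class RuleBasedQueryAnalyzer:
    """Placeholder for the default rule-based analyzer."""
    def __init__(self, config: EngineConfig) -> None:
        self.config = config

    def analyze(self, query: str) -> dict:
        return {"query": query, "budget": self.config.budget_low}


def build_query_analyzer(config: EngineConfig,
                         strategy: Optional[QueryAnalyzerStrategy] = None) -> QueryAnalyzerStrategy:
    # A caller-supplied strategy wins and is returned untouched.
    if strategy is not None:
        return strategy
    # Otherwise fall back to the rule-based default built from the config.
    return RuleBasedQueryAnalyzer(config)


class KeywordOnlyAnalyzer:
    """Hypothetical custom analyzer used to show the override path."""
    def analyze(self, query: str) -> dict:
        return {"query": query, "keywords": query.lower().split()}
```

Returning the custom strategy as-is (rather than wrapping it) keeps the factory transparent: whatever object you pass in is exactly what the engine will call.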


13. Reference Tables

Context Templates

| Template | Header | Chunk Format | Best For |
|---|---|---|---|
| "arabic" | 📚 المعلومات المسترجعة: | [المصدر: X]\nالنص | Arabic-language applications |
| "english" | 📚 Retrieved Information: | [Source: X]\nText | English-language applications |
| "minimal" | (none) | Raw text only | Compact prompts, token-tight scenarios |
| "structured" | === CONTEXT START === | --- [N] Source ---\nText | Systems requiring explicit delimiters |

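To make the template table concrete, here is a hypothetical renderer that follows its header and chunk layouts. Assumptions: chunks are simplified to (source, text) pairs, parts are joined with a blank line, structured numbering starts at 1, and any closing delimiter the real "structured" template may emit is omitted because the table does not specify one. `render_context` is not a library function.

```python
from typing import List, Tuple

# (source, text) pairs: a hypothetical simplification of ranked chunks.
Chunk = Tuple[str, str]


def render_context(chunks: List[Chunk], template: str = "english") -> str:
    """Format chunks following the header/chunk layouts in the table (sketch)."""
    if template == "english":
        header, body = "📚 Retrieved Information:", [f"[Source: {s}]\n{t}" for s, t in chunks]
    elif template == "arabic":
        header, body = "📚 المعلومات المسترجعة:", [f"[المصدر: {s}]\n{t}" for s, t in chunks]
    elif template == "structured":
        header = "=== CONTEXT START ==="
        body = [f"--- [{i}] {s} ---\n{t}" for i, (s, t) in enumerate(chunks, start=1)]
    elif template == "minimal":
        header, body = "", [t for _, t in chunks]  # no header, raw text only
    else:
        raise ValueError(f"unknown template: {template!r}")
    parts = ([header] if header else []) + body
    return "\n\n".join(parts)
```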
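End-to-End Examples

The following script indexes a small Arabic-language knowledge base with a mock embedding function and walks through four usage scenarios.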

from __future__ import annotations
import time
from dataclasses import dataclass, field
from typing import Optional, Dict
import numpy as np
from fennec_community.context import (
    ContextEngine,
    ContextEngineConfig,
    ContextManager,
    QueryAnalyzerConfig,
    RetrieverConfig,
    FilterConfig,
    RankingConfig,
    ComposerConfig,
    GuardConfig,
    CacheConfig,
    SourceQuality,
    ScoredChunk,
    char_counter,
)
from fennec_community.chunks import DocumentChunk

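import re
import zlib

_WORD_EMBEDDINGS: Dict[str, np.ndarray] = {}


def mock_embed(text: str) -> np.ndarray:
    """
    Mock embedding model for testing: sums a fixed random vector per word and
    L2-normalizes the result. Deterministic across runs; it captures word
    overlap only, not meaning.
    """
    words = re.findall(r"\w+", text.lower())
    vec   = np.zeros(64)
    for w in words:
        if w not in _WORD_EMBEDDINGS:
            # crc32 is stable across processes, unlike str hash() (PYTHONHASHSEED);
            # a local Generator avoids reseeding NumPy's global RNG.
            rng = np.random.default_rng(zlib.crc32(w.encode("utf-8")))
            _WORD_EMBEDDINGS[w] = rng.standard_normal(64)
        vec += _WORD_EMBEDDINGS[w]
    norm = np.linalg.norm(vec)
    return (vec / norm) if norm > 0 else vec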


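# Sample Arabic-language knowledge base: ML, deep learning, NLP, RAG, Transformers,
# vector databases, LLMs, and fine-tuning. duplicate_007 repeats part of ml_001.
KNOWLEDGE_BASE = [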
    DocumentChunk(
        doc_id="ml_001",
        text=(
            "التعلم الآلي هو فرع من فروع الذكاء الاصطناعي يركز على بناء أنظمة تتعلم من البيانات. "
            "تستخدم هذه الأنظمة الخوارزميات لتحليل البيانات وتعلم الأنماط واتخاذ القرارات. "
            "من أبرز تطبيقاته: التعرف على الصور، ومعالجة اللغة الطبيعية، وأنظمة التوصية."
        ),
        metadata={"category": "ml", "lang": "ar", "date": "2024","source":"كتاب التعلم الآلي" },
    ),
    DocumentChunk(
        doc_id="dl_002",
        text=(
            "التعلم العميق هو مجال فرعي من التعلم الآلي يعتمد على الشبكات العصبية الاصطناعية "
            "ذات الطبقات المتعددة. تستطيع هذه الشبكات تعلم تمثيلات هرمية للبيانات. "
            "تُستخدم في رؤية الكمبيوتر، ومعالجة الكلام، والنصوص."
        ),
        metadata={"category": "dl", "lang": "ar", "date": "2024","source":"كتاب التعلم العميق" },
    ),
    DocumentChunk(
        doc_id="nlp_003",
        text=(
            "معالجة اللغة الطبيعية (NLP) هي مجال في الذكاء الاصطناعي يُعنى بفهم اللغة البشرية. "
            "يشمل مهام مثل: تصنيف النصوص، واستخراج المعلومات، والترجمة الآلية، "
            "وتوليد النصوص. نماذج مثل BERT وGPT غيّرت هذا المجال جذرياً."
        ),
        metadata={"category": "nlp", "lang": "ar", "date": "2024","source":"كتاب معالجة اللغة الطبيعية" },
    ),
    DocumentChunk(
        doc_id="rag_004",
        text=(
            "Retrieval-Augmented Generation (RAG) هي تقنية تجمع بين قواعد المعرفة الخارجية "
            "ونماذج اللغة الكبيرة. تُحسِّن دقة الإجابات وتقلل الـ hallucination. "
            "تعتمد على: استرجاع المعلومات ذات الصلة، وتضمينها في سياق الـ LLM."
        ),
        metadata={"category": "rag", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="transformer_005",
        text=(
            "معمارية Transformer ثورت مجال معالجة اللغة الطبيعية منذ ورقة Attention is All You Need. "
            "تعتمد على آلية self-attention التي تُحدد العلاقات بين الكلمات في النص. "
            "أصبحت أساساً لنماذج مثل BERT وGPT وT5."
        ),
        metadata={"category": "transformer", "lang": "ar", "date": "2017"},
    ),
    DocumentChunk(
        doc_id="vector_006",
        text=(
            "قواعد البيانات المتجهية (Vector Databases) تُخزّن embeddings وتتيح البحث بالتشابه الدلالي. "
            "من أبرزها: Pinecone, Chroma, Weaviate, FAISS. "
            "تُستخدم في أنظمة RAG لاسترجاع المعلومات ذات الصلة بالاستعلام."
        ),
        metadata={"category": "infrastructure", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="duplicate_007",  # intentionally repeats ml_001's first two sentences (dedup test)
        text=(
            "التعلم الآلي هو فرع من فروع الذكاء الاصطناعي يركز على بناء أنظمة تتعلم من البيانات. "
            "تستخدم هذه الأنظمة الخوارزميات لتحليل البيانات وتعلم الأنماط واتخاذ القرارات."
        ),
        metadata={"category": "ml", "lang": "ar"},
    ),

    DocumentChunk(
        doc_id="llm_009",
        text=(
            "نماذج اللغة الكبيرة (LLMs) مثل GPT-4 وClaude وGemini تُنتج نصاً يشبه الكتابة البشرية. "
            "تُدرَّب على مليارات الكلمات وتستطيع الإجابة على الأسئلة، والكتابة، والبرمجة. "
            "التحديات الرئيسية تشمل: الـ hallucination، والتحيز، والتكلفة الحسابية."
        ),
        metadata={"category": "llm", "lang": "ar", "date": "2024"},
    ),
    DocumentChunk(
        doc_id="fine_tuning_010",
        text=(
            "Fine-tuning هي عملية تدريب نموذج مُدرَّب مسبقاً على بيانات خاصة بمجال معين. "
            "تُحسِّن الأداء في مهام محددة دون الحاجة إلى التدريب من الصفر. "
            "LoRA وQLoRA هما تقنيتان شائعتان لـ fine-tuning فعّال من حيث الموارد."
        ),
        metadata={"category": "training", "lang": "ar", "date": "2024"},
    ),
]

def example_basic():
    print("\n" + "="*60)
    print("📌 Example 1: basic usage")
    print("="*60)

    engine = ContextEngine()  # build a context engine with the default config

    # Index the chunks for retrieval
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

    # Run the engine with a query ("What is machine learning?")
    query  = "ما هو التعلم الآلي؟"
    result = engine.run(query)

    print(f"\n🔍 Query: {query}")
    print(f"📊 Intent: {result.query_analysis.intent.value}")
    print(f"📊 Complexity: {result.query_analysis.complexity.value}")
    print(f"🔑 Keywords: {result.query_analysis.keywords[:5]}")
    print(f"\n📄 Composed context:\n{result.context}")
    print("\n📈 Statistics:")
    stats = result.get_stats()
    for k, v in stats.items():
        print(f"   {k}: {v}")

def example_custom_config():
    print("\n" + "="*60)
    print("📌 Example 2: custom configuration")
    print("="*60)

    config = ContextEngineConfig(
        query_analyzer = QueryAnalyzerConfig(
            budget_low    = 600,
            budget_medium = 1500,
            budget_high   = 3500,
        ),
        retriever = RetrieverConfig(
            top_k           = 15,
            vector_weight   = 0.65,
            keyword_weight  = 0.30,
            metadata_weight = 0.05,
        ),
        filter_cfg = FilterConfig(
            dedup_method     = "hash",
            min_chunk_length = 30,
        ),
        ranking = RankingConfig(
            weight_vector_score   = 0.45,
            weight_keyword_score  = 0.30,
            weight_source_quality = 0.15,
            weight_recency        = 0.10,
            max_chunks_to_rank    = 5,
        ),
        composer = ComposerConfig(
            max_context_length = 2500,
            template           = "arabic",
            include_scores     = True,
            include_metadata   = False,
            group_by_source    = False,
        ),
        guard = GuardConfig(
            enabled               = True,
            min_context_length    = 50,
            check_source_coverage = True,
        ),
        cache = CacheConfig(
            enabled     = True,
            max_size    = 128,
            ttl_seconds = 600,
        ),
    )

    engine = ContextEngine(config=config)

    # Assign a quality tier to each source name (feeds the source_quality ranking signal)
    source_quality = {
        "كتاب التعلم الآلي":   SourceQuality.HIGH,
        "كتاب التعلم العميق":  SourceQuality.HIGH,
        "مجلة NLP":            SourceQuality.MEDIUM,
        "بحث RAG":             SourceQuality.HIGH,
        "مصدر مكرر":           SourceQuality.LOW,
    }

    engine.index(
        chunks             = KNOWLEDGE_BASE,
        embed_fn           = mock_embed,
        source_quality_map = source_quality,
    )

    # "Compare machine learning and deep learning; what are the main differences?"
    query  = "قارن بين التعلم الآلي والتعلم العميق، وما هي أبرز الفروقات؟"
    result = engine.run(query, metadata_filters={"lang": "ar"})

    print(f"\n🔍 Query: {query}")
    print(f"📊 Intent: {result.query_analysis.intent.value}")
    print(f"📊 Complexity: {result.query_analysis.complexity.value}")
    print(f"💰 Budget: {result.query_analysis.context_budget} chars")
    print(f"🛡️ Guard: {'✓ passed' if result.guard_passed else '✗ failed'}")
    print(f"\n📄 Context:\n{result.context}")


def example_multi_turn():
    print("\n" + "="*60)
    print("📌 Example 3: multi-turn conversation")
    print("="*60)

    engine = ContextEngine(
        config=ContextEngineConfig(multi_turn=True)
    )
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

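    conversation = [
        "ما هو التعلم الآلي؟",          # "What is machine learning?"
        "وما علاقته بالتعلم العميق؟",   # "And how does it relate to deep learning?"
        "كيف يُستخدم في RAG؟",          # "How is it used in RAG?"
    ]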

    for i, query in enumerate(conversation):
        result = engine.run(query, use_history=True, record_turn=True)

        # Simulate an LLM response
        fake_answer = f"[LLM answer to question {i+1}]"

        # Record the answer on the latest turn of the conversation history
        history = engine.get_history()
        if history and history.turns:
            history.turns[-1].answer = fake_answer

        print(f"\n💬 Question {i+1}: {query}")
        print(f"   📊 Enriched query: {result.query_analysis.normalized[:80]}...")
        print(f"   📄 Context length: {result.context_length} chars")
        print(f"   ⏱️ Time: {result.pipeline_ms:.1f}ms")

    # Conversation-history statistics
    history = engine.get_history()
    print(f"\n📚 Conversation history: {len(history.turns)} turns")


def example_cache_performance():
    print("\n" + "="*60)
    print("📌 Example 4: cache performance")
    print("="*60)

    engine = ContextEngine(
        config=ContextEngineConfig(
            cache=CacheConfig(enabled=True, max_size=50, ttl_seconds=60)
        )
    )
    engine.index(chunks=KNOWLEDGE_BASE, embed_fn=mock_embed)

    query = "ما هو التعلم الآلي؟"

    # first query (cache miss)
    t0 = time.perf_counter()
    r1 = engine.run(query)
    t1 = (time.perf_counter() - t0) * 1000

    # same query again (cache hit)
    t0 = time.perf_counter()
    r2 = engine.run(query)
    t2 = (time.perf_counter() - t0) * 1000

    print(f"\n📊 First query (cache miss):  {t1:.1f}ms")
    print(f"📊 Second query (cache hit): {t2:.1f}ms")
    if t2 > 0:
        print(f"⚡ Speedup: {t1/t2:.1f}x")

    # Performance report, including cache statistics
    perf = engine.get_performance_report()
    print("\n📈 Performance report:")
    for k, v in perf.items():
        if isinstance(v, dict):
            print(f"   {k}:")
            for kk, vv in v.items():
                print(f"      {kk}: {vv}")
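        else:
            print(f"   {k}: {v}")


if __name__ == "__main__":
    example_basic()
    example_custom_config()
    example_multi_turn()
    example_cache_performance()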

Composite Ranking Formula

composite_score =
    vector_score   × 0.40    (semantic similarity)
  + keyword_boost  × 0.25    (query keyword matches)
  + source_quality × 0.20    (source trustworthiness)
  + recency_score  × 0.10    (information freshness)
  + position_score × 0.05    (chunk position in document)

All weights are configurable via RankingConfig.
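The formula is a plain weighted sum; the sketch below reproduces it with the default weights. The `RankingWeights` field names are illustrative and are not the RankingConfig attribute names, and each input signal is assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankingWeights:
    """Default weights from the formula above; field names are illustrative."""
    vector: float = 0.40
    keyword: float = 0.25
    source_quality: float = 0.20
    recency: float = 0.10
    position: float = 0.05


def composite_score(vector_score: float, keyword_boost: float, source_quality: float,
                    recency_score: float, position_score: float,
                    w: RankingWeights = RankingWeights()) -> float:
    """Weighted sum of per-chunk signals, each assumed normalized to [0, 1]."""
    return (vector_score * w.vector
            + keyword_boost * w.keyword
            + source_quality * w.source_quality
            + recency_score * w.recency
            + position_score * w.position)


# Example: a semantically strong chunk from a trusted source with middling freshness.
score = composite_score(0.8, 0.5, 1.0, 0.5, 0.0)  # 0.32 + 0.125 + 0.2 + 0.05 + 0.0
```

Because the default weights sum to 1.0, the score stays within [0, 1] whenever the inputs do; keeping custom weights summing to 1.0 preserves that property.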


Query Complexity and Default Budgets

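| Complexity | Default Budget | Example Queries |
|---|---|---|
| LOW | 800 chars | "What is Python?", "Who invented the internet?" |
| MEDIUM | 2000 chars | "Explain the concept of deep learning" |
| HIGH | 4000 chars | "Compare Transformer and LSTM architectures in terms of..." |

Budgets are measured with the engine's length_counter, so they become token counts when a token-based counter is supplied.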
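Fitting ranked chunks into one of these budgets can be sketched as a greedy packer. This is a hypothetical illustration: `pack_chunks` and `DEFAULT_BUDGETS` are not library names, the blank-line separator is an assumption, and the engine's composition step truncates to the budget, whereas this sketch simply skips any chunk that would overflow.

```python
from typing import Callable, List

DEFAULT_BUDGETS = {"LOW": 800, "MEDIUM": 2000, "HIGH": 4000}


def pack_chunks(ranked_texts: List[str], complexity: str,
                counter: Callable[[str], int] = len,
                separator: str = "\n\n") -> str:
    """Greedily keep the highest-ranked chunks that fit the complexity's budget."""
    budget = DEFAULT_BUDGETS[complexity]
    kept: List[str] = []
    used = 0
    for text in ranked_texts:
        # Separator cost applies only between chunks, not before the first one.
        cost = counter(text) + (counter(separator) if kept else 0)
        if used + cost > budget:
            continue  # skip chunks that would overflow; a later, shorter one may still fit
        kept.append(text)
        used += cost
    return separator.join(kept)
```

Passing a token counter (for example, one built with make_token_counter) as `counter` makes the same budgets apply in tokens.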

Full Pipeline Step-by-Step

 1. Cache Lookup          Is this query already cached?
 2. History Enrichment    Prepend prior conversation context to query
 3. Query Analysis        Detect intent, complexity, keywords, budget
 4. Budget Suggestion     Adaptive learning adjusts budget from history
 5. Retrieval             Vector + BM25 + Metadata  →  RRF Fusion
 6. Filtering             Noise removal + length filter + deduplication
 7. Ranking               Composite score re-ranking
 8. Composition           Format chunks → truncate to budget
 9. Guard Validation      Empty? Too short? Source present? Relevant?
10. Cache Store           Store result for future identical queries
11. History Record        Save this turn to conversation history
12. Adaptive Record       Log metrics for continuous self-tuning
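Step 5 fuses the vector, BM25, and metadata result lists with Reciprocal Rank Fusion (RRF). The sketch below implements weighted RRF over ranked lists of chunk IDs; scaling each list's contribution by a retriever weight (such as vector_weight, keyword_weight, and metadata_weight in RetrieverConfig) is an assumption about how the engine uses those weights, and k=60 is the constant commonly used since the original RRF paper.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_rrf(rankings: Dict[str, List[str]], weights: Dict[str, float],
                 k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk IDs with weighted Reciprocal Rank Fusion.

    Each retriever adds weight / (k + rank) for every chunk it returns (rank starts at 1).
    """
    scores: Dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = weights.get(name, 1.0)  # unknown retrievers default to weight 1.0
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += w / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


fused = weighted_rrf(
    {"vector": ["ml_001", "dl_002", "nlp_003"], "bm25": ["dl_002", "ml_001"], "metadata": ["nlp_003"]},
    {"vector": 0.6, "bm25": 0.3, "metadata": 0.1},
)
```

RRF only needs ranks, not raw scores, so it combines cosine similarities and BM25 scores without normalizing their very different scales.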

Strategy Extension Points

Every major component implements a Strategy interface and can be swapped without touching engine internals:

| Interface | Default Implementation | Swap It When |
|---|---|---|
| QueryAnalyzerStrategy | RuleBasedQueryAnalyzer | You want LLM-powered intent detection |
| HybridRetrieverStrategy | HybridRetriever | You have a custom vector store (Pinecone, Weaviate, etc.) |
| ChunkFilterStrategy | ChunkFilter | You need domain-specific filtering rules |
| ChunkRankingStrategy | CompositeRanker | You want learned reranking (cross-encoder, etc.) |
| ContextComposerStrategy | SmartContextComposer | You need a custom prompt template |
| ContextGuardStrategy | ContextGuard | You want LLM-based hallucination detection |
| BudgetAllocationStrategy | EqualBudgetAllocation | You need custom multi-query budget logic |

Developer note: All components follow the Strategy Pattern via dependency injection. Pass any custom implementation directly to ContextEngine.__init__() — no subclassing of the engine is required.
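The dependency-injection idea can be shown without the library at all. In this minimal sketch, `ChunkFilterStrategy` is a hypothetical Protocol whose filter() method name is an assumption (the real interface may differ), and `MiniEngine` is a toy stand-in for ContextEngine that receives its filter through the constructor.

```python
from typing import List, Optional, Protocol


class ChunkFilterStrategy(Protocol):
    """Hypothetical shape of a filter strategy; the real method name may differ."""
    def filter(self, chunks: List[str]) -> List[str]: ...


class LengthFilter:
    """Default-style filter: drop chunks shorter than min_length characters."""
    def __init__(self, min_length: int = 30) -> None:
        self.min_length = min_length

    def filter(self, chunks: List[str]) -> List[str]:
        return [c for c in chunks if len(c) >= self.min_length]


class BannedTermFilter:
    """Custom, domain-specific rule: drop chunks containing any banned term."""
    def __init__(self, banned: List[str]) -> None:
        self.banned = [b.lower() for b in banned]

    def filter(self, chunks: List[str]) -> List[str]:
        return [c for c in chunks if not any(b in c.lower() for b in self.banned)]


class MiniEngine:
    """Toy orchestrator: the filter is injected, so swapping it needs no subclassing."""
    def __init__(self, chunk_filter: Optional[ChunkFilterStrategy] = None) -> None:
        self.chunk_filter = chunk_filter or LengthFilter()

    def run(self, chunks: List[str]) -> List[str]:
        return self.chunk_filter.filter(chunks)
```

Because the Protocol is structural, BannedTermFilter never inherits from anything; it only has to provide a compatible filter() method.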
