Fennec Community community/rag/hybrid_search.md

`hybrid_search` — Enterprise SDK Documentation

Overview
Architecture
Installation & Quick Start
Module: SearchConfig
Module: StanzaTokenizer
Module: BM25Scorer
Module: TFIDFScorer
Module: HybridSearchRAG
Fusion Methods Reference
Error Reference
Full Integration Examples

Overview

hybrid_search is a production-grade Retrieval-Augmented Generation (RAG) layer that fuses semantic vector search with lexical keyword search (BM25 or TF-IDF) into a single retrieval pipeline. It is designed for multilingual corpora, with native Arabic NLP support via Stanza, and ships with three configurable fusion strategies, synchronous and asynchronous APIs, and built-in observability through runtime statistics.

Core capabilities:

Semantic search via an underlying RAG vector database
Keyword search via BM25 (recommended) or TF-IDF
Three fusion strategies: Weighted Sum, Reciprocal Rank Fusion (RRF), Max Score
Synchronous and asynchronous APIs — compatible with FastAPI, Jupyter, and plain scripts
Token-level streaming for LLM answer generation
Automatic index training and lazy retraining
Source attribution in generated answers
Side-by-side method comparison for evaluation

Architecture

┌──────────────────────────────────────────────────────────────┐
│                        HybridSearchRAG                        │
│                                                              │
│  ┌────────────────┐          ┌──────────────────────────┐   │
│  │  Semantic Layer│          │     Keyword Layer         │   │
│  │  (RAG vector   │          │  BM25Scorer / TFIDFScorer │   │
│  │   database)    │          │  + StanzaTokenizer        │   │
│  └───────┬────────┘          └────────────┬─────────────┘   │
│          │   semantic_results              │  keyword_results │
│          └──────────────┬─────────────────┘                 │
│                         ▼                                    │
│              ┌──────────────────────┐                        │
│              │   Fusion Engine      │                        │
│              │  weighted_sum        │                        │
│              │  rrf                 │                        │
│              │  max                 │                        │
│              └──────────┬───────────┘                       │
│                         ▼                                    │
│              ┌──────────────────────┐                        │
│              │   LLM Generation     │                        │
│              │  generate / agenerate│                        │
│              │  / astream           │                        │
│              └──────────────────────┘                        │
└──────────────────────────────────────────────────────────────┘

Installation & Quick Start

pip install stanza numpy fennec_community
python -c "import stanza; stanza.download('ar')"   # Arabic model
python -c "import stanza; stanza.download('en')"   # English model (optional)

from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig

config = SearchConfig(
    semantic_weight=0.7,
    keyword_weight=0.3,
    top_k=10,
    fusion_method="rrf"
)

searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=config,
    keyword_method="bm25",
    language="ar"
)

# Train keyword index once after documents are loaded
searcher.train_keyword_index()

# Run a hybrid search
results = searcher.hybrid_search("ما هو الذكاء الاصطناعي؟")

# Or generate a full answer
answer = searcher.generate("ما هو الذكاء الاصطناعي؟", include_sources=True)
print(answer)

Module: `SearchConfig`

SearchConfig is a dataclass that centralises all tunable parameters for the hybrid search pipeline. Pass an instance to HybridSearchRAG to override the defaults.

from fennec_community.rag.types.hybrid_search import SearchConfig

config = SearchConfig(
    semantic_weight=0.6,
    keyword_weight=0.4,
    min_score=0.25,
    top_k=15,
    enable_reranking=True,
    fusion_method="rrf"
)

Fields

Field	Type	Default	Description
`semantic_weight`	`float`	`0.7`	Weight applied to normalised semantic scores during fusion. The two weights are not required to sum to 1.0; their ratio determines the relative influence of each retriever.
`keyword_weight`	`float`	`0.3`	Weight applied to normalised keyword scores during fusion.
`min_score`	`float`	`0.3`	Minimum combined score threshold. Results below this value are filtered out before generation.
`top_k`	`int`	`10`	Maximum number of results returned by any search method.
`enable_reranking`	`bool`	`True`	Reserved flag for optional post-fusion reranking (configurable by downstream integrations).
`fusion_method`	`str`	`'weighted_sum'`	Strategy for merging semantic and keyword result lists. Accepted values: `'weighted_sum'`, `'rrf'`, `'max'`. See Fusion Methods Reference.
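
The field table above maps naturally onto a Python dataclass. The following is a minimal sketch, not the SDK's source: the class name `SearchConfigSketch`, the `ALLOWED_FUSION_METHODS` tuple, and the `__post_init__` validation are assumptions added for illustration. Only the field names, defaults, and accepted `fusion_method` values come from the table.

```python
from dataclasses import dataclass

# Accepted values as listed in the field table.
ALLOWED_FUSION_METHODS = ("weighted_sum", "rrf", "max")


@dataclass
class SearchConfigSketch:
    """Illustrative stand-in for SearchConfig (hypothetical, not the SDK class)."""

    semantic_weight: float = 0.7
    keyword_weight: float = 0.3
    min_score: float = 0.3
    top_k: int = 10
    enable_reranking: bool = True
    fusion_method: str = "weighted_sum"

    def __post_init__(self) -> None:
        # Assumed early validation; the real class may only check at fusion time.
        if self.fusion_method not in ALLOWED_FUSION_METHODS:
            raise ValueError(f"Unsupported fusion method: {self.fusion_method}")


default_config = SearchConfigSketch()
custom_config = SearchConfigSketch(
    semantic_weight=0.6, keyword_weight=0.4, fusion_method="rrf"
)
```

Fields that are not passed explicitly keep the defaults from the table, so `custom_config.top_k` is still `10`.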

Module: `StanzaTokenizer`

StanzaTokenizer is the NLP pre-processing backbone for both BM25Scorer and TFIDFScorer. It wraps the Stanza NLP pipeline and provides tokenization with optional morphological lemmatization — critical for high-recall retrieval in morphologically rich languages such as Arabic.

If Stanza is unavailable at runtime, the class automatically falls back to a regex-based simple tokenizer with no external dependencies.

from fennec_community.rag.types.hybrid_search import StanzaTokenizer

tokenizer = StanzaTokenizer(language="ar", use_lemmatization=True)
tokens = tokenizer.tokenize("الذكاء الاصطناعي يغير العالم")
# ['ذكاء', 'اصطناعي', 'غير', 'عالم']

`StanzaTokenizer.__init__`

StanzaTokenizer(language: str = 'ar', use_lemmatization: bool = True)

Initialises the tokenizer and loads the Stanza NLP pipeline for the specified language.

Parameters

Parameter	Type	Default	Description
`language`	`str`	`'ar'`	BCP-47 language code. Use `'ar'` for Arabic, `'en'` for English. Must match a downloaded Stanza model.
`use_lemmatization`	`bool`	`True`	When `True`, activates Stanza's `mwt`, `pos`, and `lemma` processors so each token is reduced to its lemma (dictionary form). Strongly recommended for Arabic.

Notes

If Stanza is not installed (ImportError) or the model fails to load, the tokenizer falls back to _simple_tokenize and logs a warning instead of raising.
All initialisation errors are non-fatal by design — the system will continue to function with reduced accuracy.
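
To make the fallback concrete, here is a minimal sketch of what a regex-based tokenizer with the documented properties (lowercase output, single-character tokens removed, empty input returns an empty list) could look like. The function name `simple_tokenize` and the `\w+` pattern are assumptions; the SDK's `_simple_tokenize` may use a different expression.

```python
import re
from typing import List

# Assumed pattern: runs of Unicode word characters (covers Arabic and Latin script).
_WORD_RE = re.compile(r"\w+")


def simple_tokenize(text: str) -> List[str]:
    """Lowercase, split on non-word characters, drop single-character tokens."""
    if not text or not text.strip():
        return []
    tokens = _WORD_RE.findall(text.lower())
    return [token for token in tokens if len(token) > 1]


english_tokens = simple_tokenize("Hybrid search is a RAG layer")
arabic_tokens = simple_tokenize("الذكاء الاصطناعي")
```

Without lemmatization, inflected forms stay distinct tokens, which is why the fallback trades recall for availability.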

`StanzaTokenizer.tokenize`

tokenize(text: str) -> List[str]

Tokenizes a single text string into a list of cleaned, lowercase tokens using the Stanza pipeline (or the simple fallback).

Parameters

Parameter	Type	Description
`text`	`str`	The raw input text to tokenize. Empty strings or whitespace-only input return an empty list immediately.

Returns

List[str] — A list of lowercase tokens. Single-character tokens are removed. If use_lemmatization=True, each token is its lemma form.

Example

tokens = tokenizer.tokenize("الطلاب يدرسون في الجامعات")
# With lemmatization → ['طالب', 'درس', 'جامعة']
# Without           → ['الطلاب', 'يدرسون', 'الجامعات']

Important: This is the core method called by BM25Scorer and TFIDFScorer during both training (fit) and scoring. Consistency between training and query tokenization is guaranteed because both paths use the same StanzaTokenizer instance.

`StanzaTokenizer.tokenize_batch`

tokenize_batch(texts: List[str]) -> List[List[str]]

Tokenizes multiple texts in sequence. Convenience wrapper over tokenize.

Parameters

Parameter	Type	Description
`texts`	`List[str]`	A list of raw text strings to tokenize.

Returns

List[List[str]] — Each inner list corresponds to the tokens of the text at the same index in the input.

Example

token_lists = tokenizer.tokenize_batch([
    "البحث عن المعلومات",
    "تعلم الآلة والذكاء الاصطناعي"
])
# [['بحث', 'معلومة'], ['تعلم', 'آلة', 'ذكاء', 'اصطناعي']]

Module: `BM25Scorer`

BM25Scorer implements the Okapi BM25 ranking function, a widely used keyword relevance algorithm that typically outperforms plain TF-IDF on short-to-medium texts. It uses StanzaTokenizer internally for linguistically aware tokenization.

BM25 formula used:

score(q, d) = Σ IDF(t) × [ tf(t,d) × (k1+1) ] / [ tf(t,d) + k1×(1 - b + b×|d|/avgdl) ]
IDF(t)      = log( (N - df(t) + 0.5) / (df(t) + 0.5) + 1 )

from fennec_community.rag.types.hybrid_search import BM25Scorer

scorer = BM25Scorer(k1=1.5, b=0.75, language="ar")
scorer.fit(documents)
scores = scorer.batch_score("الذكاء الاصطناعي", documents)
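
The BM25 formula above can be worked through in plain Python. This is a sketch under stated assumptions, not the `BM25Scorer` implementation: documents are pre-tokenized lists (no Stanza), and the helper names `bm25_fit` and `bm25_score` are hypothetical. The IDF and scoring expressions follow the formula term by term.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_fit(corpus_tokens: List[List[str]]) -> Dict[str, float]:
    """IDF(t) = log((N - df(t) + 0.5) / (df(t) + 0.5) + 1)."""
    n_docs = len(corpus_tokens)
    doc_freq: Counter = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        for term, df in doc_freq.items()
    }


def bm25_score(
    query_tokens: List[str],
    doc_tokens: List[str],
    idf: Dict[str, float],
    avgdl: float,
    k1: float = 1.5,
    b: float = 0.75,
) -> float:
    """Sum the per-term BM25 contributions of the query terms."""
    term_freq = Counter(doc_tokens)
    score = 0.0
    for term in query_tokens:
        tf = term_freq[term]
        if term not in idf or tf == 0:
            continue  # unseen or absent terms contribute nothing
        length_norm = 1 - b + b * len(doc_tokens) / avgdl
        score += idf[term] * (tf * (k1 + 1)) / (tf + k1 * length_norm)
    return score


corpus = [
    ["ai", "is", "broad"],
    ["ml", "is", "part", "of", "ai"],
    ["cooking", "recipes"],
]
idf = bm25_fit(corpus)
avgdl = sum(len(doc) for doc in corpus) / len(corpus)
bm25_scores = [bm25_score(["ai"], doc, idf, avgdl) for doc in corpus]
```

Both matching documents contain "ai" once, but the shorter one scores higher because of the length normalisation controlled by `b`; the third document scores `0.0`.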

`BM25Scorer.__init__`

BM25Scorer(
    k1: float = 1.5,
    b: float = 0.75,
    language: str = 'ar',
    use_lemmatization: bool = True
)

Constructs a BM25 scorer. The model is not ready for scoring until fit() is called.

Parameters

Parameter	Type	Default	Description
`k1`	`float`	`1.5`	Term-frequency saturation parameter. Higher values give more weight to repeated terms. Recommended range: `1.2–2.0`.
`b`	`float`	`0.75`	Document-length normalisation factor. `1.0` = full normalisation, `0.0` = no normalisation. Standard value is `0.75`.
`language`	`str`	`'ar'`	Language code forwarded to `StanzaTokenizer`.
`use_lemmatization`	`bool`	`True`	Forwarded to `StanzaTokenizer`. Recommended `True` for Arabic.

`BM25Scorer.fit`

fit(documents: List[str]) -> None

Trains the BM25 model on the document corpus. Computes per-term IDF values and per-document token lengths. Must be called before any scoring method.

Parameters

Parameter	Type	Description
`documents`	`List[str]`	The full list of raw document texts forming the corpus. Each string is one document/chunk.

Returns

None

Side effects: Populates self.idf, self.doc_lengths, self.avgdl, and self.corpus_size.

Example

corpus = ["الذكاء الاصطناعي مجال واسع", "تعلم الآلة فرع من الذكاء"]
scorer.fit(corpus)

Warning: Calling fit again overwrites the existing model. For large corpora, this is a CPU-intensive operation because every document is tokenized via Stanza.

`BM25Scorer.score`

score(query: str, document: str, doc_idx: int) -> float

Calculates the BM25 relevance score between a query and a single document.

Parameters

Parameter	Type	Description
`query`	`str`	The search query text.
`document`	`str`	The document text to score against the query.
`doc_idx`	`int`	The zero-based index of this document in the corpus used during `fit`. Used to retrieve the pre-computed document length. If the index is out of range, the document's actual token count is used as fallback.

Returns

float — A non-negative BM25 score. Higher values indicate greater relevance. Returns 0.0 for query terms absent from the training vocabulary.

Example

s = scorer.score("الذكاء الاصطناعي", "تعلم الآلة هو جزء من الذكاء الاصطناعي", 0)
print(s)  # e.g. 2.341

`BM25Scorer.batch_score`

batch_score(query: str, documents: List[str]) -> List[float]

Computes BM25 scores for a query against an entire list of documents in a single call. This is the primary method used internally by HybridSearchRAG.keyword_search.

Parameters

Parameter	Type	Description
`query`	`str`	The search query text.
`documents`	`List[str]`	The list of documents to score. Should match the corpus passed to `fit` to ensure correct document-length normalisation.

Returns

List[float] — A score for each document at the corresponding index. Scores are non-negative; unmatched documents receive 0.0.

Example

scores = scorer.batch_score("الذكاء", ["نص أول", "نص عن الذكاء", "نص ثالث"])
# e.g. [0.0, 1.87, 0.0]

Module: `TFIDFScorer`

TFIDFScorer implements a smoothed TF-IDF keyword scorer. It is an alternative to BM25Scorer — lighter and faster but less accurate on short documents. Prefer BM25 for most production use cases.

TF-IDF formula used:

score(q, d) = Σ tf(t,d) × IDF(t)
tf(t,d)     = count(t,d) / |d|
IDF(t)      = log( (N+1) / (df(t)+1) ) + 1    [smoothed]

from fennec_community.rag.types.hybrid_search import TFIDFScorer

scorer = TFIDFScorer(language="en", use_lemmatization=True)
scorer.fit(documents)
scores = scorer.batch_score("machine learning", documents)
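
The smoothed TF-IDF formula above can likewise be sketched in a few lines. As an assumption for illustration, documents are pre-tokenized lists and the helper names `tfidf_fit` and `tfidf_score` are hypothetical; this is not the `TFIDFScorer` source.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_fit(corpus_tokens: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF(t) = log((N + 1) / (df(t) + 1)) + 1."""
    n_docs = len(corpus_tokens)
    doc_freq: Counter = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((n_docs + 1) / (df + 1)) + 1
        for term, df in doc_freq.items()
    }


def tfidf_score(
    query_tokens: List[str], doc_tokens: List[str], idf: Dict[str, float]
) -> float:
    """score(q, d) = sum over query terms of tf(t, d) * IDF(t)."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    return sum(
        (counts[term] / len(doc_tokens)) * idf[term]
        for term in query_tokens
        if term in idf  # terms outside the vocabulary contribute nothing
    )


tfidf_corpus = [
    ["machine", "learning", "basics"],
    ["deep", "learning", "for", "nlp"],
]
tfidf_idf = tfidf_fit(tfidf_corpus)
tfidf_scores = [tfidf_score(["deep"], doc, tfidf_idf) for doc in tfidf_corpus]
```

Unlike BM25, the term frequency here grows linearly with repetition and there is no saturation parameter, which is one reason BM25 is usually preferred.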

`TFIDFScorer.__init__`

TFIDFScorer(language: str = 'ar', use_lemmatization: bool = True)

Constructs a TF-IDF scorer. Not ready for scoring until fit() is called.

Parameters

Parameter	Type	Default	Description
`language`	`str`	`'ar'`	Language code forwarded to `StanzaTokenizer`.
`use_lemmatization`	`bool`	`True`	Forwarded to `StanzaTokenizer`.

`TFIDFScorer.fit`

fit(documents: List[str]) -> None

Trains the TF-IDF model by computing IDF values for all terms in the corpus. Must be called before scoring.

Parameters

Parameter	Type	Description
`documents`	`List[str]`	The full corpus as a list of raw document strings.

Returns

None

Side effects: Populates self.idf and self.corpus_size.

Example

scorer.fit(["machine learning basics", "deep learning for NLP"])

`TFIDFScorer.score`

score(query: str, document: str) -> float

Computes the TF-IDF relevance score between a query and a single document.

Parameters

Parameter	Type	Description
`query`	`str`	The search query text.
`document`	`str`	The document text to score.

Returns

float — A non-negative TF-IDF score. 0.0 means no query term was found in the document or the vocabulary.

Example

s = scorer.score("deep learning", "deep learning is a subset of machine learning")
print(s)  # e.g. 0.412

`TFIDFScorer.batch_score`

batch_score(query: str, documents: List[str]) -> List[float]

Computes TF-IDF scores for a query against a list of documents.

Parameters

Parameter	Type	Description
`query`	`str`	The search query text.
`documents`	`List[str]`	The list of document texts to score.

Returns

List[float] — One score per document, in the same order as the input list.

Example

scores = scorer.batch_score("neural networks", docs)
top_idx = max(range(len(scores)), key=lambda i: scores[i])

Module: `HybridSearchRAG`

HybridSearchRAG is the central orchestrator of the hybrid_search package. It wraps an existing RAG system (any object exposing retrieve, generate, llm, vector_db, and context_manager) and augments it with a hybrid retrieval layer. All public search and generation methods are available in both synchronous and asynchronous variants.

Constructor

HybridSearchRAG(
    rag_system,
    config: Optional[SearchConfig] = None,
    keyword_method: str = 'bm25',
    language: str = 'ar',
    use_lemmatization: bool = True
)

Parameters

Parameter	Type	Default	Description
`rag_system`	`Any`	required	The underlying RAG backend. Must expose: `retrieve(query, top_k)`, `generate(prompt)`, `llm`, `vector_db.chunks`, and `context_manager.build(query, chunks)`. Passing `None` raises `ValueError`.
`config`	`SearchConfig \| None`	`None`	Search configuration object. A default `SearchConfig()` is created if not provided.
`keyword_method`	`str`	`'bm25'`	Which keyword scorer to instantiate. `'bm25'` (recommended) or `'tfidf'`.
`language`	`str`	`'ar'`	Language code passed to the tokenizer and used for bilingual response messages.
`use_lemmatization`	`bool`	`True`	Whether to apply morphological lemmatization during tokenization.

Raises

ValueError — if rag_system is None.

Example

from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig

searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=SearchConfig(fusion_method="rrf", top_k=15),
    keyword_method="bm25",
    language="ar",
    use_lemmatization=True
)

Index Management

`train_keyword_index`

train_keyword_index() -> None

Builds and trains the keyword index (BM25 or TF-IDF) from all document chunks currently loaded in rag_system.vector_db. The index must be trained before any keyword-based or hybrid search can run. hybrid_search and keyword_search invoke this method lazily when _is_trained is False, but explicit upfront training is recommended for production deployments to avoid a latency spike on the first query.

Parameters

None.

Returns

None

Side effects:

Extracts text from all rag_system.vector_db.chunks.
Calls keyword_scorer.fit(documents).
Sets self._is_trained = True.
Logs the number of indexed documents.

Behaviour when no chunks exist: Logs a warning and returns without raising an exception. _is_trained remains False.
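
The lazy-training behaviour described above follows a common pattern, sketched below. The class `LazyKeywordIndex` and its substring "search" are hypothetical stand-ins, not SDK code; the sketch only illustrates the control flow: train on first use, and stay untrained (with a warning) when there are no chunks.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("hybrid_search_sketch")


class LazyKeywordIndex:
    """Hypothetical helper illustrating lazy training; not part of the SDK."""

    def __init__(self, load_chunks: Callable[[], List[str]]) -> None:
        self._load_chunks = load_chunks
        self._documents: List[str] = []
        self.is_trained = False

    def train(self) -> None:
        documents = self._load_chunks()
        if not documents:
            # Mirrors the documented behaviour: warn and return, no exception.
            logger.warning("No chunks available; keyword index not trained")
            return
        self._documents = documents  # a real scorer would call fit(documents)
        self.is_trained = True
        logger.info("Indexed %d documents", len(documents))

    def search(self, term: str) -> List[str]:
        if not self.is_trained:
            self.train()  # lazy training on first use
        return [doc for doc in self._documents if term in doc]


empty_index = LazyKeywordIndex(lambda: [])
empty_hits = empty_index.search("alpha")

full_index = LazyKeywordIndex(lambda: ["alpha beta", "gamma"])
full_hits = full_index.search("alpha")
```

Because an empty store leaves the index untrained, every later search retries training, which is what allows documents added afterwards to be picked up.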

Example

# After documents have been indexed into the RAG vector store:
searcher.train_keyword_index()
# Ready for keyword and hybrid searches.

Performance note: For large corpora (>100k chunks) with Stanza lemmatization enabled, this call can take several minutes. Consider running it in a background thread or async worker at startup.

Synchronous Search API

`semantic_search`

semantic_search(query: str, top_k: Optional[int] = None) -> List[Tuple]

Executes a pure semantic (embedding-based) search by delegating directly to the underlying RAG system's retrieve method. Use this when exact keyword matching is not needed and conceptual similarity is sufficient.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The natural-language search query.
`top_k`	`int \| None`	`None`	Maximum results to return. Falls back to `config.top_k` if `None`.

Returns

List[Tuple[DocumentChunk, float]] — A list of (chunk, similarity_score) pairs, ordered by descending similarity. Scores are cosine similarity values in [0.0, 1.0].

Side effects: Increments self.stats['semantic_searches'].

Example

results = searcher.semantic_search("ما هي تطبيقات الذكاء الاصطناعي؟", top_k=5)
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:100]}")

`keyword_search`

keyword_search(query: str, top_k: Optional[int] = None) -> List[Tuple]

Executes a pure keyword search using the configured BM25 or TF-IDF scorer against the indexed document corpus. Only results with scores strictly greater than 0.0 are included. If the keyword index has not been trained yet, it is automatically trained before the search.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The keyword search query.
`top_k`	`int \| None`	`None`	Maximum results to return. Falls back to `config.top_k` if `None`.

Returns

List[Tuple[DocumentChunk, float]] — A list of (chunk, bm25_or_tfidf_score) pairs, ordered by descending keyword relevance.

Side effects: Increments self.stats['keyword_searches']. May trigger train_keyword_index() if not yet trained.

Example

results = searcher.keyword_search("الشبكات العصبية")
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:100]}")

`hybrid_search`

hybrid_search(
    query: str,
    top_k: Optional[int] = None,
    semantic_weight: Optional[float] = None,
    keyword_weight: Optional[float] = None
) -> List[Tuple]

The flagship retrieval method. Runs the semantic and keyword searches one after the other (fetching top_k × 2 candidates from each), then fuses the results using the configured fusion strategy into a single ranked list. This is the recommended method for production use; for concurrent execution of the two searches, use ahybrid_search.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The natural-language search query.
`top_k`	`int \| None`	`None`	Maximum results to return from fusion. Falls back to `config.top_k` if `None`.
`semantic_weight`	`float \| None`	`None`	Per-call override for the semantic weight. Overrides `config.semantic_weight` for this call only.
`keyword_weight`	`float \| None`	`None`	Per-call override for the keyword weight. Overrides `config.keyword_weight` for this call only.

Returns

List[Tuple[DocumentChunk, float]] — Fused and ranked (chunk, combined_score) pairs. The combined score definition depends on the active fusion method (see Fusion Methods Reference).

Side effects: Increments self.stats['total_searches'] and self.stats['hybrid_searches']. May trigger train_keyword_index() if not yet trained.

Example

# Using config weights
results = searcher.hybrid_search("تعلم الآلة في الطب")

# Overriding weights for a specific query
results = searcher.hybrid_search(
    "نصوص قانونية",
    semantic_weight=0.4,
    keyword_weight=0.6  # Boost keyword for exact legal terms
)

for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:80]}")

Synchronous Generation API

`generate`

generate(
    query: str,
    search_method: str = 'hybrid',
    include_sources: bool = False
) -> str

Performs retrieval using the specified search method, applies the min_score threshold to filter weak results, constructs a grounded prompt, and calls the underlying LLM to produce a final natural-language answer. The prompt is designed to reduce hallucination: the LLM is instructed to answer only from the retrieved context and to respond with a clear "insufficient information" message rather than fabricate an answer.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The natural-language question to answer.
`search_method`	`str`	`'hybrid'`	Retrieval strategy to use. One of: `'hybrid'` (recommended), `'semantic'`, `'keyword'`.
`include_sources`	`bool`	`False`	When `True`, appends a formatted source attribution block (up to 3 unique source documents with similarity scores) to the answer.

Returns

str — The LLM-generated answer grounded in retrieved context. Returns a bilingual fallback string if no results are found, if all results fall below min_score, or if rag_system.llm is None.

Possible fallback return values:

Condition	Arabic message	English message
No retrieval results	`"لا تتوفر معلومات كافية في المستندات للإجابة على هذا السؤال."`	`"No sufficient information found in the documents to answer this question."`
All results below `min_score`	`"المعلومات المسترجعة غير ذات صلة كافية بسؤالك."`	`"Retrieved information is not relevant enough to answer your question."`
LLM unavailable	`"Language model is not available"`	—
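
The first two fallback conditions in the table can be expressed as a small guard that runs before prompt construction. This is a sketch only: the function name `filter_or_fallback`, the `FALLBACKS` mapping, and the rule for choosing between the Arabic and English message are assumptions; the message strings and the `min_score` filtering come from this section.

```python
from typing import Dict, List, Tuple, Union

FALLBACKS: Dict[str, Dict[str, str]] = {
    "no_results": {
        "ar": "لا تتوفر معلومات كافية في المستندات للإجابة على هذا السؤال.",
        "en": "No sufficient information found in the documents to answer this question.",
    },
    "low_score": {
        "ar": "المعلومات المسترجعة غير ذات صلة كافية بسؤالك.",
        "en": "Retrieved information is not relevant enough to answer your question.",
    },
}


def filter_or_fallback(
    results: List[Tuple[str, float]], min_score: float, language: str = "ar"
) -> Union[str, List[Tuple[str, float]]]:
    """Return the results that pass min_score, or a fallback message."""
    lang = language if language in ("ar", "en") else "en"  # assumed selection rule
    if not results:
        return FALLBACKS["no_results"][lang]
    kept = [(chunk, score) for chunk, score in results if score >= min_score]
    if not kept:
        return FALLBACKS["low_score"][lang]
    return kept


no_hits = filter_or_fallback([], 0.3, "en")
weak_hits = filter_or_fallback([("chunk-a", 0.1)], 0.3, "en")
good_hits = filter_or_fallback([("chunk-a", 0.5), ("chunk-b", 0.2)], 0.3)
```

Returning a message instead of raising keeps the calling code simple: a caller can treat any string result as the final answer.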

Example

# Hybrid answer without sources
answer = searcher.generate("ما هو الفرق بين الذكاء الاصطناعي والتعلم الآلي؟")
print(answer)

# Semantic-only with sources
answer = searcher.generate(
    "What are transformer architectures?",
    search_method="semantic",
    include_sources=True
)
print(answer)

Asynchronous Search API

All asynchronous methods are non-blocking and safe to use in FastAPI endpoints, async task queues, and concurrent pipelines.

`asemantic_search`

async asemantic_search(query: str, top_k: Optional[int] = None) -> List[Tuple]

Async wrapper around semantic_search. Runs the synchronous search in a thread pool via asyncio.to_thread to avoid blocking the event loop.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The search query.
`top_k`	`int \| None`	`None`	Max results. Falls back to `config.top_k`.

Returns

List[Tuple[DocumentChunk, float]] — Same as semantic_search.

Example

results = await searcher.asemantic_search("neural networks", top_k=5)

`akeyword_search`

async akeyword_search(query: str, top_k: Optional[int] = None) -> List[Tuple]

Async wrapper around keyword_search. Runs the synchronous scorer in a thread pool.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The search query.
`top_k`	`int \| None`	`None`	Max results. Falls back to `config.top_k`.

Returns

List[Tuple[DocumentChunk, float]] — Same as keyword_search.

Example

results = await searcher.akeyword_search("الشبكات العصبية")

`ahybrid_search`

async ahybrid_search(
    query: str,
    top_k: Optional[int] = None,
    semantic_weight: Optional[float] = None,
    keyword_weight: Optional[float] = None
) -> List[Tuple]

The async flagship retrieval method. Unlike its synchronous counterpart, which runs both searches sequentially, ahybrid_search dispatches asemantic_search and akeyword_search concurrently via asyncio.gather. When both searches take a similar amount of time on an I/O-bound RAG backend, this can reduce retrieval latency by up to about 50%. It then applies the configured fusion strategy synchronously.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The search query.
`top_k`	`int \| None`	`None`	Max results from fusion. Falls back to `config.top_k`.
`semantic_weight`	`float \| None`	`None`	Per-call semantic weight override.
`keyword_weight`	`float \| None`	`None`	Per-call keyword weight override.

Returns

List[Tuple[DocumentChunk, float]] — Fused and ranked results, identical in format to hybrid_search.

Side effects: Auto-trains keyword index via asyncio.to_thread if not yet trained.

Example

# FastAPI endpoint example
@app.get("/search")
async def search(q: str):
    results = await searcher.ahybrid_search(q, top_k=10)
    return [{"text": c.text, "score": s} for c, s in results]
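
The latency benefit of dispatching both searches with asyncio.gather can be seen in a standalone sketch. The two blocking functions below are hypothetical stand-ins that simulate a slow backend with `time.sleep`; they are not SDK calls. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time
from typing import List, Tuple

Hit = Tuple[str, float]


def slow_semantic(query: str) -> List[Hit]:
    time.sleep(0.3)  # simulated vector-store latency
    return [("doc-a", 0.9)]


def slow_keyword(query: str) -> List[Hit]:
    time.sleep(0.3)  # simulated keyword-scoring latency
    return [("doc-b", 2.1)]


async def gather_both(query: str) -> List[List[Hit]]:
    # Each blocking call runs in a worker thread; gather awaits both at once.
    return await asyncio.gather(
        asyncio.to_thread(slow_semantic, query),
        asyncio.to_thread(slow_keyword, query),
    )


start = time.perf_counter()
semantic_hits, keyword_hits = asyncio.run(gather_both("neural networks"))
elapsed = time.perf_counter() - start
```

Run sequentially, the two calls would take about 0.6 s; run concurrently, the total is close to the slower of the two. The gain shrinks when one search dominates the other.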

Asynchronous Generation & Streaming API

`agenerate`

async agenerate(
    query: str,
    search_method: str = 'hybrid',
    include_sources: bool = False
) -> str

Async version of generate. Retrieves context using the specified async search method, constructs the same anti-hallucination prompt, and calls the LLM's async generation method if available (llm.generate_async), falling back to a thread-pool call for synchronous LLMs.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The question to answer.
`search_method`	`str`	`'hybrid'`	`'hybrid'`, `'semantic'`, or `'keyword'`.
`include_sources`	`bool`	`False`	Append source attribution to the answer.

Returns

str — The generated answer string. Same fallback messages as generate.

Example

answer = await searcher.agenerate(
    "Explain federated learning",
    search_method="hybrid",
    include_sources=True
)
print(answer)

`astream`

async astream(query: str, search_method: str = 'hybrid') -> AsyncGenerator[str, None]

An async streaming generator that retrieves context via ahybrid_search and yields the LLM response as a sequence of text tokens or word fragments in real time. Ideal for chat interfaces and streaming HTTP responses where time-to-first-token matters.

Streaming behaviour:

If rag_system.llm exposes an astream(prompt) async generator, each token is yielded as it is produced by the LLM.
If the LLM only has a synchronous generate method, the full answer is generated in a thread and then re-emitted word-by-word with asyncio.sleep(0) between each word to yield control back to the event loop.
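
The second streaming path (a synchronous LLM re-emitted word by word) can be sketched as follows. `blocking_generate` is a hypothetical stand-in for a synchronous LLM call, and the choice to append a trailing space to every word except the last is an assumption about how fragments are delimited.

```python
import asyncio
from typing import AsyncGenerator, List


def blocking_generate(prompt: str) -> str:
    """Hypothetical synchronous LLM call."""
    return f"Answer to: {prompt}"


async def stream_words(prompt: str) -> AsyncGenerator[str, None]:
    # Generate the full answer off the event loop, then re-emit it in pieces.
    answer = await asyncio.to_thread(blocking_generate, prompt)
    words = answer.split()
    for index, word in enumerate(words):
        is_last = index == len(words) - 1
        yield word if is_last else word + " "
        await asyncio.sleep(0)  # hand control back to the event loop


async def collect(prompt: str) -> List[str]:
    return [fragment async for fragment in stream_words(prompt)]


fragments = asyncio.run(collect("what is RAG"))
```

This path does not improve time-to-first-token, since the whole answer exists before the first word is yielded; it only gives callers a uniform streaming interface.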

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The question to answer.
`search_method`	`str`	`'hybrid'`	Search strategy for context retrieval. Note: this parameter is accepted but the method always uses `ahybrid_search` internally regardless of the value.

Returns

AsyncGenerator[str, None] — Yields text fragments (tokens or words). A bilingual fallback string is yielded as a single item if retrieval fails or scores are below threshold.

Example

# CLI streaming
async for token in searcher.astream("اشرح مفهوم الشبكات العصبية"):
    print(token, end="", flush=True)

# FastAPI streaming response
from fastapi.responses import StreamingResponse

@app.get("/stream")
async def stream_answer(q: str):
    async def generator():
        async for token in searcher.astream(q):
            yield token
    return StreamingResponse(generator(), media_type="text/plain")

Analytics & Diagnostics

`compare_methods`

compare_methods(query: str, top_k: int = 5) -> Dict

Runs the same query through all three search methods (semantic, keyword, hybrid) sequentially and returns their results alongside pairwise overlap statistics. Designed for offline evaluation, benchmarking, and debugging to understand the contribution of each retrieval method for a given query.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The evaluation query.
`top_k`	`int`	`5`	Number of results to fetch from each method for comparison.

Returns

Dict with the following structure:

{
    "query": str,                             # The original query
    "semantic": List[Tuple],                  # Results from semantic_search
    "keyword": List[Tuple],                   # Results from keyword_search
    "hybrid": List[Tuple],                    # Results from hybrid_search
    "overlap": {
        "semantic_keyword": int,              # Chunks in both semantic & keyword
        "semantic_hybrid": int,               # Chunks in both semantic & hybrid
        "keyword_hybrid": int                 # Chunks in both keyword & hybrid
    }
}

Example

comparison = searcher.compare_methods("ما هو التعلم العميق؟", top_k=5)

print(f"Semantic results:  {len(comparison['semantic'])}")
print(f"Keyword results:   {len(comparison['keyword'])}")
print(f"Hybrid results:    {len(comparison['hybrid'])}")
print(f"Semantic∩Keyword:  {comparison['overlap']['semantic_keyword']}")
print(f"Semantic∩Hybrid:   {comparison['overlap']['semantic_hybrid']}")

Use case: Low overlap between semantic and keyword results indicates that hybrid search adds significant value (each method is surfacing different relevant documents). High overlap suggests that one method may be redundant for this query type.
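
The overlap statistics amount to set intersections over the chunks returned by each method. A minimal sketch, assuming each chunk can be identified by a hashable ID (how the SDK decides that two chunks are the same is not specified here); the function name `overlap_counts` is hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (chunk_id, score); string IDs are an assumption


def overlap_counts(
    semantic: List[Hit], keyword: List[Hit], hybrid: List[Hit]
) -> Dict[str, int]:
    """Count the chunks shared by each pair of result lists."""
    sem_ids = {chunk_id for chunk_id, _ in semantic}
    key_ids = {chunk_id for chunk_id, _ in keyword}
    hyb_ids = {chunk_id for chunk_id, _ in hybrid}
    return {
        "semantic_keyword": len(sem_ids & key_ids),
        "semantic_hybrid": len(sem_ids & hyb_ids),
        "keyword_hybrid": len(key_ids & hyb_ids),
    }


overlap = overlap_counts(
    semantic=[("c1", 0.9), ("c2", 0.8), ("c3", 0.7)],
    keyword=[("c2", 3.1), ("c4", 2.0)],
    hybrid=[("c2", 0.03), ("c1", 0.016), ("c4", 0.016)],
)
```

In this example only `c2` is found by both retrievers, so the hybrid list draws on documents that each method would have missed on its own.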

`get_stats`

get_stats() -> dict

Returns a snapshot of the system's operational statistics and configuration state. Useful for monitoring dashboards, logging, and performance analysis.

Parameters

None.

Returns

dict with the following keys:

Key	Type	Description
`total_searches`	`int`	Total calls to any search or generate method.
`semantic_searches`	`int`	Total calls to `semantic_search` / `asemantic_search`.
`keyword_searches`	`int`	Total calls to `keyword_search` / `akeyword_search`.
`hybrid_searches`	`int`	Total calls to `hybrid_search` / `ahybrid_search`.
`is_trained`	`bool`	Whether the keyword index has been trained.
`indexed_documents`	`int`	Number of document chunks in the keyword index.
`keyword_method`	`str`	Active scorer: `'bm25'` or `'tfidf'`.
`language`	`str`	Active language code.
`use_lemmatization`	`bool`	Whether lemmatization is enabled.
`fusion_method`	`str`	Active fusion strategy.
`weights`	`dict`	`{"semantic": float, "keyword": float}` — current weights from config.

Example

stats = searcher.get_stats()
print(stats)
# {
#   'total_searches': 142,
#   'semantic_searches': 38,
#   'keyword_searches': 22,
#   'hybrid_searches': 82,
#   'is_trained': True,
#   'indexed_documents': 4500,
#   'keyword_method': 'bm25',
#   'language': 'ar',
#   'use_lemmatization': True,
#   'fusion_method': 'rrf',
#   'weights': {'semantic': 0.7, 'keyword': 0.3}
# }

Fusion Methods Reference

The fusion_method field in SearchConfig controls how semantic and keyword results are merged. All methods receive top_k × 2 candidates from each retrieval path before fusion.

`weighted_sum` (default)

Normalises both score lists to [0, 1] using min-max normalisation, then computes the combined score as:

combined_score = sem_score_normalised × semantic_weight
               + key_score_normalised × keyword_weight

Results below config.min_score are discarded. Best for scenarios where you have calibrated confidence in both retrievers and want predictable, tunable blending.
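
A worked sketch of the weighted-sum computation follows. It is illustrative rather than the SDK's `_fuse_results`: scores are keyed by a document ID string, a document missing from one list is assumed to contribute 0.0 for that list, and a list whose scores are all equal is assumed to normalise to 1.0.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}  # assumed handling of a flat list
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}


def weighted_sum_fusion(
    semantic: Dict[str, float],
    keyword: Dict[str, float],
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
    min_score: float = 0.3,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    sem, key = min_max(semantic), min_max(keyword)
    fused = {
        doc: sem.get(doc, 0.0) * semantic_weight + key.get(doc, 0.0) * keyword_weight
        for doc in sem.keys() | key.keys()
    }
    kept = [(doc, score) for doc, score in fused.items() if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]


weighted = weighted_sum_fusion(
    semantic={"a": 0.9, "b": 0.5, "c": 0.1},
    keyword={"b": 4.0, "d": 2.0, "a": 1.0},
)
```

Here "a" scores about 0.70 and "b" about 0.65, while "c" and "d" fall below `min_score` and are dropped. Note that min-max normalisation always maps the weakest candidate in each list to 0.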

`rrf` — Reciprocal Rank Fusion

Uses only the rank position of each result (not the raw score), making it robust to score scale differences between the two retrievers:

rrf_score(d) = 1/(k + rank_semantic(d)) + 1/(k + rank_keyword(d))

where k = 60 (the commonly used constant). Documents appearing in only one result list still receive a contribution from that list. The min_score threshold does not apply to RRF because RRF scores are small fractions (at most 2/(k + 1) ≈ 0.033 when ranks start at 1) and are not comparable to min_score. RRF is a good choice when the two scorers have very different score magnitudes.
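
The RRF formula can be sketched directly from ranked lists of document IDs. Two assumptions are made here for illustration: ranks start at 1, and a document missing from a list simply gets no term from that list. The function name `rrf_fusion` is hypothetical.

```python
from typing import Dict, List, Tuple


def rrf_fusion(
    semantic_ranked: List[str],
    keyword_ranked: List[str],
    k: int = 60,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Sum 1 / (k + rank) over every list in which a document appears."""
    scores: Dict[str, float] = {}
    for ranking in (semantic_ranked, keyword_ranked):
        for rank, doc in enumerate(ranking, start=1):  # 1-based ranks (assumed)
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_k]


rrf_results = rrf_fusion(
    semantic_ranked=["a", "b", "c"],
    keyword_ranked=["b", "d", "a"],
)
```

Document "b" (ranks 2 and 1) edges out "a" (ranks 1 and 3), and both outrank "d" and "c", which appear in only one list. Raw score magnitudes never enter the computation.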

`max`

Normalises both score lists to [0, 1], then assigns each document the maximum of its two normalised scores:

max_score(d) = max(sem_score_normalised, key_score_normalised)

Documents appearing in only one list receive only that score. Best for sparse queries where one retrieval signal dominates and you do not want the other to dilute it.
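
The max strategy can be sketched in the same style. As before, this is an illustration under assumptions rather than the SDK code: documents are keyed by ID strings, a flat score list normalises to 1.0, and ties are broken alphabetically by ID so that the output is deterministic.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}  # assumed handling of a flat list
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}


def max_fusion(
    semantic: Dict[str, float], keyword: Dict[str, float], top_k: int = 10
) -> List[Tuple[str, float]]:
    sem, key = min_max(semantic), min_max(keyword)
    fused = {
        doc: max(sem.get(doc, 0.0), key.get(doc, 0.0))
        for doc in sem.keys() | key.keys()
    }
    # Sort by descending score, then by ID to break ties (assumed tie rule).
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))[:top_k]


max_results = max_fusion(
    semantic={"a": 0.9, "b": 0.6, "c": 0.3},
    keyword={"b": 5.0, "d": 4.0, "e": 1.0},
)
```

Because min-max normalisation maps the best document of each list to 1.0, the top semantic hit and the top keyword hit always tie at 1.0 under this strategy when they are different documents, so a tie-breaking rule matters more here than in the other two methods.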

Error Reference

| Situation | Behaviour |
|---|---|
| `rag_system=None` | `ValueError` raised in constructor immediately. |
| Stanza not installed | Warning logged; simple regex tokenizer used transparently. |
| Stanza model fails to load | Warning logged; simple tokenizer used transparently. |
| `keyword_search` called before `fit` | Auto-trains index via `train_keyword_index()` with a warning log. |
| No chunks in vector store | `train_keyword_index` logs a warning and returns; `_is_trained` stays `False`. |
| Unsupported `fusion_method` | `ValueError: Unsupported fusion method: <value>` raised in `_fuse_results`. |
| All results below `min_score` | `generate` / `agenerate` return a bilingual fallback string; no exception. |
| LLM is `None` | `generate` returns `"Language model is not available"`. |

Full Integration Examples

Example 1 — Synchronous Pipeline

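from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig

# Per the Fusion Methods Reference, rrf uses rank positions only and does not
# apply min_score; the weights and threshold below take effect if
# fusion_method is switched to "weighted_sum".
config = SearchConfig(
    semantic_weight=0.65,
    keyword_weight=0.35,
    min_score=0.25,
    top_k=10,
    fusion_method="rrf"
)

# my_rag is an already-initialised RAG system (construction not shown)
searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=config,
    keyword_method="bm25",
    language="ar"
)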

# Train once at startup
searcher.train_keyword_index()

# Search
results = searcher.hybrid_search("ما هو التعلم العميق؟")
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:120]}")

# Generate
answer = searcher.generate(
    "ما هو التعلم العميق؟",
    search_method="hybrid",
    include_sources=True
)
print(answer)

Example 2 — Async FastAPI Service

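import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig

app = FastAPI()
# my_rag is an already-initialised RAG system (construction not shown)
searcher = HybridSearchRAG(rag_system=my_rag, language="en")

@app.on_event("startup")
async def startup():
    # train_keyword_index is synchronous; run it in a worker thread
    # so the event loop is not blocked during startup
    await asyncio.to_thread(searcher.train_keyword_index)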

@app.get("/answer")
async def answer(q: str, method: str = "hybrid"):
    return {"answer": await searcher.agenerate(q, search_method=method)}

@app.get("/stream")
async def stream(q: str):
    async def gen():
        async for token in searcher.astream(q):
            yield token
    return StreamingResponse(gen(), media_type="text/plain")

@app.get("/stats")
async def stats():
    return searcher.get_stats()

Example 3 — Evaluation & Method Comparison

queries = [
    "What is federated learning?",
    "Explain transformer attention",
    "How does RAG work?"
]

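# Reuses a searcher built as in Example 1
top_k = 5

for q in queries:
    comp = searcher.compare_methods(q, top_k=top_k)
    sem_k_overlap = comp["overlap"]["semantic_keyword"]
    # Each hybrid result is a (chunk, score) pair; guard against an empty list
    top_hybrid = comp["hybrid"][0][0].text[:80] if comp["hybrid"] else "None"
    print(f"Query: {q}")
    print(f"  Semantic∩Keyword overlap: {sem_k_overlap}/{top_k}")
    print(f"  Hybrid top result: {top_hybrid}")
    print()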

Example 4 — Custom Keyword Scorer Standalone

from fennec_community.rag.types.hybrid_search import BM25Scorer, TFIDFScorer

# BM25 standalone
bm25 = BM25Scorer(language="en", use_lemmatization=True)
corpus = ["deep learning models", "natural language processing", "computer vision tasks"]
bm25.fit(corpus)

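# Both query terms occur only in the second document, so it ranks first
scores = bm25.batch_score("language processing", corpus)
# Index-based argmax works whether scores is a list or an array
best_idx = max(range(len(corpus)), key=lambda i: scores[i])
print(f"Most relevant: {corpus[best_idx]}")  # "natural language processing"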

# TF-IDF standalone
tfidf = TFIDFScorer(language="en")
tfidf.fit(corpus)
s = tfidf.score("vision models", "computer vision tasks")
print(f"TF-IDF score: {s:.4f}")

Source: community/rag/hybrid_search.md
