`hybrid_search` — Enterprise SDK Documentation
Table of Contents
- Overview
- Architecture
- Installation & Quick Start
- Module: SearchConfig
- Module: StanzaTokenizer
- Module: BM25Scorer
- Module: TFIDFScorer
- Module: HybridSearchRAG
- Fusion Methods Reference
- Error Reference
- Full Integration Examples
Overview
hybrid_search is a production-grade Retrieval-Augmented Generation (RAG) layer that fuses semantic vector search with lexical keyword search (BM25 or TF-IDF) into a single retrieval pipeline. It is designed for multilingual corpora, with native Arabic NLP support via Stanza, and ships with three configurable fusion strategies, a fully async API, and built-in observability through runtime statistics.
Core capabilities:
- Semantic search via an underlying RAG vector database
- Keyword search via BM25 (recommended) or TF-IDF
- Three fusion strategies: Weighted Sum, Reciprocal Rank Fusion (RRF), Max Score
- Synchronous and asynchronous APIs — compatible with FastAPI, Jupyter, and plain scripts
- Token-level streaming for LLM answer generation
- Automatic index training and lazy retraining
- Source attribution in generated answers
- Side-by-side method comparison for evaluation
Architecture
┌──────────────────────────────────────────────────────────────┐
│                       HybridSearchRAG                        │
│                                                              │
│  ┌────────────────┐          ┌──────────────────────────┐    │
│  │ Semantic Layer │          │ Keyword Layer            │    │
│  │ (RAG vector    │          │ BM25Scorer / TFIDFScorer │    │
│  │  database)     │          │ + StanzaTokenizer        │    │
│  └───────┬────────┘          └────────────┬─────────────┘    │
│          │ semantic_results               │ keyword_results  │
│          └──────────────┬─────────────────┘                  │
│                         ▼                                    │
│              ┌──────────────────────┐                        │
│              │ Fusion Engine        │                        │
│              │ weighted_sum         │                        │
│              │ rrf                  │                        │
│              │ max                  │                        │
│              └──────────┬───────────┘                        │
│                         ▼                                    │
│              ┌──────────────────────┐                        │
│              │ LLM Generation       │                        │
│              │ generate / agenerate │                        │
│              │ / astream            │                        │
│              └──────────────────────┘                        │
└──────────────────────────────────────────────────────────────┘
Installation & Quick Start
pip install stanza numpy fennec_community
python -c "import stanza; stanza.download('ar')" # Arabic model
python -c "import stanza; stanza.download('en')" # English model (optional)
from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig
config = SearchConfig(
    semantic_weight=0.7,
    keyword_weight=0.3,
    top_k=10,
    fusion_method="rrf"
)
# my_rag is your existing RAG system object
searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=config,
    keyword_method="bm25",
    language="ar"
)
# Train keyword index once after documents are loaded
searcher.train_keyword_index()
# Run a hybrid search
results = searcher.hybrid_search("ما هو الذكاء الاصطناعي؟")
# Or generate a full answer
answer = searcher.generate("ما هو الذكاء الاصطناعي؟", include_sources=True)
print(answer)
Module: SearchConfig
SearchConfig is a dataclass that centralises all tunable parameters for the hybrid search pipeline. Pass an instance to HybridSearchRAG to override the defaults.
from fennec_community.rag.types.hybrid_search import SearchConfig
config = SearchConfig(
    semantic_weight=0.6,
    keyword_weight=0.4,
    min_score=0.25,
    top_k=15,
    enable_reranking=True,
    fusion_method="rrf"
)
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| semantic_weight | float | 0.7 | Weight applied to normalised semantic scores during fusion. It does not have to sum to 1.0 with keyword_weight; the ratio between the two sets the blend. |
| keyword_weight | float | 0.3 | Weight applied to normalised keyword scores during fusion. |
| min_score | float | 0.3 | Minimum combined score threshold. Results below this value are filtered out before generation. |
| top_k | int | 10 | Maximum number of results returned by any search method. |
| enable_reranking | bool | True | Reserved flag for optional post-fusion reranking (configurable by downstream integrations). |
| fusion_method | str | 'weighted_sum' | Strategy for merging semantic and keyword result lists. Accepted values: 'weighted_sum', 'rrf', 'max'. See Fusion Methods Reference. |
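As a rough illustration of how these fields fit together, the sketch below mirrors the table in a plain dataclass. SearchConfigSketch and its validate method are hypothetical stand-ins written for this example; the SDK's own SearchConfig may not perform any of these checks.

```python
from dataclasses import dataclass


@dataclass
class SearchConfigSketch:
    """Hypothetical mirror of the documented fields and defaults (not the SDK class)."""

    semantic_weight: float = 0.7
    keyword_weight: float = 0.3
    min_score: float = 0.3
    top_k: int = 10
    enable_reranking: bool = True
    fusion_method: str = "weighted_sum"

    def validate(self) -> None:
        # Illustrative checks only; assumed, not taken from the SDK.
        if self.fusion_method not in ("weighted_sum", "rrf", "max"):
            raise ValueError(f"Unsupported fusion method: {self.fusion_method}")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if self.semantic_weight < 0 or self.keyword_weight < 0:
            raise ValueError("weights must be non-negative")


config = SearchConfigSketch(fusion_method="rrf", top_k=15)
config.validate()  # passes silently for valid settings
print(config)
```

Validating once at construction time keeps configuration mistakes away from the first query, which is where an unsupported fusion method would otherwise surface.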
Module: StanzaTokenizer
StanzaTokenizer is the NLP pre-processing backbone for both BM25Scorer and TFIDFScorer. It wraps the Stanza NLP pipeline and provides tokenization with optional morphological lemmatization — critical for high-recall retrieval in morphologically rich languages such as Arabic.
If Stanza is unavailable at runtime, the class automatically falls back to a regex-based simple tokenizer with no external dependencies.
from fennec_community.rag.types.hybrid_search import StanzaTokenizer
tokenizer = StanzaTokenizer(language="ar", use_lemmatization=True)
tokens = tokenizer.tokenize("الذكاء الاصطناعي يغير العالم")
# ['ذكاء', 'اصطناعي', 'غير', 'عالم']
StanzaTokenizer.__init__
StanzaTokenizer(language: str = 'ar', use_lemmatization: bool = True)
Initialises the tokenizer and loads the Stanza NLP pipeline for the specified language.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | str | 'ar' | BCP-47 language code. Use 'ar' for Arabic, 'en' for English. Must match a downloaded Stanza model. |
| use_lemmatization | bool | True | When True, activates Stanza's mwt, pos, and lemma processors so each token is reduced to its dictionary root form. Strongly recommended for Arabic. |
Notes
- If Stanza is not installed (ImportError) or the model fails to load, the tokenizer falls back to _simple_tokenize and logs a warning.
- All initialisation errors are non-fatal by design: the system continues to function with reduced accuracy.
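The fallback behaviour described in these notes could look roughly like the sketch below. The names simple_tokenize and make_tokenizer and the regex are assumptions made for illustration; only stanza.Pipeline and the sentences/tokens attributes are real Stanza API, and the SDK's own _simple_tokenize may differ.

```python
import logging
import re
from typing import Callable, List

logger = logging.getLogger("hybrid_search_sketch")


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical regex fallback: lowercase word tokens, single characters dropped."""
    if not text or not text.strip():
        return []
    # For str patterns \w is Unicode-aware, so Arabic letters match as well.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if len(t) > 1]


def make_tokenizer(language: str = "ar") -> Callable[[str], List[str]]:
    """Return a Stanza-backed tokenizer, or the regex fallback if Stanza cannot load."""
    try:
        import stanza  # optional dependency

        nlp = stanza.Pipeline(lang=language, processors="tokenize", verbose=False)
    except Exception as exc:  # ImportError, missing model, and similar failures
        logger.warning("Stanza unavailable (%s); using the regex fallback", exc)
        return simple_tokenize

    def stanza_tokenize(text: str) -> List[str]:
        if not text or not text.strip():
            return []
        doc = nlp(text)
        words = [tok.text.lower() for sent in doc.sentences for tok in sent.tokens]
        return [w for w in words if len(w) > 1]

    return stanza_tokenize


print(simple_tokenize("Hybrid search, a quick test"))
```

Returning a plain callable from either branch means callers never need to know which tokenizer is active, which is what makes the fallback non-fatal.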
StanzaTokenizer.tokenize
tokenize(text: str) -> List[str]
Tokenizes a single text string into a list of cleaned, lowercase tokens using the Stanza pipeline (or the simple fallback).
Parameters
| Parameter | Type | Description |
|---|---|---|
| text | str | The raw input text to tokenize. Empty strings or whitespace-only input return an empty list immediately. |
Returns
List[str] — A list of lowercase tokens. Single-character tokens are removed. If use_lemmatization=True, each token is its lemma form.
Example
tokens = tokenizer.tokenize("الطلاب يدرسون في الجامعات")
# With lemmatization → ['طالب', 'درس', 'جامعة']
# Without → ['الطلاب', 'يدرسون', 'الجامعات']
Important: This is the core method called by BM25Scorer and TFIDFScorer during both training (fit) and scoring. Consistency between training and query tokenization is guaranteed because both paths use the same StanzaTokenizer instance.
StanzaTokenizer.tokenize_batch
tokenize_batch(texts: List[str]) -> List[List[str]]
Tokenizes multiple texts in sequence. Convenience wrapper over tokenize.
Parameters
| Parameter | Type | Description |
|---|---|---|
| texts | List[str] | A list of raw text strings to tokenize. |
Returns
List[List[str]] — Each inner list corresponds to the tokens of the text at the same index in the input.
Example
token_lists = tokenizer.tokenize_batch([
    "البحث عن المعلومات",
    "تعلم الآلة والذكاء الاصطناعي"
])
# [['بحث', 'معلومة'], ['تعلم', 'آلة', 'ذكاء', 'اصطناعي']]
Module: BM25Scorer
BM25Scorer implements the Okapi BM25 ranking function, a widely used keyword relevance algorithm that generally outperforms plain TF-IDF on short-to-medium texts. It uses StanzaTokenizer internally for linguistically aware tokenization.
BM25 formula used:
score(q, d) = Σ IDF(t) × [ tf(t,d) × (k1+1) ] / [ tf(t,d) + k1×(1 - b + b×|d|/avgdl) ]
IDF(t) = log( (N - df(t) + 0.5) / (df(t) + 0.5) + 1 )
from fennec_community.rag.types.hybrid_search import BM25Scorer
scorer = BM25Scorer(k1=1.5, b=0.75, language="ar")
scorer.fit(documents)
scores = scorer.batch_score("الذكاء الاصطناعي", documents)
BM25Scorer.__init__
BM25Scorer(
    k1: float = 1.5,
    b: float = 0.75,
    language: str = 'ar',
    use_lemmatization: bool = True
)
Constructs a BM25 scorer. The model is not ready for scoring until fit() is called.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| k1 | float | 1.5 | Term-frequency saturation parameter. Higher values give more weight to repeated terms. Recommended range: 1.2–2.0. |
| b | float | 0.75 | Document-length normalisation factor. 1.0 = full normalisation, 0.0 = no normalisation. Standard value is 0.75. |
| language | str | 'ar' | Language code forwarded to StanzaTokenizer. |
| use_lemmatization | bool | True | Forwarded to StanzaTokenizer. Recommended True for Arabic. |
BM25Scorer.fit
fit(documents: List[str]) -> None
Trains the BM25 model on the document corpus. Computes per-term IDF values and per-document token lengths. Must be called before any scoring method.
Parameters
| Parameter | Type | Description |
|---|---|---|
| documents | List[str] | The full list of raw document texts forming the corpus. Each string is one document/chunk. |
Returns
None
Side effects: Populates self.idf, self.doc_lengths, self.avgdl, and self.corpus_size.
Example
corpus = ["الذكاء الاصطناعي مجال واسع", "تعلم الآلة فرع من الذكاء"]
scorer.fit(corpus)
Warning: Calling fit again overwrites the existing model. For large corpora, this is a CPU-intensive operation because every document is tokenized via Stanza.
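To make the BM25 formula given for this module concrete, here is a minimal pure-Python sketch of fitting and scoring. It assumes whitespace tokenization in place of StanzaTokenizer and always uses the document's actual length rather than a doc_idx lookup; BM25Sketch is a hypothetical class, not the SDK implementation, though its attribute names follow the side effects listed for fit.

```python
import math
from collections import Counter
from typing import Dict, List


class BM25Sketch:
    """Hypothetical BM25 scorer over whitespace tokens (the SDK uses StanzaTokenizer)."""

    def __init__(self, k1: float = 1.5, b: float = 0.75) -> None:
        self.k1 = k1
        self.b = b
        self.idf: Dict[str, float] = {}
        self.doc_lengths: List[int] = []
        self.avgdl = 0.0
        self.corpus_size = 0

    @staticmethod
    def _tokens(text: str) -> List[str]:
        return text.lower().split()

    def fit(self, documents: List[str]) -> None:
        tokenized = [self._tokens(d) for d in documents]
        self.corpus_size = len(tokenized)
        self.doc_lengths = [len(t) for t in tokenized]
        self.avgdl = sum(self.doc_lengths) / self.corpus_size if tokenized else 0.0
        df: Counter = Counter()
        for tokens in tokenized:
            df.update(set(tokens))  # count each term once per document
        n = self.corpus_size
        self.idf = {t: math.log((n - d + 0.5) / (d + 0.5) + 1) for t, d in df.items()}

    def score(self, query: str, document: str) -> float:
        tokens = self._tokens(document)
        tf = Counter(tokens)
        length_ratio = len(tokens) / self.avgdl if self.avgdl else 0.0
        total = 0.0
        for term in self._tokens(query):
            if term not in self.idf or tf[term] == 0:
                continue  # unknown or absent terms contribute nothing
            denom = tf[term] + self.k1 * (1 - self.b + self.b * length_ratio)
            total += self.idf[term] * tf[term] * (self.k1 + 1) / denom
        return total

    def batch_score(self, query: str, documents: List[str]) -> List[float]:
        return [self.score(query, d) for d in documents]


corpus = ["deep learning models", "natural language processing", "computer vision tasks"]
bm25 = BM25Sketch()
bm25.fit(corpus)
print(bm25.batch_score("language processing", corpus))
```

Because the IDF term adds 1 inside the logarithm, every IDF value is positive, which is why the scores in this sketch can never go negative.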
BM25Scorer.score
score(query: str, document: str, doc_idx: int) -> float
Calculates the BM25 relevance score between a query and a single document.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query | str | The search query text. |
| document | str | The document text to score against the query. |
| doc_idx | int | The zero-based index of this document in the corpus used during fit. Used to retrieve the pre-computed document length. If the index is out of range, the document's actual token count is used as fallback. |
Returns
float — A non-negative BM25 score. Higher values indicate greater relevance. Query terms absent from the training vocabulary contribute 0.0 to the score.
Example
s = scorer.score("الذكاء الاصطناعي", "تعلم الآلة هو جزء من الذكاء الاصطناعي", 0)
print(s) # e.g. 2.341
BM25Scorer.batch_score
batch_score(query: str, documents: List[str]) -> List[float]
Computes BM25 scores for a query against an entire list of documents in a single call. This is the primary method used internally by HybridSearchRAG.keyword_search.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query | str | The search query text. |
| documents | List[str] | The list of documents to score. Should match the corpus passed to fit to ensure correct document-length normalisation. |
Returns
List[float] — A score for each document at the corresponding index. Scores are non-negative; unmatched documents receive 0.0.
Example
scores = scorer.batch_score("الذكاء", ["نص أول", "نص عن الذكاء", "نص ثالث"])
# e.g. [0.0, 1.87, 0.0]
Module: TFIDFScorer
TFIDFScorer implements a smoothed TF-IDF keyword scorer. It is an alternative to BM25Scorer — lighter and faster but less accurate on short documents. Prefer BM25 for most production use cases.
TF-IDF formula used:
score(q, d) = Σ tf(t,d) × IDF(t)
tf(t,d) = count(t,d) / |d|
IDF(t) = log( (N+1) / (df(t)+1) ) + 1 [smoothed]
from fennec_community.rag.types.hybrid_search import TFIDFScorer
scorer = TFIDFScorer(language="en", use_lemmatization=True)
scorer.fit(documents)
scores = scorer.batch_score("machine learning", documents)
TFIDFScorer.__init__
TFIDFScorer(language: str = 'ar', use_lemmatization: bool = True)
Constructs a TF-IDF scorer. Not ready for scoring until fit() is called.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | str | 'ar' | Language code forwarded to StanzaTokenizer. |
| use_lemmatization | bool | True | Forwarded to StanzaTokenizer. |
TFIDFScorer.fit
fit(documents: List[str]) -> None
Trains the TF-IDF model by computing IDF values for all terms in the corpus. Must be called before scoring.
Parameters
| Parameter | Type | Description |
|---|---|---|
| documents | List[str] | The full corpus as a list of raw document strings. |
Returns
None
Side effects: Populates self.idf and self.corpus_size.
Example
scorer.fit(["machine learning basics", "deep learning for NLP"])
TFIDFScorer.score
score(query: str, document: str) -> float
Computes the TF-IDF relevance score between a query and a single document.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query | str | The search query text. |
| document | str | The document text to score. |
Returns
float — A non-negative TF-IDF score. 0.0 means no query term was found in the document or the vocabulary.
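The smoothed TF-IDF formula given for this module can be traced by hand with a small sketch. The version below assumes whitespace tokenization and deduplicates query terms; TFIDFSketch is a hypothetical class for illustration, not the SDK implementation.

```python
import math
from collections import Counter
from typing import Dict, List


class TFIDFSketch:
    """Hypothetical smoothed TF-IDF scorer over whitespace tokens."""

    def __init__(self) -> None:
        self.idf: Dict[str, float] = {}
        self.corpus_size = 0

    def fit(self, documents: List[str]) -> None:
        tokenized = [d.lower().split() for d in documents]
        self.corpus_size = len(tokenized)
        df: Counter = Counter()
        for tokens in tokenized:
            df.update(set(tokens))  # document frequency, not raw counts
        n = self.corpus_size
        self.idf = {t: math.log((n + 1) / (d + 1)) + 1 for t, d in df.items()}

    def score(self, query: str, document: str) -> float:
        tokens = document.lower().split()
        if not tokens:
            return 0.0
        counts = Counter(tokens)
        total = 0.0
        # dict.fromkeys removes duplicate query terms while keeping their order
        for term in dict.fromkeys(query.lower().split()):
            if term in self.idf:
                total += (counts[term] / len(tokens)) * self.idf[term]
        return total


tfidf = TFIDFSketch()
tfidf.fit(["machine learning basics", "deep learning for nlp"])
s = tfidf.score("deep learning", "deep learning is a subset of machine learning")
print(round(s, 4))
```

In this two-document corpus "learning" appears in both documents, so its IDF collapses to 1, while "deep" appears in one and is weighted higher; the score is the sum of those two contributions scaled by the document's length of eight tokens.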
Example
s = scorer.score("deep learning", "deep learning is a subset of machine learning")
print(s) # e.g. 0.412
TFIDFScorer.batch_score
batch_score(query: str, documents: List[str]) -> List[float]
Computes TF-IDF scores for a query against a list of documents.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query | str | The search query text. |
| documents | List[str] | The list of document texts to score. |
Returns
List[float] — One score per document, in the same order as the input list.
Example
scores = scorer.batch_score("neural networks", docs)
top_idx = max(range(len(scores)), key=lambda i: scores[i])
Module: HybridSearchRAG
HybridSearchRAG is the central orchestrator of the hybrid_search package. It wraps an existing RAG system (any object exposing retrieve, generate, llm, vector_db, and context_manager) and augments it with a hybrid retrieval layer. The core search and generation methods are available in both synchronous and asynchronous variants.
Constructor
HybridSearchRAG(
    rag_system,
    config: Optional[SearchConfig] = None,
    keyword_method: str = 'bm25',
    language: str = 'ar',
    use_lemmatization: bool = True
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| rag_system | Any | required | The underlying RAG backend. Must expose: retrieve(query, top_k), generate(prompt), llm, vector_db.chunks, and context_manager.build(query, chunks). Passing None raises ValueError. |
| config | Optional[SearchConfig] | None | Search configuration object. A default SearchConfig() is created if not provided. |
| keyword_method | str | 'bm25' | Which keyword scorer to instantiate. 'bm25' (recommended) or 'tfidf'. |
| language | str | 'ar' | Language code passed to the tokenizer and used for bilingual response messages. |
| use_lemmatization | bool | True | Whether to apply morphological lemmatization during tokenization. |
Raises
ValueError: if rag_system is None.
Example
from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig
searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=SearchConfig(fusion_method="rrf", top_k=15),
    keyword_method="bm25",
    language="ar",
    use_lemmatization=True
)
Index Management
train_keyword_index
train_keyword_index() -> None
Builds and trains the keyword index (BM25 or TF-IDF) from all document chunks currently loaded in rag_system.vector_db. The index must be trained before any keyword-based or hybrid search can run. hybrid_search and keyword_search invoke this method lazily when _is_trained is False, but explicit upfront training is recommended for production deployments to avoid a latency spike on the first query.
Parameters
None.
Returns
None
Side effects:
- Extracts text from all rag_system.vector_db.chunks.
- Calls keyword_scorer.fit(documents).
- Sets self._is_trained = True.
- Logs the number of indexed documents.
Behaviour when no chunks exist: Logs a warning and returns without raising an exception. _is_trained remains False.
Example
# After documents have been indexed into the RAG vector store:
searcher.train_keyword_index()
# Ready for keyword and hybrid searches.
Performance note: For large corpora (>100k chunks) with Stanza lemmatization enabled, this call can take several minutes. Consider running it in a background thread or async worker at startup.
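One way to combine lazy training with the background-thread suggestion is sketched below. LazyKeywordIndex and its callables are hypothetical names invented for this example; the sketch only mirrors the documented behaviour (train once, stay untrained when there are no chunks) and is not how the SDK is implemented.

```python
import threading
from typing import Callable, List


class LazyKeywordIndex:
    """Hypothetical train-once wrapper; not the SDK class."""

    def __init__(
        self,
        load_documents: Callable[[], List[str]],
        fit: Callable[[List[str]], None],
    ) -> None:
        self._load_documents = load_documents
        self._fit = fit
        self._lock = threading.Lock()
        self.is_trained = False

    def _train_locked(self) -> None:
        documents = self._load_documents()
        if not documents:
            return  # nothing to index: is_trained stays False
        self._fit(documents)
        self.is_trained = True

    def train(self) -> None:
        with self._lock:
            self._train_locked()

    def ensure_trained(self) -> None:
        # Lazy path: train only if nobody has done so yet.
        with self._lock:
            if not self.is_trained:
                self._train_locked()

    def train_in_background(self) -> threading.Thread:
        worker = threading.Thread(target=self.train, daemon=True)
        worker.start()
        return worker


fitted: List[str] = []
index = LazyKeywordIndex(lambda: ["doc one", "doc two"], fitted.extend)
index.train_in_background().join()  # at startup, off the request path
index.ensure_trained()              # later calls find the index already trained
print(index.is_trained, fitted)
```

The lock makes a query that arrives during background training wait for it to finish instead of starting a second, redundant fit.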
Synchronous Search API
semantic_search
semantic_search(query: str, top_k: Optional[int] = None) -> List[Tuple]
Executes a pure semantic (embedding-based) search by delegating directly to the underlying RAG system's retrieve method. Use this when exact keyword matching is not needed and conceptual similarity is sufficient.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The natural-language search query. |
| top_k | Optional[int] | None | Maximum results to return. Falls back to config.top_k if None. |
Returns
List[Tuple[DocumentChunk, float]] — A list of (chunk, similarity_score) pairs, ordered by descending similarity. Scores are cosine similarity values in [0.0, 1.0].
Side effects: Increments self.stats['semantic_searches'].
Example
results = searcher.semantic_search("ما هي تطبيقات الذكاء الاصطناعي؟", top_k=5)
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:100]}")
keyword_search
keyword_search(query: str, top_k: Optional[int] = None) -> List[Tuple]
Executes a pure keyword search using the configured BM25 or TF-IDF scorer against the indexed document corpus. Only results with scores strictly greater than 0.0 are included. If the keyword index has not been trained yet, it is automatically trained before the search.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The keyword search query. |
| top_k | Optional[int] | None | Maximum results to return. Falls back to config.top_k if None. |
Returns
List[Tuple[DocumentChunk, float]] — A list of (chunk, bm25_or_tfidf_score) pairs, ordered by descending keyword relevance.
Side effects: Increments self.stats['keyword_searches']. May trigger train_keyword_index() if not yet trained.
Example
results = searcher.keyword_search("الشبكات العصبية")
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:100]}")
hybrid_search
hybrid_search(
    query: str,
    top_k: Optional[int] = None,
    semantic_weight: Optional[float] = None,
    keyword_weight: Optional[float] = None
) -> List[Tuple]
The flagship retrieval method. Runs the semantic and keyword searches one after the other (fetching top_k × 2 candidates from each), then fuses the results into a single ranked list using the configured fusion strategy. This is the recommended method for production use; see ahybrid_search for the concurrent variant.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The natural-language search query. |
| top_k | Optional[int] | None | Maximum results to return from fusion. Falls back to config.top_k if None. |
| semantic_weight | Optional[float] | None | Per-call override for the semantic weight. Overrides config.semantic_weight for this call only. |
| keyword_weight | Optional[float] | None | Per-call override for the keyword weight. Overrides config.keyword_weight for this call only. |
Returns
List[Tuple[DocumentChunk, float]] — Fused and ranked (chunk, combined_score) pairs. The combined score definition depends on the active fusion method (see Fusion Methods Reference).
Side effects: Increments self.stats['total_searches'] and self.stats['hybrid_searches']. May trigger train_keyword_index() if not yet trained.
Example
# Using config weights
results = searcher.hybrid_search("تعلم الآلة في الطب")
# Overriding weights for a specific query
results = searcher.hybrid_search(
    "نصوص قانونية",
    semantic_weight=0.4,
    keyword_weight=0.6  # Boost keyword for exact legal terms
)
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:80]}")
Synchronous Generation API
generate
generate(
    query: str,
    search_method: str = 'hybrid',
    include_sources: bool = False
) -> str
Performs retrieval using the specified search method, applies the min_score threshold to filter weak results, constructs a grounded prompt, and calls the underlying LLM to produce a final natural-language answer. The prompt is designed to discourage hallucination: the LLM is instructed to answer only from the retrieved context and to respond with a clear "insufficient information" message rather than fabricate an answer.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The natural-language question to answer. |
| search_method | str | 'hybrid' | Retrieval strategy to use. One of: 'hybrid' (recommended), 'semantic', 'keyword'. |
| include_sources | bool | False | When True, appends a formatted source attribution block (up to 3 unique source documents with similarity scores) to the answer. |
Returns
str — The LLM-generated answer grounded in retrieved context. Returns a bilingual fallback string if no results are found, if all results fall below min_score, or if rag_system.llm is None.
Possible fallback return values:
| Condition | Arabic message | English message |
|---|---|---|
| No retrieval results | "لا تتوفر معلومات كافية في المستندات للإجابة على هذا السؤال." | "No sufficient information found in the documents to answer this question." |
| All results below min_score | "المعلومات المسترجعة غير ذات صلة كافية بسؤالك." | "Retrieved information is not relevant enough to answer your question." |
| LLM unavailable | (none) | "Language model is not available" |
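The fallback conditions in this table can be expressed as a small decision function. The sketch below is an assumption about how the choice might be made: pick_fallback is a hypothetical helper, and selecting the Arabic text for language "ar" and English otherwise is a guess, since the source only says the fallback is bilingual.

```python
from typing import List, Optional, Tuple

# Message texts copied from the fallback table above.
FALLBACK_MESSAGES = {
    "no_results": {
        "ar": "لا تتوفر معلومات كافية في المستندات للإجابة على هذا السؤال.",
        "en": "No sufficient information found in the documents to answer this question.",
    },
    "low_score": {
        "ar": "المعلومات المسترجعة غير ذات صلة كافية بسؤالك.",
        "en": "Retrieved information is not relevant enough to answer your question.",
    },
}


def pick_fallback(
    results: List[Tuple[str, float]], min_score: float, language: str
) -> Optional[str]:
    """Return a fallback message, or None when generation should proceed."""
    lang = "ar" if language == "ar" else "en"  # assumed selection rule
    if not results:
        return FALLBACK_MESSAGES["no_results"][lang]
    if all(score < min_score for _, score in results):
        return FALLBACK_MESSAGES["low_score"][lang]
    return None


print(pick_fallback([], 0.3, "en"))
print(pick_fallback([("chunk", 0.1)], 0.3, "en"))
print(pick_fallback([("chunk", 0.8)], 0.3, "en"))
```

Checking these conditions before building the prompt avoids an LLM call whenever there is no usable context.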
Example
# Hybrid answer without sources
answer = searcher.generate("ما هو الفرق بين الذكاء الاصطناعي والتعلم الآلي؟")
print(answer)
# Semantic-only with sources
answer = searcher.generate(
    "What are transformer architectures?",
    search_method="semantic",
    include_sources=True
)
print(answer)
Asynchronous Search API
All asynchronous methods are non-blocking and safe to use in FastAPI endpoints, async task queues, and concurrent pipelines.
asemantic_search
async asemantic_search(query: str, top_k: Optional[int] = None) -> List[Tuple]
Async wrapper around semantic_search. Runs the synchronous search in a thread pool via asyncio.to_thread to avoid blocking the event loop.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The search query. |
| top_k | Optional[int] | None | Max results. Falls back to config.top_k. |
Returns
List[Tuple[DocumentChunk, float]] — Same as semantic_search.
Example
results = await searcher.asemantic_search("neural networks", top_k=5)
akeyword_search
async akeyword_search(query: str, top_k: Optional[int] = None) -> List[Tuple]
Async wrapper around keyword_search. Runs the synchronous scorer in a thread pool.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The search query. |
| top_k | Optional[int] | None | Max results. Falls back to config.top_k. |
Returns
List[Tuple[DocumentChunk, float]] — Same as keyword_search.
Example
results = await searcher.akeyword_search("الشبكات العصبية")
ahybrid_search
async ahybrid_search(
    query: str,
    top_k: Optional[int] = None,
    semantic_weight: Optional[float] = None,
    keyword_weight: Optional[float] = None
) -> List[Tuple]
The async flagship retrieval method. Unlike its synchronous counterpart, which runs both searches sequentially, ahybrid_search dispatches asemantic_search and akeyword_search concurrently via asyncio.gather. On I/O-bound RAG backends this can cut total latency by up to roughly half, the best case being when both searches take a similar amount of time. It then applies the configured fusion strategy synchronously.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The search query. |
| top_k | Optional[int] | None | Max results from fusion. Falls back to config.top_k. |
| semantic_weight | Optional[float] | None | Per-call semantic weight override. |
| keyword_weight | Optional[float] | None | Per-call keyword weight override. |
Returns
List[Tuple[DocumentChunk, float]] — Fused and ranked results, identical in format to hybrid_search.
Side effects: Auto-trains keyword index via asyncio.to_thread if not yet trained.
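The concurrent dispatch described for ahybrid_search can be sketched with asyncio.gather and asyncio.to_thread (Python 3.9+). The two blocking functions below are hypothetical stand-ins for the semantic and keyword searches, with time.sleep simulating backend latency; they are not SDK code.

```python
import asyncio
import time
from typing import List, Tuple

Result = Tuple[str, float]


def blocking_semantic_search(query: str, top_k: int) -> List[Result]:
    time.sleep(0.05)  # stands in for an embedding lookup
    return [("semantic-doc", 0.91)][:top_k]


def blocking_keyword_search(query: str, top_k: int) -> List[Result]:
    time.sleep(0.05)  # stands in for keyword scoring
    return [("keyword-doc", 2.4)][:top_k]


async def gather_candidates(
    query: str, top_k: int = 10
) -> Tuple[List[Result], List[Result]]:
    # Each blocking call runs in a worker thread; gather awaits both together,
    # so the waits overlap instead of adding up.
    semantic, keyword = await asyncio.gather(
        asyncio.to_thread(blocking_semantic_search, query, top_k * 2),
        asyncio.to_thread(blocking_keyword_search, query, top_k * 2),
    )
    return semantic, keyword


semantic_hits, keyword_hits = asyncio.run(gather_candidates("neural networks"))
print(semantic_hits, keyword_hits)
```

asyncio.gather returns results in the order the awaitables were passed, so the two candidate lists can be unpacked positionally before fusion.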
Example
# FastAPI endpoint example
@app.get("/search")
async def search(q: str):
    results = await searcher.ahybrid_search(q, top_k=10)
    return [{"text": c.text, "score": s} for c, s in results]
Asynchronous Generation & Streaming API
agenerate
async agenerate(
    query: str,
    search_method: str = 'hybrid',
    include_sources: bool = False
) -> str
Async version of generate. Retrieves context using the specified async search method, constructs the same anti-hallucination prompt, and calls the LLM's async generation method if available (llm.generate_async), falling back to a thread-pool call for synchronous LLMs.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The question to answer. |
| search_method | str | 'hybrid' | 'hybrid', 'semantic', or 'keyword'. |
| include_sources | bool | False | Append source attribution to the answer. |
Returns
str — The generated answer string. Same fallback messages as generate.
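The "async method if available, thread pool otherwise" dispatch described for agenerate might be sketched as follows. AsyncLLM, SyncLLM, and call_llm are hypothetical names used only for this illustration; generate_async is the attribute name mentioned in the description above.

```python
import asyncio


class AsyncLLM:
    """Hypothetical LLM client with a native coroutine."""

    async def generate_async(self, prompt: str) -> str:
        return f"async answer to: {prompt}"


class SyncLLM:
    """Hypothetical LLM client with only a blocking method."""

    def generate(self, prompt: str) -> str:
        return f"sync answer to: {prompt}"


async def call_llm(llm, prompt: str) -> str:
    generate_async = getattr(llm, "generate_async", None)
    if generate_async is not None:
        return await generate_async(prompt)
    # Blocking client: run it in a worker thread so the event loop stays free.
    return await asyncio.to_thread(llm.generate, prompt)


print(asyncio.run(call_llm(AsyncLLM(), "What is RAG?")))
print(asyncio.run(call_llm(SyncLLM(), "What is RAG?")))
```

Probing with getattr keeps the caller agnostic about which kind of client it was given, at the cost of relying on a naming convention rather than a formal interface.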
Example
answer = await searcher.agenerate(
    "Explain federated learning",
    search_method="hybrid",
    include_sources=True
)
print(answer)
astream
async astream(query: str, search_method: str = 'hybrid') -> AsyncGenerator[str, None]
An async streaming generator that retrieves context via ahybrid_search and yields the LLM response as a sequence of text tokens or word fragments in real time. Ideal for chat interfaces and streaming HTTP responses where time-to-first-token matters.
Streaming behaviour:
- If rag_system.llm exposes an astream(prompt) async generator, each token is yielded as it is produced by the LLM.
- If the LLM only has a synchronous generate method, the full answer is generated in a thread and then re-emitted word-by-word with asyncio.sleep(0) between each word to yield control back to the event loop.
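The two streaming paths listed above can be sketched as a single async generator. The LLM classes, stream_answer, and collect are hypothetical names for this example, and emitting each word with a trailing space is an assumption about how the words are re-joined.

```python
import asyncio
from typing import AsyncIterator, List


class SyncOnlyLLM:
    """Hypothetical client exposing only a blocking generate method."""

    def generate(self, prompt: str) -> str:
        return "neural networks learn representations"


class StreamingLLM:
    """Hypothetical client exposing a native async token stream."""

    async def astream(self, prompt: str) -> AsyncIterator[str]:
        for token in ("neu", "ral", " nets"):
            yield token


async def stream_answer(llm, prompt: str) -> AsyncIterator[str]:
    if hasattr(llm, "astream"):
        # Native streaming: forward tokens as the LLM produces them.
        async for token in llm.astream(prompt):
            yield token
        return
    # Fallback: generate the full answer in a thread, then re-emit it word by word.
    answer = await asyncio.to_thread(llm.generate, prompt)
    for word in answer.split():
        yield word + " "
        await asyncio.sleep(0)  # hand control back to the event loop


async def collect(llm, prompt: str = "demo") -> List[str]:
    return [token async for token in stream_answer(llm, prompt)]


print(asyncio.run(collect(SyncOnlyLLM())))
print(asyncio.run(collect(StreamingLLM())))
```

In the fallback path the first fragment only arrives after the whole answer has been generated, so it keeps the interface uniform but does not improve time-to-first-token.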
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The question to answer. |
| search_method | str | 'hybrid' | Search strategy for context retrieval. Note: this parameter is accepted but the method always uses ahybrid_search internally regardless of the value. |
Returns
AsyncGenerator[str, None] — Yields text fragments (tokens or words). A bilingual fallback string is yielded as a single item if retrieval fails or scores are below threshold.
Example
# CLI streaming
async for token in searcher.astream("اشرح مفهوم الشبكات العصبية"):
    print(token, end="", flush=True)
# FastAPI streaming response
from fastapi.responses import StreamingResponse
@app.get("/stream")
async def stream_answer(q: str):
    async def generator():
        async for token in searcher.astream(q):
            yield token
    return StreamingResponse(generator(), media_type="text/plain")
Analytics & Diagnostics
compare_methods
compare_methods(query: str, top_k: int = 5) -> Dict
Runs the same query through all three search methods (semantic, keyword, hybrid) sequentially and returns their results alongside pairwise overlap statistics. Designed for offline evaluation, benchmarking, and debugging to understand the contribution of each retrieval method for a given query.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The evaluation query. |
| top_k | int | 5 | Number of results to fetch from each method for comparison. |
Returns
Dict with the following structure:
{
    "query": str,            # The original query
    "semantic": List[Tuple], # Results from semantic_search
    "keyword": List[Tuple],  # Results from keyword_search
    "hybrid": List[Tuple],   # Results from hybrid_search
    "overlap": {
        "semantic_keyword": int, # Chunks in both semantic & keyword
        "semantic_hybrid": int,  # Chunks in both semantic & hybrid
        "keyword_hybrid": int    # Chunks in both keyword & hybrid
    }
}
Example
comparison = searcher.compare_methods("ما هو التعلم العميق؟", top_k=5)
print(f"Semantic results: {len(comparison['semantic'])}")
print(f"Keyword results: {len(comparison['keyword'])}")
print(f"Hybrid results: {len(comparison['hybrid'])}")
print(f"Semantic∩Keyword: {comparison['overlap']['semantic_keyword']}")
print(f"Semantic∩Hybrid: {comparison['overlap']['semantic_hybrid']}")
Use case: Low overlap between semantic and keyword results indicates that hybrid search adds significant value (each method is surfacing different relevant documents). High overlap suggests that one method may be redundant for this query type.
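The overlap statistics can be understood as set intersections over chunk identities. The sketch below is a hypothetical reimplementation for illustration; overlap_counts and its key parameter are assumed names, and plain strings stand in for DocumentChunk objects.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Result = Tuple[object, float]


def overlap_counts(
    semantic: List[Result],
    keyword: List[Result],
    hybrid: List[Result],
    key: Callable[[object], Hashable] = lambda chunk: chunk,
) -> Dict[str, int]:
    """Count chunks shared by each pair of result lists (scores are ignored)."""
    sem = {key(chunk) for chunk, _ in semantic}
    kw = {key(chunk) for chunk, _ in keyword}
    hyb = {key(chunk) for chunk, _ in hybrid}
    return {
        "semantic_keyword": len(sem & kw),
        "semantic_hybrid": len(sem & hyb),
        "keyword_hybrid": len(kw & hyb),
    }


semantic = [("chunk-a", 0.92), ("chunk-b", 0.81)]
keyword = [("chunk-b", 2.10), ("chunk-c", 1.40)]
hybrid = [("chunk-b", 0.032), ("chunk-a", 0.016), ("chunk-c", 0.016)]
print(overlap_counts(semantic, keyword, hybrid))
```

Passing a key function such as one that returns a chunk's identifier lets the same helper work with chunk objects that are not hashable themselves.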
get_stats
get_stats() -> dict
Returns a snapshot of the system's operational statistics and configuration state. Useful for monitoring dashboards, logging, and performance analysis.
Parameters
None.
Returns
dict with the following keys:
| Key | Type | Description |
|---|---|---|
| total_searches | int | Total calls to any search or generate method. |
| semantic_searches | int | Total calls to semantic_search / asemantic_search. |
| keyword_searches | int | Total calls to keyword_search / akeyword_search. |
| hybrid_searches | int | Total calls to hybrid_search / ahybrid_search. |
| is_trained | bool | Whether the keyword index has been trained. |
| indexed_documents | int | Number of document chunks in the keyword index. |
| keyword_method | str | Active scorer: 'bm25' or 'tfidf'. |
| language | str | Active language code. |
| use_lemmatization | bool | Whether lemmatization is enabled. |
| fusion_method | str | Active fusion strategy. |
| weights | dict | Current weights from config, in the form {"semantic": float, "keyword": float}. |
Example
stats = searcher.get_stats()
print(stats)
# {
# 'total_searches': 142,
# 'semantic_searches': 38,
# 'keyword_searches': 22,
# 'hybrid_searches': 82,
# 'is_trained': True,
# 'indexed_documents': 4500,
# 'keyword_method': 'bm25',
# 'language': 'ar',
# 'use_lemmatization': True,
# 'fusion_method': 'rrf',
# 'weights': {'semantic': 0.7, 'keyword': 0.3}
# }
Fusion Methods Reference
The fusion_method field in SearchConfig controls how semantic and keyword results are merged. All methods receive top_k × 2 candidates from each retrieval path before fusion.
weighted_sum (default)
Normalises both score lists to [0, 1] using min-max normalisation, then computes the combined score as:
combined_score = sem_score_normalised × semantic_weight
               + key_score_normalised × keyword_weight
Results below config.min_score are discarded. Best for scenarios where you have calibrated confidence in both retrievers and want predictable, tunable blending.
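A minimal sketch of the weighted-sum strategy follows, using document ids mapped to raw scores. It makes two assumptions the source does not state: a document missing from one list contributes 0 for that signal, and a list whose scores are all equal normalises to 1.0. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; a constant list maps to 1.0 (assumed convention)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - lo) / (hi - lo) for doc, value in scores.items()}


def weighted_sum_fusion(
    semantic: Dict[str, float],
    keyword: Dict[str, float],
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
    min_score: float = 0.3,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    sem, kw = min_max(semantic), min_max(keyword)
    fused = {}
    for doc in set(sem) | set(kw):
        # Assumption: a document absent from one list scores 0.0 for that signal.
        score = sem.get(doc, 0.0) * semantic_weight + kw.get(doc, 0.0) * keyword_weight
        if score >= min_score:
            fused[doc] = score
    # Sort by descending score, breaking ties by id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


semantic_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
keyword_scores = {"b": 3.0, "d": 1.0}
print(weighted_sum_fusion(semantic_scores, keyword_scores))
```

Note that min-max normalisation sends the lowest-scoring candidate in each list to 0, so with the default min_score the weakest candidates tend to be dropped unless the other signal lifts them.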
rrf — Reciprocal Rank Fusion
Uses only the rank position of each result (not the raw score), making it robust to score scale differences between the two retrievers:
rrf_score(d) = 1/(k + rank_semantic(d)) + 1/(k + rank_keyword(d))
where k = 60 (the commonly used constant). Documents appearing in only one result list still receive a contribution from that list. The min_score threshold does not apply to RRF, because RRF scores are small fractions that are not comparable to min_score. Recommended when the two scorers produce very different score magnitudes.
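Reciprocal Rank Fusion needs only the ordering of each list, which the following sketch makes explicit. It assumes 1-based ranks (the source does not say whether ranks start at 0 or 1), and rrf_fusion is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_fusion(
    semantic_ranked: List[str],
    keyword_ranked: List[str],
    k: int = 60,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Fuse two rankings (best first) by summing 1 / (k + rank) per list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in (semantic_ranked, keyword_ranked):
        for rank, doc in enumerate(ranking, start=1):  # assumed 1-based ranks
            scores[doc] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


fused = rrf_fusion(["a", "b", "c"], ["b", "d"])
for doc, score in fused:
    print(doc, round(score, 5))
```

With k = 60 the largest possible score under this convention is 2/61, roughly 0.033, which illustrates why a min_score such as 0.3 cannot be applied meaningfully to RRF output.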
max
Normalises both score lists to [0, 1], then assigns each document the maximum of its two normalised scores:
max_score(d) = max(sem_score_normalised, key_score_normalised)
Documents appearing in only one list receive only that score. Best for sparse queries where one retrieval signal dominates and you do not want the other to dilute it.
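For comparison, a sketch of the max strategy is given below. As an assumption for this example, a list whose scores are all equal normalises to 1.0; the helper names are hypothetical and min_max is repeated so the block runs on its own.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; a constant list maps to 1.0 (assumed convention)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - lo) / (hi - lo) for doc, value in scores.items()}


def max_fusion(
    semantic: Dict[str, float], keyword: Dict[str, float], top_k: int = 10
) -> List[Tuple[str, float]]:
    sem, kw = min_max(semantic), min_max(keyword)
    # A document found in only one list keeps that list's normalised score.
    fused = {doc: max(sem.get(doc, 0.0), kw.get(doc, 0.0)) for doc in set(sem) | set(kw)}
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


ranked = max_fusion({"a": 0.9, "b": 0.5, "c": 0.1}, {"b": 3.0, "d": 1.0})
print(ranked)
```

In this data "b" is only mid-ranked semantically but is the top keyword hit, so it ties with "a" at 1.0; the stronger signal wins outright rather than being averaged down.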
Error Reference
| Situation | Behaviour |
|---|---|
| rag_system=None | ValueError raised in constructor immediately. |
| Stanza not installed | Warning logged; simple regex tokenizer used transparently. |
| Stanza model fails to load | Warning logged; simple tokenizer used transparently. |
| keyword_search called before fit | Auto-trains index via train_keyword_index() with a warning log. |
| No chunks in vector store | train_keyword_index logs a warning and returns; _is_trained stays False. |
| Unsupported fusion_method | ValueError: Unsupported fusion method: <value> raised in _fuse_results. |
| All results below min_score | generate / agenerate return a bilingual fallback string; no exception. |
| LLM is None | generate returns "Language model is not available". |
Full Integration Examples
Example 1 — Synchronous Pipeline
from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig
config = SearchConfig(
    semantic_weight=0.65,
    keyword_weight=0.35,
    min_score=0.25,
    top_k=10,
    fusion_method="rrf"
)
searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=config,
    keyword_method="bm25",
    language="ar"
)
# Train once at startup
searcher.train_keyword_index()
# Search
results = searcher.hybrid_search("ما هو التعلم العميق؟")
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:120]}")
# Generate
answer = searcher.generate(
    "ما هو التعلم العميق؟",
    search_method="hybrid",
    include_sources=True
)
print(answer)
Example 2 — Async FastAPI Service
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig
app = FastAPI()
searcher = HybridSearchRAG(rag_system=my_rag, language="en")
@app.on_event("startup")
async def startup():
    import asyncio
    await asyncio.to_thread(searcher.train_keyword_index)
@app.get("/answer")
async def answer(q: str, method: str = "hybrid"):
    return {"answer": await searcher.agenerate(q, search_method=method)}
@app.get("/stream")
async def stream(q: str):
    async def gen():
        async for token in searcher.astream(q):
            yield token
    return StreamingResponse(gen(), media_type="text/plain")
@app.get("/stats")
async def stats():
    return searcher.get_stats()
Example 3 — Evaluation & Method Comparison
queries = [
"What is federated learning?",
"Explain transformer attention",
"How does RAG work?"
]
for q in queries:
    comp = searcher.compare_methods(q, top_k=5)
    sem_k_overlap = comp["overlap"]["semantic_keyword"]
    print(f"Query: {q}")
    print(f" Semantic∩Keyword overlap: {sem_k_overlap}/5")
    print(f" Hybrid top result: {comp['hybrid'][0][0].text[:80] if comp['hybrid'] else 'None'}")
    print()
Example 4 — Custom Keyword Scorer Standalone
from fennec_community.rag.types.hybrid_search import BM25Scorer, TFIDFScorer
# BM25 standalone
bm25 = BM25Scorer(language="en", use_lemmatization=True)
corpus = ["deep learning models", "natural language processing", "computer vision tasks"]
bm25.fit(corpus)
scores = bm25.batch_score("language processing", corpus)
best = corpus[scores.index(max(scores))]
print(f"Most relevant: {best}") # "natural language processing"
# TF-IDF standalone
tfidf = TFIDFScorer(language="en")
tfidf.fit(corpus)
s = tfidf.score("vision models", "computer vision tasks")
print(f"TF-IDF score: {s:.4f}")