Fennec Community community/rag/hybrid_search.md

`hybrid_search` — Enterprise SDK Documentation


Table of Contents

  1. Overview
  2. Architecture
  3. Installation & Quick Start
  4. Module: SearchConfig
  5. Module: StanzaTokenizer
  6. Module: BM25Scorer
  7. Module: TFIDFScorer
  8. Module: HybridSearchRAG
  9. Fusion Methods Reference
  10. Error Reference
  11. Full Integration Examples

Overview

hybrid_search is a production-grade Retrieval-Augmented Generation (RAG) layer that fuses semantic vector search with lexical keyword search (BM25 or TF-IDF) into a single retrieval pipeline. It is designed for multilingual corpora, with native Arabic NLP support via Stanza, and ships with three configurable fusion strategies, a fully async API, and built-in observability through runtime statistics.

Core capabilities:

  • Semantic search via an underlying RAG vector database
  • Keyword search via BM25 (recommended) or TF-IDF
  • Three fusion strategies: Weighted Sum, Reciprocal Rank Fusion (RRF), Max Score
  • Synchronous and asynchronous APIs — compatible with FastAPI, Jupyter, and plain scripts
  • Token-level streaming for LLM answer generation
  • Automatic index training and lazy retraining
  • Source attribution in generated answers
  • Side-by-side method comparison for evaluation

Architecture

┌──────────────────────────────────────────────────────────────┐
│                        HybridSearchRAG                        │
│                                                              │
│  ┌────────────────┐          ┌──────────────────────────┐   │
│  │  Semantic Layer│          │     Keyword Layer         │   │
│  │  (RAG vector   │          │  BM25Scorer / TFIDFScorer │   │
│  │   database)    │          │  + StanzaTokenizer        │   │
│  └───────┬────────┘          └────────────┬─────────────┘   │
│          │   semantic_results              │  keyword_results │
│          └──────────────┬─────────────────┘                 │
│                         ▼                                    │
│              ┌──────────────────────┐                        │
│              │   Fusion Engine      │                        │
│              │  weighted_sum        │                        │
│              │  rrf                 │                        │
│              │  max                 │                        │
│              └──────────┬───────────┘                       │
│                         ▼                                    │
│              ┌──────────────────────┐                        │
│              │   LLM Generation     │                        │
│              │  generate / agenerate│                        │
│              │  / astream           │                        │
│              └──────────────────────┘                        │
└──────────────────────────────────────────────────────────────┘

Installation & Quick Start

pip install stanza numpy fennec_community
python -c "import stanza; stanza.download('ar')"   # Arabic model
python -c "import stanza; stanza.download('en')"   # English model (optional)

from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig

config = SearchConfig(
    semantic_weight=0.7,
    keyword_weight=0.3,
    top_k=10,
    fusion_method="rrf"
)

searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=config,
    keyword_method="bm25",
    language="ar"
)

# Train keyword index once after documents are loaded
searcher.train_keyword_index()

# Run a hybrid search
results = searcher.hybrid_search("ما هو الذكاء الاصطناعي؟")

# Or generate a full answer
answer = searcher.generate("ما هو الذكاء الاصطناعي؟", include_sources=True)
print(answer)

Module: SearchConfig

SearchConfig is a dataclass that centralises all tunable parameters for the hybrid search pipeline. Pass an instance to HybridSearchRAG to override the defaults.

from fennec_community.rag.types.hybrid_search import SearchConfig

config = SearchConfig(
    semantic_weight=0.6,
    keyword_weight=0.4,
    min_score=0.25,
    top_k=15,
    enable_reranking=True,
    fusion_method="rrf"
)

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| semantic_weight | float | 0.7 | Weight applied to normalised semantic scores during fusion. Together with keyword_weight it sets the blend ratio; the two weights are not forced to sum to 1.0. |
| keyword_weight | float | 0.3 | Weight applied to normalised keyword scores during fusion. |
| min_score | float | 0.3 | Minimum combined score threshold. Results below this value are filtered out before generation. |
| top_k | int | 10 | Maximum number of results returned by any search method. |
| enable_reranking | bool | True | Reserved flag for optional post-fusion reranking (configurable by downstream integrations). |
| fusion_method | str | 'weighted_sum' | Strategy for merging semantic and keyword result lists. Accepted values: 'weighted_sum', 'rrf', 'max'. See Fusion Methods Reference. |
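
The field list above can be mirrored in a small dataclass for experimentation. This is a minimal sketch, not the library's source: the class name SearchConfigSketch and its validate() helper are hypothetical, and only the field names and defaults come from the table.

```python
from dataclasses import dataclass

# Accepted values as listed in the Fields table.
ALLOWED_FUSION_METHODS = ("weighted_sum", "rrf", "max")


@dataclass
class SearchConfigSketch:
    """Hypothetical stand-in that mirrors the documented fields and defaults."""
    semantic_weight: float = 0.7
    keyword_weight: float = 0.3
    min_score: float = 0.3
    top_k: int = 10
    enable_reranking: bool = True
    fusion_method: str = "weighted_sum"

    def validate(self) -> None:
        """Assumed helper: fail early instead of at fusion time."""
        if self.fusion_method not in ALLOWED_FUSION_METHODS:
            raise ValueError(f"Unsupported fusion method: {self.fusion_method}")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if self.semantic_weight < 0 or self.keyword_weight < 0:
            raise ValueError("weights must be non-negative")


cfg = SearchConfigSketch(fusion_method="rrf", top_k=15)
cfg.validate()  # passes silently for a valid configuration
```

Validating up front is a design choice of this sketch; the documented behaviour is that an unsupported fusion_method raises ValueError when results are fused.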

Module: StanzaTokenizer

StanzaTokenizer is the NLP pre-processing backbone for both BM25Scorer and TFIDFScorer. It wraps the Stanza NLP pipeline and provides tokenization with optional morphological lemmatization — critical for high-recall retrieval in morphologically rich languages such as Arabic.

If Stanza is unavailable at runtime, the class automatically falls back to a regex-based simple tokenizer with no external dependencies.

from fennec_community.rag.types.hybrid_search import StanzaTokenizer

tokenizer = StanzaTokenizer(language="ar", use_lemmatization=True)
tokens = tokenizer.tokenize("الذكاء الاصطناعي يغير العالم")
# ['ذكاء', 'اصطناعي', 'غير', 'عالم']

StanzaTokenizer.__init__

StanzaTokenizer(language: str = 'ar', use_lemmatization: bool = True)

Initialises the tokenizer and loads the Stanza NLP pipeline for the specified language.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | str | 'ar' | BCP-47 language code. Use 'ar' for Arabic, 'en' for English. Must match a downloaded Stanza model. |
| use_lemmatization | bool | True | When True, activates Stanza's mwt, pos, and lemma processors so each token is reduced to its dictionary root form. Strongly recommended for Arabic. |

Notes

  • If Stanza is not installed (ImportError) or the model fails to load, the tokenizer logs a warning and falls back to _simple_tokenize.
  • All initialisation errors are non-fatal by design — the system will continue to function with reduced accuracy.
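
To make the fallback concrete, here is a minimal sketch of what a regex-based tokenizer of this kind could look like. The function name simple_tokenize and the `\w+` pattern are assumptions for illustration; the library's actual _simple_tokenize may use a different pattern.

```python
import re
from typing import List

# Assumed pattern: runs of Unicode word characters (covers Arabic and Latin letters).
_WORD_RE = re.compile(r"\w+")


def simple_tokenize(text: str) -> List[str]:
    """Lowercase the text, split it on non-word characters, drop 1-character tokens."""
    if not text or not text.strip():
        return []
    tokens = _WORD_RE.findall(text.lower())
    return [token for token in tokens if len(token) > 1]


print(simple_tokenize("Hello, a World!"))
print(simple_tokenize("الذكاء الاصطناعي"))
```

Without lemmatization, inflected forms such as الطلاب and طالب stay distinct tokens, which is why the fallback is described as working with reduced accuracy.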

StanzaTokenizer.tokenize

tokenize(text: str) -> List[str]

Tokenizes a single text string into a list of cleaned, lowercase tokens using the Stanza pipeline (or the simple fallback).

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | str | The raw input text to tokenize. Empty strings or whitespace-only input return an empty list immediately. |

Returns

List[str] — A list of lowercase tokens. Single-character tokens are removed. If use_lemmatization=True, each token is its lemma form.

Example

tokens = tokenizer.tokenize("الطلاب يدرسون في الجامعات")
# With lemmatization → ['طالب', 'درس', 'جامعة']
# Without           → ['الطلاب', 'يدرسون', 'الجامعات']

Important: This is the core method called by BM25Scorer and TFIDFScorer during both training (fit) and scoring. Consistency between training and query tokenization is guaranteed because both paths use the same StanzaTokenizer instance.


StanzaTokenizer.tokenize_batch

tokenize_batch(texts: List[str]) -> List[List[str]]

Tokenizes multiple texts in sequence. Convenience wrapper over tokenize.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| texts | List[str] | A list of raw text strings to tokenize. |

Returns

List[List[str]] — Each inner list corresponds to the tokens of the text at the same index in the input.

Example

token_lists = tokenizer.tokenize_batch([
    "البحث عن المعلومات",
    "تعلم الآلة والذكاء الاصطناعي"
])
# [['بحث', 'معلومة'], ['تعلم', 'آلة', 'ذكاء', 'اصطناعي']]

Module: BM25Scorer

BM25Scorer implements the Okapi BM25 ranking function, a widely used keyword relevance algorithm that typically outperforms plain TF-IDF on short-to-medium texts. It uses StanzaTokenizer internally for linguistically aware tokenization.

BM25 formula used:

score(q, d) = Σ IDF(t) × [ tf(t,d) × (k1+1) ] / [ tf(t,d) + k1×(1 - b + b×|d|/avgdl) ]
IDF(t)      = log( (N - df(t) + 0.5) / (df(t) + 0.5) + 1 )

from fennec_community.rag.types.hybrid_search import BM25Scorer

scorer = BM25Scorer(k1=1.5, b=0.75, language="ar")
scorer.fit(documents)
scores = scorer.batch_score("الذكاء الاصطناعي", documents)
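
The BM25 formula above can be checked by hand with a few lines of plain Python. This is a minimal sketch that assumes whitespace tokenization in place of StanzaTokenizer; bm25_fit and bm25_score are hypothetical helper names, not part of the package.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bm25_fit(corpus_tokens: List[List[str]]) -> Tuple[Dict[str, float], float]:
    """Compute IDF(t) = log((N - df + 0.5) / (df + 0.5) + 1) and the average length."""
    n = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log((n - d + 0.5) / (d + 0.5) + 1) for t, d in df.items()}
    avgdl = sum(len(tokens) for tokens in corpus_tokens) / n
    return idf, avgdl


def bm25_score(query_tokens: List[str], doc_tokens: List[str],
               idf: Dict[str, float], avgdl: float,
               k1: float = 1.5, b: float = 0.75) -> float:
    """Sum the per-term BM25 contributions for one document."""
    tf = Counter(doc_tokens)
    length_norm = k1 * (1 - b + b * len(doc_tokens) / avgdl)
    score = 0.0
    for term in query_tokens:
        if term in idf and tf[term] > 0:
            score += idf[term] * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


corpus = [doc.split() for doc in (
    "deep learning models",
    "natural language processing",
    "computer vision tasks",
)]
idf, avgdl = bm25_fit(corpus)
scores = [bm25_score("natural language".split(), doc, idf, avgdl) for doc in corpus]
print(scores)  # only the second document has a non-zero score
```

With three equal-length documents and both query terms occurring once in a single document, each term contributes exactly IDF(t) = log(8/3), so the second document scores 2 × log(8/3) ≈ 1.96.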

BM25Scorer.__init__

BM25Scorer(
    k1: float = 1.5,
    b: float = 0.75,
    language: str = 'ar',
    use_lemmatization: bool = True
)

Constructs a BM25 scorer. The model is not ready for scoring until fit() is called.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| k1 | float | 1.5 | Term-frequency saturation parameter. Higher values give more weight to repeated terms. Recommended range: 1.2–2.0. |
| b | float | 0.75 | Document-length normalisation factor. 1.0 = full normalisation, 0.0 = no normalisation. Standard value is 0.75. |
| language | str | 'ar' | Language code forwarded to StanzaTokenizer. |
| use_lemmatization | bool | True | Forwarded to StanzaTokenizer. Recommended True for Arabic. |

BM25Scorer.fit

fit(documents: List[str]) -> None

Trains the BM25 model on the document corpus. Computes per-term IDF values and per-document token lengths. Must be called before any scoring method.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[str] | The full list of raw document texts forming the corpus. Each string is one document/chunk. |

Returns

None

Side effects: Populates self.idf, self.doc_lengths, self.avgdl, and self.corpus_size.

Example

corpus = ["الذكاء الاصطناعي مجال واسع", "تعلم الآلة فرع من الذكاء"]
scorer.fit(corpus)

Warning: Calling fit again overwrites the existing model. For large corpora, this is a CPU-intensive operation because every document is tokenized via Stanza.


BM25Scorer.score

score(query: str, document: str, doc_idx: int) -> float

Calculates the BM25 relevance score between a query and a single document.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The search query text. |
| document | str | The document text to score against the query. |
| doc_idx | int | The zero-based index of this document in the corpus used during fit. Used to retrieve the pre-computed document length. If the index is out of range, the document's actual token count is used as fallback. |

Returns

float — A non-negative BM25 score. Higher values indicate greater relevance. Returns 0.0 for query terms absent from the training vocabulary.

Example

s = scorer.score("الذكاء الاصطناعي", "تعلم الآلة هو جزء من الذكاء الاصطناعي", 0)
print(s)  # e.g. 2.341

BM25Scorer.batch_score

batch_score(query: str, documents: List[str]) -> List[float]

Computes BM25 scores for a query against an entire list of documents in a single call. This is the primary method used internally by HybridSearchRAG.keyword_search.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The search query text. |
| documents | List[str] | The list of documents to score. Should match the corpus passed to fit to ensure correct document-length normalisation. |

Returns

List[float] — A score for each document at the corresponding index. Scores are non-negative; unmatched documents receive 0.0.

Example

scores = scorer.batch_score("الذكاء", ["نص أول", "نص عن الذكاء", "نص ثالث"])
# e.g. [0.0, 1.87, 0.0]

Module: TFIDFScorer

TFIDFScorer implements a smoothed TF-IDF keyword scorer. It is an alternative to BM25Scorer — lighter and faster but less accurate on short documents. Prefer BM25 for most production use cases.

TF-IDF formula used:

score(q, d) = Σ tf(t,d) × IDF(t)
tf(t,d)     = count(t,d) / |d|
IDF(t)      = log( (N+1) / (df(t)+1) ) + 1    [smoothed]

from fennec_community.rag.types.hybrid_search import TFIDFScorer

scorer = TFIDFScorer(language="en", use_lemmatization=True)
scorer.fit(documents)
scores = scorer.batch_score("machine learning", documents)
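
The smoothed TF-IDF formula above can be worked through in the same way. This is a minimal sketch that assumes whitespace tokenization; tfidf_fit and tfidf_score are hypothetical helper names used only for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_fit(corpus_tokens: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF(t) = log((N + 1) / (df + 1)) + 1 for every corpus term."""
    n = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((n + 1) / (d + 1)) + 1 for t, d in df.items()}


def tfidf_score(query_tokens: List[str], doc_tokens: List[str],
                idf: Dict[str, float]) -> float:
    """Sum tf(t, d) * IDF(t) over query terms, with tf = count / document length."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    return sum(
        (counts[t] / len(doc_tokens)) * idf[t]
        for t in query_tokens
        if t in idf
    )


corpus = [doc.split() for doc in ("machine learning basics", "deep learning for nlp")]
idf = tfidf_fit(corpus)
query = "deep learning".split()
scores = [tfidf_score(query, doc, idf) for doc in corpus]
print(scores)
```

Here "learning" occurs in both documents, so its IDF is log(3/3) + 1 = 1, while "deep" gets log(3/2) + 1. The second document therefore scores 0.25 × (log(1.5) + 1) + 0.25, and the first scores 1/3 from "learning" alone.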

TFIDFScorer.__init__

TFIDFScorer(language: str = 'ar', use_lemmatization: bool = True)

Constructs a TF-IDF scorer. Not ready for scoring until fit() is called.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | str | 'ar' | Language code forwarded to StanzaTokenizer. |
| use_lemmatization | bool | True | Forwarded to StanzaTokenizer. |

TFIDFScorer.fit

fit(documents: List[str]) -> None

Trains the TF-IDF model by computing IDF values for all terms in the corpus. Must be called before scoring.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[str] | The full corpus as a list of raw document strings. |

Returns

None

Side effects: Populates self.idf and self.corpus_size.

Example

scorer.fit(["machine learning basics", "deep learning for NLP"])

TFIDFScorer.score

score(query: str, document: str) -> float

Computes the TF-IDF relevance score between a query and a single document.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The search query text. |
| document | str | The document text to score. |

Returns

float — A non-negative TF-IDF score. 0.0 means no query term was found in the document or the vocabulary.

Example

s = scorer.score("deep learning", "deep learning is a subset of machine learning")
print(s)  # e.g. 0.412

TFIDFScorer.batch_score

batch_score(query: str, documents: List[str]) -> List[float]

Computes TF-IDF scores for a query against a list of documents.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The search query text. |
| documents | List[str] | The list of document texts to score. |

Returns

List[float] — One score per document, in the same order as the input list.

Example

scores = scorer.batch_score("neural networks", docs)
top_idx = max(range(len(scores)), key=lambda i: scores[i])

Module: HybridSearchRAG

HybridSearchRAG is the central orchestrator of the hybrid_search package. It wraps an existing RAG system (any object exposing retrieve, generate, llm, vector_db, and context_manager; see the constructor parameters for the exact requirements) and augments it with a hybrid retrieval layer. All public search and generation methods are available in both synchronous and asynchronous variants.


Constructor

HybridSearchRAG(
    rag_system,
    config: Optional[SearchConfig] = None,
    keyword_method: str = 'bm25',
    language: str = 'ar',
    use_lemmatization: bool = True
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| rag_system | Any | required | The underlying RAG backend. Must expose: retrieve(query, top_k), generate(prompt), llm, vector_db.chunks, and context_manager.build(query, chunks). Passing None raises ValueError. |
| config | Optional[SearchConfig] | None | Search configuration object. A default SearchConfig() is created if not provided. |
| keyword_method | str | 'bm25' | Which keyword scorer to instantiate. 'bm25' (recommended) or 'tfidf'. |
| language | str | 'ar' | Language code passed to the tokenizer and used for bilingual response messages. |
| use_lemmatization | bool | True | Whether to apply morphological lemmatization during tokenization. |

Raises

  • ValueError — if rag_system is None.
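
For local experiments it can help to have a stand-in backend with the attributes listed for rag_system. The sketch below is an assumption-heavy stub: every class name is hypothetical, the word-overlap similarity is a placeholder for real embeddings, and the return shapes (a list of (chunk, score) pairs from retrieve, a string from build) are guesses based on the return types documented for the search methods.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class StubChunk:
    text: str  # the examples in this document read chunk.text


@dataclass
class StubVectorDB:
    chunks: List[StubChunk] = field(default_factory=list)


class StubContextManager:
    def build(self, query: str, chunks: List[StubChunk]) -> str:
        # Assumed behaviour: join chunk texts into one context string.
        return "\n".join(chunk.text for chunk in chunks)


class StubRAG:
    """Hypothetical backend exposing the attributes the constructor asks for."""

    def __init__(self, texts: List[str], llm: Optional[Any] = None):
        self.vector_db = StubVectorDB([StubChunk(t) for t in texts])
        self.context_manager = StubContextManager()
        self.llm = llm

    def retrieve(self, query: str, top_k: int = 10) -> List[Tuple[StubChunk, float]]:
        # Placeholder similarity: fraction of query words present in the chunk.
        words = set(query.lower().split())
        scored = [
            (chunk, len(words & set(chunk.text.lower().split())) / max(len(words), 1))
            for chunk in self.vector_db.chunks
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def generate(self, prompt: str) -> str:
        return f"[stub answer for a prompt of {len(prompt)} characters]"


rag = StubRAG(["deep learning models", "computer vision tasks"])
top = rag.retrieve("deep learning", top_k=1)
print(top[0][0].text, top[0][1])
```

Whether HybridSearchRAG accepts such a stub unchanged depends on details this page does not specify, so treat it as a starting point for tests rather than a guaranteed drop-in.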

Example

from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig

searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=SearchConfig(fusion_method="rrf", top_k=15),
    keyword_method="bm25",
    language="ar",
    use_lemmatization=True
)

Index Management

train_keyword_index

train_keyword_index() -> None

Builds and trains the keyword index (BM25 or TF-IDF) from all document chunks currently loaded in rag_system.vector_db. The index must be trained before any keyword-based or hybrid search can run. hybrid_search and keyword_search train it lazily on first use when _is_trained is False, but explicit upfront training is recommended for production deployments to avoid first-query latency spikes.

Parameters

None.

Returns

None

Side effects:

  • Extracts text from all rag_system.vector_db.chunks.
  • Calls keyword_scorer.fit(documents).
  • Sets self._is_trained = True.
  • Logs the number of indexed documents.

Behaviour when no chunks exist: Logs a warning and returns without raising an exception. _is_trained remains False.

Example

# After documents have been indexed into the RAG vector store:
searcher.train_keyword_index()
# Ready for keyword and hybrid searches.

Performance note: For large corpora (>100k chunks) with Stanza lemmatization enabled, this call can take several minutes. Consider running it in a background thread or async worker at startup.
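
One way to follow the performance note is to start training in a worker thread during startup. The sketch below uses a hypothetical SlowIndex class as a stand-in for an object with an expensive train_keyword_index(); only the threading pattern is the point.

```python
import threading
import time


class SlowIndex:
    """Stand-in for a searcher whose train_keyword_index() is expensive."""

    def __init__(self) -> None:
        self.is_trained = False

    def train_keyword_index(self) -> None:
        time.sleep(0.05)  # simulates tokenizing a large corpus
        self.is_trained = True


def train_in_background(searcher) -> threading.Thread:
    """Start training in a daemon thread and return the thread handle."""
    worker = threading.Thread(
        target=searcher.train_keyword_index,
        name="keyword-index-training",
        daemon=True,
    )
    worker.start()
    return worker


index = SlowIndex()
worker = train_in_background(index)
# ... the application can continue its startup work here ...
worker.join()  # wait before serving keyword or hybrid queries
print(index.is_trained)
```

This page does not say whether lazy auto-training is guarded against running concurrently with a background run, so joining the worker before accepting queries is the conservative choice.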


Synchronous Search API

semantic_search(query: str, top_k: Optional[int] = None) -> List[Tuple]

Executes a pure semantic (embedding-based) search by delegating directly to the underlying RAG system's retrieve method. Use this when exact keyword matching is not needed and conceptual similarity is sufficient.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The natural-language search query. |
| top_k | Optional[int] | None | Maximum results to return. Falls back to config.top_k if None. |

Returns

List[Tuple[DocumentChunk, float]] — A list of (chunk, similarity_score) pairs, ordered by descending similarity. Scores are cosine similarity values in [0.0, 1.0].

Side effects: Increments self.stats['semantic_searches'].

Example

results = searcher.semantic_search("ما هي تطبيقات الذكاء الاصطناعي؟", top_k=5)
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:100]}")

keyword_search(query: str, top_k: Optional[int] = None) -> List[Tuple]

Executes a pure keyword search using the configured BM25 or TF-IDF scorer against the indexed document corpus. Only results with scores strictly greater than 0.0 are included. If the keyword index has not been trained yet, it is automatically trained before the search.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The keyword search query. |
| top_k | Optional[int] | None | Maximum results to return. Falls back to config.top_k if None. |

Returns

List[Tuple[DocumentChunk, float]] — A list of (chunk, bm25_or_tfidf_score) pairs, ordered by descending keyword relevance.

Side effects: Increments self.stats['keyword_searches']. May trigger train_keyword_index() if not yet trained.

Example

results = searcher.keyword_search("الشبكات العصبية")
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:100]}")

hybrid_search(
    query: str,
    top_k: Optional[int] = None,
    semantic_weight: Optional[float] = None,
    keyword_weight: Optional[float] = None
) -> List[Tuple]

The flagship retrieval method. Runs the semantic and keyword searches one after the other (fetching top_k × 2 candidates from each), then fuses the results into a single ranked list using the configured fusion strategy. To run the two searches concurrently, use ahybrid_search. This is the recommended method for production use.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The natural-language search query. |
| top_k | Optional[int] | None | Maximum results to return from fusion. Falls back to config.top_k if None. |
| semantic_weight | Optional[float] | None | Per-call override for the semantic weight. Overrides config.semantic_weight for this call only. |
| keyword_weight | Optional[float] | None | Per-call override for the keyword weight. Overrides config.keyword_weight for this call only. |

Returns

List[Tuple[DocumentChunk, float]] — Fused and ranked (chunk, combined_score) pairs. The combined score definition depends on the active fusion method (see Fusion Methods Reference).

Side effects: Increments self.stats['total_searches'] and self.stats['hybrid_searches']. May trigger train_keyword_index() if not yet trained.

Example

# Using config weights
results = searcher.hybrid_search("تعلم الآلة في الطب")

# Overriding weights for a specific query
results = searcher.hybrid_search(
    "نصوص قانونية",
    semantic_weight=0.4,
    keyword_weight=0.6  # Boost keyword for exact legal terms
)

for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:80]}")

Synchronous Generation API

generate

generate(
    query: str,
    search_method: str = 'hybrid',
    include_sources: bool = False
) -> str

Performs retrieval using the specified search method, applies the min_score threshold to filter weak results, constructs a grounded prompt, and calls the underlying LLM to produce a final natural-language answer. The prompt is designed to reduce hallucination: the LLM is instructed to answer only from the retrieved context and to respond with a clear "insufficient information" message rather than fabricate an answer.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The natural-language question to answer. |
| search_method | str | 'hybrid' | Retrieval strategy to use. One of: 'hybrid' (recommended), 'semantic', 'keyword'. |
| include_sources | bool | False | When True, appends a formatted source attribution block (up to 3 unique source documents with similarity scores) to the answer. |

Returns

str — The LLM-generated answer grounded in retrieved context. Returns a bilingual fallback string if no results are found, if all results fall below min_score, or if rag_system.llm is None.

Possible fallback return values:

| Condition | Arabic message | English message |
| --- | --- | --- |
| No retrieval results | "لا تتوفر معلومات كافية في المستندات للإجابة على هذا السؤال." | "No sufficient information found in the documents to answer this question." |
| All results below min_score | "المعلومات المسترجعة غير ذات صلة كافية بسؤالك." | "Retrieved information is not relevant enough to answer your question." |
| LLM unavailable | | "Language model is not available" |

Example

# Hybrid answer without sources
answer = searcher.generate("ما هو الفرق بين الذكاء الاصطناعي والتعلم الآلي؟")
print(answer)

# Semantic-only with sources
answer = searcher.generate(
    "What are transformer architectures?",
    search_method="semantic",
    include_sources=True
)
print(answer)

Asynchronous Search API

All asynchronous methods are non-blocking and safe to use in FastAPI endpoints, async task queues, and concurrent pipelines.

async asemantic_search(query: str, top_k: Optional[int] = None) -> List[Tuple]

Async wrapper around semantic_search. Runs the synchronous search in a thread pool via asyncio.to_thread to avoid blocking the event loop.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The search query. |
| top_k | Optional[int] | None | Max results. Falls back to config.top_k. |

Returns

List[Tuple[DocumentChunk, float]] — Same as semantic_search.

Example

results = await searcher.asemantic_search("neural networks", top_k=5)

async akeyword_search(query: str, top_k: Optional[int] = None) -> List[Tuple]

Async wrapper around keyword_search. Runs the synchronous scorer in a thread pool.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The search query. |
| top_k | Optional[int] | None | Max results. Falls back to config.top_k. |

Returns

List[Tuple[DocumentChunk, float]] — Same as keyword_search.

Example

results = await searcher.akeyword_search("الشبكات العصبية")

async ahybrid_search(
    query: str,
    top_k: Optional[int] = None,
    semantic_weight: Optional[float] = None,
    keyword_weight: Optional[float] = None
) -> List[Tuple]

The async flagship retrieval method. Unlike its synchronous counterpart which runs both searches sequentially, ahybrid_search dispatches asemantic_search and akeyword_search concurrently via asyncio.gather, reducing total latency by up to 50% on I/O-bound RAG backends. It then applies the configured fusion strategy synchronously.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The search query. |
| top_k | Optional[int] | None | Max results from fusion. Falls back to config.top_k. |
| semantic_weight | Optional[float] | None | Per-call semantic weight override. |
| keyword_weight | Optional[float] | None | Per-call keyword weight override. |

Returns

List[Tuple[DocumentChunk, float]] — Fused and ranked results, identical in format to hybrid_search.

Side effects: Auto-trains keyword index via asyncio.to_thread if not yet trained.

Example

# FastAPI endpoint example
@app.get("/search")
async def search(q: str):
    results = await searcher.ahybrid_search(q, top_k=10)
    return [{"text": c.text, "score": s} for c, s in results]
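
The concurrency pattern that ahybrid_search is described as using (two searches dispatched together with asyncio.gather) can be illustrated in isolation. In this sketch fake_semantic and fake_keyword are hypothetical coroutines that simulate I/O-bound searches with asyncio.sleep; they are not part of the package.

```python
import asyncio
import time
from typing import List, Tuple

Result = List[Tuple[str, float]]


async def fake_semantic(query: str) -> Result:
    await asyncio.sleep(0.1)  # simulated vector-store round trip
    return [("doc-a", 0.9)]


async def fake_keyword(query: str) -> Result:
    await asyncio.sleep(0.1)  # simulated keyword scoring in a worker
    return [("doc-b", 2.1)]


async def gather_both(query: str) -> Tuple[Result, Result]:
    # Both coroutines run concurrently; results keep the argument order.
    semantic, keyword = await asyncio.gather(fake_semantic(query), fake_keyword(query))
    return semantic, keyword


start = time.perf_counter()
semantic, keyword = asyncio.run(gather_both("neural networks"))
elapsed = time.perf_counter() - start
print(semantic, keyword, f"{elapsed:.2f}s")
```

Because the two waits overlap, the elapsed time is close to the slower of the two calls rather than their sum, which is where the latency saving on I/O-bound backends comes from. CPU-bound scoring gains less from this pattern.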

Asynchronous Generation & Streaming API

agenerate

async agenerate(
    query: str,
    search_method: str = 'hybrid',
    include_sources: bool = False
) -> str

Async version of generate. Retrieves context using the specified async search method, constructs the same anti-hallucination prompt, and calls the LLM's async generation method if available (llm.generate_async), falling back to a thread-pool call for synchronous LLMs.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The question to answer. |
| search_method | str | 'hybrid' | 'hybrid', 'semantic', or 'keyword'. |
| include_sources | bool | False | Append source attribution to the answer. |

Returns

str — The generated answer string. Same fallback messages as generate.

Example

answer = await searcher.agenerate(
    "Explain federated learning",
    search_method="hybrid",
    include_sources=True
)
print(answer)

astream

async astream(query: str, search_method: str = 'hybrid') -> AsyncGenerator[str, None]

An async streaming generator that retrieves context via ahybrid_search and yields the LLM response as a sequence of text tokens or word fragments in real time. Ideal for chat interfaces and streaming HTTP responses where time-to-first-token matters.

Streaming behaviour:

  • If rag_system.llm exposes an astream(prompt) async generator, each token is yielded as it is produced by the LLM.
  • If the LLM only has a synchronous generate method, the full answer is generated in a thread and then re-emitted word-by-word with asyncio.sleep(0) between each word to yield control back to the event loop.
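
The second case above (a synchronous LLM whose answer is re-emitted word by word) can be sketched as follows. slow_sync_generate is a hypothetical stand-in for a blocking LLM call, and the exact way the library splits and re-joins words is an assumption.

```python
import asyncio
from typing import AsyncGenerator, List


def slow_sync_generate(prompt: str) -> str:
    """Stand-in for a blocking LLM call."""
    return "Neural networks learn layered representations."


async def stream_words(prompt: str) -> AsyncGenerator[str, None]:
    # Run the blocking call in a worker thread so the event loop stays free.
    answer = await asyncio.to_thread(slow_sync_generate, prompt)
    words = answer.split(" ")
    for i, word in enumerate(words):
        # Re-attach the separating space to every word except the last.
        yield word if i == len(words) - 1 else word + " "
        await asyncio.sleep(0)  # hand control back to the event loop


async def collect(prompt: str) -> List[str]:
    return [piece async for piece in stream_words(prompt)]


pieces = asyncio.run(collect("explain neural networks"))
print(pieces)
```

Note that in this fallback the first fragment only arrives after the full answer has been generated, so it keeps the streaming interface but does not improve time-to-first-token.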

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The question to answer. |
| search_method | str | 'hybrid' | Search strategy for context retrieval. Note: this parameter is accepted but the method always uses ahybrid_search internally regardless of the value. |

Returns

AsyncGenerator[str, None] — Yields text fragments (tokens or words). A bilingual fallback string is yielded as a single item if retrieval fails or scores are below threshold.

Example

# CLI streaming
async for token in searcher.astream("اشرح مفهوم الشبكات العصبية"):
    print(token, end="", flush=True)

# FastAPI streaming response
from fastapi.responses import StreamingResponse

@app.get("/stream")
async def stream_answer(q: str):
    async def generator():
        async for token in searcher.astream(q):
            yield token
    return StreamingResponse(generator(), media_type="text/plain")

Analytics & Diagnostics

compare_methods

compare_methods(query: str, top_k: int = 5) -> Dict

Runs the same query through all three search methods (semantic, keyword, hybrid) sequentially and returns their results alongside pairwise overlap statistics. Designed for offline evaluation, benchmarking, and debugging to understand the contribution of each retrieval method for a given query.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The evaluation query. |
| top_k | int | 5 | Number of results to fetch from each method for comparison. |

Returns

Dict with the following structure:

{
    "query": str,                             # The original query
    "semantic": List[Tuple],                  # Results from semantic_search
    "keyword": List[Tuple],                   # Results from keyword_search
    "hybrid": List[Tuple],                    # Results from hybrid_search
    "overlap": {
        "semantic_keyword": int,              # Chunks in both semantic & keyword
        "semantic_hybrid": int,               # Chunks in both semantic & hybrid
        "keyword_hybrid": int                 # Chunks in both keyword & hybrid
    }
}

Example

comparison = searcher.compare_methods("ما هو التعلم العميق؟", top_k=5)

print(f"Semantic results:  {len(comparison['semantic'])}")
print(f"Keyword results:   {len(comparison['keyword'])}")
print(f"Hybrid results:    {len(comparison['hybrid'])}")
print(f"Semantic∩Keyword:  {comparison['overlap']['semantic_keyword']}")
print(f"Semantic∩Hybrid:   {comparison['overlap']['semantic_hybrid']}")

Use case: Low overlap between semantic and keyword results indicates that hybrid search adds significant value (each method is surfacing different relevant documents). High overlap suggests that one method may be redundant for this query type.
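
The overlap figures can be reproduced from any two result lists with a set intersection. This is a minimal sketch: overlap_count is a hypothetical helper, and identifying chunks by a plain string id is an assumption made for the example (real chunks would need a stable key such as an id or their text).

```python
from typing import Callable, Hashable, List, Tuple

Result = List[Tuple[str, float]]


def overlap_count(results_a: Result, results_b: Result,
                  key: Callable[[str], Hashable] = lambda chunk: chunk) -> int:
    """Number of chunks that appear in both result lists, ignoring scores."""
    ids_a = {key(chunk) for chunk, _ in results_a}
    ids_b = {key(chunk) for chunk, _ in results_b}
    return len(ids_a & ids_b)


semantic = [("c1", 0.91), ("c2", 0.85), ("c3", 0.70)]
keyword = [("c3", 4.2), ("c4", 3.1), ("c1", 1.0)]
hybrid = [("c1", 0.03), ("c3", 0.03), ("c4", 0.02)]

overlap = {
    "semantic_keyword": overlap_count(semantic, keyword),
    "semantic_hybrid": overlap_count(semantic, hybrid),
    "keyword_hybrid": overlap_count(keyword, hybrid),
}
print(overlap)
```

In this toy data the semantic and keyword lists share two of three chunks, which would suggest moderate redundancy between the two retrievers for that query.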


get_stats

get_stats() -> dict

Returns a snapshot of the system's operational statistics and configuration state. Useful for monitoring dashboards, logging, and performance analysis.

Parameters

None.

Returns

dict with the following keys:

| Key | Type | Description |
| --- | --- | --- |
| total_searches | int | Total calls to any search or generate method. |
| semantic_searches | int | Total calls to semantic_search / asemantic_search. |
| keyword_searches | int | Total calls to keyword_search / akeyword_search. |
| hybrid_searches | int | Total calls to hybrid_search / ahybrid_search. |
| is_trained | bool | Whether the keyword index has been trained. |
| indexed_documents | int | Number of document chunks in the keyword index. |
| keyword_method | str | Active scorer: 'bm25' or 'tfidf'. |
| language | str | Active language code. |
| use_lemmatization | bool | Whether lemmatization is enabled. |
| fusion_method | str | Active fusion strategy. |
| weights | dict | Current weights from config, in the form {"semantic": float, "keyword": float}. |

Example

stats = searcher.get_stats()
print(stats)
# {
#   'total_searches': 142,
#   'semantic_searches': 38,
#   'keyword_searches': 22,
#   'hybrid_searches': 82,
#   'is_trained': True,
#   'indexed_documents': 4500,
#   'keyword_method': 'bm25',
#   'language': 'ar',
#   'use_lemmatization': True,
#   'fusion_method': 'rrf',
#   'weights': {'semantic': 0.7, 'keyword': 0.3}
# }

Fusion Methods Reference

The fusion_method field in SearchConfig controls how semantic and keyword results are merged. All methods receive top_k × 2 candidates from each retrieval path before fusion.

weighted_sum (default)

Normalises both score lists to [0, 1] using min-max normalisation, then computes the combined score as:

combined_score = sem_score_normalised × semantic_weight
               + key_score_normalised × keyword_weight

Results below config.min_score are discarded. Best for scenarios where you have calibrated confidence in both retrievers and want predictable, tunable blending.
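
A minimal sketch of weighted-sum fusion as described above, assuming results are given as {doc_id: raw_score} dictionaries. The helper names are hypothetical, and two details are assumptions not stated on this page: a document missing from one list contributes 0 for that list, and a list whose scores are all equal normalises to 1.0.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant list maps to 1.0 (assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_sum_fusion(semantic: Dict[str, float], keyword: Dict[str, float],
                        semantic_weight: float = 0.7, keyword_weight: float = 0.3,
                        min_score: float = 0.3, top_k: int = 10) -> List[Tuple[str, float]]:
    sem, key = min_max(semantic), min_max(keyword)
    combined = {
        doc: sem.get(doc, 0.0) * semantic_weight + key.get(doc, 0.0) * keyword_weight
        for doc in set(sem) | set(key)
    }
    ranked = sorted(combined.items(), key=lambda pair: pair[1], reverse=True)
    return [(doc, s) for doc, s in ranked if s >= min_score][:top_k]


semantic = {"a": 0.9, "b": 0.6, "c": 0.3}
keyword = {"b": 4.0, "c": 2.0, "d": 1.0}
fused = weighted_sum_fusion(semantic, keyword)
print(fused)  # "a" (about 0.70) ranks ahead of "b" (about 0.65)
```

Documents "c" (0.10) and "d" (0.0) fall below min_score = 0.3 and are dropped, which shows how min-max normalisation pushes the weakest candidate in each list to zero.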

rrf — Reciprocal Rank Fusion

Uses only the rank position of each result (not the raw score), making it robust to score scale differences between the two retrievers:

rrf_score(d) = 1/(k + rank_semantic(d)) + 1/(k + rank_keyword(d))

where k = 60 (standard constant). Documents appearing in only one result list still receive a contribution from that list. The min_score threshold does not apply to RRF (RRF scores are small fractions on a different scale and are not directly comparable to min_score). Recommended when the two scorers produce scores of very different magnitudes.
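
The RRF formula can be sketched in a few lines. This example assumes ranks start at 1 and that inputs are lists of document ids already ordered best-first; rrf_fusion is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def rrf_fusion(semantic_ranked: List[str], keyword_ranked: List[str],
               k: int = 60, top_k: int = 10) -> List[Tuple[str, float]]:
    """Sum 1 / (k + rank) over every list in which a document appears."""
    scores: Dict[str, float] = {}
    for ranking in (semantic_ranked, keyword_ranked):
        for rank, doc in enumerate(ranking, start=1):  # 1-based ranks (assumption)
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_k]


semantic_ranked = ["a", "b", "c"]
keyword_ranked = ["b", "d", "a"]
fused = rrf_fusion(semantic_ranked, keyword_ranked)
for doc, score in fused:
    print(f"{doc}: {score:.5f}")
```

Document "b" (ranks 2 and 1) edges out "a" (ranks 1 and 3), and both beat "d" and "c", which appear in only one list. Only positions matter here, so the raw BM25 and similarity scores never need to be normalised.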

max

Normalises both score lists to [0, 1], then assigns each document the maximum of its two normalised scores:

max_score(d) = max(sem_score_normalised, key_score_normalised)

Documents appearing in only one list receive only that score. Best for sparse queries where one retrieval signal dominates and you do not want the other to dilute it.
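
A minimal sketch of max fusion under the same assumptions as the description above: inputs are {doc_id: raw_score} dictionaries, each list is min-max normalised, and a document missing from a list contributes 0 for it. The helper names and the alphabetical tie-break are hypothetical choices for this example.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant list maps to 1.0 (assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def max_fusion(semantic: Dict[str, float], keyword: Dict[str, float],
               top_k: int = 10) -> List[Tuple[str, float]]:
    sem, key = min_max(semantic), min_max(keyword)
    combined = {
        doc: max(sem.get(doc, 0.0), key.get(doc, 0.0))
        for doc in set(sem) | set(key)
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))[:top_k]


semantic = {"a": 0.9, "b": 0.6, "c": 0.3}
keyword = {"b": 4.0, "c": 2.0, "d": 1.0}
fused = max_fusion(semantic, keyword)
print(fused)
```

Both "a" (top semantic hit) and "b" (top keyword hit) reach 1.0, so the best result of either retriever always ties for first place. That is the behaviour to expect when one signal should be allowed to dominate.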


Error Reference

| Situation | Behaviour |
| --- | --- |
| rag_system=None | ValueError raised in constructor immediately. |
| Stanza not installed | Warning logged; simple regex tokenizer used transparently. |
| Stanza model fails to load | Warning logged; simple tokenizer used transparently. |
| keyword_search called before fit | Auto-trains index via train_keyword_index() with a warning log. |
| No chunks in vector store | train_keyword_index logs a warning and returns; _is_trained stays False. |
| Unsupported fusion_method | ValueError ("Unsupported fusion method: <value>") raised in _fuse_results. |
| All results below min_score | generate / agenerate return a bilingual fallback string; no exception. |
| LLM is None | generate returns "Language model is not available". |

Full Integration Examples

Example 1 — Synchronous Pipeline

from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig

config = SearchConfig(
    semantic_weight=0.65,
    keyword_weight=0.35,
    min_score=0.25,
    top_k=10,
    fusion_method="rrf"
)

searcher = HybridSearchRAG(
    rag_system=my_rag,
    config=config,
    keyword_method="bm25",
    language="ar"
)

# Train once at startup
searcher.train_keyword_index()

# Search
results = searcher.hybrid_search("ما هو التعلم العميق؟")
for chunk, score in results:
    print(f"[{score:.3f}] {chunk.text[:120]}")

# Generate
answer = searcher.generate(
    "ما هو التعلم العميق؟",
    search_method="hybrid",
    include_sources=True
)
print(answer)

Example 2 — Async FastAPI Service

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fennec_community.rag.types.hybrid_search import HybridSearchRAG, SearchConfig

app = FastAPI()
# my_rag is your already-initialised RAG backend
searcher = HybridSearchRAG(rag_system=my_rag, language="en")

@app.on_event("startup")
async def startup():
    # Train in a worker thread so the event loop is not blocked during startup
    await asyncio.to_thread(searcher.train_keyword_index)

@app.get("/answer")
async def answer(q: str, method: str = "hybrid"):
    return {"answer": await searcher.agenerate(q, search_method=method)}

@app.get("/stream")
async def stream(q: str):
    async def gen():
        async for token in searcher.astream(q):
            yield token
    return StreamingResponse(gen(), media_type="text/plain")

@app.get("/stats")
async def stats():
    return searcher.get_stats()

Example 3 — Evaluation & Method Comparison

queries = [
    "What is federated learning?",
    "Explain transformer attention",
    "How does RAG work?"
]

for q in queries:
    comp = searcher.compare_methods(q, top_k=5)
    sem_k_overlap = comp["overlap"]["semantic_keyword"]
    print(f"Query: {q}")
    print(f"  Semantic∩Keyword overlap: {sem_k_overlap}/5")
    print(f"  Hybrid top result: {comp['hybrid'][0][0].text[:80] if comp['hybrid'] else 'None'}")
    print()

Example 4 — Custom Keyword Scorer Standalone

from fennec_community.rag.types.hybrid_search import BM25Scorer, TFIDFScorer

# BM25 standalone
bm25 = BM25Scorer(language="en", use_lemmatization=True)
corpus = ["deep learning models", "natural language processing", "computer vision tasks"]
bm25.fit(corpus)

# Both query terms occur only in the second document, so it ranks first
scores = bm25.batch_score("natural language", corpus)
best = corpus[scores.index(max(scores))]
print(f"Most relevant: {best}")  # "natural language processing"

# TF-IDF standalone
tfidf = TFIDFScorer(language="en")
tfidf.fit(corpus)
s = tfidf.score("vision models", "computer vision tasks")
print(f"TF-IDF score: {s:.4f}")

Source: community/rag/hybrid_search.md