MultiHop-RAG Modular `multi_hop` — Enterprise API Reference
Table of Contents
- Overview
- Architecture
- Quick Start
- Enumeration: HopStrategy
- Data Models
- Class: QueryDecomposer
- Class: MultiHopRAG
- Return Value Reference
- Reasoning Pipeline — Internal Flow
- Hop Strategy Selection Logic
- Confidence & Early Stopping Model
- NER Engine: Stanza vs Regex Fallback
- RAG System Interface Contract
- Language Detection & Prompt Templates
- Complete Examples
Overview
multi_hop is a Reasoning-Guided Multi-Hop RAG layer that wraps any existing RAG system and gives it the ability to answer complex, multi-faceted questions through iterative, self-directed retrieval. Instead of a single retrieve-then-answer pass, MultiHopRAG plans its search at every step, tracks what it has learned so far, identifies knowledge gaps, and adaptively chooses the next query — stopping early as soon as it has collected sufficient evidence.
Key capabilities at a glance:
| Capability | Detail |
|---|---|
| Reasoning-guided hops | Each hop asks "What do I know? What am I still missing?" before planning the next search |
| Smart early stopping | Stops automatically when confidence ≥ threshold — no wasted hops |
| Query decomposition | Splits composite questions into sequential sub-queries |
| NER-powered entity extraction | Uses Stanza NER (Arabic & English) with transparent regex fallback |
| Adaptive strategy selection | Chooses ENTITY_EXPANSION, RELATION_BRIDGING, CLARIFICATION, or VERIFICATION per hop |
| Confidence scoring | Quantitative confidence track per hop, used for early stopping |
| Bilingual prompting | Auto-detects Arabic vs English and selects the correct strict-grounding prompt |
| Async support | Non-blocking aquery for FastAPI / asyncio pipelines |
| Intermediate results | Full reasoning chain, hop details, and knowledge state accessible on demand |
Architecture
┌──────────────────────────────────────────────────────────────────────┐
│ MultiHopRAG │
│ │
│ ┌───────────────────┐ ┌──────────────────────────────────┐ │
│ │ QueryDecomposer │ │ ReasoningState │ │
│ │ (Stanza / Regex) │ │ known_facts · missing_info │ │
│ └────────┬──────────┘ │ reasoning_chain · confidence │ │
│ │ decompose() └──────────────┬───────────────────┘ │
│ │ │ updated every hop │
│ ▼ │ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Reasoning Hop Loop │ │
│ │ │ │
│ │ ① _analyze_question_requirements → question type + slots │ │
│ │ ② _select_initial_strategy → HopStrategy │ │
│ │ ③ rag.retrieve(query) → raw chunks │ │
│ │ ④ _filter_chunks → deduplicated + scored │ │
│ │ ⑤ _extract_entities_from_chunks → NER entities │ │
│ │ ⑥ _update_reasoning_state → facts + gaps + confidence │ │
│ │ ⑦ Early stop check → break if answer_complete │ │
│ │ ⑧ _plan_next_hop → next query + strategy │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ _aggregate_chunks → _build_reasoned_context → _build_reasoned_prompt │
│ │ │
│ ▼ │
│ rag.llm.generate(prompt) → Final Answer │
└──────────────────────────────────────────────────────────────────────┘
Quick Start
from fennec_community.rag.types.multi_hop import MultiHopRAG, HopStrategy
# Wrap any RAG system that exposes .retrieve() and .llm.generate()
mh = MultiHopRAG(
rag_system=my_rag,
max_hops=3,
language="ar",
confidence_threshold=0.75,
)
# Simple answer
answer = mh.query("What caused the French Revolution and what were its results?")
print(answer)
# Full reasoning trace
result = mh.query(
"Compare the economic policies of Germany and France after 2008",
return_intermediate=True,
)
print(result["answer"])
print(result["reasoning_chain"])
# Async usage (inside an async function or an async-aware REPL)
answer = await mh.aquery("Why did the Ottoman Empire collapse?")
Enumeration: HopStrategy
from fennec_community.rag.types.multi_hop import HopStrategy
Controls the search strategy used at each individual hop. The system selects the strategy automatically based on the question type and the current knowledge state; you can inspect the chosen strategy in the intermediate results.
from enum import Enum
class HopStrategy(Enum):
    ENTITY_EXPANSION = "entity_expansion"
    RELATION_BRIDGING = "relation_bridging"
    CLARIFICATION = "clarification"
    VERIFICATION = "verification"
| Member | Value | When used | Description |
|---|---|---|---|
| ENTITY_EXPANSION | "entity_expansion" | Default first hop; whenever a pre-decomposed sub-query is available | Expands the search around known entities extracted from the query or previous chunks. The most general strategy. |
| RELATION_BRIDGING | "relation_bridging" | Comparison questions; when only one of two required entities has been found | Bridges two entities or concepts with a query that connects them. Closes the gap in comparative reasoning. |
| CLARIFICATION | "clarification" | When a clear knowledge gap is identified; causal questions | Builds a targeted query aimed at filling a specific named gap (e.g., missing causality, missing date, or missing entity info). Also used for why/cause questions. Receives a +0.1 weight bonus during chunk aggregation. |
| VERIFICATION | "verification" | Reserved | Planned for cross-checking facts across multiple sources. Not yet auto-selected; available for manual use in future extensions. |
Data Models
ReasoningState
from fennec_community.rag.types.multi_hop import ReasoningState
A @dataclass that tracks the accumulated knowledge state across all hops for a single query. It is updated after every hop and passed to the final prompt builder so the LLM sees the full reasoning context.
from dataclasses import dataclass, field
from typing import List
@dataclass
class ReasoningState:
    known_facts: List[str] = field(default_factory=list)      # Discovered relevant sentences
    missing_info: List[str] = field(default_factory=list)     # Named knowledge gaps still open
    reasoning_chain: List[str] = field(default_factory=list)  # Per-hop reasoning step summaries
    confidence: float = 0.0                                   # Current answer-completeness estimate in [0, 1]
    answer_complete: bool = False                             # True once the early-stopping condition is met
| Field | Type | Description |
|---|---|---|
| known_facts | List[str] | Relevant sentences extracted from retrieved chunks so far. Grows with each hop. |
| missing_info | List[str] | Named knowledge gaps not yet filled (e.g., "causal relationship", "specific date or time", "info about France"). Updated after each hop. |
| reasoning_chain | List[str] | Human-readable summary of each hop: what it searched for, how many facts it found, the resulting confidence, and what is still missing. |
| confidence | float | Quantitative estimate of answer completeness in [0.0, 1.0]. Computed as min(facts/5, 0.85) − (gaps × 0.15), clamped to [0.0, 1.0], with a floor of 0.8 when no gaps remain and at least 3 facts are known. |
| answer_complete | bool | True when known_facts is non-empty AND either confidence ≥ confidence_threshold or no gaps remain. Triggers early stopping. |
Usage (read-only — managed by MultiHopRAG):
result = mh.query("Why did Rome fall?", return_intermediate=True)
state = result["knowledge_state"]
print("Known facts:", state["known_facts"])
print("Missing info:", state["missing_info"])
print("Confidence:", state["confidence"])
HopResult
from fennec_community.rag.types.multi_hop import HopResult
A @dataclass capturing everything that happened during one individual hop: the query used, what was found, the strategy applied, and which gap (if any) was targeted.
from dataclasses import dataclass
from typing import List, Optional, Tuple
from fennec_community.rag.types.multi_hop import HopStrategy
@dataclass
class HopResult:
    query: str                     # The search query used for this hop
    chunks: List[Tuple]            # Retrieved (chunk, score) pairs
    hop_number: int                # 1-based hop index
    extracted_entities: List[str]  # Entities extracted from this hop's chunks
    strategy: HopStrategy          # Strategy used for this hop
    reasoning_step: str            # Human-readable summary of this hop
    gap_filled: Optional[str]      # The knowledge gap this hop targeted, if any
| Field | Type | Description |
|---|---|---|
| query | str | The exact query string sent to the RAG backend for this hop. |
| chunks | List[Tuple] | (chunk, score) pairs returned by the RAG backend after deduplication and score filtering. |
| hop_number | int | 1-based index of this hop within the current query's hop sequence. |
| extracted_entities | List[str] | Up to 5 unique entities extracted from the top 3 chunks of this hop. Used to plan the next hop. |
| strategy | HopStrategy | The HopStrategy member that governed how this hop's query was constructed. |
| reasoning_step | str | Formatted string: "hop N: looking for '…'. found M facts. confidence: X%. missing: …". |
| gap_filled | Optional[str] | For gap-filling hops, the name of the targeted gap (taken from ReasoningState.missing_info[0]); None for expansion hops. |
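The per-hop dicts returned in result["hops"] (see Return Value Reference) use different key names from the HopResult fields. The mapping below is a hypothetical sketch inferred from the two field lists: `hop_to_dict` is not part of the library, the local `HopStrategy` and `HopResult` classes are stand-ins so the snippet runs on its own, and the assumption that chunks are ordered best-first (so the first three give top_scores) is ours.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple


class HopStrategy(Enum):
    # Local stand-in mirroring the documented enum values
    ENTITY_EXPANSION = "entity_expansion"
    RELATION_BRIDGING = "relation_bridging"
    CLARIFICATION = "clarification"
    VERIFICATION = "verification"


@dataclass
class HopResult:
    query: str
    chunks: List[Tuple[Any, float]]
    hop_number: int
    extracted_entities: List[str]
    strategy: HopStrategy
    reasoning_step: str
    gap_filled: Optional[str]


def hop_to_dict(hop: HopResult) -> Dict[str, Any]:
    """Hypothetical conversion from HopResult to the per-hop dict in result["hops"]."""
    return {
        "hop_number": hop.hop_number,
        "query": hop.query,
        "strategy": hop.strategy.value,            # enum member -> value string
        "reasoning_step": hop.reasoning_step,
        "gap_filled": hop.gap_filled,
        "chunks_found": len(hop.chunks),            # number of accepted chunks
        "entities": hop.extracted_entities,
        "top_scores": [score for _, score in hop.chunks[:3]],  # assumes best-first order
    }


hop = HopResult(
    query="causes of the French Revolution",
    chunks=[("c1", 0.91), ("c2", 0.84), ("c3", 0.77), ("c4", 0.52)],
    hop_number=1,
    extracted_entities=["France", "1789"],
    strategy=HopStrategy.CLARIFICATION,
    reasoning_step="hop 1: looking for 'causes of the French Revolution'. "
                   "found 3 facts. confidence: 80%. missing: none.",
    gap_filled=None,
)
print(hop_to_dict(hop)["top_scores"])  # [0.91, 0.84, 0.77]
```

This explains why the usage example below reads hop['chunks_found'] and hop['entities'] rather than the dataclass field names.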
Usage (read-only — produced by MultiHopRAG):
result = mh.query("Who invented the telephone and when?", return_intermediate=True)
for hop in result["hops"]:
    print(f"Hop {hop['hop_number']} [{hop['strategy']}]: {hop['query']}")
    print(f"  Chunks: {hop['chunks_found']}  Entities: {hop['entities']}")
    print(f"  Scores: {hop['top_scores']}")
    print(f"  Gap filled: {hop['gap_filled']}")
Class: QueryDecomposer
from fennec_community.rag.types.multi_hop import QueryDecomposer
A bilingual (Arabic / English) query analysis engine with two core capabilities: decomposing composite questions into sequential sub-queries, and extracting named entities from text using either Stanza NER (when installed) or a regex fallback.
Can be used independently of MultiHopRAG for standalone NLP preprocessing tasks.
__init__ (QueryDecomposer)
QueryDecomposer(
    use_stanza: bool = True,
    language: str = "ar",
)
Purpose: Initialises the query decomposer and attempts to load a Stanza NLP pipeline for the specified language. If Stanza is unavailable or fails to load, the instance transparently falls back to regex-based extraction; no exception is raised.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_stanza | bool | True | Whether to attempt loading a Stanza NLP pipeline. Set to False to force regex-only mode (faster startup, no model download needed). |
| language | str | "ar" | Language code for the Stanza pipeline. Supported: "ar" (Arabic), "en" (English), and any other Stanza-supported language code. |
Returns: QueryDecomposer instance.
NER mode resolution at startup:
use_stanza=True AND stanza installed AND model loads → Stanza NER (best accuracy)
use_stanza=True AND stanza installed AND model fails → Regex fallback (logged warning)
use_stanza=True AND stanza NOT installed → Regex fallback (logged warning)
use_stanza=False → Regex fallback (explicit)
Installing Stanza for Arabic:
pip install stanza
python -c "import stanza; stanza.download('ar')"
Example:
# Best accuracy (requires stanza + model)
decomposer = QueryDecomposer(use_stanza=True, language="ar")
# English, Stanza NER
decomposer_en = QueryDecomposer(use_stanza=True, language="en")
# Fast startup, no model required
decomposer_fast = QueryDecomposer(use_stanza=False)
decompose
decomposer.decompose(query: str) -> List[str]
Purpose: Splits a composite question into a list of sequential sub-queries by scanning for language-specific composite indicator keywords (e.g., "ثم", "وأيضاً", "then", "also", "because", "لماذا"). Supports Modern Standard Arabic and the major dialects (Egyptian, Levantine, Gulf) as well as English.
If no composite indicator is found, the original query is returned as a single-element list — the method always returns a list.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| query | str | The natural-language question to analyse. |
Returns: List[str] — either a list of two sub-query strings (when a composite indicator is found), or [query] (when no indicator is found and the question is treated as atomic).
Supported indicator categories:
| Category | Examples (Arabic) | Examples (English) |
|---|---|---|
| Sequential | "ثم", "بعد ذلك", "ومن ثم" | "then", "after that" |
| Additive | "وأيضاً", "كذلك", "بالإضافة" | "also", "in addition" |
| Causal | "لماذا", "ما السبب" | "why", "cause" |
| Conditional | "إذا", "في حالة" | "if", "in case" |
| Comparative | "مقارنةً بـ", "بينما" | "while", "compared to" |
| Procedural | "كيف", "ما الطريقة" | "how" |
| Dialectal (Egyptian) | "وبعدين", "كمان", "علشان كده" | n/a |
| Dialectal (Levantine) | "عقبها", "بعد هيك", "ليش" | n/a |
| Dialectal (Gulf) | "عقب هيك", "شلون", "عشان هيك" | n/a |
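A minimal sketch of indicator-based splitting, assuming the decomposer splits once, at the first indicator (in list order) that occurs, and trims the surrounding whitespace and commas. The `INDICATORS` tuple is a small hypothetical subset of the table (sequential and additive connectors only); how the library handles causal, conditional, or question-word indicators such as "لماذا" or "how" is not modelled here.

```python
from typing import List

# Hypothetical subset of sequential/additive indicators (Arabic and English).
# Each is padded with spaces so it only matches as a separate word.
INDICATORS = (" ثم ", " بعد ذلك ", " وأيضاً ", " then ", " after that ", " also ")


def decompose(query: str) -> List[str]:
    """Split at the first matching indicator; otherwise return [query]."""
    for indicator in INDICATORS:
        idx = query.find(indicator)
        if idx > 0:
            first = query[:idx].strip(" ,،")
            second = query[idx + len(indicator):].strip(" ,،")
            if first and second:
                return [first, second]
    return [query]


print(decompose("من هو أينشتاين ثم ما أهم اكتشافاته؟"))
print(decompose("What is the capital of France?"))
```

Like the documented method, this always returns a list, so callers can iterate over the result without special-casing atomic questions.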
Example:
# Composite question → two sub-queries
parts = decomposer.decompose("من هو أينشتاين ثم ما أهم اكتشافاته؟")
# ["من هو أينشتاين", "ما أهم اكتشافاته؟"]
# Atomic question → single element
parts = decomposer.decompose("What is the capital of France?")
# ["What is the capital of France?"]
extract_entities
decomposer.extract_entities(text: str) -> List[str]
Purpose: Extracts the most important named entities from a piece of text. Uses Stanza NER automatically when available (typed, scored entities) and otherwise falls back to regex-based extraction (quoted strings, English proper nouns, and numbers). Returns at most 10 entities.
This is the primary NER method used by MultiHopRAG both during question analysis and during chunk processing at each hop.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| text | str | Any text: a query string, a retrieved chunk, or a document sentence. Texts shorter than 3 characters return an empty list immediately. |
Returns: List[str] — list of entity strings, sorted by importance score descending (Stanza mode) or by length descending (regex mode). Maximum 10 entries.
Stanza mode — entity importance scoring formula:
score = entity_type_weight # base weight by type (see table below)
+ (len(entity_text) / 10) # longer = more specific
+ (count_in_text × 0.5) # frequency bonus
+ 1.0 (if contains English chars)
+ 0.5 (if contains digits)
Entity type weights (Stanza mode):
| Type | Weight | Meaning |
|---|---|---|
| PERS / PER | 3.0 | Persons (highest priority) |
| ORG | 2.5 | Organizations |
| LOC / GPE | 2.0 | Locations / geo-political entities |
| DATE / MONEY | 1.5 | Dates and monetary amounts |
| QUANTITY | 1.3 | Quantities |
| TIME / PERCENT | 1.2 | Times and percentages |
| CARDINAL | 1.1 | Cardinal numbers |
| ORDINAL | 1.0 | Ordinal numbers |
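The scoring formula and weight table can be expressed as a small function. This is a sketch, not the library's code: it assumes "contains English chars" means any ASCII letter, that frequency is a plain substring count, and that unknown entity types get weight 1.0. `entity_score` and `TYPE_WEIGHTS` are hypothetical names.

```python
import re

# Weights from the table above; unknown types fall back to 1.0 (assumption).
TYPE_WEIGHTS = {
    "PERS": 3.0, "PER": 3.0,
    "ORG": 2.5,
    "LOC": 2.0, "GPE": 2.0,
    "DATE": 1.5, "MONEY": 1.5,
    "QUANTITY": 1.3,
    "TIME": 1.2, "PERCENT": 1.2,
    "CARDINAL": 1.1,
    "ORDINAL": 1.0,
}


def entity_score(entity_text: str, entity_type: str, full_text: str) -> float:
    score = TYPE_WEIGHTS.get(entity_type, 1.0)       # base weight by type
    score += len(entity_text) / 10                   # longer = more specific
    score += full_text.count(entity_text) * 0.5      # frequency bonus
    if re.search(r"[A-Za-z]", entity_text):
        score += 1.0                                 # contains English (Latin) letters
    if re.search(r"\d", entity_text):
        score += 0.5                                 # contains digits
    return score


text = "ولد نابليون بونابرت عام 1769 في جزيرة كورسيكا"
print(round(entity_score("1769", "DATE", text), 2))  # 2.9
```

For "1769" the terms are 1.5 (DATE) + 0.4 (length) + 0.5 (one occurrence) + 0.5 (digits), which is why a person name of similar length outranks it.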
Regex mode — extraction rules:
| Priority | Pattern | Example |
|---|---|---|
| 1 | Quoted text "…" or «…» | "الثورة الفرنسية" |
| 2 | English TitleCase words | "Napoleon Bonaparte" |
| 3 | Numeric tokens | "1789", "95" |
Results from regex mode are deduplicated and sorted by string length (longer = more specific).
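The regex rules can be approximated as below. The three patterns, the small stop-word set that drops sentence-initial words such as "The", and the name `regex_entities` are assumptions for illustration; the library's own patterns may differ.

```python
import re
from typing import List

QUOTED = re.compile(r'"([^"]+)"|«([^»]+)»')                   # priority 1: quoted text
TITLECASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")  # priority 2: English TitleCase runs
NUMBER = re.compile(r"\b\d+\b")                                # priority 3: numeric tokens
STOPWORDS = {"The", "A", "An", "In", "On", "At"}               # hypothetical filter


def regex_entities(text: str, limit: int = 10) -> List[str]:
    if len(text) < 3:
        return []
    found: List[str] = []
    for m in QUOTED.finditer(text):
        found.append(m.group(1) or m.group(2))
    found += [t for t in TITLECASE.findall(text) if t not in STOPWORDS]
    found += NUMBER.findall(text)
    unique = list(dict.fromkeys(found))                   # deduplicate, keep first-seen order
    return sorted(unique, key=len, reverse=True)[:limit]  # longer = more specific


print(regex_entities('The "Eiffel Tower" was built in 1889 by Gustave Eiffel in Paris.'))
```

Because the final sort is by length, "Gustave Eiffel" comes before the quoted "Eiffel Tower" even though quoted text is matched first.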
Example:
# With Stanza (Arabic)
entities = decomposer.extract_entities(
"ولد نابليون بونابرت عام 1769 في جزيرة كورسيكا وأصبح إمبراطوراً لفرنسا."
)
# ["نابليون بونابرت", "كورسيكا", "فرنسا", "1769"]
# Without Stanza (regex fallback)
entities = decomposer_fast.extract_entities(
    'The "Eiffel Tower" was built in 1889 by Gustave Eiffel in Paris.'
)
# Illustrative output; regex mode orders results longest first:
# ["Gustave Eiffel", "Eiffel Tower", "Paris", "1889"]
get_entities_by_type
decomposer.get_entities_by_type(text: str) -> Dict[str, List[str]]
Purpose: Extracts entities from text and returns them grouped by Stanza NER type rather than as a flat ranked list. Use it for fine-grained analysis, e.g., to access all persons, all locations, and all dates mentioned in a text separately.
⚠️ Requires Stanza: returns an empty dictionary {} when Stanza is unavailable or failed to load.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| text | str | The text to extract typed entities from. |
Returns: Dict[str, List[str]] — keys are Stanza NER type labels (e.g., "PERS", "LOC", "ORG", "DATE"), values are deduplicated lists of entity strings of that type. Returns {} if Stanza is not available or an error occurs.
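Grouping typed entities is straightforward once a Stanza `Document` is available: its `ents` property yields spans that carry `.text` and `.type`. The sketch below accepts any object with that shape, so it runs without Stanza installed; `group_entities_by_type` is a hypothetical helper, and in real use `doc` would come from something like `stanza.Pipeline(lang="en", processors="tokenize,ner")(text)`.

```python
from types import SimpleNamespace
from typing import Dict, List


def group_entities_by_type(doc) -> Dict[str, List[str]]:
    """Group doc.ents by entity type, dropping duplicates but keeping order."""
    grouped: Dict[str, List[str]] = {}
    for ent in doc.ents:
        bucket = grouped.setdefault(ent.type, [])
        if ent.text not in bucket:
            bucket.append(ent.text)
    return grouped


# Stand-in for a Stanza Document so the sketch runs without the library
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Albert Einstein", type="PERSON"),
    SimpleNamespace(text="Ulm", type="GPE"),
    SimpleNamespace(text="Germany", type="GPE"),
    SimpleNamespace(text="1879", type="DATE"),
    SimpleNamespace(text="Germany", type="GPE"),
])
print(group_entities_by_type(fake_doc))
```

The keys are whatever labels the loaded NER model emits, which is why callers should use dict.get() with a default rather than assume a fixed label set.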
Example:
typed = decomposer.get_entities_by_type(
"Albert Einstein was born in Ulm, Germany in 1879 and worked at Princeton University."
)
# Illustrative output. Label names depend on the Stanza NER model for the
# language (the English model, for example, emits PERSON and GPE rather
# than PERS and LOC):
# {
#     "PERS": ["Albert Einstein"],
#     "LOC": ["Ulm", "Germany"],
#     "ORG": ["Princeton University"],
#     "DATE": ["1879"],
# }
# Downstream usage example:
persons = typed.get("PERS", [])
dates = typed.get("DATE", [])
Class: MultiHopRAG
from fennec_community.rag.types.multi_hop import MultiHopRAG
The central orchestrator. Wraps any existing RAG backend and executes reasoning-guided iterative retrieval to answer complex questions that require synthesising information from multiple sources or reasoning steps.
__init__ (MultiHopRAG)
MultiHopRAG(
rag_system: Any,
max_hops: int = 3,
min_score_threshold: float = 0.3,
enable_query_decomposition: bool = True,
use_stanza_ner: bool = True,
language: str = "ar",
confidence_threshold: float = 0.75,
)
Purpose: Initialises the Multi-Hop RAG system. Validates that a RAG system is provided, creates the QueryDecomposer, and initialises the internal statistics counters.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| rag_system | Any | (required) | Any RAG backend that exposes .retrieve(query) -> List and .llm.generate(prompt, **kwargs) -> str. See RAG System Interface Contract. |
| max_hops | int | 3 | Maximum number of retrieval hops per query. A hard upper bound; the system may stop earlier via confidence-based early stopping. |
| min_score_threshold | float | 0.3 | Minimum similarity/relevance score for a chunk to be accepted. Chunks scoring below this value are filtered out at every hop. |
| enable_query_decomposition | bool | True | When True, composite questions are split into sub-queries by QueryDecomposer.decompose() before the hop loop begins, and the sub-queries are used for successive hops in order. |
| use_stanza_ner | bool | True | Whether to use Stanza NER for entity extraction. Passed directly to QueryDecomposer. Set to False for fast startup or when Stanza is not installed. |
| language | str | "ar" | Language code for the Stanza pipeline. The final prompt language is auto-detected from the question text, so this parameter mainly affects NER. |
| confidence_threshold | float | 0.75 | Confidence level that triggers early stopping. When reasoning_state.confidence ≥ threshold and at least one fact has been found, the hop loop terminates early, avoiding unnecessary retrieval calls. |
Raises: ValueError if rag_system is None.
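The per-hop filtering controlled by min_score_threshold, combined with the rules in the RAG System Interface Contract (bare chunks default to a score of 1.0; chunks without a chunk_id are skipped), can be sketched as follows. The function name `filter_chunks`, the keep-first deduplication policy, and the best-first sort are assumptions, not confirmed library behaviour.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Chunk:
    chunk_id: str
    text: str
    doc_id: str


def filter_chunks(raw: List[Any], min_score: float = 0.3) -> List[Tuple[Any, float]]:
    """Normalise retrieve() output, drop low scores, and dedupe by chunk_id."""
    seen = set()
    kept: List[Tuple[Any, float]] = []
    for item in raw:
        # Accept either (chunk, score) tuples or bare chunk objects (score 1.0)
        chunk, score = item if isinstance(item, tuple) else (item, 1.0)
        chunk_id = getattr(chunk, "chunk_id", None)
        if chunk_id is None or score < min_score or chunk_id in seen:
            continue
        seen.add(chunk_id)
        kept.append((chunk, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


raw = [
    (Chunk("c1", "Paris is the capital of France.", "doc_1"), 0.92),
    (Chunk("c2", "Unrelated text.", "doc_2"), 0.12),                  # below threshold
    (Chunk("c1", "Paris is the capital of France.", "doc_1"), 0.90),  # duplicate id
    Chunk("c3", "France is in Europe.", "doc_3"),                     # bare chunk -> 1.0
]
print([(c.chunk_id, s) for c, s in filter_chunks(raw)])  # [('c3', 1.0), ('c1', 0.92)]
```

Note that bare chunks always pass the threshold because of their default score, so backends that can return scores should do so.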
Internal stats initialised at construction:
| Key | Initial value | Description |
|---|---|---|
| total_queries | 0 | Total calls to query() / aquery(). |
| average_hops | 0 | Rolling average number of hops executed per query. |
| decomposed_queries | 0 | Queries that were split into sub-queries. |
| entities_extracted | 0 | Cumulative count of entities extracted across all hops. |
| early_stops | 0 | Number of queries where early stopping was triggered. |
| stanza_enabled | True or False | Whether Stanza NER is active: True if it loaded successfully, otherwise False. |
Example:
from fennec_community.rag.types.multi_hop import MultiHopRAG
mh = MultiHopRAG(
rag_system=my_rag, # Required: any RAG backend
max_hops=4,
min_score_threshold=0.25,
enable_query_decomposition=True,
use_stanza_ner=True,
language="ar",
confidence_threshold=0.75,
)
query
mh.query(
    question: str,
    hops: Optional[int] = None,
    return_intermediate: bool = False,
) -> str | Dict[str, Any]
Purpose: The primary synchronous query method. Executes the full multi-hop reasoning pipeline end to end: question analysis → query decomposition → iterative retrieval hops → chunk aggregation → reasoned answer generation. Returns either a plain answer string or a rich dict with the full reasoning trace, depending on return_intermediate.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| question | str | (required) | The natural-language question to answer. If it is empty or whitespace-only, a message asking the user to enter a question is returned instead. |
| hops | Optional[int] | None | Overrides the instance-level max_hops for this query. When None, the instance's max_hops is used. |
| return_intermediate | bool | False | When False, returns a plain answer string. When True, returns a dict with the answer, reasoning chain, per-hop details, knowledge state, chunk count, and stats. |
Returns:
- str: the LLM-generated answer when return_intermediate=False.
- Dict[str, Any]: the full reasoning-trace dict when return_intermediate=True. See Return Value Reference.
- str starting with "Please enter..." when question is empty or whitespace-only.
- str containing "Error: ..." when an unrecoverable internal exception occurs.
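Because query() returns different shapes depending on return_intermediate, and reports empty input and internal failures as plain strings rather than exceptions, callers may want a small normalising helper. This is a hypothetical sketch (`answer_of` is not part of the library); detecting failures by the "Error:" substring and the "Please enter" prefix is a heuristic based on the return list above.

```python
from typing import Any, Dict, Tuple, Union


def answer_of(result: Union[str, Dict[str, Any]]) -> Tuple[str, bool]:
    """Return (answer_text, looks_like_error) for either query() return shape."""
    answer = result["answer"] if isinstance(result, dict) else result
    looks_like_error = "Error:" in answer or answer.startswith("Please enter")
    return answer, looks_like_error


print(answer_of({"answer": "Paris is the capital of France.", "hops": []}))
print(answer_of(" Error: retrieval backend unavailable"))
```

A service layer can then map the flag to an HTTP error status instead of returning the message as a normal answer.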
Execution pipeline (6 stages):
① _analyze_question_requirements() → question type, need flags, seed entities
② _prepare_queries() → [sub_query_1, sub_query_2, ...]
③ _perform_reasoning_hops() → List[HopResult], ReasoningState
④ _aggregate_chunks() → ranked, deduplicated (chunk, score) list
⑤ _generate_reasoned_answer() → context + prompt + LLM call
⑥ _update_stats() → rolling average hops
Example — simple:
answer = mh.query("What were the causes of World War I?")
print(answer)
Example — with hop override:
# Use only 1 hop for a simple factual question
answer = mh.query("What is the capital of France?", hops=1)
Example — with full trace:
result = mh.query(
"Compare the industrial revolutions of Britain and Germany",
return_intermediate=True,
)
print("Answer:", result["answer"])
print()
print("Reasoning chain:")
for step in result["reasoning_chain"]:
    print("  •", step)
print()
print("Hops executed:", len(result["hops"]))
for hop in result["hops"]:
    print(f"  Hop {hop['hop_number']} [{hop['strategy']}]: {hop['query']}")
    print(f"    Chunks: {hop['chunks_found']}  Top scores: {hop['top_scores']}")
    print(f"    Entities: {hop['entities']}")
    print(f"    Gap filled: {hop['gap_filled']}")
print()
print("Confidence:", result["knowledge_state"]["confidence"])
print("Known facts:", len(result["knowledge_state"]["known_facts"]))
print("Total chunks:", result["total_chunks"])
aquery
async def aquery(
    self,
    question: str,
    max_hops: Optional[int] = None,
    return_intermediate: bool = False,
) -> str | Dict[str, Any]
Purpose: Async version of query(). Runs the synchronous query() in the default thread-pool executor via asyncio.to_thread, so the event loop is never blocked. A drop-in replacement for query() in async frameworks.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| question | str | (required) | The natural-language question to answer. |
| max_hops | Optional[int] | None | Per-call hop override. Maps to the hops parameter of query(). |
| return_intermediate | bool | False | When True, returns the full reasoning-trace dict instead of a plain answer string. |
Returns: Same as query() — str or Dict[str, Any].
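The delegation pattern behind aquery() can be sketched with a stand-in class. `SyncEngine` and its query() body are hypothetical; the point is that asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread while the event loop keeps serving other tasks, and that max_hops is forwarded to the hops parameter as described in the table above.

```python
import asyncio
import time
from typing import Optional


class SyncEngine:
    """Hypothetical stand-in for MultiHopRAG with a blocking query()."""

    def query(self, question: str, hops: Optional[int] = None,
              return_intermediate: bool = False):
        time.sleep(0.1)  # simulate blocking retrieval + generation
        answer = f"answer to {question!r} using {hops or 3} hops"
        return {"answer": answer} if return_intermediate else answer

    async def aquery(self, question: str, max_hops: Optional[int] = None,
                     return_intermediate: bool = False):
        # Run the synchronous pipeline in a worker thread; max_hops maps to hops
        return await asyncio.to_thread(
            self.query, question, hops=max_hops,
            return_intermediate=return_intermediate,
        )


async def main():
    engine = SyncEngine()
    # Both calls run concurrently instead of blocking the loop one after another
    return await asyncio.gather(
        engine.aquery("Why did Rome fall?", max_hops=2),
        engine.aquery("Who built the pyramids?"),
    )


print(asyncio.run(main()))
```

Since each call occupies a worker thread, very high concurrency is bounded by the executor's thread count rather than by the event loop.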
Context manager support: MultiHopRAG implements __aenter__ / __aexit__, so it can be used as an async context manager:
async with MultiHopRAG(rag_system=my_rag) as mh:
    answer = await mh.aquery("Why did the Roman Empire fall?")
    print(answer)
FastAPI example:
from typing import Optional
from fastapi import FastAPI
from fennec_community.rag.types.multi_hop import MultiHopRAG
app = FastAPI()
mh = MultiHopRAG(rag_system=my_rag, max_hops=3, language="en")
@app.get("/ask")
async def ask(q: str, hops: Optional[int] = None, trace: bool = False):
    result = await mh.aquery(q, max_hops=hops, return_intermediate=trace)
    if trace:
        return result              # dict with full trace
    return {"answer": result}      # plain string
@app.get("/ask/arabic")
async def ask_arabic(q: str):
    return {"answer": await mh.aquery(q)}
get_stats
mh.get_stats() -> Dict[str, Any]
Purpose: Returns a snapshot of all operational statistics accumulated since the MultiHopRAG instance was created. Use it to monitor query throughput, average hop count, decomposition rate, entity-extraction volume, early-stopping frequency, and NER engine status.
Parameters: None.
Returns: Dict[str, Any] with the following keys:
| Key | Type | Description |
|---|---|---|
| total_queries | int | Total number of queries processed by this instance. |
| average_hops | float | Rolling average number of hops executed per query. Falls as early stopping triggers more often. |
| decomposed_queries | int | Number of queries split into multiple sub-queries by QueryDecomposer.decompose(). |
| entities_extracted | int | Cumulative count of named entities extracted across all hops and all queries. |
| early_stops | int | Number of queries where the hop loop terminated early due to sufficient confidence. High values indicate efficient operation. |
| stanza_enabled | bool | True if Stanza NER loaded successfully; False if the regex fallback is in use. |
| max_hops | int | The configured max_hops value for this instance. |
| confidence_threshold | float | The configured early-stopping threshold. |
| ner_method | str | Human-readable NER method label: "Stanza" or "Regex". |
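The table describes average_hops as a rolling average. One common way to maintain such a value without storing every hop count is the incremental-mean update below; treating "rolling" as a cumulative mean over all queries is our assumption, and `update_average` is a hypothetical helper.

```python
def update_average(current_avg: float, total_queries: int, hops_used: int) -> float:
    """Incremental mean: total_queries already includes the new query."""
    return current_avg + (hops_used - current_avg) / total_queries


avg = 0.0
for n, hops in enumerate([3, 1, 2], start=1):  # hops used by three successive queries
    avg = update_average(avg, n, hops)
print(avg)  # 2.0
```

The update needs only the previous average and the query count, both of which are already kept in the stats dict.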
Example:
# After running several queries
stats = mh.get_stats()
print(f"Total queries: {stats['total_queries']}")
print(f"Average hops: {stats['average_hops']:.2f}")
print(f"Early stops: {stats['early_stops']} "
f"({stats['early_stops']/max(stats['total_queries'],1):.0%} of queries)")
print(f"Decomposed queries: {stats['decomposed_queries']}")
print(f"Entities extracted: {stats['entities_extracted']}")
print(f"NER engine: {stats['ner_method']}")
print(f"Confidence threshold: {stats['confidence_threshold']}")
Output example:
Total queries: 47
Average hops: 1.87
Early stops: 31 (66% of queries)
Decomposed queries: 12
Entities extracted: 284
NER engine: Stanza
Confidence threshold: 0.75
Return Value Reference
Simple mode (return_intermediate=False)
Returns a str: the LLM-generated answer, grounded strictly in the retrieved context and written in the auto-detected language of the question (Arabic or English).
Full trace mode (return_intermediate=True)
Returns Dict[str, Any] with the following structure:
{
"answer": str, # LLM-generated final answer
"reasoning_chain": List[str], # Per-hop summaries, e.g.:
# ["hop 1: looking for '…'. found 4 facts. confidence: 60%. missing: causal relationship.",
# "hop 2: looking for '…'. found 6 facts. confidence: 80%. missing: none."]
"hops": List[Dict], # Per-hop details (see below)
"knowledge_state": {
"known_facts": List[str], # All relevant sentences discovered
"missing_info": List[str], # Remaining knowledge gaps (empty = complete)
"confidence": float, # Final confidence score [0.0, 1.0]
},
"total_chunks": int, # Total unique chunks collected across all hops
"stats": Dict, # Output of get_stats() at this moment
}
Per-hop dict (inside "hops")
Each hop is represented as:
{
"hop_number": int, # 1-based hop index
"query": str, # Query sent to the RAG backend
"strategy": str, # HopStrategy value string
"reasoning_step": str, # Human-readable summary of this hop
"gap_filled": Optional[str],# Gap this hop targeted, or None
"chunks_found": int, # Number of accepted chunks
"entities": List[str], # Entities extracted from top chunks
"top_scores": List[float], # Relevance scores of top 3 chunks
}
Reasoning Pipeline — Internal Flow
Understanding the full internal pipeline helps when debugging, tuning thresholds, or extending the system.
query(question)
│
├─ 1. _analyze_question_requirements(question)
│ ├─ _classify_question_type() → "comparison" | "causal" | "temporal" | "procedural" | "factual"
│ ├─ Flag: needs_comparison → any of [مقارنة, compare, difference, ...]
│ ├─ Flag: needs_causality → any of [لماذا, why, cause, ...]
│ ├─ Flag: needs_timeline → any of [متى, when, date, ...]
│ ├─ Flag: needs_multi_entity → any of [و, مع, and, both, ...]
│ └─ entities → QueryDecomposer.extract_entities(question)
│
├─ 2. _prepare_queries(question)
│ └─ decompose(question) → ["sub_q1", "sub_q2"] or ["question"]
│
├─ 3. _perform_reasoning_hops(question, sub_queries, num_hops, requirements)
│ └─ For each hop:
│ ├─ rag.retrieve(current_query)
│ ├─ _filter_chunks() → deduplicate + score filter
│ ├─ _extract_entities_from_chunks() → top-5 entities from top-3 chunks
│ ├─ _extract_facts_from_chunks() → relevant sentences
│ ├─ _update_reasoning_state()
│ │ ├─ known_facts += new facts
│ │ ├─ missing_info = _identify_missing_info()
│ │ ├─ confidence = _calculate_confidence()
│ │ │ └─ answer_complete = (confidence ≥ threshold OR no gaps) AND facts > 0
│ ├─ [EARLY STOP if answer_complete]
│ └─ _plan_next_hop() → next (query, strategy)
│
├─ 4. _aggregate_chunks(hop_results)
│ └─ weight = (1 / hop_number) + (0.1 if CLARIFICATION)
│ Sort by weighted_score → deduplicate by chunk_id
│
├─ 5. _generate_reasoned_answer(question, chunks, hop_results, reasoning_state)
│ ├─ _build_reasoned_context() → reasoning chain + facts + gaps + sources
│ ├─ _build_reasoned_prompt() → language-aware strict-grounding prompt
│ └─ rag.llm.generate(prompt, max_tokens=512)
│
└─ 6. _update_stats(num_hops)
Hop Strategy Selection Logic
The system selects strategies automatically using this priority chain:
Initial strategy:
needs_comparison → RELATION_BRIDGING
needs_causality → CLARIFICATION
otherwise → ENTITY_EXPANSION
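The initial-strategy rule is a simple priority check over the requirement flags produced by _analyze_question_requirements(). A sketch, assuming the flags arrive as a dict keyed by the flag names used in the pipeline diagram; `select_initial_strategy` and the dict shape are hypothetical, and `HopStrategy` is a local stand-in for the library enum.

```python
from enum import Enum
from typing import Dict


class HopStrategy(Enum):
    ENTITY_EXPANSION = "entity_expansion"
    RELATION_BRIDGING = "relation_bridging"
    CLARIFICATION = "clarification"
    VERIFICATION = "verification"


def select_initial_strategy(requirements: Dict[str, bool]) -> HopStrategy:
    # Comparison is checked first, so a question that is both comparative
    # and causal starts with a bridging hop
    if requirements.get("needs_comparison"):
        return HopStrategy.RELATION_BRIDGING
    if requirements.get("needs_causality"):
        return HopStrategy.CLARIFICATION
    return HopStrategy.ENTITY_EXPANSION


print(select_initial_strategy({"needs_comparison": True, "needs_causality": True}))
print(select_initial_strategy({"needs_causality": True}))
print(select_initial_strategy({}))
```

Flags not consulted here (needs_timeline, needs_multi_entity) do not affect the first hop in this rule.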
Next hop strategy (_plan_next_hop):
Priority 1: Use next pre-decomposed sub-query → ENTITY_EXPANSION
Priority 2: missing_info is non-empty → CLARIFICATION (gap-filling query)
Priority 3: comparison + missing entity → RELATION_BRIDGING (bridge query)
Priority 4: entities available → RELATION_BRIDGING (bridge query)
Priority 5: no path found → return None (stop hopping)
Confidence & Early Stopping Model
The confidence score is computed after every hop:
base_confidence = min(len(known_facts) / 5.0 , 0.85)
gap_penalty = len(missing_info) × 0.15
confidence = max(0.0, base_confidence - gap_penalty)
# Bonus: if no gaps and ≥ 3 facts
if not missing_info and len(known_facts) >= 3:
    confidence = max(confidence, 0.8)
confidence = min(confidence, 1.0)
Early stopping fires when:
answer_complete = (confidence >= confidence_threshold AND len(known_facts) > 0)
                  OR (len(missing_info) == 0 AND len(known_facts) > 0)
Confidence progression example (3-hop question, threshold = 0.75):
| Hop | Facts known (cumulative) | Gaps | Confidence | Stop? |
|---|---|---|---|---|
| 1 | 2 | 1 | min(2/5, 0.85) − 0.15 = 0.25 | No |
| 2 | 4 | 1 | min(4/5, 0.85) − 0.15 = 0.65 | No |
| 3 | 5 | 0 | max(min(5/5, 0.85), 0.8) = 0.85 | Yes |
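The confidence formula and stopping rule can be checked against the progression table with a direct transcription. The function names are hypothetical; the arithmetic follows the formulas in this section.

```python
def calculate_confidence(num_facts: int, num_gaps: int) -> float:
    base = min(num_facts / 5.0, 0.85)
    penalty = num_gaps * 0.15
    confidence = max(0.0, base - penalty)
    if num_gaps == 0 and num_facts >= 3:  # bonus: no gaps and at least 3 facts
        confidence = max(confidence, 0.8)
    return min(confidence, 1.0)


def answer_complete(confidence: float, num_facts: int, num_gaps: int,
                    threshold: float = 0.75) -> bool:
    return num_facts > 0 and (confidence >= threshold or num_gaps == 0)


# (facts known, gaps) after each hop, as in the progression table
for hop, (facts, gaps) in enumerate([(2, 1), (4, 1), (5, 0)], start=1):
    c = calculate_confidence(facts, gaps)
    print(hop, round(c, 2), answer_complete(c, facts, gaps))
```

Because the base term is capped at 0.85, a single open gap keeps confidence at or below 0.70, so with the default 0.75 threshold the loop cannot stop on confidence alone while any gap remains.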
NER Engine: Stanza vs Regex Fallback
| Feature | Stanza NER | Regex Fallback |
|---|---|---|
| Entity types | Fully typed (PERS, ORG, LOC, DATE, …) | Untyped (quoted text, TitleCase words, numbers) |
| Importance scoring | Multi-factor (type weight + length + frequency + script + digits) | Length only |
| Arabic support | Model-based NER | Quoted text and numbers only |
| English support | Full NER | TitleCase proper nouns |
| get_entities_by_type | ✅ Full type grouping | ❌ Returns {} |
| Installation | pip install stanza + model download | No extras needed |
| Startup time | ~5–15 s (model load) | Instant |
| Runtime accuracy | High | Low–Medium |
Recommendation: Use Stanza in production for Arabic text. The regex fallback is adequate for English queries in non-critical or development environments.
RAG System Interface Contract
MultiHopRAG wraps any RAG object that satisfies this interface:
Required — .retrieve(query)
def retrieve(query: str) -> List[Tuple[chunk, float]] | List[chunk]:
    """
    Retrieve relevant chunks for a query.
    Returns either:
      - a list of (chunk, score) tuples  ← preferred
      - a list of chunk objects          ← score defaults to 1.0
    Each chunk object must have:
      - chunk.chunk_id : str  (unique identifier; chunks without it are silently skipped)
      - chunk.text     : str  (text content used to build the context)
      - chunk.doc_id   : str  (document identifier used in context headers)
    """
Optional (but required for answer generation) — .llm.generate(prompt, **kwargs)
def generate(prompt: str, max_tokens: int = 512, **kwargs) -> str:
    """
    Generate an answer from a prompt.
    Required for answer generation. Without it, query() returns
    "Language model not available".
    """
Minimal compatible RAG example:
from dataclasses import dataclass
from fennec_community.rag.types.multi_hop import MultiHopRAG
@dataclass
class Chunk:
    chunk_id: str
    text: str
    doc_id: str
class MinimalRAG:
    def __init__(self):
        self.llm = self  # expose .llm so rag.llm.generate() resolves to self.generate()
    def retrieve(self, query: str):
        # Return (chunk, score) tuples
        return [(Chunk("c1", "Paris is the capital of France.", "doc_1"), 0.92)]
    def generate(self, prompt: str, max_tokens: int = 512, **kwargs) -> str:
        return "Answer based on context..."
rag = MinimalRAG()
mh = MultiHopRAG(rag_system=rag)
Language Detection & Prompt Templates
The final LLM prompt is built by _build_reasoned_prompt(). Language is auto-detected from the question text by counting Arabic Unicode characters (\u0600–\u06FF):
arabic_ratio = arabic_chars / len(question)
language = "ar" if arabic_ratio > 0.2 else "en"
Both prompt templates include:
- Reasoning chain — all per-hop summaries.
- Discovered facts — up to 8 relevant sentences.
- Missing info warnings — explicit gap list when gaps remain.
- Supporting sources — up to 8 top chunks with scores and doc IDs.
- Low-confidence warning — appended to the question when confidence < 0.75.
- Strict grounding rules — no inference beyond the provided context; explicit instruction to say "not available" when information is absent.
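The sketch below shows how the detection rule and the prompt sections listed above fit together. It is a hypothetical reconstruction, not the body of `_build_reasoned_prompt()`. The function names, the section wording, and the `(doc_id, score, text)` shape for sources are assumptions. It also emits English only, whereas the real module chooses between an Arabic and an English template.

```python
from typing import List, Sequence, Tuple


def detect_language(question: str, threshold: float = 0.2) -> str:
    """Return "ar" when more than `threshold` of the characters are Arabic, else "en"."""
    if not question:
        return "en"  # sketch-only guard: the documented formula divides by len(question)
    arabic_chars = sum(1 for ch in question if "\u0600" <= ch <= "\u06FF")
    arabic_ratio = arabic_chars / len(question)
    return "ar" if arabic_ratio > threshold else "en"


def build_prompt_sketch(
    question: str,
    reasoning_chain: Sequence[str],
    facts: Sequence[str],
    missing: Sequence[str],
    sources: Sequence[Tuple[str, float, str]],  # hypothetical (doc_id, score, text)
    confidence: float,
    low_confidence: float = 0.75,
    max_items: int = 8,
) -> str:
    """Assemble the prompt sections described in this section (English wording only)."""
    lines: List[str] = ["Reasoning chain:"]
    lines += [f"- {step}" for step in reasoning_chain]  # all per-hop summaries

    lines.append("Discovered facts:")
    lines += [f"- {fact}" for fact in facts[:max_items]]  # at most 8 facts

    if missing:  # gap list only when gaps remain
        lines.append("Missing information: " + "; ".join(missing))

    lines.append("Supporting sources:")
    top = sorted(sources, key=lambda s: s[1], reverse=True)[:max_items]  # 8 best-scored
    lines += [f"[{doc_id} | score={score:.2f}] {text}" for doc_id, score, text in top]

    if confidence < low_confidence:
        question += " (Warning: the retrieved evidence may be incomplete.)"

    lines.append(
        "Rules: answer only from the context above. "
        'If the information is absent, say it is "not available".'
    )
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Mixed-script questions use the English template unless Arabic characters pass the 20% threshold. For example, `detect_language("Tell me about مصر")` returns `"en"`, because only 3 of its 17 characters (about 18%) are Arabic.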
Complete Examples
Example 1 — Basic multi-hop query
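```python
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.multi_hop import MultiHopRAG

# Configure RAGSystem with your LLM, vector DB, chunker, and documents
# (see "Simple Real Example" below for a full setup).
my_rag = RAGSystem()

mh = MultiHopRAG(
    rag_system=my_rag,
    max_hops=3,
    language="ar",
    confidence_threshold=0.75,
)

# "What are the causes of the French Revolution, and what were its results?"
answer = mh.query("ما هي أسباب الثورة الفرنسية وما نتائجها؟")
print(answer)
```

Example 2 — Full reasoning trace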
```python
# Reuses the `mh` instance from Example 1
result = mh.query(
    "Compare the economic impacts of World War I and World War II on Europe",
    return_intermediate=True,
)

print("=" * 60)
print("ANSWER")
print("=" * 60)
print(result["answer"])

print("\n" + "=" * 60)
print("REASONING CHAIN")
print("=" * 60)
for step in result["reasoning_chain"]:
    print("•", step)

print("\n" + "=" * 60)
print("HOP DETAILS")
print("=" * 60)
for hop in result["hops"]:
    print(f"\n[Hop {hop['hop_number']}] Strategy: {hop['strategy']}")
    print(f"  Query:      {hop['query']}")
    print(f"  Chunks:     {hop['chunks_found']}")
    print(f"  Entities:   {hop['entities']}")
    print(f"  Top scores: {[f'{s:.3f}' for s in hop['top_scores']]}")
    print(f"  Gap filled: {hop['gap_filled']}")
    print(f"  Reasoning:  {hop['reasoning_step']}")

print("\n" + "=" * 60)
print("KNOWLEDGE STATE")
print("=" * 60)
ks = result["knowledge_state"]
print(f"Confidence:   {ks['confidence']:.0%}")
print(f"Known facts:  {len(ks['known_facts'])}")
print(f"Missing info: {ks['missing_info'] or 'None'}")
print(f"Total chunks: {result['total_chunks']}")
```

Example 3 — Async usage in FastAPI
```python
from typing import Optional

from fastapi import FastAPI, Query
from fennec_community.rag.types.multi_hop import MultiHopRAG

app = FastAPI()

# my_rag: a configured RAG system (see Example 1)
mh = MultiHopRAG(
    rag_system=my_rag,
    max_hops=3,
    language="ar",
    confidence_threshold=0.75,
)


@app.get("/query")
async def answer_question(
    q: str,
    hops: Optional[int] = Query(default=None, ge=1, le=5),  # optional, 1-5 when given
    trace: bool = False,
):
    result = await mh.aquery(q, max_hops=hops, return_intermediate=trace)
    if trace:
        return result  # full reasoning trace (dict)
    return {"answer": result}  # plain answer string


@app.get("/stats")
def system_stats():
    return mh.get_stats()
```

Example 4 — Standalone QueryDecomposer
```python
from fennec_community.rag.types.multi_hop import QueryDecomposer

# Arabic, NER-powered
decomposer = QueryDecomposer(use_stanza=True, language="ar")

# Decompose: "Who is Ibn Sina, and then what are his most important works?"
parts = decomposer.decompose("من هو ابن سينا ثم ما أهم مؤلفاته؟")
print(parts)
# ["من هو ابن سينا", "ما أهم مؤلفاته؟"]

# Extract entities (flat, ranked)
# Text: "Jeff Bezos founded Amazon in 1994 in the American city of Seattle."
entities = decomposer.extract_entities(
    "أسس جيف بيزوس شركة أمازون عام 1994 في مدينة سياتل الأمريكية."
)
print(entities)
# ["جيف بيزوس", "أمازون", "سياتل", "1994"]

# Extract entities by type (requires Stanza)
typed = decomposer.get_entities_by_type(
    "أسس جيف بيزوس شركة أمازون عام 1994 في مدينة سياتل الأمريكية."
)
print(typed)
# {"PERS": ["جيف بيزوس"], "ORG": ["أمازون"], "LOC": ["سياتل"], "DATE": ["1994"]}
```

Example 5 — Monitoring and observability
```python
import time

from fennec_community.rag.types.multi_hop import MultiHopRAG

# my_rag: a configured RAG system (see Example 1)
mh = MultiHopRAG(rag_system=my_rag, max_hops=3, confidence_threshold=0.75)

questions = [
    "What is machine learning?",
    "Why did the Ottoman Empire fall?",
    "Compare GPU and CPU for deep learning",
    "When was the Eiffel Tower built and by whom?",
]

start = time.perf_counter()  # monotonic clock for measuring durations
for q in questions:
    answer = mh.query(q)
    print(f"Q: {q[:50]}")
    print(f"A: {answer[:100]}...\n")
elapsed = time.perf_counter() - start

stats = mh.get_stats()
print("\n--- System Stats ---")
print(f"Total queries:       {stats['total_queries']}")
print(f"Average hops/query:  {stats['average_hops']:.2f}")
print(f"Early stopping rate: {stats['early_stops']}/{stats['total_queries']} "
      f"({stats['early_stops'] / stats['total_queries']:.0%})")
print(f"Total time:          {elapsed:.1f}s")
print(f"Avg time/query:      {elapsed / stats['total_queries']:.1f}s")
print(f"NER method:          {stats['ner_method']}")
```

Example 6 — Custom hop count per question type
```python
from fennec_community.rag.types.multi_hop import MultiHopRAG

mh = MultiHopRAG(rag_system=my_rag, max_hops=5, confidence_threshold=0.75)

# Simple factual: 1 hop is enough
simple = mh.query("What is the capital of Japan?", hops=1)

# Causal: needs 2-3 hops
causal = mh.query("Why did the 2008 financial crisis happen?", hops=3)

# Complex comparison: use maximum hops
comparison = mh.query(
    "Compare the AI research output of the US and China between 2020 and 2024",
    hops=5,
    return_intermediate=True,
)
print("Comparison answer confidence:", comparison["knowledge_state"]["confidence"])
```

Example 7 — Async context manager
```python
import asyncio

from fennec_community.rag.types.multi_hop import MultiHopRAG


async def batch_query(questions: list[str]) -> list[str]:
    # my_rag: a configured RAG system (see Example 1)
    async with MultiHopRAG(rag_system=my_rag, max_hops=3) as mh:
        results = []
        for q in questions:
            answer = await mh.aquery(q, return_intermediate=False)
            results.append(answer)
        return results


# Define the list once so it can be reused when printing the results
questions = [
    "What is quantum computing?",
    "Why is the sky blue?",
    "When was the first iPhone released?",
]
answers = asyncio.run(batch_query(questions))

for q, a in zip(questions, answers):
    print(f"Q: {q}\nA: {a}\n")
```

Simple Real Example
```python
from fennec_community.llm import MistralInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.multi_hop import MultiHopRAG

llm_api = "YOUR_MISTRAL_API_KEY"  # placeholder: supply your own Mistral API key

loader_1 = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm = MistralInterface(api_key=llm_api)
context_manager = ContextManager()

rag_system = RAGSystem(
    llm=llm,
    vector_db=vector_db,
    chunker=chunker,
    context_manager=context_manager,
)
rag_system.add_documents(loader_1)

multi_hop = MultiHopRAG(
    rag_system=rag_system,
    max_hops=3,
    confidence_threshold=0.75,
    language="ar",
)

# "What payment methods are available?"
answer = multi_hop.query("ماهي طرق الدفع المتاحة؟")
print(answer)
```
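Before you point the pipeline above at live models, you can check offline that a custom RAG object meets the RAG System Interface Contract. The sketch below encodes only the rules stated in that section:

- bare chunks get a score of 1.0;
- chunks without `chunk_id` are skipped;
- `text` and `doc_id` must be strings;
- a missing `.llm.generate()` means no answer generation.

`normalize_results` and `check_rag_contract` are hypothetical helpers, not part of `fennec_community`, and the real module's validation may differ.

```python
from typing import Any, List, Tuple


def normalize_results(results: List[Any]) -> List[Tuple[Any, float]]:
    """Coerce retrieve() output into (chunk, score) pairs per the interface contract."""
    pairs: List[Tuple[Any, float]] = []
    for item in results:
        # (chunk, score) tuple is preferred; a bare chunk gets score 1.0
        chunk, score = item if isinstance(item, tuple) else (item, 1.0)
        if getattr(chunk, "chunk_id", None) is None:
            continue  # chunks without chunk_id are skipped
        pairs.append((chunk, float(score)))
    return pairs


def check_rag_contract(rag: Any, probe: str = "test") -> List[str]:
    """Return a list of problems; an empty list means the contract looks satisfied."""
    if not callable(getattr(rag, "retrieve", None)):
        return ["missing required .retrieve(query)"]

    problems: List[str] = []
    for chunk, _score in normalize_results(rag.retrieve(probe)):
        for attr in ("text", "doc_id"):
            if not isinstance(getattr(chunk, attr, None), str):
                problems.append(f"chunk {chunk.chunk_id!r} lacks str .{attr}")

    llm = getattr(rag, "llm", None)
    if not callable(getattr(llm, "generate", None)):
        problems.append("no .llm.generate(): query() would return 'Language model not available'")
    return problems
```

Running `check_rag_contract(MinimalRAG())` against the minimal example from the interface section should return an empty list. A retrieval-only object will report the missing `.llm.generate()`.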