Fennec Community community/rag/multi_hop.md

MultiHop-RAG Modular `multi_hop` — Enterprise API Reference

Overview
Architecture
Quick Start
Enumeration: HopStrategy
Data Models
- ReasoningState
- HopResult
Class: QueryDecomposer
Class: MultiHopRAG
Return Value Reference
Reasoning Pipeline — Internal Flow
Hop Strategy Selection Logic
Confidence & Early Stopping Model
NER Engine: Stanza vs Regex Fallback
RAG System Interface Contract
Language Detection & Prompt Templates
Complete Examples

Overview

multi_hop is a Reasoning-Guided Multi-Hop RAG layer that wraps any existing RAG system and gives it the ability to answer complex, multi-faceted questions through iterative, self-directed retrieval. Instead of a single retrieve-then-answer pass, MultiHopRAG plans its search at every step, tracks what it has learned so far, identifies knowledge gaps, and adaptively chooses the next query — stopping early as soon as it has collected sufficient evidence.

Key capabilities at a glance:

Capability	Detail
Reasoning-guided hops	Each hop asks "What do I know? What am I still missing?" before planning the next search
Smart early stopping	Stops automatically when confidence ≥ threshold — no wasted hops
Query decomposition	Splits composite questions into sequential sub-queries
NER-powered entity extraction	Uses Stanza NER (Arabic & English) with transparent regex fallback
Adaptive strategy selection	Chooses `ENTITY_EXPANSION`, `RELATION_BRIDGING`, `CLARIFICATION`, or `VERIFICATION` per hop
Confidence scoring	Quantitative confidence track per hop, used for early stopping
Bilingual prompting	Auto-detects Arabic vs English and selects the correct strict-grounding prompt
Async support	Non-blocking `aquery` for FastAPI / asyncio pipelines
Intermediate results	Full reasoning chain, hop details, and knowledge state accessible on demand

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                          MultiHopRAG                                  │
│                                                                        │
│  ┌───────────────────┐        ┌──────────────────────────────────┐    │
│  │  QueryDecomposer  │        │        ReasoningState            │    │
│  │  (Stanza / Regex) │        │  known_facts · missing_info      │    │
│  └────────┬──────────┘        │  reasoning_chain · confidence    │    │
│           │ decompose()       └──────────────┬───────────────────┘    │
│           │                                  │ updated every hop      │
│           ▼                                  │                        │
│  ┌────────────────────────────────────────────────────────────────┐   │
│  │                   Reasoning Hop Loop                            │   │
│  │                                                                  │   │
│  │  ① _analyze_question_requirements  → question type + slots     │   │
│  │  ② _select_initial_strategy        → HopStrategy               │   │
│  │  ③ rag.retrieve(query)             → raw chunks                 │   │
│  │  ④ _filter_chunks                  → deduplicated + scored      │   │
│  │  ⑤ _extract_entities_from_chunks  → NER entities               │   │
│  │  ⑥ _update_reasoning_state        → facts + gaps + confidence  │   │
│  │  ⑦ Early stop check               → break if answer_complete   │   │
│  │  ⑧ _plan_next_hop                 → next query + strategy      │   │
│  └────────────────────────────────────────────────────────────────┘   │
│           │                                                            │
│           ▼                                                            │
│  _aggregate_chunks → _build_reasoned_context → _build_reasoned_prompt │
│           │                                                            │
│           ▼                                                            │
│        rag.llm.generate(prompt)  →  Final Answer                      │
└──────────────────────────────────────────────────────────────────────┘

Quick Start

from fennec_community.rag.types.multi_hop import MultiHopRAG, HopStrategy

# Wrap any RAG system that exposes .retrieve() and .llm.generate()
mh = MultiHopRAG(
    rag_system=my_rag,
    max_hops=3,
    language="ar",
    confidence_threshold=0.75,
)

# Simple answer
answer = mh.query("What caused the French Revolution and what were its results?")
print(answer)

# Full reasoning trace
result = mh.query(
    "Compare the economic policies of Germany and France after 2008",
    return_intermediate=True,
)
print(result["answer"])
print(result["reasoning_chain"])

# Async usage (await must run inside a coroutine, e.g. a FastAPI handler)
answer = await mh.aquery("Why did the Ottoman Empire collapse?")

Enumeration: `HopStrategy`

from fennec_community.rag.types.multi_hop import HopStrategy

Controls the search strategy used at each individual hop. The system selects the strategy automatically based on the question type and the current knowledge state, but you can inspect it in intermediate results.

class HopStrategy(Enum):
    ENTITY_EXPANSION  = "entity_expansion"
    RELATION_BRIDGING = "relation_bridging"
    CLARIFICATION     = "clarification"
    VERIFICATION      = "verification"

Member	Value	When used	Description
`ENTITY_EXPANSION`	`"entity_expansion"`	Default first hop; when predefined sub-queries are available	Expands the search around known entities extracted from the query or previous chunks. The most general strategy.
`RELATION_BRIDGING`	`"relation_bridging"`	Comparison questions; when only one of two required entities has been found	Bridges two entities or concepts by constructing a query that connects them. Used to close the gap in comparative reasoning.
`CLARIFICATION`	`"clarification"`	When a clear knowledge gap is identified; causal questions	Builds a targeted query aimed at filling a specific named gap — e.g., missing causality, missing date, or missing entity info. Also used for `why` / `cause` type questions. Gets a +0.1 score bonus during chunk aggregation.
`VERIFICATION`	`"verification"`	Reserved	Planned for cross-checking facts from multiple sources. Not yet auto-selected; available for manual use in future extensions.

Data Models

`ReasoningState`

from fennec_community.rag.types.multi_hop import ReasoningState

A @dataclass that tracks the accumulated knowledge state across all hops for a single query. Updated after every hop. Passed into the final prompt builder to give the LLM full situational awareness.

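from dataclasses import dataclass, field
from typing import List

@dataclass
class ReasoningState:
    # Mutable defaults must use default_factory; a bare [] raises ValueError.
    known_facts:     List[str] = field(default_factory=list)  # Discovered relevant sentences
    missing_info:    List[str] = field(default_factory=list)  # Named knowledge gaps still open
    reasoning_chain: List[str] = field(default_factory=list)  # Per-hop reasoning step summaries
    confidence:      float     = 0.0    # Current answer completeness estimate [0, 1]
    answer_complete: bool      = False  # True once the early-stopping condition is met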

Field	Type	Description
`known_facts`	`List[str]`	List of relevant sentences extracted from retrieved chunks so far. Grows with each hop.
`missing_info`	`List[str]`	Named knowledge gaps that have not yet been filled (e.g., `"causal relationship"`, `"specific date or time"`, `"info about France"`). Updated after each hop.
`reasoning_chain`	`List[str]`	Human-readable summary of what each hop searched for, how many facts it found, the current confidence, and what's still missing.
`confidence`	`float`	Quantitative estimate of answer completeness in `[0.0, 1.0]`. Computed as `max(0, min(facts/5, 0.85) − gaps × 0.15)`, then raised to at least `0.8` when no gaps remain and at least 3 facts are known.
`answer_complete`	`bool`	`True` when `confidence ≥ confidence_threshold AND known_facts is non-empty`, or when there are no remaining gaps. Triggers early stopping.

Usage (read-only — managed by MultiHopRAG):

result = mh.query("Why did Rome fall?", return_intermediate=True)

state = result["knowledge_state"]
print("Known facts:", state["known_facts"])
print("Missing info:", state["missing_info"])
print("Confidence:", state["confidence"])

`HopResult`

from fennec_community.rag.types.multi_hop import HopResult

A @dataclass capturing everything that happened during one individual hop — which query was used, what was found, the strategy applied, and what gap (if any) was filled.

@dataclass
class HopResult:
    query:               str               # The search query used for this hop
    chunks:              List[Tuple]        # Retrieved (chunk, score) pairs
    hop_number:          int                # 1-based hop index
    extracted_entities:  List[str]          # Entities extracted from this hop's chunks
    strategy:            HopStrategy        # Strategy used for this hop
    reasoning_step:      str               # Human-readable summary of this hop
    gap_filled:          Optional[str]      # The knowledge gap this hop targeted, if any

Field	Type	Description
`query`	`str`	The exact query string that was sent to the RAG backend for this hop.
`chunks`	`List[Tuple]`	List of `(chunk, score)` pairs returned from the RAG backend after deduplication and score filtering.
`hop_number`	`int`	1-based index of this hop within the current query's hop sequence.
`extracted_entities`	`List[str]`	Up to 5 unique entities extracted from the top 3 chunks of this hop. Used to plan the next hop.
`strategy`	`HopStrategy`	The `HopStrategy` member that governed how this hop's query was constructed.
`reasoning_step`	`str`	Formatted string: `"hop N: looking for '…'. found M facts. confidence: X%. missing: …"`.
`gap_filled`	`Optional[str]`	If this hop was a gap-filling hop, the name of the gap it targeted (from `ReasoningState.missing_info[0]`). `None` for expansion hops.

Usage (read-only — produced by MultiHopRAG):

result = mh.query("Who invented the telephone and when?", return_intermediate=True)

for hop in result["hops"]:
    print(f"Hop {hop['hop_number']} [{hop['strategy']}]: {hop['query']}")
    print(f"  Chunks: {hop['chunks_found']}  Entities: {hop['entities']}")
    print(f"  Scores: {hop['top_scores']}")
    print(f"  Gap filled: {hop['gap_filled']}")

Class: `QueryDecomposer`

from fennec_community.rag.types.multi_hop import QueryDecomposer

A bilingual (Arabic / English) query analysis engine that provides two core capabilities: decomposing composite questions into sequential sub-queries, and extracting named entities from text using either Stanza NER (when installed) or a regex fallback.

Can be used independently of MultiHopRAG for standalone NLP preprocessing tasks.

`__init__` (QueryDecomposer)

QueryDecomposer(
    use_stanza: bool = True,
    language:   str  = "ar",
)

Purpose: Initialises the query decomposer and attempts to load a Stanza NLP pipeline for the specified language. If Stanza is unavailable or fails to load, the instance transparently falls back to regex-based extraction — no exception is raised.

Parameters:

Parameter	Type	Default	Description
`use_stanza`	`bool`	`True`	Whether to attempt loading a Stanza NLP pipeline. Set to `False` to force regex-only mode (faster startup, no model download needed).
`language`	`str`	`"ar"`	Language code for the Stanza pipeline. Supported: `"ar"` (Arabic), `"en"` (English), and any other Stanza-supported language code.

Returns: QueryDecomposer instance.

NER mode resolution at startup:

use_stanza=True  AND  stanza installed  AND  model loads  →  Stanza NER (best accuracy)
use_stanza=True  AND  stanza installed  AND  model fails  →  Regex fallback (logged warning)
use_stanza=True  AND  stanza NOT installed               →  Regex fallback (logged warning)
use_stanza=False                                          →  Regex fallback (explicit)
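
The resolution table above can be sketched as a small helper. This is a minimal illustration, not the library's code: `resolve_ner_mode`, `load_arabic_pipeline`, and the logger name are hypothetical, and the only real library call assumed is `stanza.Pipeline(lang=...)`.

```python
import logging
from typing import Any, Callable, Optional, Tuple

logger = logging.getLogger("multi_hop.ner")  # hypothetical logger name


def resolve_ner_mode(
    use_stanza: bool,
    load_pipeline: Callable[[], Any],
) -> Tuple[str, Optional[Any]]:
    """Return ("stanza", pipeline) or ("regex", None) without raising."""
    if not use_stanza:
        return "regex", None  # explicit regex-only mode
    try:
        return "stanza", load_pipeline()
    except Exception as exc:  # package missing or model failed to load
        logger.warning("Stanza unavailable, using regex fallback: %s", exc)
        return "regex", None


def load_arabic_pipeline() -> Any:
    # Imported lazily so a missing package surfaces as ImportError here.
    import stanza
    return stanza.Pipeline(lang="ar")
```

Passing the loader as a callable keeps the fallback logic testable without Stanza installed; `resolve_ner_mode(True, load_arabic_pipeline)` would exercise the real load path.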

Installing Stanza for Arabic:

pip install stanza
python -c "import stanza; stanza.download('ar')"

Example:

# Best accuracy (requires stanza + model)
decomposer = QueryDecomposer(use_stanza=True, language="ar")

# English, Stanza NER
decomposer_en = QueryDecomposer(use_stanza=True, language="en")

# Fast startup, no model required
decomposer_fast = QueryDecomposer(use_stanza=False)

`decompose`

decomposer.decompose(query: str) -> List[str]

Purpose: Splits a composite question into a list of sequential sub-queries by scanning for language-specific composite indicator keywords (e.g., "ثم", "وأيضاً", "then", "also", "because", "لماذا"). Covers Modern Standard Arabic plus Egyptian, Levantine, and Gulf dialect markers, as well as English.

If no composite indicator is found, the original query is returned as a single-element list — the method always returns a list.

Parameters:

Parameter	Type	Description
`query`	`str`	The natural-language question to analyse.

Returns: List[str] — either a list of two sub-query strings (when a composite indicator is found), or [query] (when no indicator is found and the question is treated as atomic).

Supported indicator categories:

Category	Examples (Arabic)	Examples (English)
Sequential	`"ثم"`, `"بعد ذلك"`, `"ومن ثم"`	`"then"`, `"after that"`
Additive	`"وأيضاً"`, `"كذلك"`, `"بالإضافة"`	`"also"`, `"in addition"`
Causal	`"لماذا"`, `"ما السبب"`	`"why"`, `"cause"`
Conditional	`"إذا"`, `"في حالة"`	`"if"`, `"in case"`
Comparative	`"مقارنةً بـ"`, `"بينما"`	`"while"`, `"compared to"`
Procedural	`"كيف"`, `"ما الطريقة"`	`"how"`
Dialectal (EG)	`"وبعدين"`, `"كمان"`, `"علشان كده"`	—
Dialectal (Levantine)	`"عقبها"`, `"بعد هيك"`, `"ليش"`	—
Dialectal (Gulf)	`"عقب هيك"`, `"شلون"`, `"عشان هيك"`	—

Example:

# Composite question → two sub-queries
parts = decomposer.decompose("من هو أينشتاين ثم ما أهم اكتشافاته؟")
# ["من هو أينشتاين", "ما أهم اكتشافاته؟"]

# Atomic question → single element
parts = decomposer.decompose("What is the capital of France?")
# ["What is the capital of France?"]
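
To make the splitting behaviour concrete, here is a minimal sketch under stated assumptions: the indicator list is a small subset of the table above, and `decompose_sketch` is a hypothetical name. The real method may pick indicators and trim punctuation differently.

```python
import re
from typing import List, Sequence

# Illustrative subset of the indicator table; not the library's full list.
INDICATORS: Sequence[str] = ("after that", "in addition", "then", "also", "بعد ذلك", "ثم")


def decompose_sketch(query: str, indicators: Sequence[str] = INDICATORS) -> List[str]:
    """Split on the first indicator found; return [query] when none applies."""
    # At the same position, earlier alternatives win, so longer phrases go first.
    ordered = sorted(indicators, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    match = re.search(rf"\b(?:{alternation})\b", query, flags=re.IGNORECASE)
    if match is None:
        return [query]
    first = query[: match.start()].strip(" ,;،")
    second = query[match.end():].strip(" ,;،")
    # An indicator at the very start or end leaves nothing to split.
    if not first or not second:
        return [query]
    return [first, second]
```

Whole-word matching keeps "the" from triggering on "then", and the empty-part guard preserves the always-returns-a-list behaviour described above.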

`extract_entities`

decomposer.extract_entities(text: str) -> List[str]

Purpose: Extracts the most important named entities from a piece of text. Uses Stanza NER when available (typed, scored entities) and otherwise falls back to regex-based extraction (quoted strings, English proper nouns, and numbers). Returns at most 10 entities.

This is the primary NER method used by MultiHopRAG both during question analysis and during chunk processing at each hop.

Parameters:

Parameter	Type	Description
`text`	`str`	Any text: a query string, a retrieved chunk, or a document sentence. For texts shorter than 3 characters, an empty list is returned immediately.

Returns: List[str] — list of entity strings, sorted by importance score descending (Stanza mode) or by length descending (regex mode). Maximum 10 entries.

Stanza mode — entity importance scoring formula:

score = entity_type_weight            # base weight by type (see table below)
      + (len(entity_text) / 10)      # longer = more specific
      + (count_in_text × 0.5)        # frequency bonus
      + 1.0 (if contains English chars)
      + 0.5 (if contains digits)

Entity type weights (Stanza mode):

Type	Weight	Meaning
`PERS` / `PER`	3.0	Persons — highest priority
`ORG`	2.5	Organizations
`LOC` / `GPE`	2.0	Locations / Geo-political entities
`DATE` / `MONEY`	1.5	Dates and monetary amounts
`QUANTITY`	1.3	Quantities
`TIME` / `PERCENT`	1.2	Times and percentages
`CARDINAL`	1.1	Cardinal numbers
`ORDINAL`	1.0	Ordinal numbers
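
The scoring formula and weight table can be combined into one function. This is a sketch only: `entity_score` is a hypothetical name, and the fallback weight for types missing from the table is an assumption.

```python
import re
from typing import Dict

TYPE_WEIGHTS: Dict[str, float] = {
    "PERS": 3.0, "PER": 3.0, "ORG": 2.5, "LOC": 2.0, "GPE": 2.0,
    "DATE": 1.5, "MONEY": 1.5, "QUANTITY": 1.3, "TIME": 1.2,
    "PERCENT": 1.2, "CARDINAL": 1.1, "ORDINAL": 1.0,
}
DEFAULT_WEIGHT = 1.0  # assumption: weight for types not listed in the table


def entity_score(entity_text: str, entity_type: str, full_text: str) -> float:
    score = TYPE_WEIGHTS.get(entity_type, DEFAULT_WEIGHT)  # base weight by type
    score += len(entity_text) / 10                         # longer = more specific
    score += full_text.count(entity_text) * 0.5            # frequency bonus
    if re.search(r"[A-Za-z]", entity_text):                # contains English chars
        score += 1.0
    if re.search(r"\d", entity_text):                      # contains digits
        score += 0.5
    return score
```

For example, a `LOC` entity "Paris" that occurs twice in the text scores 2.0 + 0.5 + 1.0 + 1.0 = 4.5.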

Regex mode — extraction rules:

Priority	Pattern	Example
1	Quoted text `"…"` or `«…»`	`"الثورة الفرنسية"`
2	English `TitleCase` words	`"Napoleon Bonaparte"`
3	Numeric tokens	`"1789"`, `"95"`

Results from regex mode are deduplicated and sorted by string length (longer = more specific).
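
A rough sketch of the three rules follows. The patterns, the `regex_entities` name, and the small stop list for capitalized sentence starters are all assumptions for illustration; the library's actual expressions are not shown in this reference.

```python
import re
from typing import List

# Assumption: drop common capitalized sentence starters.
STOPWORDS = {"The", "A", "An", "In", "It"}


def regex_entities(text: str, limit: int = 10) -> List[str]:
    if len(text) < 3:
        return []
    found: List[str] = []
    # 1. Quoted text in straight quotes or guillemets.
    for straight, guillemet in re.findall(r'"([^"]+)"|«([^»]+)»', text):
        found.append(straight or guillemet)
    # 2. Runs of capitalized English words.
    found += re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text)
    # 3. Numeric tokens.
    found += re.findall(r"\b\d+\b", text)
    # Deduplicate (keeping first-seen order), then sort longest first.
    unique = [e for e in dict.fromkeys(found) if e not in STOPWORDS]
    return sorted(unique, key=len, reverse=True)[:limit]
```

Because Python's sort is stable, entities of equal length keep their extraction order, so quoted text stays ahead of same-length capitalized words.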

Example:

# With Stanza (Arabic)
entities = decomposer.extract_entities(
    "ولد نابليون بونابرت عام 1769 في جزيرة كورسيكا وأصبح إمبراطوراً لفرنسا."
)
# ["نابليون بونابرت", "كورسيكا", "فرنسا", "1769"]

# Without Stanza (regex fallback)
entities = decomposer.extract_entities(
    'The "Eiffel Tower" was built in 1889 by Gustave Eiffel in Paris.'
)
# ["Gustave Eiffel", "Eiffel Tower", "Paris", "1889"]  (longest first)

`get_entities_by_type`

decomposer.get_entities_by_type(text: str) -> Dict[str, List[str]]

Purpose: Extracts entities from text and returns them grouped by their Stanza NER type rather than as a flat ranked list. Use this for fine-grained analysis — e.g., to separately access all persons, all locations, and all dates mentioned in a text.

⚠️ Requires Stanza: Returns an empty dictionary {} when Stanza is unavailable or failed to load.

Parameters:

Parameter	Type	Description
`text`	`str`	The text to extract typed entities from.

Returns: Dict[str, List[str]] — keys are Stanza NER type labels (e.g., "PERS", "LOC", "ORG", "DATE"), values are deduplicated lists of entity strings of that type. Returns {} if Stanza is not available or an error occurs.

Example:

typed = decomposer.get_entities_by_type(
    "Albert Einstein was born in Ulm, Germany in 1879 and worked at Princeton University."
)
# {
#   "PERS": ["Albert Einstein"],
#   "LOC":  ["Ulm", "Germany"],
#   "ORG":  ["Princeton University"],
#   "DATE": ["1879"],
# }

# Downstream usage example:
persons = typed.get("PERS", [])
dates   = typed.get("DATE", [])

Class: `MultiHopRAG`

from fennec_community.rag.types.multi_hop import MultiHopRAG

The central orchestrator. Wraps any existing RAG backend and executes reasoning-guided iterative retrieval to answer complex questions that require synthesising information from multiple sources or reasoning steps.

`__init__` (MultiHopRAG)

MultiHopRAG(
    rag_system:                  Any,
    max_hops:                    int   = 3,
    min_score_threshold:         float = 0.3,
    enable_query_decomposition:  bool  = True,
    use_stanza_ner:              bool  = True,
    language:                    str   = "ar",
    confidence_threshold:        float = 0.75,
)

Purpose: Initialises the Multi-Hop RAG system. Validates that a RAG system is provided, creates the QueryDecomposer, and initialises the internal statistics counter.

Parameters:

Parameter	Type	Default	Description
`rag_system`	`Any`	—	Required. Any RAG backend that exposes `.retrieve(query) -> List`. It should also expose `.llm.generate(prompt, **kwargs) -> str`; without it, `query()` returns `"Language model not available"`. See RAG System Interface Contract.
`max_hops`	`int`	`3`	Maximum number of retrieval hops to perform per query. Acts as a hard upper bound — the system may stop earlier via confidence-based early stopping.
`min_score_threshold`	`float`	`0.3`	Minimum similarity/relevance score for a chunk to be accepted. Chunks with score below this value are filtered out at every hop.
`enable_query_decomposition`	`bool`	`True`	When `True`, composite questions are split into sub-queries by `QueryDecomposer.decompose()` before the hop loop begins. Each sub-query is used for one hop in order.
`use_stanza_ner`	`bool`	`True`	Whether to use Stanza NER for entity extraction. Passed directly to `QueryDecomposer`. Set `False` for fast startup or when Stanza is not installed.
`language`	`str`	`"ar"`	Language code for the Stanza pipeline. The final LLM prompt language is auto-detected from the question text, so this setting primarily affects NER.
`confidence_threshold`	`float`	`0.75`	Confidence level at which early stopping is triggered. When `reasoning_state.confidence ≥ threshold` and at least one fact has been found, the hop loop terminates early, saving unnecessary network calls.

Raises: ValueError if rag_system is None.

Internal stats initialised at construction:

Key	Initial value	Description
`total_queries`	`0`	Total calls to `query()` / `aquery()`.
`average_hops`	`0`	Rolling average number of hops executed per query.
`decomposed_queries`	`0`	Queries that were split into sub-queries.
`entities_extracted`	`0`	Cumulative count of entities extracted across all hops.
`early_stops`	`0`	Number of queries where early stopping was triggered.
`stanza_enabled`	`True` or `False`	Whether Stanza NER is active (`True` only if the Stanza pipeline loaded successfully).

Example:

from fennec_community.rag.types.multi_hop import MultiHopRAG

mh = MultiHopRAG(
    rag_system=my_rag,         # Required: any RAG backend
    max_hops=4,
    min_score_threshold=0.25,
    enable_query_decomposition=True,
    use_stanza_ner=True,
    language="ar",
    confidence_threshold=0.75,
)

`query`

mh.query(
    question:            str,
    hops:                Optional[int] = None,
    return_intermediate: bool          = False,
) -> str | Dict[str, Any]

Purpose: The primary synchronous query method. Executes the full multi-hop reasoning pipeline end-to-end: question analysis → query decomposition → iterative retrieval hops → chunk aggregation → reasoned answer generation. Returns either a plain answer string or a rich dict with the full reasoning trace, depending on return_intermediate.

Parameters:

Parameter	Type	Default	Description
`question`	`str`	—	The natural-language question to answer. If it is empty or whitespace-only, `query()` returns a prompt string starting with `"Please enter..."`.
`hops`	`Optional[int]`	`None`	Override the instance-level `max_hops` for this specific query. When `None`, the instance's `max_hops` is used.
`return_intermediate`	`bool`	`False`	When `False`, returns a plain answer string. When `True`, returns a full `Dict` with answer, reasoning chain, per-hop details, knowledge state, chunk count, and stats.

Returns:

str — the LLM-generated answer string when return_intermediate=False.
Dict[str, Any] — full reasoning trace dict when return_intermediate=True. See Return Value Reference.
str starting with "Please enter..." when question is empty or whitespace.
str starting with " Error: ..." when an unrecoverable internal exception occurs.

Execution pipeline (6 stages):

① _analyze_question_requirements()  → question type, need flags, seed entities
② _prepare_queries()                → [sub_query_1, sub_query_2, ...]
③ _perform_reasoning_hops()         → List[HopResult], ReasoningState
④ _aggregate_chunks()               → ranked, deduplicated (chunk, score) list
⑤ _generate_reasoned_answer()       → context + prompt + LLM call
⑥ _update_stats()                   → rolling average hops

Example — simple:

answer = mh.query("What were the causes of World War I?")
print(answer)

Example — with hop override:

# Use only 1 hop for a simple factual question
answer = mh.query("What is the capital of France?", hops=1)

Example — with full trace:

result = mh.query(
    "Compare the industrial revolutions of Britain and Germany",
    return_intermediate=True,
)

print("Answer:", result["answer"])
print()
print("Reasoning chain:")
for step in result["reasoning_chain"]:
    print(" •", step)
print()
print("Hops executed:", len(result["hops"]))
for hop in result["hops"]:
    print(f"  Hop {hop['hop_number']} [{hop['strategy']}]: {hop['query']}")
    print(f"    Chunks: {hop['chunks_found']}  Top scores: {hop['top_scores']}")
    print(f"    Entities: {hop['entities']}")
    print(f"    Gap filled: {hop['gap_filled']}")
print()
print("Confidence:", result["knowledge_state"]["confidence"])
print("Known facts:", len(result["knowledge_state"]["known_facts"]))
print("Total chunks:", result["total_chunks"])

`aquery`

async def aquery(
    question:            str,
    max_hops:            Optional[int] = None,
    return_intermediate: bool          = False,
) -> str | Dict[str, Any]

Purpose: Async version of query(). Runs the synchronous query() in a thread-pool executor via asyncio.to_thread, ensuring the event loop is never blocked. Drop-in replacement for query() in async frameworks.

Parameters:

Parameter	Type	Default	Description
`question`	`str`	—	The natural-language question to answer.
`max_hops`	`Optional[int]`	`None`	Per-call hop override. Maps to the `hops` parameter of `query()`.
`return_intermediate`	`bool`	`False`	When `True`, returns the full reasoning trace dict instead of a plain answer string.

Returns: Same as query() — str or Dict[str, Any].

Context manager support: MultiHopRAG implements __aenter__ / __aexit__, so it can be used as an async context manager:

async with MultiHopRAG(rag_system=my_rag) as mh:
    answer = await mh.aquery("Why did the Roman Empire fall?")
    print(answer)
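
The delegation pattern described above can be illustrated with a stand-in class. `AsyncWrapperSketch` is hypothetical and its `query()` is a placeholder; only `asyncio.to_thread` (Python 3.9+) is taken from the description.

```python
import asyncio
from typing import Any, Dict, Optional, Union


class AsyncWrapperSketch:
    """Hypothetical stand-in showing sync-to-async delegation."""

    def query(self, question: str, hops: Optional[int] = None,
              return_intermediate: bool = False) -> Union[str, Dict[str, Any]]:
        # Placeholder for the blocking multi-hop pipeline.
        answer = f"answer to: {question}"
        return {"answer": answer, "hops": hops} if return_intermediate else answer

    async def aquery(self, question: str, max_hops: Optional[int] = None,
                     return_intermediate: bool = False) -> Union[str, Dict[str, Any]]:
        # Run the blocking call in a worker thread; max_hops maps to hops.
        return await asyncio.to_thread(
            self.query, question, hops=max_hops,
            return_intermediate=return_intermediate,
        )

    async def __aenter__(self) -> "AsyncWrapperSketch":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        return None


async def main() -> str:
    async with AsyncWrapperSketch() as mh:
        return await mh.aquery("Why did the Roman Empire fall?")


print(asyncio.run(main()))
```

Note that the worker thread still performs blocking retrieval and LLM calls; the event loop stays responsive, but the number of concurrent queries is bounded by the default thread pool.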

FastAPI example:

from typing import Optional

from fastapi import FastAPI
from fennec_community.rag.types.multi_hop import MultiHopRAG

app = FastAPI()
mh = MultiHopRAG(rag_system=my_rag, max_hops=3, language="en")  # my_rag: your RAG backend

@app.get("/ask")
async def ask(q: str, hops: Optional[int] = None, trace: bool = False):
    result = await mh.aquery(q, max_hops=hops, return_intermediate=trace)
    if trace:
        return result          # dict with full trace
    return {"answer": result}  # plain string

@app.get("/ask/arabic")
async def ask_arabic(q: str):
    return {"answer": await mh.aquery(q)}

`get_stats`

mh.get_stats() -> Dict[str, Any]

Purpose: Returns a snapshot of all operational statistics accumulated since the MultiHopRAG instance was created. Use for monitoring query throughput, average hop count, decomposition rate, entity extraction volume, early stopping frequency, and NER engine status.

Parameters: None.

Returns: Dict[str, Any] with the following keys:

Key	Type	Description
`total_queries`	`int`	Total number of queries processed by this instance.
`average_hops`	`float`	Rolling average number of hops executed per query. Decreases as early stopping kicks in more often.
`decomposed_queries`	`int`	Number of queries that were split into multiple sub-queries by `QueryDecomposer.decompose()`.
`entities_extracted`	`int`	Cumulative count of named entities extracted across all hops and all queries.
`early_stops`	`int`	Number of queries where the hop loop terminated early due to sufficient confidence. High values indicate efficient operation.
`stanza_enabled`	`bool`	`True` if Stanza NER loaded successfully; `False` if regex fallback is in use.
`max_hops`	`int`	The configured `max_hops` value for this instance.
`confidence_threshold`	`float`	The configured early-stopping threshold.
`ner_method`	`str`	Human-readable NER method label: `"Stanza"` or `"Regex"`.

Example:

# After running several queries
stats = mh.get_stats()

print(f"Total queries:       {stats['total_queries']}")
print(f"Average hops:        {stats['average_hops']:.2f}")
print(f"Early stops:         {stats['early_stops']} "
      f"({stats['early_stops']/max(stats['total_queries'],1):.0%} of queries)")
print(f"Decomposed queries:  {stats['decomposed_queries']}")
print(f"Entities extracted:  {stats['entities_extracted']}")
print(f"NER engine:          {stats['ner_method']}")
print(f"Confidence threshold: {stats['confidence_threshold']}")

Output example:

Total queries:        47
Average hops:         1.87
Early stops:          31 (66% of queries)
Decomposed queries:   12
Entities extracted:   284
NER engine:           Stanza
Confidence threshold: 0.75
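
One way a rolling `average_hops` can be maintained without storing per-query history is an incremental mean. The `HopStats` class below is a hypothetical sketch of that idea, not the library's internal counter.

```python
class HopStats:
    """Hypothetical counter mirroring a few of the documented stats keys."""

    def __init__(self) -> None:
        self.total_queries = 0
        self.average_hops = 0.0
        self.early_stops = 0

    def record(self, hops_used: int, stopped_early: bool = False) -> None:
        self.total_queries += 1
        # Incremental mean: new_avg = old_avg + (x - old_avg) / n
        self.average_hops += (hops_used - self.average_hops) / self.total_queries
        if stopped_early:
            self.early_stops += 1


stats = HopStats()
for hops_used, early in [(3, False), (1, True), (2, True)]:
    stats.record(hops_used, early)
print(stats.total_queries, stats.average_hops, stats.early_stops)
```

Each early stop pulls the average below `max_hops`, which is why the average tends to fall as early stopping fires more often.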

Return Value Reference

Simple mode (`return_intermediate=False`)

Returns a str: the LLM-generated answer, grounded strictly in the retrieved context. The answer language (Arabic or English) is auto-detected from the question.

Full trace mode (`return_intermediate=True`)

Returns Dict[str, Any] with the following structure:

{
    "answer": str,                      # LLM-generated final answer

    "reasoning_chain": List[str],       # Per-hop summaries, e.g.:
    # ["hop 1: looking for '…'. found 4 facts. confidence: 65%. missing: causal relationship.",
    #  "hop 2: looking for '…'. found 6 facts. confidence: 85%. missing: none."]

    "hops": List[Dict],                 # Per-hop details (see below)

    "knowledge_state": {
        "known_facts": List[str],       # All relevant sentences discovered
        "missing_info": List[str],      # Remaining knowledge gaps (empty = complete)
        "confidence": float,            # Final confidence score [0.0, 1.0]
    },

    "total_chunks": int,                # Total unique chunks collected across all hops

    "stats": Dict,                      # Output of get_stats() at this moment
}

Per-hop dict (inside `"hops"`)

Each hop is represented as:

{
    "hop_number":   int,          # 1-based hop index
    "query":        str,          # Query sent to the RAG backend
    "strategy":     str,          # HopStrategy value string
    "reasoning_step": str,        # Human-readable summary of this hop
    "gap_filled":   Optional[str],# Gap this hop targeted, or None
    "chunks_found": int,          # Number of accepted chunks
    "entities":     List[str],    # Entities extracted from top chunks
    "top_scores":   List[float],  # Relevance scores of top 3 chunks
}

Reasoning Pipeline — Internal Flow

Understanding the full internal pipeline helps when debugging, tuning thresholds, or extending the system.

query(question)
│
├─ 1. _analyze_question_requirements(question)
│      ├─ _classify_question_type()       → "comparison" | "causal" | "temporal" | "procedural" | "factual"
│      ├─ Flag: needs_comparison          → any of [مقارنة, compare, difference, ...]
│      ├─ Flag: needs_causality           → any of [لماذا, why, cause, ...]
│      ├─ Flag: needs_timeline            → any of [متى, when, date, ...]
│      ├─ Flag: needs_multi_entity        → any of [و, مع, and, both, ...]
│      └─ entities                        → QueryDecomposer.extract_entities(question)
│
├─ 2. _prepare_queries(question)
│      └─ decompose(question)             → ["sub_q1", "sub_q2"] or ["question"]
│
├─ 3. _perform_reasoning_hops(question, sub_queries, num_hops, requirements)
│      └─ For each hop:
│           ├─ rag.retrieve(current_query)
│           ├─ _filter_chunks()           → deduplicate + score filter
│           ├─ _extract_entities_from_chunks()   → top-5 entities from top-3 chunks
│           ├─ _extract_facts_from_chunks()      → relevant sentences
│           ├─ _update_reasoning_state()
│           │    ├─ known_facts += new facts
│           │    ├─ missing_info = _identify_missing_info()
│           │    ├─ confidence  = _calculate_confidence()
│           │    └─ answer_complete = (confidence ≥ threshold OR no gaps) AND facts > 0
│           ├─ [EARLY STOP if answer_complete]
│           └─ _plan_next_hop()           → next (query, strategy)
│
├─ 4. _aggregate_chunks(hop_results)
│      └─ weight = (1 / hop_number) + (0.1 if CLARIFICATION)
│         Sort by weighted_score → deduplicate by chunk_id
│
├─ 5. _generate_reasoned_answer(question, chunks, hop_results, reasoning_state)
│      ├─ _build_reasoned_context()       → reasoning chain + facts + gaps + sources
│      ├─ _build_reasoned_prompt()        → language-aware strict-grounding prompt
│      └─ rag.llm.generate(prompt, max_tokens=512)
│
└─ 6. _update_stats(num_hops)
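
Step 4 of the flow above can be sketched as follows. The flow gives the hop weight but not how it combines with the raw score, so multiplying the two is an assumption here, as are the `aggregate_sketch` name and the simplified `(hop_number, strategy, chunks)` input shape.

```python
from typing import Dict, List, Tuple

# (hop_number, strategy value, [(chunk_id, score), ...])
Hop = Tuple[int, str, List[Tuple[str, float]]]


def aggregate_sketch(hops: List[Hop]) -> List[Tuple[str, float]]:
    best: Dict[str, float] = {}
    for hop_number, strategy, chunks in hops:
        # Earlier hops weigh more; CLARIFICATION hops get a +0.1 bonus.
        weight = 1.0 / hop_number + (0.1 if strategy == "clarification" else 0.0)
        for chunk_id, score in chunks:
            weighted = score * weight  # assumption: weight multiplies the raw score
            # Keeping the best score per chunk_id deduplicates across hops.
            if weighted > best.get(chunk_id, float("-inf")):
                best[chunk_id] = weighted
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Keeping the highest weighted score per `chunk_id` gives the same result as sorting first and then dropping later duplicates.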

Hop Strategy Selection Logic

The system selects strategies automatically using this priority chain:

Initial strategy:
  needs_comparison  → RELATION_BRIDGING
  needs_causality   → CLARIFICATION
  otherwise         → ENTITY_EXPANSION

Next hop strategy (_plan_next_hop):
  Priority 1: Use next pre-decomposed sub-query → ENTITY_EXPANSION
  Priority 2: missing_info is non-empty         → CLARIFICATION  (gap-filling query)
  Priority 3: comparison + missing entity       → RELATION_BRIDGING (bridge query)
  Priority 4: entities available                → RELATION_BRIDGING (bridge query)
  Priority 5: no path found                     → return None (stop hopping)
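
The priority chain can be written as two small functions. This sketch returns only the strategy, since query construction is not specified here; the function names and boolean inputs are hypothetical, and `HopStrategy` is redefined locally so the block is self-contained.

```python
from enum import Enum
from typing import List, Optional


class HopStrategy(Enum):
    ENTITY_EXPANSION = "entity_expansion"
    RELATION_BRIDGING = "relation_bridging"
    CLARIFICATION = "clarification"
    VERIFICATION = "verification"


def select_initial_strategy(needs_comparison: bool, needs_causality: bool) -> HopStrategy:
    if needs_comparison:
        return HopStrategy.RELATION_BRIDGING
    if needs_causality:
        return HopStrategy.CLARIFICATION
    return HopStrategy.ENTITY_EXPANSION


def plan_next_strategy(
    remaining_sub_queries: List[str],
    missing_info: List[str],
    comparison_missing_entity: bool,
    entities: List[str],
) -> Optional[HopStrategy]:
    if remaining_sub_queries:
        return HopStrategy.ENTITY_EXPANSION   # priority 1
    if missing_info:
        return HopStrategy.CLARIFICATION      # priority 2
    if comparison_missing_entity or entities:
        return HopStrategy.RELATION_BRIDGING  # priorities 3 and 4
    return None                               # priority 5: stop hopping
```

Because comparison takes precedence over causality in the initial choice, a question flagged for both starts with `RELATION_BRIDGING`.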

Confidence & Early Stopping Model

The confidence score is computed after every hop:

base_confidence = min(len(known_facts) / 5.0 , 0.85)
gap_penalty     = len(missing_info) × 0.15
confidence      = max(0.0, base_confidence - gap_penalty)

# Bonus: if no gaps and ≥ 3 facts
if not missing_info and len(known_facts) >= 3:
    confidence = max(confidence, 0.8)

confidence = min(confidence, 1.0)

Early stopping fires when:

answer_complete = (confidence >= confidence_threshold AND len(known_facts) > 0)
              OR  (len(missing_info) == 0 AND len(known_facts) > 0)

Confidence progression example (three-hop run, threshold = 0.75):

Hop	Facts found	Gaps	Confidence	Stop?
1	2	1	`min(2/5, 0.85) − 0.15 = 0.25`	No
2	4	1	`min(4/5, 0.85) − 0.15 = 0.65`	No
3	5	0	`max(0.85, 0.8) = 0.85`	Yes
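
The formula and stopping rule can be checked against the table with a short script. The function names are hypothetical; the arithmetic follows the definitions in this section.

```python
def compute_confidence(num_facts: int, num_gaps: int) -> float:
    base = min(num_facts / 5.0, 0.85)
    confidence = max(0.0, base - num_gaps * 0.15)
    # Bonus: no gaps and at least 3 facts guarantees a floor of 0.8.
    if num_gaps == 0 and num_facts >= 3:
        confidence = max(confidence, 0.8)
    return min(confidence, 1.0)


def is_answer_complete(num_facts: int, num_gaps: int, threshold: float = 0.75) -> bool:
    if num_facts == 0:
        return False  # both stopping branches require at least one fact
    return compute_confidence(num_facts, num_gaps) >= threshold or num_gaps == 0


for hop, (facts, gaps) in enumerate([(2, 1), (4, 1), (5, 0)], start=1):
    print(hop, round(compute_confidence(facts, gaps), 2), is_answer_complete(facts, gaps))
```

The three iterations correspond to the three table rows: 0.25 and 0.65 do not stop, while 0.85 with no gaps does.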

NER Engine: Stanza vs Regex Fallback

Feature	Stanza NER	Regex Fallback
Entity types	Full typed (PERS, ORG, LOC, DATE, …)	Untyped (quoted text, TitleCase words, numbers)
Importance scoring	Multi-factor (type weight + length + frequency + script + digits)	Length only
Arabic support	Full morphological analysis	Quoted text and numbers only
English support	Full NER	CamelCase proper nouns
`get_entities_by_type`	✅ Full type grouping	❌ Returns `{}`
Installation	`pip install stanza` + model download	No extras needed
Startup time	~5–15 s (model load)	Instant
Runtime accuracy	High	Low–Medium

Recommendation: Use Stanza in production for Arabic text. The regex fallback is adequate for English queries in non-critical or development environments.
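
To make the "Regex Fallback" column concrete, here is a rough sketch of an untyped extractor that finds quoted text, CamelCase words, and numbers, then ranks candidates by length only. The exact patterns are assumptions chosen for illustration; the reference does not specify the expressions the library uses, and `regex_entities` is a hypothetical name.

```python
import re
from typing import List

# Assumed patterns: the reference only states *what* the fallback finds,
# not the expressions themselves.
QUOTED = re.compile(r'"([^"]+)"|«([^»]+)»')
CAMEL_CASE = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def regex_entities(text: str) -> List[str]:
    """Untyped entity candidates, ranked by length (longest first)."""
    found: List[str] = []
    for match in QUOTED.finditer(text):
        found.append(match.group(1) or match.group(2))
    found.extend(CAMEL_CASE.findall(text))
    found.extend(NUMBER.findall(text))
    unique = list(dict.fromkeys(found))            # de-duplicate, keep first-seen order
    return sorted(unique, key=len, reverse=True)   # stable sort: ties keep that order


print(regex_entities('The "Treaty of Versailles" was covered by OpenStreetMap in 1919.'))
```

A length-only ranking like this has no notion of entity type, which is consistent with `get_entities_by_type` returning `{}` in fallback mode.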

RAG System Interface Contract

MultiHopRAG wraps any RAG object that satisfies this interface:

Required — `.retrieve(query)`

from typing import List, Tuple, Union

def retrieve(self, query: str) -> Union[List[Tuple[Chunk, float]], List[Chunk]]:
    """
    Retrieve relevant chunks for a query.

    Returns either:
    - List of (chunk, score) tuples   ← preferred
    - List of chunk objects            ← score defaults to 1.0

    Each chunk object must have:
    - chunk.chunk_id : str    (unique identifier — chunks without this are silently skipped)
    - chunk.text     : str    (text content for context building)
    - chunk.doc_id   : str    (document identifier for context headers)
    """

Optional but recommended — `.llm.generate(prompt, **kwargs)`

def generate(self, prompt: str, max_tokens: int = 512, **kwargs) -> str:
    """
    Generate an answer from a prompt.

    Needed for answer generation. If the RAG object does not expose
    .llm.generate, query() returns "Language model not available".
    """

Minimal compatible RAG example:

from dataclasses import dataclass

from fennec_community.rag.types.multi_hop import MultiHopRAG

@dataclass
class Chunk:
    chunk_id: str
    text: str
    doc_id: str

class MinimalRAG:
    def __init__(self):
        # The wrapper calls rag.llm.generate(...), so this object doubles as its own LLM
        self.llm = self

    def retrieve(self, query: str):
        # Return (chunk, score) tuples, the preferred shape
        return [(Chunk("c1", "Paris is the capital of France.", "doc_1"), 0.92)]

    def generate(self, prompt: str, max_tokens: int = 512, **kwargs) -> str:
        # Stub LLM: a real implementation would call a model here
        return "Answer based on context..."

rag = MinimalRAG()
mh = MultiHopRAG(rag_system=rag)
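
Since `.retrieve()` may return either `(chunk, score)` tuples or bare chunks, a wrapper has to normalize both shapes before scoring. The following is a hedged sketch of how that could be done under the contract stated above (default score 1.0, chunks without `chunk_id` skipped); `normalize_results` is a hypothetical helper, not a documented function.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple


@dataclass
class Chunk:
    chunk_id: str
    text: str
    doc_id: str


def normalize_results(results: Iterable[Any]) -> List[Tuple[Any, float]]:
    """Coerce either accepted return shape into a list of (chunk, score) pairs."""
    normalized: List[Tuple[Any, float]] = []
    for item in results:
        if isinstance(item, tuple) and len(item) == 2:
            chunk, score = item
        else:
            chunk, score = item, 1.0               # bare chunk: default score
        if getattr(chunk, "chunk_id", None) is None:
            continue                               # no chunk_id: skipped silently
        normalized.append((chunk, float(score)))
    return normalized


mixed = [
    (Chunk("c1", "Paris is the capital of France.", "doc_1"), 0.92),
    Chunk("c2", "France is in Europe.", "doc_1"),
    object(),                                      # lacks chunk_id, so it is dropped
]
print([(chunk.chunk_id, score) for chunk, score in normalize_results(mixed)])
```

One caveat with this approach: a chunk type that is itself a two-element tuple (for example a `NamedTuple` with two fields) would be mistaken for a `(chunk, score)` pair, so plain classes or dataclasses are the safer choice for chunk objects.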

Language Detection & Prompt Templates

The final LLM prompt is built by _build_reasoned_prompt(). Language is auto-detected from the question text as the share of characters that fall in the Arabic Unicode block (U+0600–U+06FF):

arabic_ratio = arabic_chars / len(question)
language     = "ar" if arabic_ratio > 0.2 else "en"
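
As a runnable illustration of the ratio test, the sketch below wraps it in a function. The name `detect_language` and the empty-string guard are assumptions added here; the formula as written would divide by zero on an empty question.

```python
def detect_language(question: str, threshold: float = 0.2) -> str:
    """Return "ar" when more than `threshold` of the characters are Arabic, else "en"."""
    if not question:
        return "en"    # assumption: empty input falls back to English
    arabic_chars = sum(1 for ch in question if "\u0600" <= ch <= "\u06FF")
    return "ar" if arabic_chars / len(question) > threshold else "en"


print(detect_language("ما هي عاصمة فرنسا؟"))
print(detect_language("What is the capital of France?"))
print(detect_language("Explain the word كتاب in English"))
```

Spaces and punctuation count toward the denominator, so a mostly English question that quotes a single Arabic word stays below the 0.2 threshold and is treated as English.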

Both prompt templates include:

Reasoning chain — all per-hop summaries.
Discovered facts — up to 8 relevant sentences.
Missing info warnings — explicit gap list when gaps remain.
Supporting sources — up to 8 top chunks with scores and doc IDs.
Low-confidence warning — appended to the question when confidence < 0.75.
Strict grounding rules — no inference beyond the provided context; explicit instruction to say "not available" when information is absent.
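
The components listed above could be assembled roughly as follows. This is only a sketch: the section labels, line layout, and warning text are invented for illustration, and `build_context` is a hypothetical function. Only the ordering, the two caps of 8, and the 0.75 warning cutoff come from this section.

```python
from typing import List, Tuple

MAX_FACTS = 8      # cap stated in this section
MAX_SOURCES = 8    # cap stated in this section


def build_context(
    reasoning_chain: List[str],
    known_facts: List[str],
    missing_info: List[str],
    sources: List[Tuple[str, str, float]],    # (doc_id, text, score)
    confidence: float,
    question: str,
) -> str:
    """Assemble prompt sections in the documented order (layout is hypothetical)."""
    parts = ["Reasoning chain:"] + [f"- {step}" for step in reasoning_chain]
    parts += ["Discovered facts:"] + [f"- {fact}" for fact in known_facts[:MAX_FACTS]]
    if missing_info:                           # gap list only when gaps remain
        parts += ["Missing information:"] + [f"- {gap}" for gap in missing_info]
    top = sorted(sources, key=lambda s: s[2], reverse=True)[:MAX_SOURCES]
    parts += ["Sources:"] + [f"[{doc_id} | {score:.2f}] {text}" for doc_id, text, score in top]
    if confidence < 0.75:                      # low-confidence warning on the question
        question += " (warning: low confidence)"
    parts.append(f"Question: {question}")
    parts.append('Answer only from the context above; say "not available" if it is absent.')
    return "\n".join(parts)


print(build_context(["hop 1: found capital"], ["Paris is the capital of France."],
                    [], [("doc_1", "Paris is the capital of France.", 0.92)], 0.8,
                    "What is the capital of France?"))
```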

Complete Examples

Example 1 — Basic multi-hop query

from fennec_community.rag.types.multi_hop import MultiHopRAG
from fennec_community.rag.core import RAGSystem

my_rag = RAGSystem()
mh = MultiHopRAG(
    rag_system=my_rag,
    max_hops=3,
    language="ar",
    confidence_threshold=0.75,
)

# "What were the causes of the French Revolution, and what were its results?"
answer = mh.query("ما هي أسباب الثورة الفرنسية وما نتائجها؟")
print(answer)

Example 2 — Full reasoning trace

result = mh.query(
    "Compare the economic impacts of World War I and World War II on Europe",
    return_intermediate=True,
)

print("=" * 60)
print("ANSWER")
print("=" * 60)
print(result["answer"])

print("\n" + "=" * 60)
print("REASONING CHAIN")
print("=" * 60)
for step in result["reasoning_chain"]:
    print("•", step)

print("\n" + "=" * 60)
print("HOP DETAILS")
print("=" * 60)
for hop in result["hops"]:
    print(f"\n[Hop {hop['hop_number']}] Strategy: {hop['strategy']}")
    print(f"  Query:      {hop['query']}")
    print(f"  Chunks:     {hop['chunks_found']}")
    print(f"  Entities:   {hop['entities']}")
    print(f"  Top scores: {[f'{s:.3f}' for s in hop['top_scores']]}")
    print(f"  Gap filled: {hop['gap_filled']}")
    print(f"  Reasoning:  {hop['reasoning_step']}")

print("\n" + "=" * 60)
print("KNOWLEDGE STATE")
print("=" * 60)
ks = result["knowledge_state"]
print(f"Confidence:    {ks['confidence']:.0%}")
print(f"Known facts:   {len(ks['known_facts'])}")
print(f"Missing info:  {ks['missing_info'] or 'None'}")
print(f"Total chunks:  {result['total_chunks']}")

Example 3 — Async usage in FastAPI

from typing import Optional

from fastapi import FastAPI, Query
from fennec_community.rag.types.multi_hop import MultiHopRAG

app = FastAPI()

# my_rag is the RAGSystem instance created in Example 1
mh = MultiHopRAG(
    rag_system=my_rag,
    max_hops=3,
    language="ar",
    confidence_threshold=0.75,
)

@app.get("/query")
async def answer_question(
    q: str,
    hops: Optional[int] = Query(default=None, ge=1, le=5),  # optional per-request hop limit
    trace: bool = False,
):
    result = await mh.aquery(q, max_hops=hops, return_intermediate=trace)
    if trace:
        return result          # full trace dict
    return {"answer": result}  # plain answer string


@app.get("/stats")
def system_stats():
    return mh.get_stats()

Example 4 — Standalone QueryDecomposer

from fennec_community.rag.types.multi_hop import QueryDecomposer

# Arabic, NER-powered
decomposer = QueryDecomposer(use_stanza=True, language="ar")

# Decompose: "Who is Ibn Sina, and then what are his most important works?"
parts = decomposer.decompose("من هو ابن سينا ثم ما أهم مؤلفاته؟")
print(parts)
# ["من هو ابن سينا", "ما أهم مؤلفاته؟"]

# Extract entities (flat, ranked)
# "Jeff Bezos founded Amazon in 1994 in the American city of Seattle."
entities = decomposer.extract_entities(
    "أسس جيف بيزوس شركة أمازون عام 1994 في مدينة سياتل الأمريكية."
)
print(entities)
# ["جيف بيزوس", "أمازون", "سياتل", "1994"]

# Extract entities by type (requires Stanza)
typed = decomposer.get_entities_by_type(
    "أسس جيف بيزوس شركة أمازون عام 1994 في مدينة سياتل الأمريكية."
)
print(typed)
# {"PERS": ["جيف بيزوس"], "ORG": ["أمازون"], "LOC": ["سياتل"], "DATE": ["1994"]}

Example 5 — Monitoring and observability

import time

mh = MultiHopRAG(rag_system=my_rag, max_hops=3, confidence_threshold=0.75)

questions = [
    "What is machine learning?",
    "Why did the Ottoman Empire fall?",
    "Compare GPU and CPU for deep learning",
    "When was the Eiffel Tower built and by whom?",
]

start = time.time()
for q in questions:
    answer = mh.query(q)
    print(f"Q: {q[:50]}")
    print(f"A: {answer[:100]}...\n")

elapsed = time.time() - start
stats = mh.get_stats()

print("\n--- System Stats ---")
print(f"Total queries:       {stats['total_queries']}")
print(f"Average hops/query:  {stats['average_hops']:.2f}")
print(f"Early stopping rate: {stats['early_stops']}/{stats['total_queries']} "
      f"({stats['early_stops']/stats['total_queries']:.0%})")
print(f"Total time:          {elapsed:.1f}s")
print(f"Avg time/query:      {elapsed/stats['total_queries']:.1f}s")
print(f"NER method:          {stats['ner_method']}")

Example 6 — Custom hop count per question type

from fennec_community.rag.types.multi_hop import MultiHopRAG

mh = MultiHopRAG(rag_system=my_rag, max_hops=5, confidence_threshold=0.75)

# Simple factual: 1 hop is enough
simple = mh.query("What is the capital of Japan?", max_hops=1)

# Causal: needs 2-3 hops
causal = mh.query("Why did the 2008 financial crisis happen?", max_hops=3)

# Complex comparison: use maximum hops
comparison = mh.query(
    "Compare the AI research output of the US and China between 2020 and 2024",
    max_hops=5,
    return_intermediate=True,
)
print("Comparison answer confidence:", comparison["knowledge_state"]["confidence"])

Example 7 — Async context manager

import asyncio
from fennec_community.rag.types.multi_hop import MultiHopRAG

async def batch_query(questions: list[str]) -> list[str]:
    async with MultiHopRAG(rag_system=my_rag, max_hops=3) as mh:
        results = []
        for q in questions:
            answer = await mh.aquery(q, return_intermediate=False)
            results.append(answer)
        return results

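# Keep the list in a module-level name so it can be zipped with the answers below
questions = [
    "What is quantum computing?",
    "Why is the sky blue?",
    "When was the first iPhone released?",
]
answers = asyncio.run(batch_query(questions))

for q, a in zip(questions, answers):
    print(f"Q: {q}\nA: {a}\n")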

Simple Real Example

import os

from fennec_community.llm import MistralInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.multi_hop import MultiHopRAG

# Assumption: the API key is read from an environment variable of this name; supply your own key
llm_api = os.environ["MISTRAL_API_KEY"]

# Load the knowledge base and build the underlying RAG system
documents = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm = MistralInterface(api_key=llm_api)
context_manager = ContextManager()
rag_system = RAGSystem(
    llm=llm,
    vector_db=vector_db,
    chunker=chunker,
    context_manager=context_manager,
)

rag_system.add_documents(documents)

# Wrap the RAG system with multi-hop reasoning
multi_hop = MultiHopRAG(
    rag_system=rag_system,
    max_hops=3,
    confidence_threshold=0.75,
    language="ar",
)

# "What payment methods are available?"
answer = multi_hop.query("ماهي طرق الدفع المتاحة؟")
print(answer)

Source: community/rag/multi_hop.md
