
MultiHop-RAG Module `multi_hop`: Enterprise API Reference


Table of Contents

  1. Overview
  2. Architecture
  3. Quick Start
  4. Enumeration: HopStrategy
  5. Data Models
  6. Class: QueryDecomposer
  7. Class: MultiHopRAG
  8. Return Value Reference
  9. Reasoning Pipeline — Internal Flow
  10. Hop Strategy Selection Logic
  11. Confidence & Early Stopping Model
  12. NER Engine: Stanza vs Regex Fallback
  13. RAG System Interface Contract
  14. Language Detection & Prompt Templates
  15. Complete Examples

Overview

multi_hop is a Reasoning-Guided Multi-Hop RAG layer that wraps any existing RAG system and gives it the ability to answer complex, multi-faceted questions through iterative, self-directed retrieval. Instead of a single retrieve-then-answer pass, MultiHopRAG plans its search at every step, tracks what it has learned so far, identifies knowledge gaps, and adaptively chooses the next query — stopping early as soon as it has collected sufficient evidence.

Key capabilities at a glance:

| Capability | Detail |
| --- | --- |
| Reasoning-guided hops | Each hop asks "What do I know? What am I still missing?" before planning the next search |
| Smart early stopping | Stops automatically when confidence ≥ threshold, so no hops are wasted |
| Query decomposition | Splits composite questions into sequential sub-queries |
| NER-powered entity extraction | Uses Stanza NER (Arabic & English) with transparent regex fallback |
| Adaptive strategy selection | Chooses ENTITY_EXPANSION, RELATION_BRIDGING, or CLARIFICATION per hop (VERIFICATION is reserved and not yet auto-selected) |
| Confidence scoring | Quantitative confidence tracked per hop, used for early stopping |
| Bilingual prompting | Auto-detects Arabic vs English and selects the matching strict-grounding prompt |
| Async support | Non-blocking aquery for FastAPI / asyncio pipelines |
| Intermediate results | Full reasoning chain, hop details, and knowledge state accessible on demand |

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                          MultiHopRAG                                  │
│                                                                        │
│  ┌───────────────────┐        ┌──────────────────────────────────┐    │
│  │  QueryDecomposer  │        │        ReasoningState            │    │
│  │  (Stanza / Regex) │        │  known_facts · missing_info      │    │
│  └────────┬──────────┘        │  reasoning_chain · confidence    │    │
│           │ decompose()       └──────────────┬───────────────────┘    │
│           │                                  │ updated every hop      │
│           ▼                                  │                        │
│  ┌────────────────────────────────────────────────────────────────┐   │
│  │                   Reasoning Hop Loop                            │   │
│  │                                                                  │   │
│  │  ① _analyze_question_requirements  → question type + slots     │   │
│  │  ② _select_initial_strategy        → HopStrategy               │   │
│  │  ③ rag.retrieve(query)             → raw chunks                 │   │
│  │  ④ _filter_chunks                  → deduplicated + scored      │   │
│  │  ⑤ _extract_entities_from_chunks  → NER entities               │   │
│  │  ⑥ _update_reasoning_state        → facts + gaps + confidence  │   │
│  │  ⑦ Early stop check               → break if answer_complete   │   │
│  │  ⑧ _plan_next_hop                 → next query + strategy      │   │
│  └────────────────────────────────────────────────────────────────┘   │
│           │                                                            │
│           ▼                                                            │
│  _aggregate_chunks → _build_reasoned_context → _build_reasoned_prompt │
│           │                                                            │
│           ▼                                                            │
│        rag.llm.generate(prompt)  →  Final Answer                      │
└──────────────────────────────────────────────────────────────────────┘

Quick Start

from fennec_community.rag.types.multi_hop import MultiHopRAG, HopStrategy

# Wrap any RAG system that exposes .retrieve() and .llm.generate()
mh = MultiHopRAG(
    rag_system=my_rag,
    max_hops=3,
    language="ar",
    confidence_threshold=0.75,
)

# Simple answer
answer = mh.query("What caused the French Revolution and what were its results?")
print(answer)

# Full reasoning trace
result = mh.query(
    "Compare the economic policies of Germany and France after 2008",
    return_intermediate=True,
)
print(result["answer"])
print(result["reasoning_chain"])

# Async usage (call from inside an async function or a running event loop)
answer = await mh.aquery("Why did the Ottoman Empire collapse?")

Enumeration: HopStrategy

from fennec_community.rag.types.multi_hop import HopStrategy

Controls the search strategy used at each individual hop. The system selects the strategy automatically based on the question type and the current knowledge state, but you can inspect it in intermediate results.

class HopStrategy(Enum):
    ENTITY_EXPANSION  = "entity_expansion"
    RELATION_BRIDGING = "relation_bridging"
    CLARIFICATION     = "clarification"
    VERIFICATION      = "verification"

| Member | Value | When used | Description |
| --- | --- | --- | --- |
| ENTITY_EXPANSION | "entity_expansion" | Default first hop; when predefined sub-queries are available | Expands the search around known entities extracted from the query or previous chunks. The most general strategy. |
| RELATION_BRIDGING | "relation_bridging" | Comparison questions; when only one of two required entities has been found | Bridges two entities or concepts by constructing a query that connects them. Used to close the gap in comparative reasoning. |
| CLARIFICATION | "clarification" | When a clear knowledge gap is identified; causal questions | Builds a targeted query aimed at filling a specific named gap (e.g., missing causality, missing date, or missing entity info). Also used for why / cause type questions. Gets a +0.1 score bonus during chunk aggregation. |
| VERIFICATION | "verification" | Reserved | Planned for cross-checking facts from multiple sources. Not yet auto-selected; available for manual use in future extensions. |

Data Models

ReasoningState

from fennec_community.rag.types.multi_hop import ReasoningState

A @dataclass that tracks the accumulated knowledge state across all hops for a single query. Updated after every hop. Passed into the final prompt builder to give the LLM full situational awareness.

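from dataclasses import dataclass, field
from typing import List

@dataclass
class ReasoningState:
    # default_factory gives every instance its own list; a bare [] default raises ValueError in a dataclass
    known_facts:     List[str] = field(default_factory=list)  # Discovered relevant sentences
    missing_info:    List[str] = field(default_factory=list)  # Named knowledge gaps still open
    reasoning_chain: List[str] = field(default_factory=list)  # Per-hop reasoning step summaries
    confidence:      float     = 0.0    # Current answer completeness estimate [0, 1]
    answer_complete: bool      = False  # True when confidence ≥ threshold and facts > 0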

| Field | Type | Description |
| --- | --- | --- |
| known_facts | List[str] | List of relevant sentences extracted from retrieved chunks so far. Grows with each hop. |
| missing_info | List[str] | Named knowledge gaps that have not yet been filled (e.g., "causal relationship", "specific date or time", "info about France"). Updated after each hop. |
| reasoning_chain | List[str] | Human-readable summary of what each hop searched for, how many facts it found, the current confidence, and what's still missing. |
| confidence | float | Quantitative estimate of answer completeness in [0.0, 1.0]. Computed as min(facts/5, 0.85) − (gaps × 0.15), floored at 0.0, and raised to at least 0.8 when no gaps remain and 3 or more facts are known. |
| answer_complete | bool | True when confidence ≥ confidence_threshold AND known_facts is non-empty, or when there are no remaining gaps and known_facts is non-empty. Triggers early stopping. |

Usage (read-only — managed by MultiHopRAG):

result = mh.query("Why did Rome fall?", return_intermediate=True)

state = result["knowledge_state"]
print("Known facts:", state["known_facts"])
print("Missing info:", state["missing_info"])
print("Confidence:", state["confidence"])

HopResult

from fennec_community.rag.types.multi_hop import HopResult

A @dataclass capturing everything that happened during one individual hop — which query was used, what was found, the strategy applied, and what gap (if any) was filled.

@dataclass
class HopResult:
    query:               str               # The search query used for this hop
    chunks:              List[Tuple]        # Retrieved (chunk, score) pairs
    hop_number:          int                # 1-based hop index
    extracted_entities:  List[str]          # Entities extracted from this hop's chunks
    strategy:            HopStrategy        # Strategy used for this hop
    reasoning_step:      str               # Human-readable summary of this hop
    gap_filled:          Optional[str]      # The knowledge gap this hop targeted, if any

| Field | Type | Description |
| --- | --- | --- |
| query | str | The exact query string that was sent to the RAG backend for this hop. |
| chunks | List[Tuple] | List of (chunk, score) pairs returned from the RAG backend after deduplication and score filtering. |
| hop_number | int | 1-based index of this hop within the current query's hop sequence. |
| extracted_entities | List[str] | Up to 5 unique entities extracted from the top 3 chunks of this hop. Used to plan the next hop. |
| strategy | HopStrategy | The HopStrategy member that governed how this hop's query was constructed. |
| reasoning_step | str | Formatted string: "hop N: looking for '…'. found M facts. confidence: X%. missing: …". |
| gap_filled | Optional[str] | If this hop was a gap-filling hop, the name of the gap it targeted (from ReasoningState.missing_info[0]). None for expansion hops. |

Usage (read-only — produced by MultiHopRAG):

result = mh.query("Who invented the telephone and when?", return_intermediate=True)

for hop in result["hops"]:
    print(f"Hop {hop['hop_number']} [{hop['strategy']}]: {hop['query']}")
    print(f"  Chunks: {hop['chunks_found']}  Entities: {hop['entities']}")
    print(f"  Scores: {hop['top_scores']}")
    print(f"  Gap filled: {hop['gap_filled']}")

Class: QueryDecomposer

from fennec_community.rag.types.multi_hop import QueryDecomposer

A bilingual (Arabic / English) query analysis engine that provides two core capabilities: decomposing composite questions into sequential sub-queries, and extracting named entities from text using either Stanza NER (when installed) or a regex fallback.

Can be used independently of MultiHopRAG for standalone NLP preprocessing tasks.


__init__ (QueryDecomposer)

QueryDecomposer(
    use_stanza: bool = True,
    language:   str  = "ar",
)

Purpose: Initialises the query decomposer and attempts to load a Stanza NLP pipeline for the specified language. If Stanza is unavailable or fails to load, the instance transparently falls back to regex-based extraction — no exception is raised.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| use_stanza | bool | True | Whether to attempt loading a Stanza NLP pipeline. Set to False to force regex-only mode (faster startup, no model download needed). |
| language | str | "ar" | Language code for the Stanza pipeline. Supported: "ar" (Arabic), "en" (English), and any other Stanza-supported language code. |

Returns: QueryDecomposer instance.

NER mode resolution at startup:

use_stanza=True  AND  stanza installed  AND  model loads  →  Stanza NER (best accuracy)
use_stanza=True  AND  stanza installed  AND  model fails  →  Regex fallback (logged warning)
use_stanza=True  AND  stanza NOT installed               →  Regex fallback (logged warning)
use_stanza=False                                          →  Regex fallback (explicit)

Installing Stanza for Arabic:

pip install stanza
python -c "import stanza; stanza.download('ar')"

Example:

# Best accuracy (requires stanza + model)
decomposer = QueryDecomposer(use_stanza=True, language="ar")

# English, Stanza NER
decomposer_en = QueryDecomposer(use_stanza=True, language="en")

# Fast startup, no model required
decomposer_fast = QueryDecomposer(use_stanza=False)

decompose

decomposer.decompose(query: str) -> List[str]

Purpose: Splits a composite question into a list of sequential sub-queries by scanning for language-specific composite indicator keywords (e.g., "ثم", "وأيضاً", "then", "also", "because", "لماذا"). Indicators cover Modern Standard Arabic, the Egyptian, Levantine, and Gulf dialects, and English.

If no composite indicator is found, the original query is returned as a single-element list — the method always returns a list.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The natural-language question to analyse. |

Returns: List[str] — either a list of two sub-query strings (when a composite indicator is found), or [query] (when no indicator is found and the question is treated as atomic).

Supported indicator categories:

| Category | Examples (Arabic) | Examples (English) |
| --- | --- | --- |
| Sequential | "ثم", "بعد ذلك", "ومن ثم" | "then", "after that" |
| Additive | "وأيضاً", "كذلك", "بالإضافة" | "also", "in addition" |
| Causal | "لماذا", "ما السبب" | "why", "cause" |
| Conditional | "إذا", "في حالة" | "if", "in case" |
| Comparative | "مقارنةً بـ", "بينما" | "while", "compared to" |
| Procedural | "كيف", "ما الطريقة" | "how" |
| Dialectal (EG) | "وبعدين", "كمان", "علشان كده" | |
| Dialectal (Levantine) | "عقبها", "بعد هيك", "ليش" | |
| Dialectal (Gulf) | "عقب هيك", "شلون", "عشان هيك" | |

Example:

# Composite question → two sub-queries
parts = decomposer.decompose("من هو أينشتاين ثم ما أهم اكتشافاته؟")
# ["من هو أينشتاين", "ما أهم اكتشافاته؟"]

# Atomic question → single element
parts = decomposer.decompose("What is the capital of France?")
# ["What is the capital of France?"]
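
The splitting behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation: the `INDICATORS` list is a small hand-picked subset of the table (sequential and additive only), and the rule "split at the first indicator that leaves two non-empty parts" is an assumption made for illustration.

```python
import re
from typing import List

# Illustrative subset of the indicator table; longer phrases are listed first
# so that "ومن ثم" is tried before its substring "ثم".
INDICATORS = ["بعد ذلك", "ومن ثم", "ثم", "وأيضاً", "after that", "then", "also"]


def decompose_sketch(query: str) -> List[str]:
    """Split a query into two parts around an indicator, else return [query]."""
    for indicator in INDICATORS:
        # Match the indicator only as a standalone word or phrase.
        pattern = rf"(?<!\w){re.escape(indicator)}(?!\w)"
        match = re.search(pattern, query, flags=re.IGNORECASE)
        if match:
            first = query[: match.start()].strip(" ,،")
            second = query[match.end():].strip(" ,،")
            if first and second:  # assumption: both halves must be non-empty
                return [first, second]
    return [query]


print(decompose_sketch("Who was Einstein, then what did he discover?"))
print(decompose_sketch("What is the capital of France?"))
```

Because the function always returns a list, callers can iterate over the result without checking whether the question was composite.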

extract_entities

decomposer.extract_entities(text: str) -> List[str]

Purpose: Extracts the most important named entities from a piece of text. Uses Stanza NER when available (typed, scored entities) and otherwise falls back to regex-based extraction (quoted strings, English proper nouns, and numbers). Returns at most 10 entities.

This is the primary NER method used by MultiHopRAG both during question analysis and during chunk processing at each hop.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| text | str | Any text: a query string, a retrieved chunk, or a document sentence. For texts shorter than 3 characters, an empty list is returned immediately. |

Returns: List[str] — list of entity strings, sorted by importance score descending (Stanza mode) or by length descending (regex mode). Maximum 10 entries.

Stanza mode — entity importance scoring formula:

score = entity_type_weight            # base weight by type (see table below)
      + (len(entity_text) / 10)      # longer = more specific
      + (count_in_text × 0.5)        # frequency bonus
      + 1.0 (if contains English chars)
      + 0.5 (if contains digits)

Entity type weights (Stanza mode):

| Type | Weight | Meaning |
| --- | --- | --- |
| PERS / PER | 3.0 | Persons (highest priority) |
| ORG | 2.5 | Organizations |
| LOC / GPE | 2.0 | Locations / Geo-political entities |
| DATE / MONEY | 1.5 | Dates and monetary amounts |
| QUANTITY | 1.3 | Quantities |
| TIME / PERCENT | 1.2 | Times and percentages |
| CARDINAL | 1.1 | Cardinal numbers |
| ORDINAL | 1.0 | Ordinal numbers |
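
As a rough sketch of how the scoring formula and the weight table combine, the function below computes a score for one entity. The function name, the default weight of 1.0 for unlisted types, and the use of a plain substring count for frequency are assumptions, not confirmed details of the library.

```python
import re

# Base weights copied from the table above.
TYPE_WEIGHTS = {
    "PERS": 3.0, "PER": 3.0, "ORG": 2.5, "LOC": 2.0, "GPE": 2.0,
    "DATE": 1.5, "MONEY": 1.5, "QUANTITY": 1.3, "TIME": 1.2,
    "PERCENT": 1.2, "CARDINAL": 1.1, "ORDINAL": 1.0,
}


def entity_score(entity_text: str, entity_type: str, full_text: str) -> float:
    """Score one entity with the documented additive formula."""
    score = TYPE_WEIGHTS.get(entity_type, 1.0)   # assumption: unknown types get 1.0
    score += len(entity_text) / 10               # longer = more specific
    score += full_text.count(entity_text) * 0.5  # frequency bonus
    if re.search(r"[A-Za-z]", entity_text):      # contains English characters
        score += 1.0
    if re.search(r"\d", entity_text):            # contains digits
        score += 0.5
    return score


text = "Einstein joined Princeton in 1933. Einstein stayed there."
candidates = [("Einstein", "PERS"), ("Princeton", "ORG"), ("1933", "DATE")]
ranked = sorted(candidates, key=lambda c: entity_score(c[0], c[1], text), reverse=True)
print([name for name, _ in ranked])
```

With these inputs the person outranks the organization, which outranks the date, mirroring the priority order of the weight table.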

Regex mode — extraction rules:

| Priority | Pattern | Example |
| --- | --- | --- |
| 1 | Quoted text "…" or «…» | "الثورة الفرنسية" |
| 2 | English TitleCase words | "Napoleon Bonaparte" |
| 3 | Numeric tokens | "1789", "95" |

Results from regex mode are deduplicated and sorted by string length (longer = more specific).
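
A minimal sketch of a regex fallback following the three rules above. The exact patterns used by the library are not shown in this reference, so the three expressions below are illustrative assumptions; a naive TitleCase pattern like this one will also pick up capitalised sentence-initial words.

```python
import re
from typing import List

QUOTED = re.compile(r'"([^"]+)"|«([^»]+)»')                        # "…" or «…»
TITLE_CASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")      # runs of TitleCase words
NUMBER = re.compile(r"\b\d+\b")                                     # numeric tokens


def regex_entities(text: str, limit: int = 10) -> List[str]:
    """Collect quoted spans, TitleCase runs and numbers; dedupe; longest first."""
    if len(text) < 3:
        return []
    candidates: List[str] = []
    for match in QUOTED.finditer(text):
        candidates.append(match.group(1) or match.group(2))
    candidates.extend(TITLE_CASE.findall(text))
    candidates.extend(NUMBER.findall(text))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(c.strip() for c in candidates if c.strip()))
    unique.sort(key=len, reverse=True)  # longer = more specific
    return unique[:limit]


print(regex_entities('Napoleon Bonaparte was crowned in 1804 at "Notre Dame".'))
```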

Example:

# With Stanza (Arabic)
entities = decomposer.extract_entities(
    "ولد نابليون بونابرت عام 1769 في جزيرة كورسيكا وأصبح إمبراطوراً لفرنسا."
)
# ["نابليون بونابرت", "كورسيكا", "فرنسا", "1769"]

# Without Stanza (regex fallback)
entities = decomposer.extract_entities(
    'The "Eiffel Tower" was built in 1889 by Gustave Eiffel in Paris.'
)
# ["Gustave Eiffel", "Eiffel Tower", "Paris", "1889"]  (longest first)

get_entities_by_type

decomposer.get_entities_by_type(text: str) -> Dict[str, List[str]]

Purpose: Extracts entities from text and returns them grouped by their Stanza NER type rather than as a flat ranked list. Use this for fine-grained analysis — e.g., to separately access all persons, all locations, and all dates mentioned in a text.

⚠️ Requires Stanza: Returns an empty dictionary {} when Stanza is unavailable or failed to load.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| text | str | The text to extract typed entities from. |

Returns: Dict[str, List[str]] — keys are Stanza NER type labels (e.g., "PERS", "LOC", "ORG", "DATE"), values are deduplicated lists of entity strings of that type. Returns {} if Stanza is not available or an error occurs.

Example:

typed = decomposer.get_entities_by_type(
    "Albert Einstein was born in Ulm, Germany in 1879 and worked at Princeton University."
)
# {
#   "PERS": ["Albert Einstein"],
#   "LOC":  ["Ulm", "Germany"],
#   "ORG":  ["Princeton University"],
#   "DATE": ["1879"],
# }

# Downstream usage example:
persons = typed.get("PERS", [])
dates   = typed.get("DATE", [])

Class: MultiHopRAG

from fennec_community.rag.types.multi_hop import MultiHopRAG

The central orchestrator. Wraps any existing RAG backend and executes reasoning-guided iterative retrieval to answer complex questions that require synthesising information from multiple sources or reasoning steps.


__init__ (MultiHopRAG)

MultiHopRAG(
    rag_system:                  Any,
    max_hops:                    int   = 3,
    min_score_threshold:         float = 0.3,
    enable_query_decomposition:  bool  = True,
    use_stanza_ner:              bool  = True,
    language:                    str   = "ar",
    confidence_threshold:        float = 0.75,
)

Purpose: Initialises the Multi-Hop RAG system. Validates that a RAG system is provided, creates the QueryDecomposer, and initialises the internal statistics counter.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| rag_system | Any | (required) | Any RAG backend that exposes .retrieve(query) -> List and .llm.generate(prompt, **kwargs) -> str. See RAG System Interface Contract. |
| max_hops | int | 3 | Maximum number of retrieval hops to perform per query. Acts as a hard upper bound; the system may stop earlier via confidence-based early stopping. |
| min_score_threshold | float | 0.3 | Minimum similarity/relevance score for a chunk to be accepted. Chunks with a score below this value are filtered out at every hop. |
| enable_query_decomposition | bool | True | When True, composite questions are split into sub-queries by QueryDecomposer.decompose() before the hop loop begins. Each sub-query is used for one hop, in order. |
| use_stanza_ner | bool | True | Whether to use Stanza NER for entity extraction. Passed directly to QueryDecomposer. Set False for fast startup or when Stanza is not installed. |
| language | str | "ar" | Language code for the Stanza pipeline. The language of the final LLM prompt is auto-detected from the question text, so this setting primarily affects NER. |
| confidence_threshold | float | 0.75 | Confidence level at which early stopping is triggered. When reasoning_state.confidence ≥ threshold and at least one fact has been found, the hop loop terminates early, saving unnecessary retrieval calls. |

Raises: ValueError if rag_system is None.

Internal stats initialised at construction:

| Key | Initial value | Description |
| --- | --- | --- |
| total_queries | 0 | Total calls to query() / aquery(). |
| average_hops | 0 | Rolling average number of hops executed per query. |
| decomposed_queries | 0 | Queries that were split into sub-queries. |
| entities_extracted | 0 | Cumulative count of entities extracted across all hops. |
| early_stops | 0 | Number of queries where early stopping was triggered. |
| stanza_enabled | bool (set at construction) | Whether Stanza NER is active. |

Example:

from fennec_community.rag.types.multi_hop import MultiHopRAG

mh = MultiHopRAG(
    rag_system=my_rag,         # Required: any RAG backend
    max_hops=4,
    min_score_threshold=0.25,
    enable_query_decomposition=True,
    use_stanza_ner=True,
    language="ar",
    confidence_threshold=0.75,
)

query

mh.query(
    question:            str,
    hops:                Optional[int] = None,
    return_intermediate: bool          = False,
) -> str | Dict[str, Any]

Purpose: The primary synchronous query method. Executes the full multi-hop reasoning pipeline end-to-end: question analysis → query decomposition → iterative retrieval hops → chunk aggregation → reasoned answer generation. Returns either a plain answer string or a rich dict with the full reasoning trace, depending on return_intermediate.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| question | str | (required) | The natural-language question to answer. Returns a polite prompt string if empty or whitespace-only. |
| hops | Optional[int] | None | Override the instance-level max_hops for this specific query. When None, the instance's max_hops is used. |
| return_intermediate | bool | False | When False, returns a plain answer string. When True, returns a full Dict with answer, reasoning chain, per-hop details, knowledge state, chunk count, and stats. |

Returns:

  • str — the LLM-generated answer string when return_intermediate=False.
  • Dict[str, Any] — full reasoning trace dict when return_intermediate=True. See Return Value Reference.
  • str starting with "Please enter..." when question is empty or whitespace.
  • str starting with " Error: ..." when an unrecoverable internal exception occurs.

Execution pipeline (6 stages):

① _analyze_question_requirements()  → question type, need flags, seed entities
② _prepare_queries()                → [sub_query_1, sub_query_2, ...]
③ _perform_reasoning_hops()         → List[HopResult], ReasoningState
④ _aggregate_chunks()               → ranked, deduplicated (chunk, score) list
⑤ _generate_reasoned_answer()       → context + prompt + LLM call
⑥ _update_stats()                   → rolling average hops

Example — simple:

answer = mh.query("What were the causes of World War I?")
print(answer)

Example — with hop override:

# Use only 1 hop for a simple factual question
answer = mh.query("What is the capital of France?", hops=1)

Example — with full trace:

result = mh.query(
    "Compare the industrial revolutions of Britain and Germany",
    return_intermediate=True,
)

print("Answer:", result["answer"])
print()
print("Reasoning chain:")
for step in result["reasoning_chain"]:
    print(" •", step)
print()
print("Hops executed:", len(result["hops"]))
for hop in result["hops"]:
    print(f"  Hop {hop['hop_number']} [{hop['strategy']}]: {hop['query']}")
    print(f"    Chunks: {hop['chunks_found']}  Top scores: {hop['top_scores']}")
    print(f"    Entities: {hop['entities']}")
    print(f"    Gap filled: {hop['gap_filled']}")
print()
print("Confidence:", result["knowledge_state"]["confidence"])
print("Known facts:", len(result["knowledge_state"]["known_facts"]))
print("Total chunks:", result["total_chunks"])

aquery

async def aquery(
    question:            str,
    max_hops:            Optional[int] = None,
    return_intermediate: bool          = False,
) -> str | Dict[str, Any]

Purpose: Async version of query(). Runs the synchronous query() in a thread-pool executor via asyncio.to_thread, ensuring the event loop is never blocked. Drop-in replacement for query() in async frameworks.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| question | str | (required) | The natural-language question to answer. |
| max_hops | Optional[int] | None | Per-call hop override. Maps to the hops parameter of query(). |
| return_intermediate | bool | False | When True, returns the full reasoning trace dict instead of a plain answer string. |

Returns: Same as query(): str or Dict[str, Any].
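
The delegation pattern described in the Purpose paragraph can be sketched as follows. `SyncEngine` is a hypothetical stand-in class, not part of the library; only `asyncio.to_thread` (available from Python 3.9) and the `hops` / `max_hops` parameter mapping are taken from the text above.

```python
import asyncio
import time


class SyncEngine:
    """Hypothetical stand-in for an object with a blocking query() method."""

    def query(self, question: str, hops=None, return_intermediate: bool = False):
        time.sleep(0.05)  # simulate blocking retrieval and generation
        return f"answer to: {question}"

    async def aquery(self, question: str, max_hops=None, return_intermediate: bool = False):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(
            self.query, question, hops=max_hops, return_intermediate=return_intermediate
        )


async def main():
    engine = SyncEngine()
    # The two calls overlap because each one runs in its own worker thread.
    return await asyncio.gather(engine.aquery("q1"), engine.aquery("q2"))


answers = asyncio.run(main())
print(answers)
```

Since the work still happens in threads, CPU-bound stages do not run faster this way; the benefit is that other coroutines keep running while a query is in flight.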

Context manager support: MultiHopRAG implements __aenter__ / __aexit__, so it can be used as an async context manager:

async with MultiHopRAG(rag_system=my_rag) as mh:
    answer = await mh.aquery("Why did the Roman Empire fall?")
    print(answer)

FastAPI example:

from typing import Optional

from fastapi import FastAPI
from fennec_community.rag.types.multi_hop import MultiHopRAG

app = FastAPI()
mh = MultiHopRAG(rag_system=my_rag, max_hops=3, language="en")

@app.get("/ask")
async def ask(q: str, hops: Optional[int] = None, trace: bool = False):
    result = await mh.aquery(q, max_hops=hops, return_intermediate=trace)
    if trace:
        return result          # dict with full trace
    return {"answer": result}  # plain string

@app.get("/ask/arabic")
async def ask_arabic(q: str):
    return {"answer": await mh.aquery(q)}

get_stats

mh.get_stats() -> Dict[str, Any]

Purpose: Returns a snapshot of all operational statistics accumulated since the MultiHopRAG instance was created. Use for monitoring query throughput, average hop count, decomposition rate, entity extraction volume, early stopping frequency, and NER engine status.

Parameters: None.

Returns: Dict[str, Any] with the following keys:

| Key | Type | Description |
| --- | --- | --- |
| total_queries | int | Total number of queries processed by this instance. |
| average_hops | float | Rolling average number of hops executed per query. Decreases as early stopping kicks in more often. |
| decomposed_queries | int | Number of queries that were split into multiple sub-queries by QueryDecomposer.decompose(). |
| entities_extracted | int | Cumulative count of named entities extracted across all hops and all queries. |
| early_stops | int | Number of queries where the hop loop terminated early due to sufficient confidence. High values indicate efficient operation. |
| stanza_enabled | bool | True if Stanza NER loaded successfully; False if regex fallback is in use. |
| max_hops | int | The configured max_hops value for this instance. |
| confidence_threshold | float | The configured early-stopping threshold. |
| ner_method | str | Human-readable NER method label: "Stanza" or "Regex". |

Example:

# After running several queries
stats = mh.get_stats()

print(f"Total queries:       {stats['total_queries']}")
print(f"Average hops:        {stats['average_hops']:.2f}")
print(f"Early stops:         {stats['early_stops']} "
      f"({stats['early_stops']/max(stats['total_queries'],1):.0%} of queries)")
print(f"Decomposed queries:  {stats['decomposed_queries']}")
print(f"Entities extracted:  {stats['entities_extracted']}")
print(f"NER engine:          {stats['ner_method']}")
print(f"Confidence threshold:{stats['confidence_threshold']}")

Output example:

Total queries:       47
Average hops:        1.87
Early stops:         31 (66% of queries)
Decomposed queries:  12
Entities extracted:  284
NER engine:          Stanza
Confidence threshold:0.75
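
The reference does not say how average_hops is maintained. One common way to keep such a figure without storing every sample is an incremental mean; the sketch below assumes that approach, and the helper name `update_average` is hypothetical.

```python
def update_average(current_average: float, total_queries: int, hops_used: int) -> float:
    """Incremental mean; total_queries must already include the finished query."""
    return current_average + (hops_used - current_average) / total_queries


average, total = 0.0, 0
for hops_used in [3, 1, 2]:  # hops executed by three consecutive queries
    total += 1
    average = update_average(average, total, hops_used)

print(average)  # 2.0
```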

Return Value Reference

Simple mode (return_intermediate=False)

Returns a str: the LLM-generated answer, grounded strictly in the retrieved context. The prompt language (Arabic or English) is auto-detected from the question text.

Full trace mode (return_intermediate=True)

Returns Dict[str, Any] with the following structure:

{
    "answer": str,                      # LLM-generated final answer

    "reasoning_chain": List[str],       # Per-hop summaries, e.g.:
    # ["hop 1: looking for '…'. found 4 facts. confidence: 65%. missing: causal relationship.",
    #  "hop 2: looking for '…'. found 6 facts. confidence: 85%. missing: none."]

    "hops": List[Dict],                 # Per-hop details (see below)

    "knowledge_state": {
        "known_facts": List[str],       # All relevant sentences discovered
        "missing_info": List[str],      # Remaining knowledge gaps (empty = complete)
        "confidence": float,            # Final confidence score [0.0, 1.0]
    },

    "total_chunks": int,                # Total unique chunks collected across all hops

    "stats": Dict,                      # Output of get_stats() at this moment
}

Per-hop dict (inside "hops")

Each hop is represented as:

{
    "hop_number":   int,          # 1-based hop index
    "query":        str,          # Query sent to the RAG backend
    "strategy":     str,          # HopStrategy value string
    "reasoning_step": str,        # Human-readable summary of this hop
    "gap_filled":   Optional[str],# Gap this hop targeted, or None
    "chunks_found": int,          # Number of accepted chunks
    "entities":     List[str],    # Entities extracted from top chunks
    "top_scores":   List[float],  # Relevance scores of top 3 chunks
}

Reasoning Pipeline — Internal Flow

Understanding the full internal pipeline helps when debugging, tuning thresholds, or extending the system.

query(question)
│
├─ 1. _analyze_question_requirements(question)
│      ├─ _classify_question_type()       → "comparison" | "causal" | "temporal" | "procedural" | "factual"
│      ├─ Flag: needs_comparison          → any of [مقارنة, compare, difference, ...]
│      ├─ Flag: needs_causality           → any of [لماذا, why, cause, ...]
│      ├─ Flag: needs_timeline            → any of [متى, when, date, ...]
│      ├─ Flag: needs_multi_entity        → any of [و, مع, and, both, ...]
│      └─ entities                        → QueryDecomposer.extract_entities(question)
│
├─ 2. _prepare_queries(question)
│      └─ decompose(question)             → ["sub_q1", "sub_q2"] or ["question"]
│
├─ 3. _perform_reasoning_hops(question, sub_queries, num_hops, requirements)
│      └─ For each hop:
│           ├─ rag.retrieve(current_query)
│           ├─ _filter_chunks()           → deduplicate + score filter
│           ├─ _extract_entities_from_chunks()   → top-5 entities from top-3 chunks
│           ├─ _extract_facts_from_chunks()      → relevant sentences
│           ├─ _update_reasoning_state()
│           │    ├─ known_facts += new facts
│           │    ├─ missing_info = _identify_missing_info()
│           │    ├─ confidence  = _calculate_confidence()
│           │    └─ answer_complete = (confidence ≥ threshold OR no gaps left) AND facts > 0
│           ├─ [EARLY STOP if answer_complete]
│           └─ _plan_next_hop()           → next (query, strategy)
│
├─ 4. _aggregate_chunks(hop_results)
│      └─ weight = (1 / hop_number) + (0.1 if CLARIFICATION)
│         Sort by weighted_score → deduplicate by chunk_id
│
├─ 5. _generate_reasoned_answer(question, chunks, hop_results, reasoning_state)
│      ├─ _build_reasoned_context()       → reasoning chain + facts + gaps + sources
│      ├─ _build_reasoned_prompt()        → language-aware strict-grounding prompt
│      └─ rag.llm.generate(prompt, max_tokens=512)
│
└─ 6. _update_stats(num_hops)
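
Stage 4 of the flow above can be sketched as below. The weight formula is taken from the diagram; multiplying the raw score by the weight, and keeping the best weighted score per chunk_id, are assumptions about details the diagram leaves open. Hops are modelled as plain dicts for brevity.

```python
from typing import Dict, List, Tuple


def aggregate_chunks(hops: List[dict]) -> List[Tuple[str, float]]:
    """Rank chunk ids across hops. Each hop: {"hop_number", "strategy", "chunks"}."""
    best: Dict[str, float] = {}
    for hop in hops:
        weight = 1.0 / hop["hop_number"]       # earlier hops count more
        if hop["strategy"] == "clarification":
            weight += 0.1                      # CLARIFICATION bonus
        for chunk_id, score in hop["chunks"]:
            weighted = score * weight          # assumption: multiplicative weighting
            if weighted > best.get(chunk_id, float("-inf")):
                best[chunk_id] = weighted      # deduplicate by chunk_id, keep the best
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


hops = [
    {"hop_number": 1, "strategy": "entity_expansion", "chunks": [("c1", 0.9), ("c2", 0.5)]},
    {"hop_number": 2, "strategy": "clarification", "chunks": [("c2", 0.8), ("c3", 0.7)]},
]
print([chunk_id for chunk_id, _ in aggregate_chunks(hops)])
```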

Hop Strategy Selection Logic

The system selects strategies automatically using this priority chain:

Initial strategy:
  needs_comparison  → RELATION_BRIDGING
  needs_causality   → CLARIFICATION
  otherwise         → ENTITY_EXPANSION

Next hop strategy (_plan_next_hop):
  Priority 1: Use next pre-decomposed sub-query → ENTITY_EXPANSION
  Priority 2: missing_info is non-empty         → CLARIFICATION  (gap-filling query)
  Priority 3: comparison + missing entity       → RELATION_BRIDGING (bridge query)
  Priority 4: entities available                → RELATION_BRIDGING (bridge query)
  Priority 5: no path found                     → return None (stop hopping)
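
The priority chain above maps naturally onto a sequence of early returns. In this sketch the enum mirrors the documented HopStrategy, while the function signatures and the way each follow-up query string is built are hypothetical.

```python
from enum import Enum
from typing import List, Optional, Tuple


class HopStrategy(Enum):
    ENTITY_EXPANSION = "entity_expansion"
    RELATION_BRIDGING = "relation_bridging"
    CLARIFICATION = "clarification"
    VERIFICATION = "verification"


def select_initial_strategy(needs_comparison: bool, needs_causality: bool) -> HopStrategy:
    if needs_comparison:
        return HopStrategy.RELATION_BRIDGING
    if needs_causality:
        return HopStrategy.CLARIFICATION
    return HopStrategy.ENTITY_EXPANSION


def plan_next_hop(
    question: str,
    remaining_sub_queries: List[str],
    missing_info: List[str],
    is_comparison: bool,
    missing_entity: Optional[str],
    entities: List[str],
) -> Optional[Tuple[str, HopStrategy]]:
    if remaining_sub_queries:                 # Priority 1
        return remaining_sub_queries[0], HopStrategy.ENTITY_EXPANSION
    if missing_info:                          # Priority 2: target the first open gap
        return f"{question} {missing_info[0]}", HopStrategy.CLARIFICATION
    if is_comparison and missing_entity:      # Priority 3
        return f"{question} {missing_entity}", HopStrategy.RELATION_BRIDGING
    if entities:                              # Priority 4
        return f"{question} {entities[0]}", HopStrategy.RELATION_BRIDGING
    return None                               # Priority 5: stop hopping


print(select_initial_strategy(needs_comparison=False, needs_causality=True))
print(plan_next_hop("Why did Rome fall?", [], ["causal relationship"], False, None, ["Rome"]))
```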

Confidence & Early Stopping Model

The confidence score is computed after every hop:

base_confidence = min(len(known_facts) / 5.0 , 0.85)
gap_penalty     = len(missing_info) × 0.15
confidence      = max(0.0, base_confidence - gap_penalty)

# Bonus: if no gaps and ≥ 3 facts
if not missing_info and len(known_facts) >= 3:
    confidence = max(confidence, 0.8)

confidence = min(confidence, 1.0)

Early stopping fires when:

answer_complete = (confidence >= confidence_threshold AND len(known_facts) > 0)
              OR  (len(missing_info) == 0 AND len(known_facts) > 0)

Confidence progression example (3-hop run, threshold = 0.75):

| Hop | Facts found | Gaps | Confidence | Stop? |
| --- | --- | --- | --- | --- |
| 1 | 2 | 1 | min(2/5, 0.85) − 0.15 = 0.25 | No |
| 2 | 4 | 1 | min(4/5, 0.85) − 0.15 = 0.65 | No |
| 3 | 5 | 0 | max(0.85, 0.8) = 0.85 | Yes |
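
To check the progression by hand, the formula and the stopping rule above can be written as two small functions. This is a standalone sketch with hypothetical names (`compute_confidence`, `is_answer_complete`), not the library's internal methods; it treats the fact count as cumulative across hops.

```python
from typing import List


def compute_confidence(known_facts: List[str], missing_info: List[str]) -> float:
    """Confidence estimate following the documented formula."""
    base_confidence = min(len(known_facts) / 5.0, 0.85)
    gap_penalty = len(missing_info) * 0.15
    confidence = max(0.0, base_confidence - gap_penalty)
    # Bonus floor: no open gaps and at least three facts.
    if not missing_info and len(known_facts) >= 3:
        confidence = max(confidence, 0.8)
    return min(confidence, 1.0)


def is_answer_complete(known_facts: List[str], missing_info: List[str],
                       threshold: float = 0.75) -> bool:
    """Early-stop rule: facts must exist, plus enough confidence or no gaps."""
    if not known_facts:
        return False
    confident = compute_confidence(known_facts, missing_info) >= threshold
    return confident or not missing_info


# Replay the three hops from the table: (facts so far, open gaps).
for hop, (n_facts, n_gaps) in enumerate([(2, 1), (4, 1), (5, 0)], start=1):
    facts = [f"fact {i}" for i in range(n_facts)]
    gaps = ["causal relationship"] * n_gaps
    print(hop, round(compute_confidence(facts, gaps), 2), is_answer_complete(facts, gaps))
```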

NER Engine: Stanza vs Regex Fallback

| Feature | Stanza NER | Regex Fallback |
| --- | --- | --- |
| Entity types | Full typed (PERS, ORG, LOC, DATE, …) | Untyped (quoted text, TitleCase words, numbers) |
| Importance scoring | Multi-factor (type weight + length + frequency + script + digits) | Length only |
| Arabic support | Full morphological analysis | Quoted text and numbers only |
| English support | Full NER | TitleCase proper nouns |
| get_entities_by_type | ✅ Full type grouping | ❌ Returns {} |
| Installation | pip install stanza + model download | No extras needed |
| Startup time | ~5–15 s (model load) | Instant |
| Runtime accuracy | High | Low–Medium |

Recommendation: Use Stanza in production for Arabic text. The regex fallback is adequate for English queries in non-critical or development environments.


RAG System Interface Contract

MultiHopRAG wraps any RAG object that satisfies this interface:

Required — .retrieve(query)

def retrieve(query: str) -> List[Tuple[chunk, float]] | List[chunk]:
    """
    Retrieve relevant chunks for a query.

    Returns either:
    - List of (chunk, score) tuples   ← preferred
    - List of chunk objects            ← score defaults to 1.0

    Each chunk object must have:
    - chunk.chunk_id : str    (unique identifier — chunks without this are silently skipped)
    - chunk.text     : str    (text content for context building)
    - chunk.doc_id   : str    (document identifier for context headers)
    """

Required for answer generation: .generate(prompt)

def generate(prompt: str, max_tokens: int = 512, **kwargs) -> str:
    """
    Generate an answer from a prompt.
    Required for answer generation. Without it, query() returns
    "Language model not available".
    """

Minimal compatible RAG example:

from dataclasses import dataclass

from fennec_community.rag.types.multi_hop import MultiHopRAG

@dataclass
class Chunk:
    chunk_id: str
    text:     str
    doc_id:   str

class MinimalRAG:
    def __init__(self):
        self.llm = self   # expose .llm

    def retrieve(self, query: str):
        # Return (chunk, score) tuples
        return [(Chunk("c1", "Paris is the capital of France.", "doc_1"), 0.92)]

    def generate(self, prompt: str, max_tokens: int = 512, **kwargs) -> str:
        return "Answer based on context..."

rag = MinimalRAG()
mh = MultiHopRAG(rag_system=rag)
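
The contract accepts two return shapes from `retrieve` and skips chunks without a `chunk_id`. A wrapper could normalize both shapes roughly as follows. This is an illustrative sketch only: `normalize_results` is a hypothetical helper, not part of `MultiHopRAG`, and the `Chunk` dataclass is repeated here so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Chunk:
    chunk_id: str
    text: str
    doc_id: str


def normalize_results(raw: List[Any]) -> List[Tuple[Any, float]]:
    """Turn either accepted return shape into a list of (chunk, score)."""
    normalized: List[Tuple[Any, float]] = []
    for item in raw:
        if isinstance(item, tuple) and len(item) == 2:
            chunk, score = item        # preferred (chunk, score) form
        else:
            chunk, score = item, 1.0   # bare chunk: score defaults to 1.0
        if getattr(chunk, "chunk_id", None) is None:
            continue                   # no chunk_id: silently skipped
        normalized.append((chunk, float(score)))
    return normalized


mixed = [
    (Chunk("c1", "Paris is the capital of France.", "doc_1"), 0.92),
    Chunk("c2", "France is in Europe.", "doc_2"),
    object(),  # lacks chunk_id, so it is dropped
]
print([(c.chunk_id, s) for c, s in normalize_results(mixed)])
```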

Language Detection & Prompt Templates

The final LLM prompt is built by _build_reasoned_prompt(). Language is auto-detected from the question text by counting Arabic Unicode characters (\u0600–\u06FF):

arabic_ratio = arabic_chars / len(question)
language     = "ar" if arabic_ratio > 0.2 else "en"
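
As a runnable illustration of the rule above, the sketch below counts characters in the `\u0600`–`\u06FF` block and applies the 0.2 cutoff. The function name `detect_language` and the empty-string guard are assumptions; the source does not say how an empty question is handled.

```python
def detect_language(question: str) -> str:
    """Return "ar" when more than 20% of the characters are Arabic, else "en"."""
    if not question:
        # Assumption: fall back to English rather than divide by zero.
        return "en"
    arabic_chars = sum(1 for ch in question if "\u0600" <= ch <= "\u06FF")
    arabic_ratio = arabic_chars / len(question)
    return "ar" if arabic_ratio > 0.2 else "en"


print(detect_language("ما هي أسباب الثورة الفرنسية؟"))
print(detect_language("What is machine learning?"))
```

Note that the denominator is the full string length, so spaces, digits, and punctuation all dilute the ratio. A mostly English question that quotes a short Arabic term will typically stay below the 0.2 cutoff and be treated as English.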

Both prompt templates include:

  1. Reasoning chain — all per-hop summaries.
  2. Discovered facts — up to 8 relevant sentences.
  3. Missing info warnings — explicit gap list when gaps remain.
  4. Supporting sources — up to 8 top chunks with scores and doc IDs.
  5. Low-confidence warning — appended to the question when confidence < 0.75.
  6. Strict grounding rules — no inference beyond the provided context; explicit instruction to say "not available" when information is absent.
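
The six components can be pictured as sections concatenated in order. The sketch below shows one possible assembly under stated assumptions: the function name `build_prompt_sketch`, its parameters, every section label, and the warning wording are hypothetical and are not the templates used by `_build_reasoned_prompt()`. Only the limits of 8 facts and 8 sources and the 0.75 cutoff come from the list above.

```python
from typing import List, Tuple

LOW_CONFIDENCE = 0.75  # cutoff quoted in the list above
MAX_FACTS = 8
MAX_SOURCES = 8


def build_prompt_sketch(
    question: str,
    reasoning_chain: List[str],
    known_facts: List[str],
    missing_info: List[str],
    sources: List[Tuple[str, str, float]],  # (doc_id, text, score)
    confidence: float,
) -> str:
    lines: List[str] = ["Reasoning chain:"]
    lines += [f"- {step}" for step in reasoning_chain]
    lines.append("Discovered facts:")
    lines += [f"- {fact}" for fact in known_facts[:MAX_FACTS]]
    if missing_info:  # the gap list appears only when gaps remain
        lines.append("Missing information:")
        lines += [f"- {gap}" for gap in missing_info]
    lines.append("Supporting sources:")
    for doc_id, text, score in sources[:MAX_SOURCES]:
        lines.append(f"[{doc_id} | score {score:.2f}] {text}")
    if confidence < LOW_CONFIDENCE:
        # Hypothetical wording for the low-confidence warning.
        question = f"{question} (note: evidence is limited)"
    lines.append('Rules: answer only from the context above; reply "not available" if it is absent.')
    lines.append(f"Question: {question}")
    return "\n".join(lines)


print(build_prompt_sketch(
    "Who founded Amazon?",
    ["Hop 1: searched for Amazon founder"],
    ["Jeff Bezos founded Amazon in 1994."],
    [],
    [("doc_1", "Jeff Bezos founded Amazon in 1994.", 0.92)],
    0.85,
))
```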

Complete Examples

Example 1 — Basic multi-hop query

from fennec_community.rag.types.multi_hop import MultiHopRAG
from fennec_community.rag.core import RAGSystem

my_rag = RAGSystem()
mh = MultiHopRAG(
    rag_system=my_rag,
    max_hops=3,
    language="ar",
    confidence_threshold=0.75,
)

# English: "What were the causes of the French Revolution, and what were its results?"
answer = mh.query("ما هي أسباب الثورة الفرنسية وما نتائجها؟")
print(answer)

Example 2 — Full reasoning trace

result = mh.query(
    "Compare the economic impacts of World War I and World War II on Europe",
    return_intermediate=True,
)

print("=" * 60)
print("ANSWER")
print("=" * 60)
print(result["answer"])

print("\n" + "=" * 60)
print("REASONING CHAIN")
print("=" * 60)
for step in result["reasoning_chain"]:
    print("•", step)

print("\n" + "=" * 60)
print("HOP DETAILS")
print("=" * 60)
for hop in result["hops"]:
    print(f"\n[Hop {hop['hop_number']}] Strategy: {hop['strategy']}")
    print(f"  Query:      {hop['query']}")
    print(f"  Chunks:     {hop['chunks_found']}")
    print(f"  Entities:   {hop['entities']}")
    print(f"  Top scores: {[f'{s:.3f}' for s in hop['top_scores']]}")
    print(f"  Gap filled: {hop['gap_filled']}")
    print(f"  Reasoning:  {hop['reasoning_step']}")

print("\n" + "=" * 60)
print("KNOWLEDGE STATE")
print("=" * 60)
ks = result["knowledge_state"]
print(f"Confidence:    {ks['confidence']:.0%}")
print(f"Known facts:   {len(ks['known_facts'])}")
print(f"Missing info:  {ks['missing_info'] or 'None'}")
print(f"Total chunks:  {result['total_chunks']}")

Example 3 — Async usage in FastAPI

from typing import Optional

from fastapi import FastAPI, Query
from fennec_community.rag.types.multi_hop import MultiHopRAG

app = FastAPI()

# my_rag is the RAGSystem instance created in Example 1
mh = MultiHopRAG(
    rag_system=my_rag,
    max_hops=3,
    language="ar",
    confidence_threshold=0.75,
)

@app.get("/query")
async def answer_question(
    q:     str,
    hops:  Optional[int] = Query(default=None, ge=1, le=5),  # None keeps the instance default
    trace: bool = False,
):
    result = await mh.aquery(q, max_hops=hops, return_intermediate=trace)
    if trace:
        return result
    return {"answer": result}


@app.get("/stats")
def system_stats():
    return mh.get_stats()

Example 4 — Standalone QueryDecomposer

from fennec_community.rag.types.multi_hop import QueryDecomposer

# Arabic, NER-powered
decomposer = QueryDecomposer(use_stanza=True, language="ar")

# Decompose
# English: "Who is Ibn Sina, and then what are his most important works?"
parts = decomposer.decompose("من هو ابن سينا ثم ما أهم مؤلفاته؟")
print(parts)
# ["من هو ابن سينا", "ما أهم مؤلفاته؟"]

# Extract entities (flat, ranked)
# English: "Jeff Bezos founded Amazon in 1994 in the American city of Seattle."
entities = decomposer.extract_entities(
    "أسس جيف بيزوس شركة أمازون عام 1994 في مدينة سياتل الأمريكية."
)
print(entities)
# ["جيف بيزوس", "أمازون", "سياتل", "1994"]

# Extract entities by type (requires Stanza)
typed = decomposer.get_entities_by_type(
    "أسس جيف بيزوس شركة أمازون عام 1994 في مدينة سياتل الأمريكية."
)
print(typed)
# {"PERS": ["جيف بيزوس"], "ORG": ["أمازون"], "LOC": ["سياتل"], "DATE": ["1994"]}

Example 5 — Monitoring and observability

import time

mh = MultiHopRAG(rag_system=my_rag, max_hops=3, confidence_threshold=0.75)

questions = [
    "What is machine learning?",
    "Why did the Ottoman Empire fall?",
    "Compare GPU and CPU for deep learning",
    "When was the Eiffel Tower built and by whom?",
]

start = time.time()
for q in questions:
    answer = mh.query(q)
    print(f"Q: {q[:50]}")
    print(f"A: {answer[:100]}...\n")

elapsed = time.time() - start
stats = mh.get_stats()

print("\n--- System Stats ---")
print(f"Total queries:       {stats['total_queries']}")
print(f"Average hops/query:  {stats['average_hops']:.2f}")
print(f"Early stopping rate: {stats['early_stops']}/{stats['total_queries']} "
      f"({stats['early_stops']/stats['total_queries']:.0%})")
print(f"Total time:          {elapsed:.1f}s")
print(f"Avg time/query:      {elapsed/stats['total_queries']:.1f}s")
print(f"NER method:          {stats['ner_method']}")

Example 6 — Custom hop count per question type

from fennec_community.rag.types.multi_hop import MultiHopRAG

mh = MultiHopRAG(rag_system=my_rag, max_hops=5, confidence_threshold=0.75)

# Simple factual: 1 hop is enough
simple = mh.query("What is the capital of Japan?", max_hops=1)

# Causal: needs 2-3 hops
causal = mh.query("Why did the 2008 financial crisis happen?", max_hops=3)

# Complex comparison: use maximum hops
comparison = mh.query(
    "Compare the AI research output of the US and China between 2020 and 2024",
    max_hops=5,
    return_intermediate=True,
)
print("Comparison answer confidence:", comparison["knowledge_state"]["confidence"])

Example 7 — Async context manager

import asyncio
from fennec_community.rag.types.multi_hop import MultiHopRAG

async def batch_query(questions: list[str]) -> list[str]:
    async with MultiHopRAG(rag_system=my_rag, max_hops=3) as mh:
        results = []
        for q in questions:
            answer = await mh.aquery(q, return_intermediate=False)
            results.append(answer)
        return results

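# Define the questions at module level so they can be paired with the answers below
questions = [
    "What is quantum computing?",
    "Why is the sky blue?",
    "When was the first iPhone released?",
]
answers = asyncio.run(batch_query(questions))

for q, a in zip(questions, answers):
    print(f"Q: {q}\nA: {a}\n")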

Simple Real Example

import os

from fennec_community.llm import MistralInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.multi_hop import MultiHopRAG

# Assumption: the Mistral API key is read from an environment variable;
# the name MISTRAL_API_KEY is illustrative.
llm_api = os.environ["MISTRAL_API_KEY"]

loader_1 = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm = MistralInterface(api_key=llm_api)
context_manager = ContextManager()
rag_system = RAGSystem(
    llm=llm,
    vector_db=vector_db,
    chunker=chunker,
    context_manager=context_manager,
)


rag_system.add_documents(loader_1)

multi_hop = MultiHopRAG(
    rag_system=rag_system,
    max_hops=3,
    confidence_threshold=0.75,
    language="ar",
)

# English: "What payment methods are available?"
answer = multi_hop.query("ماهي طرق الدفع المتاحة؟")
print(answer)