
Self-Improving RAG Module


Table of Contents

  1. Overview
  2. Architecture & Design Philosophy
  3. Installation & Quick Start
  4. Data Classes Reference
  5. Configuration Reference
  6. Module: SelfIRAG
  7. Module: HyDERAG
  8. Module: RecursiveRAG
  9. Error Reference
  10. Performance & Tuning Guide
  11. Full Integration Examples

Overview

self_improving_rag is an advanced Retrieval-Augmented Generation (RAG) module that implements three distinct self-improvement strategies on top of any existing RAG backend. Rather than a single retrieve-and-generate pass, each strategy adds a control layer (self-evaluation, hypothetical-document search, or question decomposition) intended to make the pipeline more accurate, context-aware, and robust.

| Class | Strategy | Best For |
|---|---|---|
| SelfIRAG | Iterative self-evaluation and refinement loop | High-stakes Q&A where answer quality must be verified and improved |
| HyDERAG | Hypothetical Document Embeddings | Queries where the gap between question phrasing and document language is large |
| RecursiveRAG | Recursive question decomposition | Complex, multi-faceted questions with distinct sub-topics |

All three classes share a common interface contract: they accept any RAG backend and LLM, produce structured typed result objects, and are fully compatible with both synchronous and asynchronous execution environments.


Architecture & Design Philosophy

┌─────────────────────────────────────────────────────────────────────────┐
│                         self_improving_rag                               │
│                                                                          │
│   ┌──────────────────┐  ┌──────────────────┐  ┌─────────────────────┐  │
│   │    SelfIRAG      │  │    HyDERAG       │  │   RecursiveRAG      │  │
│   │                  │  │                  │  │                     │  │
│   │ Retrieve         │  │ Generate N       │  │  Complexity check   │  │
│   │    ↓             │  │  hypothetical    │  │       ↓             │  │
│   │ Generate         │  │  docs (parallel) │  │  Decompose into     │  │
│   │    ↓             │  │    ↓             │  │  sub-questions      │  │
│   │ Evaluate         │  │ Retrieve per doc │  │       ↓             │  │
│   │    ↓             │  │    ↓             │  │  Recurse each sub-  │  │
│   │ Refine query     │  │ Deduplicate      │  │  question (depth N) │  │
│   │    ↓             │  │    ↓             │  │       ↓             │  │
│   │ Repeat until     │  │ Re-rank & answer │  │  Merge sub-answers  │  │
│   │ threshold met    │  │                  │  │                     │  │
│   └──────────────────┘  └──────────────────┘  └─────────────────────┘  │
│                                                                          │
│   Shared utilities: _call_llm, _retrieve_safe, _deduplicate              │
└─────────────────────────────────────────────────────────────────────────┘

Key design decisions:

  • Backend-agnostic: All classes accept any rag_system object that exposes retrieve(query, top_k) and add_document(text, metadata). The internal _retrieve_safe normaliser handles List[Tuple], List[Dict], and {"results": [...]} output formats automatically.
  • LLM-agnostic: Any LLM object with a generate(prompt) -> str method is supported.
  • Graceful degradation: Every LLM call uses exponential back-off retries; failures fall back to sensible defaults rather than raising unhandled exceptions to the caller.
  • Typed results: All public methods return structured dataclass instances with .to_dict() serialization support.
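
To make the backend-agnostic contract concrete, here is a minimal sketch of how a normaliser in the spirit of _retrieve_safe could flatten the three output shapes into (text, score) pairs. The function name normalise_results, the dict keys text, content, and score, and the .text attribute on chunk objects are assumptions made for illustration; the module's internals may differ.

```python
from typing import Any, List, Tuple


def normalise_results(raw: Any) -> List[Tuple[str, float]]:
    """Flatten common retrieve() outputs into (text, score) pairs.

    Hypothetical helper: key names and attribute names are assumptions.
    """
    # Shape 3: {"results": [...]} wraps one of the list shapes.
    if isinstance(raw, dict):
        raw = raw.get("results", [])

    pairs: List[Tuple[str, float]] = []
    for item in raw or []:
        if isinstance(item, (tuple, list)) and len(item) == 2:
            # Shape 1: (chunk, score); the chunk may be an object or a string.
            chunk, score = item
            text = getattr(chunk, "text", None) or str(chunk)
        elif isinstance(item, dict):
            # Shape 2: {"text": ..., "score": ...}
            text = item.get("text") or item.get("content") or ""
            score = item.get("score", 0.0)
        else:
            # Unknown shape: keep the string form with a neutral score.
            text, score = str(item), 0.0
        if text:
            pairs.append((text, float(score)))
    return pairs
```

With a single internal shape, the rest of a pipeline can sort, deduplicate, and build prompts without caring which backend produced the results.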

Installation & Quick Start

# Install (shell)
pip install numpy fennec_community # only external dependency

# Usage (Python)
from fennec_community.rag.types.self_improving_rag import SelfIRAG, HyDERAG, RecursiveRAG
from fennec_community.rag.types.self_improving_rag import SelfRAGConfig, RecursiveRAGConfig

# ── SelfIRAG ──────────────────────────────────────
cfg = SelfRAGConfig(
    max_refinement_iterations=3,
    confidence_threshold=0.85,
    fast_mode=True
)
self_rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=cfg)
result = self_rag.query("What is deep learning?")
print(result.answer, f"confidence={result.confidence:.2f}")

# ── HyDERAG ───────────────────────────────────────
hyde = HyDERAG(rag_system=my_rag, llm=my_llm, num_hypothetical_docs=3)
result = hyde.query("What are the benefits of exercise?")
print(result.answer)

# ── RecursiveRAG ──────────────────────────────────
rec_cfg = RecursiveRAGConfig(max_depth=3, max_sub_questions=4)
rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=rec_cfg)
result = rec_rag.query("What are the causes and effects of the Industrial Revolution?")
print(result.answer)
print(result.sub_questions)
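
The snippets above assume that my_rag and my_llm already exist. As a rough sketch, the following toy objects satisfy the documented interface (retrieve(query, top_k), add_document(text, metadata), and generate(prompt)). The class names InMemoryRAG and EchoLLM are hypothetical, and the word-overlap scoring is only a stand-in for real vector search.

```python
from typing import Any, Dict, List, Optional, Tuple


class InMemoryRAG:
    """Toy backend (hypothetical): scores documents by word overlap."""

    def __init__(self) -> None:
        self._docs: List[Tuple[str, Dict[str, Any]]] = []

    def add_document(self, text: str, metadata: Optional[Dict[str, Any]] = None) -> int:
        self._docs.append((text, metadata or {}))
        return len(self._docs) - 1  # index of the stored document

    def retrieve(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        q_words = set(query.lower().split())
        scored = []
        for text, meta in self._docs:
            overlap = len(q_words & set(text.lower().split()))
            if overlap:
                scored.append(
                    {"text": text, "score": overlap / len(q_words), "metadata": meta}
                )
        scored.sort(key=lambda d: d["score"], reverse=True)
        return scored[:top_k]


class EchoLLM:
    """Toy LLM (hypothetical): returns a canned string instead of calling a model."""

    def generate(self, prompt: str) -> str:
        return f"[stub answer for a prompt of {len(prompt)} characters]"


my_rag = InMemoryRAG()
my_llm = EchoLLM()
my_rag.add_document("Deep learning uses multi-layered neural networks.", {"source": "demo"})
hits = my_rag.retrieve("what is deep learning", top_k=3)
```

Stubs like these are mainly useful for wiring tests; a production setup would pass a real vector store and model client exposing the same methods.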

Data Classes Reference

The following dataclasses are returned by the public API methods. They are importable directly from the package:

from fennec_community.rag.types.self_improving_rag import SelfRAGResult, HyDEResult, RecursiveRAGResult

SelfRAGResult

The structured return type of SelfIRAG.query() and SelfIRAG.aquery(). Contains the best answer found across all refinement iterations, along with full iteration history for auditing and debugging.

@dataclass
class SelfRAGResult:
    answer:     str
    confidence: float
    iterations: int
    history:    List[RefinementStep]

Fields

Field Type Description
answer str The highest-confidence answer produced across all iterations. This is the answer from the RefinementStep with the maximum .confidence, not necessarily the last iteration.
confidence float The geometric-mean confidence score [0.0, 1.0] of the best answer, computed as (relevance × accuracy × completeness × clarity)^0.25.
iterations int Total number of refinement iterations that were actually executed.
history List[RefinementStep] Full ordered list of all iteration records. Each RefinementStep has .iteration (int), .answer (str), .evaluation (EvaluationResult), and .confidence (float).

Properties

Property Type Description
final_answer str Alias for answer. Provided for backward compatibility with example notebooks.
refinement_steps int Alias for iterations.
initial_quality float Confidence score of the first iteration (history[0].confidence). Useful for measuring improvement delta.
final_quality float Confidence score of the last iteration (history[-1].confidence).
steps List[RefinementStep] Enriched history list where each step also carries .step, .action, and .reason attributes for human-readable logging.

Methods

Method Returns Description
to_dict() Dict[str, Any] Serialises the result to a JSON-safe dictionary. Includes answer, confidence, iterations, initial_quality, final_quality, and a compact history list.

Example

result = self_rag.query("Explain transformer attention")

print(result.answer)
print(f"Confidence: {result.confidence:.2%}")
print(f"Improved from {result.initial_quality:.2f} → {result.final_quality:.2f} over {result.iterations} iterations")

for step in result.steps:
    print(f"  Step {step.step}: {step.reason}")

import json
print(json.dumps(result.to_dict(), indent=2))

HyDEResult

The structured return type of HyDERAG.query().

@dataclass
class HyDEResult:
    answer:            str
    hypothetical_docs: List[str]
    num_retrieved:     int
    num_ranked:        int

Fields

Field Type Description
answer str The final generated answer, grounded in the retrieved and re-ranked real documents.
hypothetical_docs List[str] The list of synthetic "hypothetical" documents generated by the LLM. Useful for debugging and understanding what the LLM imagined the answer space to be.
num_retrieved int Total number of unique real documents retrieved across all search queries (original + hypothetical).
num_ranked int Number of documents passed to the final answer generation step after re-ranking (capped by max_final_docs).

Methods

Method Returns Description
to_dict() Dict[str, Any] Serialises to a JSON-safe dictionary with answer, hypothetical_docs_generated, num_retrieved, and num_ranked.

Example

result = hyde.query("How does photosynthesis work?")

print(result.answer)
print(f"Generated {len(result.hypothetical_docs)} hypothetical docs")
print(f"Retrieved {result.num_retrieved} unique real docs, used {result.num_ranked} for generation")

RecursiveRAGResult

The structured return type of RecursiveRAG.query().

@dataclass
class RecursiveRAGResult:
    answer:        str
    depth:         int
    decomposed:    bool
    sub_questions: List[str]
    sub_answers:   List[SubAnswer]

Fields

Field Type Description
answer str The final synthesised answer — either a direct answer (if question was simple) or a merged answer from all sub-answers.
depth int The recursion depth at which this result was produced. 0 = top-level call.
decomposed bool True if the question was decomposed into sub-questions; False if answered directly.
sub_questions List[str] The sub-questions generated during decomposition. Empty if decomposed=False.
sub_answers List[SubAnswer] Internal SubAnswer records for each sub-question, each carrying .question, .answer, and .depth. Empty if decomposed=False.

Methods

Method Returns Description
to_dict() Dict[str, Any] Serialises to a JSON-safe dictionary with answer, decomposed, depth, and sub_questions.

Example

result = rec_rag.query("What are the causes, effects, and legacy of World War I?")

print(result.answer)
print(f"Decomposed: {result.decomposed}")
if result.decomposed:
    for i, q in enumerate(result.sub_questions, 1):
        print(f"  Sub-question {i}: {q}")

print(result.to_dict())

Configuration Reference

SelfRAGConfig

Controls the behaviour of the SelfIRAG iterative refinement loop.

from fennec_community.rag.types.self_improving_rag import SelfRAGConfig

config = SelfRAGConfig(
    max_refinement_iterations=3,
    confidence_threshold=0.80,
    enable_answer_refinement=True,
    enable_retrieval_refinement=True,
    llm_retries=2,
    fast_mode=True,
    skip_refinement_on_high_confidence=True
)

Fields

Field Type Default Description
max_refinement_iterations int 3 Hard cap on the number of retrieve-generate-evaluate cycles. The loop exits early if confidence_threshold is reached before this limit.
confidence_threshold float 0.80 Minimum acceptable confidence score [0.0, 1.0]. Once any iteration's geometric-mean confidence meets or exceeds this value, the loop stops immediately and returns that answer.
enable_answer_refinement bool True When True, the current answer is rewritten at the end of each non-final iteration using the LLM's identified issues and suggestions. Disable to skip answer rewriting and only adapt the retrieval query.
enable_retrieval_refinement bool True When True, the retrieval query is expanded with keyword suggestions from the evaluator before the next iteration. Helps surface documents that the original query missed.
llm_retries int 2 Number of retry attempts (with exponential back-off) for every LLM call in the pipeline. Total attempts = llm_retries + 1.
fast_mode bool True When True, generation and evaluation are combined into a single LLM call per iteration (one call instead of two for that stage). When False, generation and evaluation are separate calls for potentially higher accuracy. If the combined response cannot be parsed as JSON, the iteration falls back to the two-step path.
skip_refinement_on_high_confidence bool True Reserved flag. When True, a very high first-iteration confidence may short-circuit all subsequent iterations. (Implemented via the confidence_threshold check in the main loop.)
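
The llm_retries setting can be pictured with a small retry helper. This is a sketch under stated assumptions, not the module's _call_llm: the function name call_with_retries, the base_delay parameter, and the doubling schedule are illustrative, and LLMCallError is redefined locally so the snippet runs on its own.

```python
import time
from typing import Callable, Optional


class LLMCallError(Exception):
    """Local stand-in for the module's exception of the same name."""


def call_with_retries(generate: Callable[[str], str], prompt: str,
                      retries: int = 2, base_delay: float = 0.5) -> str:
    """Try generate(prompt) up to retries + 1 times, doubling the wait each time."""
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return generate(prompt)
        except Exception as exc:  # broad on purpose: any provider error is retried
            last_error = exc
            if attempt < retries:
                time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise LLMCallError(f"LLM call failed after {retries + 1} attempts: {last_error}")


# Demo: a callable that fails twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


answer = call_with_retries(flaky, "hello", retries=2, base_delay=0.0)
```

With retries=2 the helper makes at most three attempts, which matches the "Total attempts = llm_retries + 1" rule in the table.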

RecursiveRAGConfig

Controls the decomposition behaviour of RecursiveRAG.

from fennec_community.rag.types.self_improving_rag import RecursiveRAGConfig

config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=40,
    complexity_keywords=["and", "or", "causes", "compare", "why", "how"],
    llm_retries=2
)

Fields

Field Type Default Description
max_depth int 3 Maximum recursion depth. Prevents infinite decomposition loops. A query at depth max_depth is always answered directly, regardless of its apparent complexity.
max_sub_questions int 4 Maximum number of sub-questions generated per decomposition. The LLM is instructed to produce at most this many sub-questions, and any excess is truncated.
min_query_length int 40 Character-length threshold below which a query is never decomposed — short queries are assumed simple.
complexity_keywords List[str] See below Arabic and English marker words that indicate a complex, multi-part query. A query is considered complex if its length is at least min_query_length characters AND it contains at least two of these keywords.
llm_retries int 2 Retry attempts for each LLM call (decomposition, direct answering, and merging).

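Default complexity keywords: و (and), أو (or), ثم (then), بعد (after), قبل (before), أسباب (causes), نتائج (results), مقارنة (comparison), الفرق (difference), العلاقة (relationship), التأثير (impact), كيف (how), لماذا (why), متى (when), أين (where) for Arabic; and, or, causes, effects, compare, difference, relationship, impact, how, why, when for English.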


Module: SelfIRAG

SelfIRAG implements the Self-RAG paradigm: a closed-loop pipeline that retrieves documents, generates an answer, evaluates that answer on four dimensions (relevance, accuracy, completeness, clarity), and then refines both the answer and the retrieval query before repeating. The iteration with the highest confidence score is returned as the final result.

Retrieve → Generate → Evaluate → Refine Answer & Query → Repeat

The confidence score uses a geometric mean of the four evaluation dimensions, meaning a single weak dimension (e.g., low relevance) significantly lowers the overall score — a stricter measure than arithmetic mean.
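
A short worked example shows why the geometric mean is stricter. The function name geometric_confidence and the sample scores are illustrative; the formula is the fourth root of the product of the four dimensions, as stated in the SelfRAGResult reference.

```python
def geometric_confidence(relevance: float, accuracy: float,
                         completeness: float, clarity: float) -> float:
    """Fourth root of the product of the four scores, each in [0.0, 1.0]."""
    product = relevance * accuracy * completeness * clarity
    return product ** 0.25


# Four equal scores: the geometric mean equals that score.
balanced = geometric_confidence(0.8, 0.8, 0.8, 0.8)

# Three strong scores and one weak one.
one_weak = geometric_confidence(0.95, 0.95, 0.95, 0.2)
arithmetic_mean = (0.95 + 0.95 + 0.95 + 0.2) / 4

print(f"balanced={balanced:.3f} one_weak={one_weak:.3f} arithmetic={arithmetic_mean:.3f}")
```

In the second case the arithmetic mean is about 0.76, while the geometric mean drops to roughly 0.64, so a single weak dimension can keep an answer below a 0.80 threshold.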


SelfIRAG Constructor

SelfIRAG(
    rag_system: Any,
    llm: Any,
    config: Optional[SelfRAGConfig] = None
)

Initialises the Self-RAG system. Does not perform any retrieval or LLM calls at construction time.

Parameters

Parameter Type Default Description
rag_system Any required The underlying RAG backend. Must expose retrieve(query: str, top_k: int) -> Any and add_document(text: str, metadata: dict) -> Any. The retrieve output format is normalised automatically — supports List[Tuple[DocumentChunk, float]], List[Dict], and {"results": [...]}.
llm Any required Any LLM object with a synchronous generate(prompt: str) -> str method. An optional generate_async method is used automatically if present. Passing a falsy value raises ValueError.
config SelfRAGConfig | None None Configuration object. A default SelfRAGConfig() is created if not provided.

Raises

  • ValueError — if llm is falsy (None, empty, etc.).

Example

from fennec_community.rag.types.self_improving_rag import SelfIRAG, SelfRAGConfig

config = SelfRAGConfig(
    max_refinement_iterations=4,
    confidence_threshold=0.85,
    fast_mode=True,
    llm_retries=3
)

rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=config)

SelfIRAG.query

query(query: str, context: Optional[Dict[str, Any]] = None) -> SelfRAGResult

The primary public method. Runs the full self-improvement loop: retrieves documents, generates an answer, evaluates it, refines both the answer and retrieval query, and repeats until either the confidence threshold is met or the iteration cap is reached. Returns the highest-confidence answer found across all iterations.

Parameters

Parameter Type Default Description
query str required The natural-language question to answer. Works in Arabic and English. The answer language mirrors the query language per the embedded prompt instructions.
context Dict[str, Any] | None None Optional extra context dictionary. Reserved for future use: the current implementation forwards it to the backend's retrieval call only if the backend's signature supports it.

Returns

SelfRAGResult — Contains the best answer, its confidence score, the number of iterations run, and the full iteration history.

Iteration loop logic:

  1. Retrieve top-5 documents for current_query (starts as the original query).
  2. If fast_mode=True: generate answer and evaluate in one LLM call. If False: two separate LLM calls.
  3. Record the RefinementStep in history.
  4. If confidence >= confidence_threshold: stop immediately and return the best step.
  5. If more iterations remain and enable_answer_refinement=True: rewrite the answer using identified issues and suggestions.
  6. If enable_retrieval_refinement=True and suggestions exist: append top-2 suggestion keywords to current_query for the next retrieval.
  7. After all iterations: select the step with the highest confidence as the final answer.
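
The control flow of the steps above can be sketched as follows. This is a simplified illustration, not the module's code: answer_and_score is a hypothetical callable standing in for retrieve + generate + evaluate, the Step class is a reduced stand-in for RefinementStep, and the answer-rewriting step (step 5) is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    iteration: int
    answer: str
    confidence: float


def refine_loop(
    answer_and_score: Callable[[str], Tuple[str, float, List[str]]],
    query: str,
    max_iterations: int = 3,
    threshold: float = 0.80,
) -> Tuple[Step, List[Step]]:
    """Run up to max_iterations passes and return (best_step, history)."""
    history: List[Step] = []
    current_query = query
    for i in range(1, max_iterations + 1):
        answer, confidence, suggestions = answer_and_score(current_query)
        history.append(Step(i, answer, confidence))
        if confidence >= threshold:
            break  # early exit: threshold met
        if suggestions:
            # Expand the retrieval query with the top-2 suggested keywords.
            current_query = f"{current_query} {' '.join(suggestions[:2])}"
    best = max(history, key=lambda s: s.confidence)
    return best, history


# Demo with scripted passes: the second draft scores highest.
scripted = iter([
    ("draft A", 0.55, ["causes"]),
    ("draft B", 0.72, ["effects"]),
    ("draft C", 0.68, []),
])
seen_queries: List[str] = []


def fake_pass(q: str) -> Tuple[str, float, List[str]]:
    seen_queries.append(q)
    return next(scripted)


best, history = refine_loop(fake_pass, "climate change")
```

Because no pass reaches the threshold, all three iterations run and the best step (the second) is returned rather than the last one.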

Example

# Arabic query: "What are the causes of climate change?"
result = rag.query("ما هي أسباب تغير المناخ؟")

print(result.answer)
print(f"Confidence: {result.confidence:.2%}")
print(f"Ran {result.iterations} iteration(s)")
print(f"Quality improved: {result.initial_quality:.2f} → {result.final_quality:.2f}")

Performance notes:

  • With fast_mode=True: each iteration uses 1 LLM call (generate+evaluate combined).
  • With fast_mode=False: each iteration uses 2 LLM calls (generate, then evaluate).
  • With enable_answer_refinement=True: each non-final iteration adds 1 more LLM call (refine).
  • Total LLM calls (worst case, fast_mode=False, refinement enabled): iterations × 3 − 1, since the final iteration is not followed by a refine call.

SelfIRAG.aquery

async aquery(query: str, context: Optional[Dict[str, Any]] = None) -> SelfRAGResult

Async wrapper around query. Runs the entire synchronous pipeline in a thread pool via asyncio.to_thread, making it safe to call from async contexts (FastAPI endpoints, async scripts, Jupyter async cells) without blocking the event loop.
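
The thread-offloading pattern described here can be sketched in isolation. The functions slow_sync_query and aquery below are stand-ins written for this example, not the library's methods; asyncio.to_thread itself is part of the standard library from Python 3.9.

```python
import asyncio
import time


def slow_sync_query(question: str) -> str:
    """Stand-in for a blocking query() call."""
    time.sleep(0.05)
    return f"answer to: {question}"


async def aquery(question: str) -> str:
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so the event loop stays free to serve other tasks.
    return await asyncio.to_thread(slow_sync_query, question)


async def main() -> list:
    # The two queries overlap instead of running back to back.
    return await asyncio.gather(aquery("first"), aquery("second"))


results = asyncio.run(main())
print(results)
```

Note that this pattern moves the work off the event loop but does not make the underlying LLM calls themselves asynchronous.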

Parameters

Parameter Type Default Description
query str required The natural-language question.
context Dict[str, Any] | None None Optional extra context.

Returns

SelfRAGResult — Identical to query().

Example

# FastAPI endpoint
from fastapi import FastAPI
app = FastAPI()

@app.get("/answer")
async def answer(q: str):
    result = await rag.aquery(q)
    return result.to_dict()

# Async script
import asyncio
result = asyncio.run(rag.aquery("Explain quantum entanglement"))
print(result.answer)

SelfIRAG.add_document

add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any

Adds a document to the underlying RAG backend's vector store. This is a convenience pass-through — equivalent to calling rag_system.add_document(text, metadata) directly.

Parameters

Parameter Type Default Description
text str required The raw document text to index.
metadata Dict[str, Any] | None None Optional metadata dictionary (e.g., {"source": "wiki", "date": "2025-01"}). Defaults to an empty dict {} if None.

Returns

Any — The return value of the backend's add_document method (backend-specific; often a document ID or chunk count).

Example

rag.add_document(
    "Deep learning is a subset of machine learning that uses neural networks.",
    metadata={"source": "textbook", "chapter": 3}
)

SelfIRAG.get_stats

get_stats() -> dict

Returns the current runtime configuration as a plain dictionary. Useful for logging, monitoring dashboards, and debugging configuration state.

Parameters

None.

Returns

dict with the following keys:

Key Type Description
max_refinement_iterations int Hard cap on iterations from SelfRAGConfig.
confidence_threshold float Target confidence score from SelfRAGConfig.
enable_answer_refinement bool Whether answer rewriting is enabled.
enable_retrieval_refinement bool Whether query expansion is enabled.
llm_retries int Retry count per LLM call.
fast_mode bool Whether single-call generate+evaluate mode is active.

Example

stats = rag.get_stats()
print(stats)
# {
#   'max_refinement_iterations': 3,
#   'confidence_threshold': 0.85,
#   'enable_answer_refinement': True,
#   'enable_retrieval_refinement': True,
#   'llm_retries': 2,
#   'fast_mode': True
# }

Module: HyDERAG

HyDERAG implements Hypothetical Document Embeddings (HyDE) — a retrieval strategy that bridges the semantic gap between a question and its answers. Instead of searching directly with the question, it asks the LLM to generate N synthetic documents that would answer the query, then retrieves real documents using each synthetic doc as a search vector. The intuition is that a hypothetical answer is semantically closer to real answer documents than the question itself.

Query → Generate N hypothetical docs (parallel)
      → Retrieve real docs per hypothetical + original query
      → Deduplicate → Re-rank by score → Generate final answer

Hypothetical documents are generated in parallel using a thread pool, so generation takes roughly as long as ceil(N / max_workers) sequential LLM calls rather than N.
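
A minimal sketch of thread-pool generation is shown below. The function name generate_hypothetical_docs and the prompt wording are assumptions; the 40-character minimum follows the pipeline description for HyDERAG.query, and fake_llm replaces a real model so the snippet runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def generate_hypothetical_docs(generate: Callable[[str], str], query: str,
                               n: int = 3, max_workers: int = 4,
                               min_chars: int = 40) -> List[str]:
    """Generate n drafts concurrently and drop any shorter than min_chars."""
    if n <= 0:
        return []
    prompts = [
        f"Write a factual paragraph (variant {i + 1}) that answers: {query}"
        for i in range(n)
    ]
    workers = max(1, min(max_workers, n))  # never more threads than drafts
    with ThreadPoolExecutor(max_workers=workers) as pool:
        drafts = list(pool.map(generate, prompts))  # preserves prompt order
    return [d.strip() for d in drafts if len(d.strip()) >= min_chars]


def fake_llm(prompt: str) -> str:
    # Variant 2 returns a too-short draft to exercise the length filter.
    if "variant 2" in prompt:
        return "Too short."
    return "Photosynthesis converts light energy into chemical energy in plants."


docs = generate_hypothetical_docs(fake_llm, "How does photosynthesis work?", n=3)
```

Threads suit this workload because each call spends most of its time waiting on network I/O rather than computing.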


HyDERAG Constructor

HyDERAG(
    rag_system: Any,
    llm: Any,
    num_hypothetical_docs: int = 3,
    use_original_query: bool = True,
    top_k: int = 5,
    max_final_docs: int = 10,
    llm_retries: int = 2,
    max_workers: int = 4
)

Parameters

Parameter Type Default Description
rag_system Any required The underlying RAG backend. Must expose retrieve(query, top_k) and add_document(text, metadata).
llm Any required LLM object with generate(prompt) -> str. Passing falsy raises ValueError.
num_hypothetical_docs int 3 Number of synthetic hypothetical documents to generate per query. More docs increase recall at the cost of more LLM calls and retrieval passes.
use_original_query bool True When True, the original query is also used as a search vector alongside the hypothetical docs. Recommended — ensures direct matches are never missed.
top_k int 5 Number of real documents to retrieve per search query (original + each hypothetical doc).
max_final_docs int 10 Maximum number of unique documents (after deduplication and re-ranking) passed to the final answer generation step.
llm_retries int 2 Retry attempts per LLM call.
max_workers int 4 Maximum number of parallel threads for hypothetical document generation. The effective worker count is min(max_workers, num_hypothetical_docs).

Raises

  • ValueError — if llm is falsy.

Example

from fennec_community.rag.types.self_improving_rag import HyDERAG

hyde = HyDERAG(
    rag_system=my_rag,
    llm=my_llm,
    num_hypothetical_docs=4,
    use_original_query=True,
    top_k=7,
    max_final_docs=12,
    max_workers=4
)

HyDERAG.query

query(query: str, context: Optional[Dict[str, Any]] = None) -> HyDEResult

Executes the full HyDE retrieval-generation pipeline for a given query.

Parameters

Parameter Type Default Description
query str required The natural-language question.
context Dict[str, Any] | None None Optional extra context (reserved for future use).

Returns

HyDEResult — Contains the generated answer, the list of hypothetical documents, and retrieval statistics.

Pipeline steps:

  1. Hypothetical document generation (parallel): The LLM is prompted to write a factual paragraph that would perfectly answer the query — as if authored by a domain expert, without referencing the question itself. All num_hypothetical_docs are generated concurrently. Any generation that produces fewer than 40 characters is discarded.
  2. Multi-source retrieval: Real documents are retrieved for each hypothetical doc (and for the original query if use_original_query=True). Failed retrievals for individual queries are logged and skipped rather than aborting the pipeline.
  3. Deduplication: Documents are de-duplicated using an MD5 hash of the first 120 characters. When the same document appears multiple times (via different search queries), the copy with the highest score is kept.
  4. Re-ranking: Unique documents are sorted by descending score and trimmed to max_final_docs.
  5. Answer generation: The top-5 re-ranked documents are used to build a context, and the LLM generates the final answer with strict anti-hallucination instructions (answer only from context; respond in the same language as the query).
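
Steps 3 and 4 can be sketched as follows. The function name dedupe_and_rank and the (text, score) input shape are assumptions for illustration; the MD5 fingerprint over the first 120 characters and the keep-highest-score rule follow the description above. MD5 serves here only as a cheap fingerprint, not for security.

```python
import hashlib
from typing import Dict, List, Tuple


def dedupe_and_rank(results: List[Tuple[str, float]],
                    max_final_docs: int = 10,
                    prefix_chars: int = 120) -> List[Tuple[str, float]]:
    """Keep the best-scoring copy of each document, then sort by score."""
    best: Dict[str, Tuple[str, float]] = {}
    for text, score in results:
        key = hashlib.md5(text[:prefix_chars].encode("utf-8")).hexdigest()
        if key not in best or score > best[key][1]:
            best[key] = (text, score)
    ranked = sorted(best.values(), key=lambda pair: pair[1], reverse=True)
    return ranked[:max_final_docs]


pooled = [
    ("Exercise releases endorphins.", 0.71),
    ("Sleep supports memory consolidation.", 0.64),
    ("Exercise releases endorphins.", 0.88),  # same doc found via another query
]
ranked = dedupe_and_rank(pooled, max_final_docs=10)
```

One consequence of hashing only a prefix is that two different documents sharing the same first 120 characters are treated as duplicates.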

Fallback behaviour: If all retrieval attempts fail (no documents returned), the answer is "There is insufficient information.".

Example

# Arabic query: "What are the benefits of exercise for mental health?"
result = hyde.query("ما هي فوائد ممارسة الرياضة على الصحة النفسية؟")

print(result.answer)
print(f"\nGenerated {len(result.hypothetical_docs)} hypothetical documents:")
for i, doc in enumerate(result.hypothetical_docs, 1):
    print(f"  [{i}] {doc[:120]}...")
print(f"\nRetrieved: {result.num_retrieved} unique docs → used top {result.num_ranked}")

When to use HyDE vs SelfIRAG:

  • Use HyDE when queries are phrased very differently from document language (e.g., a question in a different register or vocabulary than the indexed knowledge base).
  • Use SelfIRAG when you need iterative quality verification and the LLM should evaluate and improve its own answers.

HyDERAG.add_document

add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any

Adds a document to the underlying RAG backend. Convenience pass-through identical in behaviour to SelfIRAG.add_document.

Parameters

Parameter Type Default Description
text str required Raw document text to index.
metadata Dict[str, Any] | None None Optional metadata. Defaults to {} if None.

Returns

Any — Backend-specific return value.

Example

hyde.add_document(
    "Exercise releases endorphins, which are natural mood elevators...",
    metadata={"source": "health_journal", "topic": "exercise"}
)

Module: RecursiveRAG

RecursiveRAG handles complex, multi-faceted queries by decomposing them into smaller, self-contained sub-questions and answering each one independently (recursively, if needed), then synthesising all sub-answers into a single coherent final answer.

Query → Complexity check
          ↓ (complex)              ↓ (simple or max_depth)
     Decompose into            Answer directly
     N sub-questions          (retrieve + generate)
          ↓
     Recurse each sub-question
          ↓
     Merge sub-answers into final answer

Complexity detection uses a dual heuristic: the query must both be at least min_query_length characters long and contain at least two keywords from complexity_keywords. If either condition fails, the query is answered directly.


RecursiveRAG Constructor

RecursiveRAG(
    rag_system: Any,
    llm: Any,
    config: Optional[RecursiveRAGConfig] = None
)

Parameters

Parameter Type Default Description
rag_system Any required The underlying RAG backend with retrieve(query, top_k) and add_document(text, metadata).
llm Any required LLM object with generate(prompt) -> str. Falsy values raise ValueError.
config RecursiveRAGConfig | None None Decomposition configuration. A default RecursiveRAGConfig() is created if not provided.

Raises

  • ValueError — if llm is falsy.

Example

from fennec_community.rag.types.self_improving_rag import RecursiveRAG, RecursiveRAGConfig

config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=50,
    llm_retries=2
)

rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=config)

RecursiveRAG.query

query(
    query: str,
    context: Optional[Dict[str, Any]] = None,
    _depth: int = 0
) -> RecursiveRAGResult

Recursively answers the query. Simple queries (by length or keyword count) are answered directly via a single retrieve-and-generate pass. Complex queries are decomposed into sub-questions, each of which is recursively answered (possibly decomposed further up to max_depth), and the sub-answers are merged into a final answer via an LLM synthesis call.

⚠️ Important: The _depth parameter is an internal recursion counter. Do not set it manually when calling this method from application code. Always call with only query (and optionally context).

Parameters

Parameter Type Default Description
query str required The natural-language question. Can be simple or complex, in Arabic or English.
context Dict[str, Any] | None None Optional extra context forwarded through all recursive calls.
_depth int 0 Internal use only. The current recursion depth. Do not set this parameter.

Returns

RecursiveRAGResult — Contains the final answer, the decomposition trace (sub_questions, sub_answers), and whether the question was decomposed.

Execution decision tree:

query(q, depth=D)
  ├─ IF D >= max_depth OR q is not complex → _answer_directly(q, depth=D)
  ├─ decompose(q) → sub_questions
  │    ├─ IF len(sub_questions) <= 1 → _answer_directly(q, depth=D)
  │    └─ FOR each sq IN sub_questions:
  │           sub_result = query(sq, depth=D+1)   ← recursive call
  └─ merge(q, all sub_results) → final answer

Complexity check details: A query is considered complex if len(query) >= min_query_length AND at least 2 words from complexity_keywords appear in the lowercased query.
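
The check can be sketched in a few lines. This version uses only the English default keywords and matches whole words; whether the module matches whole words or substrings is not stated here, so treat that choice, and the name is_complex, as assumptions.

```python
from typing import Sequence

DEFAULT_KEYWORDS = ("and", "or", "causes", "effects", "compare", "difference",
                    "relationship", "impact", "how", "why", "when")


def is_complex(query: str, min_query_length: int = 40,
               keywords: Sequence[str] = DEFAULT_KEYWORDS) -> bool:
    """True when the query is long enough and contains at least two keywords."""
    if len(query) < min_query_length:
        return False
    # Lowercase and strip basic punctuation before splitting into words.
    words = set(query.lower().replace("?", " ").replace(",", " ").split())
    hits = sum(1 for kw in keywords if kw in words)
    return hits >= 2


short_query = is_complex("What is photosynthesis?")
multi_part = is_complex("What are the causes and effects of the Industrial Revolution?")
long_but_simple = is_complex("Please describe the structure of a eukaryotic cell in detail")
```

The first query fails the length test, the second passes both tests, and the third is long enough but contains no marker words.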

Decomposition fallback: If the LLM returns malformed JSON for the sub-question list, a regex fallback parser extracts numbered or bulleted items from the raw LLM response.
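
A possible shape for such a fallback parser is sketched below. The function name parse_sub_questions and the exact regular expression are assumptions; the module's own pattern may accept different list markers.

```python
import json
import re
from typing import List

# Matches lines such as "1. text", "2) text", "- text", "* text", "• text".
_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.+?)\s*$")


def parse_sub_questions(raw: str, max_sub_questions: int = 4) -> List[str]:
    """Parse a JSON list of strings; fall back to numbered or bulleted lines."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, list):
            items = [str(q).strip() for q in parsed if str(q).strip()]
            return items[:max_sub_questions]
    except json.JSONDecodeError:
        pass  # not valid JSON: fall through to the line-based parser
    items = [m.group(1) for line in raw.splitlines() if (m := _ITEM.match(line))]
    return items[:max_sub_questions]


clean = parse_sub_questions('["What caused X?", "What did X change?"]')
messy = parse_sub_questions(
    "Sure, here are the questions:\n"
    "1. What caused X?\n"
    "2) What did X change?\n"
    "- How is X remembered?"
)
```

Both inputs yield a plain list of question strings, truncated to max_sub_questions, so the caller does not need to know which parser produced it.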

Direct answer fallback: If retrieval returns no documents, the answer is "There is not enough information." and the result has decomposed=False.

Merge fallback: If the merge LLM call fails, sub-answers are concatenated with double newlines as the final answer.
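
Putting the decision tree and the merge fallback together gives roughly the following control flow. All five callables are hypothetical stand-ins for the module's internal steps, and the function name answer_recursively is illustrative.

```python
from typing import Callable, List


def answer_recursively(
    question: str,
    is_complex: Callable[[str], bool],
    decompose: Callable[[str], List[str]],
    answer_directly: Callable[[str], str],
    merge: Callable[[str, List[str]], str],
    max_depth: int = 3,
    depth: int = 0,
) -> str:
    """Mirror the decision tree: direct answer, or decompose, recurse, merge."""
    if depth >= max_depth or not is_complex(question):
        return answer_directly(question)
    sub_questions = decompose(question)
    if len(sub_questions) <= 1:
        return answer_directly(question)
    sub_answers = [
        answer_recursively(sq, is_complex, decompose, answer_directly,
                           merge, max_depth, depth + 1)
        for sq in sub_questions
    ]
    try:
        return merge(question, sub_answers)
    except Exception:
        # Merge fallback: join the sub-answers with blank lines.
        return "\n\n".join(sub_answers)


def failing_merge(question: str, sub_answers: List[str]) -> str:
    raise RuntimeError("merge LLM unavailable")


result = answer_recursively(
    "causes and effects",
    is_complex=lambda q: " and " in q,
    decompose=lambda q: q.split(" and "),
    answer_directly=lambda q: f"A({q})",
    merge=failing_merge,
)
```

In this demo the merge step always fails, so the result is the two direct answers joined by a blank line.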

Example

# Complex query — will be decomposed
result = rec_rag.query(
    "What are the causes and effects of climate change, "
    "and how do they compare to the impacts of deforestation?"
)
print(result.answer)
print(f"Decomposed: {result.decomposed}")
for i, q in enumerate(result.sub_questions, 1):
    print(f"  Sub-question {i}: {q}")

# Simple query — answered directly
result = rec_rag.query("What is photosynthesis?")
print(result.answer)
print(f"Decomposed: {result.decomposed}")  # False

Arabic example

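# Arabic query: "What are the causes and consequences of the Industrial Revolution, and how did it affect society?"
result = rec_rag.query(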
    "ما هي أسباب ونتائج الثورة الصناعية وكيف أثرت على المجتمع؟"
)
print(result.answer)
print(result.sub_questions)

When to use RecursiveRAG vs SelfIRAG:

  • Use RecursiveRAG for broad, multi-part questions where different parts require different retrieved context.
  • Use SelfIRAG for single-topic questions where the answer quality needs iterative verification.
  • For maximum robustness, consider chaining: RecursiveRAG for decomposition, with each leaf answered by SelfIRAG.

RecursiveRAG.add_document

add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any

Adds a document to the underlying RAG backend. Convenience pass-through identical in behaviour to SelfIRAG.add_document.

Parameters

Parameter Type Default Description
text str required Raw document text to index.
metadata Dict[str, Any] | None None Optional metadata dictionary. Defaults to {} if None.

Returns

Any — Backend-specific return value.

Example

rec_rag.add_document(
    "The Industrial Revolution began in Britain in the late 18th century...",
    metadata={"source": "history_encyclopedia", "era": "18th_century"}
)

Error Reference

All custom exceptions are importable from the module:

from fennec_community.rag.types.self_improving_rag import LLMCallError, RetrievalError, RAGSystemError

| Exception | Parent | When Raised |
|---|---|---|
| RAGSystemError | Exception | Base class for all module errors. |
| LLMCallError | RAGSystemError | All retry attempts for an LLM generate() call have been exhausted. Carries the original exception message. |
| RetrievalError | RAGSystemError | The underlying RAG backend's retrieve() call raised an exception. |

Error handling in practice:

The public methods (SelfIRAG.query, SelfIRAG.aquery, HyDERAG.query, RecursiveRAG.query) are designed not to raise for end users. LLMCallError and RetrievalError exceptions are caught internally, logged, and converted to graceful fallback responses (e.g., "error in generating answer", "There is not enough information."). The exception classes are exposed for downstream integrations that implement their own error handling at a lower level.

Example of custom error handling at LLM level:

from fennec_community.rag.types.self_improving_rag import _call_llm, LLMCallError

try:
    response = _call_llm(my_llm, "Your prompt here", retries=3)
except LLMCallError as e:
    print(f"LLM permanently unavailable: {e}")
    # implement your own fallback

Performance & Tuning Guide

Choosing the Right Strategy

| Scenario | Recommended Class | Config Tips |
|---|---|---|
| Single-topic Q&A, quality-critical | SelfIRAG | fast_mode=True, max_refinement_iterations=2-4, confidence_threshold=0.80 |
| Cross-domain or vocabulary-gap queries | HyDERAG | num_hypothetical_docs=3-5, use_original_query=True |
| Multi-part complex questions | RecursiveRAG | max_depth=2-3, max_sub_questions=3-4 |
| Multi-part + quality verification | Chain RecursiveRAG → SelfIRAG per leaf | Tune each independently |

LLM Call Budget Estimates

| Class | Min calls/query | Max calls/query | Notes |
|---|---|---|---|
| SelfIRAG (fast_mode=True) | 1 | iterations × 2 | +1/iter for answer refinement |
| SelfIRAG (fast_mode=False) | 2 | iterations × 3 | generate + evaluate + refine |
| HyDERAG | N + 1 | N + 1 | N hypothetical docs + 1 final answer |
| RecursiveRAG | 1 | exponential | Worst case grows roughly as max_sub_questions ^ max_depth |
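The upper bounds in the table above can be turned into a quick budgeting helper. This is a rough sketch: the function names are hypothetical, the SelfIRAG and HyDERAG formulas are copied from the table, and the RecursiveRAG figure rests on the assumption that every question splits into `max_sub_questions` children at every level, which counts leaf questions only and is not an exact call count.

```python
def max_llm_calls_self_rag(max_refinement_iterations: int, fast_mode: bool = True) -> int:
    """Upper bound from the table: iterations x 2 (fast) or iterations x 3 (full)."""
    per_iteration = 2 if fast_mode else 3
    return max_refinement_iterations * per_iteration


def llm_calls_hyde(num_hypothetical_docs: int) -> int:
    """N hypothetical documents plus one final answer."""
    return num_hypothetical_docs + 1


def max_leaf_questions_recursive(max_depth: int, max_sub_questions: int) -> int:
    """ASSUMPTION: a full tree, so leaves = max_sub_questions ** max_depth.

    Decomposition and synthesis calls come on top of this number.
    """
    return max_sub_questions ** max_depth


print("SelfIRAG fast, 4 iterations:", max_llm_calls_self_rag(4, fast_mode=True))
print("SelfIRAG full, 4 iterations:", max_llm_calls_self_rag(4, fast_mode=False))
print("HyDERAG, 4 hypothetical docs:", llm_calls_hyde(4))
print("RecursiveRAG leaves, depth 3 x 4:", max_leaf_questions_recursive(3, 4))
```

The last figure shows why depth and branching are the settings to watch: one extra level multiplies the leaf count by `max_sub_questions`.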

Reducing Latency

  • SelfIRAG: Set fast_mode=True (default). Set skip_refinement_on_high_confidence=True (default). Use aquery() in async environments.
  • HyDERAG: Hypothetical docs are already generated in parallel. Increase max_workers for very high num_hypothetical_docs values.
  • RecursiveRAG: Keep max_depth ≤ 3 and max_sub_questions ≤ 4. Simple queries bypass decomposition entirely.
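The aquery() advice above helps most when several queries are in flight at once. The sketch below shows the pattern with `asyncio.gather`. `StubRAG` is a hypothetical stand-in that only mimics the shape of `aquery()` and returns a plain string, whereas the real method returns a result object. Whether real calls overlap in practice depends on how the backend and LLM client perform their I/O.

```python
import asyncio
import time


class StubRAG:
    """Hypothetical stand-in for SelfIRAG; only the aquery() shape is mimicked."""

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(0.1)  # simulated LLM / retrieval latency
        return f"answer to: {question}"


async def answer_all(rag, questions):
    # gather() schedules every aquery() at once and preserves input order.
    return await asyncio.gather(*(rag.aquery(q) for q in questions))


questions = ["What is RAG?", "What is HyDE?", "What is query decomposition?"]
start = time.perf_counter()
answers = asyncio.run(answer_all(StubRAG(), questions))
elapsed = time.perf_counter() - start

for answer in answers:
    print(answer)
print(f"elapsed: {elapsed:.2f}s for {len(questions)} queries")
```

With the stub, total time stays close to a single simulated call instead of the sum of all three, which is the effect to look for when swapping in a real instance.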

Improving Answer Quality

  • SelfIRAG: Increase max_refinement_iterations and lower confidence_threshold for harder queries. Keep enable_answer_refinement=True and enable_retrieval_refinement=True.
  • HyDERAG: Increase num_hypothetical_docs for broader semantic coverage. Set use_original_query=True (default) to ensure direct keyword matches are not missed.
  • RecursiveRAG: Expand complexity_keywords for your domain. Lower min_query_length if short complex queries are common in your use case.
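To see how complexity_keywords and min_query_length interact, consider the toy heuristic below. It is an illustration only. It assumes a query counts as complex when it is at least `min_query_length` characters long and contains at least one keyword, and the default keyword list is invented for the example. The module's actual check may combine these signals differently.

```python
import re
from typing import Iterable

# Hypothetical defaults for illustration; the module's real list may differ.
DEFAULT_KEYWORDS = ("and", "compare", "causes", "effects", "versus")


def looks_complex(
    query: str,
    keywords: Iterable[str] = DEFAULT_KEYWORDS,
    min_query_length: int = 40,
) -> bool:
    """ASSUMED rule: long enough AND contains at least one single-word keyword."""
    if len(query) < min_query_length:
        return False
    words = set(re.findall(r"\w+", query.lower()))
    return any(keyword.lower() in words for keyword in keywords)


domain_query = "How does metformin dosing interact with renal impairment staging?"
short_query = "Compare HyDE and SelfIRAG"

print(looks_complex(domain_query))                                    # default keywords
print(looks_complex(domain_query, DEFAULT_KEYWORDS + ("interact",)))  # domain keyword added
print(looks_complex(short_query))                                     # default length gate
print(looks_complex(short_query, min_query_length=20))                # lowered length gate
```

Under this assumed rule, a domain-specific question is only decomposed once its vocabulary is in the keyword list, and a short comparative question only once the length gate is lowered.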

Full Integration Examples

Example 1 — SelfIRAG with Full Audit Logging

import json
from fennec_community.rag.types.self_improving_rag import SelfIRAG, SelfRAGConfig

config = SelfRAGConfig(
    max_refinement_iterations=4,
    confidence_threshold=0.82,
    enable_answer_refinement=True,
    enable_retrieval_refinement=True,
    fast_mode=True,
    llm_retries=3
)

rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=config)

# Index documents
rag.add_document("Neural networks are computing systems inspired by the brain...", {"source": "AI_textbook"})
rag.add_document("Deep learning uses multi-layered neural networks...", {"source": "ML_guide"})

# Query
result = rag.query("How does deep learning differ from classical machine learning?")

print("=== Answer ===")
print(result.answer)

print("\n=== Iteration History ===")
for step in result.steps:
    print(f"  Iteration {step.step}: confidence={step.confidence:.2f} | {step.reason}")

print(f"\nQuality delta: {result.initial_quality:.2f} → {result.final_quality:.2f}")
print(json.dumps(result.to_dict(), indent=2, ensure_ascii=False))
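Printing is enough for a demo, but an audit trail usually needs to persist. One possible approach is to append each `result.to_dict()` payload to a JSON Lines file. In the sketch below, `append_audit_record` is a hypothetical helper and `sample` is a stand-in dictionary, since the real keys depend on what `to_dict()` returns in your version.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_audit_record(path: Path, query: str, result_dict: dict) -> None:
    """Append one query/result pair to the audit file as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "result": result_dict,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in for result.to_dict(); the real keys may differ.
sample = {"answer": "Deep learning learns features automatically.", "final_quality": 0.9}

audit_path = Path(tempfile.mkdtemp()) / "rag_audit.jsonl"
append_audit_record(audit_path, "How does deep learning differ?", sample)
append_audit_record(audit_path, "What is a neural network?", sample)

lines = audit_path.read_text(encoding="utf-8").splitlines()
print(f"{len(lines)} records written to {audit_path}")
```

One record per line keeps the file append-only and easy to scan with standard tools; `ensure_ascii=False` preserves non-Latin queries in readable form.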

Example 2 — HyDERAG with Retrieval Statistics

from fennec_community.rag.types.self_improving_rag import HyDERAG

hyde = HyDERAG(
    rag_system=my_rag,
    llm=my_llm,
    num_hypothetical_docs=4,
    use_original_query=True,
    top_k=6,
    max_final_docs=12,
    max_workers=4
)

hyde.add_document("Photosynthesis converts light energy into chemical energy...", {"topic": "biology"})

result = hyde.query("Explain the process by which plants produce energy from sunlight.")

print(result.answer)
print(f"\n[Stats] hypothetical docs: {len(result.hypothetical_docs)}, "
      f"retrieved: {result.num_retrieved}, used: {result.num_ranked}")

print("\n[Hypothetical docs used as search vectors:]")
for i, doc in enumerate(result.hypothetical_docs, 1):
    print(f"  {i}. {doc[:150]}...")

Example 3 — RecursiveRAG for Complex Multi-Part Questions

from fennec_community.rag.types.self_improving_rag import RecursiveRAG, RecursiveRAGConfig

config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=40,
    llm_retries=2
)

rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=config)

rec_rag.add_document("The Industrial Revolution began in Britain in the 1760s...", {"era": "18th_century"})
rec_rag.add_document("The effects of industrialisation included urbanisation...", {"era": "19th_century"})

result = rec_rag.query(
    "What were the causes and effects of the Industrial Revolution "
    "and how did they compare to the Agricultural Revolution?"
)

print(result.answer)

if result.decomposed:
    print(f"\nDecomposed into {len(result.sub_questions)} sub-questions:")
    for i, (q, a) in enumerate(zip(result.sub_questions, result.sub_answers), 1):
        print(f"\n  [{i}] Q: {q}")
        print(f"       A: {a.answer[:200]}...")

print(result.to_dict())

Example 4 — Async FastAPI Service with All Three Strategies

import asyncio

from fastapi import FastAPI
from fennec_community.rag.types.self_improving_rag import SelfIRAG, HyDERAG, RecursiveRAG, SelfRAGConfig

app = FastAPI()

self_rag  = SelfIRAG(my_rag, my_llm, config=SelfRAGConfig(fast_mode=True))
hyde_rag  = HyDERAG(my_rag, my_llm, num_hypothetical_docs=3)
rec_rag   = RecursiveRAG(my_rag, my_llm)

@app.get("/query/self")
async def self_answer(q: str):
    # SelfIRAG exposes a native coroutine
    result = await self_rag.aquery(q)
    return result.to_dict()

@app.get("/query/hyde")
async def hyde_answer(q: str):
    # Run the synchronous query() in a worker thread so the event loop
    # is not blocked (asyncio.to_thread requires Python 3.9+)
    result = await asyncio.to_thread(hyde_rag.query, q)
    return result.to_dict()

@app.get("/query/recursive")
async def recursive_answer(q: str):
    # Same pattern as above for the synchronous RecursiveRAG.query()
    result = await asyncio.to_thread(rec_rag.query, q)
    return result.to_dict()

@app.get("/stats")
def stats():
    return self_rag.get_stats()

Example 5 — Chaining RecursiveRAG + SelfIRAG

from fennec_community.rag.types.self_improving_rag import RecursiveRAG, SelfIRAG, RecursiveRAGConfig, SelfRAGConfig

# Decompose with RecursiveRAG, then verify each leaf with SelfIRAG
rec = RecursiveRAG(my_rag, my_llm, config=RecursiveRAGConfig(max_depth=2, max_sub_questions=3))
self_rag = SelfIRAG(my_rag, my_llm, config=SelfRAGConfig(max_refinement_iterations=2, fast_mode=True))

complex_query = (
    "What are the causes and effects of climate change, "
    "and how do they relate to biodiversity loss?"
)

# Step 1: Decompose
rec_result = rec.query(complex_query)

if rec_result.decomposed:
    print(f"Decomposed into {len(rec_result.sub_questions)} sub-questions")

    # Step 2: Verify each sub-answer with SelfIRAG
    refined_answers = []
    for sq in rec_result.sub_questions:
        verified = self_rag.query(sq)
        refined_answers.append(f"Q: {sq}\nA: {verified.answer} (confidence={verified.confidence:.2f})")
        print(f"  ✓ '{sq[:60]}' → confidence={verified.confidence:.2f}")

    print("\n=== Refined composite answer ===")
    print("\n\n".join(refined_answers))
else:
    print(rec_result.answer)

Simple Real Example

import os

from fennec_community.llm import GeminiInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.self_improving_rag import (
    SelfIRAG, SelfRAGConfig, HyDERAG, RecursiveRAG, RecursiveRAGConfig,
)

# Assumption: the API key is supplied through an environment variable of this name
llm_api = os.environ["GEMINI_API_KEY"]

# Build the base RAG system over an Arabic FAQ file
documents = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm = GeminiInterface(api_key=llm_api)
context_manager = ContextManager()
rag_system = RAGSystem(llm=llm, vector_db=vector_db, chunker=chunker, context_manager=context_manager)
rag_system.add_documents(documents)

# The query means "What payment methods are available?"
question = "ما هي طرق الدفع المتاحة؟"

# SelfIRAG
config = SelfRAGConfig(
    max_refinement_iterations=3,
    confidence_threshold=0.85,
    fast_mode=True,
)
self_rag = SelfIRAG(rag_system=rag_system, llm=llm, config=config)
result = self_rag.query(question)
print(result.answer)

# HyDERAG
hyde = HyDERAG(
    rag_system=rag_system,
    llm=llm,
    num_hypothetical_docs=4,
    use_original_query=True,
    top_k=5,
    max_final_docs=12,
)
result = hyde.query(question)
print(result.answer)

# RecursiveRAG
config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=40,
)
recursive_rag = RecursiveRAG(rag_system=rag_system, llm=llm, config=config)
result = recursive_rag.query(question)
print(result.answer)