Fennec Community community/rag/self_improving_rag.md

Self-Improving RAG Module

Overview
Architecture & Design Philosophy
Installation & Quick Start
Data Classes Reference
Configuration Reference
- SelfRAGConfig
- RecursiveRAGConfig
Module: SelfIRAG
- Constructor
- query
- aquery
- add_document
- get_stats
Module: HyDERAG
Module: RecursiveRAG
Error Reference
Performance & Tuning Guide
Full Integration Examples

Overview

self_improving_rag is an advanced Retrieval-Augmented Generation (RAG) module that implements three distinct self-improvement strategies on top of any existing RAG backend. Instead of a single retrieve-and-generate pass, each strategy wraps the backend in a control layer (self-evaluation, hypothetical-document retrieval, or question decomposition) intended to improve answer accuracy and robustness.

Class	Strategy	Best For
`SelfIRAG`	Iterative self-evaluation and refinement loop	High-stakes Q&A where answer quality must be verified and improved
`HyDERAG`	Hypothetical Document Embeddings	Queries where the gap between question phrasing and document language is large
`RecursiveRAG`	Recursive question decomposition	Complex, multi-faceted questions with distinct sub-topics

All three classes share a common interface contract: they accept any RAG backend and LLM, produce structured typed result objects, and are fully compatible with both synchronous and asynchronous execution environments.
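To make this contract concrete, here is a minimal sketch of an in-memory backend and a stub LLM that satisfy the method shapes described in this document. `InMemoryRAG`, `EchoLLM`, the word-overlap scoring, and the `text`/`score`/`metadata` dictionary keys are hypothetical stand-ins for illustration only; they are not part of fennec_community.

```python
from typing import Any, Dict, List, Optional, Tuple


class InMemoryRAG:
    """Hypothetical toy backend exposing retrieve() and add_document()."""

    def __init__(self) -> None:
        self._docs: List[Tuple[str, Dict[str, Any]]] = []

    def add_document(self, text: str, metadata: Optional[Dict[str, Any]] = None) -> int:
        self._docs.append((text, metadata or {}))
        return len(self._docs) - 1  # list index, used here as a document ID

    def retrieve(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        # Naive word-overlap scoring; a real backend would use embeddings.
        q_words = set(query.lower().split())
        scored = []
        for text, meta in self._docs:
            overlap = len(q_words & set(text.lower().split()))
            if overlap:
                scored.append({"text": text, "score": float(overlap), "metadata": meta})
        scored.sort(key=lambda d: d["score"], reverse=True)
        return scored[:top_k]


class EchoLLM:
    """Hypothetical stub LLM exposing generate(prompt) -> str."""

    def generate(self, prompt: str) -> str:
        return f"[stub answer for a prompt of {len(prompt)} characters]"


my_rag = InMemoryRAG()
my_rag.add_document("Deep learning uses neural networks.", {"source": "demo"})
my_llm = EchoLLM()
hits = my_rag.retrieve("what is deep learning", top_k=3)
```

Objects shaped like these could be passed as `rag_system` and `llm` while experimenting, before wiring in a real vector store and model.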

Architecture & Design Philosophy

┌─────────────────────────────────────────────────────────────────────────┐
│                         self_improving_rag                               │
│                                                                          │
│   ┌──────────────────┐  ┌──────────────────┐  ┌─────────────────────┐  │
│   │    SelfIRAG      │  │    HyDERAG       │  │   RecursiveRAG      │  │
│   │                  │  │                  │  │                     │  │
│   │ Retrieve         │  │ Generate N       │  │  Complexity check   │  │
│   │    ↓             │  │  hypothetical    │  │       ↓             │  │
│   │ Generate         │  │  docs (parallel) │  │  Decompose into     │  │
│   │    ↓             │  │    ↓             │  │  sub-questions      │  │
│   │ Evaluate         │  │ Retrieve per doc │  │       ↓             │  │
│   │    ↓             │  │    ↓             │  │  Recurse each sub-  │  │
│   │ Refine query     │  │ Deduplicate      │  │  question (depth N) │  │
│   │    ↓             │  │    ↓             │  │       ↓             │  │
│   │ Repeat until     │  │ Re-rank & answer │  │  Merge sub-answers  │  │
│   │ threshold met    │  │                  │  │                     │  │
│   └──────────────────┘  └──────────────────┘  └─────────────────────┘  │
│                                                                          │
│   Shared utilities: _call_llm, _retrieve_safe, _deduplicate              │
└─────────────────────────────────────────────────────────────────────────┘

Key design decisions:

Backend-agnostic: All classes accept any rag_system object that exposes retrieve(query, top_k) and add_document(text, metadata). The internal _retrieve_safe normaliser handles List[Tuple], List[Dict], and {"results": [...]} output formats automatically.
LLM-agnostic: Any LLM object with a generate(prompt) -> str method is supported.
Graceful degradation: Every LLM call uses exponential back-off retries; failures fall back to sensible defaults rather than raising unhandled exceptions to the caller.
Typed results: All public methods return structured dataclass instances with .to_dict() serialization support.
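The normalisation of backend output mentioned above can be pictured with the following sketch. It is not the module's `_retrieve_safe`; the function name `normalise_results`, the `.text` attribute lookup, and the `text`/`content`/`score` keys are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def normalise_results(raw: Any) -> List[Dict[str, Any]]:
    """Coerce the three documented retrieve() shapes into [{'text', 'score'}]."""
    if isinstance(raw, dict):                      # {"results": [...]}
        raw = raw.get("results", [])
    normalised: List[Dict[str, Any]] = []
    for item in raw or []:
        if isinstance(item, tuple) and len(item) == 2:   # (chunk, score)
            chunk, score = item
            # Assumption: a chunk exposes its text as .text, else use str(chunk).
            text = getattr(chunk, "text", None) or str(chunk)
        elif isinstance(item, dict):                      # {"text": ..., "score": ...}
            text = item.get("text") or item.get("content") or ""
            score = item.get("score", 0.0)
        else:
            continue  # unknown shape: skip rather than raise
        normalised.append({"text": text, "score": float(score)})
    return normalised
```

Skipping unknown shapes instead of raising mirrors the graceful-degradation goal stated above, at the cost of silently dropping malformed items.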

Installation & Quick Start

pip install numpy fennec_community # only external dependency

from fennec_community.rag.types.self_improving_rag import SelfIRAG, HyDERAG, RecursiveRAG
from fennec_community.rag.types.self_improving_rag import SelfRAGConfig, RecursiveRAGConfig

# ── SelfIRAG ──────────────────────────────────────
cfg = SelfRAGConfig(
    max_refinement_iterations=3,
    confidence_threshold=0.85,
    fast_mode=True
)
self_rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=cfg)
result = self_rag.query("What is deep learning?")
print(result.answer, f"confidence={result.confidence:.2f}")

# ── HyDERAG ───────────────────────────────────────
hyde = HyDERAG(rag_system=my_rag, llm=my_llm, num_hypothetical_docs=3)
result = hyde.query("What are the benefits of exercise?")
print(result.answer)

# ── RecursiveRAG ──────────────────────────────────
rec_cfg = RecursiveRAGConfig(max_depth=3, max_sub_questions=4)
rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=rec_cfg)
result = rec_rag.query("What are the causes and effects of the Industrial Revolution?")
print(result.answer)
print(result.sub_questions)

Data Classes Reference

The following dataclasses are returned by the public API methods. They are importable directly from the package:

from fennec_community.rag.types.self_improving_rag import SelfRAGResult, HyDEResult, RecursiveRAGResult

`SelfRAGResult`

The structured return type of SelfIRAG.query() and SelfIRAG.aquery(). Contains the best answer found across all refinement iterations, along with full iteration history for auditing and debugging.

@dataclass
class SelfRAGResult:
    answer:     str
    confidence: float
    iterations: int
    history:    List[RefinementStep]

Fields

Field	Type	Description
`answer`	`str`	The highest-confidence answer produced across all iterations. This is the answer from the `RefinementStep` with the maximum `.confidence`, not necessarily the last iteration.
`confidence`	`float`	The geometric-mean confidence score `[0.0, 1.0]` of the best answer, computed as `(relevance × accuracy × completeness × clarity)^0.25`.
`iterations`	`int`	Total number of refinement iterations that were actually executed.
`history`	`List[RefinementStep]`	Full ordered list of all iteration records. Each `RefinementStep` has `.iteration` (int), `.answer` (str), `.evaluation` (`EvaluationResult`), and `.confidence` (float).

Properties

Property	Type	Description
`final_answer`	`str`	Alias for `answer`. Provided for backward compatibility with example notebooks.
`refinement_steps`	`int`	Alias for `iterations`.
`initial_quality`	`float`	Confidence score of the first iteration (`history[0].confidence`). Useful for measuring improvement delta.
`final_quality`	`float`	Confidence score of the last iteration (`history[-1].confidence`).
`steps`	`List[RefinementStep]`	Enriched history list where each step also carries `.step`, `.action`, and `.reason` attributes for human-readable logging.

Methods

Method	Returns	Description
`to_dict()`	`Dict[str, Any]`	Serialises the result to a JSON-safe dictionary. Includes answer, confidence, iterations, initial_quality, final_quality, and a compact history list.

Example

result = self_rag.query("Explain transformer attention")

print(result.answer)
print(f"Confidence: {result.confidence:.2%}")
print(f"Improved from {result.initial_quality:.2f} → {result.final_quality:.2f} over {result.iterations} iterations")

for step in result.steps:
    print(f"  Step {step.step}: {step.reason}")

import json
print(json.dumps(result.to_dict(), indent=2))

`HyDEResult`

The structured return type of HyDERAG.query().

@dataclass
class HyDEResult:
    answer:            str
    hypothetical_docs: List[str]
    num_retrieved:     int
    num_ranked:        int

Fields

Field	Type	Description
`answer`	`str`	The final generated answer, grounded in the retrieved and re-ranked real documents.
`hypothetical_docs`	`List[str]`	The list of synthetic "hypothetical" documents generated by the LLM. Useful for debugging and understanding what the LLM imagined the answer space to be.
`num_retrieved`	`int`	Total number of unique real documents retrieved across all search queries (original + hypothetical).
`num_ranked`	`int`	Number of documents passed to the final answer generation step after re-ranking (capped by `max_final_docs`).

Methods

Method	Returns	Description
`to_dict()`	`Dict[str, Any]`	Serialises to a JSON-safe dictionary with `answer`, `hypothetical_docs_generated`, `num_retrieved`, and `num_ranked`.

Example

result = hyde.query("How does photosynthesis work?")

print(result.answer)
print(f"Generated {len(result.hypothetical_docs)} hypothetical docs")
print(f"Retrieved {result.num_retrieved} unique real docs, used {result.num_ranked} for generation")

`RecursiveRAGResult`

The structured return type of RecursiveRAG.query().

@dataclass
class RecursiveRAGResult:
    answer:        str
    depth:         int
    decomposed:    bool
    sub_questions: List[str]
    sub_answers:   List[SubAnswer]

Fields

Field	Type	Description
`answer`	`str`	The final synthesised answer — either a direct answer (if question was simple) or a merged answer from all sub-answers.
`depth`	`int`	The recursion depth at which this result was produced. `0` = top-level call.
`decomposed`	`bool`	`True` if the question was decomposed into sub-questions; `False` if answered directly.
`sub_questions`	`List[str]`	The sub-questions generated during decomposition. Empty if `decomposed=False`.
`sub_answers`	`List[SubAnswer]`	Internal `SubAnswer` records for each sub-question, each carrying `.question`, `.answer`, and `.depth`. Empty if `decomposed=False`.

Methods

Method	Returns	Description
`to_dict()`	`Dict[str, Any]`	Serialises to a JSON-safe dictionary with `answer`, `decomposed`, `depth`, and `sub_questions`.

Example

result = rec_rag.query("What are the causes, effects, and legacy of World War I?")

print(result.answer)
print(f"Decomposed: {result.decomposed}")
if result.decomposed:
    for i, q in enumerate(result.sub_questions, 1):
        print(f"  Sub-question {i}: {q}")

print(result.to_dict())

Configuration Reference

`SelfRAGConfig`

Controls the behaviour of the SelfIRAG iterative refinement loop.

from fennec_community.rag.types.self_improving_rag import SelfRAGConfig

config = SelfRAGConfig(
    max_refinement_iterations=3,
    confidence_threshold=0.80,
    enable_answer_refinement=True,
    enable_retrieval_refinement=True,
    llm_retries=2,
    fast_mode=True,
    skip_refinement_on_high_confidence=True
)

Fields

Field	Type	Default	Description
`max_refinement_iterations`	`int`	`3`	Hard cap on the number of retrieve-generate-evaluate cycles. The loop exits early if `confidence_threshold` is reached before this limit.
`confidence_threshold`	`float`	`0.80`	Minimum acceptable confidence score `[0.0, 1.0]`. Once any iteration's geometric-mean confidence meets or exceeds this value, the loop stops immediately and returns that answer.
`enable_answer_refinement`	`bool`	`True`	When `True`, the current answer is rewritten at the end of each non-final iteration using the LLM's identified issues and suggestions. Disable to skip answer rewriting and only adapt the retrieval query.
`enable_retrieval_refinement`	`bool`	`True`	When `True`, the retrieval query is expanded with keyword suggestions from the evaluator before the next iteration. Helps surface documents that the original query missed.
`llm_retries`	`int`	`2`	Number of retry attempts (with exponential back-off) for every LLM call in the pipeline. Total attempts = `llm_retries + 1`.
`fast_mode`	`bool`	`True`	When `True`, generation and evaluation are combined into a single LLM call per iteration (saving ~50% of LLM calls). When `False`, generation and evaluation are separate calls for potentially higher accuracy. Falls back to two-step on JSON parse failure.
`skip_refinement_on_high_confidence`	`bool`	`True`	Reserved flag. When `True`, a very high first-iteration confidence may short-circuit all subsequent iterations. (Implemented via the `confidence_threshold` check in the main loop.)
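The `llm_retries` semantics (total attempts = `llm_retries + 1`, with exponential back-off between attempts) can be sketched as follows. This is an illustrative helper, not the module's `_call_llm`: the name `call_with_retries`, the `base_delay` and `sleep` parameters, and the locally defined `LLMCallError` are assumptions.

```python
import time
from typing import Callable, Optional


class LLMCallError(Exception):
    """Local stand-in for the module's LLMCallError."""


def call_with_retries(
    generate: Callable[[str], str],
    prompt: str,
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):          # total attempts = retries + 1
        try:
            return generate(prompt)
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                # Delay doubles after each failure: 0.5s, 1s, 2s, ...
                sleep(base_delay * (2 ** attempt))
    raise LLMCallError(f"LLM call failed after {retries + 1} attempts: {last_error}")
```

Injecting `sleep` keeps the helper easy to test without real delays.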

`RecursiveRAGConfig`

Controls the decomposition behaviour of RecursiveRAG.

from fennec_community.rag.types.self_improving_rag import RecursiveRAGConfig

config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=40,
    complexity_keywords=["and", "or", "causes", "compare", "why", "how"],
    llm_retries=2
)

Fields

Field	Type	Default	Description
`max_depth`	`int`	`3`	Maximum recursion depth. Prevents infinite decomposition loops. A query at depth `max_depth` is always answered directly, regardless of its apparent complexity.
`max_sub_questions`	`int`	`4`	Maximum number of sub-questions generated per decomposition. The LLM is instructed to produce at most this many sub-questions, and any excess is truncated.
`min_query_length`	`int`	`40`	Character-length threshold below which a query is never decomposed — short queries are assumed simple.
`complexity_keywords`	`List[str]`	See below	Arabic and English marker words that indicate a complex, multi-part query. A query is considered complex if it is at least `min_query_length` characters long AND contains at least two of these keywords.
`llm_retries`	`int`	`2`	Retry attempts for each LLM call (decomposition, direct answering, and merging).

Default complexity keywords: و، أو، ثم، بعد، قبل، أسباب، نتائج، مقارنة، الفرق، العلاقة، التأثير، كيف، لماذا، متى، أين (Arabic) and and, or, causes, effects, compare, difference, relationship, impact, how, why, when (English).

Module: `SelfIRAG`

SelfIRAG implements the Self-RAG paradigm: a closed-loop pipeline that retrieves documents, generates an answer, evaluates that answer on four dimensions (relevance, accuracy, completeness, clarity), and then refines both the answer and the retrieval query before repeating. The iteration with the highest confidence score is returned as the final result.

Retrieve → Generate → Evaluate → Refine Answer & Query → Repeat

The confidence score uses a geometric mean of the four evaluation dimensions, meaning a single weak dimension (e.g., low relevance) significantly lowers the overall score — a stricter measure than arithmetic mean.
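A small worked example shows why the geometric mean is stricter. The dimension scores below are made up for illustration, and `geometric_confidence` is a hypothetical helper that applies the documented formula.

```python
def geometric_confidence(relevance: float, accuracy: float,
                         completeness: float, clarity: float) -> float:
    """Fourth root of the product of the four evaluation dimensions."""
    return (relevance * accuracy * completeness * clarity) ** 0.25


# One weak dimension (relevance) among otherwise strong scores.
scores = dict(relevance=0.3, accuracy=0.9, completeness=0.9, clarity=0.9)
geo = geometric_confidence(**scores)
arith = sum(scores.values()) / len(scores)
print(f"geometric={geo:.3f} arithmetic={arith:.3f}")
```

Here the geometric mean is about 0.68 while the arithmetic mean is 0.75, so with `confidence_threshold=0.70` only the arithmetic mean would have passed.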

SelfIRAG Constructor

SelfIRAG(
    rag_system: Any,
    llm: Any,
    config: Optional[SelfRAGConfig] = None
)

Initialises the Self-RAG system. Does not perform any retrieval or LLM calls at construction time.

Parameters

Parameter	Type	Default	Description
`rag_system`	`Any`	required	The underlying RAG backend. Must expose `retrieve(query: str, top_k: int) -> Any` and `add_document(text: str, metadata: dict) -> Any`. The `retrieve` output format is normalised automatically — supports `List[Tuple[DocumentChunk, float]]`, `List[Dict]`, and `{"results": [...]}`.
`llm`	`Any`	required	Any LLM object with a synchronous `generate(prompt: str) -> str` method. An optional `generate_async` method is used automatically if present. Passing a falsy value raises `ValueError`.
`config`	`SelfRAGConfig \| None`	`None`	Configuration object. A default `SelfRAGConfig()` is created if not provided.

Raises

ValueError — if llm is falsy (None, empty, etc.).

Example

from fennec_community.rag.types.self_improving_rag import SelfIRAG, SelfRAGConfig

config = SelfRAGConfig(
    max_refinement_iterations=4,
    confidence_threshold=0.85,
    fast_mode=True,
    llm_retries=3
)

rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=config)

`SelfIRAG.query`

query(query: str, context: Optional[Dict[str, Any]] = None) -> SelfRAGResult

The primary public method. Runs the full self-improvement loop: retrieves documents, generates an answer, evaluates it, refines both the answer and retrieval query, and repeats until either the confidence threshold is met or the iteration cap is reached. Returns the highest-confidence answer found across all iterations.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The natural-language question to answer. Works in Arabic and English. The answer language mirrors the query language per the embedded prompt instructions.
`context`	`Dict[str, Any] \| None`	`None`	Optional extra context dictionary. Reserved for future use: in the current implementation it reaches the RAG backend only if the backend's retrieval call signature supports it.

Returns

SelfRAGResult — Contains the best answer, its confidence score, the number of iterations run, and the full iteration history.

Iteration loop logic:

Retrieve top-5 documents for current_query (starts as the original query).
If fast_mode=True: generate answer and evaluate in one LLM call. If False: two separate LLM calls.
Record the RefinementStep in history.
If confidence >= confidence_threshold: stop immediately and return the best step.
If more iterations remain and enable_answer_refinement=True: rewrite the answer using identified issues and suggestions.
If enable_retrieval_refinement=True and suggestions exist: append top-2 suggestion keywords to current_query for the next retrieval.
After all iterations: select the step with the highest confidence as the final answer.
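The control flow of the steps above can be sketched as a small best-of-N loop. This is a simplified illustration, not the library's implementation: `refinement_loop`, `Step`, and the `answer_and_score` callback (which stands in for retrieval, generation, and evaluation) are hypothetical, and the answer-rewriting step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    iteration: int
    answer: str
    confidence: float


def refinement_loop(
    query: str,
    answer_and_score: Callable[[str], Tuple[str, float, List[str]]],
    max_iterations: int = 3,
    threshold: float = 0.80,
) -> Step:
    """answer_and_score(query) returns (answer, confidence, suggestions)."""
    history: List[Step] = []
    current_query = query
    for i in range(1, max_iterations + 1):
        answer, confidence, suggestions = answer_and_score(current_query)
        history.append(Step(i, answer, confidence))
        if confidence >= threshold:
            break  # early exit once the threshold is met
        if i < max_iterations and suggestions:
            # Append the top-2 suggestion keywords for the next retrieval.
            current_query = f"{current_query} {' '.join(suggestions[:2])}"
    # Return the best step, which is not necessarily the last one.
    return max(history, key=lambda s: s.confidence)
```

Selecting with `max` over the history is what allows an earlier iteration to win when a later refinement scores lower.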

Example

# Arabic query: "What are the causes of climate change?"
result = rag.query("ما هي أسباب تغير المناخ؟")

print(result.answer)
print(f"Confidence: {result.confidence:.2%}")
print(f"Ran {result.iterations} iteration(s)")
print(f"Quality improved: {result.initial_quality:.2f} → {result.final_quality:.2f}")

Performance notes:

With fast_mode=True: each iteration uses 1 LLM call (generate+evaluate combined).
With fast_mode=False: each iteration uses 2 LLM calls (generate, then evaluate).
With enable_answer_refinement=True: each non-final iteration adds 1 more LLM call (refine).
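Total LLM calls (worst case, fast_mode=False, refinement enabled): at most iterations × 3. More precisely it is 3 × iterations − 1, because the final iteration is not followed by a refinement call. Retries are not counted.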
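The arithmetic in these notes can be captured in a short estimator. `selfrag_llm_calls` is a hypothetical helper; it assumes every iteration runs (no early exit), ignores retries, and ignores the fast-mode fallback to two calls on a JSON parse failure.

```python
def selfrag_llm_calls(iterations: int, fast_mode: bool = True,
                      answer_refinement: bool = True) -> int:
    """Estimate LLM calls for one query under the assumptions stated above."""
    per_iteration = 1 if fast_mode else 2          # combined vs. generate + evaluate
    # Only non-final iterations trigger an answer-refinement call.
    refine_calls = (iterations - 1) if answer_refinement else 0
    return iterations * per_iteration + refine_calls


for fast in (True, False):
    print(fast, [selfrag_llm_calls(n, fast_mode=fast) for n in (1, 2, 3, 4)])
```

With three iterations this gives 5 calls in fast mode and 8 calls otherwise.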

`SelfIRAG.aquery`

async aquery(query: str, context: Optional[Dict[str, Any]] = None) -> SelfRAGResult

Async wrapper around query. Runs the entire synchronous pipeline in a thread pool via asyncio.to_thread, making it safe to call from async contexts (FastAPI endpoints, async scripts, Jupyter async cells) without blocking the event loop.
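The wrapping pattern described here looks roughly like the sketch below. `SyncPipeline` is a hypothetical class standing in for the real pipeline, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


class SyncPipeline:
    def query(self, q: str) -> str:
        time.sleep(0.05)  # stands in for blocking retrieval and LLM calls
        return f"answer to {q!r}"

    async def aquery(self, q: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.query, q)


async def main() -> list:
    pipeline = SyncPipeline()
    # Both queries run concurrently in separate worker threads.
    return await asyncio.gather(pipeline.aquery("a"), pipeline.aquery("b"))


results = asyncio.run(main())
```

Because the work still runs synchronously inside a thread, this approach frees the event loop but does not make an individual query faster.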

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The natural-language question.
`context`	`Dict[str, Any] \| None`	`None`	Optional extra context.

Returns

SelfRAGResult — Identical to query().

Example

# FastAPI endpoint
from fastapi import FastAPI
app = FastAPI()

@app.get("/answer")
async def answer(q: str):
    result = await rag.aquery(q)
    return result.to_dict()

# Async script
import asyncio
result = asyncio.run(rag.aquery("Explain quantum entanglement"))
print(result.answer)

`SelfIRAG.add_document`

add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any

Adds a document to the underlying RAG backend's vector store. This is a convenience pass-through — equivalent to calling rag_system.add_document(text, metadata) directly.

Parameters

Parameter	Type	Default	Description
`text`	`str`	required	The raw document text to index.
`metadata`	`Dict[str, Any] \| None`	`None`	Optional metadata dictionary (e.g., `{"source": "wiki", "date": "2025-01"}`). Defaults to an empty dict `{}` if `None`.

Returns

Any — The return value of the backend's add_document method (backend-specific; often a document ID or chunk count).

Example

rag.add_document(
    "Deep learning is a subset of machine learning that uses neural networks.",
    metadata={"source": "textbook", "chapter": 3}
)

`SelfIRAG.get_stats`

get_stats() -> dict

Returns the current runtime configuration as a plain dictionary. Useful for logging, monitoring dashboards, and debugging configuration state.

Parameters

None.

Returns

dict with the following keys:

Key	Type	Description
`max_refinement_iterations`	`int`	Hard cap on iterations from `SelfRAGConfig`.
`confidence_threshold`	`float`	Target confidence score from `SelfRAGConfig`.
`enable_answer_refinement`	`bool`	Whether answer rewriting is enabled.
`enable_retrieval_refinement`	`bool`	Whether query expansion is enabled.
`llm_retries`	`int`	Retry count per LLM call.
`fast_mode`	`bool`	Whether single-call generate+evaluate mode is active.

Example

stats = rag.get_stats()
print(stats)
# {
#   'max_refinement_iterations': 3,
#   'confidence_threshold': 0.85,
#   'enable_answer_refinement': True,
#   'enable_retrieval_refinement': True,
#   'llm_retries': 2,
#   'fast_mode': True
# }

Module: `HyDERAG`

HyDERAG implements Hypothetical Document Embeddings (HyDE) — a retrieval strategy that bridges the semantic gap between a question and its answers. Instead of searching directly with the question, it asks the LLM to generate N synthetic documents that would answer the query, then retrieves real documents using each synthetic doc as a search vector. The intuition is that a hypothetical answer is semantically closer to real answer documents than the question itself.

Query → Generate N hypothetical docs (parallel)
      → Retrieve real docs per hypothetical + original query
      → Deduplicate → Re-rank by score → Generate final answer

Hypothetical documents are generated in parallel using a thread pool, so the wall-clock latency of this step shrinks roughly in proportion to the number of workers, up to min(max_workers, num_hypothetical_docs).
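A minimal sketch of thread-pool generation, assuming a `generate(prompt) -> str` callable: the function name `generate_hypotheticals`, the prompt wording, and the `min_chars` parameter are illustrative and not taken from the library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def generate_hypotheticals(
    generate: Callable[[str], str],
    query: str,
    n: int = 3,
    max_workers: int = 4,
    min_chars: int = 40,
) -> List[str]:
    prompt = f"Write a short factual passage that answers: {query}"
    workers = max(1, min(max_workers, n))  # never more threads than documents
    with ThreadPoolExecutor(max_workers=workers) as pool:
        docs = list(pool.map(lambda _: generate(prompt), range(n)))
    # Drop generations too short to be useful as search vectors.
    return [d.strip() for d in docs if len(d.strip()) >= min_chars]
```

Threads suit this step because LLM calls are typically I/O-bound, so the interpreter lock is released while each request waits on the network.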

HyDERAG Constructor

HyDERAG(
    rag_system: Any,
    llm: Any,
    num_hypothetical_docs: int = 3,
    use_original_query: bool = True,
    top_k: int = 5,
    max_final_docs: int = 10,
    llm_retries: int = 2,
    max_workers: int = 4
)

Parameters

Parameter	Type	Default	Description
`rag_system`	`Any`	required	The underlying RAG backend. Must expose `retrieve(query, top_k)` and `add_document(text, metadata)`.
`llm`	`Any`	required	LLM object with `generate(prompt) -> str`. Passing falsy raises `ValueError`.
`num_hypothetical_docs`	`int`	`3`	Number of synthetic hypothetical documents to generate per query. More docs increase recall at the cost of more LLM calls and retrieval passes.
`use_original_query`	`bool`	`True`	When `True`, the original query is also used as a search vector alongside the hypothetical docs. Recommended — ensures direct matches are never missed.
`top_k`	`int`	`5`	Number of real documents to retrieve per search query (original + each hypothetical doc).
`max_final_docs`	`int`	`10`	Maximum number of unique documents (after deduplication and re-ranking) passed to the final answer generation step.
`llm_retries`	`int`	`2`	Retry attempts per LLM call.
`max_workers`	`int`	`4`	Maximum number of parallel threads for hypothetical document generation. Capped at `min(max_workers, num_hypothetical_docs)`.

Raises

ValueError — if llm is falsy.

Example

from fennec_community.rag.types.self_improving_rag import HyDERAG

hyde = HyDERAG(
    rag_system=my_rag,
    llm=my_llm,
    num_hypothetical_docs=4,
    use_original_query=True,
    top_k=7,
    max_final_docs=12,
    max_workers=4
)

`HyDERAG.query`

query(query: str, context: Optional[Dict[str, Any]] = None) -> HyDEResult

Executes the full HyDE retrieval-generation pipeline for a given query.

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The natural-language question.
`context`	`Dict[str, Any] \| None`	`None`	Optional extra context (reserved for future use).

Returns

HyDEResult — Contains the generated answer, the list of hypothetical documents, and retrieval statistics.

Pipeline steps:

Hypothetical document generation (parallel): The LLM is prompted to write a factual paragraph that would perfectly answer the query — as if authored by a domain expert, without referencing the question itself. All num_hypothetical_docs are generated concurrently. Any generation that produces fewer than 40 characters is discarded.
Multi-source retrieval: Real documents are retrieved for each hypothetical doc (and for the original query if use_original_query=True). Failed retrievals for individual queries are logged and skipped rather than aborting the pipeline.
Deduplication: Documents are de-duplicated using an MD5 hash of the first 120 characters. When the same document appears multiple times (via different search queries), the copy with the highest score is kept.
Re-ranking: Unique documents are sorted by descending score and trimmed to max_final_docs.
Answer generation: The top-5 re-ranked documents are used to build a context, and the LLM generates the final answer with strict anti-hallucination instructions (answer only from context; respond in the same language as the query).
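The deduplication and re-ranking steps can be sketched as follows, assuming each retrieved document is a dictionary with `text` and `score` keys (an assumption about the normalised format; `dedupe_and_rank` is a hypothetical name).

```python
import hashlib
from typing import Any, Dict, List


def dedupe_and_rank(docs: List[Dict[str, Any]], max_final_docs: int = 10) -> List[Dict[str, Any]]:
    best: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        # Fingerprint: MD5 of the first 120 characters of the text.
        key = hashlib.md5(doc["text"][:120].encode("utf-8")).hexdigest()
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc  # keep the highest-scoring copy
    ranked = sorted(best.values(), key=lambda d: d["score"], reverse=True)
    return ranked[:max_final_docs]
```

Hashing only a prefix is cheap, but two different documents that share their first 120 characters would be treated as duplicates.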

Fallback behaviour: If all retrieval attempts fail (no documents returned), the answer is "There is insufficient information.".

Example

# Arabic query: "What are the benefits of exercise for mental health?"
result = hyde.query("ما هي فوائد ممارسة الرياضة على الصحة النفسية؟")

print(result.answer)
print(f"\nGenerated {len(result.hypothetical_docs)} hypothetical documents:")
for i, doc in enumerate(result.hypothetical_docs, 1):
    print(f"  [{i}] {doc[:120]}...")
print(f"\nRetrieved: {result.num_retrieved} unique docs → used top {result.num_ranked}")

When to use HyDE vs SelfIRAG:

Use HyDE when queries are phrased very differently from document language (e.g., a question in a different register or vocabulary than the indexed knowledge base).
Use SelfIRAG when you need iterative quality verification and the LLM should evaluate and improve its own answers.

`HyDERAG.add_document`

add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any

Adds a document to the underlying RAG backend. Convenience pass-through identical in behaviour to SelfIRAG.add_document.

Parameters

Parameter	Type	Default	Description
`text`	`str`	required	Raw document text to index.
`metadata`	`Dict[str, Any] \| None`	`None`	Optional metadata. Defaults to `{}` if `None`.

Returns

Any — Backend-specific return value.

Example

hyde.add_document(
    "Exercise releases endorphins, which are natural mood elevators...",
    metadata={"source": "health_journal", "topic": "exercise"}
)

Module: `RecursiveRAG`

RecursiveRAG handles complex, multi-faceted queries by decomposing them into smaller, self-contained sub-questions and answering each one independently (recursively, if needed), then synthesising all sub-answers into a single coherent final answer.

Query → Complexity check
          ↓ (complex)              ↓ (simple or max_depth)
     Decompose into            Answer directly
     N sub-questions          (retrieve + generate)
          ↓
     Recurse each sub-question
          ↓
     Merge sub-answers into final answer

Complexity detection uses a dual heuristic: the query must both be at least min_query_length characters long and contain at least two keywords from complexity_keywords. If either condition fails, the query is answered directly.
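A sketch of such a dual heuristic is shown below. Whole-word matching is an assumption made here for illustration; the library may instead use substring matching, which would behave differently for prefixes such as the Arabic conjunction attached to the following word. `is_complex` is a hypothetical name.

```python
import re
from typing import Iterable


def is_complex(query: str, keywords: Iterable[str], min_query_length: int = 40) -> bool:
    if len(query) < min_query_length:
        return False  # short queries are assumed simple
    # \w matches Unicode letters, so Arabic tokens are split as well.
    tokens = set(re.findall(r"\w+", query.lower()))
    hits = sum(1 for kw in set(keywords) if kw.lower() in tokens)
    return hits >= 2


keywords = ["and", "or", "causes", "effects", "compare", "how", "why"]
print(is_complex("What are the causes and effects of climate change?", keywords))
print(is_complex("What is photosynthesis?", keywords))
```

The first query passes both checks; the second is rejected on length alone.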

RecursiveRAG Constructor

RecursiveRAG(
    rag_system: Any,
    llm: Any,
    config: Optional[RecursiveRAGConfig] = None
)

Parameters

Parameter	Type	Default	Description
`rag_system`	`Any`	required	The underlying RAG backend with `retrieve(query, top_k)` and `add_document(text, metadata)`.
`llm`	`Any`	required	LLM object with `generate(prompt) -> str`. Falsy values raise `ValueError`.
`config`	`RecursiveRAGConfig \| None`	`None`	Decomposition configuration. A default `RecursiveRAGConfig()` is created if not provided.

Raises

ValueError — if llm is falsy.

Example

from fennec_community.rag.types.self_improving_rag import RecursiveRAG, RecursiveRAGConfig

config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=50,
    llm_retries=2
)

rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=config)

`RecursiveRAG.query`

query(
    query: str,
    context: Optional[Dict[str, Any]] = None,
    _depth: int = 0
) -> RecursiveRAGResult

Recursively answers the query. Simple queries (by length or keyword count) are answered directly via a single retrieve-and-generate pass. Complex queries are decomposed into sub-questions, each of which is recursively answered (possibly decomposed further up to max_depth), and the sub-answers are merged into a final answer via an LLM synthesis call.

⚠️ Important: The _depth parameter is an internal recursion counter. Do not set it manually when calling this method from application code. Always call with only query (and optionally context).

Parameters

Parameter	Type	Default	Description
`query`	`str`	required	The natural-language question. Can be simple or complex, in Arabic or English.
`context`	`Dict[str, Any] \| None`	`None`	Optional extra context forwarded through all recursive calls.
`_depth`	`int`	`0`	Internal use only. The current recursion depth. Do not set this parameter.

Returns

RecursiveRAGResult — Contains the final answer, the decomposition trace (sub_questions, sub_answers), and whether the question was decomposed.

Execution decision tree:

query(q, depth=D)
  ├─ IF D >= max_depth OR q is not complex → _answer_directly(q, depth=D)
  ├─ decompose(q) → sub_questions
  │    ├─ IF len(sub_questions) <= 1 → _answer_directly(q, depth=D)
  │    └─ FOR each sq IN sub_questions:
  │           sub_result = query(sq, depth=D+1)   ← recursive call
  └─ merge(q, all sub_results) → final answer

Complexity check details: A query is considered complex if len(query) >= min_query_length AND at least 2 words from complexity_keywords appear in the lowercased query.

Decomposition fallback: If the LLM returns malformed JSON for the sub-question list, a regex fallback parser extracts numbered or bulleted items from the raw LLM response.
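A JSON-first parser with a regex fallback might look like the sketch below. The function name `parse_sub_questions` and the exact pattern are assumptions; the library's fallback parser may accept other list styles.

```python
import json
import re
from typing import List


def parse_sub_questions(raw: str, max_sub_questions: int = 4) -> List[str]:
    try:
        parsed = json.loads(raw)
        if not isinstance(parsed, list):
            raise ValueError("expected a JSON list")
        items = [str(x).strip() for x in parsed]
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Fallback: lines starting with "1.", "2)", "-", "*" or a bullet.
        pattern = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.+?)\s*$", re.MULTILINE)
        items = pattern.findall(raw)
    # Drop empty items and truncate any excess.
    return [q for q in items if q][:max_sub_questions]
```

Truncating to `max_sub_questions` after parsing matches the documented behaviour of discarding any excess sub-questions.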

Direct answer fallback: If retrieval returns no documents, the answer is "There is not enough information." and the result has decomposed=False.

Merge fallback: If the merge LLM call fails, sub-answers are concatenated with double newlines as the final answer.
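Putting the decision tree and the merge fallback together gives a skeleton like the following. All callables (`is_complex`, `decompose`, `answer_directly`, `merge`) are injected placeholders, and `recursive_answer` is a hypothetical name; the real class returns a `RecursiveRAGResult` rather than a plain string.

```python
from typing import Callable, List


def recursive_answer(
    question: str,
    is_complex: Callable[[str], bool],
    decompose: Callable[[str], List[str]],
    answer_directly: Callable[[str], str],
    merge: Callable[[str, List[str]], str],
    max_depth: int = 3,
    depth: int = 0,
) -> str:
    if depth >= max_depth or not is_complex(question):
        return answer_directly(question)
    sub_questions = decompose(question)
    if len(sub_questions) <= 1:
        return answer_directly(question)  # decomposition was not useful
    sub_answers = [
        recursive_answer(sq, is_complex, decompose, answer_directly, merge,
                         max_depth, depth + 1)
        for sq in sub_questions
    ]
    try:
        return merge(question, sub_answers)
    except Exception:
        # Merge fallback: concatenate sub-answers with double newlines.
        return "\n\n".join(sub_answers)
```

The depth check comes first, so recursion is bounded even if the complexity heuristic keeps classifying sub-questions as complex.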

Example

# Complex query — will be decomposed
result = rec_rag.query(
    "What are the causes and effects of climate change, "
    "and how do they compare to the impacts of deforestation?"
)
print(result.answer)
print(f"Decomposed: {result.decomposed}")
for i, q in enumerate(result.sub_questions, 1):
    print(f"  Sub-question {i}: {q}")

# Simple query — answered directly
result = rec_rag.query("What is photosynthesis?")
print(result.answer)
print(f"Decomposed: {result.decomposed}")  # False

Arabic example

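# Arabic query: "What are the causes and consequences of the Industrial
# Revolution, and how did it affect society?"
result = rec_rag.query(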
    "ما هي أسباب ونتائج الثورة الصناعية وكيف أثرت على المجتمع؟"
)
print(result.answer)
print(result.sub_questions)

When to use RecursiveRAG vs SelfIRAG:

Use RecursiveRAG for broad, multi-part questions where different parts require different retrieved context.
Use SelfIRAG for single-topic questions where the answer quality needs iterative verification.
For maximum robustness, consider chaining: RecursiveRAG for decomposition, with each leaf answered by SelfIRAG.

`RecursiveRAG.add_document`

add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any

Adds a document to the underlying RAG backend. Convenience pass-through identical in behaviour to SelfIRAG.add_document.

Parameters

Parameter	Type	Default	Description
`text`	`str`	required	Raw document text to index.
`metadata`	`Dict[str, Any] \| None`	`None`	Optional metadata dictionary. Defaults to `{}` if `None`.

Returns

Any — Backend-specific return value.

Example

rec_rag.add_document(
    "The Industrial Revolution began in Britain in the late 18th century...",
    metadata={"source": "history_encyclopedia", "era": "18th_century"}
)

Error Reference

All custom exceptions are importable from the module:

from fennec_community.rag.types.self_improving_rag import LLMCallError, RetrievalError, RAGSystemError

Exception	Parent	When Raised
`RAGSystemError`	`Exception`	Base class for all module errors.
`LLMCallError`	`RAGSystemError`	All retry attempts for an LLM `generate()` call have been exhausted. Carries the original exception message.
`RetrievalError`	`RAGSystemError`	The underlying RAG backend's `retrieve()` call raised an exception.

Error handling in practice:

The public methods (query, aquery, HyDERAG.query, RecursiveRAG.query) are designed to be non-raising for end users. All LLMCallError and RetrievalError exceptions are caught internally, logged, and converted to graceful fallback responses (e.g., "error in generating answer", "There is not enough information."). The exceptions are exposed for downstream integrations that want to implement custom error handling at a lower level.

Example of custom error handling at LLM level:

# _call_llm is a private helper (note the leading underscore), so it may change between releases.
from fennec_community.rag.types.self_improving_rag import _call_llm, LLMCallError

try:
    response = _call_llm(my_llm, "Your prompt here", retries=3)
except LLMCallError as e:
    print(f"LLM permanently unavailable: {e}")
    # implement your own fallback

Performance & Tuning Guide

Choosing the Right Strategy

Scenario	Recommended Class	Config Tips
Single-topic Q&A, quality-critical	`SelfIRAG`	`fast_mode=True`, `max_refinement_iterations=2-4`, `confidence_threshold=0.80`
Cross-domain or vocabulary-gap queries	`HyDERAG`	`num_hypothetical_docs=3-5`, `use_original_query=True`
Multi-part complex questions	`RecursiveRAG`	`max_depth=2-3`, `max_sub_questions=3-4`
Multi-part + quality verification	Chain `RecursiveRAG` → `SelfIRAG` per leaf	Tune each independently

LLM Call Budget Estimates

Class	Min calls/query	Max calls/query	Notes
`SelfIRAG` (fast_mode=True)	1	`iterations × 2`	+1/iter for answer refinement
`SelfIRAG` (fast_mode=False)	2	`iterations × 3`	generate + evaluate + refine
`HyDERAG`	`N + 1`	`N + 1`	N hypothetical docs + 1 final answer
`RecursiveRAG`	1	exponential	Worst case grows roughly as `max_sub_questions ** max_depth`
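The formulas in the table can be turned into a rough budget calculator. This is a minimal sketch: the function names are hypothetical, and the `RecursiveRAG` helper counts worst-case nodes in the question tree (assuming the root sits at depth 0 and every node splits into the maximum number of sub-questions) rather than LLM calls, since the per-node call count is not stated here.

```python
def selfirag_max_calls(iterations, fast_mode=True):
    """Upper bound from the table: iterations x 2 (fast) or x 3 (full)."""
    return iterations * (2 if fast_mode else 3)


def hyderag_calls(num_hypothetical_docs):
    """N hypothetical documents plus one final answer."""
    return num_hypothetical_docs + 1


def recursive_max_nodes(max_depth, max_sub_questions):
    """Worst-case question-tree size: 1 + b + b^2 + ... + b^d nodes."""
    return sum(max_sub_questions ** level for level in range(max_depth + 1))


print(selfirag_max_calls(4))                    # 8
print(selfirag_max_calls(4, fast_mode=False))   # 12
print(hyderag_calls(4))                         # 5
print(recursive_max_nodes(3, 4))                # 85
```

The last figure shows why small values of `max_depth` and `max_sub_questions` matter: adding one level to a tree with four branches per node multiplies the deepest layer by four.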

Reducing Latency

SelfIRAG: Set fast_mode=True (default). Set skip_refinement_on_high_confidence=True (default). Use aquery() in async environments.
HyDERAG: Hypothetical docs are already generated in parallel. Increase max_workers for very high num_hypothetical_docs values.
RecursiveRAG: Keep max_depth ≤ 3 and max_sub_questions ≤ 4. Simple queries bypass decomposition entirely.
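To illustrate why `max_workers` matters for `HyDERAG`, here is a sketch of parallel hypothetical-document generation using the standard-library thread pool. `EchoLLM` and `generate_hypothetical_docs` are hypothetical stand-ins, and the prompt format is invented for the example. The library's own implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor


class EchoLLM:
    """Hypothetical stand-in for an LLM with a blocking generate() method."""

    def generate(self, prompt):
        return f"hypothetical passage for: {prompt}"


def generate_hypothetical_docs(llm, query, num_docs=3, max_workers=4):
    """Run num_docs generate() calls on a thread pool and keep their order."""
    prompts = [f"[variant {i + 1}] {query}" for i in range(num_docs)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order even if calls finish out of order.
        return list(pool.map(llm.generate, prompts))


docs = generate_hypothetical_docs(EchoLLM(), "How do plants make energy?", num_docs=3)
for doc in docs:
    print(doc)
```

With network-bound LLM calls, a pool smaller than `num_docs` processes the prompts in batches, so raising `max_workers` toward `num_hypothetical_docs` is what removes the extra rounds.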

Improving Answer Quality

SelfIRAG: Increase max_refinement_iterations and raise confidence_threshold for harder queries, so that refinement continues until a stricter bar is met. Keep enable_answer_refinement=True and enable_retrieval_refinement=True.
HyDERAG: Increase num_hypothetical_docs for broader semantic coverage. Set use_original_query=True (default) to ensure direct keyword matches are not missed.
RecursiveRAG: Expand complexity_keywords for your domain. Lower min_query_length if short complex queries are common in your use case.
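The effect of `complexity_keywords` and `min_query_length` can be sketched with a simple gate. This assumes a query is treated as complex only when it is long enough and contains at least one keyword. How the library actually combines the two settings is not stated here, so `looks_complex` is a hypothetical helper for reasoning about tuning.

```python
def looks_complex(query, complexity_keywords, min_query_length=40):
    """Heuristic gate: long enough AND contains at least one keyword."""
    if len(query) < min_query_length:
        return False
    lowered = query.lower()
    return any(keyword.lower() in lowered for keyword in complexity_keywords)


keywords = ["compare", "causes and effects", "versus"]

short_query = "Compare X and Y"
long_query = "What were the causes and effects of the Industrial Revolution?"

print(looks_complex(short_query, keywords))                       # False: under 40 characters
print(looks_complex(short_query, keywords, min_query_length=10))  # True
print(looks_complex(long_query, keywords))                        # True
```

Under this assumption, a short comparative question is only decomposed once `min_query_length` is lowered, which matches the tuning advice for RecursiveRAG.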

Full Integration Examples

Example 1 — `SelfIRAG` with Full Audit Logging

import json
from fennec_community.rag.types.self_improving_rag import SelfIRAG, SelfRAGConfig

config = SelfRAGConfig(
    max_refinement_iterations=4,
    confidence_threshold=0.82,
    enable_answer_refinement=True,
    enable_retrieval_refinement=True,
    fast_mode=True,
    llm_retries=3
)

rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=config)

# Index documents
rag.add_document("Neural networks are computing systems inspired by the brain...", {"source": "AI_textbook"})
rag.add_document("Deep learning uses multi-layered neural networks...", {"source": "ML_guide"})

# Query
result = rag.query("How does deep learning differ from classical machine learning?")

print("=== Answer ===")
print(result.answer)

print("\n=== Iteration History ===")
for step in result.steps:
    print(f"  Iteration {step.step}: confidence={step.confidence:.2f} | {step.reason}")

print(f"\nQuality delta: {result.initial_quality:.2f} → {result.final_quality:.2f}")
print(json.dumps(result.to_dict(), indent=2, ensure_ascii=False))
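Example 1 prints the result to the console. For a persistent audit trail, one option is to append each result to a JSON Lines file. The sketch below is an assumption about how that could be done: `append_audit_record` is a hypothetical helper, and `fake_result` stands in for `result.to_dict()`, whose real keys depend on `SelfRAGResult`.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def append_audit_record(path, query, result_dict):
    """Append one query/result pair to a JSON Lines audit file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "result": result_dict,
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in for result.to_dict(); the keys here are illustrative only.
fake_result = {"answer": "Deep learning stacks many layers.", "final_quality": 0.9}

audit_path = os.path.join(tempfile.mkdtemp(), "selfirag_audit.jsonl")
append_audit_record(audit_path, "How does deep learning differ?", fake_result)
append_audit_record(audit_path, "What is a neural network?", fake_result)

# Read the log back: one JSON object per line.
with open(audit_path, encoding="utf-8") as handle:
    records = [json.loads(line) for line in handle]
print(len(records), records[0]["query"])
```

Writing one object per line keeps the log appendable and easy to process incrementally.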

Example 2 — `HyDERAG` with Retrieval Statistics

from fennec_community.rag.types.self_improving_rag import HyDERAG

hyde = HyDERAG(
    rag_system=my_rag,
    llm=my_llm,
    num_hypothetical_docs=4,
    use_original_query=True,
    top_k=6,
    max_final_docs=12,
    max_workers=4
)

hyde.add_document("Photosynthesis converts light energy into chemical energy...", {"topic": "biology"})

result = hyde.query("Explain the process by which plants produce energy from sunlight.")

print(result.answer)
print(f"\n[Stats] hypothetical docs: {len(result.hypothetical_docs)}, "
      f"retrieved: {result.num_retrieved}, used: {result.num_ranked}")

print("\n[Hypothetical docs used as search vectors:]")
for i, doc in enumerate(result.hypothetical_docs, 1):
    print(f"  {i}. {doc[:150]}...")

Example 3 — `RecursiveRAG` for Complex Multi-Part Questions

from fennec_community.rag.types.self_improving_rag import RecursiveRAG, RecursiveRAGConfig

config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=40,
    llm_retries=2
)

rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=config)

rec_rag.add_document("The Industrial Revolution began in Britain in the 1760s...", {"era": "18th_century"})
rec_rag.add_document("The effects of industrialisation included urbanisation...", {"era": "19th_century"})

result = rec_rag.query(
    "What were the causes and effects of the Industrial Revolution "
    "and how did they compare to the Agricultural Revolution?"
)

print(result.answer)

if result.decomposed:
    print(f"\nDecomposed into {len(result.sub_questions)} sub-questions:")
    for i, (q, a) in enumerate(zip(result.sub_questions, result.sub_answers), 1):
        print(f"\n  [{i}] Q: {q}")
        print(f"       A: {a.answer[:200]}...")

print(result.to_dict())

Example 4 — Async FastAPI Service with All Three Strategies

import asyncio

from fastapi import FastAPI
from fennec_community.rag.types.self_improving_rag import SelfIRAG, HyDERAG, RecursiveRAG, SelfRAGConfig

app = FastAPI()

self_rag = SelfIRAG(my_rag, my_llm, config=SelfRAGConfig(fast_mode=True))
hyde_rag = HyDERAG(my_rag, my_llm, num_hypothetical_docs=3)
rec_rag = RecursiveRAG(my_rag, my_llm)

@app.get("/query/self")
async def self_answer(q: str):
    # SelfIRAG provides aquery(), so it can be awaited directly
    result = await self_rag.aquery(q)
    return result.to_dict()

@app.get("/query/hyde")
async def hyde_answer(q: str):
    # Run the blocking query() in a worker thread so the event loop stays free
    result = await asyncio.to_thread(hyde_rag.query, q)
    return result.to_dict()

@app.get("/query/recursive")
async def recursive_answer(q: str):
    # Same pattern: offload the blocking query() to a worker thread
    result = await asyncio.to_thread(rec_rag.query, q)
    return result.to_dict()

@app.get("/stats")
def stats():
    return self_rag.get_stats()

Example 5 — Chaining RecursiveRAG + SelfIRAG

from fennec_community.rag.types.self_improving_rag import RecursiveRAG, SelfIRAG, RecursiveRAGConfig, SelfRAGConfig

# Decompose with RecursiveRAG, then verify each leaf with SelfIRAG
rec = RecursiveRAG(my_rag, my_llm, config=RecursiveRAGConfig(max_depth=2, max_sub_questions=3))
self_rag = SelfIRAG(my_rag, my_llm, config=SelfRAGConfig(max_refinement_iterations=2, fast_mode=True))

complex_query = (
    "What are the causes and effects of climate change, "
    "and how do they relate to biodiversity loss?"
)

# Step 1: Decompose
rec_result = rec.query(complex_query)

if rec_result.decomposed:
    print(f"Decomposed into {len(rec_result.sub_questions)} sub-questions")

    # Step 2: Verify each sub-answer with SelfIRAG
    refined_answers = []
    for sq in rec_result.sub_questions:
        verified = self_rag.query(sq)
        refined_answers.append(f"Q: {sq}\nA: {verified.answer} (confidence={verified.confidence:.2f})")
        print(f"  ✓ '{sq[:60]}' → confidence={verified.confidence:.2f}")

    print("\n=== Refined composite answer ===")
    print("\n\n".join(refined_answers))
else:
    print(rec_result.answer)

Simple Real Example

from fennec_community.llm import GeminiInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.self_improving_rag import (
    SelfIRAG, SelfRAGConfig, HyDERAG, RecursiveRAG, RecursiveRAGConfig
)

# llm_api must hold your Gemini API key; it is not defined in this snippet.
loader_1 = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm = GeminiInterface(api_key=llm_api)
context_manager = ContextManager()
rag_system = RAGSystem(llm=llm, vector_db=vector_db, chunker=chunker, context_manager=context_manager)

# Index the FAQ once; all three strategies share the same backend
rag_system.add_documents(loader_1)

# Arabic query, in English: "What payment methods are available?"
query = "ما هي طرق الدفع المتاحة؟"

# SelfIRAG
self_config = SelfRAGConfig(
    max_refinement_iterations=3,
    confidence_threshold=0.85,
    fast_mode=True,
)
self_rag = SelfIRAG(rag_system=rag_system, llm=llm, config=self_config)
result = self_rag.query(query)
print(result.answer)

# HyDERAG
hyde = HyDERAG(
    rag_system=rag_system,
    llm=llm,
    num_hypothetical_docs=4,
    use_original_query=True,
    top_k=5,
    max_final_docs=12,
)
result = hyde.query(query)
print(result.answer)

# RecursiveRAG
recursive_config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=40,
)
recursive_rag = RecursiveRAG(rag_system=rag_system, llm=llm, config=recursive_config)
result = recursive_rag.query(query)
print(result.answer)

Source: community/rag/self_improving_rag.md
