Self Improving RAG Module
Table of Contents
- Overview
- Architecture & Design Philosophy
- Installation & Quick Start
- Data Classes Reference
- Configuration Reference
- Module: SelfIRAG
- Module: HyDERAG
- Module: RecursiveRAG
- Error Reference
- Performance & Tuning Guide
- Full Integration Examples
Overview
self_improving_rag is an advanced Retrieval-Augmented Generation (RAG) module that implements three distinct self-improvement strategies on top of any existing RAG backend. Rather than running a single retrieve-and-generate pass, each strategy adds a control layer (self-evaluation, hypothetical-document retrieval, or question decomposition) intended to make the pipeline more accurate, context-aware, and robust.
| Class | Strategy | Best For |
|---|---|---|
| SelfIRAG | Iterative self-evaluation and refinement loop | High-stakes Q&A where answer quality must be verified and improved |
| HyDERAG | Hypothetical Document Embeddings | Queries where the gap between question phrasing and document language is large |
| RecursiveRAG | Recursive question decomposition | Complex, multi-faceted questions with distinct sub-topics |
All three classes share a common interface contract: they accept any RAG backend and LLM, produce structured typed result objects, and are fully compatible with both synchronous and asynchronous execution environments.
Architecture & Design Philosophy
┌─────────────────────────────────────────────────────────────────────────┐
│ self_improving_rag │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌─────────────────────┐ │
│ │ SelfIRAG │ │ HyDERAG │ │ RecursiveRAG │ │
│ │ │ │ │ │ │ │
│ │ Retrieve │ │ Generate N │ │ Complexity check │ │
│ │ ↓ │ │ hypothetical │ │ ↓ │ │
│ │ Generate │ │ docs (parallel) │ │ Decompose into │ │
│ │ ↓ │ │ ↓ │ │ sub-questions │ │
│ │ Evaluate │ │ Retrieve per doc │ │ ↓ │ │
│ │ ↓ │ │ ↓ │ │ Recurse each sub- │ │
│ │ Refine query │ │ Deduplicate │ │ question (depth N) │ │
│ │ ↓ │ │ ↓ │ │ ↓ │ │
│ │ Repeat until │ │ Re-rank & answer │ │ Merge sub-answers │ │
│ │ threshold met │ │ │ │ │ │
│ └──────────────────┘ └──────────────────┘ └─────────────────────┘ │
│ │
│ Shared utilities: _call_llm, _retrieve_safe, _deduplicate │
└─────────────────────────────────────────────────────────────────────────┘
Key design decisions:
- Backend-agnostic: All classes accept any rag_system object that exposes retrieve(query, top_k) and add_document(text, metadata). The internal _retrieve_safe normaliser handles List[Tuple], List[Dict], and {results: [...]} output formats automatically.
- LLM-agnostic: Any LLM object with a generate(prompt) -> str method is supported.
- Graceful degradation: Every LLM call uses exponential back-off retries; failures fall back to sensible defaults rather than raising unhandled exceptions to the caller.
- Typed results: All public methods return structured dataclass instances with .to_dict() serialization support.
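To make the backend and LLM contracts concrete, here is a minimal sketch of two stand-in objects that expose the methods named above. The class names InMemoryRAG and EchoLLM, the word-overlap scoring, and the "text"/"metadata"/"score" dictionary keys are illustrative assumptions, not part of the package; a real backend may return any of the supported formats.

```python
from typing import Any, Dict, List, Optional, Tuple


class InMemoryRAG:
    """Hypothetical toy backend exposing retrieve(query, top_k) and add_document(text, metadata)."""

    def __init__(self) -> None:
        self._docs: List[Tuple[str, Dict[str, Any]]] = []

    def add_document(self, text: str, metadata: Optional[Dict[str, Any]] = None) -> int:
        self._docs.append((text, metadata or {}))
        return len(self._docs) - 1  # index of the stored document

    def retrieve(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        # Crude relevance: fraction of query words that also occur in the document.
        words = set(query.lower().split())
        scored = []
        for text, meta in self._docs:
            overlap = len(words & set(text.lower().split()))
            if overlap:
                scored.append({"text": text, "metadata": meta,
                               "score": overlap / max(len(words), 1)})
        scored.sort(key=lambda d: d["score"], reverse=True)
        return scored[:top_k]


class EchoLLM:
    """Hypothetical toy LLM exposing generate(prompt) -> str."""

    def generate(self, prompt: str) -> str:
        return f"[stub answer for a prompt of {len(prompt)} characters]"


my_rag = InMemoryRAG()
my_rag.add_document("Deep learning uses multi-layered neural networks.", {"source": "demo"})
my_llm = EchoLLM()
hits = my_rag.retrieve("what is deep learning", top_k=3)
```

Objects shaped like these are handy for unit tests, because they let the pipeline classes be exercised without a vector store or a network call.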
Installation & Quick Start
pip install numpy fennec_community # only external dependency
from fennec_community.rag.types.self_improving_rag import SelfIRAG, HyDERAG, RecursiveRAG
from fennec_community.rag.types.self_improving_rag import SelfRAGConfig, RecursiveRAGConfig
# ── SelfIRAG ──────────────────────────────────────
cfg = SelfRAGConfig(
max_refinement_iterations=3,
confidence_threshold=0.85,
fast_mode=True
)
self_rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=cfg)
result = self_rag.query("What is deep learning?")
print(result.answer, f"confidence={result.confidence:.2f}")
# ── HyDERAG ───────────────────────────────────────
hyde = HyDERAG(rag_system=my_rag, llm=my_llm, num_hypothetical_docs=3)
result = hyde.query("What are the benefits of exercise?")
print(result.answer)
# ── RecursiveRAG ──────────────────────────────────
rec_cfg = RecursiveRAGConfig(max_depth=3, max_sub_questions=4)
rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=rec_cfg)
result = rec_rag.query("What are the causes and effects of the Industrial Revolution?")
print(result.answer)
print(result.sub_questions)
Data Classes Reference
The following dataclasses are returned by the public API methods. They are importable directly from the package:
from fennec_community.rag.types.self_improving_rag import SelfRAGResult, HyDEResult, RecursiveRAGResult
SelfRAGResult
The structured return type of SelfIRAG.query() and SelfIRAG.aquery(). Contains the best answer found across all refinement iterations, along with full iteration history for auditing and debugging.
@dataclass
class SelfRAGResult:
    answer: str
    confidence: float
    iterations: int
    history: List[RefinementStep]
Fields
| Field | Type | Description |
|---|---|---|
| answer | str | The highest-confidence answer produced across all iterations. This is the answer from the RefinementStep with the maximum .confidence, not necessarily the last iteration. |
| confidence | float | The geometric-mean confidence score [0.0, 1.0] of the best answer, computed as (relevance × accuracy × completeness × clarity)^0.25. |
| iterations | int | Total number of refinement iterations that were actually executed. |
| history | List[RefinementStep] | Full ordered list of all iteration records. Each RefinementStep has .iteration (int), .answer (str), .evaluation (EvaluationResult), and .confidence (float). |
Properties
| Property | Type | Description |
|---|---|---|
| final_answer | str | Alias for answer. Provided for backward compatibility with example notebooks. |
| refinement_steps | int | Alias for iterations. |
| initial_quality | float | Confidence score of the first iteration (history[0].confidence). Useful for measuring improvement delta. |
| final_quality | float | Confidence score of the last iteration (history[-1].confidence). |
| steps | List[RefinementStep] | Enriched history list where each step also carries .step, .action, and .reason attributes for human-readable logging. |
Methods
| Method | Returns | Description |
|---|---|---|
| to_dict() | Dict[str, Any] | Serialises the result to a JSON-safe dictionary. Includes answer, confidence, iterations, initial_quality, final_quality, and a compact history list. |
Example
result = self_rag.query("Explain transformer attention")
print(result.answer)
print(f"Confidence: {result.confidence:.2%}")
print(f"Improved from {result.initial_quality:.2f} → {result.final_quality:.2f} over {result.iterations} iterations")
for step in result.steps:
    print(f" Step {step.step}: {step.reason}")
import json
print(json.dumps(result.to_dict(), indent=2))
HyDEResult
The structured return type of HyDERAG.query().
@dataclass
class HyDEResult:
    answer: str
    hypothetical_docs: List[str]
    num_retrieved: int
    num_ranked: int
Fields
| Field | Type | Description |
|---|---|---|
| answer | str | The final generated answer, grounded in the retrieved and re-ranked real documents. |
| hypothetical_docs | List[str] | The list of synthetic "hypothetical" documents generated by the LLM. Useful for debugging and understanding what the LLM imagined the answer space to be. |
| num_retrieved | int | Total number of unique real documents retrieved across all search queries (original + hypothetical). |
| num_ranked | int | Number of documents passed to the final answer generation step after re-ranking (capped by max_final_docs). |
Methods
| Method | Returns | Description |
|---|---|---|
| to_dict() | Dict[str, Any] | Serialises to a JSON-safe dictionary with answer, hypothetical_docs_generated, num_retrieved, and num_ranked. |
Example
result = hyde.query("How does photosynthesis work?")
print(result.answer)
print(f"Generated {len(result.hypothetical_docs)} hypothetical docs")
print(f"Retrieved {result.num_retrieved} unique real docs, used {result.num_ranked} for generation")
RecursiveRAGResult
The structured return type of RecursiveRAG.query().
@dataclass
class RecursiveRAGResult:
    answer: str
    depth: int
    decomposed: bool
    sub_questions: List[str]
    sub_answers: List[SubAnswer]
Fields
| Field | Type | Description |
|---|---|---|
| answer | str | The final synthesised answer: either a direct answer (if the question was simple) or a merged answer from all sub-answers. |
| depth | int | The recursion depth at which this result was produced. 0 = top-level call. |
| decomposed | bool | True if the question was decomposed into sub-questions; False if answered directly. |
| sub_questions | List[str] | The sub-questions generated during decomposition. Empty if decomposed=False. |
| sub_answers | List[SubAnswer] | Internal SubAnswer records for each sub-question, each carrying .question, .answer, and .depth. Empty if decomposed=False. |
Methods
| Method | Returns | Description |
|---|---|---|
| to_dict() | Dict[str, Any] | Serialises to a JSON-safe dictionary with answer, decomposed, depth, and sub_questions. |
Example
result = rec_rag.query("What are the causes, effects, and legacy of World War I?")
print(result.answer)
print(f"Decomposed: {result.decomposed}")
if result.decomposed:
    for i, q in enumerate(result.sub_questions, 1):
        print(f" Sub-question {i}: {q}")
print(result.to_dict())
Configuration Reference
SelfRAGConfig
Controls the behaviour of the SelfIRAG iterative refinement loop.
from fennec_community.rag.types.self_improving_rag import SelfRAGConfig
config = SelfRAGConfig(
max_refinement_iterations=3,
confidence_threshold=0.80,
enable_answer_refinement=True,
enable_retrieval_refinement=True,
llm_retries=2,
fast_mode=True,
skip_refinement_on_high_confidence=True
)
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| max_refinement_iterations | int | 3 | Hard cap on the number of retrieve-generate-evaluate cycles. The loop exits early if confidence_threshold is reached before this limit. |
| confidence_threshold | float | 0.80 | Minimum acceptable confidence score [0.0, 1.0]. Once any iteration's geometric-mean confidence meets or exceeds this value, the loop stops immediately and returns that answer. |
| enable_answer_refinement | bool | True | When True, the current answer is rewritten at the end of each non-final iteration using the LLM's identified issues and suggestions. Disable to skip answer rewriting and only adapt the retrieval query. |
| enable_retrieval_refinement | bool | True | When True, the retrieval query is expanded with keyword suggestions from the evaluator before the next iteration. Helps surface documents that the original query missed. |
| llm_retries | int | 2 | Number of retry attempts (with exponential back-off) for every LLM call in the pipeline. Total attempts = llm_retries + 1. |
| fast_mode | bool | True | When True, generation and evaluation are combined into a single LLM call per iteration (saving ~50% of LLM calls). When False, generation and evaluation are separate calls for potentially higher accuracy. Falls back to two-step on JSON parse failure. |
| skip_refinement_on_high_confidence | bool | True | Reserved flag. When True, a very high first-iteration confidence may short-circuit all subsequent iterations. (Implemented via the confidence_threshold check in the main loop.) |
RecursiveRAGConfig
Controls the decomposition behaviour of RecursiveRAG.
from fennec_community.rag.types.self_improving_rag import RecursiveRAGConfig
config = RecursiveRAGConfig(
max_depth=3,
max_sub_questions=4,
min_query_length=40,
complexity_keywords=["and", "or", "causes", "compare", "why", "how"],
llm_retries=2
)
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| max_depth | int | 3 | Maximum recursion depth. Prevents infinite decomposition loops. A query at depth max_depth is always answered directly, regardless of its apparent complexity. |
| max_sub_questions | int | 4 | Maximum number of sub-questions generated per decomposition. The LLM is instructed to produce at most this many sub-questions, and any excess is truncated. |
| min_query_length | int | 40 | Character-length threshold below which a query is never decomposed; short queries are assumed simple. |
| complexity_keywords | List[str] | See below | Arabic and English marker words that indicate a complex, multi-part query. A query is considered complex if it is at least min_query_length characters long AND contains at least two of these keywords. |
| llm_retries | int | 2 | Retry attempts for each LLM call (decomposition, direct answering, and merging). |
Default complexity keywords:
و، أو، ثم، بعد، قبل، أسباب، نتائج، مقارنة، الفرق، العلاقة، التأثير، كيف، لماذا، متى، أين (Arabic) and and, or, causes, effects, compare, difference, relationship, impact, how, why, when (English).
Module: SelfIRAG
SelfIRAG implements the Self-RAG paradigm: a closed-loop pipeline that retrieves documents, generates an answer, evaluates that answer on four dimensions (relevance, accuracy, completeness, clarity), and then refines both the answer and the retrieval query before repeating. The iteration with the highest confidence score is returned as the final result.
Retrieve → Generate → Evaluate → Refine Answer & Query → Repeat
The confidence score uses a geometric mean of the four evaluation dimensions, meaning a single weak dimension (e.g., low relevance) significantly lowers the overall score, which makes it a stricter measure than the arithmetic mean.
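The effect of the geometric mean can be checked with a few lines of arithmetic. The sketch below applies the formula quoted in the data class reference, (relevance × accuracy × completeness × clarity)^0.25, to one illustrative set of scores; the function names and the sample scores are assumptions made for the example.

```python
def geometric_mean_confidence(relevance: float, accuracy: float,
                              completeness: float, clarity: float) -> float:
    """Fourth root of the product of the four evaluation dimensions."""
    return (relevance * accuracy * completeness * clarity) ** 0.25


def arithmetic_mean_confidence(relevance: float, accuracy: float,
                               completeness: float, clarity: float) -> float:
    """Plain average, shown only for comparison."""
    return (relevance + accuracy + completeness + clarity) / 4


# One weak dimension (relevance) with three strong ones.
scores = dict(relevance=0.3, accuracy=0.9, completeness=0.9, clarity=0.9)
geo = geometric_mean_confidence(**scores)
ari = arithmetic_mean_confidence(**scores)
print(f"geometric={geo:.2f} arithmetic={ari:.2f}")  # roughly 0.68 vs 0.75
```

With a threshold of 0.80 neither mean passes here, but the geometric mean penalises the weak relevance score more heavily, and any dimension scored 0 drives the confidence to 0 outright.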
SelfIRAG Constructor
SelfIRAG(
    rag_system: Any,
    llm: Any,
    config: Optional[SelfRAGConfig] = None
)
Initialises the Self-RAG system. Does not perform any retrieval or LLM calls at construction time.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| rag_system | Any | required | The underlying RAG backend. Must expose retrieve(query: str, top_k: int) -> Any and add_document(text: str, metadata: dict) -> Any. The retrieve output format is normalised automatically: it supports List[Tuple[DocumentChunk, float]], List[Dict], and {"results": [...]}. |
| llm | Any | required | Any LLM object with a synchronous generate(prompt: str) -> str method. An optional generate_async method is used automatically if present. Passing a falsy value raises ValueError. |
| config | SelfRAGConfig \| None | None | Configuration object. A default SelfRAGConfig() is created if not provided. |
Raises
- ValueError: if llm is falsy (None, empty, etc.).
Example
from fennec_community.rag.types.self_improving_rag import SelfIRAG, SelfRAGConfig
config = SelfRAGConfig(
max_refinement_iterations=4,
confidence_threshold=0.85,
fast_mode=True,
llm_retries=3
)
rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=config)
SelfIRAG.query
query(query: str, context: Optional[Dict[str, Any]] = None) -> SelfRAGResult
The primary public method. Runs the full self-improvement loop: retrieves documents, generates an answer, evaluates it, refines both the answer and retrieval query, and repeats until either the confidence threshold is met or the iteration cap is reached. Returns the highest-confidence answer found across all iterations.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The natural-language question to answer. Works in Arabic and English. The answer language mirrors the query language per the embedded prompt instructions. |
| context | Dict[str, Any] \| None | None | Optional extra context dictionary forwarded to the RAG backend. Reserved for future use in the current implementation; the backend receives it via the retrieval call signature if supported. |
Returns
SelfRAGResult — Contains the best answer, its confidence score, the number of iterations run, and the full iteration history.
Iteration loop logic:
- Retrieve top-5 documents for current_query (starts as the original query).
- If fast_mode=True: generate answer and evaluate in one LLM call. If False: two separate LLM calls.
- Record the RefinementStep in history.
- If confidence >= confidence_threshold: stop immediately and return the best step.
- If more iterations remain and enable_answer_refinement=True: rewrite the answer using identified issues and suggestions.
- If enable_retrieval_refinement=True and suggestions exist: append top-2 suggestion keywords to current_query for the next retrieval.
- After all iterations: select the step with the highest confidence as the final answer.
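The control flow of the steps above can be sketched in plain Python. This is an illustrative skeleton, not the module's implementation: the Step dataclass, the refinement_loop function, and the injected retrieve and generate_and_evaluate callables are hypothetical names, and the answer-rewriting step is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Step:
    iteration: int
    answer: str
    confidence: float


def refinement_loop(
    query: str,
    retrieve: Callable[[str, int], Sequence[str]],
    generate_and_evaluate: Callable[[str, Sequence[str]], Tuple[str, float, List[str]]],
    max_iterations: int = 3,
    threshold: float = 0.80,
) -> Tuple[Step, List[Step]]:
    history: List[Step] = []
    current_query = query
    for i in range(1, max_iterations + 1):
        docs = retrieve(current_query, 5)                      # top-5 retrieval
        answer, confidence, suggestions = generate_and_evaluate(query, docs)
        history.append(Step(i, answer, confidence))
        if confidence >= threshold:                            # early exit
            break
        if i < max_iterations and suggestions:
            # Query expansion: append the top-2 suggested keywords.
            current_query = f"{current_query} {' '.join(suggestions[:2])}"
    best = max(history, key=lambda s: s.confidence)            # best step, not last
    return best, history


# Demo with stubbed retrieval and evaluation.
seen_queries: List[str] = []
confidences = iter([0.5, 0.9, 0.95])


def fake_retrieve(q: str, top_k: int) -> List[str]:
    seen_queries.append(q)
    return ["doc"]


def fake_generate_and_evaluate(q: str, docs: Sequence[str]) -> Tuple[str, float, List[str]]:
    return "answer", next(confidences), ["neural", "networks", "extra"]


best, history = refinement_loop("deep learning", fake_retrieve, fake_generate_and_evaluate)
```

In the demo the first pass scores 0.5, the query is expanded with two keywords, and the second pass scores 0.9, which meets the threshold, so the third pass never runs.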
Example
result = rag.query("ما هي أسباب تغير المناخ؟")
print(result.answer)
print(f"Confidence: {result.confidence:.2%}")
print(f"Ran {result.iterations} iteration(s)")
print(f"Quality improved: {result.initial_quality:.2f} → {result.final_quality:.2f}")
Performance notes:
- With fast_mode=True: each iteration uses 1 LLM call (generate+evaluate combined).
- With fast_mode=False: each iteration uses 2 LLM calls (generate, then evaluate).
- With enable_answer_refinement=True: each non-final iteration adds 1 more LLM call (refine).
- Total LLM calls (worst case, fast_mode=False, refinement enabled): at most iterations × 3. Because the final iteration is not followed by a refinement call, the exact count when every iteration runs is iterations × 3 − 1.
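The per-iteration costs listed above can be turned into a small budget estimator. This is a back-of-the-envelope sketch under two stated assumptions: every iteration runs (no early exit on the confidence threshold), and fast mode never falls back to the two-step path. The function name llm_call_budget is hypothetical.

```python
def llm_call_budget(iterations: int, fast_mode: bool = True,
                    answer_refinement: bool = True) -> int:
    """Estimate LLM calls for one query when all iterations are executed."""
    per_iteration = 1 if fast_mode else 2              # combined vs. generate + evaluate
    # Refinement runs only after non-final iterations.
    refine_calls = max(iterations - 1, 0) if answer_refinement else 0
    return iterations * per_iteration + refine_calls


for fast in (True, False):
    print(f"fast_mode={fast}: {llm_call_budget(3, fast_mode=fast)} calls for 3 iterations")
```

Retries are not counted here; with llm_retries=2, each call can take up to three attempts in the worst case.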
SelfIRAG.aquery
async aquery(query: str, context: Optional[Dict[str, Any]] = None) -> SelfRAGResult
Async wrapper around query. Runs the entire synchronous pipeline in a thread pool via asyncio.to_thread, making it safe to call from async contexts (FastAPI endpoints, async scripts, Jupyter async cells) without blocking the event loop.
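The wrapping pattern described here is small enough to show in isolation. The sketch below uses a hypothetical BlockingPipeline class with a deliberately slow synchronous query method, and exposes it to async callers through asyncio.to_thread (available in Python 3.9 and later); it illustrates the pattern only and is not the module's own code.

```python
import asyncio
import time
from typing import List


class BlockingPipeline:
    """Hypothetical stand-in for a synchronous RAG pipeline."""

    def query(self, q: str) -> str:
        time.sleep(0.05)  # simulates blocking retrieval + LLM work
        return f"answer to: {q}"

    async def aquery(self, q: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.query, q)


async def main() -> List[str]:
    pipe = BlockingPipeline()
    # Both queries run in worker threads and overlap in time.
    return await asyncio.gather(pipe.aquery("a"), pipe.aquery("b"))


results = asyncio.run(main())
print(results)
```

Because the work still runs synchronously inside a thread, this approach keeps the event loop responsive but does not make an individual query faster.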
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The natural-language question. |
| context | Dict[str, Any] \| None | None | Optional extra context. |
Returns
SelfRAGResult — Identical to query().
Example
# FastAPI endpoint
from fastapi import FastAPI
app = FastAPI()
@app.get("/answer")
async def answer(q: str):
    result = await rag.aquery(q)
    return result.to_dict()
# Async script
import asyncio
result = asyncio.run(rag.aquery("Explain quantum entanglement"))
print(result.answer)
SelfIRAG.add_document
add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any
Adds a document to the underlying RAG backend's vector store. This is a convenience pass-through, equivalent to calling rag_system.add_document(text, metadata) directly.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | required | The raw document text to index. |
| metadata | Dict[str, Any] \| None | None | Optional metadata dictionary (e.g., {"source": "wiki", "date": "2025-01"}). Defaults to an empty dict {} if None. |
Returns
Any — The return value of the backend's add_document method (backend-specific; often a document ID or chunk count).
Example
rag.add_document(
"Deep learning is a subset of machine learning that uses neural networks.",
metadata={"source": "textbook", "chapter": 3}
)
SelfIRAG.get_stats
get_stats() -> dict
Returns the current runtime configuration as a plain dictionary. Useful for logging, monitoring dashboards, and debugging configuration state.
Parameters
None.
Returns
dict with the following keys:
| Key | Type | Description |
|---|---|---|
| max_refinement_iterations | int | Hard cap on iterations from SelfRAGConfig. |
| confidence_threshold | float | Target confidence score from SelfRAGConfig. |
| enable_answer_refinement | bool | Whether answer rewriting is enabled. |
| enable_retrieval_refinement | bool | Whether query expansion is enabled. |
| llm_retries | int | Retry count per LLM call. |
| fast_mode | bool | Whether single-call generate+evaluate mode is active. |
Example
stats = rag.get_stats()
print(stats)
# {
# 'max_refinement_iterations': 3,
# 'confidence_threshold': 0.85,
# 'enable_answer_refinement': True,
# 'enable_retrieval_refinement': True,
# 'llm_retries': 2,
# 'fast_mode': True
# }
Module: HyDERAG
HyDERAG implements Hypothetical Document Embeddings (HyDE) — a retrieval strategy that bridges the semantic gap between a question and its answers. Instead of searching directly with the question, it asks the LLM to generate N synthetic documents that would answer the query, then retrieves real documents using each synthetic doc as a search vector. The intuition is that a hypothetical answer is semantically closer to real answer documents than the question itself.
Query → Generate N hypothetical docs (parallel)
→ Retrieve real docs per hypothetical + original query
  → Deduplicate → Re-rank by score → Generate final answer
Hypothetical documents are generated in parallel using a thread pool, so generation latency shrinks roughly in proportion to the number of workers (up to num_hypothetical_docs).
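A thread-pool fan-out of this kind can be sketched with the standard library. The function name generate_hypothetical_docs and the prompt wording are assumptions for illustration; the worker cap of min(max_workers, num_hypothetical_docs) and the 40-character minimum come from the parameter and pipeline descriptions in this section.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def generate_hypothetical_docs(
    generate: Callable[[str], str],
    query: str,
    num_docs: int = 3,
    max_workers: int = 4,
    min_chars: int = 40,
) -> List[str]:
    # Hypothetical prompt wording; the real prompt is not shown in this document.
    prompts = [
        f"Write a factual paragraph (variant {i + 1}) that answers: {query}"
        for i in range(num_docs)
    ]
    workers = max(1, min(max_workers, num_docs))  # never more threads than docs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        docs = list(pool.map(generate, prompts))  # results keep prompt order
    return [d for d in docs if len(d) >= min_chars]  # drop too-short generations


def fake_generate(prompt: str) -> str:
    """Stub LLM: returns a too-short reply for the second variant."""
    if "variant 2" in prompt:
        return "too short"
    return "Regular exercise improves mood, sleep quality, and cardiovascular health."


docs = generate_hypothetical_docs(fake_generate, "benefits of exercise")
print(len(docs), "usable hypothetical docs")
```

Threads suit this workload because each LLM call is dominated by network wait rather than CPU time.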
HyDERAG Constructor
HyDERAG(
rag_system: Any,
llm: Any,
num_hypothetical_docs: int = 3,
use_original_query: bool = True,
top_k: int = 5,
max_final_docs: int = 10,
llm_retries: int = 2,
max_workers: int = 4
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| rag_system | Any | required | The underlying RAG backend. Must expose retrieve(query, top_k) and add_document(text, metadata). |
| llm | Any | required | LLM object with generate(prompt) -> str. Passing a falsy value raises ValueError. |
| num_hypothetical_docs | int | 3 | Number of synthetic hypothetical documents to generate per query. More docs increase recall at the cost of more LLM calls and retrieval passes. |
| use_original_query | bool | True | When True, the original query is also used as a search vector alongside the hypothetical docs. Recommended: ensures direct matches are never missed. |
| top_k | int | 5 | Number of real documents to retrieve per search query (original + each hypothetical doc). |
| max_final_docs | int | 10 | Maximum number of unique documents (after deduplication and re-ranking) passed to the final answer generation step. |
| llm_retries | int | 2 | Retry attempts per LLM call. |
| max_workers | int | 4 | Maximum number of parallel threads for hypothetical document generation. Capped at min(max_workers, num_hypothetical_docs). |
Raises
- ValueError: if llm is falsy.
Example
from fennec_community.rag.types.self_improving_rag import HyDERAG
hyde = HyDERAG(
rag_system=my_rag,
llm=my_llm,
num_hypothetical_docs=4,
use_original_query=True,
top_k=7,
max_final_docs=12,
max_workers=4
)
HyDERAG.query
query(query: str, context: Optional[Dict[str, Any]] = None) -> HyDEResult
Executes the full HyDE retrieval-generation pipeline for a given query.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The natural-language question. |
| context | Dict[str, Any] \| None | None | Optional extra context (reserved for future use). |
Returns
HyDEResult — Contains the generated answer, the list of hypothetical documents, and retrieval statistics.
Pipeline steps:
- Hypothetical document generation (parallel): The LLM is prompted to write a factual paragraph that would perfectly answer the query, as if authored by a domain expert, without referencing the question itself. All num_hypothetical_docs are generated concurrently. Any generation that produces fewer than 40 characters is discarded.
- Multi-source retrieval: Real documents are retrieved for each hypothetical doc (and for the original query if use_original_query=True). Failed retrievals for individual queries are logged and skipped rather than aborting the pipeline.
- Deduplication: Documents are de-duplicated using an MD5 hash of the first 120 characters. When the same document appears multiple times (via different search queries), the copy with the highest score is kept.
- Re-ranking: Unique documents are sorted by descending score and trimmed to max_final_docs.
- Answer generation: The top-5 re-ranked documents are used to build a context, and the LLM generates the final answer with strict anti-hallucination instructions (answer only from context; respond in the same language as the query).
Fallback behaviour: If all retrieval attempts fail (no documents returned), the answer is "There is insufficient information.".
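The deduplication and re-ranking steps described in the pipeline can be sketched as follows. The function name deduplicate_and_rank and the (text, score) tuple shape are assumptions for the example; the MD5-of-first-120-characters key, the keep-highest-score rule, and the max_final_docs cap are taken from the step descriptions.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate_and_rank(
    scored_docs: List[Tuple[str, float]],
    max_final_docs: int = 10,
    prefix_chars: int = 120,
) -> List[Tuple[str, float]]:
    best: Dict[str, Tuple[str, float]] = {}
    for text, score in scored_docs:
        # Documents sharing the same first 120 characters count as duplicates.
        key = hashlib.md5(text[:prefix_chars].encode("utf-8")).hexdigest()
        if key not in best or score > best[key][1]:
            best[key] = (text, score)          # keep the highest-scoring copy
    ranked = sorted(best.values(), key=lambda item: item[1], reverse=True)
    return ranked[:max_final_docs]


# The same document retrieved twice via different search queries.
pooled = [
    ("Exercise releases endorphins.", 0.4),
    ("Sleep supports memory consolidation.", 0.7),
    ("Exercise releases endorphins.", 0.9),
]
ranked = deduplicate_and_rank(pooled, max_final_docs=10)
print(ranked)
```

Hashing only a prefix keeps the key cheap to compute, at the cost of merging distinct documents that happen to share their first 120 characters.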
Example
result = hyde.query("ما هي فوائد ممارسة الرياضة على الصحة النفسية؟")
print(result.answer)
print(f"\nGenerated {len(result.hypothetical_docs)} hypothetical documents:")
for i, doc in enumerate(result.hypothetical_docs, 1):
    print(f" [{i}] {doc[:120]}...")
print(f"\nRetrieved: {result.num_retrieved} unique docs → used top {result.num_ranked}")
When to use HyDE vs SelfIRAG:
- Use HyDE when queries are phrased very differently from document language (e.g., a question in a different register or vocabulary than the indexed knowledge base).
- Use SelfIRAG when you need iterative quality verification and the LLM should evaluate and improve its own answers.
HyDERAG.add_document
add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any
Adds a document to the underlying RAG backend. Convenience pass-through identical in behaviour to SelfIRAG.add_document.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | required | Raw document text to index. |
| metadata | Dict[str, Any] \| None | None | Optional metadata. Defaults to {} if None. |
Returns
Any — Backend-specific return value.
Example
hyde.add_document(
"Exercise releases endorphins, which are natural mood elevators...",
metadata={"source": "health_journal", "topic": "exercise"}
)
Module: RecursiveRAG
RecursiveRAG handles complex, multi-faceted queries by decomposing them into smaller, self-contained sub-questions and answering each one independently (recursively, if needed), then synthesising all sub-answers into a single coherent final answer.
Query → Complexity check
  ↓ (complex)              ↓ (simple or max_depth)
Decompose into             Answer directly
N sub-questions            (retrieve + generate)
  ↓
Recurse each sub-question
  ↓
Merge sub-answers into final answer
Complexity detection uses a dual heuristic: the query must both be at least min_query_length characters long and contain at least two keywords from complexity_keywords. If either condition fails, the query is answered directly.
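The dual heuristic is simple enough to sketch directly. The function below is an illustration, not the module's code: the name is_complex is hypothetical, only the English default keywords are included, and keyword hits are counted by substring matching, which is an assumption (it also means a short keyword such as "or" matches inside words like "for").

```python
from typing import Iterable

# English subset of the default keywords listed in the configuration reference.
DEFAULT_KEYWORDS = ("and", "or", "causes", "effects", "compare", "difference",
                    "relationship", "impact", "how", "why", "when")


def is_complex(query: str,
               keywords: Iterable[str] = DEFAULT_KEYWORDS,
               min_query_length: int = 40,
               min_keyword_hits: int = 2) -> bool:
    """Return True only if the query is long enough AND has enough keyword hits."""
    if len(query) < min_query_length:
        return False                      # short queries are assumed simple
    lowered = query.lower()
    hits = sum(1 for kw in keywords if kw in lowered)  # substring match (assumption)
    return hits >= min_keyword_hits


print(is_complex("What are the causes and effects of the Industrial Revolution?"))
print(is_complex("What is photosynthesis?"))
```

Because the check involves no LLM call, simple queries pay no decomposition overhead.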
RecursiveRAG Constructor
RecursiveRAG(
rag_system: Any,
llm: Any,
config: Optional[RecursiveRAGConfig] = None
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| rag_system | Any | required | The underlying RAG backend with retrieve(query, top_k) and add_document(text, metadata). |
| llm | Any | required | LLM object with generate(prompt) -> str. Falsy values raise ValueError. |
| config | RecursiveRAGConfig \| None | None | Decomposition configuration. A default RecursiveRAGConfig() is created if not provided. |
Raises
- ValueError: if llm is falsy.
Example
from fennec_community.rag.types.self_improving_rag import RecursiveRAG, RecursiveRAGConfig
config = RecursiveRAGConfig(
max_depth=3,
max_sub_questions=4,
min_query_length=50,
llm_retries=2
)
rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=config)
RecursiveRAG.query
query(
    query: str,
    context: Optional[Dict[str, Any]] = None,
    _depth: int = 0
) -> RecursiveRAGResult
Recursively answers the query. Simple queries (by length or keyword count) are answered directly via a single retrieve-and-generate pass. Complex queries are decomposed into sub-questions, each of which is recursively answered (possibly decomposed further up to max_depth), and the sub-answers are merged into a final answer via an LLM synthesis call.
⚠️ Important: The _depth parameter is an internal recursion counter. Do not set it manually when calling this method from application code. Always call with only query (and optionally context).
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The natural-language question. Can be simple or complex, in Arabic or English. |
| context | Dict[str, Any] \| None | None | Optional extra context forwarded through all recursive calls. |
| _depth | int | 0 | Internal use only. The current recursion depth. Do not set this parameter. |
Returns
RecursiveRAGResult — Contains the final answer, the decomposition trace (sub_questions, sub_answers), and whether the question was decomposed.
Execution decision tree:
query(q, depth=D)
├─ IF D >= max_depth OR q is not complex → _answer_directly(q, depth=D)
├─ decompose(q) → sub_questions
│   ├─ IF len(sub_questions) <= 1 → _answer_directly(q, depth=D)
│   └─ FOR each sq IN sub_questions:
│         sub_result = query(sq, depth=D+1) ← recursive call
└─ merge(q, all sub_results) → final answer
Complexity check details: A query is considered complex if len(query) >= min_query_length AND at least 2 words from complexity_keywords appear in the lowercased query.
Decomposition fallback: If the LLM returns malformed JSON for the sub-question list, a regex fallback parser extracts numbered or bulleted items from the raw LLM response.
Direct answer fallback: If retrieval returns no documents, the answer is "There is not enough information." and the result has decomposed=False.
Merge fallback: If the merge LLM call fails, sub-answers are concatenated with double newlines as the final answer.
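The decision tree and the merge fallback can be sketched as a short recursive function. Everything here is illustrative: recursive_answer and the injected callables (is_complex, decompose, answer_directly, merge) are hypothetical stand-ins for the module's internals, and the function returns plain strings rather than RecursiveRAGResult objects.

```python
from typing import Callable, List


def recursive_answer(
    question: str,
    is_complex: Callable[[str], bool],
    decompose: Callable[[str], List[str]],
    answer_directly: Callable[[str], str],
    merge: Callable[[str, List[str]], str],
    max_depth: int = 3,
    max_sub_questions: int = 4,
    depth: int = 0,
) -> str:
    # Depth cap or simple question: answer in a single pass.
    if depth >= max_depth or not is_complex(question):
        return answer_directly(question)
    sub_questions = decompose(question)[:max_sub_questions]   # truncate any excess
    if len(sub_questions) <= 1:
        return answer_directly(question)
    sub_answers = [
        recursive_answer(sq, is_complex, decompose, answer_directly, merge,
                         max_depth, max_sub_questions, depth + 1)
        for sq in sub_questions
    ]
    try:
        return merge(question, sub_answers)
    except Exception:
        # Merge fallback: concatenate sub-answers with double newlines.
        return "\n\n".join(sub_answers)


# Stubs: a question is "complex" if it contains " and "; decomposition splits on it.
def stub_is_complex(q: str) -> bool:
    return " and " in q


def stub_decompose(q: str) -> List[str]:
    return q.split(" and ")


def stub_answer(q: str) -> str:
    return f"<{q}>"


def stub_merge(q: str, answers: List[str]) -> str:
    return " | ".join(answers)


def failing_merge(q: str, answers: List[str]) -> str:
    raise RuntimeError("merge LLM call failed")


merged = recursive_answer("causes and effects and legacy", stub_is_complex,
                          stub_decompose, stub_answer, stub_merge)
fallback = recursive_answer("causes and effects", stub_is_complex,
                            stub_decompose, stub_answer, failing_merge)
print(merged)
```

Passing depth explicitly through the recursion mirrors the role of the internal _depth counter: it is what guarantees termination even if every sub-question looks complex.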
Example
# Complex query — will be decomposed
result = rec_rag.query(
"What are the causes and effects of climate change, "
"and how do they compare to the impacts of deforestation?"
)
print(result.answer)
print(f"Decomposed: {result.decomposed}")
for i, q in enumerate(result.sub_questions, 1):
    print(f" Sub-question {i}: {q}")
# Simple query — answered directly
result = rec_rag.query("What is photosynthesis?")
print(result.answer)
print(f"Decomposed: {result.decomposed}") # False
Arabic example
result = rec_rag.query(
"ما هي أسباب ونتائج الثورة الصناعية وكيف أثرت على المجتمع؟"
)
print(result.answer)
print(result.sub_questions)
When to use RecursiveRAG vs SelfIRAG:
- Use RecursiveRAG for broad, multi-part questions where different parts require different retrieved context.
- Use SelfIRAG for single-topic questions where the answer quality needs iterative verification.
- For maximum robustness, consider chaining: RecursiveRAG for decomposition, with each leaf answered by SelfIRAG.
RecursiveRAG.add_document
add_document(text: str, metadata: Optional[Dict[str, Any]] = None) -> Any
Adds a document to the underlying RAG backend. Convenience pass-through identical in behaviour to SelfIRAG.add_document.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | required | Raw document text to index. |
| metadata | Dict[str, Any] \| None | None | Optional metadata dictionary. Defaults to {} if None. |
Returns
Any — Backend-specific return value.
Example
rec_rag.add_document(
"The Industrial Revolution began in Britain in the late 18th century...",
metadata={"source": "history_encyclopedia", "era": "18th_century"}
)
Error Reference
All custom exceptions are importable from the module:
from fennec_community.rag.types.self_improving_rag import LLMCallError, RetrievalError, RAGSystemError
| Exception | Parent | When Raised |
|---|---|---|
| RAGSystemError | Exception | Base class for all module errors. |
| LLMCallError | RAGSystemError | All retry attempts for an LLM generate() call have been exhausted. Carries the original exception message. |
| RetrievalError | RAGSystemError | The underlying RAG backend's retrieve() call raised an exception. |
Error handling in practice:
The public methods (query, aquery, HyDERAG.query, RecursiveRAG.query) are designed to be non-raising for end users. All LLMCallError and RetrievalError exceptions are caught internally, logged, and converted to graceful fallback responses (e.g., "error in generating answer", "There is not enough information."). The exceptions are exposed for downstream integrations that want to implement custom error handling at a lower level.
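The retry behaviour that leads to an LLMCallError can be pictured with a small stand-alone sketch. The code below is not the module's _call_llm: call_with_backoff, the base_delay value, the doubling schedule, and the locally defined LLMCallError class are assumptions chosen to illustrate exponential back-off with llm_retries + 1 total attempts.

```python
import time
from typing import Callable, List, Optional


class LLMCallError(Exception):
    """Local stand-in for the module's exception of the same name."""


def call_with_backoff(
    generate: Callable[[str], str],
    prompt: str,
    retries: int = 2,
    base_delay: float = 0.5,                       # assumed starting delay
    sleep: Callable[[float], None] = time.sleep,   # injectable for tests
) -> str:
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):             # total attempts = retries + 1
        try:
            return generate(prompt)
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, 2.0s, ...
    raise LLMCallError(f"LLM call failed after {retries + 1} attempts: {last_error}")


# Demo: an LLM stub that fails twice, then succeeds.
attempts = {"count": 0}
delays: List[float] = []


def flaky_generate(prompt: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary outage")
    return "ok"


response = call_with_backoff(flaky_generate, "Your prompt here", retries=2, sleep=delays.append)
print(response, delays)
```

Injecting the sleep function keeps the demo instant and makes the delay schedule easy to assert in tests.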
Example of custom error handling at LLM level:
from fennec_community.rag.types.self_improving_rag import _call_llm, LLMCallError
try:
    response = _call_llm(my_llm, "Your prompt here", retries=3)
except LLMCallError as e:
    print(f"LLM permanently unavailable: {e}")
    # implement your own fallback
Performance & Tuning Guide
Choosing the Right Strategy
| Scenario | Recommended Class | Config Tips |
|---|---|---|
| Single-topic Q&A, quality-critical | SelfIRAG | fast_mode=True, max_refinement_iterations=2-4, confidence_threshold=0.80 |
| Cross-domain or vocabulary-gap queries | HyDERAG | num_hypothetical_docs=3-5, use_original_query=True |
| Multi-part complex questions | RecursiveRAG | max_depth=2-3, max_sub_questions=3-4 |
| Multi-part + quality verification | Chain RecursiveRAG → SelfIRAG per leaf | Tune each independently |
LLM Call Budget Estimates
| Class | Min calls/query | Max calls/query | Notes |
|---|---|---|---|
| SelfIRAG (fast_mode=True) | 1 | iterations × 2 | +1/iter for answer refinement |
| SelfIRAG (fast_mode=False) | 2 | iterations × 3 | generate + evaluate + refine |
| HyDERAG | N + 1 | N + 1 | N hypothetical docs + 1 final answer |
| RecursiveRAG | 1 | exponential | Grows with branching factor raised to the depth (up to max_sub_questions ^ max_depth leaf questions) |
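For RecursiveRAG, a rough upper bound can be derived under explicit assumptions: every question down to max_depth is judged complex, every decomposition yields exactly max_sub_questions items, each decomposed node costs one decomposition call plus one merge call, and each leaf costs one direct-answer call. These assumptions and the function name recursive_worst_case_calls are illustrative; retries are not counted.

```python
def recursive_worst_case_calls(max_depth: int, max_sub_questions: int) -> int:
    """Upper-bound estimate of LLM calls for a fully decomposed question tree."""
    b, d = max_sub_questions, max_depth
    internal_nodes = sum(b ** level for level in range(d))  # nodes at depths 0..d-1
    leaves = b ** d                                         # nodes at depth d
    # Each internal node: 1 decompose + 1 merge. Each leaf: 1 direct answer.
    return 2 * internal_nodes + leaves


for depth, branching in [(2, 3), (3, 4)]:
    total = recursive_worst_case_calls(depth, branching)
    print(f"max_depth={depth}, max_sub_questions={branching}: up to {total} LLM calls")
```

In practice most sub-questions are short and fail the complexity check, so real call counts tend to sit far below this bound; the estimate is mainly useful for choosing safe caps.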
Reducing Latency
- SelfIRAG: Set fast_mode=True (default). Set skip_refinement_on_high_confidence=True (default). Use aquery() in async environments.
- HyDERAG: Hypothetical docs are already generated in parallel. Increase max_workers for very high num_hypothetical_docs values.
- RecursiveRAG: Keep max_depth ≤ 3 and max_sub_questions ≤ 4. Simple queries bypass decomposition entirely.
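To see why max_workers matters for HyDERAG, the following sketch fans out several blocking "LLM" calls over a thread pool. It assumes, without confirmation from the source, that parallel generation works along these lines; `fake_generate_hypothetical_doc` and `generate_docs_parallel` are hypothetical stand-ins, and the sleep simulates network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate_hypothetical_doc(prompt):
    """Stand-in for a blocking LLM call (the sleep simulates latency)."""
    time.sleep(0.05)
    return f"hypothetical passage for: {prompt}"


def generate_docs_parallel(query, num_docs, max_workers):
    """Generate num_docs hypothetical passages using at most max_workers threads."""
    prompts = [f"{query} (variant {i})" for i in range(1, num_docs + 1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in the same order as the inputs.
        return list(pool.map(fake_generate_hypothetical_doc, prompts))


start = time.perf_counter()
docs = generate_docs_parallel("How do plants make energy?", num_docs=4, max_workers=4)
elapsed = time.perf_counter() - start
```

With max_workers at least equal to num_docs, the wall-clock time is close to one call's latency; with fewer workers the calls run in batches, which is why raising max_workers helps when num_hypothetical_docs is large.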
Improving Answer Quality
- SelfIRAG: Increase max_refinement_iterations and lower confidence_threshold for harder queries. Keep enable_answer_refinement=True and enable_retrieval_refinement=True.
- HyDERAG: Increase num_hypothetical_docs for broader semantic coverage. Set use_original_query=True (default) to ensure direct keyword matches are not missed.
- RecursiveRAG: Expand complexity_keywords for your domain. Lower min_query_length if short complex queries are common in your use case.
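How max_refinement_iterations and confidence_threshold interact is easiest to see in a toy loop. The sketch below is an illustration only, not the module's implementation: `refine_until_confident` and both stubs are hypothetical, and the confidence scores are hard-coded.

```python
def refine_until_confident(evaluate, refine, answer,
                           max_refinement_iterations, confidence_threshold):
    """Evaluate and refine until the threshold is met or the budget is spent."""
    history = []
    for step in range(1, max_refinement_iterations + 1):
        confidence = evaluate(answer)
        history.append((step, confidence))
        if confidence >= confidence_threshold:
            break  # good enough: stop early
        if step < max_refinement_iterations:
            answer = refine(answer)  # only refine if another evaluation will follow
    return answer, history


# Stubs: each refinement appends "+", and confidence follows a fixed schedule.
SCORES = [0.55, 0.70, 0.85, 0.95]


def stub_evaluate(answer):
    return SCORES[min(answer.count("+"), len(SCORES) - 1)]


def stub_refine(answer):
    return answer + "+"


early_answer, early_history = refine_until_confident(
    stub_evaluate, stub_refine, "draft",
    max_refinement_iterations=4, confidence_threshold=0.80)

capped_answer, capped_history = refine_until_confident(
    stub_evaluate, stub_refine, "draft",
    max_refinement_iterations=4, confidence_threshold=0.99)
```

In the first call the loop stops after three evaluations because the threshold is reached; in the second the threshold is never reached, so the iteration cap is what ends the loop.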
Full Integration Examples
Example 1 — SelfIRAG with Full Audit Logging
import json
from fennec_community.rag.types.self_improving_rag import SelfIRAG, SelfRAGConfig
config = SelfRAGConfig(
    max_refinement_iterations=4,
    confidence_threshold=0.82,
    enable_answer_refinement=True,
    enable_retrieval_refinement=True,
    fast_mode=True,
    llm_retries=3,
)
rag = SelfIRAG(rag_system=my_rag, llm=my_llm, config=config)
# Index documents
rag.add_document("Neural networks are computing systems inspired by the brain...", {"source": "AI_textbook"})
rag.add_document("Deep learning uses multi-layered neural networks...", {"source": "ML_guide"})
# Query
result = rag.query("How does deep learning differ from classical machine learning?")
print("=== Answer ===")
print(result.answer)
print("\n=== Iteration History ===")
for step in result.steps:
    print(f" Iteration {step.step}: confidence={step.confidence:.2f} | {step.reason}")
print(f"\nQuality delta: {result.initial_quality:.2f} → {result.final_quality:.2f}")
# Dump the full result for audit logs
print(json.dumps(result.to_dict(), indent=2, ensure_ascii=False))

Example 2 — HyDERAG with Retrieval Statistics
from fennec_community.rag.types.self_improving_rag import HyDERAG
hyde = HyDERAG(
    rag_system=my_rag,
    llm=my_llm,
    num_hypothetical_docs=4,
    use_original_query=True,
    top_k=6,
    max_final_docs=12,
    max_workers=4,
)
hyde.add_document("Photosynthesis converts light energy into chemical energy...", {"topic": "biology"})
result = hyde.query("Explain the process by which plants produce energy from sunlight.")
print(result.answer)
print(f"\n[Stats] hypothetical docs: {len(result.hypothetical_docs)}, "
      f"retrieved: {result.num_retrieved}, used: {result.num_ranked}")
print("\n[Hypothetical docs used as search vectors:]")
for i, doc in enumerate(result.hypothetical_docs, 1):
    print(f" {i}. {doc[:150]}...")

Example 3 — RecursiveRAG for Complex Multi-Part Questions
from fennec_community.rag.types.self_improving_rag import RecursiveRAG, RecursiveRAGConfig
config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=40,
    llm_retries=2,
)
rec_rag = RecursiveRAG(rag_system=my_rag, llm=my_llm, config=config)
rec_rag.add_document("The Industrial Revolution began in Britain in the 1760s...", {"era": "18th_century"})
rec_rag.add_document("The effects of industrialisation included urbanisation...", {"era": "19th_century"})
result = rec_rag.query(
    "What were the causes and effects of the Industrial Revolution "
    "and how did they compare to the Agricultural Revolution?"
)
print(result.answer)
# Show the decomposition tree only when the query was actually split
if result.decomposed:
    print(f"\nDecomposed into {len(result.sub_questions)} sub-questions:")
    for i, (q, a) in enumerate(zip(result.sub_questions, result.sub_answers), 1):
        print(f"\n [{i}] Q: {q}")
        print(f" A: {a.answer[:200]}...")
print(result.to_dict())

Example 4 — Async FastAPI Service with All Three Strategies
import asyncio
from fastapi import FastAPI
from fennec_community.rag.types.self_improving_rag import SelfIRAG, HyDERAG, RecursiveRAG, SelfRAGConfig
app = FastAPI()
self_rag = SelfIRAG(my_rag, my_llm, config=SelfRAGConfig(fast_mode=True))
hyde_rag = HyDERAG(my_rag, my_llm, num_hypothetical_docs=3)
rec_rag = RecursiveRAG(my_rag, my_llm)

@app.get("/query/self")
async def self_answer(q: str):
    # SelfIRAG exposes a native coroutine
    result = await self_rag.aquery(q)
    return result.to_dict()

@app.get("/query/hyde")
async def hyde_answer(q: str):
    # query() is blocking, so run it in a worker thread to keep the event loop free
    result = await asyncio.to_thread(hyde_rag.query, q)
    return result.to_dict()

@app.get("/query/recursive")
async def recursive_answer(q: str):
    # Same pattern: offload the blocking call to a worker thread
    result = await asyncio.to_thread(rec_rag.query, q)
    return result.to_dict()

@app.get("/stats")
def stats():
    return self_rag.get_stats()

Example 5 — Chaining RecursiveRAG + SelfIRAG
from fennec_community.rag.types.self_improving_rag import RecursiveRAG, SelfIRAG, RecursiveRAGConfig, SelfRAGConfig
# Decompose with RecursiveRAG, then verify each leaf with SelfIRAG
rec = RecursiveRAG(my_rag, my_llm, config=RecursiveRAGConfig(max_depth=2, max_sub_questions=3))
self_rag = SelfIRAG(my_rag, my_llm, config=SelfRAGConfig(max_refinement_iterations=2, fast_mode=True))
complex_query = (
    "What are the causes and effects of climate change, "
    "and how do they relate to biodiversity loss?"
)
# Step 1: Decompose
rec_result = rec.query(complex_query)
if rec_result.decomposed:
    print(f"Decomposed into {len(rec_result.sub_questions)} sub-questions")
    # Step 2: Verify each sub-answer with SelfIRAG
    refined_answers = []
    for sq in rec_result.sub_questions:
        verified = self_rag.query(sq)
        refined_answers.append(f"Q: {sq}\nA: {verified.answer} (confidence={verified.confidence:.2f})")
        print(f" ✓ '{sq[:60]}' → confidence={verified.confidence:.2f}")
    print("\n=== Refined composite answer ===")
    print("\n\n".join(refined_answers))
else:
    # The query was simple enough to answer without decomposition
    print(rec_result.answer)

Simple Real Example
import os
from fennec_community.llm import GeminiInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.self_improving_rag import SelfIRAG, SelfRAGConfig, HyDERAG, RecursiveRAG, RecursiveRAGConfig
# Assumption: the original leaves llm_api undefined; here it is read from an
# environment variable whose name (GEMINI_API_KEY) is chosen for illustration
llm_api = os.environ["GEMINI_API_KEY"]
# Build the base RAG system and index the FAQ file
loader_1 = TextLoader("./data_kn/faq.txt").load()
chunker = ArabicTextChunker(chunk_size=100, overlap=20)
embedder = OllamaEmbedder()
vector_db = FAISSVectorDatabase(embedder=embedder)
llm = GeminiInterface(api_key=llm_api)
context_manager = ContextManager()
rag_system = RAGSystem(llm=llm, vector_db=vector_db, chunker=chunker, context_manager=context_manager)
rag_system.add_documents(loader_1)
# The Arabic query used below means "What payment methods are available?"
# SelfIRAG
config = SelfRAGConfig(
    max_refinement_iterations=3,
    confidence_threshold=0.85,
    fast_mode=True,
)
self_rag = SelfIRAG(rag_system=rag_system, llm=llm, config=config)
result = self_rag.query("ما هي طرق الدفع المتاحة؟")
print(result.answer)
# HyDERAG
hyde = HyDERAG(
    rag_system=rag_system,
    llm=llm,
    num_hypothetical_docs=4,
    use_original_query=True,
    top_k=5,
    max_final_docs=12,
)
result = hyde.query("ما هي طرق الدفع المتاحة؟")
print(result.answer)
# RecursiveRAG
config = RecursiveRAGConfig(
    max_depth=3,
    max_sub_questions=4,
    min_query_length=40,
)
recursive_rag = RecursiveRAG(rag_system=rag_system, llm=llm, config=config)
result = recursive_rag.query("ما هي طرق الدفع المتاحة؟")
print(result.answer)