Agentic RAG Module — Public API Reference
Table of Contents
- Module Overview
- AgenticConfig
- AgenticRAG
- AgenticResult
- ActionStep
- Document
- TTLCache
- Enumerations
- Exception Hierarchy
- Agent Decision Flow
- Quick-Start Example
1. Module Overview
The agentic_rag package wraps any standard RAG backend (e.g., the core.RAGSystem) with an autonomous decision-making loop. Instead of executing a single retrieve-and-generate pass, the agent iteratively decides what to do next based on the quality of information collected so far — retrieve more documents, refine the query, verify sufficiency, self-correct the answer, or stop.
Key capabilities:
- Autonomous decision loop: up to `max_iterations` steps, each selecting the next action.
- Query refinement: the LLM rewrites weak queries given the current low-confidence documents.
- Sufficiency checking: heuristic-first, LLM-fallback validation before committing to generation.
- Self-correction: post-generation LLM pass to strip unsupported claims.
- Heterogeneous backend normalisation: accepts `Document`, `Tuple[chunk, score]`, and `dict` formats from any retriever.
- Two-level caching: TTL retrieval cache + in-memory result cache.
- Full async support, including token-level streaming generation.
- Action callback: fire an `on_action` hook after every decision step for tracing or UI streaming.
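As a rough illustration of the backend normalisation listed above, the sketch below coerces the three accepted result shapes into one record type. The `Doc` dataclass, the `normalise` helper, and the dict keys (`text`, `score`, `metadata`, `source`) are assumptions made for this example; the package's internal normaliser may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for the package's Document type."""
    text: str
    score: float = 0.0
    metadata: Dict[str, Any] = field(default_factory=dict)
    source: str = ""


def normalise(item: Any) -> Doc:
    """Coerce one retriever result into a Doc (assumed shapes only)."""
    if isinstance(item, Doc):
        return item
    if isinstance(item, tuple) and len(item) == 2:
        chunk, score = item  # (chunk, score) pair
        return Doc(text=str(chunk), score=float(score))
    if isinstance(item, dict):
        return Doc(
            text=str(item.get("text", "")),
            score=float(item.get("score", 0.0)),
            metadata=dict(item.get("metadata", {})),
            source=str(item.get("source", "")),
        )
    raise TypeError(f"unsupported retriever result: {type(item).__name__}")


def normalise_all(items: List[Any]) -> List[Doc]:
    """Normalise a mixed list of retriever results."""
    return [normalise(item) for item in items]
```

Normalising at the boundary keeps the rest of the loop free of per-backend branches.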
Publicly exported symbols:
from fennec_community.rag.types.agentic_rag import AgenticRAG, AgenticConfig, AgentAction, ReasoningDepth
2. AgenticConfig
from fennec_community.rag.types.agentic_rag import AgenticConfig
AgenticConfig is a dataclass holding all tunable parameters for an AgenticRAG instance. Every field has a production-ready default.
Constructor
AgenticConfig(
    max_iterations: int = 5,
    min_confidence: float = 0.7,
    min_docs_required: int = 2,
    enable_query_refinement: bool = True,
    enable_self_correction: bool = True,
    enable_sufficiency_check: bool = True,
    reasoning_depth: ReasoningDepth = ReasoningDepth.MEDIUM,
    cache_ttl_seconds: int = 300,
    retry_attempts: int = 2,
    retry_backoff_base: float = 0.5,
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_iterations` | `int` | `5` | Hard cap on the number of agent decision cycles. The agent always generates an answer before hitting this limit. |
| `min_confidence` | `float` | `0.7` | Minimum weighted average relevance score required to consider retrieved documents sufficient. |
| `min_docs_required` | `int` | `2` | Minimum number of documents that must be retrieved before the sufficiency check can return True. |
| `enable_query_refinement` | `bool` | `True` | Allow the agent to rewrite low-quality queries using the LLM before re-retrieving. |
| `enable_self_correction` | `bool` | `True` | After generation, ask the LLM to validate and clean the answer against the retrieved context. |
| `enable_sufficiency_check` | `bool` | `True` | Every even-numbered iteration, ask whether the current documents are sufficient to answer. |
| `reasoning_depth` | `ReasoningDepth` | `MEDIUM` | Controls the verbosity and complexity of LLM prompts. |
| `cache_ttl_seconds` | `int` | `300` | Lifetime in seconds for cached retrieval results. Set to 0 to disable caching. |
| `retry_attempts` | `int` | `2` | Number of additional retries for retrieval and generation calls before giving up. |
| `retry_backoff_base` | `float` | `0.5` | Base seconds for exponential back-off between retries (`base * 2^attempt`). |
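To make the `retry_attempts` and `retry_backoff_base` rows concrete, here is a minimal sketch of the back-off schedule and a retry wrapper. It assumes the attempt counter starts at 0, which the table does not state; `backoff_delays` and `call_with_retry` are hypothetical helpers, not part of the package.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retry_attempts: int = 2, retry_backoff_base: float = 0.5) -> List[float]:
    """Delay before each retry: base * 2**attempt, attempt counted from 0 (assumed)."""
    return [retry_backoff_base * (2 ** attempt) for attempt in range(retry_attempts)]


def call_with_retry(
    fn: Callable[[], T],
    retry_attempts: int = 2,
    retry_backoff_base: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to retry_attempts more times with back-off."""
    for attempt in range(retry_attempts + 1):  # first try + retries
        try:
            return fn()
        except Exception:
            if attempt == retry_attempts:
                raise  # retries exhausted: propagate the last error
            sleep(retry_backoff_base * (2 ** attempt))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Under that assumption the defaults give waits of 0.5 s and then 1.0 s, so a call that keeps failing is abandoned after three tries.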
Example:
from fennec_community.rag.types.agentic_rag import AgenticConfig, ReasoningDepth

config = AgenticConfig(
    max_iterations=6,
    min_confidence=0.75,
    enable_self_correction=True,
    reasoning_depth=ReasoningDepth.DEEP,
    cache_ttl_seconds=600,
)
3. AgenticRAG
from fennec_community.rag.types.agentic_rag import AgenticRAG
AgenticRAG is the main class of the package. It wraps any RAG backend and adds an autonomous agent loop that adaptively retrieves, refines, validates, and generates answers.
3.1 Constructor
AgenticRAG(
    rag_system: Any,
    config: Optional[AgenticConfig] = None,
    llm: Optional[Any] = None,
    on_action: Optional[Callable[[ActionStep], None]] = None,
)
Purpose: Instantiate the agentic RAG agent by binding it to an existing RAG backend.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `rag_system` | `Any` | Yes | Underlying RAG backend. Must expose `retrieve(query)` and either `generate(query)` or `query(query)`. |
| `config` | `Optional[AgenticConfig]` | No | Runtime configuration. `AgenticConfig()` defaults are used when `None`. |
| `llm` | `Optional[Any]` | No | Language model with a `.generate(prompt) -> str` method used for query refinement, sufficiency checks, and self-correction. Falls back to `rag_system.llm` if `None`. |
| `on_action` | `Optional[Callable[[ActionStep], None]]` | No | Callback invoked after every agent action step. Receives the `ActionStep` record. Exceptions in the callback are silently swallowed so they never interrupt the agent loop. |
Returns: AgenticRAG instance.
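The `on_action` contract above (invoked after every step, never allowed to break the loop) could be honoured with a guard like the one sketched here. `fire_callback` is a hypothetical helper name, and the real implementation may log the error rather than discard it.

```python
from typing import Any, Callable, Optional


def fire_callback(on_action: Optional[Callable[[Any], None]], step: Any) -> None:
    """Invoke the user callback, swallowing any exception it raises."""
    if on_action is None:
        return
    try:
        on_action(step)
    except Exception:
        # Deliberately ignored: a failing tracer or UI hook must not stop the agent.
        pass
```

One consequence of this design is that a buggy callback fails silently, so callbacks that matter should do their own error reporting.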
Example:
from fennec_community.rag.types.agentic_rag import AgenticRAG, AgenticConfig

def log_step(step):
    print(f"[Step {step.iteration}] {step.action.value} - {step.details}")

config = AgenticConfig(max_iterations=4, min_confidence=0.75)
agent = AgenticRAG(
    rag_system=my_rag,
    config=config,
    on_action=log_step,
)
3.2 Core Query Methods
query()
agent.query(
    query: str,
    context: Optional[Dict[str, Any]] = None,
) -> AgenticResult
Purpose: The primary entry point. Runs the full autonomous agent loop: decides actions, retrieves documents, optionally refines the query, checks sufficiency, generates the answer, and applies self-correction. Results are cached by query hash to avoid redundant processing.
Agent loop summary:

| Condition | Decision rule |
|---|---|
| Iteration 1 | Always RETRIEVE |
| Even iteration, docs present | CHECK_SUFFICIENCY (if enabled) |
| Low avg score + early iter | REFINE_QUERY (if enabled) |
| Iteration >= `max_iterations` | GENERATE (forced) |
| Default | GENERATE |
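The decision rules above can be sketched as a single function. The precedence follows the order shown in the Agent Decision Flow section. Two details are assumptions, because the reference does not define them: "low avg score" is taken to mean a plain mean below `min_confidence`, and "early" is taken to mean `iteration < max_iterations - 1`. `Action` and `decide_action` are illustrative names, not the package's internals.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    RETRIEVE = "retrieve"
    GENERATE = "generate"
    REFINE_QUERY = "refine_query"
    CHECK_SUFFICIENCY = "check_sufficiency"


def decide_action(
    iteration: int,
    doc_scores: Sequence[float],
    max_iterations: int = 5,
    min_confidence: float = 0.7,
    enable_sufficiency_check: bool = True,
    enable_query_refinement: bool = True,
) -> Action:
    """Pick the next action for a 1-based iteration number."""
    if iteration == 1 or not doc_scores:
        return Action.RETRIEVE
    if iteration >= max_iterations:
        return Action.GENERATE  # forced: the iteration budget is spent
    if enable_sufficiency_check and iteration % 2 == 0:
        return Action.CHECK_SUFFICIENCY
    average = sum(doc_scores) / len(doc_scores)
    is_early = iteration < max_iterations - 1  # assumed meaning of "early"
    if enable_query_refinement and average < min_confidence and is_early:
        return Action.REFINE_QUERY
    return Action.GENERATE
```

With the default configuration, a run with weak documents would therefore retrieve at step 1, check sufficiency at step 2, and refine the query at step 3.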
| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | `str` | Yes | Natural-language question. Leading/trailing whitespace is stripped automatically. |
| `context` | `Optional[Dict[str, Any]]` | No | Arbitrary extra key-value pairs forwarded to the generation step as supplemental context. The key `"retrieved_docs"` is excluded to prevent duplication. |

Returns: AgenticResult, containing the answer, retrieved documents, full action trace, iteration count, reasoning summary, confidence score, and a cache-hit flag.
Raises:
- `ValueError`: if `query` is empty after stripping.
- `MaxIterationsExceeded`: if the loop overruns `max_iterations` without reaching a terminal action. (In practice, the loop is bounded by the `while` guard; this exception is available for manual raising in subclasses.)
Example:
result = agent.query("What are the latest advances in transformer architectures?")
print(result.answer)
print(f"Confidence: {result.confidence:.2%}")
print(f"Iterations used: {result.iterations}")
print(result.reasoning)
3.3 Document Ingestion
add_document()
agent.add_document(
    text: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> Any
Purpose: Add a single plain-text document to the underlying RAG index. Delegates directly to rag_system.add_document().

| Parameter | Type | Required | Description |
|---|---|---|---|
| `text` | `str` | Yes | Raw text content of the document to index. |
| `metadata` | `Optional[Dict[str, Any]]` | No | Arbitrary key-value metadata attached to the document (e.g., `{"source": "wiki", "date": "2025-01"}`). Defaults to `{}` if `None`. |
Returns: Whatever rag_system.add_document() returns (implementation-dependent).
Example:
agent.add_document(
    "Transformer models use self-attention to process sequences.",
    metadata={"source": "deep_learning_textbook", "chapter": 12},
)
add_documents()
agent.add_documents(
    documents: List[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> Any
Purpose: Batch-add multiple documents to the underlying RAG index. Preferred over calling add_document() in a loop for better throughput. Delegates to rag_system.add_documents().

| Parameter | Type | Required | Description |
|---|---|---|---|
| `documents` | `List[str]` | Yes | List of raw text strings to index. |
| `metadatas` | `Optional[List[Dict[str, Any]]]` | No | Parallel list of metadata dicts, one per document. Defaults to `[]` if `None`. |
Returns: Whatever rag_system.add_documents() returns (implementation-dependent).
Example:
agent.add_documents(
    [
        "BERT is a bidirectional transformer pre-trained on masked language modelling.",
        "GPT uses a causal (left-to-right) attention mask during pre-training.",
    ],
    metadatas=[
        {"model": "BERT", "year": 2018},
        {"model": "GPT", "year": 2018},
    ],
)
3.4 Cache Management
clear_cache()
agent.clear_cache() -> None
Purpose: Flush both internal caches simultaneously:
- TTL retrieval cache (`TTLCache`): stores raw document lists per query.
- Result cache (in-memory `dict`): stores complete `AgenticResult` objects per query hash.
Call this after adding new documents to ensure subsequent queries reflect the updated index.
Parameters: None.
Returns: None
Example:
agent.add_documents(new_batch)
agent.clear_cache() # Force fresh retrieval on next query
result = agent.query("What changed recently?")
3.5 Async API
aquery()
await agent.aquery(
    query: str,
    **kwargs,
) -> AgenticResult
Purpose: Async wrapper around query(). Runs the full agent loop in a thread pool via asyncio.to_thread, making it safe to await from an asyncio event loop without blocking it.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | `str` | Yes | Natural-language question. |
| `**kwargs` | `Any` | No | Forwarded to `query()` (e.g., `context={"key": "val"}`). |
Returns: AgenticResult — identical to the synchronous query() return value.
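The thread-pool delegation described above follows a common pattern, sketched here with a toy class. `BlockingAgent` and its string return value are stand-ins invented for this example; only the use of `asyncio.to_thread` (Python 3.9+) comes from the description.

```python
import asyncio
import time
from typing import Any, List


class BlockingAgent:
    """Toy stand-in whose query() blocks, like a synchronous agent loop."""

    def query(self, query: str, **kwargs: Any) -> str:
        time.sleep(0.05)  # simulate blocking retrieval and generation
        return f"answer to: {query.strip()}"

    async def aquery(self, query: str, **kwargs: Any) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.query, query, **kwargs)


async def main() -> List[str]:
    agent = BlockingAgent()
    # Both calls overlap because each runs in its own worker thread.
    return await asyncio.gather(agent.aquery("a"), agent.aquery(" b "))
```

Because the work still runs synchronously inside a thread, this keeps the event loop responsive but does not make the underlying calls themselves asynchronous.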
Example:
import asyncio

async def main():
    result = await agent.aquery("Explain attention mechanisms.")
    print(result.answer)

asyncio.run(main())
astream_query()
async for token in agent.astream_query(query: str):
    print(token, end="", flush=True)
Purpose: Runs the full agent decision loop (retrieve, refine, check sufficiency) and then streams the final answer token by token using the LLM's astream() method. Enables real-time display in chat interfaces and streaming APIs.
Requirements:
- `agent.llm` must be set (not `None`).
- `agent.llm` must expose an `astream(prompt)` async generator method.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | `str` | Yes | Natural-language question. |

Yields: str, individual tokens as they are emitted by the LLM.
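Since the stream yields plain strings, failures arrive in-band as ordinary tokens rather than as exceptions. A consumer that needs to tell an answer from an error message might buffer the stream and inspect its prefix, as in this sketch. `collect_stream` is a hypothetical helper, and it assumes an error message starts with one of the `⚠️` or `❌` markers and is the only thing yielded.

```python
import asyncio
from typing import AsyncIterator, Tuple

# Assumption: in-band error messages begin with one of these markers.
ERROR_MARKERS = ("⚠️", "❌")


async def collect_stream(tokens: AsyncIterator[str]) -> Tuple[str, bool]:
    """Join all streamed tokens and report whether the text looks like an error."""
    parts = []
    async for token in tokens:
        parts.append(token)
    text = "".join(parts)
    return text, text.startswith(ERROR_MARKERS)


async def fake_answer() -> AsyncIterator[str]:
    """Stand-in for a successful token stream."""
    for token in ("RAG ", "works"):
        yield token


async def fake_error() -> AsyncIterator[str]:
    """Stand-in for a stream that reports an empty query."""
    yield "⚠️ query is empty"
```

A prefix check is only a heuristic: an answer that legitimately begins with one of these symbols would be misclassified.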
Sentinel yields on error:
- `"⚠️ query is empty"`: if `query` is blank after stripping.
- `"❌ llm not configured correctly for streaming."`: if the LLM is absent or lacks `astream`.
- `"❌ not enough information to answer."`: if no documents were retrieved after all iterations.
Example:
import asyncio

async def stream():
    async for token in agent.astream_query("What is retrieval-augmented generation?"):
        print(token, end="", flush=True)
    print()  # newline after stream

asyncio.run(stream())
3.6 Context Manager
AgenticRAG supports the async context manager protocol:
async with AgenticRAG(rag_system=my_rag) as agent:
    result = await agent.aquery("Summarise the latest research.")
    print(result.answer)
# __aexit__ is a no-op, so the block is safe to use as a scope delimiter
Note: The synchronous `with` statement is not supported. Use the async form (`async with`) or instantiate directly.
4. AgenticResult
from fennec_community.rag.types.agentic_rag import AgenticResult
The complete result object returned by AgenticRAG.query() and AgenticRAG.aquery().
Attributes

| Attribute | Type | Description |
|---|---|---|
| `answer` | `str` | The final generated (and optionally self-corrected) answer string. |
| `retrieved_docs` | `List[Document]` | Deduplicated list of all documents used during generation. |
| `action_history` | `List[ActionStep]` | Ordered record of every decision the agent made. |
| `iterations` | `int` | Total number of agent iterations consumed. |
| `reasoning` | `str` | Human-readable multi-line summary of all action steps. |
| `confidence` | `float` | Weighted confidence score in [0, 1]; top-ranked documents count more. |
| `cached` | `bool` | True if this result was served from the in-memory result cache. |
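The `confidence` row says only that top-ranked documents count more; the exact weighting is not given. One plausible reading is a rank-weighted mean, sketched below with reciprocal-rank weights (1, 1/2, 1/3, ...). The weighting scheme and the `weighted_confidence` name are assumptions for illustration and may not match what the package computes.

```python
from typing import Sequence


def weighted_confidence(scores: Sequence[float]) -> float:
    """Rank-weighted mean of relevance scores, clamped to [0, 1]."""
    if not scores:
        return 0.0
    ranked = sorted(scores, reverse=True)  # best document first
    weights = [1.0 / (rank + 1) for rank in range(len(ranked))]
    value = sum(w * s for w, s in zip(weights, ranked)) / sum(weights)
    return max(0.0, min(1.0, value))
```

For scores of 0.9 and 0.3 this gives about 0.7, compared with a plain mean of 0.6, so a single strong hit is not diluted as much by weaker ones.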
to_dict()
result.to_dict() -> Dict[str, Any]
Purpose: Serialize the key result fields to a plain dictionary suitable for JSON encoding, API responses, or logging.
Parameters: None.
Returns: Dict[str, Any] with the following keys:

| Key | Type | Description |
|---|---|---|
| `answer` | `str` | The generated answer. |
| `docs_used` | `int` | Number of documents used (`len(retrieved_docs)`). |
| `iterations` | `int` | Total iterations consumed. |
| `confidence` | `float` | Confidence score rounded to 4 decimal places. |
| `cached` | `bool` | Whether the result was cache-served. |
| `reasoning` | `str` | The formatted reasoning trace. |
Example:
import json

result = agent.query("What is RAG?")
print(json.dumps(result.to_dict(), indent=2, ensure_ascii=False))
5. ActionStep
from fennec_community.rag.types.agentic_rag import ActionStep
Immutable record of a single agent decision step, appended to AgenticResult.action_history after every iteration.
Attributes

| Attribute | Type | Description |
|---|---|---|
| `iteration` | `int` | 1-based iteration number within the current agent run. |
| `action` | `AgentAction` | The action chosen by the agent at this step. |
| `query` | `str` | The query string active at this step (may be a refined version). |
| `details` | `str` | Human-readable summary of what happened (e.g., `"fetched 3 docs"`, `"sufficient"`). |
| `duration_ms` | `float` | Wall-clock time spent on the `_decide_action` call in milliseconds. |
| `confidence` | `float` | Agent's computed confidence after this step. |
| `refined_query` | `Optional[str]` | The rewritten query if this step was a REFINE_QUERY action; `None` otherwise. |
Typical usage — inspecting the action trace:
for step in result.action_history:
    print(f"[{step.iteration}] {step.action.value:20s} conf={step.confidence:.2f} {step.details}")
6. Document
from fennec_community.rag.types.agentic_rag import Document
Unified, backend-agnostic document representation used throughout the agentic pipeline.
Attributes

| Attribute | Type | Description |
|---|---|---|
| `text` | `str` | Raw text content of the document chunk. |
| `score` | `float` | Relevance score in [0, 1] assigned by the retriever. |
| `metadata` | `Dict[str, Any]` | Arbitrary key-value metadata (e.g., source file, page number). |
| `source` | `str` | Origin identifier: chunk ID, document ID, or file name. |
snippet()
doc.snippet(max_chars: int = 300) -> str
Purpose: Return a truncated text preview of the document, appending "…" when the full text exceeds max_chars. Useful for displaying context in logs, UIs, and LLM prompts without exceeding token limits.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_chars` | `int` | `300` | Maximum number of characters to include in the preview. |
Returns: str — a truncated (or full) version of self.text.
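The truncation rule described above amounts to a few lines. This free-function sketch assumes the ellipsis is appended after exactly `max_chars` characters and that no whitespace trimming takes place; the method on `Document` may differ in those details.

```python
def snippet(text: str, max_chars: int = 300) -> str:
    """Return text unchanged if it fits, else its first max_chars characters plus an ellipsis."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "…"
```

Under this reading a truncated preview is `max_chars + 1` characters long, which matters if the preview feeds a strict length budget.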
Example:
for doc in result.retrieved_docs:
    print(f"[{doc.score:.2f}] {doc.snippet(150)}")
7. TTLCache
from fennec_community.rag.types.agentic_rag import TTLCache
A lightweight in-process TTL cache for storing retrieval results. Used internally by AgenticRAG to avoid redundant vector DB queries within the same session. Can also be used standalone.
Thread safety: This cache is not thread-safe. For concurrent access, wrap operations with a `threading.Lock`.
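One way to follow the thread-safety advice above is a thin wrapper that serialises every call through a single lock. `LockedCache` is a hypothetical helper, and `DictCache` is a stand-in with the same `get`/`set`/`clear` shape so the sketch runs without the package installed; in practice the wrapped object would be a `TTLCache`.

```python
import threading
from typing import Any, Dict, List, Optional


class DictCache:
    """Stand-in exposing the same get/set/clear shape as the cache being wrapped."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Any]] = {}

    def get(self, query: str) -> Optional[List[Any]]:
        return self._data.get(query)

    def set(self, query: str, docs: List[Any]) -> None:
        self._data[query] = docs

    def clear(self) -> None:
        self._data.clear()


class LockedCache:
    """Serialise all cache operations through one lock."""

    def __init__(self, cache: Any) -> None:
        self._cache = cache
        self._lock = threading.Lock()

    def get(self, query: str) -> Optional[List[Any]]:
        with self._lock:
            return self._cache.get(query)

    def set(self, query: str, docs: List[Any]) -> None:
        with self._lock:
            self._cache.set(query, docs)

    def clear(self) -> None:
        with self._lock:
            self._cache.clear()
```

A single coarse lock is the simplest option and is usually sufficient when cache operations are short compared with the retrieval calls they save.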
Constructor
TTLCache(ttl_seconds: int = 300)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `ttl_seconds` | `int` | `300` | Entry lifetime in seconds. Expired entries are evicted lazily on the next `get()` call. |
get()
cache.get(query: str) -> Optional[List[Document]]
Purpose: Look up cached retrieval results for a query. Returns None on a miss or if the entry has expired (and deletes the stale entry).

| Parameter | Type | Description |
|---|---|---|
| `query` | `str` | The exact query string used as a cache key (SHA-256 hashed internally). |
Returns: List[Document] on a cache hit; None on a miss or expiry.
set()
cache.set(query: str, docs: List[Document]) -> None
Purpose: Store a retrieval result in the cache under the given query.

| Parameter | Type | Description |
|---|---|---|
| `query` | `str` | The query string. |
| `docs` | `List[Document]` | The retrieved documents to cache. |
Returns: None
clear()
cache.clear() -> None
Purpose: Evict all entries from the cache immediately. Useful after bulk document updates when all cached results may be stale.
Parameters: None.
Returns: None
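The `get`/`set`/`clear` behaviour described above (SHA-256 keys, lazy eviction on read) can be approximated in a few lines. `MiniTTLCache` is an illustrative re-implementation, not the package's class; the injectable `clock` parameter is an addition made here so expiry can be exercised without sleeping.

```python
import hashlib
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class MiniTTLCache:
    """Minimal TTL cache keyed by the SHA-256 hash of the query string."""

    def __init__(self, ttl_seconds: int = 300, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable for testing (not in the documented API)
        self._store: Dict[str, Tuple[float, List[Any]]] = {}

    @staticmethod
    def _key(query: str) -> str:
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def get(self, query: str) -> Optional[List[Any]]:
        key = self._key(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, docs = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazy eviction: stale entries are removed on read
            return None
        return docs

    def set(self, query: str, docs: List[Any]) -> None:
        self._store[self._key(query)] = (self._clock() + self._ttl, docs)

    def clear(self) -> None:
        self._store.clear()
```

Lazy eviction keeps the cache free of background timers, at the cost of holding expired entries in memory until they are next requested or `clear()` is called.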
Example:
cache = TTLCache(ttl_seconds=120)
cache.set("transformer attention", docs)
cached = cache.get("transformer attention")  # → List[Document]
cache.clear()
cached = cache.get("transformer attention")  # → None
8. Enumerations
AgentAction
from fennec_community.rag.types.agentic_rag import AgentAction
Defines all discrete actions the agent can choose at each decision step.

| Value | String | Description |
|---|---|---|
| `RETRIEVE` | `"retrieve"` | Fetch documents from the vector database for the current query. |
| `GENERATE` | `"generate"` | Produce the final answer using retrieved documents; triggers the end of the agent loop. |
| `REFINE_QUERY` | `"refine_query"` | Ask the LLM to rewrite the query to improve retrieval quality. |
| `CHECK_SUFFICIENCY` | `"check_sufficiency"` | Evaluate whether the current documents are sufficient to answer the query. |
| `STOP` | `"stop"` | Unconditionally terminate the agent loop without generating. |
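The CHECK_SUFFICIENCY action is described elsewhere in this reference as heuristic-first with an LLM fallback for borderline cases. The sketch below shows one way that could look. Several details are assumptions: it uses a plain mean rather than the weighted average, it treats "borderline" as within `borderline_margin` below `min_confidence`, and the prompt wording is invented. Only the `.generate(prompt) -> str` LLM interface comes from the constructor documentation.

```python
from typing import Any, Optional, Sequence


def is_sufficient(
    query: str,
    scores: Sequence[float],
    min_confidence: float = 0.7,
    min_docs_required: int = 2,
    llm: Optional[Any] = None,
    borderline_margin: float = 0.1,  # hypothetical width of the "borderline" band
) -> bool:
    """Cheap heuristic first; ask the LLM only when the score is borderline."""
    if not scores or len(scores) < min_docs_required:
        return False
    average = sum(scores) / len(scores)
    if average >= min_confidence:
        return True
    if llm is not None and average >= min_confidence - borderline_margin:
        prompt = (
            "Do the retrieved documents contain enough information to answer "
            f"the question below? Reply yes or no.\nQuestion: {query}"
        )
        return llm.generate(prompt).strip().lower().startswith("yes")
    return False


class AlwaysYesLLM:
    """Toy LLM stand-in exposing the documented .generate(prompt) -> str method."""

    def generate(self, prompt: str) -> str:
        return "Yes."
```

Running the heuristic first means that clearly good or clearly poor retrievals never cost an LLM call.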
ReasoningDepth
from fennec_community.rag.types.agentic_rag import ReasoningDepth
Controls the complexity and verbosity of LLM prompts used in query refinement.

| Value | String | LLM prompt hint |
|---|---|---|
| `SHALLOW` | `"shallow"` | "in one concise sentence": minimal, fast rewriting. |
| `MEDIUM` | `"medium"` | "in one clear sentence, adding specificity": balanced rewriting. |
| `DEEP` | `"deep"` | "expanding scope and adding sub-questions if needed": thorough, slower rewriting. |
9. Exception Hierarchy
All exceptions inherit from AgenticRAGError so callers can catch either a specific subclass or the general base.
AgenticRAGError (base)
├── RetrievalError         raised when all retrieval attempts (with retries) fail
├── GenerationError        raised when all generation attempts (with retries) fail
└── MaxIterationsExceeded  raised when the agent exceeds the configured max_iterations
Attributes: All exceptions carry a descriptive str message.
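The hierarchy above could be declared roughly as follows. This is a sketch of the class relationships only; the real classes may carry extra attributes or constructor arguments.

```python
class AgenticRAGError(Exception):
    """Base class for all errors raised by the agentic layer."""


class RetrievalError(AgenticRAGError):
    """All retrieval attempts, including retries, failed."""


class GenerationError(AgenticRAGError):
    """All generation attempts, including retries, failed."""


class MaxIterationsExceeded(AgenticRAGError):
    """The agent loop ran past the configured max_iterations."""
```

Because every subclass derives from `AgenticRAGError`, an `except` clause for the base class must come after the clauses for the specific subclasses, otherwise it catches everything first.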
Example error handling:
from fennec_community.rag.types.agentic_rag import (
    AgenticRAGError,
    RetrievalError,
    GenerationError,
    MaxIterationsExceeded,
)

try:
    result = agent.query("What is the capital of France?")
except RetrievalError as e:
    print(f"Retrieval failed: {e}")
except GenerationError as e:
    print(f"Generation failed: {e}")
except MaxIterationsExceeded as e:
    print(f"Agent loop overran: {e}")
except AgenticRAGError as e:
    print(f"Unexpected agentic error: {e}")
10. Agent Decision Flow
The following diagram shows how the agent progresses through iterations for a single query() call:
query("...") called
│
▼
┌─────────────────────────┐
│ Result cache lookup │──── HIT ──▶ return cached AgenticResult
└─────────────────────────┘
│ MISS
▼
┌─────────────────────────────────────────────────────────────────┐
│ Agent Loop (iteration 1 … max_iterations) │
│ │
│ _decide_action() │
│ ├─ iter == 1 or no docs → RETRIEVE │
│ ├─ iter >= max_iterations → GENERATE (exit loop) │
│ ├─ even iter + check enabled → CHECK_SUFFICIENCY │
│ ├─ low avg score + early iter → REFINE_QUERY │
│ └─ default → GENERATE (exit loop) │
│ │
│ RETRIEVE → _retrieve_with_retry() → TTLCache │
│ REFINE_QUERY → _refine_query() via LLM │
│ CHECK_SUFFICIENCY → _is_information_sufficient() │
│ ├─ heuristic (score >= min_confidence) → True / False │
│ └─ borderline → LLM "yes/no" call │
│ GENERATE / STOP / sufficient → exit loop │
└─────────────────────────────────────────────────────────────────┘
│
▼
_deduplicate(retrieved_docs) ← remove text-hash duplicates
│
▼
_generate_answer_with_retry() ← LLM prompt or rag_system.generate()
│
▼
_self_correct() (if enabled) ← LLM validation pass
│
▼
Build AgenticResult
Store in result cache
│
▼
return AgenticResult
11. Quick-Start Example
import asyncio
import json

from fennec_community.rag.types.agentic_rag import AgenticRAG, AgenticConfig, ReasoningDepth
from fennec_community.rag.core import RAGSystem  # your existing RAG backend

# ── 1. Build configuration ────────────────────────────────────────────────
config = AgenticConfig(
    max_iterations=5,
    min_confidence=0.72,
    enable_query_refinement=True,
    enable_self_correction=True,
    reasoning_depth=ReasoningDepth.MEDIUM,
    cache_ttl_seconds=300,
    retry_attempts=2,
)

# ── 2. Optional: action callback for live tracing ─────────────────────────
def on_step(step):
    print(
        f"  [{step.iteration:02d}] {step.action.value:<22} "
        f"conf={step.confidence:.2f}  {step.details or step.query[:50]}"
    )

# ── 3. Instantiate ────────────────────────────────────────────────────────
# my_vector_db, my_llm, my_chunker, and my_ctx_mgr are placeholders for
# components you have already built.
my_rag_system = RAGSystem(
    vector_db=my_vector_db,
    llm=my_llm,
    chunker=my_chunker,
    context_manager=my_ctx_mgr,
    config=config,
    enable_query_expansion=True,
    query_expansion_variants=3,
)
agent = AgenticRAG(
    rag_system=my_rag_system,  # must expose .retrieve() and .generate()
    config=config,
    on_action=on_step,
)

# ── 4. Index documents ────────────────────────────────────────────────────
agent.add_documents(
    ["Transformers use self-attention for sequence modelling.",
     "BERT is pre-trained on masked language modelling."],
    metadatas=[{"src": "book"}, {"src": "paper"}],
)

# ── 5. Synchronous query ──────────────────────────────────────────────────
result = agent.query("How does BERT differ from GPT?")
print(f"\nAnswer : {result.answer}")
print(f"Confidence : {result.confidence:.2%}")
print(f"Iterations : {result.iterations}")
print(f"Cached : {result.cached}")
print(f"\nReasoning trace:\n{result.reasoning}")
print(f"\nDocs used : {len(result.retrieved_docs)}")
for doc in result.retrieved_docs:
    print(f"  [{doc.score:.2f}] {doc.snippet(120)}")

# ── 6. JSON-serialisable summary ──────────────────────────────────────────
print(json.dumps(result.to_dict(), indent=2, ensure_ascii=False))

# ── 7. Async query ────────────────────────────────────────────────────────
async def async_example():
    result = await agent.aquery("What is self-attention?")
    print(result.answer)

asyncio.run(async_example())

# ── 8. Streaming query ────────────────────────────────────────────────────
async def stream_example():
    print("Streaming answer: ", end="")
    async for token in agent.astream_query("Explain BERT pre-training."):
        print(token, end="", flush=True)
    print()

asyncio.run(stream_example())

# ── 9. Clear caches after new ingestion ──────────────────────────────────
agent.add_document("New paper on RLHF published in 2025.")
agent.clear_cache()
fresh_result = agent.query("What is RLHF?")
Simple Real Example
from fennec_community.llm import MistralInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.agentic_rag import AgenticRAG, AgenticConfig, ReasoningDepth

# llm, vector_db, chunker, and context_manager are assumed to have been
# constructed beforehand from the classes imported above.
glob = RAGSystem(llm=llm, vector_db=vector_db, chunker=chunker, context_manager=context_manager)
reader = TextLoader("./data_kn/faq.txt").load()
glob.add_documents(reader)

def on_action(step):
    icons = {
        "retrieve": "📥",
        "generate": "✍️",
        "refine_query": "🔄",
        "check_sufficiency": "🔍",
        "stop": "🛑",
    }
    icon = icons.get(step.action.value, "▶️")
    print(
        f"  {icon} [{step.iteration}] {step.action.value:<22} "
        f"| confidence: {step.confidence:.2f} "
        f"| {step.duration_ms:.0f}ms"
        + (f" | {step.details}" if step.details else "")
    )

agentic_config = AgenticConfig(
    max_iterations=5,
    min_confidence=0.65,
    min_docs_required=2,
    enable_query_refinement=True,
    enable_sufficiency_check=True,
    enable_self_correction=True,
    reasoning_depth=ReasoningDepth.MEDIUM,
    cache_ttl_seconds=300,
    retry_attempts=2,
)
agent = AgenticRAG(
    rag_system=glob,
    config=agentic_config,
    on_action=on_action,
)

# Arabic query: "What payment methods are available?"
response = agent.query("ما هي طرق الدفع المتاحة؟")
print("\nFinal Response:")
print(response)