Fennec Community community/rag/agentic_rag.md

Agentic RAG Module — Public API Reference


Table of Contents

  1. Module Overview
  2. AgenticConfig
  3. AgenticRAG
  4. AgenticResult
  5. ActionStep
  6. Document
  7. TTLCache
  8. Enumerations
  9. Exception Hierarchy
  10. Agent Decision Flow
  11. Quick-Start Example

1. Module Overview

The agentic_rag package wraps any standard RAG backend (e.g., core.RAGSystem) with an autonomous decision-making loop. Instead of executing a single retrieve-and-generate pass, the agent iteratively decides what to do next based on the quality of the information collected so far: retrieve more documents, refine the query, verify sufficiency, self-correct the answer, or stop.

Key capabilities:

  • Autonomous decision loop — up to max_iterations steps, each selecting the optimal action.
  • Query refinement — LLM rewrites weak queries given current low-confidence documents.
  • Sufficiency checking — heuristic-first, LLM-fallback validation before committing to generation.
  • Self-correction — post-generation LLM pass to strip unsupported claims.
  • Heterogeneous backend normalisation — accepts Document, Tuple[chunk, score], and dict formats from any retriever.
  • Two-level caching — TTL retrieval cache + in-memory result cache.
  • Full async support — including token-level streaming generation.
  • Action callback — fire an on_action hook after every decision step for tracing or UI streaming.

Publicly exported symbols:

from fennec_community.rag.types.agentic_rag import AgenticRAG, AgenticConfig, AgentAction, ReasoningDepth

2. AgenticConfig

from fennec_community.rag.types.agentic_rag import AgenticConfig

AgenticConfig is a dataclass holding all tunable parameters for an AgenticRAG instance. Every field has a production-ready default.

Constructor

AgenticConfig(
    max_iterations: int = 5,
    min_confidence: float = 0.7,
    min_docs_required: int = 2,
    enable_query_refinement: bool = True,
    enable_self_correction: bool = True,
    enable_sufficiency_check: bool = True,
    reasoning_depth: ReasoningDepth = ReasoningDepth.MEDIUM,
    cache_ttl_seconds: int = 300,
    retry_attempts: int = 2,
    retry_backoff_base: float = 0.5,
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_iterations | int | 5 | Hard cap on the number of agent decision cycles. On reaching the cap, the agent is forced to GENERATE. |
| min_confidence | float | 0.7 | Minimum weighted average relevance score required to consider retrieved documents sufficient. |
| min_docs_required | int | 2 | Minimum number of documents that must be retrieved before the sufficiency check can return True. |
| enable_query_refinement | bool | True | Allow the agent to rewrite low-quality queries using the LLM before re-retrieving. |
| enable_self_correction | bool | True | After generation, ask the LLM to validate and clean the answer against the retrieved context. |
| enable_sufficiency_check | bool | True | Every even-numbered iteration, ask whether the current documents are sufficient to answer. |
| reasoning_depth | ReasoningDepth | MEDIUM | Controls the verbosity and complexity of LLM prompts. |
| cache_ttl_seconds | int | 300 | Lifetime in seconds for cached retrieval results. Set to 0 to disable caching. |
| retry_attempts | int | 2 | Number of additional retries for retrieval and generation calls before giving up. |
| retry_backoff_base | float | 0.5 | Base seconds for exponential back-off between retries (base * 2^attempt). |
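
The two retry parameters can be pictured with a small sketch. It assumes the attempt counter starts at 0 and that the delay is applied between consecutive tries; neither detail is stated in this reference. The names backoff_delays and call_with_retry, and the injectable sleep parameter, are hypothetical and not part of the package.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retry_attempts: int = 2, retry_backoff_base: float = 0.5) -> List[float]:
    """Delay before each retry: base * 2^attempt, with attempt assumed to start at 0."""
    return [retry_backoff_base * 2 ** attempt for attempt in range(retry_attempts)]


def call_with_retry(
    fn: Callable[[], T],
    retry_attempts: int = 2,
    retry_backoff_base: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to retry_attempts more times with back-off."""
    delays = backoff_delays(retry_attempts, retry_backoff_base)
    for attempt in range(retry_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == retry_attempts:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")


print(backoff_delays())  # [0.5, 1.0]
```

With the defaults, a call that keeps failing is tried three times in total, with pauses of 0.5 s and 1.0 s in between.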

Example:

config = AgenticConfig(
    max_iterations=6,
    min_confidence=0.75,
    enable_self_correction=True,
    reasoning_depth=ReasoningDepth.DEEP,
    cache_ttl_seconds=600,
)

3. AgenticRAG

from fennec_community.rag.types.agentic_rag import AgenticRAG

AgenticRAG is the main class of the package. It wraps any RAG backend and adds an autonomous agent loop that adaptively retrieves, refines, validates, and generates answers.


3.1 Constructor

AgenticRAG(
    rag_system: Any,
    config: Optional[AgenticConfig] = None,
    llm: Optional[Any] = None,
    on_action: Optional[Callable[[ActionStep], None]] = None,
)

Purpose: Instantiate the agentic RAG agent by binding it to an existing RAG backend.

| Parameter | Type | Required | Description |
|---|---|---|---|
| rag_system | Any | Yes | Underlying RAG backend. Must expose retrieve(query) and either generate(query) or query(query). |
| config | Optional[AgenticConfig] | No | Runtime configuration. AgenticConfig() defaults are used when None. |
| llm | Optional[Any] | No | Language model with a .generate(prompt) -> str method used for query refinement, sufficiency checks, and self-correction. Falls back to rag_system.llm if None. |
| on_action | Optional[Callable[[ActionStep], None]] | No | Callback invoked after every agent action step. Receives the ActionStep record. Exceptions in the callback are silently swallowed so they never interrupt the agent loop. |

Returns: AgenticRAG instance.
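
As a rough illustration of the duck-typed contract for rag_system, the stub below is a toy backend exposing retrieve(), generate(), and the two ingestion methods that AgenticRAG delegates to. The class name ToyBackend, its word-overlap scoring, and the choice of (chunk, score) tuples as the return shape are assumptions made for illustration; a real backend would query a vector store.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyBackend:
    """Hypothetical in-memory backend; not part of fennec_community."""

    def __init__(self) -> None:
        self._docs: List[Tuple[str, Dict[str, Any]]] = []

    def add_document(self, text: str, metadata: Optional[Dict[str, Any]] = None) -> int:
        self._docs.append((text, metadata or {}))
        return len(self._docs) - 1  # index of the stored document

    def add_documents(self, documents, metadatas=None) -> List[int]:
        metadatas = metadatas or [{} for _ in documents]
        return [self.add_document(t, m) for t, m in zip(documents, metadatas)]

    def retrieve(self, query: str) -> List[Tuple[str, float]]:
        """Score each document by the fraction of query words it contains."""
        words = set(query.lower().split())
        scored = []
        for text, _ in self._docs:
            overlap = len(words & set(text.lower().split()))
            if overlap:
                scored.append((text, overlap / len(words)))
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

    def generate(self, query: str) -> str:
        hits = self.retrieve(query)
        return hits[0][0] if hits else "No relevant document found."


backend = ToyBackend()
backend.add_documents(["BERT is bidirectional", "GPT is causal"])
print(backend.retrieve("bert"))
```

An object of this shape could be passed as rag_system in tests, provided the real agent accepts tuple results as described in the module overview.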

Example:

from fennec_community.rag.types.agentic_rag import AgenticRAG, AgenticConfig

def log_step(step):
    print(f"[Step {step.iteration}] {step.action.value}: {step.details}")

config = AgenticConfig(max_iterations=4, min_confidence=0.75)
agent  = AgenticRAG(
    rag_system=my_rag,
    config=config,
    on_action=log_step,
)
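
For tracing or UI streaming, the hook can be any callable. The sketch below collects steps into a list and shows one way a loop might guard the hook so that a failing callback never interrupts it. StepRecorder and fire_hook are hypothetical helpers, and the SimpleNamespace object stands in for a real ActionStep record.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Optional


class StepRecorder:
    """Callable that stores every step it receives (illustrative helper)."""

    def __init__(self) -> None:
        self.steps: List[Any] = []

    def __call__(self, step: Any) -> None:
        self.steps.append(step)


def fire_hook(callback: Optional[Callable[[Any], None]], step: Any) -> None:
    """Invoke the callback and swallow any exception it raises."""
    if callback is None:
        return
    try:
        callback(step)
    except Exception:
        pass  # mirrors the documented behaviour: callback errors are ignored


recorder = StepRecorder()
fake_step = SimpleNamespace(iteration=1, details="fetched 3 docs")  # stand-in for ActionStep
fire_hook(recorder, fake_step)
fire_hook(lambda step: 1 / 0, fake_step)  # raises inside the hook, but nothing propagates
print(len(recorder.steps))  # 1
```

Because callback errors are swallowed, a hook that silently stops producing output is worth debugging on its own, outside the agent.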

3.2 Core Query Methods

query()

agent.query(
    query: str,
    context: Optional[Dict[str, Any]] = None,
) -> AgenticResult

Purpose: The primary entry point. Runs the full autonomous agent loop: decides actions, retrieves documents, optionally refines the query, checks sufficiency, generates the answer, and applies self-correction. Results are cached by query hash to avoid redundant processing.

Agent loop summary:

| Iteration / condition | Decision rule |
|---|---|
| 1 | Always RETRIEVE |
| Even iteration, docs present | CHECK_SUFFICIENCY (if enabled) |
| Low avg score + early iter | REFINE_QUERY (if enabled) |
| >= max_iterations | GENERATE (forced) |
| Default | GENERATE |

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | str | Yes | Natural-language question. Leading/trailing whitespace is stripped automatically. |
| context | Optional[Dict[str, Any]] | No | Arbitrary extra key-value pairs forwarded to the generation step as supplemental context. The key "retrieved_docs" is excluded to prevent duplication. |

Returns: AgenticResult — contains the answer, retrieved documents, full action trace, iteration count, reasoning summary, confidence score, and a cache-hit flag.

Raises:

  • ValueError — if query is empty after stripping.
  • MaxIterationsExceeded — if the loop overruns max_iterations without reaching a terminal action. (In practice, the loop is bounded by the while guard; this exception is available for manual raising in subclasses.)

Example:

result = agent.query("What are the latest advances in transformer architectures?")

print(result.answer)
print(f"Confidence: {result.confidence:.2%}")
print(f"Iterations used: {result.iterations}")
print(result.reasoning)
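
The loop summary for query() can be read as a chain of checks. The sketch below follows the precedence drawn in the decision-flow diagram of section 10 and makes two assumptions that this reference does not spell out: "low avg score" is taken to mean a plain mean below min_confidence, and "early iter" is taken to mean that at least two iterations remain. The function name and signature are illustrative and are not the library's internal _decide_action.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):  # stand-in for AgentAction
    RETRIEVE = "retrieve"
    GENERATE = "generate"
    REFINE_QUERY = "refine_query"
    CHECK_SUFFICIENCY = "check_sufficiency"


def decide_action(
    iteration: int,
    scores: Sequence[float],
    max_iterations: int = 5,
    min_confidence: float = 0.7,
    enable_sufficiency_check: bool = True,
    enable_query_refinement: bool = True,
) -> Action:
    if iteration == 1 or not scores:
        return Action.RETRIEVE
    if iteration >= max_iterations:
        return Action.GENERATE  # forced exit
    if enable_sufficiency_check and iteration % 2 == 0:
        return Action.CHECK_SUFFICIENCY
    average = sum(scores) / len(scores)
    is_early = iteration <= max_iterations - 2  # assumed meaning of "early"
    if enable_query_refinement and is_early and average < min_confidence:
        return Action.REFINE_QUERY
    return Action.GENERATE


print(decide_action(3, [0.2, 0.4]).value)  # refine_query
```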

3.3 Document Ingestion

add_document()

agent.add_document(
    text: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> Any

Purpose: Add a single plain-text document to the underlying RAG index. Delegates directly to rag_system.add_document().

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | str | Yes | Raw text content of the document to index. |
| metadata | Optional[Dict[str, Any]] | No | Arbitrary key-value metadata attached to the document (e.g., {"source": "wiki", "date": "2025-01"}). Defaults to {} if None. |

Returns: Whatever rag_system.add_document() returns (implementation-dependent).

Example:

agent.add_document(
    "Transformer models use self-attention to process sequences.",
    metadata={"source": "deep_learning_textbook", "chapter": 12},
)

add_documents()

agent.add_documents(
    documents: List[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> Any

Purpose: Batch-add multiple documents to the underlying RAG index. Preferred over calling add_document() in a loop for better throughput. Delegates to rag_system.add_documents().

| Parameter | Type | Required | Description |
|---|---|---|---|
| documents | List[str] | Yes | List of raw text strings to index. |
| metadatas | Optional[List[Dict[str, Any]]] | No | Parallel list of metadata dicts, one per document. Defaults to [] if None. |

Returns: Whatever rag_system.add_documents() returns (implementation-dependent).

Example:

agent.add_documents(
    [
        "BERT is a bidirectional transformer pre-trained on masked language modelling.",
        "GPT uses a causal (left-to-right) attention mask during pre-training.",
    ],
    metadatas=[
        {"model": "BERT", "year": 2018},
        {"model": "GPT",  "year": 2018},
    ],
)

3.4 Cache Management

clear_cache()

agent.clear_cache() -> None

Purpose: Flush both internal caches simultaneously:

  1. TTL retrieval cache (TTLCache) — stores raw document lists per query.
  2. Result cache (in-memory dict) — stores complete AgenticResult objects per query hash.

Call this after adding new documents to ensure subsequent queries reflect the updated index.

Parameters: None.

Returns: None

Example:

agent.add_documents(new_batch)
agent.clear_cache()  # Force fresh retrieval on next query
result = agent.query("What changed recently?")

3.5 Async API

aquery()

await agent.aquery(
    query: str,
    **kwargs,
) -> AgenticResult

Purpose: Async wrapper around query(). Runs the full agent loop in a thread pool via asyncio.to_thread, making it safe to await from an asyncio event loop without blocking it.

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | str | Yes | Natural-language question. |
| **kwargs | Any | No | Forwarded to query() (e.g., context={"key": "val"}). |

Returns: AgenticResult — identical to the synchronous query() return value.

Example:

import asyncio

async def main():
    result = await agent.aquery("Explain attention mechanisms.")
    print(result.answer)

asyncio.run(main())
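
The thread-pool pattern behind aquery() can be sketched with a stand-in blocking function. Here slow_query is a placeholder for the synchronous agent.query(); only the use of asyncio.to_thread (available from Python 3.9) is taken from the description above.

```python
import asyncio
import time


def slow_query(query: str, context=None) -> str:
    """Placeholder for a blocking call such as agent.query()."""
    time.sleep(0.1)
    return f"answer to: {query}"


async def aquery(query: str, **kwargs) -> str:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(slow_query, query, **kwargs)


async def main() -> list:
    # Both calls overlap because each one waits in its own thread.
    return await asyncio.gather(aquery("first"), aquery("second", context={"k": "v"}))


results = asyncio.run(main())
print(results)  # ['answer to: first', 'answer to: second']
```

Since the work runs on worker threads, any state shared between concurrent calls (for example a cache that is not thread-safe) needs its own protection.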

astream_query()

async for token in agent.astream_query(query: str):
    print(token, end="", flush=True)

Purpose: Runs the full agent decision loop (retrieve, refine, check sufficiency) and then streams the final answer token by token using the LLM's astream() method. Enables real-time display in chat interfaces and streaming APIs.

Requirements:

  • agent.llm must be set (not None).
  • agent.llm must expose an astream(prompt) async generator method.

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | str | Yes | Natural-language question. |

Yields: str — individual tokens as they are emitted by the LLM.

Sentinel yields on error:

  • "⚠️ query is empty" — if query is blank after stripping.
  • "❌ llm not configured correctly for streaming." — if the LLM is absent or lacks astream.
  • "❌ not enough information to answer." — if no documents were retrieved after all iterations.

Example:

import asyncio

async def stream():
    async for token in agent.astream_query("What is retrieval-augmented generation?"):
        print(token, end="", flush=True)
    print()  # newline after stream

asyncio.run(stream())
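
To test a streaming consumer without a real model, a stub LLM only needs the astream(prompt) async generator named in the requirements. The sketch below pairs such a stub with a consumer that separates normal tokens from the sentinel strings listed above. WordStreamLLM and collect are hypothetical names, and detecting sentinels by their leading symbol is an assumption about how a caller might handle them.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

SENTINEL_PREFIXES = ("⚠️", "❌")  # leading symbols of the documented sentinel yields


class WordStreamLLM:
    """Stub LLM whose astream() yields the prompt back one word at a time."""

    async def astream(self, prompt: str) -> AsyncIterator[str]:
        for word in prompt.split():
            await asyncio.sleep(0)  # hand control back to the event loop
            yield word + " "


async def collect(stream: AsyncIterator[str]) -> Tuple[str, List[str]]:
    """Join normal tokens into text and gather sentinel messages separately."""
    tokens: List[str] = []
    errors: List[str] = []
    async for token in stream:
        (errors if token.startswith(SENTINEL_PREFIXES) else tokens).append(token)
    return "".join(tokens).strip(), errors


text, errors = asyncio.run(collect(WordStreamLLM().astream("retrieval augmented generation")))
print(text)  # retrieval augmented generation
```

Separating sentinels from tokens lets a chat UI show an error banner instead of printing the warning text as if it were part of the answer.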

3.6 Context Manager

AgenticRAG supports the async context manager protocol:

async with AgenticRAG(rag_system=my_rag) as agent:
    result = await agent.aquery("Summarise the latest research.")
    print(result.answer)
# __aexit__ is a no-op — safe to use as a scope delimiter

Note: The synchronous with statement is not supported. Use the async form (async with) or instantiate directly.


4. AgenticResult

from fennec_community.rag.types.agentic_rag import AgenticResult

The complete result object returned by AgenticRAG.query() and AgenticRAG.aquery().

Attributes

| Attribute | Type | Description |
|---|---|---|
| answer | str | The final generated (and optionally self-corrected) answer string. |
| retrieved_docs | List[Document] | Deduplicated list of all documents used during generation. |
| action_history | List[ActionStep] | Ordered record of every decision the agent made. |
| iterations | int | Total number of agent iterations consumed. |
| reasoning | str | Human-readable multi-line summary of all action steps. |
| confidence | float | Weighted confidence score in [0, 1]; top-ranked documents count more. |
| cached | bool | True if this result was served from the in-memory result cache. |
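
The exact weighting behind confidence is not given in this reference. One plausible reading of "top-ranked documents count more" is a rank-weighted mean, sketched below with 1/rank weights. The function name and the weighting scheme are assumptions, not the library's formula.

```python
from typing import Sequence


def rank_weighted_confidence(scores: Sequence[float]) -> float:
    """Mean of scores where the k-th best score gets weight 1/k, clamped to [0, 1]."""
    if not scores:
        return 0.0
    ranked = sorted(scores, reverse=True)
    weights = [1.0 / rank for rank in range(1, len(ranked) + 1)]
    value = sum(w * s for w, s in zip(weights, ranked)) / sum(weights)
    return max(0.0, min(1.0, value))


plain_mean = (0.9 + 0.3) / 2
weighted = rank_weighted_confidence([0.3, 0.9])
print(round(plain_mean, 2), round(weighted, 2))  # 0.6 0.7
```

Under this scheme a single strong hit pulls the confidence up more than a plain average would, which matches the described intent.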

to_dict()

result.to_dict() -> Dict[str, Any]

Purpose: Serialize the key result fields to a plain dictionary suitable for JSON encoding, API responses, or logging.

Parameters: None.

Returns: Dict[str, Any] with the following keys:

| Key | Type | Description |
|---|---|---|
| answer | str | The generated answer. |
| docs_used | int | Number of documents used (len(retrieved_docs)). |
| iterations | int | Total iterations consumed. |
| confidence | float | Confidence score rounded to 4 decimal places. |
| cached | bool | Whether the result was cache-served. |
| reasoning | str | The formatted reasoning trace. |

Example:

import json

result = agent.query("What is RAG?")
print(json.dumps(result.to_dict(), indent=2, ensure_ascii=False))

5. ActionStep

from fennec_community.rag.types.agentic_rag import ActionStep

Immutable record of a single agent decision step, appended to AgenticResult.action_history after every iteration.

Attributes

| Attribute | Type | Description |
|---|---|---|
| iteration | int | 1-based iteration number within the current agent run. |
| action | AgentAction | The action chosen by the agent at this step. |
| query | str | The query string active at this step (may be a refined version). |
| details | str | Human-readable summary of what happened (e.g., "fetched 3 docs", "sufficient"). |
| duration_ms | float | Wall-clock time spent on the _decide_action call in milliseconds. |
| confidence | float | Agent's computed confidence after this step. |
| refined_query | Optional[str] | The rewritten query if this step was a REFINE_QUERY action; None otherwise. |

Typical usage — inspecting the action trace:

for step in result.action_history:
    print(f"[{step.iteration}] {step.action.value:20s}  conf={step.confidence:.2f}  {step.details}")

6. Document

from fennec_community.rag.types.agentic_rag import Document

Unified, backend-agnostic document representation used throughout the agentic pipeline.

Attributes

| Attribute | Type | Description |
|---|---|---|
| text | str | Raw text content of the document chunk. |
| score | float | Relevance score in [0, 1] assigned by the retriever. |
| metadata | Dict[str, Any] | Arbitrary key-value metadata (e.g., source file, page number). |
| source | str | Origin identifier: chunk ID, document ID, or file name. |

snippet()

doc.snippet(max_chars: int = 300) -> str

Purpose: Return a truncated text preview of the document, appending "…" when the full text exceeds max_chars. Useful for displaying context in logs, UIs, and LLM prompts without exceeding token limits.

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_chars | int | 300 | Maximum number of characters to include in the preview. |

Returns: str — a truncated (or full) version of self.text.

Example:

for doc in result.retrieved_docs:
    print(f"[{doc.score:.2f}] {doc.snippet(150)}")
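
To make the backend-agnostic idea concrete, here is a sketch of how heterogeneous retriever outputs might be folded into one record type. Doc is a stand-in for Document built from the attribute table in this section. The normalise function, the dict key names ("text", "score", "metadata", "source"), and the exact truncation rule in snippet() are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Doc:
    """Stand-in mirroring the documented Document attributes."""
    text: str
    score: float = 0.0
    metadata: Dict[str, Any] = field(default_factory=dict)
    source: str = ""

    def snippet(self, max_chars: int = 300) -> str:
        # Append an ellipsis only when the text is actually cut.
        return self.text if len(self.text) <= max_chars else self.text[:max_chars] + "…"


def normalise(item: Any) -> Doc:
    """Accept a Doc, a (chunk, score) tuple, or a dict and return a Doc."""
    if isinstance(item, Doc):
        return item
    if isinstance(item, tuple) and len(item) == 2:
        chunk, score = item
        return Doc(text=str(chunk), score=float(score))
    if isinstance(item, dict):
        return Doc(
            text=str(item.get("text", "")),
            score=float(item.get("score", 0.0)),
            metadata=dict(item.get("metadata", {})),
            source=str(item.get("source", "")),
        )
    raise TypeError(f"unsupported retriever item: {type(item).__name__}")


docs = [normalise(raw) for raw in [("alpha beta", 0.8), {"text": "gamma", "score": 0.5}]]
print([(d.text, d.score) for d in docs])  # [('alpha beta', 0.8), ('gamma', 0.5)]
print(Doc("abcdef").snippet(3))  # abc…
```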

7. TTLCache

from fennec_community.rag.types.agentic_rag import TTLCache

A lightweight in-process TTL cache for storing retrieval results. Used internally by AgenticRAG to avoid redundant vector DB queries within the same session. Can also be used standalone.

Thread safety: This cache is not thread-safe. For concurrent access, wrap operations with a threading.Lock.


Constructor

TTLCache(ttl_seconds: int = 300)

| Parameter | Type | Default | Description |
|---|---|---|---|
| ttl_seconds | int | 300 | Entry lifetime in seconds. Expired entries are evicted lazily on the next get() call. |

get()

cache.get(query: str) -> Optional[List[Document]]

Purpose: Look up cached retrieval results for a query. Returns None on a miss or if the entry has expired (and deletes the stale entry).

| Parameter | Type | Description |
|---|---|---|
| query | str | The exact query string used as a cache key (SHA-256 hashed internally). |

Returns: List[Document] on a cache hit; None on a miss or expiry.


set()

cache.set(query: str, docs: List[Document]) -> None

Purpose: Store a retrieval result in the cache under the given query.

| Parameter | Type | Description |
|---|---|---|
| query | str | The query string. |
| docs | List[Document] | The retrieved documents to cache. |

Returns: None


clear()

cache.clear() -> None

Purpose: Evict all entries from the cache immediately. Useful after bulk document updates when all cached results may be stale.

Parameters: None.

Returns: None

Example:

cache = TTLCache(ttl_seconds=120)
cache.set("transformer attention", docs)
cached = cache.get("transformer attention")   # → List[Document]
cache.clear()
cached = cache.get("transformer attention")   # → None
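
The behaviour described in this section (SHA-256 keys, lazy eviction on get(), and the suggested threading.Lock for concurrent use) can be sketched in a few lines. MiniTTLCache is an illustrative stand-in, not the library class, and the injectable clock parameter exists only so that expiry can be demonstrated without sleeping.

```python
import hashlib
import threading
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class MiniTTLCache:
    """Illustrative TTL cache with lazy eviction and a lock around each operation."""

    def __init__(self, ttl_seconds: int = 300, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[float, List[Any]]] = {}

    @staticmethod
    def _key(query: str) -> str:
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def get(self, query: str) -> Optional[List[Any]]:
        key = self._key(query)
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            stored_at, docs = entry
            if self._clock() - stored_at >= self._ttl:
                del self._entries[key]  # stale entry is removed on access
                return None
            return docs

    def set(self, query: str, docs: List[Any]) -> None:
        with self._lock:
            self._entries[self._key(query)] = (self._clock(), docs)

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()


now = [0.0]  # fake clock so the example does not need to sleep
cache = MiniTTLCache(ttl_seconds=120, clock=lambda: now[0])
cache.set("transformer attention", ["doc-1"])
print(cache.get("transformer attention"))  # ['doc-1']
now[0] = 121.0
print(cache.get("transformer attention"))  # None
```

In this sketch a ttl_seconds of 0 makes every lookup a miss, which is one way the "set to 0 to disable caching" behaviour of cache_ttl_seconds could come about.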

8. Enumerations

AgentAction

from fennec_community.rag.types.agentic_rag import AgentAction

Defines all discrete actions the agent can choose at each decision step.

| Value | String | Description |
|---|---|---|
| RETRIEVE | "retrieve" | Fetch documents from the vector database for the current query. |
| GENERATE | "generate" | Produce the final answer using retrieved documents; this ends the agent loop. |
| REFINE_QUERY | "refine_query" | Ask the LLM to rewrite the query to improve retrieval quality. |
| CHECK_SUFFICIENCY | "check_sufficiency" | Evaluate whether the current documents are sufficient to answer the query. |
| STOP | "stop" | Unconditionally terminate the agent loop without generating. |

ReasoningDepth

from fennec_community.rag.types.agentic_rag import ReasoningDepth

Controls the complexity and verbosity of LLM prompts used in query refinement.

| Value | String | LLM prompt hint |
|---|---|---|
| SHALLOW | "shallow" | "in one concise sentence" (minimal, fast rewriting). |
| MEDIUM | "medium" | "in one clear sentence, adding specificity" (balanced rewriting). |
| DEEP | "deep" | "expanding scope and adding sub-questions if needed" (thorough, slower rewriting). |
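
How a depth hint might be spliced into a refinement prompt is sketched below. The three hint strings are the ones listed for ReasoningDepth. The Depth enum is a stand-in for ReasoningDepth, and the prompt template and build_refine_prompt are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Depth(Enum):  # stand-in for ReasoningDepth
    SHALLOW = "shallow"
    MEDIUM = "medium"
    DEEP = "deep"


DEPTH_HINTS = {
    Depth.SHALLOW: "in one concise sentence",
    Depth.MEDIUM: "in one clear sentence, adding specificity",
    Depth.DEEP: "expanding scope and adding sub-questions if needed",
}


def build_refine_prompt(query: str, weak_snippets: Sequence[str], depth: Depth = Depth.MEDIUM) -> str:
    """Assemble an illustrative rewrite prompt from the query, weak context, and hint."""
    context = "\n".join(f"- {snippet}" for snippet in weak_snippets)
    return (
        f"Rewrite the search query {DEPTH_HINTS[depth]}.\n"
        f"Original query: {query}\n"
        f"Low-confidence documents retrieved so far:\n{context}"
    )


print(build_refine_prompt("bert vs gpt", ["GPT uses a causal mask."], Depth.SHALLOW))
```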

9. Exception Hierarchy

All exceptions inherit from AgenticRAGError so callers can catch either a specific subclass or the general base.

AgenticRAGError  (base)
├── RetrievalError         — raised when all retrieval attempts (with retries) fail
├── GenerationError        — raised when all generation attempts (with retries) fail
└── MaxIterationsExceeded  — raised when the agent exceeds the configured max_iterations

Attributes: All exceptions carry a descriptive str message.

Example error handling:

from fennec_community.rag.types.agentic_rag import (
    AgenticRAGError,
    RetrievalError,
    GenerationError,
    MaxIterationsExceeded,
)

try:
    result = agent.query("What is the capital of France?")
except RetrievalError as e:
    print(f"Retrieval failed: {e}")
except GenerationError as e:
    print(f"Generation failed: {e}")
except MaxIterationsExceeded as e:
    print(f"Agent loop overran: {e}")
except AgenticRAGError as e:
    print(f"Unexpected agentic error: {e}")

10. Agent Decision Flow

The following diagram shows how the agent progresses through iterations for a single query() call:

query("...") called
       │
       ▼
┌─────────────────────────┐
│  Result cache lookup    │──── HIT ──▶ return cached AgenticResult
└─────────────────────────┘
       │ MISS
       ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Agent Loop  (iteration 1 … max_iterations)   │
│                                                                 │
│  _decide_action()                                               │
│    ├─ iter == 1 or no docs          → RETRIEVE                 │
│    ├─ iter >= max_iterations        → GENERATE (exit loop)     │
│    ├─ even iter + check enabled     → CHECK_SUFFICIENCY        │
│    ├─ low avg score + early iter    → REFINE_QUERY             │
│    └─ default                       → GENERATE (exit loop)     │
│                                                                 │
│  RETRIEVE       → _retrieve_with_retry() → TTLCache            │
│  REFINE_QUERY   → _refine_query() via LLM                      │
│  CHECK_SUFFICIENCY → _is_information_sufficient()               │
│    ├─ heuristic (score >= min_confidence) → True / False       │
│    └─ borderline → LLM "yes/no" call                          │
│  GENERATE / STOP / sufficient → exit loop                      │
└─────────────────────────────────────────────────────────────────┘
       │
       ▼
  _deduplicate(retrieved_docs)         ← remove text-hash duplicates
       │
       ▼
  _generate_answer_with_retry()        ← LLM prompt or rag_system.generate()
       │
       ▼
  _self_correct()  (if enabled)        ← LLM validation pass
       │
       ▼
  Build AgenticResult
  Store in result cache
       │
       ▼
  return AgenticResult
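
The CHECK_SUFFICIENCY branch of the diagram (heuristic first, LLM only for borderline cases) might look like the sketch below. The width of the "borderline" band, the use of a plain mean, the prompt wording, and the function name are all assumptions. The ask_llm argument stands for any callable that maps a prompt to a reply string.

```python
from typing import Callable, Optional, Sequence


def is_information_sufficient(
    scores: Sequence[float],
    min_docs_required: int = 2,
    min_confidence: float = 0.7,
    borderline_margin: float = 0.1,  # hypothetical width of the "borderline" band
    ask_llm: Optional[Callable[[str], str]] = None,
) -> bool:
    if len(scores) < min_docs_required:
        return False  # too few documents, regardless of score
    average = sum(scores) / len(scores)
    if average >= min_confidence:
        return True  # heuristic passes, no LLM call needed
    if ask_llm is not None and average >= min_confidence - borderline_margin:
        reply = ask_llm("Are these documents sufficient to answer? Reply yes or no.")
        return reply.strip().lower().startswith("yes")
    return False


print(is_information_sufficient([0.9, 0.8]))  # True
print(is_information_sufficient([0.9]))  # False
print(is_information_sufficient([0.65, 0.65], ask_llm=lambda prompt: "Yes."))  # True
```

Keeping the LLM call for borderline cases only means clearly good or clearly poor retrievals cost no extra model round-trip.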

11. Quick-Start Example

import asyncio
from fennec_community.rag.types.agentic_rag import AgenticRAG, AgenticConfig, ReasoningDepth
from fennec_community.rag.core import RAGSystem  # your existing RAG backend

# ── 1. Build configuration ────────────────────────────────────────────────
config = AgenticConfig(
    max_iterations=5,
    min_confidence=0.72,
    enable_query_refinement=True,
    enable_self_correction=True,
    reasoning_depth=ReasoningDepth.MEDIUM,
    cache_ttl_seconds=300,
    retry_attempts=2,
)

# ── 2. Optional: action callback for live tracing ─────────────────────────
def on_step(step):
    print(
        f"  [{step.iteration:02d}] {step.action.value:<22} "
        f"conf={step.confidence:.2f}  {step.details or step.query[:50]}"
    )

# ── 3. Instantiate ────────────────────────────────────────────────────────
# my_vector_db, my_llm, my_chunker and my_ctx_mgr stand for components you have already built.
my_rag_system = RAGSystem(
    vector_db=my_vector_db,
    llm=my_llm,
    chunker=my_chunker,
    context_manager=my_ctx_mgr,
    config=config,
    enable_query_expansion=True,
    query_expansion_variants=3,
)

agent = AgenticRAG(
    rag_system=my_rag_system,   # must expose .retrieve() and .generate()
    config=config,
    on_action=on_step,
)

# ── 4. Index documents ────────────────────────────────────────────────────
agent.add_documents(
    ["Transformers use self-attention for sequence modelling.",
     "BERT is pre-trained on masked language modelling."],
    metadatas=[{"src": "book"}, {"src": "paper"}],
)

# ── 5. Synchronous query ──────────────────────────────────────────────────
result = agent.query("How does BERT differ from GPT?")
print(f"\nAnswer     : {result.answer}")
print(f"Confidence : {result.confidence:.2%}")
print(f"Iterations : {result.iterations}")
print(f"Cached     : {result.cached}")
print(f"\nReasoning trace:\n{result.reasoning}")
print(f"\nDocs used  : {len(result.retrieved_docs)}")
for doc in result.retrieved_docs:
    print(f"  [{doc.score:.2f}] {doc.snippet(120)}")

# ── 6. JSON-serialisable summary ──────────────────────────────────────────
import json
print(json.dumps(result.to_dict(), indent=2, ensure_ascii=False))

# ── 7. Async query ────────────────────────────────────────────────────────
async def async_example():
    result = await agent.aquery("What is self-attention?")
    print(result.answer)

asyncio.run(async_example())

# ── 8. Streaming query ────────────────────────────────────────────────────
async def stream_example():
    print("Streaming answer: ", end="")
    async for token in agent.astream_query("Explain BERT pre-training."):
        print(token, end="", flush=True)
    print()

asyncio.run(stream_example())

# ── 9. Clear caches after new ingestion ──────────────────────────────────
agent.add_document("New paper on RLHF published in 2025.")
agent.clear_cache()
fresh_result = agent.query("What is RLHF?")

Simple Real Example



from fennec_community.llm import MistralInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
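from fennec_community.rag.types.agentic_rag import AgenticRAG, AgenticConfig, ReasoningDepth

# llm, vector_db, chunker and context_manager are assumed to have been built
# beforehand from the classes imported above; their construction is not shown here.
glob = RAGSystem(llm=llm, vector_db=vector_db, chunker=chunker, context_manager=context_manager)
documents = TextLoader("./data_kn/faq.txt").load()
glob.add_documents(documents)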

def on_action(step):
    icons = {
        "retrieve":           "📥",
        "generate":           "✍️",
        "refine_query":       "🔄",
        "check_sufficiency":  "🔍",
        "stop":               "🛑",
    }
    icon = icons.get(step.action.value, "▶️")
    print(
        f"  {icon} [{step.iteration}] {step.action.value:<22} "
        f"| confidence: {step.confidence:.2f} "
        f"| {step.duration_ms:.0f}ms"
        + (f" | {step.details}" if step.details else "")
    )


agentic_config = AgenticConfig(
    max_iterations=5,
    min_confidence=0.65,
    min_docs_required=2,
    enable_query_refinement=True,
    enable_sufficiency_check=True,
    enable_self_correction=True,
    reasoning_depth=ReasoningDepth.MEDIUM,
    cache_ttl_seconds=300,
    retry_attempts=2,
)

agent = AgenticRAG(
    rag_system=glob,
    config=agentic_config,
    on_action=on_action,
)

# Arabic query, in English: "What payment methods are available?"
response = agent.query("ما هي طرق الدفع المتاحة؟")
print("\nFinal Response:")
print(response.answer)
