Fennec Community community/rag/conversional_rag.md

Conversational RAG — `conversational_rag` Module — Public API Reference


Table of Contents

  1. Module Overview
  2. RAGConfigConverstion
  3. ConversationalRAG
  4. ConversationHistory
  5. ConversationTurn
  6. Data Flow Diagram
  7. Quick-Start Example

1. Module Overview

The conversational_rag package extends any standard RAG backend with full multi-turn conversation awareness. Instead of treating every query as an isolated retrieval, the system maintains a sliding window of previous exchanges, injects that history into LLM prompts, and automatically enriches ambiguous queries (containing pronouns or references like "this", "that", "it", "ده", "هذا") with context from the previous turn.

Key capabilities:

  • Multi-turn context window — configurable sliding history injected into every prompt.
  • Automatic query enrichment — detects anaphoric references across Arabic dialects and English, prepending the previous question to resolve ambiguity.
  • Live JSON persistence — every turn is saved immediately to a timestamped JSON file; the history survives process restarts.
  • Pluggable instructions — pass a custom system prompt or rely on built-in Arabic/English/generic defaults.
  • Source attribution — optionally append ranked source citations to every answer.
  • Export & reload — export sessions as json or text, and reload prior sessions from disk.
  • Full async support — aask() for non-blocking queries; astream() for token-by-token streaming.
  • Usage statistics — per-session query, chunk, and context-usage counters.

Publicly exported symbols:

from fennec_community.rag.types.conversational_rag import (
    ConversationalRAG,
    RAGConfigConverstion,
    ConversationTurn,
    ConversationHistory,
)

2. RAGConfigConverstion

from fennec_community.rag.types.conversational_rag import RAGConfigConverstion

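RAGConfigConverstion is a dataclass that centralises the tunable defaults for a ConversationalRAG session. Its first three fields (max_history_turns, context_turns, enable_context_compression) mirror constructor parameters of ConversationalRAG; the remaining fields hold defaults for ask() and for history auto-saving, so these defaults are defined in exactly one place.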

Constructor

RAGConfigConverstion(
    max_history_turns: int = 10,
    context_turns: int = 3,
    enable_context_compression: bool = False,
    include_sources: bool = True,
    use_history: bool = True,
    auto_save_history: bool = True,
    history_save_path: str = "conversations",
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_history_turns | int | 10 | Maximum number of turns retained in the in-memory history ring buffer. Oldest turns are evicted when the buffer is full. |
| context_turns | int | 3 | Number of the most recent turns formatted and injected into every LLM prompt as conversation context. |
| enable_context_compression | bool | False | Reserved for future use — compressing long conversation contexts before injection. |
| include_sources | bool | True | Default for whether source citations are appended to answers in ask(). |
| use_history | bool | True | Default for whether conversation history is used in ask(). |
| auto_save_history | bool | True | Whether to auto-save the history JSON file after every turn. |
| history_save_path | str | "conversations" | Directory under which auto-named session files are created. |

Example:

from fennec_community.rag.types.conversational_rag import RAGConfigConverstion

cfg = RAGConfigConverstion(
    max_history_turns=20,
    context_turns=5,
    include_sources=False,
)
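
To make the interplay between max_history_turns and context_turns concrete, here is a minimal sketch that uses collections.deque as a stand-in for the history buffer. It only illustrates the eviction and windowing behaviour described in the table above; the library's internal storage may well be implemented differently.

```python
from collections import deque

# Stand-in for the history buffer: a deque with maxlen silently drops the
# oldest entry once it is full (assumed to mirror max_history_turns).
max_history_turns = 3
context_turns = 2
history = deque(maxlen=max_history_turns)

for i in range(1, 6):
    history.append(f"turn {i}")

retained = list(history)               # turns still held in the buffer
injected = retained[-context_turns:]   # turns that would reach the prompt

print(retained)
print(injected)
```

With five turns added, only the last three remain in the buffer and only the last two of those would be formatted into the prompt.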

3. ConversationalRAG

from fennec_community.rag.types.conversational_rag import ConversationalRAG

ConversationalRAG is the main class of the package. It wraps any RAG system and adds stateful, multi-turn conversational capabilities with live persistence.


3.1 Constructor

ConversationalRAG(
    rag_system: Any,
    max_history_turns: int = 10,
    context_turns: int = 3,
    enable_context_compression: bool = False,
    instructions: Optional[str] = None,
    lang: str = "ar",
    history_file: Optional[str] = None,
    auto_save: bool = True,
)

Purpose: Instantiate a stateful conversational RAG session wired to an existing RAG backend. Creates or resumes the conversation history file on disk immediately.

| Parameter | Type | Required | Description |
|---|---|---|---|
| rag_system | Any | Yes | The underlying RAG backend. Must expose .retrieve(query), .context_manager.build(query, chunks), and .llm.generate(prompt). |
| max_history_turns | int | No | Maximum number of turns kept in the history ring buffer. Default: 10. |
| context_turns | int | No | Number of recent turns injected into every LLM prompt. Default: 3. |
| enable_context_compression | bool | No | Enable context compression for long conversations (reserved). Default: False. |
| instructions | Optional[str] | No | Custom system-level instructions for the LLM. When None, built-in defaults are chosen based on lang. |
| lang | str | No | Conversation language — "ar" (Arabic), "en" (English), or any other string for a generic fallback. Controls default instructions and prompt formatting. Default: "ar". |
| history_file | Optional[str] | No | Explicit path to the JSON persistence file. When None, a timestamped file is created automatically inside the conversations/ directory. |
| auto_save | bool | No | Persist history to disk after every turn. Default: True. |

Returns: ConversationalRAG instance.

Raises: ValueError — if rag_system is None.

Default instructions per language:

| lang | Instruction style |
|---|---|
| "ar" | Strict Arabic — answer only from context, no external knowledge, respond in the query's language. |
| "en" | Strict English — same constraints expressed in English. |
| Any other | Generic bilingual — accurate and concise, state when information is insufficient. |

Example:

from fennec_community.rag.types.conversational_rag import ConversationalRAG

bot = ConversationalRAG(
    rag_system=my_rag,
    max_history_turns=15,
    context_turns=4,
    lang="en",
    history_file="sessions/project_alpha.json",
)
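
The rag_system requirement is duck-typed: the table above lists three members the backend must expose. The sketch below shows the smallest object with that shape. StubRAG, EchoLLM and JoinContextManager are hypothetical names invented for illustration, and whether ConversationalRAG accepts such a stub without further attributes has not been verified here.

```python
from dataclasses import dataclass, field
from typing import Any, List


class EchoLLM:
    """Hypothetical LLM stub: returns a canned answer."""

    def generate(self, prompt: str) -> str:
        return f"(stub answer for a {len(prompt)}-character prompt)"


class JoinContextManager:
    """Hypothetical context manager: joins chunk texts into one block."""

    def build(self, query: str, chunks: List[Any]) -> str:
        return "\n".join(str(chunk) for chunk in chunks)


@dataclass
class StubRAG:
    """Smallest object exposing the three members listed for rag_system."""

    documents: List[str]
    context_manager: JoinContextManager = field(default_factory=JoinContextManager)
    llm: EchoLLM = field(default_factory=EchoLLM)

    def retrieve(self, query: str) -> List[str]:
        # Naive keyword match; a real backend would use vector search.
        words = query.lower().split()
        return [d for d in self.documents if any(w in d.lower() for w in words)]


stub = StubRAG(documents=[
    "BERT uses masked language modelling.",
    "GPT uses a causal objective.",
])
chunks = stub.retrieve("bert")
context = stub.context_manager.build("bert", chunks)
answer = stub.llm.generate(context)
```

A stub like this can be handy in unit tests, since it exercises the conversational layer without a vector database or a live LLM.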

3.2 Core Interaction

ask()

bot.ask(
    query: str,
    include_sources: bool = True,
    use_history: bool = True,
) -> str

Purpose: The primary interaction method. Processes a natural-language question through the full conversational pipeline:

  1. Validates the query (rejects blank / single-character inputs).
  2. Enriches the query with the previous turn's question if anaphoric indicators are detected.
  3. Retrieves relevant chunks from the RAG backend.
  4. Builds a context block via rag_system.context_manager.build().
  5. Constructs a history-aware, instruction-bounded prompt.
  6. Calls rag_system.llm.generate() to produce the answer.
  7. Optionally appends source citations.
  8. Appends the turn to the conversation history (auto-saved to disk).
  9. Updates per-session statistics.

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | (required) | Natural-language question from the user. |
| include_sources | bool | True (config.include_sources) | Append ranked source document IDs and similarity scores to the answer. |
| use_history | bool | True (config.use_history) | Inject conversation history into the prompt and use it for query enrichment. Pass False to treat this turn as stateless. |

Returns: str — the LLM-generated answer, with optional source block appended.

Edge-case return values:

| Condition | Return value |
|---|---|
| Empty / single-char query | "⚠️Please enter a valid question" |
| No chunks retrieved | "I don't have enough information to answer" |
| LLM unavailable | "⚠️ LLM isn't available" |
| Exception during processing | "Sorry, an error occurred: <error message>" |

All edge cases still append a turn to history so the conversation trace remains complete.

Example:

answer = bot.ask("What is the capital of Egypt?")
print(answer)

# Follow-up using anaphoric reference — query is auto-enriched
answer = bot.ask("Tell me more about it", include_sources=False)
print(answer)
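
The numbered pipeline above can be condensed into a short stand-alone sketch. This is not the library's code: ask_sketch and its callable parameters are hypothetical, query enrichment, source citations and statistics are left out, and the guard here simply returns without recording a turn, which may differ from the real behaviour.

```python
from typing import Callable, Dict, List

NO_INFO = "I don't have enough information to answer"
INVALID = "⚠️Please enter a valid question"


def ask_sketch(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    history: List[Dict],
) -> str:
    """Simplified stand-in for the ask() pipeline."""
    # Step 1: reject blank or single-character input.
    if len(query.strip()) < 2:
        return INVALID
    # Step 3: retrieve chunks (step 2, enrichment, is omitted here).
    chunks = retrieve(query)
    if not chunks:
        answer = NO_INFO
    else:
        # Steps 4-6: build a context block, assemble a prompt, call the LLM.
        context = "\n".join(chunks)
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        answer = generate(prompt)
    # Step 8: record the turn, including the no-chunks case.
    history.append(
        {"query": query, "answer": answer, "retrieved_chunks": len(chunks)}
    )
    return answer


history: List[Dict] = []
docs = ["Cairo is the capital of Egypt."]
found = ask_sketch("capital of Egypt?", lambda q: docs, lambda p: "Cairo", history)
missing = ask_sketch("unknown topic", lambda q: [], lambda p: "unused", history)
blank = ask_sketch(" ", lambda q: docs, lambda p: "unused", history)
```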

3.3 Session Management

reset_conversation()

bot.reset_conversation() -> None

Purpose: Clear the entire in-memory conversation history, effectively starting a fresh session. The history JSON file on disk is not deleted — only the in-memory buffer is cleared.

Parameters: None.

Returns: None

Example:

bot.reset_conversation()
print(bot)  # ConversationalRAG(turns=0, ...)

3.4 Persistence & Export

export_conversation()

bot.export_conversation(
    format: str = "json",
    output_file: Optional[str] = None,
) -> str

Purpose: Serialise the entire conversation history to a string in the requested format. Optionally write the result to a file.

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | str | "json" | Output format — "json" for a structured JSON array, "text" for a human-readable Q&A transcript. |
| output_file | Optional[str] | None | File path to write the export. When None, the content is returned but not written. |

Returns: str — the serialised conversation content.

Raises: ValueError — if format is not "json" or "text".

JSON format structure:

[
  {
    "query": "What is RAG?",
    "answer": "Retrieval-Augmented Generation is...",
    "timestamp": "2025-01-15T10:30:00.123456",
    "retrieved_chunks": 4
  }
]

Text format structure:

Question: What is RAG?
Answer: Retrieval-Augmented Generation is...

Question: How does it work?
Answer: ...

Example:

# Export to JSON string
json_str = bot.export_conversation(format="json")

# Export to text file
bot.export_conversation(format="text", output_file="session_transcript.txt")
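
As a rough illustration of how the two export formats relate to the same underlying turns, the following sketch renders a list of turn dictionaries either way. export_sketch is a hypothetical helper, not the library function; it takes fmt rather than format only to avoid shadowing the Python built-in.

```python
import json
from typing import Dict, List


def export_sketch(turns: List[Dict], fmt: str = "json") -> str:
    """Render turns as a JSON array or as a Question/Answer transcript."""
    if fmt == "json":
        return json.dumps(turns, ensure_ascii=False, indent=2)
    if fmt == "text":
        return "\n\n".join(
            f"Question: {t['query']}\nAnswer: {t['answer']}" for t in turns
        )
    raise ValueError(f"Unsupported format: {fmt!r}")


turns = [
    {"query": "What is RAG?", "answer": "Retrieval-Augmented Generation is..."},
    {"query": "How does it work?", "answer": "..."},
]
as_json = export_sketch(turns, "json")
as_text = export_sketch(turns, "text")
print(as_text)
```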

load_conversation_from_file()

bot.load_conversation_from_file(file_path: str) -> None

Purpose: Replace the current in-memory history by loading a previously exported or auto-saved JSON file. Useful for resuming a prior session or for testing with pre-populated history.

| Parameter | Type | Required | Description |
|---|---|---|---|
| file_path | str | Yes | Path to the JSON file. Accepts both the auto-save format ({"turns": [...]}) and the plain export format ([...]). |

Returns: None

Raises: Re-raises any Exception encountered while reading or parsing the file (after logging the error).

Example:

bot.load_conversation_from_file("sessions/project_alpha.json")
print(f"Loaded {len(bot.history)} turns")
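
Because two file layouts are accepted, a loader has to branch on the top-level JSON type. The sketch below shows one plausible way to do that; read_turns is a hypothetical helper and the library's own parsing may differ in its details and error handling.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def read_turns(file_path: str) -> List[Dict]:
    """Return the list of turn dicts from either documented layout."""
    data = json.loads(Path(file_path).read_text(encoding="utf-8"))
    if isinstance(data, dict):   # auto-save layout: {"turns": [...]}
        return data.get("turns", [])
    if isinstance(data, list):   # plain export layout: [...]
        return data
    raise ValueError("Unrecognised history file layout")


turn = {
    "query": "What is RAG?",
    "answer": "Retrieval-Augmented Generation is...",
    "timestamp": "2025-01-15T10:30:00.123456",
    "retrieved_chunks": 4,
}

with tempfile.TemporaryDirectory() as tmp:
    auto_save = Path(tmp) / "auto.json"
    export = Path(tmp) / "export.json"
    auto_save.write_text(json.dumps({"turns": [turn]}), encoding="utf-8")
    export.write_text(json.dumps([turn]), encoding="utf-8")
    from_auto = read_turns(str(auto_save))
    from_export = read_turns(str(export))
```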

get_history_file_path()

bot.get_history_file_path() -> str

Purpose: Return the absolute file path of the auto-save JSON file used by the current session. Useful for locating the persistence file for backup, sharing, or display in a UI.

Parameters: None.

Returns: str — absolute path to the session JSON file.

Example:

path = bot.get_history_file_path()
print(f"Session saved at: {path}")
# e.g. /home/user/project/conversations/conversation_20250115_103045.json

3.5 Diagnostics

get_conversation_summary()

bot.get_conversation_summary() -> dict

Purpose: Return a snapshot of all per-session statistics and recent activity. Ideal for dashboards, monitoring, and debugging.

Parameters: None.

Returns: dict with the following keys:

| Key | Type | Description |
|---|---|---|
| total_turns | int | Number of turns currently in the history buffer. |
| total_queries | int | Total ask() calls made in this session (including failed ones). |
| context_used_count | int | Number of queries where history context was injected into the prompt. |
| total_chunks_retrieved | int | Cumulative number of chunks fetched across all queries. |
| avg_chunks_per_query | float | total_chunks_retrieved / total_queries; 0 if no queries yet. |
| history_file | str | Absolute path to the session JSON file. |
| recent_topics | List[str] | Truncated query text (≤50 chars + "...") for the 5 most recent turns. |

Example:

import json
summary = bot.get_conversation_summary()
print(json.dumps(summary, indent=2, ensure_ascii=False))

Example output:

{
  "total_turns": 7,
  "total_queries": 7,
  "context_used_count": 5,
  "total_chunks_retrieved": 28,
  "avg_chunks_per_query": 4.0,
  "history_file": "/project/conversations/conversation_20250115_103045.json",
  "recent_topics": [
    "What is RAG?",
    "How does the retrieval step work?",
    "Tell me more about it",
    "What are the limitations?",
    "Can you give an example?"
  ]
}
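
The derived fields in the summary are simple arithmetic over the session counters. The sketch below shows how avg_chunks_per_query and recent_topics could be computed; summarise is a hypothetical helper, and the truncation rule (cut at 50 characters and add "..." only when the query is longer) is an interpretation of the table above rather than confirmed behaviour.

```python
from typing import Dict, List


def summarise(queries: List[str], chunk_counts: List[int]) -> Dict:
    """Compute the derived summary fields from raw per-query data."""
    total_queries = len(queries)
    total_chunks = sum(chunk_counts)
    return {
        "total_queries": total_queries,
        "total_chunks_retrieved": total_chunks,
        # Guard against division by zero before any query has been asked.
        "avg_chunks_per_query": (
            total_chunks / total_queries if total_queries else 0
        ),
        # Last five queries, shortened for display.
        "recent_topics": [
            q[:50] + "..." if len(q) > 50 else q for q in queries[-5:]
        ],
    }


queries = [f"question {i}" for i in range(7)]
summary_sketch = summarise(queries, [4] * 7)
empty_sketch = summarise([], [])
long_sketch = summarise(["x" * 60], [1])
```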

3.6 Async API

aask()

await bot.aask(
    query: str,
    use_history: bool = True,
    include_sources: bool = False,
    **kwargs,
) -> str

Purpose: Asynchronous wrapper around ask(). Runs the full synchronous pipeline in a thread pool via asyncio.to_thread, making it safe to await from an asyncio event loop without blocking.

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | (required) | Natural-language question. |
| use_history | bool | True | Inject conversation history into the prompt. |
| include_sources | bool | False | Append source citations to the answer. |
| **kwargs | Any | | Currently unused; reserved for future extension. |

Returns: str — the generated answer (identical to ask()).

Example:

import asyncio

async def chat():
    answer = await bot.aask("What are the main topics covered?")
    print(answer)

asyncio.run(chat())
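
The thread-pool wrapping described for aask() follows a common asyncio pattern. The sketch below reproduces that pattern with a hypothetical blocking_ask function standing in for the synchronous ask(); asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time
from typing import List


def blocking_ask(query: str) -> str:
    """Stand-in for a synchronous ask() that blocks while it works."""
    time.sleep(0.05)
    return f"answer to: {query}"


async def aask_sketch(query: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_ask, query)


async def main() -> List[str]:
    # Both calls are scheduled together and run in separate worker threads.
    return await asyncio.gather(aask_sketch("first"), aask_sketch("second"))


results = asyncio.run(main())
print(results)
```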

astream()

async for token in bot.astream(
    query: str,
    use_history: bool = True,
):
    print(token, end="", flush=True)

Purpose: Async streaming conversational answer. Enriches the query with history context, then:

  • If the underlying RAG system exposes an astream(query) async generator — delegates to it directly for true token-by-token streaming.
  • Otherwise — falls back to aask() and re-yields the result word by word, awaiting asyncio.sleep(0) between words so the event loop can run other tasks.

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | (required) | Natural-language question. |
| use_history | bool | True | Use conversation history for query enrichment before streaming. |

Yields: str — individual tokens (native streaming) or words followed by a space (fallback mode).

Example:

import asyncio

async def stream_chat():
    print("Answer: ", end="")
    async for token in bot.astream("Summarise the retrieved information."):
        print(token, end="", flush=True)
    print()

asyncio.run(stream_chat())
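
The fallback mode can be pictured as an async generator that splits a finished answer into words. The sketch below is a minimal version of that idea under the assumptions stated in this section (word plus trailing space, a zero-duration sleep between words); stream_words is a hypothetical name.

```python
import asyncio
from typing import AsyncIterator, List


async def stream_words(answer: str) -> AsyncIterator[str]:
    """Yield a complete answer one word at a time."""
    for word in answer.split():
        yield word + " "
        # sleep(0) hands control back to the event loop between words.
        await asyncio.sleep(0)


async def collect() -> List[str]:
    text = "RAG combines retrieval and generation"
    return [token async for token in stream_words(text)]


tokens = asyncio.run(collect())
print("".join(tokens))
```

Note that this simulates streaming only at the consumer side: the full answer has already been generated before the first word is yielded.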

3.7 Context Manager

ConversationalRAG supports the async context manager protocol:

async with ConversationalRAG(rag_system=my_rag, lang="en") as bot:
    answer = await bot.aask("What is the main topic of the documents?")
    print(answer)
# __aexit__ is a no-op — used as a clean scope delimiter

Note: The synchronous with statement is not supported. Instantiate directly for synchronous use.
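
For readers unfamiliar with the protocol, the sketch below shows what an async context manager with a no-op exit looks like in plain Python. ScopedSession is a hypothetical class used only to trace the order of calls; it is not part of the package.

```python
import asyncio
from typing import List


class ScopedSession:
    """Toy object implementing only the async context manager protocol."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def __aenter__(self) -> "ScopedSession":
        self.events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # No cleanup is performed; returning None lets exceptions propagate.
        self.events.append("exit")


async def run_scope() -> List[str]:
    async with ScopedSession() as session:
        session.events.append("body")
    return session.events


events = asyncio.run(run_scope())
print(events)
```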


3.8 Representation

__repr__()

repr(bot)

Purpose: Return a concise machine-readable string representation of the current session state. Useful for logging and REPL inspection.

Returns: str in the format:

ConversationalRAG(turns=<n>, queries=<n>, file=<filename.json>)

Example:

print(repr(bot))
# ConversationalRAG(turns=7, queries=7, file=conversation_20250115_103045.json)

4. ConversationHistory

from fennec_community.rag.types.conversational_rag import ConversationHistory

ConversationHistory manages the ordered list of ConversationTurn objects for a session, enforces the ring-buffer size limit, and provides live JSON persistence. It is used internally by ConversationalRAG but can also be used standalone for custom history management.


4.1 Constructor

ConversationHistory(
    max_turns: int = 10,
    history_file: Optional[str] = None,
)

Purpose: Initialise the history manager. If history_file points to an existing JSON file, its contents are loaded immediately — enabling session resumption.

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_turns | int | 10 | Maximum number of turns retained. Oldest turns are dropped when this limit is exceeded. |
| history_file | Optional[str] | None | Explicit path to the JSON persistence file. When None, a timestamped file such as conversations/conversation_20250115_103045.json is created automatically. The parent directory is created if absent. |

Behaviour on startup:

  • If the resolved history_file exists → loads turns from it.
  • If the file does not exist → starts with an empty turn list.

Example:

from fennec_community.rag.types.conversational_rag import ConversationHistory

# New session (auto-named file)
history = ConversationHistory(max_turns=20)

# Resume prior session
history = ConversationHistory(
    max_turns=20,
    history_file="conversations/session_abc.json",
)
print(f"Loaded {len(history)} turns")
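
The file-name pattern seen in the examples (conversation_20250115_103045.json) suggests a date and time stamp. The sketch below shows how such a default path could be resolved; resolve_history_path is a hypothetical helper, the strftime pattern is inferred from those examples, and directory creation is deliberately left out.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_history_path(
    history_file: Optional[str] = None,
    base_dir: str = "conversations",
    now: Optional[datetime] = None,
) -> Path:
    """Use the explicit path if given, otherwise build a timestamped one."""
    if history_file is not None:
        return Path(history_file)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(base_dir) / f"conversation_{stamp}.json"


explicit = resolve_history_path("conversations/session_abc.json")
auto_named = resolve_history_path(now=datetime(2025, 1, 15, 10, 30, 45))
print(auto_named.name)
```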

4.2 Turn Management

add_turn()

history.add_turn(
    query: str,
    answer: str,
    chunks: Optional[List] = None,
    retrieved_chunks: int = 0,
) -> None

Purpose: Append a new ConversationTurn to the history and immediately persist the updated history to the JSON file. If the buffer exceeds max_turns, the oldest turns are dropped. This is the only write path — all additions go through this method.

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | (required) | The user's question for this turn. |
| answer | str | (required) | The system's answer for this turn. |
| chunks | Optional[List] | None | Retrieved chunk objects (stored in memory only; not persisted to JSON because they contain non-serialisable objects). |
| retrieved_chunks | int | 0 | Count of retrieved chunks — persisted to JSON for analytics. |

Returns: None

Example:

history.add_turn(
    query="What is the main topic?",
    answer="The document covers machine learning fundamentals.",
    retrieved_chunks=4,
)
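
The append, trim and persist sequence described for add_turn() can be sketched in a few lines. HistorySketch below is a hypothetical stand-in that writes the {"turns": [...]} layout mentioned for auto-saved files; it assumes max_turns is at least 1 and omits the in-memory chunks field.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Dict, List


class HistorySketch:
    """Toy history: append a turn, trim to max_turns, rewrite the file."""

    def __init__(self, path: Path, max_turns: int = 10) -> None:
        self.path = path
        self.max_turns = max_turns
        self.turns: List[Dict] = []

    def add_turn(self, query: str, answer: str, retrieved_chunks: int = 0) -> None:
        self.turns.append({
            "query": query,
            "answer": answer,
            "timestamp": datetime.now().isoformat(),
            "retrieved_chunks": retrieved_chunks,
        })
        # Keep only the newest max_turns entries.
        self.turns = self.turns[-self.max_turns:]
        # Persist immediately so the file always matches memory.
        payload = json.dumps({"turns": self.turns}, ensure_ascii=False, indent=2)
        self.path.write_text(payload, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sketch = HistorySketch(Path(tmp) / "session.json", max_turns=2)
    for i in range(3):
        sketch.add_turn(f"q{i}", f"a{i}", retrieved_chunks=i)
    on_disk = json.loads(sketch.path.read_text(encoding="utf-8"))

saved_queries = [t["query"] for t in on_disk["turns"]]
```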

clear()

history.clear() -> None

Purpose: Remove all turns from the in-memory buffer. The JSON file on disk is not deleted or overwritten — use export_conversation() in ConversationalRAG if you need to archive before clearing.

Parameters: None.

Returns: None

Example:

history.clear()
print(len(history))  # → 0

4.3 Retrieval & Formatting

get_recent_turns()

history.get_recent_turns(n: int = 3) -> List[ConversationTurn]

Purpose: Return the n most recent turns from the history, in chronological order (oldest first within the slice). Returns an empty list if the history is empty.

| Parameter | Type | Default | Description |
|---|---|---|---|
| n | int | 3 | Number of recent turns to retrieve. |

Returns: List[ConversationTurn] — up to n turns, chronologically ordered.

Example:

for turn in history.get_recent_turns(5):
    print(f"Q: {turn.query[:60]}")
    print(f"A: {turn.answer[:60]}")

format_for_prompt()

history.format_for_prompt(
    n: int = 3,
    language: str = "ar",
) -> str

Purpose: Format the n most recent turns into a ready-to-inject prompt block. The result starts with a "Conversation History:" header, followed by one Q: / A: pair per turn, with turns separated by a blank line.

| Parameter | Type | Default | Description |
|---|---|---|---|
| n | int | 3 | Number of recent turns to include. |
| language | str | "ar" | Target language. Intended to select the header label; currently both "ar" and "en" produce "Conversation History:" (reserved for localisation). |

Returns: str — a formatted multi-turn history block ready to be inserted into an LLM prompt. Returns "" (empty string) if the history is empty.

Output format:

Conversation History:
Q: What is RAG?
A: RAG stands for Retrieval-Augmented Generation...

Q: How does the retrieval step work?
A: The retrieval step uses a vector database...

Example:

history_block = history.format_for_prompt(n=3, language="en")
prompt = f"{history_block}\n### Question:\n{query}\n### Answer:"
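
The output format shown above is straightforward to reproduce, which can help when writing tests against prompt contents. The sketch below is a hypothetical re-implementation over (query, answer) tuples, not the library method, and it ignores the language parameter.

```python
from typing import List, Tuple


def format_history_sketch(turns: List[Tuple[str, str]], n: int = 3) -> str:
    """Format the n most recent (query, answer) pairs as a prompt block."""
    recent = turns[-n:] if n > 0 else []
    if not recent:
        return ""
    body = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in recent)
    return f"Conversation History:\n{body}"


pairs = [
    ("What is RAG?", "RAG stands for Retrieval-Augmented Generation..."),
    ("How does the retrieval step work?", "It uses a vector database..."),
]
block = format_history_sketch(pairs, n=3)
print(block)
```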

4.4 Utilities

get_file_path()

history.get_file_path() -> str

Purpose: Return the absolute path of the current session's JSON persistence file. Useful for logging, UI display, or passing to another process.

Parameters: None.

Returns: str — absolute path to the JSON file.

Example:

print(history.get_file_path())
# /home/user/project/conversations/conversation_20250115_103045.json

__len__()

len(history)  # → int

Purpose: Return the current number of turns in the history buffer.

Returns: int — number of stored turns.

Example:

print(f"History has {len(history)} turn(s)")

5. ConversationTurn

from fennec_community.rag.types.conversational_rag import ConversationTurn

An immutable dataclass representing a single question-and-answer exchange within a session.


5.1 Attributes

| Attribute | Type | Description |
|---|---|---|
| query | str | The user's question for this turn. |
| answer | str | The system's generated answer. |
| timestamp | datetime | Timestamp of when the turn was created (the serialised examples carry no timezone offset). |
| chunks | Optional[List] | Retrieved chunk objects kept in memory (not serialised to JSON). None when loaded from a file. |
| retrieved_chunks | int | Number of chunks retrieved for this turn (persisted to JSON for analytics). |

5.2 Methods

to_dict()

turn.to_dict() -> dict

Purpose: Serialise the turn to a plain dictionary suitable for JSON encoding. The chunks field is intentionally excluded because it contains non-serialisable Python objects.

Parameters: None.

Returns: dict with keys: query, answer, timestamp (ISO 8601 string), retrieved_chunks.

Example:

{
  "query": "What is the main topic?",
  "answer": "The document covers machine learning fundamentals.",
  "timestamp": "2025-01-15T10:30:00.123456",
  "retrieved_chunks": 4
}

from_dict() (class method)

ConversationTurn.from_dict(data: dict) -> ConversationTurn

Purpose: Reconstruct a ConversationTurn from a plain dictionary (e.g., parsed from a saved JSON file). The chunks attribute is always set to None when loading from disk.

| Parameter | Type | Required | Description |
|---|---|---|---|
| data | dict | Yes | Dictionary with keys query, answer, timestamp (ISO 8601 string), and optionally retrieved_chunks. |

Returns: ConversationTurn instance.

Example:

turn = ConversationTurn.from_dict({
    "query": "What is RAG?",
    "answer": "Retrieval-Augmented Generation is...",
    "timestamp": "2025-01-15T10:30:00.123456",
    "retrieved_chunks": 3,
})
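
The to_dict() / from_dict() pair forms a round trip that deliberately loses the chunks field. The sketch below demonstrates that contract with a hypothetical TurnSketch dataclass whose fields follow the attribute table; the real class may differ, for example in immutability or defaults.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class TurnSketch:
    query: str
    answer: str
    timestamp: datetime = field(default_factory=datetime.now)
    chunks: Optional[List[Any]] = None
    retrieved_chunks: int = 0

    def to_dict(self) -> Dict[str, Any]:
        # chunks is left out on purpose: it may hold non-serialisable objects.
        return {
            "query": self.query,
            "answer": self.answer,
            "timestamp": self.timestamp.isoformat(),
            "retrieved_chunks": self.retrieved_chunks,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TurnSketch":
        return cls(
            query=data["query"],
            answer=data["answer"],
            timestamp=datetime.fromisoformat(data["timestamp"]),
            chunks=None,
            retrieved_chunks=data.get("retrieved_chunks", 0),
        )


original = TurnSketch(
    query="What is RAG?",
    answer="Retrieval-Augmented Generation is...",
    timestamp=datetime(2025, 1, 15, 10, 30, 0, 123456),
    chunks=[object()],
    retrieved_chunks=3,
)
as_dict = original.to_dict()
restored = TurnSketch.from_dict(as_dict)
```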

__str__()

str(turn)  # → str

Purpose: Return a human-readable two-line representation of the turn. Used by export_conversation(format="text") to build the full transcript.

Returns: str in the format:

Question: <query>
Answer: <answer>

6. Data Flow Diagram

The following diagram shows the complete data flow for a single ask() call:

User calls bot.ask(query, include_sources, use_history)
│
├─ Guard: empty / single-char query → return warning string
│
├─ _enhance_query_with_context(query, use_history)
│    ├─ use_history=False or history empty → return query unchanged
│    └─ anaphoric keyword detected in query
│         └─ prepend last turn's query → return enriched_query
│
├─ rag_system.retrieve(enhanced_query)
│    ├─ no chunks → add_turn(query, "I don't have enough…") → return message
│    └─ chunks retrieved ──────────────────────────────────┐
│                                                          │
├─ rag_system.context_manager.build(enhanced_query, chunks)│
│    └─ context_text                                       │
│                                                          │
├─ _build_conversational_prompt(query, context, use_history)
│    ├─ history.format_for_prompt(n=context_turns)
│    └─ assemble: [history_block] + [context] + [instructions] + [query]
│                                                          │
├─ rag_system.llm.generate(prompt) → answer               │
│                                                          │
├─ include_sources=True                                    │
│    └─ _format_sources(chunks) → append "📚 Sources:\n…" │
│                                                          │
├─ history.add_turn(query, answer, chunks, len(chunks))    │
│    └─ _save_to_file() → JSON written to disk immediately │
│                                                          │
├─ stats updated                                           │
│                                                          │
└─ return answer ──────────────────────────────────────────┘

Anaphoric reference detection (query enrichment):

query contains any of:
  Arabic MSA:  هذا، ذلك، هذه، تلك، المذكور، السابق، أيضاً …
  Arabic EG:   ده، دي، دول، كمان، برضو …
  Arabic LEV:  هيدا، هيدي، هدول، متلو …
  Arabic Gulf: هذاك، چذا، وايد …
  Arabic MAG:  هادا، هادي، بزاف …
  Arabic IRQ:  هاذا، هاي، نفس الشغلة …
  English:     this, that, it, also, too, as well, the same,
               aforementioned, furthermore, moreover …

  → enhanced_query = f"{last_turn.query}. {query}"
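
The enrichment rule above can be sketched as a keyword check followed by string concatenation. The version below uses a small subset of the listed indicators and matches them as whole words, which is an assumption: the text only says the query "contains" an indicator, and plain substring matching would also fire on words such as "capital". enhance_query is a hypothetical stand-in for the private helper named in the diagram.

```python
import re
from typing import Optional

# Illustrative subset of the indicators listed above.
INDICATORS = ("this", "that", "it", "also", "the same", "هذا", "ده")

# Whole-word match (assumption): the indicator must not touch other letters.
_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(w) for w in INDICATORS) + r")(?!\w)",
    re.IGNORECASE,
)


def enhance_query(
    query: str, last_query: Optional[str], use_history: bool = True
) -> str:
    """Prefix the previous question when the query looks anaphoric."""
    if not use_history or not last_query:
        return query
    if _PATTERN.search(query):
        return f"{last_query}. {query}"
    return query


enriched = enhance_query(
    "How does it compare to GPT?", "What objective does BERT use"
)
unchanged = enhance_query("What is the capital of France?", "previous question")
arabic = enhance_query("اشرح ده", "ما هو RAG")
print(enriched)
```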

7. Quick-Start Example

import asyncio
from fennec_community.rag.types.conversational_rag import ConversationalRAG, RAGConfigConverstion
from fennec_community.rag.core import RAGSystem



# ── 1. Configure ──────────────────────────────────────────────────────────
cfg = RAGConfigConverstion(
    max_history_turns=15,
    context_turns=4,
    include_sources=True,
)


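# my_vector_db, my_llm, my_chunker, my_ctx_mgr and config are placeholders for
# components you have already built; they are not defined in this snippet.
my_rag = RAGSystem(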
    vector_db=my_vector_db,
    llm=my_llm,
    chunker=my_chunker,
    context_manager=my_ctx_mgr,
    config=config,
    enable_query_expansion=True,
    query_expansion_variants=3,
)

# ── 2. Instantiate ────────────────────────────────────────────────────────
bot = ConversationalRAG(
    rag_system=my_rag,          # your existing RAGSystem instance
    max_history_turns=cfg.max_history_turns,
    context_turns=cfg.context_turns,
    lang="en",
    history_file="sessions/demo.json",  # resumes if file exists
)
print(repr(bot))
# ConversationalRAG(turns=0, queries=0, file=demo.json)

# ── 3. Index documents ────────────────────────────────────────────────────
my_rag.add_texts({
    "doc1": "BERT is pre-trained using masked language modelling.",
    "doc2": "GPT uses a causal (left-to-right) language model objective.",
})

# ── 4. Synchronous multi-turn chat ────────────────────────────────────────
answer1 = bot.ask("What pre-training objective does BERT use?")
print(f"A1: {answer1}\n")

# "it" triggers query enrichment → resolved against turn 1's question
answer2 = bot.ask("How does it compare to GPT?")
print(f"A2: {answer2}\n")

# Turn off sources for a quick follow-up
answer3 = bot.ask("Give me one example.", include_sources=False)
print(f"A3: {answer3}\n")

# ── 5. Session diagnostics ────────────────────────────────────────────────
import json
print(json.dumps(bot.get_conversation_summary(), indent=2, ensure_ascii=False))

# ── 6. Export session ─────────────────────────────────────────────────────
bot.export_conversation(format="json", output_file="sessions/demo_export.json")
bot.export_conversation(format="text", output_file="sessions/demo_transcript.txt")

# ── 7. Reload session into another instance ───────────────────────────────
bot2 = ConversationalRAG(rag_system=my_rag, lang="en")
bot2.load_conversation_from_file("sessions/demo_export.json")
print(f"Resumed session with {len(bot2.history)} turns")

# ── 8. Reset and start fresh ──────────────────────────────────────────────
bot.reset_conversation()
print(repr(bot))  # ConversationalRAG(turns=0, ...)

# ── 9. Async query ────────────────────────────────────────────────────────
async def async_demo():
    answer = await bot.aask("What are the differences between BERT and GPT?")
    print(f"Async answer: {answer}")

asyncio.run(async_demo())

# ── 10. Async streaming ───────────────────────────────────────────────────
async def stream_demo():
    print("Streaming: ", end="")
    async for token in bot.astream("Summarise the key points."):
        print(token, end="", flush=True)
    print()

asyncio.run(stream_demo())

# ── 11. Locate the auto-save file ────────────────────────────────────────
print(f"Session file: {bot.get_history_file_path()}")

Simple Real Example


from fennec_community.llm import MistralInterface
from fennec_community.document_loaders import TextLoader
from fennec_community.vector_database import FAISSVectorDatabase
from fennec_community.chunks import ArabicTextChunker
from fennec_community.context import ContextManager
from fennec_community.embeddings import OllamaEmbedder
from fennec_community.rag.core import RAGSystem
from fennec_community.rag.types.conversational_rag import ConversationalRAG

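# llm, vector_db, chunker and context_manager are assumed to be instances you
# have already created from the classes imported above; their constructor
# arguments are omitted here.
glob = RAGSystem(llm=llm, vector_db=vector_db, chunker=chunker, context_manager=context_manager)

# Load the FAQ file and index it.
reader = TextLoader("./data_kn/faq.txt").load()
glob.add_documents(reader)

conv_rag = ConversationalRAG(
    rag_system=glob,
    max_history_turns=10,
    context_turns=3,
    lang="ar",
    history_file="my_conversation.json",
    auto_save=True,
)

# Arabic query: "What are the payment methods?"
answer = conv_rag.ask("ما هي طرق الدفع؟")
print(answer)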