Prompt Modular
Table of Contents
- Overview
- Architecture
- Quick Start
- Data Types & Enumerations
- PromptEngine
- ContextManager
- GuardrailEngine
- GuardrailLibrary
- PromptOptimizer
- BuiltPrompt Methods
- ContextResult Properties
- Strategy System
- PromptMetrics Methods
- Integration Examples
- Error Reference
Overview
The prompt module is a production-grade, AI-ready prompt orchestration engine designed for Retrieval-Augmented Generation (RAG) systems. It transforms raw user queries and retrieved documents into highly optimized, context-aware prompts that can be fed directly into any LLM API (OpenAI, Anthropic, or any compatible provider).
Key Capabilities
| Capability | Description |
|---|---|
| Auto-detection | Automatically infers prompt type, strategy, and complexity from the query and documents |
| Context Engineering | Intelligently orders, deduplicates, and token-budget-manages retrieved documents |
| Guardrails | Injects anti-hallucination, citation, safety, and scope-control instructions |
| Optimization | Reduces token count via whitespace normalization, filler removal, and deduplication |
| Caching | Hash-based prompt cache with configurable TTL and size |
| Observability | Full metrics, per-prompt trace log, and event hook system |
| Adaptive Feedback | Records quality signals and recommends the historically best-performing strategy |
| Multi-LLM Output | Produces OpenAI-compatible and Anthropic-compatible payloads from the same prompt |
Architecture
PromptEngine ← primary public entry point
│
├── PromptBuilder ← resolves strategy, coordinates subsystems
│ ├── ContextManager ← document processing & token budgeting
│ ├── GuardrailEngine ← safety & quality instruction injection
│ └── Strategy (7) ← prompt template construction
│ SimpleStrategy
│ ChainOfThoughtStrategy
│ MultiHopStrategy
│ SelfConsistentStrategy
│ StepBackStrategy
│ ReActStrategy
│ LeastToMostStrategy
│
└── PromptOptimizer ← token reduction post-buildData flow:query + documents + config → PromptEngine.build() → BuiltPrompt → LLM API
Quick Start
from fennec_community.prompt import PromptEngine, Document
# 1. Create the engine (production defaults)
engine = PromptEngine()
# 2. Build a prompt
prompt = engine.build(
query = "What caused the 2008 financial crisis?",
documents = [
{"content": "The crisis was triggered by...", "source": "wiki", "score": 0.92},
{"content": "Subprime mortgage lending...", "source": "fed_report", "score": 0.87},
],
strategy = "multi_hop",
output_format = "json",
)
# 3a. Use with OpenAI
response = openai_client.chat.completions.create(
model = "gpt-4o",
messages = prompt.to_messages(),
)
# 3b. Use with Anthropic
payload = prompt.to_anthropic()
response = anthropic_client.messages.create(
**payload,
model = "claude-opus-4-20250514",
max_tokens = 1024,
)Data Types & Enumerations
Enumerations
All enumerations inherit from str, Enum, so their values can be passed as plain strings wherever an enum is expected.
PromptType
Defines the canonical archetype of the prompt being built. The engine uses this to select the default strategy and configure guardrails.
| Value | String | Description |
|---|---|---|
PromptType.QA |
"qa" |
Grounded question-answering from context |
PromptType.CONVERSATIONAL |
"conversational" |
Multi-turn dialogue |
PromptType.REASONING |
"reasoning" |
Chain-of-thought / multi-hop reasoning |
PromptType.AGENT |
"agent" |
ReAct / plan-and-execute agentic tasks |
PromptType.TOOL_USE |
"tool_use" |
Function-calling / tool description |
PromptType.SAFETY |
"safety" |
Content moderation / safety guard |
PromptType.SUMMARIZATION |
"summarization" |
Document summarization |
PromptType.EXTRACTION |
"extraction" |
Structured data extraction |
PromptType.COMPARISON |
"comparison" |
Compare / contrast multiple documents |
PromptStrategy
Controls the reasoning template applied to the prompt. Different strategies produce structurally different prompts that guide the LLM's reasoning approach.
| Value | String | Best For |
|---|---|---|
PromptStrategy.SIMPLE |
"simple" |
Single-fact lookup, direct Q&A |
PromptStrategy.CHAIN_OF_THOUGHT |
"cot" |
Reasoning tasks, complex explanations |
PromptStrategy.MULTI_HOP |
"multi_hop" |
Multi-document, multi-step questions |
PromptStrategy.SELF_CONSISTENT |
"self_consistent" |
High-stakes answers requiring verification |
PromptStrategy.STEP_BACK |
"step_back" |
Abstract-first reasoning |
PromptStrategy.REACT |
"react" |
Agentic tasks with tool calls |
PromptStrategy.LEAST_TO_MOST |
"least_to_most" |
Math, logic, progressive sub-problems |
OutputFormat
Specifies the format in which the LLM should return its answer. The engine injects the appropriate formatting instruction into the guardrail block.
| Value | String | Description |
|---|---|---|
OutputFormat.TEXT |
"text" |
Free-form plain text (default) |
OutputFormat.JSON |
"json" |
Structured JSON with answer, sources, confidence |
OutputFormat.MARKDOWN |
"markdown" |
Markdown with headers and bullets |
OutputFormat.BULLET_LIST |
"bullet_list" |
Concise bulleted list |
OutputFormat.STRUCTURED |
"structured" |
Domain-specific schema (requires output_schema) |
OutputFormat.CITATION |
"citation" |
Prose with inline [1] citations and Sources section |
QueryComplexity
Signals the complexity level of the query. Affects strategy selection (complex queries auto-upgrade to CoT or Multi-Hop) and guardrail selection (complex/expert adds a self-check guardrail).
| Value | String | Description |
|---|---|---|
QueryComplexity.SIMPLE |
"simple" |
Single-fact lookup |
QueryComplexity.MODERATE |
"moderate" |
Some reasoning required |
QueryComplexity.COMPLEX |
"complex" |
Multi-hop or multi-document |
QueryComplexity.EXPERT |
"expert" |
Deep domain knowledge required |
UserProfile
Tailors the tone and language style of the system prompt to the target audience.
| Value | String | Style Applied |
|---|---|---|
UserProfile.GENERAL |
"general" |
Clear, accessible, jargon-free |
UserProfile.TECHNICAL |
"technical" |
Precise technical language with full details |
UserProfile.ACADEMIC |
"academic" |
Formal academic tone with rigorous evidence |
UserProfile.EXECUTIVE |
"executive" |
Extremely concise, business impact first |
Document
Module: prompt.types
A dataclass representing a single retrieved passage from your vector store or retrieval system.
@dataclass
class Document:
content: str
source: str = ""
score: float = 1.0
metadata: Dict[str, Any] = field(default_factory=dict)
chunk_id: Optional[str] = None
language: str = "en"| Field | Type | Default | Description |
|---|---|---|---|
content |
str |
required | The text content of the passage |
source |
str |
"" |
Source identifier (URL, filename, document ID) used in citations |
score |
float |
1.0 |
Relevance score from retrieval system (higher = more relevant) |
metadata |
Dict[str, Any] |
{} |
Arbitrary key-value metadata (page number, author, date, etc.) |
chunk_id |
Optional[str] |
None |
Unique identifier for the chunk within a document |
language |
str |
"en" |
Language code of the document content |
Usage:
doc = Document(
content = "The Federal Reserve raised interest rates...",
source = "fed_report_2023.pdf",
score = 0.94,
metadata = {"page": 12, "author": "Federal Reserve"},
chunk_id = "fed_2023_p12_chunk_3",
)Note:
Documentobjects can also be created implicitly byPromptEngine.build()when you pass plaindictorstrobjects in thedocumentslist.
Message
Module: prompt.types
A dataclass representing a single turn in a conversation history.
@dataclass
class Message:
role: Literal["system", "user", "assistant"]
content: str| Field | Type | Description |
|---|---|---|
role |
Literal["system", "user", "assistant"] |
The speaker role |
content |
str |
The text of the message |
Usage:
history = [
Message(role="user", content="What is inflation?"),
Message(role="assistant", content="Inflation is the rate at which..."),
Message(role="user", content="And what causes it?"),
]PromptRequest
Module: prompt.types
A dataclass that encapsulates all configuration and context the engine needs to build a prompt. This is the internal canonical request object; most users pass arguments directly to PromptEngine.build() instead.
@dataclass
class PromptRequest:
# Required
query: str
# Context
documents: List[Document] = []
memory: List[Message] = []
# Intent / routing
prompt_type: PromptType = PromptType.QA
strategy: PromptStrategy = PromptStrategy.SIMPLE
output_format: OutputFormat = OutputFormat.TEXT
complexity: QueryComplexity = QueryComplexity.SIMPLE
user_profile: UserProfile = UserProfile.GENERAL
# Constraints
max_context_tokens: int = 3000
max_answer_tokens: int = 512
output_schema: Optional[Dict] = None
language: str = "en"
# Feature flags
enable_guardrails: bool = True
enable_cot: bool = False
enable_citations: bool = True
enable_uncertainty: bool = True
# Metadata
session_id: str = ""
user_id: str = ""
trace_id: str = ""
extra: Dict[str, Any] = {}| Field | Type | Default | Description |
|---|---|---|---|
query |
str |
required | The user's question or task |
documents |
List[Document] |
[] |
Retrieved passages for context |
memory |
List[Message] |
[] |
Conversation history |
prompt_type |
PromptType |
QA |
Prompt archetype |
strategy |
PromptStrategy |
SIMPLE |
Reasoning strategy |
output_format |
OutputFormat |
TEXT |
Desired response format |
complexity |
QueryComplexity |
SIMPLE |
Query complexity level |
user_profile |
UserProfile |
GENERAL |
Target audience |
max_context_tokens |
int |
3000 |
Token budget for injected context |
max_answer_tokens |
int |
512 |
Hint to the LLM about expected answer length |
output_schema |
Optional[Dict] |
None |
JSON schema for STRUCTURED output format |
language |
str |
"en" |
Response language code |
enable_guardrails |
bool |
True |
Inject grounding/safety instructions |
enable_cot |
bool |
False |
Force chain-of-thought (auto-set by strategy) |
enable_citations |
bool |
True |
Request inline source citations |
enable_uncertainty |
bool |
True |
Ask model to express uncertainty honestly |
session_id |
str |
"" |
Session identifier for tracing |
user_id |
str |
"" |
User identifier for logging |
trace_id |
str |
"" |
Trace identifier for distributed tracing |
extra |
Dict[str, Any] |
{} |
Arbitrary extension data (e.g., tools list for agents) |
BuiltPrompt
Module: prompt.types
The output of the entire build pipeline. Contains the fully assembled prompt, token accounting, guardrail metadata, and optimization notes. This object is ready to be passed directly to any LLM client.
@dataclass
class BuiltPrompt:
system_prompt: str
user_prompt: str
messages: List[Message]
prompt_type: PromptType
strategy: PromptStrategy
output_format: OutputFormat
estimated_tokens: int
context_tokens_used: int
documents_included: int
documents_truncated: int
guardrails_applied: List[str]
tokens_saved: int
optimization_notes: List[str]
session_id: str
trace_id: str| Field | Type | Description |
|---|---|---|
system_prompt |
str |
The assembled system prompt |
user_prompt |
str |
The assembled user prompt with context and question |
messages |
List[Message] |
Full message list (system + history + user) |
prompt_type |
PromptType |
The effective prompt type used |
strategy |
PromptStrategy |
The effective strategy used |
output_format |
OutputFormat |
The output format enforced |
estimated_tokens |
int |
Approximate total token count of the prompt |
context_tokens_used |
int |
Tokens consumed by the context block specifically |
documents_included |
int |
Number of documents successfully injected |
documents_truncated |
int |
Number of documents excluded due to token budget |
guardrails_applied |
List[str] |
Names of all guardrails injected (for observability) |
tokens_saved |
int |
Tokens saved by the optimizer |
optimization_notes |
List[str] |
Human-readable notes from the optimizer pipeline |
session_id |
str |
Echo of the request session ID |
trace_id |
str |
Echo of the request trace ID |
ContextResult
Module: prompt.context_manager
The output of ContextManager.build(). Contains the ready-to-inject context block and rich metadata about what was included, excluded, and deduplicated.
| Field | Type | Description |
|---|---|---|
context_block |
str |
The fully formatted, ready-to-inject context string |
included_docs |
List[Document] |
Documents that fit within the token budget |
excluded_docs |
List[Document] |
Documents excluded due to token budget |
citation_map |
Dict[int, str] |
Maps citation index (e.g., 1) to source identifier |
tokens_used |
int |
Token count of the assembled context block |
tokens_budget |
int |
The token budget that was enforced |
duplicates_removed |
int |
Number of near-duplicate documents removed |
truncated |
bool |
Whether any document was partially truncated to fit |
PromptMetrics
Module: prompt.prompt_engine
A dataclass that accumulates engine-wide performance statistics across all calls to build(). Accessed via the engine.metrics property.
| Field | Type | Description |
|---|---|---|
total_builds |
int |
Total number of prompts built |
total_tokens |
int |
Cumulative token count across all builds |
total_tokens_saved |
int |
Cumulative tokens saved by the optimizer |
cache_hits |
int |
Number of times the cache was successfully hit |
builds_by_type |
Dict[str, int] |
Count of builds broken down by PromptType |
builds_by_strategy |
Dict[str, int] |
Count of builds broken down by PromptStrategy |
avg_build_ms |
float |
Rolling average build time in milliseconds |
FeedbackEntry
Module: prompt.prompt_engine
A dataclass that stores a single quality feedback signal for a previously built prompt, used by the adaptive feedback loop.
| Field | Type | Description |
|---|---|---|
trace_id |
str |
The trace ID of the prompt this feedback refers to |
prompt_type |
str |
The prompt type of the rated prompt |
strategy |
str |
The strategy used for the rated prompt |
quality_score |
float |
Quality rating from 0.0 (bad) to 1.0 (perfect) |
notes |
str |
Optional free-text notes about the quality |
Guardrail
Module: prompt.guardrails
A dataclass defining a single guardrail instruction that can be injected into the system prompt.
| Field | Type | Default | Description |
|---|---|---|---|
name |
str |
required | Unique identifier (used for observability and deduplication) |
instruction |
str |
required | The actual instruction text injected into the prompt |
priority |
int |
50 |
Injection order — higher priority instructions appear first |
PromptEngine
Module: prompt.prompt_engine
Import: from fennec_community.prompt import PromptEngine
The primary entry point for the entire system. Coordinates all subsystems, manages caching, collects metrics, and exposes the adaptive feedback loop.
PromptEngine Constructor
PromptEngine(
context_manager: Optional[ContextManager] = None,
guardrail_engine: Optional[GuardrailEngine] = None,
extra_guardrails: Optional[List[Guardrail]] = None,
enable_cache: bool = True,
cache_ttl_sec: int = 300,
max_cache_size: int = 256,
enable_auto_detect: bool = True,
memory_store: Optional[Any] = None,
cache_store: Optional[Any] = None,
router: Optional[Any] = None,
)Purpose: Instantiates the engine and all its subsystems. All parameters are optional — the defaults are production-ready.
| Parameter | Type | Default | Description |
|---|---|---|---|
context_manager |
Optional[ContextManager] |
None |
Custom context manager instance. Uses default ContextManager() if not provided |
guardrail_engine |
Optional[GuardrailEngine] |
None |
Custom guardrail engine. Uses default GuardrailEngine() if not provided |
extra_guardrails |
Optional[List[Guardrail]] |
None |
Additional custom Guardrail objects appended to every request |
enable_cache |
bool |
True |
Enable in-process SHA-256 hash-based prompt caching |
cache_ttl_sec |
int |
300 |
Cache time-to-live in seconds (5 minutes by default) |
max_cache_size |
int |
256 |
Maximum number of cached prompts; oldest is evicted when exceeded |
enable_auto_detect |
bool |
True |
Auto-detect prompt type, strategy, and complexity from query content |
memory_store |
Optional[Any] |
None |
External memory store handle (passed through for integration) |
cache_store |
Optional[Any] |
None |
External cache store handle (passed through for integration) |
router |
Optional[Any] |
None |
External router handle (passed through for integration) |
Example:
# Default — production-ready
engine = PromptEngine()
# Custom — add a domain-specific guardrail, disable cache
from fennec_community.prompt import PromptEngine, Guardrail
medical_guardrail = Guardrail(
name = "medical_disclaimer",
instruction = "Always recommend consulting a licensed physician. "
"Do not provide specific medical diagnoses.",
priority = 120,
)
engine = PromptEngine(
extra_guardrails = [medical_guardrail],
enable_cache = False,
)build()
engine.build(
query: str,
documents: Optional[List[Union[Document, Dict, str]]] = None,
memory: Optional[List[Union[Message, Dict]]] = None,
prompt_type: Union[PromptType, str] = PromptType.QA,
strategy: Union[PromptStrategy, str] = PromptStrategy.SIMPLE,
output_format: Union[OutputFormat, str] = OutputFormat.TEXT,
complexity: Union[QueryComplexity, str] = QueryComplexity.SIMPLE,
user_profile: Union[UserProfile, str] = UserProfile.GENERAL,
max_context_tokens: int = 3000,
max_answer_tokens: int = 512,
output_schema: Optional[Dict] = None,
language: str = "en",
enable_guardrails: bool = True,
enable_citations: bool = True,
enable_uncertainty: bool = True,
session_id: str = "",
user_id: str = "",
trace_id: str = "",
extra: Optional[Dict] = None,
) -> BuiltPromptPurpose: The core method of the entire system. Accepts a query and supporting context, runs the full build pipeline (auto-detection → context engineering → guardrails → strategy → optimization → caching), and returns a BuiltPrompt ready for any LLM API.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
query |
str |
required | The user's question or task. This is the only mandatory argument |
documents |
List[Document | Dict | str] |
None |
Retrieved passages. Accepts Document objects, plain dicts ({"content": ..., "source": ..., "score": ...}), or raw strings |
memory |
List[Message | Dict] |
None |
Conversation history. Accepts Message objects or dicts ({"role": ..., "content": ...}) |
prompt_type |
PromptType | str |
"qa" |
The prompt archetype. Can be overridden by auto-detection |
strategy |
PromptStrategy | str |
"simple" |
The reasoning strategy. Can be overridden by auto-detection and complexity upgrade |
output_format |
OutputFormat | str |
"text" |
The desired LLM response format |
complexity |
QueryComplexity | str |
"simple" |
Query complexity. Affects strategy selection and guardrails |
user_profile |
UserProfile | str |
"general" |
Target audience profile. Adjusts system prompt tone |
max_context_tokens |
int |
3000 |
Hard token budget for injected document context |
max_answer_tokens |
int |
512 |
Instructs the LLM about expected answer length |
output_schema |
Optional[Dict] |
None |
JSON Schema dict — required when output_format="structured" |
language |
str |
"en" |
BCP-47 language code for the response (e.g., "ar", "fr", "de") |
enable_guardrails |
bool |
True |
When True, injects grounding, no-fabrication, PII-protection, and scope guardrails |
enable_citations |
bool |
True |
When True and documents are provided, adds citation instruction |
enable_uncertainty |
bool |
True |
When True, instructs the model to say "I don't know" rather than guess |
session_id |
str |
"" |
Session identifier, echoed in BuiltPrompt and trace log |
user_id |
str |
"" |
User identifier for logging purposes |
trace_id |
str |
"" |
Distributed trace ID, echoed in BuiltPrompt and trace log |
extra |
Optional[Dict] |
None |
Extension payload. Use extra={"tools": [...]} for agent/tool-use prompts |
Returns: BuiltPrompt — the fully assembled, optimized prompt object.
Raises: ValueError — if an invalid string value is passed for an enum parameter (e.g., strategy="invalid_strategy").
Build Pipeline (internal order):
- Normalize inputs — convert dicts/strings to typed objects
- Auto-detect — infer
prompt_type,strategy,complexityfrom query content (ifenable_auto_detect=True) - Cache check — return cached
BuiltPromptif a matching prompt was recently built PromptBuilder.build()— run the full build pipeline- Cache store — save the result for future identical requests
- Metrics — update
PromptMetricscounters - Trace log — append a trace entry
- Fire hooks — emit the
"prompt.built"event
Example — minimal:
prompt = engine.build(query="What is RAG?")Example — full configuration:
prompt = engine.build(
query = "Compare the economic impacts of COVID-19 in the US vs EU.",
documents = retrieved_docs,
memory = chat_history,
prompt_type = "comparison",
strategy = "multi_hop",
output_format = "markdown",
complexity = "complex",
user_profile = "executive",
max_context_tokens = 4000,
max_answer_tokens = 1024,
language = "en",
enable_citations = True,
session_id = "sess_abc123",
trace_id = "trace_xyz789",
)Example — structured output with schema:
schema = {
"type": "object",
"properties": {
"summary": {"type": "string"},
"key_points": {"type": "array", "items": {"type": "string"}},
"sources": {"type": "array", "items": {"type": "string"}},
}
}
prompt = engine.build(
query = "Summarize the key findings.",
documents = docs,
output_format = "structured",
output_schema = schema,
)Auto-Detection Rules (when enable_auto_detect=True):
| Signal | Result |
|---|---|
| Query contains "summarize" / "overview" | prompt_type → SUMMARIZATION |
| Query contains "compare" / "versus" / "vs" | prompt_type → COMPARISON |
| Query contains "extract" / "list all" | prompt_type → EXTRACTION |
| Query contains "why" / "how" / "explain" | prompt_type → REASONING |
| Query > 40 words OR > 5 documents | complexity → COMPLEX |
| Query > 20 words OR > 2 documents | complexity → MODERATE |
| Query has "and" / "also" / "furthermore" AND multiple docs | strategy → MULTI_HOP |
build_from_request()
engine.build_from_request(request: PromptRequest) -> BuiltPromptPurpose: Builds a prompt from a pre-constructed PromptRequest object instead of individual keyword arguments. Useful when you need to construct, serialize, or batch requests programmatically.
| Parameter | Type | Description |
|---|---|---|
request |
PromptRequest |
A fully populated PromptRequest dataclass instance |
Returns: BuiltPrompt
Example:
from fennec_community.prompt import PromptRequest, PromptType, PromptStrategy
request = PromptRequest(
query = "What are the side effects of ibuprofen?",
documents = my_docs,
prompt_type = PromptType.QA,
strategy = PromptStrategy.CHAIN_OF_THOUGHT,
user_profile = UserProfile.TECHNICAL,
)
prompt = engine.build_from_request(request)record_feedback()
engine.record_feedback(
trace_id: str,
quality_score: float,
notes: str = "",
) -> NonePurpose: Records a quality signal for a previously built prompt. Feedback is stored in an in-memory circular buffer (max 1000 entries) and used by adaptive_strategy_for() to recommend the best strategy for future similar prompts.
| Parameter | Type | Description |
|---|---|---|
trace_id |
str |
The trace_id of the prompt being rated (from BuiltPrompt.trace_id) |
quality_score |
float |
Quality score from 0.0 (completely wrong / unhelpful) to 1.0 (perfect) |
notes |
str |
Optional human-readable notes (e.g., "Answer was too verbose") |
Returns: None
Example:
# After the LLM response is reviewed
engine.record_feedback(
trace_id = prompt.trace_id,
quality_score = 0.85,
notes = "Good answer but missed one key point.",
)adaptive_strategy_for()
engine.adaptive_strategy_for(prompt_type: PromptType) -> Optional[PromptStrategy]Purpose: Analyses accumulated feedback to recommend the historically best-performing strategy for a given PromptType. Returns None if fewer than 5 feedback entries exist for that type (insufficient data). Use this to automatically select the strategy that your users have rated highest over time.
| Parameter | Type | Description |
|---|---|---|
prompt_type |
PromptType |
The prompt type to query the feedback history for |
Returns: Optional[PromptStrategy] — the best-performing strategy, or None if data is insufficient.
Example:
best = engine.adaptive_strategy_for(PromptType.QA)
if best:
prompt = engine.build(query=user_query, strategy=best)
else:
prompt = engine.build(query=user_query) # use defaultsmetrics (property)
engine.metrics -> Dict[str, Any]Purpose: Returns a snapshot of all engine-wide performance metrics as a plain dictionary, suitable for logging, dashboarding, or alerting.
Returns: Dict[str, Any] with the following keys:
| Key | Type | Description |
|---|---|---|
total_builds |
int |
Total number of build() calls |
total_tokens |
int |
Cumulative token usage |
total_tokens_saved |
int |
Cumulative tokens saved by optimizer |
cache_hits |
int |
Total cache hits |
avg_build_ms |
float |
Rolling average build latency in milliseconds |
cache_hit_rate_pct |
float |
Cache hit percentage (0.0 – 100.0) |
builds_by_type |
Dict[str, int] |
Build count per PromptType |
builds_by_strategy |
Dict[str, int] |
Build count per PromptStrategy |
Example:
import json
print(json.dumps(engine.metrics, indent=2))
# {
# "total_builds": 142,
# "total_tokens": 284000,
# "total_tokens_saved": 12400,
# "cache_hits": 38,
# "avg_build_ms": 4.72,
# "cache_hit_rate_pct": 26.8,
# "builds_by_type": {"qa": 90, "reasoning": 42, "summarization": 10},
# "builds_by_strategy": {"simple": 60, "cot": 52, "multi_hop": 30}
# }get_trace_log()
engine.get_trace_log(last_n: int = 20) -> List[Dict[str, Any]]Purpose: Returns the most recent trace entries from the internal trace log. Each entry captures the full context of a single build() call — inputs, outputs, token counts, latency, and guardrails applied. Useful for debugging and observability dashboards.
| Parameter | Type | Default | Description |
|---|---|---|---|
last_n |
int |
20 |
Number of most recent trace entries to return |
Returns: List[Dict[str, Any]] — each dict contains:
| Key | Description |
|---|---|
trace_id |
The trace identifier |
session_id |
The session identifier |
query_preview |
First 80 characters of the query |
prompt_type |
Effective prompt type used |
strategy |
Effective strategy used |
output_format |
Output format used |
docs_included |
Number of documents included |
docs_truncated |
Number of documents excluded |
estimated_tokens |
Total estimated token count |
tokens_saved |
Tokens saved by optimizer |
guardrails |
List of guardrail names applied |
elapsed_ms |
Build time in milliseconds |
ts |
Unix timestamp of the build |
Example:
traces = engine.get_trace_log(last_n=5)
for t in traces:
print(f"{t['trace_id']} | {t['prompt_type']} | {t['elapsed_ms']}ms | tokens={t['estimated_tokens']}")reset_metrics()
engine.reset_metrics() -> NonePurpose: Resets all accumulated PromptMetrics counters to zero. Useful for periodic metric resets in long-running services (e.g., reset at the start of each hour for per-hour dashboards).
Returns: None
Example:
# Reset every hour in a scheduled job
engine.reset_metrics()on()
engine.on(event: str, callback: Callable) -> NonePurpose: Registers an event hook that is called whenever the specified event is fired. This is the primary extensibility mechanism — use hooks to integrate with external monitoring systems, logging pipelines, or custom business logic without modifying the engine.
| Parameter | Type | Description |
|---|---|---|
event |
str |
The event name to subscribe to (currently: "prompt.built") |
callback |
Callable |
A callable invoked with keyword arguments when the event fires |
Returns: None
Available Events:
| Event | Fired When | Callback kwargs |
|---|---|---|
"prompt.built" |
After every successful build() call |
prompt: BuiltPrompt, request: PromptRequest |
Example:
def log_to_datadog(prompt: BuiltPrompt, request: PromptRequest):
datadog.metric("prompt.tokens", prompt.estimated_tokens, tags=[
f"type:{prompt.prompt_type.value}",
f"strategy:{prompt.strategy.value}",
])
engine.on("prompt.built", log_to_datadog)ContextManager
Module: prompt.context_manager
Import: from fennec_community.prompt import ContextManager, ContextResult
Transforms a raw list of retrieved documents into an optimally ordered, deduplicated, token-budget-aware context block ready for injection. Used internally by PromptBuilder but can also be used standalone.
ContextManager Constructor
ContextManager(
dedup_threshold: float = 0.85,
min_doc_tokens: int = 5,
use_lost_in_middle: bool = True,
summarize_overflow: bool = False,
max_memory_messages: int = 10,
)Purpose: Configures the context engineering pipeline.
| Parameter | Type | Default | Description |
|---|---|---|---|
dedup_threshold |
float |
0.85 |
Jaccard similarity threshold above which two documents are considered duplicates and the lower-scoring one is removed |
min_doc_tokens |
int |
5 |
Documents with fewer estimated tokens than this are silently skipped |
use_lost_in_middle |
bool |
True |
Reorders documents to place the most relevant at the start and end of the context block, combating the "lost-in-the-middle" attention problem |
summarize_overflow |
bool |
False |
If True, documents that don't fit the token budget are summarized instead of excluded (stub — not yet implemented) |
max_memory_messages |
int |
10 |
Maximum number of conversation turns to include in the memory block |
ContextManager.build()
context_manager.build(request: PromptRequest) -> ContextResultPurpose: Runs the full context engineering pipeline on the documents in request.documents and returns a ContextResult with the formatted, ready-to-inject context block.
Pipeline steps (internal order):
- Filter documents shorter than
min_doc_tokens - Sort by relevance score (descending)
- Deduplicate using exact hash + Jaccard shingle similarity
- Reorder for lost-in-the-middle mitigation (if enabled)
- Enforce token budget — partially truncate documents that overflow
- Format the context block with numbered source headers and relevance labels
- Build the citation map (
{1: "source_a", 2: "source_b", ...})
| Parameter | Type | Description |
|---|---|---|
request |
PromptRequest |
The full prompt request (uses request.documents and request.max_context_tokens) |
Returns: ContextResult
Example (standalone usage):
from fennec_community.prompt import ContextManager, Document, PromptRequest
cm = ContextManager(dedup_threshold=0.80, use_lost_in_middle=True)
request = PromptRequest(query="What is inflation?", documents=my_docs)
result = cm.build(request)
print(f"Context utilization: {result.utilization_pct}%")
print(f"Documents included: {len(result.included_docs)}")
print(f"Duplicates removed: {result.duplicates_removed}")
print(result.context_block)format_memory()
context_manager.format_memory(
memory: List[Message],
max_turns: Optional[int] = None,
) -> strPurpose: Converts a list of Message objects (conversation history) into a compact, formatted text block for injection into the prompt. Limits history to the most recent max_turns turns to control token usage.
| Parameter | Type | Default | Description |
|---|---|---|---|
memory |
List[Message] |
required | The full conversation history |
max_turns |
Optional[int] |
None |
Maximum number of conversation turns to include. Defaults to max_memory_messages set in the constructor |
Returns: str — formatted conversation history, or an empty string if memory is empty.
Output format:
User: What is inflation?
Assistant: Inflation is the rate at which...
User: And what causes it?Example:
memory_block = cm.format_memory(memory=chat_history, max_turns=5)GuardrailEngine
Module: prompt.guardrails
Import: from fennec_community.prompt import GuardrailEngine, Guardrail
Selects and assembles safety and quality instructions that are injected into the system prompt. Guardrails are applied before generation, not as post-processing filters.
GuardrailEngine Constructor
GuardrailEngine(extra_guardrails: Optional[List[Guardrail]] = None)Purpose: Creates a guardrail engine. Optionally accepts custom guardrails that will be appended to every request in addition to the automatically selected standard guardrails.
| Parameter | Type | Default | Description |
|---|---|---|---|
extra_guardrails |
Optional[List[Guardrail]] |
None |
Custom Guardrail objects always appended to the guardrail block |
GuardrailEngine.build()
guardrail_engine.build(request: PromptRequest) -> tuple[str, List[str]]Purpose: Selects all applicable guardrails for the given request, sorts them by priority, deduplicates by name, and renders them into a single formatted instruction block.
| Parameter | Type | Description |
|---|---|---|
request |
PromptRequest |
The prompt request (used to determine which guardrails apply) |
Returns: tuple[str, List[str]]
[0]— The rendered guardrail instruction block (injected into the system prompt)[1]— List of applied guardrail names (for observability, stored inBuiltPrompt.guardrails_applied)
Guardrail Selection Logic:
| Condition | Guardrails Applied |
|---|---|
| Always | safe_output, concise |
enable_guardrails=True AND documents present |
grounding, no_fabrication |
enable_uncertainty=True |
uncertainty |
enable_citations=True AND documents present |
cite_sources |
Prompt type is not AGENT or TOOL_USE |
stay_on_topic |
enable_guardrails=True |
pii_protection |
Strategy is COT, MULTI_HOP, or LEAST_TO_MOST |
show_reasoning |
Complexity is COMPLEX or EXPERT |
self_check |
OutputFormat.JSON |
JSON format instruction |
OutputFormat.BULLET_LIST |
Bullet list format instruction |
OutputFormat.MARKDOWN |
Markdown format instruction |
OutputFormat.CITATION |
Citation format instruction |
OutputFormat.STRUCTURED |
Schema-based format instruction |
GuardrailLibrary
Module: prompt.guardrails
Import: from fennec_community.prompt import GuardrailLibrary
A catalogue of pre-built guardrail objects. All guardrails are class-level attributes (singletons). Use these when constructing custom GuardrailEngine instances or passing extra_guardrails to PromptEngine.
| Attribute | Name | Priority | Purpose |
|---|---|---|---|
GuardrailLibrary.SAFE_OUTPUT |
safe_output |
110 | Blocks harmful, offensive, or discriminatory outputs |
GuardrailLibrary.PII_PROTECTION |
pii_protection |
105 | Prevents exposure of personal identifiable information |
GuardrailLibrary.GROUNDING |
grounding |
100 | Forces answers to stay within provided context only |
GuardrailLibrary.NO_FABRICATION |
no_fabrication |
95 | Prohibits invented facts, statistics, or citations |
GuardrailLibrary.UNCERTAINTY |
uncertainty |
90 | Requires honest "I don't know" responses when unsure |
GuardrailLibrary.CITE_SOURCES |
cite_sources |
80 | Requires bracketed [1] inline citations |
GuardrailLibrary.STAY_ON_TOPIC |
stay_on_topic |
70 | Prevents scope drift and unsolicited opinions |
GuardrailLibrary.NO_PERSONAL_OPINIONS |
no_personal_opinions |
60 | Prevents editorializing |
GuardrailLibrary.SHOW_REASONING |
show_reasoning |
50 | Requires step-by-step reasoning before answer |
GuardrailLibrary.SELF_CHECK |
self_check |
45 | Adds a 3-point self-verification step before answering |
GuardrailLibrary.CONCISE |
concise |
40 | Strips preamble filler and gets to the point |
GuardrailLibrary.NO_MARKDOWN_LEAKAGE |
no_markdown_leakage |
30 | Prevents unsolicited markdown formatting |
Example — create a custom guardrail:
from fennec_community.prompt import Guardrail, GuardrailLibrary
legal_guardrail = Guardrail(
name = "legal_disclaimer",
instruction = "This is not legal advice. Always recommend consulting a qualified attorney.",
priority = 115, # higher than safe_output, applied first
)
engine = PromptEngine(extra_guardrails=[legal_guardrail])PromptOptimizer
Module: prompt.optimizer
Import: from fennec_community.prompt import PromptOptimizer
Applies a pipeline of lightweight, deterministic token-reduction optimizations to the assembled system and user prompts. Runs automatically inside every strategy's build() method.
PromptOptimizer Constructor
PromptOptimizer(
max_total_tokens: int = 6000,
enable_filler: bool = True,
enable_dedup: bool = True,
enable_whitespace: bool = True,
)Purpose: Configures the optimization pipeline.
| Parameter | Type | Default | Description |
|---|---|---|---|
max_total_tokens |
int |
6000 |
Hard total token cap. If system + user tokens exceed this, the user prompt is truncated at a paragraph boundary |
enable_filler |
bool |
True |
Strip common LLM padding phrases ("Certainly!", "Great question!", etc.) |
enable_dedup |
bool |
True |
Remove instruction paragraphs from the user prompt that already appear verbatim in the system prompt |
enable_whitespace |
bool |
True |
Collapse multiple spaces and excessive blank lines |
optimize()
optimizer.optimize(
system: str,
user: str,
request: Optional[object] = None,
) -> Tuple[str, str, int, List[str]]Purpose: Applies all enabled optimization passes to the system and user prompt strings. The optimization pipeline runs in this order: whitespace normalization → filler removal → instruction deduplication → hard token cap.
| Parameter | Type | Default | Description |
|---|---|---|---|
system |
str |
required | The assembled system prompt text |
user |
str |
required | The assembled user prompt text |
request |
Optional[object] |
None |
The original PromptRequest (reserved for future use) |
Returns: Tuple[str, str, int, List[str]]
[0]— Optimized system prompt[1]— Optimized user prompt[2]— Number of tokens saved (0 if none)[3]— List of human-readable optimization notes (e.g.,["whitespace-normalized", "filler-stripped", "dedup-removed-2-paragraphs"])
Example (standalone usage):
from fennec_community.prompt import PromptOptimizer
optimizer = PromptOptimizer(max_total_tokens=4000)
system_opt, user_opt, saved, notes = optimizer.optimize(
system = my_system_prompt,
user = my_user_prompt,
)
print(f"Saved {saved} tokens via: {notes}")BuiltPrompt Methods
These are public methods and properties on the BuiltPrompt object returned by engine.build().
to_messages()
built_prompt.to_messages() -> List[Dict[str, str]]Purpose: Serializes the full message list (system + conversation history + user) into the OpenAI Chat Completions API format — a list of {"role": ..., "content": ...} dicts.
Returns: List[Dict[str, str]]
Example:
response = openai_client.chat.completions.create(
model = "gpt-4o",
messages = prompt.to_messages(),
)to_anthropic()
built_prompt.to_anthropic() -> Dict[str, Any]Purpose: Serializes the prompt into the Anthropic Messages API format — a dict with a "system" key (string) and a "messages" key (list of non-system messages). Can be unpacked directly as **kwargs into anthropic_client.messages.create().
Returns: Dict[str, Any] with keys:
"system"— the system prompt string"messages"— list of{"role": ..., "content": ...}dicts (excludes system messages)
Example:
payload = prompt.to_anthropic()
response = anthropic_client.messages.create(
**payload,
model = "claude-opus-4-20250514",
max_tokens = 1024,
)full_text (property)
built_prompt.full_text -> strPurpose: Returns the system and user prompts combined as a single plain-text string, prefixed with [SYSTEM] and [USER] section headers. Useful for debugging, logging, or human review of the assembled prompt.
Returns: str
Example:
print(prompt.full_text)
# [SYSTEM]
# You are an expert AI assistant...
#
# [USER]
# ## Context
# --- Source [1] wiki (relevance: 0.92) ---
# ...ContextResult Properties
utilization_pct (property)
context_result.utilization_pct -> floatPurpose: Returns the percentage of the token budget consumed by the assembled context block. Useful for monitoring how efficiently the document context is using the available token budget.
Returns: float — value between 0.0 and 100.0+ (can exceed 100 if truncation occurred).
Example:
result = cm.build(request)
print(f"Token budget utilization: {result.utilization_pct}%")
# Token budget utilization: 84.3%Strategy System
The strategy system provides 7 built-in prompt construction templates. Strategies are selected automatically (via auto-detection and complexity upgrade) or specified explicitly via strategy= in engine.build().
get_strategy()
get_strategy(strategy: PromptStrategy) -> BaseStrategyModule: prompt.strategies
Import: from fennec_community.prompt import get_strategy
Purpose: Retrieves the singleton strategy implementation for the given PromptStrategy enum value. Falls back to SimpleStrategy if the strategy is not registered (with a warning log). Primarily used internally by PromptBuilder, but available for advanced use cases.
| Parameter | Type | Description |
|---|---|---|
strategy |
PromptStrategy |
The strategy enum value to look up |
Returns: BaseStrategy — the strategy implementation object.
Example:
from fennec_community.prompt import get_strategy, PromptStrategy
impl = get_strategy(PromptStrategy.CHAIN_OF_THOUGHT)STRATEGY_REGISTRY
STRATEGY_REGISTRY: Dict[PromptStrategy, BaseStrategy]Module: prompt.strategies
Import: from fennec_community.prompt import STRATEGY_REGISTRY
Purpose: The dictionary mapping every PromptStrategy enum value to its singleton implementation. Use this to inspect available strategies or to register custom strategy implementations.
| Strategy Key | Implementation Class | Best For |
|---|---|---|
PromptStrategy.SIMPLE |
SimpleStrategy |
Direct Q&A, factual lookup |
PromptStrategy.CHAIN_OF_THOUGHT |
ChainOfThoughtStrategy |
Reasoning, explanation |
PromptStrategy.MULTI_HOP |
MultiHopStrategy |
Multi-document, multi-step |
PromptStrategy.SELF_CONSISTENT |
SelfConsistentStrategy |
High-stakes verification |
PromptStrategy.STEP_BACK |
StepBackStrategy |
Abstract-first reasoning |
PromptStrategy.REACT |
ReActStrategy |
Agentic tool-use |
PromptStrategy.LEAST_TO_MOST |
LeastToMostStrategy |
Math, logic, progressive decomposition |
Example — register a custom strategy:
from fennec_community.prompt import STRATEGY_REGISTRY, PromptStrategy
from fennec_community.prompt.strategies import BaseStrategy
class MyCustomStrategy(BaseStrategy):
STRATEGY = PromptStrategy.SIMPLE # override an existing slot
def _build_system(self, req, guardrail_block): ...
def _build_user(self, req, context_block, memory_block): ...
STRATEGY_REGISTRY[PromptStrategy.SIMPLE] = MyCustomStrategy()PromptMetrics Methods
to_dict()
metrics_obj.to_dict() -> Dict[str, Any]Purpose: Serializes the PromptMetrics dataclass into a plain Python dictionary, suitable for JSON serialization, logging, or dashboarding. This is what engine.metrics (the property) calls internally.
Returns: Dict[str, Any] — see the metrics property section for the full key reference.
Example:
import json
# Access via engine property (recommended)
print(json.dumps(engine.metrics, indent=2))Integration Examples
OpenAI
from fennec_community.prompt import PromptEngine
import openai
engine = PromptEngine()
client = openai.OpenAI(api_key="...")
prompt = engine.build(
query = "What is the capital of France?",
documents = [{"content": "France is a country in Europe. Its capital is Paris.", "source": "geo_db"}],
)
response = client.chat.completions.create(
model = "gpt-4o",
messages = prompt.to_messages(),
)
print(response.choices[0].message.content)Anthropic
from fennec_community.prompt import PromptEngine
import anthropic
engine = PromptEngine()
client = anthropic.Anthropic(api_key="...")
prompt = engine.build(
query = "Summarize the quarterly earnings report.",
documents = retrieved_docs,
strategy = "chain_of_thought",
output_format = "markdown",
user_profile = "executive",
)
response = client.messages.create(
**prompt.to_anthropic(),
model = "claude-opus-4-20250514",
max_tokens = 2048,
)
print(response.content[0].text)Multi-turn Conversation
from fennec_community.prompt import PromptEngine, Message
engine = PromptEngine()
history = []
def chat(user_message: str, docs=None) -> str:
prompt = engine.build(
query = user_message,
documents = docs or [],
memory = history,
prompt_type = "conversational",
)
# Call your LLM here...
answer = llm_call(prompt.to_messages())
# Update history
history.append(Message(role="user", content=user_message))
history.append(Message(role="assistant", content=answer))
return answerAgentic Tool-Use
tools = [
{
"name": "search_database",
"description": "Search the company knowledge base.",
"parameters": {"query": "string", "top_k": "int"},
},
{
"name": "get_document",
"description": "Retrieve a specific document by ID.",
"parameters": {"doc_id": "string"},
},
]
prompt = engine.build(
query = "Find all invoices from Q3 2024 and calculate the total.",
prompt_type = "agent",
strategy = "react",
extra = {"tools": tools},
)Custom Guardrails + Observability Hook
from fennec_community.prompt import PromptEngine, Guardrail, BuiltPrompt, PromptRequest
# Custom guardrail
disclaimer = Guardrail(
name = "financial_disclaimer",
instruction = "This is not financial advice. Past performance is not indicative of future results.",
priority = 115,
)
engine = PromptEngine(extra_guardrails=[disclaimer])
# Hook into every build for custom logging
def on_prompt_built(prompt: BuiltPrompt, request: PromptRequest):
print(f"[{request.trace_id}] Built {prompt.prompt_type.value} | "
f"{prompt.estimated_tokens} tokens | {prompt.tokens_saved} saved | "
f"guardrails={prompt.guardrails_applied}")
engine.on("prompt.built", on_prompt_built)Adaptive Strategy Selection
# After collecting feedback over time:
engine.record_feedback(trace_id="abc", quality_score=0.9)
engine.record_feedback(trace_id="def", quality_score=0.6)
# ... at least 5 feedback entries ...
best_strategy = engine.adaptive_strategy_for(PromptType.QA)
prompt = engine.build(
query = "What is the return policy?",
strategy = best_strategy or PromptStrategy.SIMPLE,
)Error Reference
| Error | When | Resolution |
|---|---|---|
ValueError |
Invalid string passed for an enum parameter (e.g., strategy="unknown") |
Use a valid PromptStrategy, PromptType, OutputFormat, QueryComplexity, or UserProfile value |
KeyError |
get_strategy() called with unregistered strategy |
Engine falls back to SimpleStrategy with a warning log — not a hard error |
| Token budget exceeded | Documents larger than max_context_tokens |
Documents are partially truncated at word boundaries; BuiltPrompt.documents_truncated > 0 signals this |
| Cache eviction | max_cache_size reached |
Oldest entry is evicted automatically (LRU-like behaviour) |
| Hook error | Exception inside an on() callback |
Logged as a warning; does not propagate or interrupt the build |
community/prompt.md