Output Parser Module
Purpose: A production-grade engine for parsing, validating, and fault-tolerantly handling raw LLM outputs.
Architecture Overview
The module operates as a multi-stage pipeline:
Raw LLM Text
│
▼
FormatDetector ← Auto-detects format (JSON / YAML / CSV / ...)
│
▼
Format Parsers ← Converts text into structured Python objects
│
▼
OutputFixer ← Repairs broken or incomplete outputs
│
▼
OutputValidator ← Validates correctness, types, and safety
│
▼
RetryHandler ← Re-prompts LLM with escalating instructions on failure
│
▼
ParseResult ← Final typed, audited result
Imports
from fennec_community.output_parser import (
    # Core
    OutputParser, ParseError,
    create_answer_parser, create_json_parser, create_tool_call_parser,
    # Schemas
    AnswerSchema, ToolCallSchema, RetrievalResultSchema,
    RankedAnswersSchema, FieldSchema,
    # Enums
    OutputFormat, ParseMode, FixStrategy, ValidationStatus,
    # Results & Tracing
    ParseResult, ParseTrace, ValidationResult,
    # Format Detection
    FormatDetector, FormatCandidate,
    # Validation
    OutputValidator, ValidationRule, build_answer_validator,
    # Fixing
    OutputFixer, build_answer_fixer,
    # Retry
    RetryHandler, RetryResult, RetryStrategy, graceful_fallback,
)
1. Core Class: OutputParser
Purpose: The central orchestrator — combines all pipeline stages into a single, unified interface.
OutputParser.__init__()
OutputParser(
    schema=None,
    fields=None,
    mode=ParseMode.LENIENT,
    expected_format=None,
    llm_fn=None,
    max_retries=2,
    enable_safety=True,
    enable_cache=True,
    original_prompt="",
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `schema` | `Type[BaseModel] \| None` | `None` | Pydantic model class. When provided, parsed dicts are automatically cast to this type. |
| `fields` | `List[FieldSchema] \| None` | `None` | Explicit field definitions as an alternative to a Pydantic schema. |
| `mode` | `ParseMode` | `LENIENT` | Strictness level: STRICT / LENIENT / SEMANTIC / TOOL_CALL |
| `expected_format` | `OutputFormat \| None` | `None` | Force a specific format, skipping auto-detection. |
| `llm_fn` | `Callable[[str], str] \| None` | `None` | An LLM callable used during fix and retry operations. |
| `max_retries` | `int` | `2` | Maximum number of LLM regeneration attempts on parse failure. |
| `enable_safety` | `bool` | `True` | Enables safety checks: hallucination markers, data leakage, prompt injection. |
| `enable_cache` | `bool` | `True` | Caches successful parse results in memory (keyed by MD5 hash of input). |
| `original_prompt` | `str` | `""` | The original user prompt, embedded in retry prompts for context. |
OutputParser.parse()
def parse(
    text: str,
    expected_format: Optional[OutputFormat] = None,
) -> ParseResult
Purpose: Parses raw LLM text and returns a fully validated, optionally typed ParseResult with a complete audit trail.
| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | Raw LLM output string to parse. |
| `expected_format` | `OutputFormat \| None` | Override the detected format for this call only. |
Returns: ParseResult
| Property | Type | Description |
|---|---|---|
| `.data` | `Any` | The parsed, validated, typed output (dict / Pydantic instance / list / ...). |
| `.ok` | `bool` | True if parsing succeeded and data is not None. |
| `.trace` | `ParseTrace` | Full audit trail of every pipeline stage. |
| `.raw` | `str` | The original, unmodified LLM text. |
Raises: ParseError in ParseMode.STRICT if parsing fails after all recovery attempts.
Example:
parser = OutputParser()
result = parser.parse('{"answer": "Paris", "confidence": 0.98}')
if result.ok:
    print(result.data)                   # {'answer': 'Paris', 'confidence': 0.98}
    print(result.trace.detected_format)  # OutputFormat.JSON
    print(result.trace.duration_ms)      # e.g. 1.23
OutputParser.parse_typed()
def parse_typed(
    text: str,
    schema: Type[T],
    expected_format: Optional[OutputFormat] = None,
) -> T
Purpose: Parses text and returns a typed instance of schema directly. This is a convenience shorthand for parser.parse(text).as_typed(schema).
| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | Raw LLM output string. |
| `schema` | `Type[T]` | The class to cast the parsed data into. |
| `expected_format` | `OutputFormat \| None` | Optional format override. |
Returns: An instance of type T (Pydantic model or dataclass).
Raises: ParseError if parsing fails; TypeError if the cast to schema fails.
Example:
from fennec_community.output_parser import AnswerSchema
parser = OutputParser(schema=AnswerSchema)
answer: AnswerSchema = parser.parse_typed(raw_text, AnswerSchema)
print(answer.answer) # "Paris"
print(answer.confidence) # 0.98
OutputParser.get_format_instructions()
def get_format_instructions() -> str
Purpose: Generates a format instruction string ready to be embedded directly into a System Prompt or User Prompt, guiding the LLM to produce output in the expected structure.
Takes no parameters.
Returns: str — a prompt-ready instruction block.
Behavior:
- If `schema` is set → generates a full JSON Schema with an example
- If `fields` are set → generates a `key: <type>` list with descriptions
- If neither → returns a generic JSON example
Example:
parser = OutputParser(schema=AnswerSchema)
instructions = parser.get_format_instructions()
# "Return ONLY valid JSON matching this schema:\n```json\n{...}\n```\nDo not include any explanation..."
prompt = f"Answer the following question.\n{instructions}\nQuestion: What is the capital of France?"
OutputParser.clear_cache()
def clear_cache() -> None
Purpose: Clears all in-memory cached parse results. Useful when the schema or mode changes at runtime, or after long test sessions.
Takes no parameters. Returns nothing.
Example:
parser.clear_cache()
2. Factory Functions
Convenience functions that return a pre-configured OutputParser for the most common use cases.
create_answer_parser()
def create_answer_parser(
    llm_fn: Optional[Callable[[str], str]] = None,
    max_retries: int = 2,
    strict: bool = False,
) -> OutputParser
Purpose: Creates a parser pre-configured for standard RAG AnswerSchema outputs, with full safety checks and schema validation and no additional setup.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm_fn` | `Callable[[str], str] \| None` | `None` | LLM callable for retry on failure. |
| `max_retries` | `int` | `2` | Number of regeneration attempts. |
| `strict` | `bool` | `False` | True = raise ParseError on failure; False = use graceful fallback. |
Returns: A ready-to-use OutputParser targeting AnswerSchema.
Example:
parser = create_answer_parser(llm_fn=my_llm, strict=True)
result = parser.parse(raw_llm_output)
answer: AnswerSchema = result.data
print(answer.answer, answer.sources, answer.confidence)
create_json_parser()
def create_json_parser(
    schema: Optional[Type] = None,
    llm_fn: Optional[Callable[[str], str]] = None,
    strict: bool = False,
) -> OutputParser
Purpose: Creates a parser dedicated to JSON outputs with optional Pydantic schema enforcement. Ideal when the LLM is expected to return pure JSON.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `schema` | `Type \| None` | `None` | Pydantic model to cast the parsed result into automatically. |
| `llm_fn` | `Callable[[str], str] \| None` | `None` | LLM callable for retry on failure. |
| `strict` | `bool` | `False` | Strictness level on failure. |
Returns: An OutputParser locked to OutputFormat.JSON.
Example:
from pydantic import BaseModel
class ProductSchema(BaseModel):
    name: str
    price: float
    in_stock: bool
parser = create_json_parser(schema=ProductSchema, strict=True)
product: ProductSchema = parser.parse_typed(raw_text, ProductSchema)
create_tool_call_parser()
def create_tool_call_parser(
    llm_fn: Optional[Callable[[str], str]] = None,
) -> OutputParser
Purpose: Creates a parser specialized for Tool Call outputs. Supports both OpenAI function-call JSON format and ReAct-style Action: ... / Action Input: ... format. Safety checks are disabled by default as tool environments are trusted.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm_fn` | `Callable[[str], str] \| None` | `None` | LLM callable for a single retry attempt. |
Returns: An OutputParser configured with ParseMode.TOOL_CALL and OutputFormat.TOOL_CALL.
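To make the ReAct-style format concrete, the following is a minimal standalone sketch of how `Action: ... / Action Input: ...` text could be turned into the `tool_name` / `arguments` / `thought` dict. The function name `parse_react_tool_call` and the regular expressions are assumptions made for illustration; they are not the module's actual implementation.

```python
import json
import re
from typing import Any, Dict, Optional

# Hypothetical patterns for illustration only.
_ACTION_RE = re.compile(r"^Action:\s*(?P<name>.+?)\s*$", re.MULTILINE)
_INPUT_RE = re.compile(r"^Action Input:\s*(?P<args>.+)$", re.MULTILINE | re.DOTALL)
_THOUGHT_RE = re.compile(r"^Thought:\s*(?P<thought>.+?)\s*$", re.MULTILINE)


def parse_react_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Return a tool-call dict, or None when no 'Action:' line is present."""
    action = _ACTION_RE.search(text)
    if action is None:
        return None

    arguments: Dict[str, Any] = {}
    raw_args = _INPUT_RE.search(text)
    if raw_args is not None:
        payload = raw_args.group("args").strip()
        try:
            loaded = json.loads(payload)
            # Non-dict JSON (e.g. a bare string) is wrapped under "input".
            arguments = loaded if isinstance(loaded, dict) else {"input": loaded}
        except json.JSONDecodeError:
            # Keep unparseable input as plain text instead of failing.
            arguments = {"input": payload}

    thought = _THOUGHT_RE.search(text)
    return {
        "tool_name": action.group("name"),
        "arguments": arguments,
        "thought": thought.group("thought") if thought else None,
    }
```

Wrapping non-JSON input under an `"input"` key is one possible design choice; a stricter parser might reject such input instead.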
Example:
parser = create_tool_call_parser()
result = parser.parse('Action: search\nAction Input: {"query": "weather in Cairo"}')
# result.data → {"tool_name": "search", "arguments": {"query": "weather in Cairo"}, "thought": None}
3. Format Detection: FormatDetector
Purpose: Analyses raw LLM text using multi-signal heuristics to determine its format before parsing begins.
FormatDetector.detect()
def detect(text: str) -> OutputFormat
Purpose: Returns the single most likely OutputFormat for the given text.
| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The raw LLM output to analyse. |
Returns: OutputFormat enum value.
Example:
detector = FormatDetector()
detector.detect('{"key": "value"}') # → OutputFormat.JSON
detector.detect("1. First\n2. Second") # → OutputFormat.NUMBERED_LIST
detector.detect("| A | B |\n|---|---|\n") # → OutputFormat.MARKDOWN_TABLE
FormatDetector.rank()
def rank(text: str) -> List[FormatCandidate]
Purpose: Returns a ranked list of all plausible formats with a confidence score for each, which helps when debugging ambiguous outputs.
| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The raw text to analyse. |
Returns: List[FormatCandidate] — sorted descending by confidence. Each FormatCandidate contains:
| Property | Type | Description |
|---|---|---|
| `.format` | `OutputFormat` | The detected format. |
| `.confidence` | `float` | Confidence score from 0.0 to 1.0. |
| `.evidence` | `str` | Human-readable reason for this score. |
Example:
detector = FormatDetector()
candidates = detector.rank("name: John\nage: 30\ncity: Cairo")
for c in candidates:
    print(f"{c.format.value}: {c.confidence:.2f} — {c.evidence}")
# key_value: 0.60 — 3 key: value pairs
# yaml: 0.50 — 3 key: value lines
# plain_text: 0.25 — default text fallback
FormatDetector.detect_with_confidence()
def detect_with_confidence(text: str) -> Tuple[OutputFormat, float]
Purpose: Returns the best format alongside its confidence score as a single tuple, a practical shorthand when you need both values together.
| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The raw text to analyse. |
Returns: Tuple[OutputFormat, float] — (best_format, confidence_score)
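The relationship between ranking and best-format detection can be sketched as follows. This is a toy illustration only: the function names, the plain-string format labels, and the scores are assumptions chosen for the example and do not reflect the detector's real heuristics.

```python
import json
import re
from typing import List, Tuple


def rank_formats_sketch(text: str) -> List[Tuple[str, float]]:
    """Return (format_label, confidence) pairs sorted by descending confidence."""
    # Plain text is always a low-confidence fallback candidate.
    candidates: List[Tuple[str, float]] = [("plain_text", 0.25)]
    stripped = text.strip()

    # Signal 1: text that starts like JSON, scored higher if it actually loads.
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            candidates.append(("json", 0.95))
        except json.JSONDecodeError:
            candidates.append(("json", 0.40))

    # Signal 2: every non-empty line looks like "key: value".
    lines = [ln for ln in stripped.splitlines() if ln.strip()]
    kv_lines = [ln for ln in lines if re.match(r"^\s*[\w ]+:\s+\S", ln)]
    if lines and len(kv_lines) == len(lines):
        candidates.append(("key_value", 0.60))

    return sorted(candidates, key=lambda c: c[1], reverse=True)


def detect_with_confidence_sketch(text: str) -> Tuple[str, float]:
    """The best candidate is simply the head of the ranked list."""
    return rank_formats_sketch(text)[0]
```

Because the fallback candidate is always present, the ranked list is never empty and taking its first element is safe.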
Example:
detector = FormatDetector()
fmt, confidence = detector.detect_with_confidence(text)
if confidence < 0.5:
    print("Warning: format is ambiguous, consider using LENIENT mode")
4. Validation: OutputValidator
Purpose: Validates parsed outputs across four sequential layers to ensure correctness, type safety, business rules, and security.
OutputValidator.__init__()
OutputValidator(
    fields=None,
    rules=None,
    pydantic_model=None,
    enable_safety=True,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `fields` | `List[FieldSchema] \| None` | `None` | Fields to validate for presence and type. |
| `rules` | `List[ValidationRule] \| None` | `None` | Custom business-rule predicates. |
| `pydantic_model` | `Type \| None` | `None` | Pydantic model for structural validation (Layer 4). |
| `enable_safety` | `bool` | `True` | Enables hallucination, data leakage, and prompt injection checks. |
The four validation layers:
| Layer | Name | What It Checks |
|---|---|---|
| 1 | Schema Completeness | Required fields present with correct types |
| 2 | Business Rules | Custom ValidationRule predicates |
| 3 | Safety Checks | Hallucination markers, PII/credential leakage, prompt injection |
| 4 | Pydantic Validation | Full structural validation against a Pydantic model |
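As a rough illustration of how the first two layers in the table could run in sequence, here is a self-contained sketch. The helpers `check_schema`, `check_rules`, and `validate_layers`, the tuple-based field and rule descriptions, and the message strings are all hypothetical stand-ins for `FieldSchema`, `ValidationRule`, and `ValidationResult`.

```python
from typing import Any, Callable, Dict, List, Tuple

# Maps dtype names (as used by FieldSchema.dtype) to Python types.
# "float" also accepts ints, an assumption made for this sketch.
TYPE_MAP = {
    "str": str, "int": int, "float": (int, float),
    "bool": bool, "list": list, "dict": dict,
}

FieldSpec = Tuple[str, str, bool]                      # (name, dtype, required)
RuleSpec = Tuple[str, Callable[[Dict[str, Any]], bool]]  # (message, predicate)


def check_schema(data: Dict[str, Any], fields: List[FieldSpec]) -> List[str]:
    """Layer 1: required fields must be present and correctly typed."""
    failures = []
    for name, dtype, required in fields:
        if name not in data:
            if required:
                failures.append(f"{name}: missing required field")
            continue
        if not isinstance(data[name], TYPE_MAP[dtype]):
            failures.append(f"{name}: expected {dtype}")
    return failures


def check_rules(data: Dict[str, Any], rules: List[RuleSpec]) -> List[str]:
    """Layer 2: each predicate returns True when the rule is satisfied."""
    return [message for message, predicate in rules if not predicate(data)]


def validate_layers(data, fields, rules) -> List[str]:
    """Run the layers in order and collect every failure message."""
    return check_schema(data, fields) + check_rules(data, rules)
```

Collecting all failures, rather than stopping at the first one, matches the idea of returning detailed per-check results.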
OutputValidator.validate()
def validate(
    data: Any,
    raw_text: str = "",
    trace: Optional[ParseTrace] = None,
) -> List[ValidationResult]
Purpose: Runs all validation layers and returns detailed per-check results. If a trace is passed, results are automatically appended to it.
| Parameter | Type | Description |
|---|---|---|
| `data` | `Any` | The parsed data to validate (typically a dict). |
| `raw_text` | `str` | The original LLM text (used for safety checks). |
| `trace` | `ParseTrace \| None` | If provided, validation results are added to the trace. |
Returns: List[ValidationResult]. Each result contains:
| Property | Type | Description |
|---|---|---|
| `.status` | `ValidationStatus` | PASSED / FAILED / WARNING / SKIPPED |
| `.field` | `str \| None` | The associated field name (if applicable). |
| `.message` | `str` | Error or warning message. |
| `.value` | `Any` | The value that caused the issue. |
| `.passed` | `bool` | True if status is PASSED. |
Example:
validator = OutputValidator(
    fields=[FieldSchema("answer", "The answer text", dtype="str", required=True)],
    enable_safety=True,
)
results = validator.validate({"answer": "Paris"}, raw_text=raw_output)
# .passed is True only for PASSED, so this list also includes WARNING and SKIPPED results
failures = [r for r in results if not r.passed]
OutputValidator.is_valid()
def is_valid(
    data: Any,
    raw_text: str = "",
) -> bool
Purpose: A quick pass/fail check. Returns True only if all validation layers pass (warnings and skipped checks are not treated as failures).
| Parameter | Type | Description |
|---|---|---|
| `data` | `Any` | The parsed data. |
| `raw_text` | `str` | Original LLM text for safety checks. |
Returns: bool
Example:
if not validator.is_valid(parsed_data, raw_text):
    raise ValueError("Output did not pass validation")
OutputValidator.get_failures()
def get_failures(
    data: Any,
    raw_text: str = "",
) -> List[ValidationResult]
Purpose: Returns only the failed validation results, which suits structured logging and error reporting.
| Parameter | Type | Description |
|---|---|---|
| `data` | `Any` | The parsed data. |
| `raw_text` | `str` | Original LLM text. |
Returns: List[ValidationResult] — empty list if all checks pass.
Example:
failures = validator.get_failures(data, raw_text)
if failures:
    for f in failures:
        logger.error("[%s] %s", f.field, f.message)
build_answer_validator()
def build_answer_validator(enable_safety: bool = True) -> OutputValidator
Purpose: Factory that returns a pre-configured OutputValidator for AnswerSchema outputs. It validates answer presence and the confidence range [0.0, 1.0], and runs all safety checks.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enable_safety` | `bool` | `True` | Enable or disable safety checks. |
Returns: A ready-to-use OutputValidator.
Example:
validator = build_answer_validator(enable_safety=True)
is_ok = validator.is_valid(parsed_data, raw_text)
5. Custom Rules: ValidationRule
Purpose: Defines a named, callable validation rule with configurable severity.
ValidationRule.__init__()
ValidationRule(
    name: str,
    predicate: Callable[[Any], bool],
    message: str,
    field: Optional[str] = None,
    severity: ValidationStatus = ValidationStatus.FAILED,
)
| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | Rule identifier (used in logs). |
| `predicate` | `Callable[[Any], bool]` | A function that receives the data and returns True if the check passes. |
| `message` | `str` | Error message shown on failure. |
| `field` | `str \| None` | The associated field name (optional, for context). |
| `severity` | `ValidationStatus` | Failure severity: FAILED (default) or WARNING. |
ValidationRule.check()
def check(data: Any) -> ValidationResult
Purpose: Applies the predicate to the given data and returns a ValidationResult.
| Parameter | Type | Description |
|---|---|---|
| `data` | `Any` | The data to validate. |
Returns: ValidationResult
Example:
rule = ValidationRule(
    name="answer_min_length",
    predicate=lambda d: len(d.get("answer", "")) >= 10,
    message="Answer is too short (less than 10 characters)",
    field="answer",
    severity=ValidationStatus.WARNING,
)
result = rule.check({"answer": "Yes"})
# result.status → ValidationStatus.WARNING
6. Fault Tolerance: OutputFixer
Purpose: Repairs malformed, incomplete, or broken LLM outputs using a hierarchy of four progressively deeper strategies.
OutputFixer.__init__()
OutputFixer(
    required_fields=None,
    field_defaults=None,
    llm_fn=None,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `required_fields` | `List[str] \| None` | `None` | Field names that must be present in the output. |
| `field_defaults` | `Dict[str, Any] \| None` | `None` | Default values injected when a required field is missing. |
| `llm_fn` | `Callable[[str], str] \| None` | `None` | LLM callable, required for the LLM_REFORMAT strategy. |
OutputFixer.fix()
def fix(
    text: str,
    expected_format: OutputFormat = OutputFormat.JSON,
) -> Tuple[str, FixStrategy]
Purpose: Attempts to repair a broken text string using four escalating strategies, returning the repaired text and the strategy that succeeded. The caller should re-parse the returned text.
| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The malformed or unparseable text. |
| `expected_format` | `OutputFormat` | The expected format, which influences which repair logic is applied. |
Returns: Tuple[str, FixStrategy]
- `str`: the repaired text (must be re-parsed by the caller)
- `FixStrategy`: the strategy used, or `FixStrategy.NONE` if all strategies failed
Repair strategies applied in order:
| Strategy | Description |
|---|---|
| `REGEX_REPAIR` | Strips markdown fences, fixes trailing commas, converts single quotes, quotes bare keys |
| `FIELD_INJECTION` | Injects missing required fields with their default values |
| `FALLBACK_PARSE` | Parses as key-value pairs and re-serializes as JSON |
| `LLM_REFORMAT` | Sends a reformat request to the LLM (requires llm_fn) |
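The four regex repairs listed for REGEX_REPAIR can be sketched in a few lines. This is a deliberately naive illustration under stated assumptions: the function name `regex_repair_sketch` and the patterns are hypothetical, and the substitutions do not protect quotes, commas, or colons that appear inside string values.

```python
import json
import re

# The fence marker is built indirectly so this example nests safely in markdown.
FENCE = "`" * 3
_FENCE_RE = re.compile(r"^\s*" + FENCE + r"[A-Za-z]*\s*|\s*" + FENCE + r"\s*$")


def regex_repair_sketch(text: str) -> str:
    """Apply four simple repairs and return text that may now be valid JSON."""
    # 1. Strip a leading fence (with optional language tag) and a trailing fence.
    fixed = _FENCE_RE.sub("", text.strip())
    # 2. Drop trailing commas before a closing brace or bracket.
    fixed = re.sub(r",\s*([}\]])", r"\1", fixed)
    # 3. Convert single-quoted strings to double-quoted strings.
    fixed = re.sub(r"'([^'\\]*)'", r'"\1"', fixed)
    # 4. Quote bare identifier keys that follow "{" or ",".
    fixed = re.sub(r"([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)", r'\1"\2"\3', fixed)
    return fixed


broken = FENCE + "json\n{answer: 'Paris'}\n" + FENCE
repaired = regex_repair_sketch(broken)
parsed = json.loads(repaired)  # the caller re-parses the repaired text
```

As with `fix()`, the repaired string is only a candidate: the caller still has to re-parse it and handle a second failure.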
Example:
fixer = OutputFixer(
    required_fields=["answer"],
    field_defaults={"answer": "", "confidence": 0.5},
)
fixed_text, strategy = fixer.fix("```json\n{answer: 'Paris'}\n```", OutputFormat.JSON)
# → ('{"answer": "Paris"}', FixStrategy.REGEX_REPAIR)
OutputFixer.fix_dict()
def fix_dict(
    data: Dict[str, Any],
) -> Tuple[Dict[str, Any], FixStrategy]
Purpose: Repairs a partially parsed dict by injecting missing required fields directly. This is faster than fix() when structured data is already available.
| Parameter | Type | Description |
|---|---|---|
| `data` | `Dict[str, Any]` | The incomplete parsed dict. |
Returns: Tuple[Dict[str, Any], FixStrategy]
- A completed dict with injected fields
- `FixStrategy.FIELD_INJECTION` if fields were injected; `FixStrategy.NONE` if nothing was missing
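Field injection as described above amounts to a small dictionary merge. The sketch below is an assumption about how it could work, not the module's code: `inject_missing_fields` is a hypothetical name, and a boolean stands in for the `FixStrategy` value.

```python
from typing import Any, Dict, List, Optional, Tuple


def inject_missing_fields(
    data: Dict[str, Any],
    required_fields: List[str],
    field_defaults: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], bool]:
    """Return (completed_dict, injected) without mutating the input dict."""
    defaults = field_defaults or {}
    completed = dict(data)  # shallow copy so the caller's dict is untouched
    injected = False
    for name in required_fields:
        if name not in completed:
            # Fields with no configured default are filled with None.
            completed[name] = defaults.get(name)
            injected = True
    return completed, injected
```

Returning a copy keeps the original parse result available for the audit trail.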
Example:
fixer = OutputFixer(required_fields=["answer", "sources"])
fixed, strategy = fixer.fix_dict({"answer": "Paris"})
# → ({"answer": "Paris", "sources": None}, FixStrategy.FIELD_INJECTION)
build_answer_fixer()
def build_answer_fixer(llm_fn: Optional[Callable] = None) -> OutputFixer
Purpose: Factory that returns a pre-configured OutputFixer for AnswerSchema outputs. It automatically injects answer: "", sources: [], and confidence: 0.5 for missing fields.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm_fn` | `Callable \| None` | `None` | LLM callable for the LLM_REFORMAT strategy. |
Returns: A ready-to-use OutputFixer.
7. Retry & Regeneration: RetryHandler
Purpose: Manages LLM regeneration when parsing fails, re-prompting with progressively stricter instructions until a valid output is obtained or the retry budget is exhausted.
RetryHandler.__init__()
RetryHandler(
    llm_fn: LLMCallable,
    max_retries: int = 3,
    backoff_seconds: float = 0.5,
    pydantic_schema: Optional[Type] = None,
    required_fields: Optional[List[str]] = None,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm_fn` | `Callable[[str], str]` | required | The LLM callable to invoke on each retry. |
| `max_retries` | `int` | `3` | Maximum number of retry attempts. |
| `backoff_seconds` | `float` | `0.5` | Wait time between retries (multiplied by attempt index for linear backoff). |
| `pydantic_schema` | `Type \| None` | `None` | Used to generate a JSON schema example in retry prompts. |
| `required_fields` | `List[str] \| None` | `None` | Field names embedded in retry prompt examples. |
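The linear backoff described for `backoff_seconds` can be worked through with a short calculation. The sketch assumes the attempt index starts at 1, which the text above does not specify; `backoff_delays` is a hypothetical helper name.

```python
from typing import List


def backoff_delays(max_retries: int = 3, backoff_seconds: float = 0.5) -> List[float]:
    """Delay before each retry: backoff_seconds * attempt index (assumed 1-based)."""
    return [backoff_seconds * attempt for attempt in range(1, max_retries + 1)]


delays = backoff_delays()       # delays for the default settings
total_wait = sum(delays)        # upper bound on time spent sleeping
```

With the defaults this gives waits of 0.5 s, 1.0 s, and 1.5 s, so a fully exhausted retry budget adds at most 3.0 s of sleep on top of the LLM calls themselves.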
RetryHandler.run()
def run(
    original_prompt: str,
    parse_fn: Callable[[str], Any],
    format_instructions: str = "",
    last_error: str = "",
    last_response: str = "",
) -> RetryResult
Purpose: Executes a full retry cycle: builds an improved prompt → calls the LLM → tests parseability → repeats until success or budget exhaustion.
| Parameter | Type | Description |
|---|---|---|
| `original_prompt` | `str` | The original user question or request. |
| `parse_fn` | `Callable[[str], Any]` | Parse function that raises an exception on failure. |
| `format_instructions` | `str` | Format hint string from get_format_instructions(). |
| `last_error` | `str` | Error message from the most recent failed parse. |
| `last_response` | `str` | The last raw LLM response that failed parsing. |
Retry strategies applied in order:
| Strategy | Description |
|---|---|
| `STRICT_FORMAT` | Adds explicit format instructions and the failed response to the prompt |
| `JSON_STRICT` | Forces JSON-only output with a full schema example |
| `SIMPLIFIED` | Strips the prompt down to a minimal question with a basic JSON example |
| `GRACEFUL_FAIL` | Returns a structured error payload |
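The escalation order in the table can be pictured as a loop that picks a stricter prompt on each attempt. The sketch below is an illustration under assumptions: the prompt wording, the mapping of attempt number to strategy, the plain-dict result, and the names `build_retry_prompt` and `run_with_escalation` are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

# Strategy labels mirror the table; GRACEFUL_FAIL is reported only on exhaustion.
ESCALATION = ["STRICT_FORMAT", "JSON_STRICT", "SIMPLIFIED"]


def build_retry_prompt(strategy: str, prompt: str, instructions: str, last_response: str) -> str:
    """Build a progressively stricter prompt (wording is illustrative)."""
    if strategy == "STRICT_FORMAT":
        return (f"{prompt}\n\n{instructions}\n\n"
                f"Your previous reply could not be parsed:\n{last_response}")
    if strategy == "JSON_STRICT":
        return f"{prompt}\n\nReturn ONLY valid JSON. {instructions}"
    return f"{prompt}\n\nAnswer briefly as JSON, e.g. " + '{"answer": "..."}'


def run_with_escalation(
    llm_fn: Callable[[str], str],
    parse_fn: Callable[[str], Any],
    original_prompt: str,
    format_instructions: str = "",
    last_response: str = "",
    max_retries: int = 3,
) -> Dict[str, Any]:
    errors: List[str] = []
    for attempt in range(max_retries):
        # Later attempts reuse the last (most simplified) strategy.
        strategy = ESCALATION[min(attempt, len(ESCALATION) - 1)]
        response = llm_fn(
            build_retry_prompt(strategy, original_prompt, format_instructions, last_response)
        )
        try:
            parse_fn(response)  # parse_fn raises on failure
        except Exception as exc:
            errors.append(f"Attempt {attempt + 1}: {exc}")
            last_response = response
            continue
        return {"success": True, "response": response, "attempts": attempt + 1,
                "strategy_used": strategy, "errors": errors}
    return {"success": False, "response": "", "attempts": max_retries,
            "strategy_used": "GRACEFUL_FAIL", "errors": errors}
```

Feeding the failed response back into the next prompt gives the model a concrete example of what not to repeat, at the cost of a longer prompt.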
Returns: RetryResult
| Property | Type | Description |
|---|---|---|
| `.success` | `bool` | True if any attempt succeeded. |
| `.response` | `str` | The raw LLM response from the winning attempt. |
| `.attempts` | `int` | Total number of actual LLM calls made. |
| `.strategy_used` | `RetryStrategy \| None` | The strategy that succeeded. |
| `.errors` | `List[str]` | Error messages from each failed attempt. |
| `.total_duration_ms` | `float` | Total wall-clock time in milliseconds. |
Example:
import json
handler = RetryHandler(llm_fn=my_llm, max_retries=3)
result = handler.run(
    original_prompt="What is the capital of France?",
    parse_fn=lambda t: json.loads(t),
    format_instructions='Return JSON: {"answer": "..."}',
    last_error="No JSON found",
    last_response="The capital of France is Paris.",
)
if result.success:
    parsed = json.loads(result.response)
RetryResult.as_error_payload()
def as_error_payload(original_query: str = "") -> Dict[str, Any]
Purpose: Converts a failed RetryResult into a structured error dictionary, useful for consistent error handling at the application level.
| Parameter | Type | Description |
|---|---|---|
| `original_query` | `str` | The original user query (for context in the error payload). |
Returns: Dict[str, Any]
{
    "error": True,
    "message": "Failed to obtain a valid response after all retries",
    "attempts": 3,
    "original_query": "...",
    "errors": ["Attempt 1 ...", "Attempt 2 ...", ...],
}
graceful_fallback()
def graceful_fallback(
    raw_text: str,
    query: str = "",
) -> Dict[str, Any]
Purpose: The final safety net. When all parsing, fixing, and retry attempts fail, this function returns a structured, always-safe dict that guarantees the pipeline never returns None or raises an unhandled exception.
| Parameter | Type | Description |
|---|---|---|
| `raw_text` | `str` | The original text that could not be parsed. |
| `query` | `str` | The original user query (for debugging). |
Returns: Dict[str, Any] — always contains:
{
    "answer": "<raw_text or error message>",
    "sources": [],
    "confidence": 0.0,
    "_parse_error": True, # Flag to distinguish from real answers
    "_raw": "<first 500 chars of raw_text>",
    "_query": "<first 200 chars of query>",
}
Example:
fallback = graceful_fallback(broken_text, query="What is the capital of France?")
# Always safe: never returns None, never raises
8. Schemas & Data Types
AnswerSchema
Standard schema for RAG pipeline answers.
class AnswerSchema(BaseModel):
    answer: str                # The answer text (required)
    sources: List[str]         # Source references (default: [])
    confidence: float          # Confidence score 0.0–1.0 (default: 1.0, auto-clamped)
    reasoning: Optional[str]   # Optional chain-of-thought (default: None)
ToolCallSchema
Schema for LLM tool/function call outputs.
class ToolCallSchema(BaseModel):
    tool_name: str             # Name of the tool to invoke (required)
    arguments: Dict[str, Any]  # Tool arguments (default: {})
    thought: Optional[str]     # Optional reasoning before the call (default: None)
RetrievalResultSchema
Schema for a single retrieved document result.
class RetrievalResultSchema(BaseModel):
    content: str               # Document content (required)
    source: Optional[str]      # Source URL or ID (default: None)
    score: Optional[float]     # Relevance score 0.0–1.0 (default: None)
    metadata: Dict[str, Any]   # Additional metadata (default: {})
RankedAnswersSchema
Schema for multiple candidate answers with a designated best answer.
class RankedAnswersSchema(BaseModel):
    answers: List[AnswerSchema]  # List of candidates (required, min length: 1)
    best_index: int              # Index of the best answer (default: 0)
    @property
    def best(self) -> AnswerSchema: ...  # Direct access to the best answer
FieldSchema
Defines a single expected field in an LLM output for schema-less validation.
from dataclasses import dataclass, field
@dataclass
class FieldSchema:
    name: str                       # Field name
    description: str                # Human-readable description
    dtype: str = "str"              # Type: str | int | float | bool | list | dict
    required: bool = True           # Whether the field is required
    aliases: List[str] = field(default_factory=list)  # Alternative field names to match (a bare [] default is rejected by dataclasses)
    default: Any = None             # Default value if missing
    choices: Optional[List] = None  # Restricts value to an allowed set
9. Enums Reference
OutputFormat
| Value | Description |
|---|---|
| `JSON` | JSON object or array |
| `YAML` | YAML data |
| `CSV` | Tabular CSV data |
| `MARKDOWN_TABLE` | Pipe-delimited Markdown table |
| `NUMBERED_LIST` | Numbered list (1. / a.) |
| `BULLETED_LIST` | Bulleted list (* / - / •) |
| `KEY_VALUE` | Key: value pairs |
| `XML` | XML tag pairs |
| `TOOL_CALL` | Tool or function call invocation |
| `PLAIN_TEXT` | Plain unstructured text |
| `MIXED` | Mix of multiple formats |
| `UNKNOWN` | Could not be determined |
ParseMode
| Value | Description |
|---|---|
| `STRICT` | Raises ParseError on any failure, with no fallback |
| `LENIENT` | Auto-fixes minor issues, uses graceful fallback on failure |
| `SEMANTIC` | Uses LLM to extract meaning from freeform text |
| `TOOL_CALL` | Specialized mode for parsing tool/function call syntax |
FixStrategy
| Value | Description |
|---|---|
| `REGEX_REPAIR` | Regex-based fixes (quotes, commas, fences, bare keys) |
| `FIELD_INJECTION` | Injects missing fields with default values |
| `FALLBACK_PARSE` | Parses as key-value and re-serializes as JSON |
| `LLM_REFORMAT` | Asks the LLM to reformat its own output |
| `NONE` | No fix was applied or possible |
RetryStrategy
| Value | Description |
|---|---|
| `STRICT_FORMAT` | Adds explicit format instructions to the retry prompt |
| `JSON_STRICT` | Forces JSON-only output with a schema example |
| `SIMPLIFIED` | Strips the prompt down to a minimal question |
| `GRACEFUL_FAIL` | Returns a structured error payload |
ValidationStatus
| Value | Description |
|---|---|
| `PASSED` | Check passed |
| `FAILED` | Check failed and is treated as a hard error |
| `WARNING` | Check flagged a concern but did not fail |
| `SKIPPED` | Check was not applicable (e.g. field is optional and absent) |
10. Complete Usage Examples
Basic Usage
from fennec_community.output_parser import OutputParser, AnswerSchema, ParseMode
parser = OutputParser(
    schema=AnswerSchema,
    mode=ParseMode.LENIENT,
    enable_safety=True,
)
result = parser.parse('{"answer": "Paris", "confidence": 0.95, "sources": ["Wikipedia"]}')
if result.ok:
    answer: AnswerSchema = result.data
    print(answer.answer) # Paris
    print(answer.confidence) # 0.95
    print(result.trace.summary())
With LLM and Retry
from fennec_community.output_parser import create_answer_parser, ParseError
def my_llm(prompt: str) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
parser = create_answer_parser(
    llm_fn=my_llm,
    max_retries=3,
    strict=True,
)
try:
    result = parser.parse(raw_llm_output)
    answer = result.data
except ParseError as e:
    print("Parse failed:", e)
    print("Trace:", e.trace.summary() if e.trace else "N/A")
Custom Validation Rules
from fennec_community.output_parser import OutputValidator, ValidationRule, FieldSchema, ValidationStatus
validator = OutputValidator(
    fields=[
        FieldSchema("answer", "The answer text", dtype="str", required=True),
        FieldSchema("confidence", "Confidence score", dtype="float", required=True),
    ],
    rules=[
        ValidationRule(
            name="min_confidence",
            predicate=lambda d: d.get("confidence", 0) >= 0.3,
            message="Confidence is too low (below 30%)",
            field="confidence",
            severity=ValidationStatus.WARNING,
        ),
        ValidationRule(
            name="answer_not_empty",
            predicate=lambda d: bool(d.get("answer", "").strip()),
            message="Answer field must not be empty",
            field="answer",
        ),
    ],
)
results = validator.validate(parsed_data, raw_text=raw_text)
# .passed is True only for PASSED, so this list also includes WARNING and SKIPPED results
failures = [r for r in results if not r.passed]
Format Detection Standalone
from fennec_community.output_parser import FormatDetector, OutputFormat
detector = FormatDetector()
# Single best format
fmt = detector.detect("| Name | Age |\n|------|-----|\n| Ali | 30 |")
# → OutputFormat.MARKDOWN_TABLE
# With confidence score
fmt, confidence = detector.detect_with_confidence(text)
if confidence < 0.5:
    print("Ambiguous format — consider forcing expected_format in the parser")
# Full ranking for debugging
for candidate in detector.rank(text):
    print(f"{candidate.format.value}: {candidate.confidence:.2f} — {candidate.evidence}")
Manual Fix and Re-Parse
from fennec_community.output_parser import OutputFixer, OutputFormat
import json
fixer = OutputFixer(
    required_fields=["answer", "confidence"],
    field_defaults={"answer": "", "confidence": 0.5},
)
broken = "```json\n{answer: 'Paris', confidence: '0.9'}\n```"
fixed_text, strategy = fixer.fix(broken, OutputFormat.JSON)
print(strategy) # FixStrategy.REGEX_REPAIR
print(json.loads(fixed_text)) # {'answer': 'Paris', 'confidence': '0.9'} (confidence stays a string; the fixer does not coerce types here)
11. Production Notes
Safety Checks: When enable_safety=True, the validator automatically detects hallucination admission phrases, sensitive data leakage (credit card numbers, SSNs, email addresses, API keys, Bearer tokens), and prompt injection attempts. Any match produces a FAILED or WARNING validation result; it does not raise an exception by itself.
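A leakage check of this kind is typically a set of named regular expressions run against the raw text. The following is a minimal sketch assuming three simplified patterns; `LEAK_PATTERNS` and `find_leaks` are hypothetical names, and the module's real patterns are not documented here.

```python
import re
from typing import List

# Simplified, illustrative patterns; production checks need broader coverage.
LEAK_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{8,}=*"),
}


def find_leaks(raw_text: str) -> List[str]:
    """Return the names of all patterns that match somewhere in raw_text."""
    return [name for name, pattern in LEAK_PATTERNS.items() if pattern.search(raw_text)]
```

Each returned name could then be mapped to a FAILED or WARNING result, depending on how severe a given kind of leak is for the application.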
Caching: Parse results are stored in an in-memory dict keyed by MD5 hash of the input text. This benefits pipelines that process repeated outputs. Call clear_cache() when the schema or mode changes at runtime.
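The caching behaviour can be pictured as a dict keyed by the MD5 hex digest of the input text. The sketch below is an assumption about the general shape, not the module's code; the `ParseCache` class and its `hits` counter are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class ParseCache:
    """Wraps a parse function with an in-memory cache keyed by MD5 of the input."""

    def __init__(self, parse_fn: Callable[[str], Any]) -> None:
        self._parse_fn = parse_fn
        self._store: Dict[str, Any] = {}
        self.hits = 0

    @staticmethod
    def _key(text: str) -> str:
        # MD5 serves only as a fast fingerprint here, not as a security measure.
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def parse(self, text: str) -> Any:
        key = self._key(text)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Exceptions propagate before the store is written,
        # so only successful results are ever cached.
        result = self._parse_fn(text)
        self._store[key] = result
        return result

    def clear(self) -> None:
        self._store.clear()


cache = ParseCache(json.loads)
first = cache.parse('{"answer": "Paris"}')
second = cache.parse('{"answer": "Paris"}')  # served from the cache
```

Because the key depends only on the input text, a cached entry does not notice a changed schema or mode, which is why clearing the cache after such changes matters.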
Without Pydantic: The entire module functions without Pydantic installed. It transparently falls back to dataclasses for all schema types.
Without PyYAML: YAML parsing is disabled if pyyaml is not installed, but all other formats work normally.
ParseTrace: Every ParseResult includes a ParseTrace with a complete audit record: detected format, fix strategy applied, retry count, all validation results, errors, warnings, and total duration in milliseconds. Use .trace.summary() for a compact dict suitable for structured logging and observability pipelines.