Fennec Community community/output_parser.md

Output Parser Module

Purpose: A production-grade engine for parsing, validating, and fault-tolerantly handling raw LLM outputs.

Architecture Overview

The module operates as a multi-stage pipeline:

Raw LLM Text
     │
     ▼
FormatDetector     ← Auto-detects format (JSON / YAML / CSV / ...)
     │
     ▼
Format Parsers     ← Converts text into structured Python objects
     │
     ▼
OutputFixer        ← Repairs broken or incomplete outputs
     │
     ▼
OutputValidator    ← Validates correctness, types, and safety
     │
     ▼
RetryHandler       ← Re-prompts LLM with escalating instructions on failure
     │
     ▼
ParseResult        ← Final typed, audited result

Imports

from fennec_community.output_parser import (
    # Core
    OutputParser, ParseError,
    create_answer_parser, create_json_parser, create_tool_call_parser,

    # Schemas
    AnswerSchema, ToolCallSchema, RetrievalResultSchema,
    RankedAnswersSchema, FieldSchema,

    # Enums
    OutputFormat, ParseMode, FixStrategy, ValidationStatus,

    # Results & Tracing
    ParseResult, ParseTrace, ValidationResult,

    # Format Detection
    FormatDetector, FormatCandidate,

    # Validation
    OutputValidator, ValidationRule, build_answer_validator,

    # Fixing
    OutputFixer, build_answer_fixer,

    # Retry
    RetryHandler, RetryResult, RetryStrategy, graceful_fallback,
)

1. Core Class: `OutputParser`

Purpose: The central orchestrator — combines all pipeline stages into a single, unified interface.

`OutputParser.__init__()`

OutputParser(
    schema=None,
    fields=None,
    mode=ParseMode.LENIENT,
    expected_format=None,
    llm_fn=None,
    max_retries=2,
    enable_safety=True,
    enable_cache=True,
    original_prompt="",
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `schema` | `Type[BaseModel]` \| `None` | `None` | Pydantic model class. When provided, parsed dicts are automatically cast to this type. |
| `fields` | `List[FieldSchema]` \| `None` | `None` | Explicit field definitions as an alternative to a Pydantic schema. |
| `mode` | `ParseMode` | `LENIENT` | Strictness level: `STRICT` / `LENIENT` / `SEMANTIC` / `TOOL_CALL` |
| `expected_format` | `OutputFormat` \| `None` | `None` | Force a specific format, skipping auto-detection. |
| `llm_fn` | `Callable[[str], str]` \| `None` | `None` | An LLM callable used during fix and retry operations. |
| `max_retries` | `int` | `2` | Maximum number of LLM regeneration attempts on parse failure. |
| `enable_safety` | `bool` | `True` | Enables safety checks: hallucination markers, data leakage, prompt injection. |
| `enable_cache` | `bool` | `True` | Caches successful parse results in memory (keyed by MD5 hash of input). |
| `original_prompt` | `str` | `""` | The original user prompt, embedded in retry prompts for context. |
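The `enable_cache` row describes an in-memory cache keyed by the MD5 hash of the input. The sketch below shows that idea in isolation. `Md5ParseCache` and `get_or_parse` are hypothetical names, not part of fennec_community. Matching the table's wording, only successful parses are stored.

```python
import hashlib
from typing import Any, Callable, Dict


class Md5ParseCache:
    """Hypothetical in-memory cache of successful parses, keyed by MD5 of the raw text."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    @staticmethod
    def key(text: str) -> str:
        # MD5 serves as a fast fingerprint here, not as a security measure.
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def get_or_parse(self, text: str, parse_fn: Callable[[str], Any]) -> Any:
        k = self.key(text)
        if k in self._store:
            return self._store[k]
        result = parse_fn(text)  # if this raises, nothing is cached
        self._store[k] = result
        return result

    def clear(self) -> None:
        """Counterpart of OutputParser.clear_cache() in this sketch."""
        self._store.clear()
```

In this sketch the cached object is returned as-is, so a caller that mutates it also mutates the cached copy. Whether the library copies on read is not documented here.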

`OutputParser.parse()`

def parse(
    text: str,
    expected_format: Optional[OutputFormat] = None,
) -> ParseResult

Purpose: Parses raw LLM text and returns a fully validated, optionally typed ParseResult with a complete audit trail.

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | Raw LLM output string to parse. |
| `expected_format` | `OutputFormat` \| `None` | Override the detected format for this call only. |

Returns: ParseResult

| Property | Type | Description |
|---|---|---|
| `.data` | `Any` | The parsed, validated, typed output (dict / Pydantic instance / list / ...). |
| `.ok` | `bool` | `True` if parsing succeeded and data is not `None`. |
| `.trace` | `ParseTrace` | Full audit trail of every pipeline stage. |
| `.raw` | `str` | The original, unmodified LLM text. |

Raises: ParseError in ParseMode.STRICT if parsing fails after all recovery attempts.

Example:

parser = OutputParser()
result = parser.parse('{"answer": "Paris", "confidence": 0.98}')

if result.ok:
    print(result.data)                      # {'answer': 'Paris', 'confidence': 0.98}
    print(result.trace.detected_format)     # OutputFormat.JSON
    print(result.trace.duration_ms)         # e.g. 1.23

`OutputParser.parse_typed()`

def parse_typed(
    text: str,
    schema: Type[T],
    expected_format: Optional[OutputFormat] = None,
) -> T

Purpose: Parses text and returns a typed instance of schema directly. This is a convenience shorthand for parser.parse(text).as_typed(schema).

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | Raw LLM output string. |
| `schema` | `Type[T]` | The class to cast the parsed data into. |
| `expected_format` | `OutputFormat` \| `None` | Optional format override. |

Returns: An instance of type T (Pydantic model or dataclass).

Raises: ParseError if parsing fails; TypeError if the cast to schema fails.

Example:

from fennec_community.output_parser import AnswerSchema

parser = OutputParser(schema=AnswerSchema)
answer: AnswerSchema = parser.parse_typed(raw_text, AnswerSchema)

print(answer.answer)      # "Paris"
print(answer.confidence)  # 0.98
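`parse_typed()` is described as shorthand for `parse(text).as_typed(schema)` that raises `TypeError` when the cast fails. Below is a minimal, standalone sketch of such a cast for dataclass schemas. Several parts are assumptions:

- `as_typed_sketch` and the `Answer` stand-in are hypothetical names.
- Undeclared keys are dropped rather than rejected.
- Pydantic models, which raise their own `ValidationError`, are out of scope.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, List, Type, TypeVar

T = TypeVar("T")


@dataclass
class Answer:  # hypothetical stand-in for AnswerSchema
    answer: str
    sources: List[str] = field(default_factory=list)
    confidence: float = 1.0


def as_typed_sketch(data: Any, schema: Type[T]) -> T:
    """Cast parsed data into a dataclass `schema`, raising TypeError on failure."""
    if not is_dataclass(schema):
        raise TypeError(f"{schema!r} is not a dataclass")
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict, got {type(data).__name__}")
    # Assumption: keys the schema does not declare are dropped rather than rejected.
    known = {f.name for f in fields(schema)}
    kwargs = {k: v for k, v in data.items() if k in known}
    try:
        return schema(**kwargs)
    except TypeError as exc:  # e.g. a required field such as "answer" is missing
        raise TypeError(f"cannot cast to {schema.__name__}: {exc}") from exc
```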

`OutputParser.get_format_instructions()`

def get_format_instructions() -> str

Purpose: Generates a format instruction string ready to be embedded directly into a System Prompt or User Prompt, guiding the LLM to produce output in the expected structure.

Takes no parameters.

Returns: str — a prompt-ready instruction block.

Behavior:

- If schema is set → generates a full JSON Schema with an example
- If fields are set → generates a key: <type> list with descriptions
- If neither → returns a generic JSON example

Example:

parser = OutputParser(schema=AnswerSchema)
instructions = parser.get_format_instructions()
# "Return ONLY valid JSON matching this schema:\n```json\n{...}\n```\nDo not include any explanation..."

prompt = f"Answer the following question.\n{instructions}\nQuestion: What is the capital of France?"
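When only `fields` are configured, `get_format_instructions()` is said to emit a `key: <type>` list with descriptions. The sketch below builds a block of that shape. The exact wording and the `PromptField` stand-in for `FieldSchema` are hypothetical, so compare them against the real output before relying on the format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptField:  # hypothetical stand-in for FieldSchema
    name: str
    description: str
    dtype: str = "str"
    required: bool = True


def format_instructions_sketch(specs: List[PromptField]) -> str:
    """Build a prompt-ready block listing each expected key as `name: <type>`."""
    lines = ["Return ONLY a JSON object with these keys:"]
    for spec in specs:
        marker = "" if spec.required else ", optional"
        lines.append(f"- {spec.name}: <{spec.dtype}{marker}> {spec.description}")
    lines.append("Do not include any explanation outside the JSON.")
    return "\n".join(lines)
```

The returned string drops into a prompt the same way as `instructions` in the example above.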

`OutputParser.clear_cache()`

def clear_cache() -> None

Purpose: Clears all in-memory cached parse results. Useful when the schema or mode changes at runtime, or after long test sessions.

Takes no parameters. Returns nothing.

Example:

parser.clear_cache()

2. Factory Functions

Convenience functions that return a pre-configured OutputParser for the most common use cases.

`create_answer_parser()`

def create_answer_parser(
    llm_fn: Optional[Callable[[str], str]] = None,
    max_retries: int = 2,
    strict: bool = False,
) -> OutputParser

Purpose: Creates a parser pre-configured for standard RAG AnswerSchema outputs — includes full safety checks and schema validation with zero additional setup.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm_fn` | `Callable[[str], str]` \| `None` | `None` | LLM callable for retry on failure. |
| `max_retries` | `int` | `2` | Number of regeneration attempts. |
| `strict` | `bool` | `False` | `True` = raise `ParseError` on failure; `False` = use graceful fallback. |

Returns: A ready-to-use OutputParser targeting AnswerSchema.

Example:

parser = create_answer_parser(llm_fn=my_llm, strict=True)
result = parser.parse(raw_llm_output)
answer: AnswerSchema = result.data
print(answer.answer, answer.sources, answer.confidence)

`create_json_parser()`

def create_json_parser(
    schema: Optional[Type] = None,
    llm_fn: Optional[Callable[[str], str]] = None,
    strict: bool = False,
) -> OutputParser

Purpose: Creates a parser dedicated to JSON outputs with optional Pydantic schema enforcement. Ideal when the LLM is expected to return pure JSON.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `schema` | `Type` \| `None` | `None` | Pydantic model to cast the parsed result into automatically. |
| `llm_fn` | `Callable[[str], str]` \| `None` | `None` | LLM callable for retry on failure. |
| `strict` | `bool` | `False` | `True` = raise `ParseError` on failure; `False` = use graceful fallback. |

Returns: An OutputParser locked to OutputFormat.JSON.

Example:

from pydantic import BaseModel

class ProductSchema(BaseModel):
    name: str
    price: float
    in_stock: bool

parser = create_json_parser(schema=ProductSchema, strict=True)
product: ProductSchema = parser.parse_typed(raw_text, ProductSchema)

`create_tool_call_parser()`

def create_tool_call_parser(
    llm_fn: Optional[Callable[[str], str]] = None,
) -> OutputParser

Purpose: Creates a parser specialized for Tool Call outputs. Supports both OpenAI function-call JSON format and ReAct-style Action: ... / Action Input: ... format. Safety checks are disabled by default as tool environments are trusted.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm_fn` | `Callable[[str], str]` \| `None` | `None` | LLM callable for a single retry attempt. |

Returns: An OutputParser configured with ParseMode.TOOL_CALL and OutputFormat.TOOL_CALL.

Example:

parser = create_tool_call_parser()
result = parser.parse('Action: search\nAction Input: {"query": "weather in Cairo"}')
# result.data → {"tool_name": "search", "arguments": {"query": "weather in Cairo"}, "thought": None}
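For ReAct-style input, the parser has to turn `Action:` / `Action Input:` lines into the `tool_name` / `arguments` / `thought` dict shown above. A regex-based sketch of that step might look like this. It assumes an optional `Thought:` line and JSON arguments. `parse_react` is a hypothetical helper, not the library's parser.

```python
import json
import re
from typing import Any, Dict, Optional

_ACTION = re.compile(r"^\s*Action\s*:\s*(?P<tool>.+?)\s*$", re.MULTILINE)
_INPUT = re.compile(r"^\s*Action Input\s*:\s*(?P<args>.*)", re.MULTILINE | re.DOTALL)
_THOUGHT = re.compile(r"^\s*Thought\s*:\s*(?P<thought>.+?)\s*$", re.MULTILINE)


def parse_react(text: str) -> Optional[Dict[str, Any]]:
    """Extract a ReAct-style tool call, or return None when no Action line exists."""
    action = _ACTION.search(text)
    if action is None:
        return None
    arguments: Dict[str, Any] = {}
    action_input = _INPUT.search(text)
    if action_input:
        raw_args = action_input.group("args").strip()
        if raw_args:
            try:
                # raw_decode reads the first JSON value and ignores trailing lines.
                value, _ = json.JSONDecoder().raw_decode(raw_args)
                arguments = value if isinstance(value, dict) else {"input": value}
            except json.JSONDecodeError:
                arguments = {"input": raw_args}  # assumption: wrap non-JSON input
    thought = _THOUGHT.search(text)
    return {
        "tool_name": action.group("tool"),
        "arguments": arguments,
        "thought": thought.group("thought") if thought else None,
    }
```

Using `raw_decode` lets the sketch tolerate a trailing `Observation:` line after the arguments. Wrapping plain-text input as `{"input": ...}` is an assumption, not documented behaviour.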

3. Format Detection: `FormatDetector`

Purpose: Analyses raw LLM text using multi-signal heuristics to determine its format before parsing begins.

`FormatDetector.detect()`

def detect(text: str) -> OutputFormat

Purpose: Returns the single most likely OutputFormat for the given text.

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The raw LLM output to analyse. |

Returns: OutputFormat enum value.

Example:

detector = FormatDetector()
detector.detect('{"key": "value"}')        # → OutputFormat.JSON
detector.detect("1. First\n2. Second")     # → OutputFormat.NUMBERED_LIST
detector.detect("| A | B |\n|---|---|\n")  # → OutputFormat.MARKDOWN_TABLE

`FormatDetector.rank()`

def rank(text: str) -> List[FormatCandidate]

Purpose: Returns a ranked list of all plausible formats with confidence scores for each — invaluable for debugging ambiguous outputs.

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The raw text to analyse. |

Returns: List[FormatCandidate] — sorted descending by confidence. Each FormatCandidate contains:

| Property | Type | Description |
|---|---|---|
| `.format` | `OutputFormat` | The detected format. |
| `.confidence` | `float` | Confidence score from `0.0` to `1.0`. |
| `.evidence` | `str` | Human-readable reason for this score. |

Example:

detector = FormatDetector()
candidates = detector.rank("name: John\nage: 30\ncity: Cairo")
for c in candidates:
    print(f"{c.format.value}: {c.confidence:.2f} — {c.evidence}")
# key_value:  0.60 — 3 key: value pairs
# yaml:       0.50 — 3 key: value lines
# plain_text: 0.25 — default text fallback
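`FormatDetector` is described as using multi-signal heuristics that produce ranked candidates with evidence strings. The following is a deliberately small sketch of that ranking idea, with four signals plus a plain-text fallback. The scores, the format names, and the `rank_formats` / `detect_with_confidence` helpers are all illustrative and will not reproduce the library's numbers.

```python
import json
import re
from dataclasses import dataclass
from typing import List, Tuple

_NUMBERED = re.compile(r"^\s*(?:\d+|[a-z])[.)]\s+\S")
_KEY_VALUE = re.compile(r"^\s*[A-Za-z_][\w ]*:\s+\S")


@dataclass
class Candidate:  # hypothetical stand-in for FormatCandidate
    fmt: str
    confidence: float
    evidence: str


def _is_table_separator(line: str) -> bool:
    s = line.strip()
    return "|" in s and "-" in s and set(s) <= set("|-: ")


def rank_formats(text: str) -> List[Candidate]:
    stripped = text.strip()
    lines = [ln for ln in stripped.splitlines() if ln.strip()]
    found = [Candidate("plain_text", 0.25, "default text fallback")]

    # Signal 1: does the whole text parse as a JSON object or array?
    try:
        if isinstance(json.loads(stripped), (dict, list)):
            found.append(Candidate("json", 0.95, "parses as a JSON object or array"))
    except ValueError:
        if stripped.startswith(("{", "[")):
            found.append(Candidate("json", 0.5, "opens like JSON but does not parse"))

    # Signal 2: a Markdown separator row such as |---|---|
    if len(lines) >= 2 and any(_is_table_separator(ln) for ln in lines):
        found.append(Candidate("markdown_table", 0.9, "pipe rows with a --- separator"))

    # Signals 3 and 4: share of lines that look like list items or key: value pairs
    if lines:
        numbered = sum(bool(_NUMBERED.match(ln)) for ln in lines)
        if numbered:
            score = 0.4 + 0.5 * numbered / len(lines)
            found.append(Candidate("numbered_list", score, f"{numbered} numbered lines"))
        pairs = sum(bool(_KEY_VALUE.match(ln)) for ln in lines)
        if pairs:
            score = 0.3 + 0.5 * pairs / len(lines)
            found.append(Candidate("key_value", score, f"{pairs} key: value pairs"))

    # sorted() is stable, so ties keep the insertion order above.
    return sorted(found, key=lambda c: c.confidence, reverse=True)


def detect_with_confidence(text: str) -> Tuple[str, float]:
    best = rank_formats(text)[0]
    return best.fmt, best.confidence
```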

`FormatDetector.detect_with_confidence()`

def detect_with_confidence(text: str) -> Tuple[OutputFormat, float]

Purpose: Returns the best format alongside its confidence score as a single tuple — a practical shorthand when you need both values together.

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The raw text to analyse. |

Returns: Tuple[OutputFormat, float] — (best_format, confidence_score)

Example:

detector = FormatDetector()
fmt, confidence = detector.detect_with_confidence(text)

if confidence < 0.5:
    print("Warning: format is ambiguous, consider using LENIENT mode")

4. Validation: `OutputValidator`

Purpose: Validates parsed outputs across four sequential layers to ensure correctness, type safety, business rules, and security.

`OutputValidator.__init__()`

OutputValidator(
    fields=None,
    rules=None,
    pydantic_model=None,
    enable_safety=True,
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `fields` | `List[FieldSchema]` \| `None` | `None` | Fields to validate for presence and type. |
| `rules` | `List[ValidationRule]` \| `None` | `None` | Custom business-rule predicates. |
| `pydantic_model` | `Type` \| `None` | `None` | Pydantic model for structural validation (Layer 4). |
| `enable_safety` | `bool` | `True` | Enables hallucination, data leakage, and prompt injection checks. |

The four validation layers:

| Layer | Name | What It Checks |
|---|---|---|
| 1 | Schema Completeness | Required fields present with correct types |
| 2 | Business Rules | Custom `ValidationRule` predicates |
| 3 | Safety Checks | Hallucination markers, PII/credential leakage, prompt injection |
| 4 | Pydantic Validation | Full structural validation against a Pydantic model |
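Layer 1 (Schema Completeness) checks that required fields are present with the right types. A standalone sketch is given below. It makes these assumptions:

- Aliases are tried after the canonical name.
- An optional field that is absent yields `SKIPPED`.
- An `int` is acceptable where `float` is expected.
- Statuses are plain strings rather than `ValidationStatus` members.
- `FieldSpec` is a hypothetical stand-in for `FieldSchema`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

_DTYPES = {"str": str, "int": int, "float": (int, float), "bool": bool, "list": list, "dict": dict}


@dataclass
class FieldSpec:  # hypothetical stand-in for FieldSchema
    name: str
    dtype: str = "str"
    required: bool = True
    aliases: List[str] = field(default_factory=list)


def check_completeness(data: Dict[str, Any], specs: List[FieldSpec]) -> List[Tuple[str, str, str]]:
    """Layer 1 sketch: return (status, field, message) for each expected field."""
    results = []
    for spec in specs:
        # Try the canonical name first, then each alias in order.
        key = next((k for k in [spec.name, *spec.aliases] if k in data), None)
        if key is None:
            status = "FAILED" if spec.required else "SKIPPED"
            results.append((status, spec.name, "missing"))
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        bool_as_number = isinstance(value, bool) and spec.dtype in ("int", "float")
        if isinstance(value, _DTYPES[spec.dtype]) and not bool_as_number:
            results.append(("PASSED", spec.name, f"found as {key!r}"))
        else:
            message = f"expected {spec.dtype}, got {type(value).__name__}"
            results.append(("FAILED", spec.name, message))
    return results
```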

`OutputValidator.validate()`

def validate(
    data: Any,
    raw_text: str = "",
    trace: Optional[ParseTrace] = None,
) -> List[ValidationResult]

Purpose: Runs all validation layers and returns detailed per-check results. If a trace is passed, results are automatically appended to it.

| Parameter | Type | Description |
|---|---|---|
| `data` | `Any` | The parsed data to validate (typically a `dict`). |
| `raw_text` | `str` | The original LLM text (used for safety checks). |
| `trace` | `ParseTrace` \| `None` | If provided, validation results are added to the trace. |

Returns: List[ValidationResult]. Each result contains:

| Property | Type | Description |
|---|---|---|
| `.status` | `ValidationStatus` | `PASSED` / `FAILED` / `WARNING` / `SKIPPED` |
| `.field` | `str` \| `None` | The associated field name (if applicable). |
| `.message` | `str` | Error or warning message. |
| `.value` | `Any` | The value that caused the issue. |
| `.passed` | `bool` | `True` if `status` is `PASSED`. |

Example:

validator = OutputValidator(
    fields=[FieldSchema("answer", "The answer text", dtype="str", required=True)],
    enable_safety=True,
)
results = validator.validate({"answer": "Paris"}, raw_text=raw_output)
failures = [r for r in results if not r.passed]

`OutputValidator.is_valid()`

def is_valid(
    data: Any,
    raw_text: str = "",
) -> bool

Purpose: A quick pass/fail check — returns True only if all validation layers pass (warnings and skipped checks are not treated as failures).

| Parameter | Type | Description |
|---|---|---|
| `data` | `Any` | The parsed data. |
| `raw_text` | `str` | Original LLM text for safety checks. |

Returns: bool

Example:

if not validator.is_valid(parsed_data, raw_text):
    raise ValueError("Output did not pass validation")

`OutputValidator.get_failures()`

def get_failures(
    data: Any,
    raw_text: str = "",
) -> List[ValidationResult]

Purpose: Returns only the failed validation results — ideal for structured logging and error reporting.

| Parameter | Type | Description |
|---|---|---|
| `data` | `Any` | The parsed data. |
| `raw_text` | `str` | Original LLM text. |

Returns: List[ValidationResult] — empty list if all checks pass.

Example:

import logging

logger = logging.getLogger(__name__)

failures = validator.get_failures(data, raw_text)
if failures:
    for f in failures:
        logger.error("[%s] %s", f.field, f.message)

`build_answer_validator()`

def build_answer_validator(enable_safety: bool = True) -> OutputValidator

Purpose: Factory that returns a pre-configured OutputValidator for AnswerSchema outputs — validates answer presence, confidence range [0.0, 1.0], and runs all safety checks.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `enable_safety` | `bool` | `True` | Enable or disable safety checks. |

Returns: A ready-to-use OutputValidator.

Example:

validator = build_answer_validator(enable_safety=True)
is_ok = validator.is_valid(parsed_data, raw_text)

5. Custom Rules: `ValidationRule`

Purpose: Defines a named, callable validation rule with configurable severity.

`ValidationRule.__init__()`

ValidationRule(
    name: str,
    predicate: Callable[[Any], bool],
    message: str,
    field: Optional[str] = None,
    severity: ValidationStatus = ValidationStatus.FAILED,
)

| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | Rule identifier (used in logs). |
| `predicate` | `Callable[[Any], bool]` | A function that receives the data and returns `True` if the check passes. |
| `message` | `str` | Error message shown on failure. |
| `field` | `str` \| `None` | The associated field name (optional, for context). |
| `severity` | `ValidationStatus` | Failure severity: `FAILED` (default) or `WARNING`. |

`ValidationRule.check()`

def check(data: Any) -> ValidationResult

Purpose: Applies the predicate to the given data and returns a ValidationResult.

| Parameter | Type | Description |
|---|---|---|
| `data` | `Any` | The data to validate. |

Returns: ValidationResult

Example:

rule = ValidationRule(
    name="answer_min_length",
    predicate=lambda d: len(d.get("answer", "")) >= 10,
    message="Answer is too short (less than 10 characters)",
    field="answer",
    severity=ValidationStatus.WARNING,
)
result = rule.check({"answer": "Yes"})
# result.status → ValidationStatus.WARNING
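The rule semantics above (a predicate, a message, and a `FAILED` or `WARNING` severity) can be sketched without the library. `Rule`, `Status`, and `is_valid_sketch` are hypothetical names. Treating a predicate that raises as a failed check is an assumption. Letting warnings pass follows the documented behaviour of `is_valid()`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List, Optional


class Status(Enum):  # hypothetical mirror of ValidationStatus
    PASSED = "passed"
    FAILED = "failed"
    WARNING = "warning"


@dataclass
class Rule:  # hypothetical stand-in for ValidationRule
    name: str
    predicate: Callable[[Any], bool]
    message: str
    field: Optional[str] = None
    severity: Status = Status.FAILED

    def check(self, data: Any) -> Status:
        try:
            ok = bool(self.predicate(data))
        except Exception:
            ok = False  # assumption: a predicate that crashes counts as a failed check
        return Status.PASSED if ok else self.severity


def is_valid_sketch(data: Any, rules: List[Rule]) -> bool:
    # Only FAILED results block; WARNING results are reported but tolerated.
    return all(rule.check(data) is not Status.FAILED for rule in rules)
```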

6. Fault Tolerance: `OutputFixer`

Purpose: Repairs malformed, incomplete, or broken LLM outputs using a hierarchy of four progressively deeper strategies.

`OutputFixer.__init__()`

OutputFixer(
    required_fields=None,
    field_defaults=None,
    llm_fn=None,
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `required_fields` | `List[str]` \| `None` | `None` | Field names that must be present in the output. |
| `field_defaults` | `Dict[str, Any]` \| `None` | `None` | Default values injected when a required field is missing. |
| `llm_fn` | `Callable[[str], str]` \| `None` | `None` | LLM callable, required for the `LLM_REFORMAT` strategy. |

`OutputFixer.fix()`

def fix(
    text: str,
    expected_format: OutputFormat = OutputFormat.JSON,
) -> Tuple[str, FixStrategy]

Purpose: Attempts to repair a broken text string using four escalating strategies, returning the repaired text and the strategy that succeeded. The caller should re-parse the returned text.

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The malformed or unparseable text. |
| `expected_format` | `OutputFormat` | The expected format, which influences which repair logic is applied. |

Returns: Tuple[str, FixStrategy]

- str: the repaired text (must be re-parsed by the caller)
- FixStrategy: the strategy used, or FixStrategy.NONE if all strategies failed

Repair strategies applied in order:

| Strategy | Description |
|---|---|
| `REGEX_REPAIR` | Strips markdown fences, fixes trailing commas, converts single quotes, quotes bare keys |
| `FIELD_INJECTION` | Injects missing required fields with their default values |
| `FALLBACK_PARSE` | Parses as key-value pairs and re-serializes as JSON |
| `LLM_REFORMAT` | Sends a reformat request to the LLM (requires `llm_fn`) |

Example:

fixer = OutputFixer(
    required_fields=["answer"],
    field_defaults={"answer": "", "confidence": 0.5},
)
fixed_text, strategy = fixer.fix("```json\n{answer: 'Paris'}\n```", OutputFormat.JSON)
# → ('{"answer": "Paris"}', FixStrategy.REGEX_REPAIR)
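The `REGEX_REPAIR` row lists four textual fixes. A compact sketch applies them in a fixed order and reproduces the example above. `regex_repair` is a hypothetical helper. Because it works on raw text, it can mangle string values that themselves contain quotes, colons, or commas.

```python
import json
import re

_FENCE = re.compile(r"^\s*```[\w-]*\s*\n?(.*?)\n?\s*```\s*$", re.DOTALL)
_SINGLE_QUOTED = re.compile(r"'([^'\\]*(?:\\.[^'\\]*)*)'")
_BARE_KEY = re.compile(r"([{,]\s*)([A-Za-z_][\w-]*)(\s*:)")
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def regex_repair(text: str) -> str:
    """Best-effort JSON repair; naive about quotes and colons inside string values."""
    s = text.strip()
    fenced = _FENCE.match(s)
    if fenced:  # 1. strip a ```json ... ``` wrapper
        s = fenced.group(1).strip()
    s = _SINGLE_QUOTED.sub(lambda m: json.dumps(m.group(1)), s)  # 2. 'x' -> "x"
    s = _BARE_KEY.sub(r'\1"\2"\3', s)                              # 3. {key: -> {"key":
    s = _TRAILING_COMMA.sub(r"\1", s)                              # 4. [1,] -> [1]
    return s
```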

`OutputFixer.fix_dict()`

def fix_dict(
    data: Dict[str, Any],
) -> Tuple[Dict[str, Any], FixStrategy]

Purpose: Repairs a partially parsed dict by injecting missing required fields directly — faster than fix() when structured data is already available.

| Parameter | Type | Description |
|---|---|---|
| `data` | `Dict[str, Any]` | The incomplete parsed dict. |

Returns: Tuple[Dict[str, Any], FixStrategy]

- A completed dict with injected fields
- FixStrategy.FIELD_INJECTION if fields were injected; FixStrategy.NONE if nothing was missing

Example:

fixer = OutputFixer(required_fields=["answer", "sources"])
fixed, strategy = fixer.fix_dict({"answer": "Paris"})
# → ({"answer": "Paris", "sources": None}, FixStrategy.FIELD_INJECTION)
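`fix_dict()` injects missing required fields from `field_defaults` and falls back to `None`, as in the example above. This sketch makes two assumptions about how such a helper should behave. First, it returns a boolean instead of a `FixStrategy`. Second, it deep-copies each default, so a shared default such as `[]` is never aliased between results.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple


def inject_missing(
    data: Dict[str, Any],
    required_fields: List[str],
    field_defaults: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], bool]:
    """Return a copy of `data` with missing required fields filled in, plus an 'injected' flag."""
    defaults = field_defaults or {}
    fixed = dict(data)
    injected = False
    for name in required_fields:
        if name not in fixed:
            # deepcopy so a shared default such as [] is never aliased between results
            fixed[name] = copy.deepcopy(defaults.get(name))
            injected = True
    return fixed, injected
```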

`build_answer_fixer()`

def build_answer_fixer(llm_fn: Optional[Callable] = None) -> OutputFixer

Purpose: Factory that returns a pre-configured OutputFixer for AnswerSchema outputs — automatically injects answer: "", sources: [], and confidence: 0.5 for missing fields.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm_fn` | `Callable` \| `None` | `None` | LLM callable for the `LLM_REFORMAT` strategy. |

Returns: A ready-to-use OutputFixer.

7. Retry & Regeneration: `RetryHandler`

Purpose: Manages LLM regeneration when parsing fails, re-prompting with progressively stricter instructions until a valid output is obtained or the retry budget is exhausted.

`RetryHandler.__init__()`

RetryHandler(
    llm_fn: LLMCallable,
    max_retries: int = 3,
    backoff_seconds: float = 0.5,
    pydantic_schema: Optional[Type] = None,
    required_fields: Optional[List[str]] = None,
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm_fn` | `Callable[[str], str]` | required | The LLM callable to invoke on each retry. |
| `max_retries` | `int` | `3` | Maximum number of retry attempts. |
| `backoff_seconds` | `float` | `0.5` | Wait time between retries (multiplied by attempt index for linear backoff). |
| `pydantic_schema` | `Type` \| `None` | `None` | Used to generate a JSON schema example in retry prompts. |
| `required_fields` | `List[str]` \| `None` | `None` | Field names embedded in retry prompt examples. |

`RetryHandler.run()`

def run(
    original_prompt: str,
    parse_fn: Callable[[str], Any],
    format_instructions: str = "",
    last_error: str = "",
    last_response: str = "",
) -> RetryResult

Purpose: Executes a full retry cycle — builds an improved prompt → calls LLM → tests parseability → repeats until success or budget exhaustion.

| Parameter | Type | Description |
|---|---|---|
| `original_prompt` | `str` | The original user question or request. |
| `parse_fn` | `Callable[[str], Any]` | Parse function that raises an exception on failure. |
| `format_instructions` | `str` | Format hint string from `get_format_instructions()`. |
| `last_error` | `str` | Error message from the most recent failed parse. |
| `last_response` | `str` | The last raw LLM response that failed parsing. |

Retry strategies applied in order:

| Strategy | Description |
|---|---|
| `STRICT_FORMAT` | Adds explicit format instructions and the failed response to the prompt |
| `JSON_STRICT` | Forces JSON-only output with a full schema example |
| `SIMPLIFIED` | Strips the prompt down to a minimal question with a basic JSON example |
| `GRACEFUL_FAIL` | Returns a structured error payload |

Returns: RetryResult

| Property | Type | Description |
|---|---|---|
| `.success` | `bool` | `True` if any attempt succeeded. |
| `.response` | `str` | The raw LLM response from the winning attempt. |
| `.attempts` | `int` | Total number of actual LLM calls made. |
| `.strategy_used` | `RetryStrategy` \| `None` | The strategy that succeeded. |
| `.errors` | `List[str]` | Error messages from each failed attempt. |
| `.total_duration_ms` | `float` | Total wall-clock time in milliseconds. |

Example:

import json

handler = RetryHandler(llm_fn=my_llm, max_retries=3)
result = handler.run(
    original_prompt="What is the capital of France?",
    parse_fn=json.loads,
    format_instructions='Return JSON: {"answer": "..."}',
    last_error="No JSON found",
    last_response="The capital of France is Paris.",
)
if result.success:
    parsed = json.loads(result.response)
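The retry cycle described above can be sketched as a plain loop: escalating prompts, a parse test after each call, and linear backoff of `backoff_seconds × attempt`. Everything here is hypothetical, including the prompt wording, the `Outcome` stand-in for `RetryResult`, and the injectable `sleep` parameter. That parameter exists only so the loop can be tested without waiting. The `GRACEFUL_FAIL` step is left to the caller.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

STRATEGIES = ["STRICT_FORMAT", "JSON_STRICT", "SIMPLIFIED"]


@dataclass
class Outcome:  # hypothetical, loosely mirrors RetryResult
    success: bool = False
    response: str = ""
    attempts: int = 0
    strategy_used: Optional[str] = None
    errors: List[str] = field(default_factory=list)


def build_prompt(strategy: str, question: str, instructions: str, last: str) -> str:
    # Illustrative wording only; each strategy tightens the request a little more.
    if strategy == "STRICT_FORMAT":
        return f"{question}\n{instructions}\nYour previous reply could not be parsed:\n{last}"
    if strategy == "JSON_STRICT":
        return f"{question}\nReply with JSON only, no prose.\n{instructions}"
    return f'{question}\nReply exactly like: {{"answer": "..."}}'


def retry_sketch(
    llm_fn: Callable[[str], str],
    parse_fn: Callable[[str], Any],
    question: str,
    instructions: str = "",
    last_response: str = "",
    max_retries: int = 3,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    outcome = Outcome()
    for attempt in range(1, max_retries + 1):
        # Escalate through the strategies; extra attempts reuse the last one.
        strategy = STRATEGIES[min(attempt - 1, len(STRATEGIES) - 1)]
        response = llm_fn(build_prompt(strategy, question, instructions, last_response))
        outcome.attempts = attempt
        try:
            parse_fn(response)
        except Exception as exc:
            outcome.errors.append(f"Attempt {attempt} ({strategy}): {exc}")
            last_response = response
            if attempt < max_retries:
                sleep(backoff_seconds * attempt)  # linear backoff: 0.5 s, 1.0 s, ...
            continue
        outcome.success, outcome.response, outcome.strategy_used = True, response, strategy
        break
    return outcome  # on failure the caller applies its own GRACEFUL_FAIL handling
```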

`RetryResult.as_error_payload()`

def as_error_payload(original_query: str = "") -> Dict[str, Any]

Purpose: Converts a failed RetryResult into a structured error dictionary — useful for consistent error handling at the application level.

| Parameter | Type | Description |
|---|---|---|
| `original_query` | `str` | The original user query (for context in the error payload). |

Returns: Dict[str, Any]

{
    "error": True,
    "message": "Failed to obtain a valid response after all retries",
    "attempts": 3,
    "original_query": "...",
    "errors": ["Attempt 1 ...", "Attempt 2 ...", ...],
}

`graceful_fallback()`

def graceful_fallback(
    raw_text: str,
    query: str = "",
) -> Dict[str, Any]

Purpose: The final safety net — when all parsing, fixing, and retry attempts fail, this function returns a structured, always-safe dict that guarantees the pipeline never returns None or raises an unhandled exception.

| Parameter | Type | Description |
|---|---|---|
| `raw_text` | `str` | The original text that could not be parsed. |
| `query` | `str` | The original user query (for debugging). |

Returns: Dict[str, Any] — always contains:

{
    "answer": "<raw_text or error message>",
    "sources": [],
    "confidence": 0.0,
    "_parse_error": True,   # Flag to distinguish from real answers
    "_raw": "<first 500 chars of raw_text>",
    "_query": "<first 200 chars of query>",
}

Example:

fallback = graceful_fallback(broken_text, query="What is the capital of France?")
# Always safe — never returns None, never raises
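The fallback dict has a fixed shape, so a sketch is short. `graceful_fallback_sketch` is hypothetical. The placeholder message used when `raw_text` is empty is an assumption. The truncation limits of 500 and 200 characters come from the description above.

```python
from typing import Any, Dict


def graceful_fallback_sketch(raw_text: str, query: str = "") -> Dict[str, Any]:
    """Always return the same keys; None inputs are treated as empty strings."""
    text = (raw_text or "").strip()
    return {
        # Assumption: the raw text is surfaced as the answer when there is any.
        "answer": text if text else "Sorry, a valid response could not be produced.",
        "sources": [],
        "confidence": 0.0,
        "_parse_error": True,  # lets callers tell fallbacks from real answers
        "_raw": (raw_text or "")[:500],
        "_query": (query or "")[:200],
    }
```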

8. Schemas & Data Types

`AnswerSchema`

Standard schema for RAG pipeline answers.

class AnswerSchema(BaseModel):
    answer: str                   # The answer text (required)
    sources: List[str]            # Source references (default: [])
    confidence: float             # Confidence score 0.0–1.0 (default: 1.0, auto-clamped)
    reasoning: Optional[str]      # Optional chain-of-thought (default: None)
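The Production Notes state that the module falls back to dataclasses when Pydantic is missing. The `confidence` comment says the value is auto-clamped. Below is a dependency-free sketch of what such a fallback could look like. `AnswerFallback` is a hypothetical name. The sketch assumes clamping means `min(max(x, 0.0), 1.0)` and that numeric strings are coerced with `float()`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnswerFallback:  # hypothetical dataclass stand-in for AnswerSchema
    answer: str
    sources: List[str] = field(default_factory=list)
    confidence: float = 1.0
    reasoning: Optional[str] = None

    def __post_init__(self) -> None:
        # "auto-clamped": out-of-range scores are pulled back into [0.0, 1.0]
        self.confidence = min(max(float(self.confidence), 0.0), 1.0)
```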

`ToolCallSchema`

Schema for LLM tool/function call outputs.

class ToolCallSchema(BaseModel):
    tool_name: str                # Name of the tool to invoke (required)
    arguments: Dict[str, Any]     # Tool arguments (default: {})
    thought: Optional[str]        # Optional reasoning before the call (default: None)

`RetrievalResultSchema`

Schema for a single retrieved document result.

class RetrievalResultSchema(BaseModel):
    content: str                  # Document content (required)
    source: Optional[str]         # Source URL or ID (default: None)
    score: Optional[float]        # Relevance score 0.0–1.0 (default: None)
    metadata: Dict[str, Any]      # Additional metadata (default: {})

`RankedAnswersSchema`

Schema for multiple candidate answers with a designated best answer.

class RankedAnswersSchema(BaseModel):
    answers: List[AnswerSchema]   # List of candidates (required, min length: 1)
    best_index: int               # Index of the best answer (default: 0)

    @property
    def best(self) -> AnswerSchema:
        ...  # Direct access to the best answer

`FieldSchema`

Defines a single expected field in an LLM output for schema-less validation.

from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class FieldSchema:
    name: str                         # Field name
    description: str                  # Human-readable description
    dtype: str = "str"                # Type: str | int | float | bool | list | dict
    required: bool = True             # Whether the field is required
    aliases: List[str] = field(default_factory=list)  # Alternative field names to match
    default: Any = None               # Default value if missing
    choices: Optional[List] = None    # Restricts value to an allowed set

9. Enums Reference

`OutputFormat`

| Value | Description |
|---|---|
| `JSON` | JSON object or array |
| `YAML` | YAML data |
| `CSV` | Tabular CSV data |
| `MARKDOWN_TABLE` | Pipe-delimited Markdown table |
| `NUMBERED_LIST` | Numbered list (`1.` / `a.`) |
| `BULLETED_LIST` | Bulleted list (`*` / `-` / `•`) |
| `KEY_VALUE` | Key: value pairs |
| `XML` | XML tag pairs |
| `TOOL_CALL` | Tool or function call invocation |
| `PLAIN_TEXT` | Plain unstructured text |
| `MIXED` | Mix of multiple formats |
| `UNKNOWN` | Could not be determined |

`ParseMode`

| Value | Description |
|---|---|
| `STRICT` | Raises `ParseError` on any failure, with no fallback |
| `LENIENT` | Auto-fixes minor issues, uses graceful fallback on failure |
| `SEMANTIC` | Uses LLM to extract meaning from freeform text |
| `TOOL_CALL` | Specialized mode for parsing tool/function call syntax |

`FixStrategy`

| Value | Description |
|---|---|
| `REGEX_REPAIR` | Regex-based fixes (quotes, commas, fences, bare keys) |
| `FIELD_INJECTION` | Injects missing fields with default values |
| `FALLBACK_PARSE` | Parses as key-value and re-serializes as JSON |
| `LLM_REFORMAT` | Asks the LLM to reformat its own output |
| `NONE` | No fix was applied or possible |

`RetryStrategy`

| Value | Description |
|---|---|
| `STRICT_FORMAT` | Adds explicit format instructions to the retry prompt |
| `JSON_STRICT` | Forces JSON-only output with a schema example |
| `SIMPLIFIED` | Strips the prompt down to a minimal question |
| `GRACEFUL_FAIL` | Returns a structured error payload |

`ValidationStatus`

| Value | Description |
|---|---|
| `PASSED` | Check passed |
| `FAILED` | Check failed; treated as a hard error |
| `WARNING` | Check flagged a concern but did not fail |
| `SKIPPED` | Check was not applicable (e.g. field is optional and absent) |

10. Complete Usage Examples

Basic Usage

from fennec_community.output_parser import OutputParser, AnswerSchema, ParseMode

parser = OutputParser(
    schema=AnswerSchema,
    mode=ParseMode.LENIENT,
    enable_safety=True,
)

result = parser.parse('{"answer": "Paris", "confidence": 0.95, "sources": ["Wikipedia"]}')

if result.ok:
    answer: AnswerSchema = result.data
    print(answer.answer)        # Paris
    print(answer.confidence)    # 0.95
    print(result.trace.summary())

With LLM and Retry

from fennec_community.output_parser import create_answer_parser, ParseError
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_llm(prompt: str) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

parser = create_answer_parser(
    llm_fn=my_llm,
    max_retries=3,
    strict=True,
)

try:
    result = parser.parse(raw_llm_output)
    answer = result.data
except ParseError as e:
    print("Parse failed:", e)
    print("Trace:", e.trace.summary() if e.trace else "N/A")

Custom Validation Rules

from fennec_community.output_parser import OutputValidator, ValidationRule, FieldSchema, ValidationStatus

validator = OutputValidator(
    fields=[
        FieldSchema("answer", "The answer text", dtype="str", required=True),
        FieldSchema("confidence", "Confidence score", dtype="float", required=True),
    ],
    rules=[
        ValidationRule(
            name="min_confidence",
            predicate=lambda d: d.get("confidence", 0) >= 0.3,
            message="Confidence is too low (below 30%)",
            field="confidence",
            severity=ValidationStatus.WARNING,
        ),
        ValidationRule(
            name="answer_not_empty",
            predicate=lambda d: bool(d.get("answer", "").strip()),
            message="Answer field must not be empty",
            field="answer",
        ),
    ],
)

results = validator.validate(parsed_data, raw_text=raw_text)
failures = [r for r in results if not r.passed]

Format Detection Standalone

from fennec_community.output_parser import FormatDetector, OutputFormat

detector = FormatDetector()

# Single best format
text = "| Name | Age |\n|------|-----|\n| Ali | 30 |"
fmt = detector.detect(text)
# → OutputFormat.MARKDOWN_TABLE

# With confidence score
fmt, confidence = detector.detect_with_confidence(text)
if confidence < 0.5:
    print("Ambiguous format — consider forcing expected_format in the parser")

# Full ranking for debugging
for candidate in detector.rank(text):
    print(f"{candidate.format.value}: {candidate.confidence:.2f} — {candidate.evidence}")

Manual Fix and Re-Parse

from fennec_community.output_parser import OutputFixer, OutputFormat
import json

fixer = OutputFixer(
    required_fields=["answer", "confidence"],
    field_defaults={"answer": "", "confidence": 0.5},
)

broken = "```json\n{answer: 'Paris', confidence: '0.9'}\n```"
fixed_text, strategy = fixer.fix(broken, OutputFormat.JSON)

print(strategy)               # FixStrategy.REGEX_REPAIR
print(json.loads(fixed_text)) # {'answer': 'Paris', 'confidence': '0.9'}

11. Production Notes

Safety Checks: When enable_safety=True, the validator automatically checks for hallucination admission phrases, sensitive data leakage (credit card numbers, SSNs, email addresses, API keys, Bearer tokens), and prompt injection attempts. Any match produces a FAILED or WARNING validation result.
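Below is a rough, standalone sketch of pattern-based safety checks. The regular expressions, the Luhn filter for card numbers, and the phrase lists are illustrative assumptions, not the library's rules. API-key detection is omitted, and real PII detection needs far broader coverage than a handful of patterns.

```python
import re
from typing import List, Tuple

_LEAKS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{16,}=*", re.IGNORECASE),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}
_INJECTION = re.compile(r"ignore (?:all |any )?(?:previous|prior|above) instructions", re.IGNORECASE)
_HALLUCINATION = re.compile(r"\b(?:as an ai|i (?:do not|don't) have access|i cannot verify)\b",
                            re.IGNORECASE)


def _luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def safety_scan(text: str) -> List[Tuple[str, str]]:
    findings = []
    for label, pattern in _LEAKS.items():
        for match in pattern.finditer(text):
            if label == "credit_card":
                digits = re.sub(r"\D", "", match.group())
                if not (13 <= len(digits) <= 19 and _luhn_ok(digits)):
                    continue  # long numbers that fail the Luhn check are not flagged
            findings.append(("FAILED", f"possible {label} leakage"))
            break  # one finding per category is enough
    if _INJECTION.search(text):
        findings.append(("FAILED", "possible prompt injection"))
    if _HALLUCINATION.search(text):
        findings.append(("WARNING", "hallucination admission phrase"))
    return findings
```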

Caching: Parse results are stored in an in-memory dict keyed by MD5 hash of the input text. This benefits pipelines that process repeated outputs. Call clear_cache() when the schema or mode changes at runtime.

Without Pydantic: The entire module functions without Pydantic installed. It transparently falls back to dataclasses for all schema types.

Without PyYAML: YAML parsing is disabled if pyyaml is not installed, but all other formats work normally.

ParseTrace: Every ParseResult includes a ParseTrace with a complete audit record: detected format, fix strategy applied, retry count, all validation results, errors, warnings, and total duration in milliseconds. Use .trace.summary() for a compact dict suitable for structured logging and observability pipelines.
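To show what a compact, log-friendly summary of such a trace could contain, here is a hypothetical `TraceSketch` dataclass. The field names and the shape of `summary()` are assumptions. Check the actual `ParseTrace.summary()` output before wiring it into dashboards.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TraceSketch:  # hypothetical, loosely mirrors ParseTrace
    detected_format: Optional[str] = None
    fix_strategy: Optional[str] = None
    retry_count: int = 0
    validation: List[Dict[str, Any]] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    started: float = field(default_factory=time.perf_counter)
    duration_ms: float = 0.0

    def finish(self) -> None:
        self.duration_ms = (time.perf_counter() - self.started) * 1000.0

    def summary(self) -> Dict[str, Any]:
        # Counts instead of full lists keep each log record small and flat.
        failed = sum(1 for v in self.validation if v.get("status") == "FAILED")
        return {
            "format": self.detected_format,
            "fix": self.fix_strategy,
            "retries": self.retry_count,
            "validation_failures": failed,
            "errors": len(self.errors),
            "warnings": len(self.warnings),
            "duration_ms": round(self.duration_ms, 2),
        }
```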

