LLM Structured Outputs in 2026: Reliable JSON Without the Parser Nightmares
Getting a language model to return valid, schema-conforming JSON is harder than it looks. Here's what works in production, from native structured output APIs to library-level validation.
Anurag Verma
The first time you ask an LLM to return JSON, it usually works. The hundredth time, you find the edge cases: a trailing comma, a key spelled differently than you specified, a markdown code fence wrapped around what should be a raw object, or a model that decides to explain itself before the JSON starts.
These failures are annoying in development. In production, they silently break data pipelines, crash parsers, and cause user-facing errors that are hard to reproduce because the model won’t always make the same mistake twice.
Structured outputs have matured significantly in the past year. Here’s what actually works.
Why LLMs Struggle With Schema Adherence
Language models generate text token by token, sampling from a probability distribution at each step. There’s nothing in that process that intrinsically prevents a model from generating "price": "ten dollars" when your schema says "price" should be a number.
Instruction following helps: a prompt like "return valid JSON with this schema" gets a well-prompted GPT-4o or Claude to comply most of the time. But the process is probabilistic, and "most of the time" is a problem when downstream code has no fallback.
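To make the failure concrete, here is what a typical "almost JSON" response does to a naive parser (the response text is invented for illustration):

```python
import json

# A typical "almost JSON" LLM response: a preamble plus a markdown code fence.
fence = "```"
response_text = (
    "Here is the product data you asked for:\n"
    f"{fence}json\n"
    '{"name": "Blue Widget", "price": 14.99, "in_stock": true}\n'
    f"{fence}"
)

try:
    json.loads(response_text)
except json.JSONDecodeError as e:
    print(f"Parse failed: {e}")  # Expecting value: line 1 column 1 (char 0)
```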
The solutions fall into three approaches:
- Constrained generation (guaranteed valid output, provider-side)
- Validation with retry (application-level, works with any model)
- Schema-aware libraries (abstracts the retry logic)
Constrained Generation
Several providers now support native structured output that enforces schema adherence at the generation layer: the decoder masks out any token that would violate the schema, so an invalid token sequence cannot be produced in the first place.
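Under the hood this is constrained decoding. A toy sketch of the core idea, not any provider's actual implementation:

```python
# Toy sketch of constrained decoding. At each step, discard every candidate
# token that cannot extend a schema-valid prefix, then pick from what's left.
# Real implementations compile the schema to a grammar and mask logits; this
# just illustrates the invariant that invalid prefixes are never generated.

def sample_constrained(logits: dict[str, float], prefix: str, is_valid_prefix) -> str:
    allowed = {tok: score for tok, score in logits.items() if is_valid_prefix(prefix + tok)}
    if not allowed:
        raise RuntimeError("Grammar dead end: no token can extend this prefix")
    # Greedy pick for simplicity; real decoders renormalize and sample.
    return max(allowed, key=allowed.get)
```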
OpenAI Structured Outputs
OpenAI’s structured outputs (available on gpt-4o and newer) accept a JSON Schema and guarantee compliant output:
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class ProductExtraction(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Extract product info: 'Blue Widget, $14.99, available in warehouse'"}
    ],
    response_format=ProductExtraction,
)

product = response.choices[0].message.parsed
print(product.name, product.price, product.in_stock)
```
The .parsed attribute gives you a Pydantic model directly. No JSON parsing, no validation step. If the model returns something that violates the schema, the SDK raises an error rather than silently returning malformed data.
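Two failure modes survive even here: the model can refuse on safety grounds, and generation can be cut off by max_tokens. A defensive wrapper around the same call (LengthFinishReasonError is the exception recent openai SDK versions raise on truncation):

```python
from openai import LengthFinishReasonError

try:
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "Extract product info: 'Blue Widget, $14.99, available in warehouse'"}
        ],
        response_format=ProductExtraction,
    )
except LengthFinishReasonError:
    # Generation hit max_tokens mid-object; there is no partial .parsed to salvage.
    raise

message = response.choices[0].message
if message.refusal:
    # Refusals come back as text in .refusal, and .parsed is None.
    print(f"Model refused: {message.refusal}")
else:
    product = message.parsed
```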
Anthropic Tool Use for Structured Extraction
Claude doesn’t have a structured_output mode in the same form, but tool use reliably produces schema-conforming data because the model must fill in a typed function call:
```python
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "extract_product",
    "description": "Extract product information from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"},
            "category": {"type": "string"}
        },
        "required": ["name", "price", "in_stock", "category"]
    }
}]

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "extract_product"},
    messages=[
        {"role": "user", "content": "Extract product info: 'Blue Widget, $14.99, available in warehouse'"}
    ]
)

tool_input = response.content[0].input
print(tool_input)
# {'name': 'Blue Widget', 'price': 14.99, 'in_stock': True, 'category': 'hardware'}
```
Setting tool_choice={"type": "tool", "name": "extract_product"} forces the model to call that specific tool. Combined with a well-defined input schema, this is reliable enough for production use without retry logic.
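One defensive addition: indexing content[0] assumes the tool call is the first block, and the input dict is still untyped. Locating the block explicitly and validating with Pydantic closes both gaps (ProductExtraction here mirrors the input schema above):

```python
from pydantic import BaseModel

class ProductExtraction(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str

# Find the tool_use block rather than assuming its position in the content list.
tool_block = next(block for block in response.content if block.type == "tool_use")
product = ProductExtraction.model_validate(tool_block.input)
```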
Validation With Retry
When you can’t use constrained generation (self-hosted models, providers without structured output support, models too small to follow complex schemas reliably), validation with retry is the fallback:
```python
import json
import time

from jsonschema import validate, ValidationError

PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"}
    },
    "required": ["name", "price", "in_stock"]
}

def extract_json(text: str) -> dict:
    # Strip markdown code fences if present
    text = text.strip()
    if text.startswith("```"):
        lines = text.split("\n")
        text = "\n".join(lines[1:-1])
    return json.loads(text)

def get_structured_output(prompt: str, schema: dict, max_retries: int = 3) -> dict:
    # call_llm is a placeholder for your provider's raw text-completion call.
    validation_hint = f"\n\nReturn ONLY valid JSON matching this schema:\n{json.dumps(schema, indent=2)}"
    for attempt in range(max_retries):
        response_text = call_llm(prompt + validation_hint)
        try:
            data = extract_json(response_text)
            validate(instance=data, schema=schema)
            return data
        except (json.JSONDecodeError, ValidationError) as e:
            if attempt < max_retries - 1:
                # Feed the specific failure back so the next attempt can correct it.
                prompt += f"\n\nPrevious attempt failed: {str(e)[:200]}. Try again."
                time.sleep(0.5 * (attempt + 1))  # brief backoff between attempts
            else:
                raise ValueError(f"Failed to get valid output after {max_retries} attempts") from e
```
The key detail: append the specific error to the retry prompt. “Your previous response failed JSON validation: Missing required property ‘price’” gets a better correction than a generic “try again.”
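call_llm above stands in for whatever raw-text call your stack makes. Wired against the OpenAI SDK for illustration (the model choice is arbitrary):

```python
from openai import OpenAI

openai_client = OpenAI()

def call_llm(prompt: str) -> str:
    # Plain text completion: no structured output mode, so all
    # validation happens on our side in get_structured_output.
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

data = get_structured_output(
    "Extract product info: 'Blue Widget, $14.99, available in warehouse'",
    PRODUCT_SCHEMA,
)
```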
The instructor Library
instructor (by Jason Liu) is the most widely used library for this pattern. It wraps OpenAI, Anthropic, and several other providers with automatic validation and retry using Pydantic models:
```python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel, field_validator

client = instructor.from_anthropic(Anthropic())

class ProductExtraction(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str

    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, v):
        if v < 0:
            raise ValueError("Price cannot be negative")
        return v

product, completion = client.chat.completions.create_with_completion(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract product info: 'Blue Widget, $14.99, available'"}
    ],
    response_model=ProductExtraction,
)
```
instructor handles the retry logic, the prompt injection of the schema, and the Pydantic validation. It also provides usage stats on the completion object so you can track token costs across retries.
The library supports streaming with partial validation, which is useful for long extraction tasks where you want to start processing before the full response arrives.
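A sketch of the streaming mode, assuming the create_partial entry point from recent instructor releases:

```python
# Each yielded object is a partially populated ProductExtraction,
# with fields filling in as tokens arrive.
partial_stream = client.chat.completions.create_partial(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract product info: 'Blue Widget, $14.99, available'"}
    ],
    response_model=ProductExtraction,
)

for partial_product in partial_stream:
    print(partial_product)  # e.g. name is set while price is still None early on
```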
Schema Design for Better Results
How you define the schema affects compliance rate, not just what gets validated.
Use specific types with descriptions:
```python
from pydantic import BaseModel, Field

# Vague — the model doesn't know what format to use
class Bad(BaseModel):
    date: str  # "January 5th", "2026-01-05", "01/05/26" — all valid strings

# Unambiguous
class Good(BaseModel):
    date: str = Field(description="ISO 8601 date string, e.g. '2026-01-05'")
```
Break complex nested objects into smaller schemas. A schema with 15 required fields and 3 levels of nesting will see more failures than one with 5 fields. If you need complex data, extract it in stages: first the top-level structure, then nested details for each item.
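A sketch of the staged approach for an invoice, using a hypothetical extract() helper (a thin wrapper over any of the techniques above) and illustrative models:

```python
from pydantic import BaseModel

class InvoiceOutline(BaseModel):
    vendor: str
    invoice_number: str
    line_item_texts: list[str]  # raw text of each line item, parsed in pass two

class LineItemDetail(BaseModel):
    description: str
    quantity: float
    unit_price: float

# Pass one: a cheap, flat extraction of the overall structure.
outline = extract(invoice_text, response_model=InvoiceOutline)

# Pass two: one small, focused extraction per line item.
items = [extract(text, response_model=LineItemDetail) for text in outline.line_item_texts]
```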
Add descriptions to fields:
```python
from pydantic import BaseModel, Field

class InvoiceLineItem(BaseModel):
    description: str = Field(description="Short description of the item or service")
    quantity: float = Field(description="Number of units, can be fractional for hourly work")
    unit_price: float = Field(description="Price per unit in USD, not the line total")
    total: float = Field(description="quantity * unit_price")
```
The description shows up in the generated JSON Schema and gets included in the prompt. Models read it and conform to it.
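You can verify this with Pydantic directly; the descriptions land in the properties of the generated schema:

```python
import json

print(json.dumps(InvoiceLineItem.model_json_schema(), indent=2))
# ...
# "quantity": {
#   "description": "Number of units, can be fractional for hourly work",
#   "title": "Quantity",
#   "type": "number"
# },
# ...
```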
When to Use Each Approach
| Situation | Best Option |
|---|---|
| OpenAI models, schema matters | Native structured outputs |
| Anthropic models, schema matters | Tool use with forced tool_choice |
| Any model, moderate complexity | instructor library |
| Self-hosted or fine-tuned model | Validation with retry |
| Simple extraction, high volume | One-shot prompt + JSON parse + cheap retry |
| Complex nested schema | Stage it: extract in multiple passes |
Common Failure Patterns
The explainer: The model writes “Here is the JSON you requested:” before the JSON. Strip preamble before parsing. Constrained generation prevents this; prompting-only approaches do not.
The approximator: When asked for a number, the model returns "approximately 50" or "~50". Add a field description: “Return as a plain number with no text, e.g. 50”.
The null evader: The model returns an empty string "" instead of null for missing optional fields. Use Optional[str] = None in Pydantic, and add a constraint such as min_length=1 so the empty string actually fails validation (an empty string is still a valid str on its own) and instructor retries.
The hallucinating enumerator: When you have an enum field (e.g. category from a fixed list), the model invents a category that doesn’t exist. Use Literal["A", "B", "C"] as the type so validation rejects unknown values.
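A single model that defends against the last two patterns at once (the category values are illustrative):

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field

class ProductExtraction(BaseModel):
    name: str
    # A plain number, so "approximately 50" fails validation and triggers a retry.
    price: float = Field(description="Price as a plain number, e.g. 14.99, with no text")
    # Optional with min_length=1: a missing value must come back as null,
    # because an empty string "" is rejected instead of passing as a valid str.
    sku: Optional[str] = Field(default=None, min_length=1, description="SKU if present, otherwise null")
    # Literal pins the enum: an invented category fails validation.
    category: Literal["hardware", "apparel", "electronics"]
```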
Structured outputs have gotten reliable enough that the parse-and-hope approach from a few years ago is not the right default anymore. For any production feature that depends on structured data from an LLM, use one of the constraint-based approaches and validate with Pydantic. The retry cost is a small fraction of what bad parses cost at scale.