Your JSON Schema Is a Prompt - Tips for AWS Bedrock Structured Output

The JSON schema you pass to an LLM isn't just a structural contract. It's a prompt that directly controls output quality. Field names, descriptions, ordering, and enum values all steer the model's behavior just like your system prompt does. Get them right and you'll get reliably excellent outputs. Get them wrong and you'll get structurally valid garbage.

AWS Bedrock just shipped constrained decoding that guarantees your responses match a JSON schema. But guaranteed structure doesn't mean guaranteed quality. This post is about the second half of that equation: how to design schemas that produce the right data, not just the right shape.

Full runnable examples can be found in the accompanying GitHub repo.


The 30-second version of how Bedrock enforces schemas

When you submit a JSON schema to Bedrock's API:

  1. Bedrock validates your schema against JSON Schema Draft 2020-12
  2. On first use, it compiles the schema into a grammar (can take up to a few minutes)
  3. The compiled grammar is cached for 24 hours per account
  4. During generation, invalid tokens are masked out so the model literally cannot produce output that violates the schema

This isn't "generate then validate." The model is physically prevented from producing wrong-typed values, missing fields, or malformed JSON. Here's the basic API call:

import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
    },
    "required": ["name", "sentiment"],
    "additionalProperties": False,  # MANDATORY on every object
}

response = bedrock.converse(
    modelId="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Analyze: 'I love this product!' - Sarah"}],
    }],
    inferenceConfig={"maxTokens": 256},
    outputConfig={
        "textFormat": {
            "type": "json_schema",
            "structure": {
                "jsonSchema": {
                    "schema": json.dumps(schema),
                    "name": "sentiment_analysis",
                }
            },
        }
    },
)

data = json.loads(response["output"]["message"]["content"][0]["text"])
# {"name": "Sarah", "sentiment": "positive"} - guaranteed.

That's your foundation. Now let's talk about what makes the schema itself good or bad.
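Every example in this post ends with the same two-line extraction. A tiny convenience helper keeps that boilerplate in one place (this is a hypothetical helper, not part of boto3; the key path follows the Converse API response shape used above):

```python
import json

def parse_structured(response: dict) -> dict:
    """Pull the schema-guaranteed JSON text out of a Converse response.

    Hypothetical helper (not part of boto3); the key path matches the
    Converse API response shape used throughout this post.
    """
    return json.loads(response["output"]["message"]["content"][0]["text"])
```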


Tip 1: Field names are implicit instructions

LLMs generate tokens one at a time. When the model writes "customer_full_name":, that string becomes context that influences the next token. The model has been trained on billions of lines of code and documentation; it knows what customer_full_name should contain. It has no idea what cust_nm means. As AWS's docs put it: clear names like customer_email outperform generic names like field1.

Bad schema:

schema = {
    "type": "object",
    "properties": {
        "nm": {"type": "string"},
        "val": {"type": "number"},
        "cat": {"type": "string"},
    },
    "required": ["nm", "val", "cat"],
    "additionalProperties": False,
}
# Model output: {"nm": "Widget A", "val": 3, "cat": "B"}
# What does val mean? What's cat? The model is guessing too.

Good schema:

schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "rating_out_of_five": {"type": "number"},
        "product_category": {
            "type": "string",
            "enum": ["electronics", "clothing", "food", "home", "other"],
        },
    },
    "required": ["product_name", "rating_out_of_five", "product_category"],
    "additionalProperties": False,
}
# Model output: {"product_name": "Widget A", "rating_out_of_five": 4.2, "product_category": "electronics"}
# Every field is self-explanatory. The model knows exactly what to produce.

Same tokens, same price, dramatically better results.


Tip 2: Descriptions are micro-prompts

Schema description fields aren't just for humans. They're sent to the model as input context and function as inline instructions. The PARSE research system demonstrated that optimizing field descriptions (alongside other optimizations) improved extraction accuracy by up to 64.7%.

Here's a real-world example: extracting support tickets from emails:

schema = {
    "type": "object",
    "properties": {
        "customer_name": {
            "type": "string",
            "description": "Full name of the person who sent the email",
        },
        "issue_summary": {
            "type": "string",
            "description": "One-sentence summary of the problem, max 20 words",
        },
        "severity": {
            "type": "string",
            "enum": ["low", "medium", "high", "critical"],
            "description": "low=cosmetic, medium=degraded, high=broken, critical=data loss or outage",
        },
        "product_area": {
            "type": "string",
            "enum": ["billing", "api", "dashboard", "auth", "other"],
            "description": "Which product area the issue relates to. Use 'other' if unclear.",
        },
        "requires_followup": {
            "type": "boolean",
            "description": "True if the customer asked a direct question or requested a callback",
        },
    },
    "required": ["customer_name", "issue_summary", "severity", "product_area", "requires_followup"],
    "additionalProperties": False,
}

Notice how the severity description encodes business logic: what "high" means in your org's terms. Without it, the model guesses. With it, the model follows your definitions.

The issue_summary description includes a soft length constraint. Bedrock's grammar engine can't enforce word counts (no maxLength), but the model will follow it as an instruction.
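Because the grammar can't enforce that limit, soft constraints from descriptions still need an application-side check, with a re-prompt (or truncation) when the model ignores them. A minimal sketch:

```python
def within_word_limit(summary: str, max_words: int = 20) -> bool:
    # The grammar guarantees issue_summary is a string, but not its length;
    # soft constraints stated in descriptions must be verified in code.
    return len(summary.split()) <= max_words
```

If this returns False, re-prompt with the violation called out, or truncate and flag the record for review.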

Pro tip: changing descriptions doesn't invalidate Bedrock's grammar cache. You can iterate on wording to improve quality without triggering recompilation.
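Since first-use compilation can take minutes, one option (a sketch, assuming you're willing to spend one tiny request at deploy time; the helper name is made up) is to warm the grammar cache before real traffic arrives:

```python
import json

def warm_grammar_cache(client, schema: dict, model_id: str) -> None:
    # A minimal request forces Bedrock to compile and cache the grammar
    # now, instead of during the first real user request.
    client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": "ping"}]}],
        inferenceConfig={"maxTokens": 32},
        outputConfig={
            "textFormat": {
                "type": "json_schema",
                "structure": {
                    "jsonSchema": {"schema": json.dumps(schema), "name": "warmup"}
                },
            }
        },
    )
```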


Tip 3: Field order controls reasoning quality

This is, I think, the most underappreciated principle. LLMs generate fields sequentially, so the order in which fields appear determines the order the model thinks about them. Put the answer before the reasoning and the model commits before it thinks. Dylan Castillo demonstrated with statistical significance (p < 0.01) that placing reasoning fields before answer fields produces substantially better results.

Bad: answer first, reasoning after:

# ❌ Don't do this
schema = {
    "type": "object",
    "properties": {
        "is_fraudulent": {"type": "boolean"},
        "confidence": {"type": "number"},
        "reasoning": {"type": "string"},
    },
    "required": ["is_fraudulent", "confidence", "reasoning"],
    "additionalProperties": False,
}
# Model decides is_fraudulent BEFORE it reasons.
# The reasoning becomes post-hoc justification.

Good: reasoning first, then conclusion:

# ✅ Do this instead
fraud_schema = {
    "type": "object",
    "properties": {
        "risk_indicators": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Suspicious patterns observed in the transaction",
        },
        "analysis": {
            "type": "string",
            "description": "Step-by-step reasoning about whether this is fraudulent",
        },
        "is_fraudulent": {
            "type": "boolean",
            "description": "Final determination based on the analysis above",
        },
        "confidence": {
            "type": "number",
            "description": "Confidence between 0.0 and 1.0",
        },
    },
    "required": ["risk_indicators", "analysis", "is_fraudulent", "confidence"],
    "additionalProperties": False,
}

Now the model identifies indicators, reasons through them, then decides. It's the single biggest quality lever in schema design.
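A Python-specific footnote (an aside, not from Bedrock's docs): dicts preserve insertion order on Python 3.7+, and json.dumps serializes keys in that order, so the order you write properties in is exactly the order the model sees them:

```python
import json

# Write the properties in reasoning-first order...
fraud_properties = {
    "risk_indicators": {"type": "array", "items": {"type": "string"}},
    "analysis": {"type": "string"},
    "is_fraudulent": {"type": "boolean"},
    "confidence": {"type": "number"},
}

# ...and the serialized schema presents them in that same order.
serialized = json.dumps({"type": "object", "properties": fraud_properties})
order = list(json.loads(serialized)["properties"].keys())
assert order == ["risk_indicators", "analysis", "is_fraudulent", "confidence"]
```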

Here's that fraud detector wired up to Bedrock:

transaction = """
Transaction: $4,299 at "ELCTRNX STORE" on 2026-02-14 at 3:47 AM
Card ending: 8842, Location: Lagos, Nigeria
Cardholder home: Portland, Oregon
Previous transactions: Average $85, all within Oregon
"""

response = bedrock.converse(
    modelId="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": f"Analyze this transaction for fraud:\n{transaction}"}],
    }],
    inferenceConfig={"maxTokens": 1024},
    outputConfig={
        "textFormat": {
            "type": "json_schema",
            "structure": {
                "jsonSchema": {
                    "schema": json.dumps(fraud_schema),
                    "name": "fraud_analysis",
                }
            },
        }
    },
)

result = json.loads(response["output"]["message"]["content"][0]["text"])
if result["is_fraudulent"] and result["confidence"] > 0.8:
    block_transaction(result)  # your downstream handler

Tip 4: Enums do two jobs at once

Enums constrain outputs mechanically (the model can't produce tokens outside the set) and semantically (they tell the model what categories exist). No other schema feature does both. The AWS Bedrock ML blog recommends using enums whenever possible to improve accuracy.

Without enums:

{"sentiment": {"type": "string"}}
# You'll get: "positive", "Positive", "POSITIVE", "good",
# "mostly positive", "👍"... good luck with downstream logic.

With enums:

{"sentiment": {"type": "string", "enum": ["positive", "negative", "neutral", "mixed"]}}
# Exactly one of those four values. Every. Single. Time.

Three design rules: use human-readable values ("positive" not "pos"), always include a fallback ("other" or "unknown"), and define meanings in descriptions.

Here's a full runnable example: structured code review:

code_review_schema = {
    "type": "object",
    "properties": {
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "line_range": {
                        "type": "string",
                        "description": "e.g. '12-15' or '42'",
                    },
                    "severity": {
                        "type": "string",
                        "enum": ["critical", "warning", "suggestion", "nitpick"],
                        "description": "critical=bugs/security, warning=likely problems, suggestion=improvements, nitpick=style only",
                    },
                    "category": {
                        "type": "string",
                        "enum": ["security", "performance", "correctness", "readability", "other"],
                    },
                    "what": {"type": "string", "description": "What the issue is and why it matters"},
                    "fix": {"type": "string", "description": "Concrete code or approach to fix it"},
                },
                "required": ["line_range", "severity", "category", "what", "fix"],
                "additionalProperties": False,
            },
        },
        "overall_quality": {
            "type": "string",
            "enum": ["excellent", "good", "needs_work", "poor"],
        },
        "summary": {"type": "string", "description": "2-3 sentence overall assessment"},
    },
    "required": ["issues", "overall_quality", "summary"],
    "additionalProperties": False,
}

code_snippet = '''
def process_payment(card_number, amount):
    query = f"INSERT INTO payments VALUES ('{card_number}', {amount})"
    db.execute(query)
    return True
'''

response = bedrock.converse(
    modelId="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": f"Review this Python code:\n```
{% endraw %}
python\n{code_snippet}\n
{% raw %}
```"}],
    }],
    inferenceConfig={"maxTokens": 1024},
    outputConfig={
        "textFormat": {
            "type": "json_schema",
            "structure": {
                "jsonSchema": {
                    "schema": json.dumps(code_review_schema),
                    "name": "code_review",
                }
            },
        }
    },
)

review = json.loads(response["output"]["message"]["content"][0]["text"])
critical = [i for i in review["issues"] if i["severity"] == "critical"]
print(f"Found {len(critical)} critical issues")
for issue in critical:
    print(f"  Line {issue['line_range']}: [{issue['category']}] {issue['what']}")
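Because severity is grammar-constrained to exactly those four values, downstream routing can be a plain dict lookup with no normalization and no fallback branch (the action names here are hypothetical):

```python
# Safe only because the enum guarantees severity is one of these four
# strings -- no "Critical", "CRIT", or free-text variants can appear.
SEVERITY_ACTION = {
    "critical": "block_merge",
    "warning": "request_changes",
    "suggestion": "comment",
    "nitpick": "comment",
}

def action_for(issue: dict) -> str:
    return SEVERITY_ACTION[issue["severity"]]
```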

Tip 5: Handle missing data with nullable types

If a field is required but the source text doesn't contain the information, the model may hallucinate a value rather than leave it empty. Instructor's prompting guide, for example, recommends nullable types and fallback values to prevent exactly this.

# ❌ Forces hallucination when company isn't mentioned
{
    "company_name": {"type": "string"}
}

# ✅ Lets the model say "I don't know"
{
    "company_name": {
        "type": ["string", "null"],
        "description": "Company name if mentioned in the text, null otherwise"
    }
}

Keep it required (Bedrock wants this), but make the type nullable. The description reinforces when null is appropriate.
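Downstream code then handles None explicitly instead of trusting a made-up value. A small sketch with a hypothetical formatter:

```python
def format_company(record: dict) -> str:
    # With type ["string", "null"], absence arrives as an honest None
    # rather than a hallucinated company name.
    company = record.get("company_name")
    return company if company is not None else "(not mentioned)"
```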


Putting it all together: job posting extractor

Here's a complete pipeline combining every tip: reasoning first, descriptive names, rich descriptions, enums with fallbacks, and nullable types for missing data:

import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

job_schema = {
    "type": "object",
    "properties": {
        # REASONING FIRST: let the model observe before concluding
        "observations": {
            "type": "string",
            "description": "Key details noticed: compensation, requirements, red flags, remote policy",
        },
        # THEN structured extraction with descriptive names
        "job_title": {"type": "string", "description": "Exact job title as listed"},
        "company_name": {"type": "string"},
        "location": {
            "type": "object",
            "properties": {
                "city": {"type": ["string", "null"]},
                "state_or_country": {"type": ["string", "null"]},
                "remote_policy": {
                    "type": "string",
                    "enum": ["fully_remote", "hybrid", "on_site", "not_specified"],
                },
            },
            "required": ["city", "state_or_country", "remote_policy"],
            "additionalProperties": False,
        },
        "salary": {
            "type": "object",
            "properties": {
                "min_usd": {"type": ["integer", "null"], "description": "Min salary in USD, null if not listed"},
                "max_usd": {"type": ["integer", "null"], "description": "Max salary in USD, null if not listed"},
                "period": {
                    "type": "string",
                    "enum": ["annual", "monthly", "hourly", "not_specified"],
                },
            },
            "required": ["min_usd", "max_usd", "period"],
            "additionalProperties": False,
        },
        "experience_level": {
            "type": "string",
            "enum": ["intern", "entry", "mid", "senior", "staff", "principal", "executive", "not_specified"],
            "description": "Infer from title and requirements if not explicitly stated",
        },
        "required_skills": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Skills explicitly listed as required (not nice-to-have)",
        },
        "nice_to_have_skills": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Skills listed as preferred or bonus",
        },
        # CONCLUSIONS LAST
        "is_legitimate": {
            "type": "boolean",
            "description": "False if the posting shows scam signs (vague company, unrealistic pay)",
        },
    },
    "required": [
        "observations", "job_title", "company_name", "location",
        "salary", "experience_level", "required_skills",
        "nice_to_have_skills", "is_legitimate",
    ],
    "additionalProperties": False,
}

posting = """
Senior Backend Engineer - FinTech Startup (Series B)
Location: San Francisco, CA (Hybrid - 3 days/week)
Salary: $185,000 - $220,000 + equity

We're building the future of real-time payments. You'll own services
handling 50M+ transactions/day in Go on Kubernetes + PostgreSQL.

Must have: 5+ years backend, strong Go, distributed systems experience.
Bonus: payments industry experience, Rust.
"""

response = bedrock.converse(
    modelId="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": f"Extract job posting data:\n\n{posting}"}],
    }],
    inferenceConfig={"maxTokens": 1024},
    outputConfig={
        "textFormat": {
            "type": "json_schema",
            "structure": {
                "jsonSchema": {
                    "schema": json.dumps(job_schema),
                    "name": "job_extraction",
                }
            },
        }
    },
)

job = json.loads(response["output"]["message"]["content"][0]["text"])

print(f"{job['job_title']} at {job['company_name']}")
print(f"  Location: {job['location']['city']}, {job['location']['state_or_country']} ({job['location']['remote_policy']})")
if job['salary']['min_usd']:
    print(f"  Salary: ${job['salary']['min_usd']:,} - ${job['salary']['max_usd']:,} {job['salary']['period']}")
print(f"  Level: {job['experience_level']}")
print(f"  Required: {', '.join(job['required_skills'])}")
print(f"  Nice to have: {', '.join(job['nice_to_have_skills'])}")
print(f"  Legit: {'Yes' if job['is_legitimate'] else 'SUSPICIOUS'}")

Three things that will still break you

  1. Token limit truncation. If maxTokens is too low, the JSON gets cut off mid-structure. Check stopReason. If it's "max_tokens", your output is probably malformed. Set it generously.

  2. Safety refusals. If the model declines for policy reasons, you'll get a non-conforming response.

  3. Structurally valid but semantically wrong. Constrained decoding guarantees the shape. It does NOT guarantee the content. A well-designed schema is your best defense, but validate business-critical outputs.
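All three failure modes can be guarded in one place. Here's a defensive-parse sketch (the stopReason field and response path follow the Converse API shape; the helper itself is hypothetical):

```python
import json

def parse_guarded(response: dict) -> dict:
    # Failure mode 1: a "max_tokens" stop reason means the JSON was
    # likely cut off mid-structure.
    if response.get("stopReason") == "max_tokens":
        raise ValueError("Output truncated: raise maxTokens and retry")
    text = response["output"]["message"]["content"][0]["text"]
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Failure mode 2: e.g. a safety refusal that doesn't conform
        # to the schema and so isn't valid JSON.
        raise ValueError(f"Non-conforming response: {text[:80]!r}") from exc
```

Failure mode 3 (semantically wrong but valid output) can't be caught generically; that's where business-rule validation comes in.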


TL;DR

Your JSON schema is a prompt. Treat it like one.

  • Name fields descriptively: customer_full_name not cust_nm
  • Write descriptions: they're instructions the model follows
  • Order fields: reasoning first, conclusions last. Biggest quality lever
  • Use enums with fallbacks: mechanical + semantic constraint in one
  • Make absent data nullable: don't force hallucination
  • Keep schemas flat: deep nesting increases latency and errors
  • Set additionalProperties: false: Bedrock requires it everywhere
  • Set maxTokens high enough: truncation breaks everything

The schema is the prompt. Design it accordingly.

See accompanying GitHub repo for runnable code examples.


You can find me on LinkedIn | CTO & Partner @ EES.
