Your AI Agent Is Failing Because of Your Data Layer, Not Your Model

#agents #ai #dataengineering #llm

Here's a pattern I keep seeing: a team builds an AI agent, the demo works, they ship it, and within a few weeks the outputs are unreliable. Someone opens a ticket about hallucinations. Someone else suggests switching to a better model.

The model isn't the issue. The data feeding the model is.

The actual failure anatomy

Multi-agent frameworks like OpenHands and MetaGPT show failure rates above 85% in production-like conditions. The failures cluster around one root cause: the agent received ambiguous, inconsistent, or semantically wrong context — and produced a confident answer based on it.

Three patterns account for most of what I see:

1. Undocumented schemas

Your agent is calling a database tool and getting back rows from a table called accounts. What does status mean in that table? What are the valid values? Does null mean inactive, never set, or pending review?

The model doesn't know. It infers from context. Sometimes it guesses right. Often it doesn't.

The fix is a schema registry — a structured description of every field your agent will query, written in natural language and attached as system context.

SCHEMA_REGISTRY = {
    "accounts": {
        "status": {
            "type": "enum",
            "values": ["active", "pending", "churned", "suspended"],
            "null_means": "record created but onboarding not completed",
            "notes": "EU records use 'suspended' for GDPR-deleted accounts, not 'churned'"
        },
        "revenue_usd": {
            "type": "float",
            "notes": "6-month trailing average as of last ETL run. NOT point-in-time.",
            "freshness_sla_hours": 24
        }
    }
}

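def build_agent_context(table_name: str, rows: list) -> str:
    """Prepend the registry's field descriptions to the rows sent to the model."""
    schema = SCHEMA_REGISTRY.get(table_name, {})
    lines = []
    for col, meta in schema.items():
        parts = [f"type: {meta.get('type', 'unknown')}"]
        if "values" in meta:  # valid enum values, so the model doesn't have to guess
            parts.append(f"values: {', '.join(meta['values'])}")
        parts.append(f"null_means: {meta.get('null_means', 'unknown')}")
        if meta.get("notes"):
            parts.append(f"notes: {meta['notes']}")
        lines.append(f"- {col}: " + " | ".join(parts))
    schema_block = "\n".join(lines)
    return f"Schema context for {table_name}:\n{schema_block}\n\nData:\n{rows}"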
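A registry only helps while it still matches the data. One way to catch drift is to check incoming rows against the documented enum values before they reach the model. The sketch below is an illustration, not part of the original pipeline: `find_unregistered_values` and `example_registry` are hypothetical names, and the registry is a trimmed copy repeated here so the example runs on its own.

```python
def find_unregistered_values(registry: dict, table: str, rows: list[dict]) -> list[str]:
    """Report enum values in rows that the registry does not document."""
    problems = []
    for col, meta in registry.get(table, {}).items():
        if meta.get("type") != "enum":
            continue
        allowed = set(meta.get("values", []))
        for i, row in enumerate(rows):
            value = row.get(col)
            # None is described by the registry's null_means entry, so skip it
            if value is not None and value not in allowed:
                problems.append(f"row {i}: {col}={value!r} is not a documented value")
    return problems


# Trimmed copy of the registry, repeated so this example is self-contained
example_registry = {
    "accounts": {
        "status": {
            "type": "enum",
            "values": ["active", "pending", "churned", "suspended"],
        }
    }
}
sample_rows = [{"status": "active"}, {"status": "paused"}, {"status": None}]
print(find_unregistered_values(example_registry, "accounts", sample_rows))
```

Anything this returns is a value the model would otherwise have to guess at, so it is a prompt to either fix the data or update the registry.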

2. No normalization before inference

If your agent draws from more than one data source (and it almost certainly does), those sources use different conventions. One vendor sends dates as MM/DD/YYYY. Your internal system uses ISO 8601. Your CRM exports currency as $1,234.56. Your warehouse stores it as a float in cents. Convert every record to one convention before it reaches the model, and tag it with its source so odd values can be traced back:

def normalize_record(record: dict, source: str) -> dict:
    normalized = record.copy()

    # Normalize dates to ISO 8601.
    # parse_date_any_format is a helper you supply; it is not defined in this post.
    for field in ["created_at", "updated_at", "contract_end"]:
        if field in normalized and normalized[field]:
            normalized[field] = parse_date_any_format(normalized[field])

    # Normalize currency to float USD; a missing or None revenue is left untouched
    if normalized.get("revenue") is not None:
        val = str(normalized["revenue"]).replace("$", "").replace(",", "").strip()
        if source == "crm_legacy":
            normalized["revenue"] = float(val) / 100  # this source reports cents
        else:
            normalized["revenue"] = float(val)

    # Record provenance so downstream steps can trace each value
    normalized["_source"] = source
    return normalized
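The snippet above calls `parse_date_any_format` without showing it. Here is a minimal sketch of what such a helper could look like, using only the standard library. The list of formats is an assumption based on the examples in this section, and the choice to return a date-only string (dropping any time component) is mine, not something the original specifies.

```python
from datetime import datetime

# Assumed formats, tried in order. MM/DD/YYYY matches the vendor example;
# DD/MM/YYYY is deliberately left out because the two are ambiguous and
# should be chosen per source rather than guessed.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%m/%d/%Y")


def parse_date_any_format(value) -> str:
    """Return value as an ISO 8601 date string, or raise ValueError."""
    if isinstance(value, datetime):
        return value.date().isoformat()
    text = str(value).strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {value!r}")


print(parse_date_any_format("03/15/2024"))
print(parse_date_any_format("2024-03-15T10:30:00"))
```

Raising on an unknown format is intentional here: a loud failure in the pipeline is easier to debug than a silently wrong date in the model's context.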

3. No freshness tracking

Your agent is confident. It's using your pricing data to answer a customer question. That pricing data was last updated 72 hours ago and there was a change yesterday. The agent doesn't know.

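from datetime import datetime, timezone

def get_data_with_freshness(table: str, db_conn) -> dict:
    # Table names can't be bound as query parameters, so only allow registered tables
    if table not in SCHEMA_REGISTRY:
        raise ValueError(f"Unknown table: {table}")
    rows = db_conn.query(f"SELECT * FROM {table}")
    last_updated = db_conn.query(f"SELECT MAX(updated_at) as ts FROM {table}")[0]["ts"]
    # Assumes a non-empty table with UTC timestamps; naive values are treated as UTC
    if last_updated.tzinfo is None:
        last_updated = last_updated.replace(tzinfo=timezone.utc)
    age_hours = (datetime.now(timezone.utc) - last_updated).total_seconds() / 3600
    # freshness_sla_hours is set per column in the registry; use the strictest (default 24)
    slas = [m["freshness_sla_hours"] for m in SCHEMA_REGISTRY[table].values() if "freshness_sla_hours" in m]
    freshness_sla = min(slas, default=24)

    return {
        "data": rows,
        "freshness": {
            "last_updated": last_updated.isoformat(),
            "age_hours": round(age_hours, 1),
            "within_sla": age_hours <= freshness_sla,
            "warning": f"Data is {age_hours:.0f}h old (SLA: {freshness_sla}h)" if age_hours > freshness_sla else None
        }
    }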

Pass the freshness metadata to the model. Tell it to caveat answers when data is stale.
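One way to do that is to render the freshness dict as a short plain-text note and prepend it to the data in the prompt. This is a rough sketch: `render_freshness_note` is a hypothetical helper, the wording of the instruction is only a starting point, and the sample dict just mirrors the shape returned by `get_data_with_freshness`.

```python
def render_freshness_note(freshness: dict) -> str:
    """Turn the freshness dict into plain-text instructions for the model."""
    lines = [
        f"Data last updated: {freshness['last_updated']} "
        f"({freshness['age_hours']}h ago)."
    ]
    if freshness["within_sla"]:
        lines.append("The data is within its freshness SLA.")
    else:
        lines.append(f"WARNING: {freshness['warning']}.")
        lines.append(
            "State in your answer that this data may be out of date "
            "and do not present it as current."
        )
    return "\n".join(lines)


# Sample values for illustration only
stale = {
    "last_updated": "2024-06-01T00:00:00+00:00",
    "age_hours": 72.0,
    "within_sla": False,
    "warning": "Data is 72h old (SLA: 24h)",
}
print(render_freshness_note(stale))
```

Keeping the note next to the data, rather than only in a global system prompt, means each tool result carries its own staleness caveat.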

The build order that actually works

When we take on an AI deployment at Nu Terra Labs, the first two weeks are almost entirely data infrastructure. Schema audit, normalization pipeline, freshness monitoring, validation sets. The actual agent code comes after that.

This feels backwards to most clients. They hired us to build AI, not to document database fields. But this sequencing is why the things we build work in month six the way they worked in week one.

Build your data layer first. Your model doesn't need to be smarter. It needs better inputs.

If you're hitting this in production and want a second set of eyes, feel free to DM me — happy to dig in.
