
AristoAIStack

Originally published at aristoaistack.com

OpenAI and Anthropic Are Becoming AI Consultants — And That Should Terrify You

Here's the most expensive admission in AI history, and nobody's treating it like one.

OpenAI is hiring hundreds of engineers for a technical consulting team. Not to build better models. Not to ship new features. To sit next to enterprise customers and hold their hand while deploying AI agents that — and I cannot stress this enough — don't work reliably out of the box.

Anthropic? Same story. Working directly with customers to get their agents functioning. The two most advanced AI companies on Earth, the ones promising us AGI by Tuesday, are pivoting to what is essentially... Accenture with better branding.

Let that sink in for a moment.

The companies building the most sophisticated AI models ever created have discovered that selling those models is the easy part. Getting them to actually do useful things in a real business? That's where the dream hits the drywall.


The Fnac Story Is the Whole Industry in Miniature

Here's a detail from The Information's reporting that should be printed on warning labels:

French retailer Fnac tested AI models from OpenAI and Google for customer support. Simple enough use case, right? Chat with customers, look up orders, handle returns. The kind of task every AI demo absolutely nails on stage.

The agents kept mixing up serial numbers.

Not sometimes. Regularly. Consistently enough that Fnac couldn't deploy them. They only got it working after bringing in AI21 Labs for specialized help.

Think about that. Two of the world's most powerful AI systems — GPT and Gemini — couldn't reliably handle serial numbers. Not quantum physics. Not legal reasoning. Serial numbers. The kind of task that a SQL query and a dropdown menu have been handling flawlessly since 1998.

This is the gap between AI demos and AI deployments. And it's not a gap. It's a canyon. With spikes at the bottom.


95% of GenAI Pilots Fail to Reach Production

That's not my number. That's MIT's. Ninety-five percent.

Let me translate that for the executives currently budgeting seven figures for their "AI transformation initiative": for every 20 companies that spin up an AI pilot, 19 of them will never ship it to production. They'll burn through budget, demoralize their engineering teams, and quietly shelve the project during a quarterly review while pretending it was always just "exploratory."

And yet — and yet — Gartner predicts that 40% of enterprise applications will include AI agents by 2026. Put that next to a 95% pilot failure rate and it stops reading like a forecast. It reads like a collision between expectations and reality happening at highway speed.

The question isn't whether AI agents will eventually work in enterprise. They will. The question is how much money, time, and credibility gets incinerated between now and then.


Why Enterprise AI Agents Actually Fail

The AI hype machine has convinced a lot of executives that deploying an agent is like installing Slack — sign up, configure a few settings, and you're live by Friday. The reality is more like performing open-heart surgery on a patient who's running a marathon.

Here's what actually breaks:

1. The Context Problem

LLMs don't understand your business. They understand language. There's a chasm between those two things.

Your company has tribal knowledge embedded in Confluence pages nobody reads, Slack threads from 2023, process documents that contradict each other, and institutional memory stored exclusively in the heads of three people who've been there since the founding. An AI agent has none of this. It has a system prompt and a prayer.

When Fnac's agents mixed up serial numbers, it wasn't because GPT-4 is dumb. It's because the model didn't have the contextual grounding to distinguish between product identifiers, order numbers, and reference codes that might look similar but mean completely different things in Fnac's systems.
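
A minimal sketch of one common mitigation, assuming a support setup loosely like Fnac's: the model never generates or recalls identifiers itself. Every serial number goes through a deterministic lookup, and a miss escalates to a human instead of letting the model improvise. All names and data below are hypothetical, not anyone's real systems.

```python
# Hypothetical stand-in for the real inventory/order database.
ORDERS = {
    "SN-48213-A": {"order_id": "ORD-9912", "status": "shipped"},
    "SN-48213-B": {"order_id": "ORD-9913", "status": "processing"},
}

def lookup_order(serial_number: str):
    """Exact-match lookup; the model never guesses or 'remembers' an identifier."""
    return ORDERS.get(serial_number.strip().upper())

def answer_customer(serial_number: str) -> str:
    record = lookup_order(serial_number)
    if record is None:
        # Escalate instead of letting the model produce a plausible-looking ID.
        return "I couldn't find that serial number. Routing you to a human agent."
    return f"Order {record['order_id']} is currently {record['status']}."

print(answer_customer("sn-48213-a"))   # Order ORD-9912 is currently shipped.
print(answer_customer("SN-00000-X"))   # escalation message
```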

2. The Integration Nightmare

AI agents don't exist in a vacuum. They need to connect to your CRM, your ERP, your ticketing system, your inventory database, your knowledge base, your authentication layer, and whatever ancient SOAP API your IT department built in 2009 that somehow still runs 40% of your operations.

OpenAI's new Frontier enterprise platform reveals just how deep this rabbit hole goes. Their architecture diagram shows layers upon layers: systems of record, business context layers, agent orchestration, permission management, monitoring dashboards. It looks less like a product and more like an enterprise architecture consulting engagement. Which, of course, is exactly what it is now.

3. The Reliability Floor

Here's the math that kills agent dreams:

If an AI agent is 95% accurate on each individual step of a 10-step workflow, its end-to-end reliability is 0.95^10 = 59.87%. That means roughly 4 out of 10 task completions will have at least one error somewhere in the chain.
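
If you want to watch the floor drop, the arithmetic fits in a few lines. The per-step accuracy and step count are the illustration's numbers, not measurements from any real deployment:

```python
# The compounding math made explicit.
per_step_accuracy = 0.95
steps = 10

end_to_end = per_step_accuracy ** steps
print(f"End-to-end success rate: {end_to_end:.2%}")            # 59.87%
print(f"Tasks with at least one error: {1 - end_to_end:.2%}")  # 40.13%

# Even at 99% per step, a 10-step chain only reaches ~90% end to end.
print(f"At 99% per step: {0.99 ** steps:.2%}")                 # 90.44%
```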

For a demo? Incredible. For a production system handling customer data, financial transactions, or healthcare records? Career-ending.

Humans tolerate a lot of jank from other humans. We do not extend the same grace to robots. One hallucinated serial number in a customer support interaction does more brand damage than a hundred slow human responses.

4. The Security Time Bomb

Anthropic launched Claude Cowork — their ambitious autonomous agent for non-technical users. Within days, researchers demonstrated a file-stealing prompt injection attack against it.

Days. Not months. Not after some exotic nation-state attack. Days, by security researchers doing exactly what you'd expect security researchers to do.

Now multiply that risk by the number of enterprise systems an agent needs access to. Every integration point is an attack surface. Every permission granted to an agent is a permission that could be exploited. And unlike a compromised employee, a compromised AI agent can operate at machine speed across every system it touches simultaneously.
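
There's no patch that makes prompt injection disappear, but the blast radius is something you control. A minimal sketch of the deny-by-default idea, with invented agent and action names (this is not any vendor's actual permission model):

```python
# Least-privilege tool access for an agent: deny by default, allow only
# the narrow set of actions this particular agent actually needs.
AGENT_PERMISSIONS = {
    "support_agent": {"read_order", "draft_reply"},  # no writes, deletes, or exports
}

def authorize(agent_name: str, action: str) -> bool:
    # An unknown agent or an unlisted action gets nothing.
    return action in AGENT_PERMISSIONS.get(agent_name, set())

for action in ("read_order", "delete_customer", "export_all_records"):
    verdict = "allowed" if authorize("support_agent", action) else "denied"
    print(f"{action}: {verdict}")
```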


The Consulting Pivot: Brilliant Business, Damning Admission

Let's be clear about what's happening strategically. OpenAI currently has about 60 consulting engineers plus over 200 in technical support. They're scaling this to hundreds more. This isn't a side hustle — it's becoming a core part of their enterprise go-to-market.

And honestly? It's brilliant business. Professional services have fat margins, create deep customer lock-in, and generate the kind of real-world deployment data that makes models better. Every time an OpenAI engineer sits with a Fortune 500 client and debugs their agent workflow, OpenAI learns something about how to make their next model more enterprise-ready.

But it's also the most damning admission possible about the state of AI agents.

When your product needs a team of engineers from the company that built it to make it work at a basic level, you don't have a product. You have a service that comes with some software. That's not SaaS. That's consulting with better marketing.

The irony is thick enough to cut with a knife. These are the same companies whose pitch decks promise to eliminate the need for human workers. And their first major enterprise move is... hiring hundreds of humans to make the AI work.


What Smart Companies Are Actually Doing

The enterprises that are winning with AI right now aren't the ones with the biggest budgets or the flashiest agent demos. They're the ones who've internalized a deeply unsexy truth: AI works best when you constrain it aggressively.

Start With Augmentation, Not Automation

The companies seeing real ROI from AI agents aren't trying to replace entire workflows. They're inserting AI into specific, well-defined steps where the failure mode is low-stakes and the human oversight is high.

Instead of: "Build an AI agent that handles all customer support."
Try: "Build an AI that drafts response suggestions for human agents to review and send."

The second version is boring. It won't get you a keynote slot at a tech conference. It also actually works, ships in weeks instead of months, and doesn't mix up serial numbers.
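
Here's roughly what the boring version looks like in code. Everything in it is hypothetical; `call_model` is a placeholder for whichever model API you actually use:

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    customer: str
    message: str

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call (OpenAI, Anthropic, a local model, ...).
    return f"[draft reply based on: {prompt[:60]}...]"

def handle(ticket: Ticket) -> str:
    draft = call_model(
        f"Draft a support reply for {ticket.customer}: {ticket.message}"
    )
    # The human agent is the send button. The AI never talks to the customer
    # directly, so a bad draft costs seconds of review, not customer trust.
    print(f"--- Draft for review ---\n{draft}")
    approved = input("Send this reply? [y/N] ").strip().lower() == "y"
    return draft if approved else "ESCALATED_TO_HUMAN"

handle(Ticket("Marie", "My order shows the wrong serial number."))
```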

Invest in Data Infrastructure Before AI Infrastructure

Most enterprises trying to deploy AI agents discover — painfully and expensively — that their data is a mess. Duplicated, siloed, inconsistent, undocumented. The AI agent isn't the bottleneck. The data the agent needs to be useful is the bottleneck.

Every dollar spent on cleaning up your data layer, building proper APIs, documenting your institutional knowledge, and creating reliable data pipelines will generate more AI ROI than any amount spent on model fine-tuning.

Build Evaluation Before Building Agents

If you can't measure whether your agent is working correctly, you can't deploy it. Full stop.

The enterprises actually succeeding with AI agents have invested heavily in evaluation frameworks before building the agents themselves. They have clear metrics, automated test suites, human review processes, and rollback mechanisms. They treat AI agents like software (because that's what they are) rather than magic (which is what the marketing says they are).
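
At its simplest, that looks something like the sketch below: a golden test set, a pass threshold, and a gate that blocks the release when the agent regresses. The test cases, threshold, and agent stub are all invented for illustration.

```python
GOLDEN_CASES = [
    {"input": "Where is order ORD-9912?", "must_contain": "shipped"},
    {"input": "What is the return window for headphones?", "must_contain": "30 days"},
]
PASS_THRESHOLD = 0.95  # below this, nothing ships

def agent(question: str) -> str:
    # Wire in the real agent here; this stub only answers one kind of question.
    return "Order ORD-9912 shipped yesterday."

def evaluate(agent_fn) -> float:
    passed = sum(
        1 for case in GOLDEN_CASES
        if case["must_contain"].lower() in agent_fn(case["input"]).lower()
    )
    return passed / len(GOLDEN_CASES)

score = evaluate(agent)
if score < PASS_THRESHOLD:
    raise SystemExit(f"Eval score {score:.0%} is below {PASS_THRESHOLD:.0%}: do not deploy.")
print(f"Eval score {score:.0%}: clear to deploy.")
```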

Accept the 80/20 Reality

AI agents are phenomenal at handling the 80% of cases that are routine, well-documented, and follow predictable patterns. They're terrible at the 20% that require judgment, nuance, context, or common sense.

The winning strategy is designing your system so agents handle the 80% autonomously while gracefully escalating the 20% to humans. Not trying to push that number to 100%. Not yet. Maybe not for years.
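
The escalation logic itself can be almost embarrassingly simple. A hedged sketch, with intent categories and a confidence threshold made up for illustration:

```python
# The 80/20 split: the agent acts only when the intent is routine and the
# confidence is high; everything else goes to a person.
ROUTINE_INTENTS = {"order_status", "password_reset", "store_hours"}
CONFIDENCE_FLOOR = 0.90

def route(intent: str, confidence: float) -> str:
    if intent in ROUTINE_INTENTS and confidence >= CONFIDENCE_FLOOR:
        return "handled_by_agent"
    return "escalated_to_human"  # judgment calls, edge cases, low confidence

print(route("order_status", 0.97))    # handled_by_agent
print(route("refund_dispute", 0.97))  # escalated_to_human
print(route("order_status", 0.62))    # escalated_to_human
```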


The Bigger Picture: Hype Meets Gravity

We're witnessing the most predictable correction in recent tech history.

Phase 1 (2023-2024): "AI can do everything!" — VCs throw money, startups multiply, stocks moon.

Phase 2 (2025): "Let's deploy AI everywhere!" — Enterprises buy in, pilots launch, budgets inflate.

Phase 3 (2026 — you are here): "Wait, why doesn't this actually work?" — Reality arrives. OpenAI starts a consulting division.

Phase 4 (2027+): "Okay, here's how AI actually works in practice." — The boring, productive, profitable phase that nobody will write breathless articles about.

We're in the trough of disillusionment, and the view from down here is sobering. But it's also where the real value gets built. The companies that survive Phase 3 — the ones who build real capabilities instead of demo magic — are the ones that will dominate Phase 4.

This is normal. The internet went through it. Mobile went through it. Cloud went through it. The technology is real. The timeline isn't.


What This Means for You

If you're a solopreneur or small team reading this thinking "good, the enterprise AI stuff doesn't affect me" — think again.

The tools you use are built by companies that need enterprise revenue to survive. When enterprise adoption stalls, it affects funding, which affects development, which affects whether your favorite $20/month AI tool exists next year.

More importantly, the same reliability problems that plague enterprise deployments also affect you. Every time your AI agent hallucinates a fact, drops context mid-conversation, or confidently gives you wrong information — that's the same fundamental reliability gap, just at a smaller scale.

The lesson is the same whether you're a Fortune 500 CIO or a one-person startup: use AI for what it's good at today, not what it promises to be good at tomorrow. Build systems, not magic tricks. Constrain aggressively. Verify obsessively. And don't trust anyone who tells you their AI agent "just works."

Including, apparently, OpenAI and Anthropic themselves.


AristoAIStack covers the real state of AI tools — no hype, no BS. If you want to actually use AI effectively instead of burning budget on broken promises, explore our practical guides and agent breakdowns.
