It demoed perfectly.
Clean output, every case green, the kind of run that makes you screenshot the terminal.
Then it reached production and fell apart inside a day.
What followed was the worst kind of debugging. The agent would return something wrong, and I could not tell why. Was it the prompt? Was it a timeout on a tool call? Was it stale data? Was it a retry firing twice and confusing the next step? I spent far longer staring at that agent than I ever spent building it.
That pain was on the dev.to front page this week: someone wrote about spending ten times longer debugging AI code than writing it. I felt that one in my spine.
Here is what I eventually found: one layer of my agent was quietly doing three jobs at once.
Most agent failures that people blame on the model are not reasoning failures at all. They are IO failures wearing a reasoning costume. The model looked broken because the layer around it was broken, and the two were tangled into the same lump of code.
As a Mastra Agent Ambassador, I now build every agent around one promise: each layer does exactly one job. Three layers, three responsibilities, nothing crossing a line it should not.
Reasoning: the layer that decides.
This is the model. Its only job is to think and choose: given clean inputs, what is the next step?
It should never fetch data. It should never retry a failed call. It should never format a payload for a downstream system. The moment reasoning starts doing IO, you can no longer tell a bad decision apart from a bad fetch, and your debugging time doubles overnight.
Keep this layer boring. Inputs in, a decision out.
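To make "inputs in, a decision out" concrete, here is a minimal sketch of the shape I mean. Everything in it is hypothetical: the `Observation` and `Decision` types, the `fetchOrder` tool name, and the `decide` function are illustrations, not Mastra APIs, and a hard-coded rule stands in for the model so the example runs offline.

```typescript
// Hypothetical types for illustration; none of this is a Mastra API.
type Observation = { orderId: string; status: "paid" | "unpaid" | "unknown" };

type Decision =
  | { kind: "call_tool"; tool: "fetchOrder"; args: { orderId: string } }
  | { kind: "finish"; answer: string };

// Stand-in for the model. In a real agent this would be a prompt plus a
// parsed reply; here a simple rule plays that role. The shape is the point:
// it receives clean inputs and returns a decision. It does not fetch,
// retry, or format payloads for anyone downstream.
function decide(obs: Observation): Decision {
  if (obs.status === "unknown") {
    // Reasoning only *asks* for the fetch. The IO layer performs it.
    return { kind: "call_tool", tool: "fetchOrder", args: { orderId: obs.orderId } };
  }
  return { kind: "finish", answer: `Order ${obs.orderId} is ${obs.status}.` };
}
```

Because the layer returns a description of what it wants rather than doing it, a bad decision shows up as a bad `Decision` value you can log and inspect, separate from whatever the fetch later does.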
IO: the layer that talks to the world.
Every call to an API, a database, a tool, a file. All of it lives here, and all of it is handled by plain, deterministic code rather than by the model.
This is where retries belong. Where timeouts belong. Where validation belongs. Where the guardrails belong, the checks that clean what comes in before the model ever sees it, and what goes out before a user ever does. Mastra gives you input and output processors for exactly this, a home for that discipline so it is not smeared across your prompts.
When IO is its own layer, a failed call reads as a failed call. It surfaces as an IO error, not as a confusing model answer. That one separation is what gave me my nights back.
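Here is a rough sketch of what that can look like in plain TypeScript. It is not Mastra's processor API; `callWithRetry`, `withTimeout`, and the `IoResult` type are names I made up for the example, and the retry policy (immediate retries, no backoff) is deliberately simplistic.

```typescript
// Hypothetical result type: a failed call is a value, not a confusing answer.
type IoResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: "timeout" | "invalid" | "failed"; attempts: number };

// Reject if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Retries, timeouts, and validation all live here, outside the model.
async function callWithRetry<T>(
  call: () => Promise<unknown>,
  validate: (raw: unknown) => raw is T,
  opts: { retries: number; timeoutMs: number },
): Promise<IoResult<T>> {
  let lastError: "timeout" | "failed" = "failed";
  for (let attempt = 1; attempt <= opts.retries + 1; attempt++) {
    try {
      const raw = await withTimeout(call(), opts.timeoutMs);
      // A malformed payload is not retried: it is reported as invalid.
      if (!validate(raw)) return { ok: false, error: "invalid", attempts: attempt };
      return { ok: true, value: raw };
    } catch (e) {
      lastError = e instanceof Error && e.message === "timeout" ? "timeout" : "failed";
    }
  }
  return { ok: false, error: lastError, attempts: opts.retries + 1 };
}
```

The design choice that matters is the return type. The model never receives a half-broken payload to reason about, and the caller always gets either a validated value or a named IO error with an attempt count.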
Orchestration: the layer that sequences.
This is the loop. It holds state, decides when to call reasoning and when to call IO, manages approvals, and knows when the task is finished.
It does not think; that belongs to reasoning. It does not fetch; that belongs to IO. It conducts.
When orchestration is thin and explicit, you can read your agent like a short story: step, check, step, check. When it is tangled with the other two, you get the spaghetti I shipped the first time.
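A thin loop of that kind might look like the sketch below. The names (`runAgent`, `Step`, `ToolResult`, `Outcome`) are mine, not Mastra's, and approvals are left out to keep it short. Reasoning and IO are passed in as functions so the loop cannot do either job itself.

```typescript
// Hypothetical types for illustration; none of this is a Mastra API.
type Step =
  | { kind: "call_tool"; tool: string; args: Record<string, string> }
  | { kind: "finish"; answer: string };

type ToolResult = { ok: true; value: string } | { ok: false; error: string };

type Outcome =
  | { status: "done"; answer: string; steps: number }
  | { status: "io_error"; error: string; steps: number }
  | { status: "step_limit"; steps: number };

async function runAgent(
  goal: string,
  decide: (goal: string, observations: string[]) => Promise<Step>,
  callTool: (tool: string, args: Record<string, string>) => Promise<ToolResult>,
  maxSteps = 5,
): Promise<Outcome> {
  const observations: string[] = []; // the only state the loop owns
  for (let step = 1; step <= maxSteps; step++) {
    const next = await decide(goal, observations); // step: ask reasoning
    if (next.kind === "finish") {                  // check: are we done?
      return { status: "done", answer: next.answer, steps: step };
    }
    const result = await callTool(next.tool, next.args); // step: ask IO
    if (!result.ok) {                                    // check: did IO fail?
      return { status: "io_error", error: result.error, steps: step };
    }
    observations.push(result.value);
  }
  return { status: "step_limit", steps: maxSteps };
}
```

Each of the three `Outcome` variants points at one box: `done` with a wrong answer is a reasoning question, `io_error` is an IO question, and `step_limit` is the loop's own.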
Why the framework alone will not save you.
Here is the opinion I will defend: the framework is not the thing that keeps you clean. The layer discipline is.
You can write the same spaghetti inside Mastra that you would write with nothing at all. The tools make the right structure easy, but they do not force it on you. If you let reasoning fetch, and IO decide, and orchestration think, no framework on earth will make that agent debuggable.
The discipline stays yours to keep. The framework only makes keeping it cheaper.
What changed after I split them.
The next agent I built with the three layers held apart was not smarter. The model was the same model. But when something broke, I knew within a minute which box to open. A wrong answer was a reasoning issue. A missing value was an IO issue. A step out of order was orchestration. The fog was gone.
That is the whole payoff: not a cleverer agent, but a debuggable one. And a debuggable agent is the only kind you can run in production without dreading the pager.
If you are building your first real agent, do not start with the prompt. Start by drawing three boxes and promising yourself that nothing leaks across a line it should not own. It is the least glamorous decision you will make, and the one that saves you the most sleep.
Your turn
Which layer do you overload by accident most: reasoning, IO, or orchestration?
If this was useful
I work through this in public, the wins and the freezes both, mostly on LinkedIn and YouTube. If the real version of building agents in the open is useful to you, that is where it lives. You can find me on LinkedIn, YouTube and X as Mirza Iqbal, and the work at next8n.com.
Top comments (1)
The reasoning/IO/orchestration cut is the right one, but there's a failure mode it doesn't catch on its own: orchestration quietly becoming a fourth IO layer. Retry counts, last-error, partial results, idempotency keys — that state has to live somewhere, and "the loop" is where it accumulates by default. You end up with three boxes on the diagram and two-and-a-half in the code.
The test I trust more than "did I draw three boxes" is replayability: can I feed the reasoning layer a recorded input set, offline, every tool stubbed, and get the identical decision? If not, IO state is still entangled even when the call sites look separated — reasoning is reading something the orchestrator stashed.
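A replay check along those lines could be sketched like this. All names here (`replay`, `cleanReason`, `leakyReason`, `loopState`) are hypothetical, and deterministic functions stand in for the model; with a real model you would presumably also need recorded responses or deterministic settings for "identical" to be a fair bar.

```typescript
type ReasoningInput = { goal: string; observations: string[] };
type Choice = { action: "lookup" | "answer"; argument: string };
type Recording = { input: ReasoningInput; decision: Choice };

// Clean: the decision depends only on the explicit input.
function cleanReason(input: ReasoningInput): Choice {
  return input.observations.length === 0
    ? { action: "lookup", argument: input.goal }
    : { action: "answer", argument: input.observations.join("; ") };
}

// Entangled: also reads state the orchestrator stashed outside the input.
const loopState = { retryCount: 0 };
function leakyReason(input: ReasoningInput): Choice {
  if (loopState.retryCount > 0) return { action: "answer", argument: "giving up" };
  return cleanReason(input);
}

// Replays each recorded input offline and returns the indexes whose
// decision differs from the recorded one. Decisions are compared as JSON,
// which is enough here because both sides use the same key order.
function replay(
  recordings: Recording[],
  reason: (input: ReasoningInput) => Choice,
): number[] {
  const mismatches: number[] = [];
  recordings.forEach((rec, index) => {
    const inputCopy = JSON.parse(JSON.stringify(rec.input)) as ReasoningInput;
    const replayed = reason(inputCopy);
    if (JSON.stringify(replayed) !== JSON.stringify(rec.decision)) mismatches.push(index);
  });
  return mismatches;
}

const recordings: Recording[] = [
  { input: { goal: "order A1", observations: [] }, decision: { action: "lookup", argument: "order A1" } },
  { input: { goal: "order A1", observations: ["paid"] }, decision: { action: "answer", argument: "paid" } },
];
```

With this setup, `replay(recordings, cleanReason)` reports no mismatches, while `leakyReason` fails the replay as soon as `loopState.retryCount` is non-zero, even though its call sites look just as separated.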
The other line I'd add lives inside your IO layer: separate performing an effect from attesting it. A failed call reads as a failed call to you because you can see the IO error — but the agent's own report of "success" to anything downstream is the agent grading its own homework. If the check that an effect happened runs on the same code path that produced it, a confident-but-wrong success is indistinguishable from a real one. That's the same spaghetti, just moved one hop out. Cheapest fix I've found is making the effect and its verification separately readable, so something that didn't perform the action can re-derive whether it actually happened.
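As a minimal sketch of keeping the effect and its verification separately readable: the `Ledger` class below is an in-memory stand-in for a downstream system, and `performRefund`, `attestRefund`, and the `dropWrites` flag (which simulates a silent failure) are all invented for the example.

```typescript
// In-memory stand-in for a downstream system (hypothetical).
class Ledger {
  private rows = new Map<string, number>();
  write(key: string, amount: number): void {
    this.rows.set(key, amount);
  }
  read(key: string): number | undefined {
    return this.rows.get(key);
  }
}

type Receipt = { key: string; amount: number; claimed: "success" };

// Performs the effect. `dropWrites` simulates a call that silently does
// nothing while the performer still reports success.
function performRefund(
  ledger: Ledger,
  key: string,
  amount: number,
  dropWrites = false,
): Receipt {
  if (!dropWrites) ledger.write(key, amount);
  return { key, amount, claimed: "success" }; // the performer's own report
}

// Attests the effect by reading the ledger directly. It never consults
// `receipt.claimed`, so it cannot inherit the performer's mistake.
function attestRefund(ledger: Ledger, receipt: Receipt): boolean {
  return ledger.read(receipt.key) === receipt.amount;
}
```

In the dropped-write case both receipts say "success", and only `attestRefund` can tell them apart, because it re-derives the answer from the ledger instead of from the code path that produced the claim.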