In my last article, I wrote about building an Android app from scratch in 4 days. I promised to share the details of the two-layer protocol that made it possible. This is that article.
But this isn't a "how-to." This is a "why."
The protocol wasn't designed at a desk. It was born in the middle of a disaster — from the determination to never go through that again.
Before the Protocol — A Project Called CanAna
My first serious project with Claude Code was CanAna — a CAN bus analysis tool. CAN (Controller Area Network) is the communication protocol that connects ECUs in cars and motorcycles. Analyzing CAN data is core to my day job: ECU tuning for automotive and motorcycle systems.
The approach was simple. Design in ChatGPT, hand it off to Claude Code for implementation. Get "Done!" back, move on. The tools were already separated into two layers. But there were no operational rules between them. No quality control on instructions, no verification criteria. Just throw it over the wall and take whatever comes back.
Context Evaporation
At the start, I ran everything in a single ChatGPT conversation. Design planning, requirement specs, generating prompts for Claude Code — all in one thread. Design documents existed only inside that chat. I never saved anything locally.
At first, this felt efficient. All context in one place, no file management overhead. But as the chat grew, I started to notice something was off. The AI's responses were getting thinner. Less precise. Subtle details I remembered discussing were now absent from its answers.
Then I caught the AI steering the project in a direction we hadn't agreed on. It wasn't malicious — with token limits approaching, the AI was compressing older context. Design decisions made days ago were silently dropped. The plan itself was drifting, and because I had no local copy of the original design documents, I couldn't point to the exact moment it diverged.
By the time I was certain something had changed, my own memory was too fuzzy to reconstruct what we'd originally decided. I could tell "this isn't what we planned" — but not precisely what the plan had been.
Design intent and accumulated decisions gradually evaporate as the AI compresses its context. It doesn't happen all at once. It creeps. This is what I later came to call Context Evaporation.
The Shallow Fix Swamp
The real damage showed up when I hit the more complex implementation steps.
Here's the thing about CanAna: the tool had to interpret messy, real-world sensor data. Human operators don't produce textbook-perfect input signals. What looks obvious to a trained engineer's eye — "this is clearly a ramp-up operation" — requires careful logic to detect programmatically. I worked through these detection challenges with the Design AI, and we arrived at sound approaches.
But Context Evaporation was already advancing on the ChatGPT side — the Design AI. The Design AI is supposed to hold the "why" behind each piece of logic and issue instructions that cover all related areas when a fix is needed. But as the chat grew longer, that "why" was silently dropping out. So when the AI issued a fix instruction for one detection routine, the fact that three other routines depended on the same logic and needed the same update had already vanished from the Design AI's view.
It's not the Implementation AI's fault for missing this. Claude Code, as the implementation layer, isn't in a position to evaluate What and Why. Its job is to write vast amounts of code quickly and correctly using its deep knowledge — and at that job, it's flawless. But out in the world, the mainstream approach is to dump What, Why, and implementation all onto the Implementation AI at once. No wonder it doesn't work.
But back to the story.
I'd fix module A. Module B would break three days later — same root cause, different symptom. Fix B. Module C breaks. Layer after layer of surface patches stacked up until no one — not even the AI — could trace the original design intent.
The AI fixes symptoms without understanding root causes, and each fix creates the conditions for the next failure. This is the Shallow Fix Swamp.
Completion Fraud
And underneath all of this was the most basic problem: I couldn't trust "Done."
The AI would report implementation complete. I'd run it — broken. Point it out — "Fixed!" Run it again — still broken. This loop played out dozens of times.
The AI wasn't lying, exactly. It has a structural bias against saying "I don't know." It's either convinced its implementation is correct, or at least believes it should be. Sometimes it would claim to have verified the output — when in reality it had shaped the code to make the checks pass rather than making the code actually work.
Reporting confident success in the absence of genuine verification — this is Completion Fraud.
As a control systems engineer, this felt like a dark mirror of something I knew well. Back in the early 2000s, electronic controls for simple single-cylinder commuter motorcycles were developed without flowcharts or specifications — the design engineer wrote the code directly. But when that code was carried over to multi-cylinder models, problems kept surfacing. At first, they powered through with sheer grit — just patch it and ship. But a few great predecessors realized "this can't go on," made the courageous decision to stop, and established the department's first coding standards. I could smell that same acrid scent from those wild-west days of ECU software, faint but unmistakable, at the back of my nose.
The Aftermath
By the end, CanAna's codebase was scorched earth. Some parts worked, but nobody could explain why. Fix one thing and something else would break. Ask the AI to investigate and it would return confidently wrong guesses.
But I didn't abandon the project. I pushed through. The MVP — all seven implementation steps — was completed through sheer persistence, grinding through each issue with the Design AI one problem at a time.
It was functional. But I knew the codebase couldn't sustain further development. The architecture was held together by patches on patches — 600-line god classes, duplicated logic scattered across modules, fixes that only worked by accident.
So I made a deliberate decision: pause CanAna, and build the infrastructure to do it right.
That infrastructure had two pillars. One was Bridgiron — a support tool to bridge context between Design AI and Implementation AI (covered in the next article). The other was a set of markdown documents defining chat operation rules and verification criteria — the prototype of what would later evolve into the Vol.1–4 Handover Documents. With both wings in place — the tool and the protocol — I went back to CanAna and ran a structured refactoring: had the Implementation AI audit its own code, analyzed the findings with the Design AI, and executed a planned cleanup — seven refactoring steps, each with clear scope and verification.
The result: CanAna now runs as a stable MVP. It's currently on pause while I work on other projects, but it's organized — ready to pick back up.
What Actually Went Wrong
I stopped, made coffee, and sat with a warm mug, replaying the whole experience in my head. The problem wasn't the AI's capability. It was how I was using it — or more precisely, the fact that I hadn't engineered how to use it.
1. There was no process between design and implementation.
The tools were separated — ChatGPT for design, Claude Code for implementation. But the judgment of what counted as acceptable and what didn't was left entirely to ChatGPT's discretion. No explicit verification criteria, no handover documents, no definition of done.
The result: every time a problem surfaced, ChatGPT would propose a superficial patch that never reached the root cause, and Claude Code would implement it. As patches stacked on patches, context compressed, and the original design intent evaporated. And both ChatGPT and Claude Code would report "Fix complete!" with full confidence.
I was happily stamping my seal of approval on everything. I wasn't watching closely enough, and by the time I realized the codebase had reached a point of no return, it was far too late.
2. There was no mechanism to preserve context.
AI memory lives and dies with the chat session. When the chat ends, context vanishes. Human teams have documentation, wikis, verbal handoffs. My AI collaboration had none of that.
3. There was no "investigate → discuss → fix" flow.
When something broke, the AI would immediately say "I'll fix it." But there was zero guarantee that "fix" was correct. On a human team, you identify the cause first, agree on the approach, then implement. The AI had no such brake.
4. Completion reports were self-assessed.
When the AI says "Done," you have no choice but to believe it — until you test on a real device. And since the AI has a bias against admitting uncertainty, it reports ambiguous results with full confidence.
Of course, a developer with strong programming skills could mitigate this by reading and reviewing the AI's generated code directly. But CanAna was implemented in Python — because the PeakCAN USB device's API was provided for Python. Python isn't a language I'm proficient in. My only way to verify the AI's output was black-box testing: run it and check the results. And at each individual development stage, it appeared to be working fine.
That's when I noticed the parallel to my day job.
In safety-critical embedded software, there's an established development methodology called the V-model (V-process). The left side of the V defines requirements and design at increasing levels of detail. The right side verifies and validates at each corresponding level. Design and verification are structurally separated — and that separation is what catches defects before they reach production.
Think about drive-by-wire: the system that translates your throttle input into actual engine response. If the control logic has an undetected flaw, the consequences are immediate and physical. When you're driving with your girlfriend and the car doesn't suddenly go haywire and slam into a wall — that's not luck. It's because the V-model process guarantees the quality of every piece of software running in that car. You don't ship until every stakeholder can trace every decision back to its origin and sign off with full accountability. That's not individual diligence — it's enforced by organizational development rules. Every design decision is documented in flowcharts and specifications. Every implementation is reviewed against those specs. The designer and the implementer are never the same person — by policy, not by accident.
What I'd been doing with CanAna was the opposite of everything I practiced at work: no process between design and implementation, no structured verification, no documentation trail. I was letting the AI play every role at once — unsupervised.
No wonder it burned.
The V-model's full implications for AI-augmented development will be explored in a later article in this series. For now, the key insight was simple: the engineering discipline I'd spent 15 years building wasn't obsolete in the age of AI. It was exactly what AI collaboration was missing.
Building the Protocol
From the CanAna wreckage, I established four rules.
Rule 1: Separate the layers — and lay a protocol between them.
Separating Design AI and Implementation AI is the foundational premise of this protocol. The Design AI handles requirements analysis, architecture design, and creation of structured instruction files. It never touches code. The Implementation AI receives those instructions and executes scoped tasks only. It makes no design decisions. The human sits between them — real-device testing, Git operations, and final judgment calls stay with the human.
But what CanAna taught me is that separating the layers alone isn't enough. What matters is defining the interface between them — the format of instruction files, explicit completion criteria, structured reporting. Separating tools doesn't create separation. Separating processes does.
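To make that concrete, here's a rough sketch of what an instruction file passed from Design AI to Implementation AI can look like. The section names, step number, and scope details below are illustrative placeholders, not the exact template I'll release later in this series.

```markdown
# Instruction: STEP-03 CAN frame decoder (illustrative)

## What
Implement a decoder that turns raw CAN frames into labeled signal values.

## Why
Downstream analysis (e.g. ramp-up detection) consumes labeled signals, not raw frames.

## Scope
- Touch only the decoder module
- Do NOT modify the data acquisition layer

## Completion criteria
- All sample logs in the test fixtures decode without errors
- Return a completion report in the agreed structured format (see Rule 4)
```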
Rule 2: Anchor context in documents.
Don't rely on AI memory. Maintain project context, design intent, and history in structured markdown documents. Open a new chat, say "read this document," and context is restored.
I call these "Handover Documents."
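As a sketch only (the real documents are longer and project-specific), a handover document might carry sections like these:

```markdown
# Handover: CanAna (illustrative)

## Project purpose
One paragraph on what the tool does and for whom.

## Current state
What is implemented, what is verified on real hardware, what is parked.

## Design decisions and their "why"
- Decision, date, reason, alternatives considered and rejected

## Known issues / open questions
- Anything the next chat session must not silently drop
```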
Rule 3: Enforce investigate → discuss → fix.
When something breaks, the first instruction to the Implementation AI is: "Investigate only. Do not fix." I receive the findings, analyze the cause with the Design AI, and agree on an approach. Only the agreed-upon fix gets sent to the Implementation AI.
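In practice, the first message in a bug thread looks roughly like this (a sketch; the wording varies case by case):

```markdown
# Investigation request: false trigger in ramp-up detection (illustrative)

## Symptom
Detection fires on steady-state data in one of the recorded logs.

## Instruction
Investigate only. Do NOT fix anything.
- Identify which module and which condition produces the false trigger
- List every other routine that shares the same logic
- Report findings in the structured format; propose nothing yet
```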
Enforcing this single rule nearly eliminated the Shallow Fix Swamp.
Rule 4: Structure completion reports.
Force the Implementation AI to report in a structured format: what was done, what changed, what remains. Self-reporting accuracy goes up, and even when the AI slips in something inaccurate, the structure makes contradictions easier to spot.
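A minimal version of that report template might look like this (the field names are illustrative, not the exact ones I use):

```markdown
# Completion report: STEP-03 (illustrative)

## What was done
The decoder described in the instruction file, and nothing beyond its scope.

## What changed
- One line per file touched: what changed and why

## What remains / uncertain
- Anything not implemented, not verified, or only assumed to work

## How it was verified
- Commands run, fixtures used, results observed, or "not verified" stated plainly
```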
The concrete implementation of these rules — templates, naming conventions, operational know-how — will be released progressively across later articles in this series.
Proof — Did It Actually Work?
The first real test of this protocol was Bridgiron — a development support tool built specifically to make this protocol easier to operate.
Bridgiron's development went smoothly. The "completion fraud," the "shallow fix swamp," the "context evaporation" that plagued CanAna — none of it happened. When problems arose, the investigate → discuss → fix flow resolved them reliably.
Then came ExitWatcher. Android and Kotlin — a tech stack I had zero experience with. Despite that, the MVP was complete in 4 days. 53 structured instruction files, 7 step-chats, and 1 assembly chat orchestrating the entire project.
Proof that the protocol works independently of any specific domain or tech stack.
What You Can Take Away Today
Throughout this dev.to series, I'll be releasing polished, public-facing versions of the AI workflow protocol used to build ExitWatcher — one piece at a time, alongside each article.
Here's the first piece.
Vol.1 — Your personal profile document.
The single most impactful handover document is the one that tells the AI who you are. Your technical background, strengths and weaknesses, thinking habits, how you want to collaborate with AI. With just this one document, AI behavior changes dramatically. You stop re-explaining yourself every time you open a new chat.
I built a prompt that generates this document automatically. Drop it into a new Claude.ai chat, and the AI interviews you. In about 10 minutes, you'll have your own custom profile document.
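The exact contents depend on your answers, but the generated document ends up covering roughly this ground (a sketch, not the generator's literal output):

```markdown
# Personal profile (illustrative outline)

## Technical background
Languages, domains, and tools you're strong in, and the ones you're not.

## How I think
Decision-making habits, what you optimize for, where you tend to cut corners.

## How I want to work with AI
What to explain in detail, what to skip, how to report, and when to stop and ask.
```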
One strong recommendation: use voice input.
When typing, people unconsciously compress their thoughts. You think "I'll just write the key points" and strip away context and thinking habits that are actually critical. With voice, you can say whatever comes to mind. Tangents are fine. The AI will organize everything.
One of the most important things in AI collaboration is getting what's in your head into the AI with minimal friction. Voice input minimizes that friction.
How to Use
- Download the prompt file (MD file) from GitHub
- Open a new chat in your preferred chat AI (Claude.ai, ChatGPT, etc.)
- Type the following message, attach the downloaded MD file, and send:
Follow the instructions in the attached file and begin the interview.
- The AI will start interviewing you — answer the questions (voice input recommended)
- When all questions are done, the AI outputs your personal profile document as Markdown
- Copy the output and save it as a `.md` file
Requirements: Any chat AI that runs in a web browser and accepts file attachments (Claude.ai, ChatGPT, Gemini, etc.). Desktop or mobile.
→ Vol.1 Generator Prompt on GitHub
A Profile Is Something You Grow
One more thing. This profile document isn't a one-time artifact.
After completing each project, ask the AI you've been working with: "Based on our work together, is there anything I should add to my profile?" The AI has been observing your thinking patterns and decision-making habits. Feed that feedback back into your profile, and it gets sharper and denser with every project.
As your AI collaboration evolves, so does your profile.
What's Next
The next article covers Bridgiron — the tool I built to support this protocol, and the lessons learned from building it.
The remaining pieces of the protocol will be released progressively with each article. The goal of this series is to get you to the point where you can build your own.
Control systems engineer, 15 years in motorcycle ECU development. Currently exploring AI-augmented development workflows and documenting what works.

