AI engineer

Originally published at agenticaiengineer.substack.com

KAIzen — What Agile Needs for the AI Era

How a small team at a gaming company went from 32% flow efficiency to 85% — by changing what we gave the AI


Our team was running Scrum by the book. Two-week sprints. Grooming. Planning poker. Retros. By every conventional measure, we were doing Agile correctly.

Then I measured our flow efficiency — the ratio of active work time to total elapsed time — and it was 32%. For every hour on the clock, we were actively working for about 19 minutes. The rest was waiting. Waiting for grooming. Waiting for clarification. Waiting for review. Waiting to align on what the story actually meant.
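For concreteness, flow efficiency is just a ratio. A minimal sketch (the function name is mine, not a standard API):

```python
def flow_efficiency(active_time: float, elapsed_time: float) -> float:
    """Ratio of active work time to total elapsed (calendar) time.

    Both arguments must be in the same units (minutes, hours, days).
    """
    return active_time / elapsed_time

# Our measurement: roughly 19 active minutes per 60 minutes on the clock.
print(round(flow_efficiency(19, 60), 2))  # 0.32, i.e. 32%
```

The unit doesn't matter as long as numerator and denominator match; what matters is measuring elapsed time from "work started" to "work delivered", including every wait.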

Industry averages for software teams are typically cited at 15-25%. We were above average. But "above average at wasting time" isn't a metric anyone puts on a slide.

What made it worse was that we'd started using AI coding assistants. The promise was faster delivery. The reality was faster code generation — but the code was often wrong, because the input was vague. A user story that says "As a user, I want to receive rewards so that I feel valued" gives a human enough context to ask smart questions. It gives an AI enough context to hallucinate confidently.

AI didn't just speed up coding. It moved the bottleneck. The bottleneck was no longer "how fast can we write code?" It became "how precisely can we define what we want?" And our entire process was optimized for a world where humans were the bottleneck. That world was gone.

I should be honest: I didn't call what followed "KAIzen" at the time. We didn't have names for any of it. We just started changing how we worked. The vocabulary in this post — Blueprint, Runbook — came later, to make the patterns shareable. The work was real. The naming is an afterthought.


The Inspiration — And Why I Needed Something Different

I wasn't starting from zero. Amazon's AI-DLC (AI-Driven Development Lifecycle) was a major inspiration. AWS had shown that spec-driven, AI-augmented development could work at scale. But when I looked at applying it to my team, the cost was high: the AI-DLC replaces your entire development process. New phases, new roles, new artifacts, new way of working from the ground up.

We didn't have that luxury. We were mid-sprint, mid-quarter, mid-delivery. I needed something that could plug into our existing process — not replace it. Where the AI-DLC asks you to change everything, I wanted to change one thing: the quality of our input to AI. Keep our sprints, keep our board, add a layer on top.

I now call this approach KAIzen, from kaizen (改善) — the Japanese philosophy of continuous improvement. Small changes, led by the people who do the work. KAIzen applies that principle with AI as the lever. Not a new methodology. Not a process overhaul. A layer you add on top of whatever Agile process you already run.


Specification as the Primary Lever

The turning point was small. Instead of writing a user story, I wrote a detailed engineering spec for a feature — inputs, outputs, edge cases, constraints, acceptance criteria. I fed it to our AI assistant and the generated code was review-ready on the first pass.

The previous feature — similar complexity, described as a user story — had taken three rounds of review, two Slack threads, and a sync meeting. Same AI. Same team. The difference was entirely in the input.

The spec is the product now. Not the code. The quality of your specification determines the quality of everything that follows.

I call this a Blueprint — a structured spec precise enough for AI to build against. For complex work, you also need a Runbook — an ordered implementation plan derived from the Blueprint. For a small fix, a lightweight Blueprint is enough.
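As a sketch, the two artifacts and the relationship between them can be written down as data structures. The class and field names below are illustrative (they mirror the sections named above), not a prescribed schema:

```python
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    """A structured spec precise enough for AI to build against."""
    feature: str
    inputs: list[str]
    outputs: list[str]
    edge_cases: list[str]
    constraints: list[str]
    acceptance_criteria: list[str]


@dataclass
class Runbook:
    """An ordered implementation plan derived from a Blueprint."""
    blueprint: Blueprint
    steps: list[str] = field(default_factory=list)  # ordered implementation steps
```

The point of the shape is the dependency: a Runbook is derived from a Blueprint, never written independently, so any gap in the spec surfaces before a line of code is generated.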

Here's the part that changes the adoption story: the AI agent drafts the Blueprint. The product owner gives us a feature brief — goals, context, user needs. We feed that brief to our GitHub Copilot custom agent (we call it SpecKit), and it generates a first draft of the Blueprint: inputs, outputs, edge cases, constraints, acceptance criteria.

But the draft isn't the artifact — the reviewed Blueprint is. A developer still spends real time reviewing and refining it, sometimes as much as two hours for a complex feature. That investment is the point. A precise Blueprint is what makes the Runbook coherent and the AI-generated code review-ready. The agent removes the blank-page problem and gets you 70% of the way there. The developer's judgment closes the last 30% — and that's where the quality lives.

Over time, something unexpected happened. Our product owner started using the same agent to write the feature brief itself — structuring it so the downstream Blueprint would be cleaner. The whole chain tightened: better brief → better Blueprint → better Runbook → better AI-generated code → fewer review cycles. The agent didn't just help developers. It pulled the entire team toward precision.


What Dissolved

We didn't decide to stop doing Scrum. We just started writing Blueprints inside our sprints. But things dissolved on their own. Grooming became redundant — the Blueprint already answered every question grooming was designed to surface. Estimation stopped making sense — spec-driven work is inherently scoped. Sprint planning became just prioritization: "which Blueprints next?"

We didn't switch to Kanban. We just stopped needing the ceremonies that were solving problems the Blueprint solved better. What survived: prioritization, standups, retros. Whether you call the outer loop Scrum or Kanban stops mattering. The inner loop — spec-first, AI-augmented — is what drives results.

This is the core difference from the AI-DLC approach: we didn't need anyone's permission to start. No process overhaul. No new roles. No org-wide buy-in. One team, one Blueprint, one sprint. The layer proved itself through results, not a proposal deck.


The Numbers

Three epics, same area, similar complexity. Flow efficiency: 32% → 47% → 85%. Cycle time: 36 days → 36 days → 13 days. The active work time barely changed. What collapsed was the waiting — grooming, clarification, alignment overhead that was invisible inside sprint velocity.
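A back-of-envelope check supports that reading, under the simplifying assumption that active time ≈ flow efficiency × cycle time (comparing the first and third epics):

```python
# Approximate active work time as flow efficiency × cycle time (in days).
# Simplifying assumption for a sanity check, not a rigorous derivation.
active_before = 0.32 * 36  # first epic:  ~11.5 days of active work
active_after = 0.85 * 13   # third epic:  ~11.1 days of active work

print(round(active_before, 1), round(active_after, 1))
```

Roughly eleven days of actual work either way. The 23 days that disappeared from cycle time were almost entirely waiting.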

The caveats: the three epics weren't identical in scope, the team was small, and I was coaching directly. I'd rather you hear that from me. Three data points isn't proof; it's a signal worth investigating.


What I Learned

The Blueprint is the new bottleneck — but it's a better bottleneck. With SpecKit drafting the first pass, the blank-page problem is gone. But the review still takes real time, and it should — that's where engineering judgment lives. The developer's job shifts from "write the spec from scratch" to "validate and sharpen the spec," which is a better use of their expertise.

Not everyone wants to write specs. Resistance collapses after one demonstration. Show a developer AI output from a vague story next to AI output from a good Blueprint. After that, most people write the spec — not because of a process argument, but because it makes their afternoon easier.

This is kaizen — continuous improvement, from the ground up. We changed one thing, measured what happened, and kept improving.

But there's a limit to what one team can achieve alone. Our flow efficiency hit 85% within our area. Then we got an initiative spanning Gaming, Rewards, and Sportsbook — and suddenly our speed didn't matter. Blocked by another team's API. Debating event schemas in Slack. Sitting in alignment meetings where six people discussed what two could have decided in a DM.

One team improving means nothing if features get stuck at the boundary. That's Part 2.


Part 2: "KAIzen Across Boundaries" — coming next week.

Want to try it today? Pick your next feature. Feed your product brief to an AI assistant and ask it to generate a spec — inputs, outputs, edge cases, constraints, acceptance criteria. Refine it. Build against it. Measure your flow efficiency before and after. One spec. See what happens.
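If you want a starting point for that ask, here is one way to phrase it. The wording and the sample brief are illustrative, and it works with any chat-based assistant:

```python
# Illustrative prompt template; adapt the sections to your own Blueprint format.
# The sample brief is hypothetical.
BRIEF = "As a player, I want daily login rewards so that I keep coming back."

PROMPT = f"""Turn the product brief below into an engineering spec with these sections:
inputs, outputs, edge cases, constraints, acceptance criteria.
Where the brief is ambiguous, list open questions instead of guessing.

Brief:
{BRIEF}"""

print(PROMPT)
```

The "open questions instead of guessing" line matters: it turns the vagueness that would otherwise become confident hallucination into an explicit review checklist.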


#ai #agile #softwareengineering #productivity
