Semih ERDOGAN

Originally published at semiherdogan.net

handoff: Keeping AI Coding Sessions on Track

AI coding tools are very good at helping with the next step.

They are much worse at reliably carrying the full thread of a feature across multiple sessions.

That was the first problem I kept running into.

I would start a feature with Claude, ChatGPT, Copilot, or another assistant, make real progress, then step away. When I came back, I had the same questions again:

  • What exactly was I building?
  • What decisions had already been made?
  • What was the current step?
  • What should happen next?

The model had no memory. I had partial memory. The repo had some memory. None of it was structured enough.

But continuity was only part of the issue.

The deeper problem was that intent, requirements, decisions, current progress, and validation evidence were scattered across chat history and code changes.

So I built handoff.

What handoff is

handoff is a local-first CLI for structured AI coding workflows.

It creates a small workspace inside your repository under .handoff/ and uses plain Markdown files as the source of truth for the feature you are working on.

The core workflow now revolves around six files:

  • FEATURE.md: raw feature intent
  • SPEC.md: normalized requirements and acceptance criteria
  • DESIGN.md: optional technical design
  • DECISIONS.md: durable product and architecture decisions
  • STATE.md: execution plan, progress, and evidence
  • SESSION.md: continuation-safe session summary

No cloud sync. No provider lock-in. No hidden agent runtime.

Just files, prompts, and a deterministic workflow from intent to execution.

The real problem was not only memory

At first, I thought continuation was the only missing piece.

After enough AI-assisted coding sessions, I realized there were actually several different problems:

  • the assistant forgets where the work stopped
  • the assistant starts implementation before the work is decomposed well enough
  • important decisions lose their reasoning
  • completed work lacks evidence
  • implementation quietly drifts away from the spec

Those later problems matter just as much.

If the feature is vague, the assistant tends to improvise. Sometimes that works. Sometimes it creates drift, partial implementation, or too much code too early.

That is why handoff is intent-aware, planning-aware, decision-aware, and continuation-aware.

The workflow I wanted

I wanted something with the strengths of structured decomposition, but without forcing people into one IDE or one proprietary workflow.

I also did not want a system where users had to manually juggle several tools just to start a feature.

So the result became a hybrid workflow:

  • simple when you want speed
  • explicit when you want control

The default path

For most features, the flow starts like this:

handoff init payment-integration

Then you edit .handoff/current/FEATURE.md with the feature request, requirements, and constraints.

After that:

handoff run --copy

That is now the default entry point.

handoff run looks at the saved workspace state and decides what prompt should come next:

  • if planning is incomplete, it emits a planning prompt
  • if the execution plan is ready, it emits an execution prompt
  • if execution is already underway, it emits a continuation prompt

That keeps the simple path simple.
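
As a rough illustration of that decision, here is a minimal Python sketch. It is not handoff's actual implementation: the function name `next_prompt_kind`, the choice to look only at the text of STATE.md, and the `[ ]` and `[x]` step markers are assumptions made for the example. Only the `[>]` marker for the current step appears in this article.

```python
import re

# Assumed step markers in STATE.md: "[ ]" pending, "[>]" current, "[x]" done.
# Only "[>]" is mentioned in the article; the other two are illustrative.
STEP_MARKER = re.compile(r"^[ \t]*[-*][ \t]*\[( |>|x)\]", re.MULTILINE)


def next_prompt_kind(state_md: str) -> str:
    """Pick the kind of prompt to emit from the saved plan text."""
    markers = STEP_MARKER.findall(state_md)
    if not markers:
        return "planning"      # no execution plan has been written yet
    if ">" in markers or "x" in markers:
        return "continuation"  # at least one step is current or done
    return "execution"         # a plan exists but nothing has started


print(next_prompt_kind("- [x] add route\n- [>] wire Stripe client\n- [ ] error UI"))
```

The same input always maps to the same prompt kind, which is the property the default path relies on. This sketch does not treat a fully completed plan as a special case.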

You can also use:

handoff next
handoff status

to inspect what the tool thinks should happen next without generating another prompt.

The advanced path

Sometimes you want to review the planning before any code is written.

For that, handoff also has:

handoff spec --copy
handoff design --copy
handoff tasks --copy

Those commands let you inspect the work in stages:

  • spec: turn feature intent into clear requirements
  • design: map those requirements to a practical implementation approach
  • tasks: generate an execution-ready task list in STATE.md

Then you can run:

handoff start --copy

and begin implementation from a cleaner plan.

That distinction matters:

  • run is the default state-aware entry point
  • start is for direct execution when a valid plan already exists

There is also a closing-loop command:

handoff drift --copy

handoff drift does not modify code. It generates a structured audit prompt that asks an assistant to compare the saved intent, spec, design, decision log, state, session summary, and implementation.

That gives you a final check before you call the feature done.
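
To show what "a structured audit prompt" could look like, here is a hedged Python sketch that assembles one from the saved artifacts. The function name `build_audit_prompt`, the prompt wording, and the assumption that all six files sit in one workspace directory are mine, not taken from the tool.

```python
from pathlib import Path

# File names come from the article. Their placement in a single directory
# is an assumption made for this sketch.
ARTIFACTS = [
    "FEATURE.md", "SPEC.md", "DESIGN.md",
    "DECISIONS.md", "STATE.md", "SESSION.md",
]

# Hypothetical instruction text, not the wording handoff emits.
HEADER = (
    "Compare the saved artifacts below with the current implementation. "
    "Report every mismatch. Do not modify code."
)


def build_audit_prompt(workspace: Path) -> str:
    """Concatenate the artifacts that exist, always in the same order."""
    sections = []
    for name in ARTIFACTS:
        path = workspace / name
        if path.is_file():  # DESIGN.md is optional, so skip what is absent
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n\n{body}")
    return HEADER + "\n\n" + "\n\n".join(sections)
```

Because the file order is fixed and nothing is sent to a model, the same workspace always produces the same prompt. The judgment stays with whichever assistant receives it.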

Why the file split matters

The value is not just "more files."

The value is that each file has one job:

  • FEATURE.md captures intent
  • SPEC.md captures what must be true
  • DESIGN.md captures how to approach it
  • DECISIONS.md captures why durable choices were made
  • STATE.md captures what is being done right now and the evidence for completed steps
  • SESSION.md captures what the next session must know

That separation gives the assistant better footing.

It also gives you better reviewability. You can inspect the spec before implementation, challenge the design before code, review decisions later, and check whether the task list actually matches the feature.

Decisions are part of the memory

Requirements are not the only thing worth preserving.

In long-running work, the more expensive loss is often decision history:

  • Why did we choose this approach?
  • What alternatives did we reject?
  • Is this decision still valid?
  • Should a future assistant re-open this topic?

That is why new feature workspaces include DECISIONS.md.

It is intentionally lightweight. It is not for every small implementation detail. It is for durable product or architecture choices that future sessions should not re-litigate without new evidence.

Evidence changes the execution loop

AI-assisted coding often ends with a vague claim that work is done.

That is not enough.

The default execution prompts now ask the assistant to record evidence in STATE.md after completed micro-steps:

  • changed files
  • commands or tests run
  • result
  • notes or remaining risks

That turns the loop from:

Task -> "done"

into:

Task -> code -> evidence

It is still simple Markdown, but it gives you something concrete to review.
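
For a sense of how small such an evidence record can be, here is a Python sketch that renders one as Markdown. The four evidence fields are the ones listed above. The `Evidence` class, the `[x]` marker, and the exact layout are assumptions for illustration, not the format handoff prescribes.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """One completed micro-step plus the proof that it was done."""
    step: str
    changed_files: list[str]
    commands: list[str]
    result: str
    notes: str = ""  # optional notes or remaining risks

    def to_markdown(self) -> str:
        # Hypothetical layout: a checked step with indented evidence bullets.
        lines = [
            f"- [x] {self.step}",
            f"  - changed files: {', '.join(self.changed_files)}",
            f"  - commands: {', '.join(self.commands)}",
            f"  - result: {self.result}",
        ]
        if self.notes:
            lines.append(f"  - notes: {self.notes}")
        return "\n".join(lines)


entry = Evidence(
    step="add checkout route",
    changed_files=["app/checkout.py"],
    commands=["pytest tests/test_checkout.py"],
    result="tests passed",
)
print(entry.to_markdown())
```

The file and test names in the usage example are made up. The point is that a reviewer can see what changed and how it was checked without reopening the chat.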

Drift audit before closing

The last failure mode is silent drift.

A feature can have a good spec, a reasonable plan, and passing tests while still missing part of the original intent.

handoff drift --copy exists for that moment.

It emits a prompt for an audit, not an automatic verdict. That distinction is important. The CLI stays deterministic and provider-agnostic; the assistant does the code inspection using the saved artifacts as the checklist.

The goal is to catch mismatches like:

  • a requirement in SPEC.md that never reached implementation
  • an accepted decision in DECISIONS.md that code ignored
  • a completed task in STATE.md without matching evidence
  • a session summary that no longer reflects the repo

Why determinism matters

One of the design goals of handoff is determinism.

That is why handoff continue is guarded.

If the execution plan is invalid, the command fails with a deterministic error instead of pretending everything is fine.

Examples:

  • no execution plan initialized
  • multiple current [>] steps
  • no remaining steps

That behavior is deliberate.

I do not want a workflow that silently fixes state by guessing. I want the handoff to be inspectable and stable.
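
A guard like that can be sketched in a few lines of Python. The three error messages mirror the examples above. Everything else is an assumption for the sketch and not handoff's real code: the `PlanError` class, the `current_step` function, the `[ ]` and `[x]` markers, and the rule of falling back to the first pending step when none is marked current.

```python
import re


class PlanError(Exception):
    """Raised when the saved execution plan cannot be continued safely."""


# Assumed step syntax: "- [ ] text" pending, "- [>] text" current, "- [x] text" done.
STEP = re.compile(r"^[ \t]*[-*][ \t]*\[( |>|x)\][ \t]*(.+)$", re.MULTILINE)


def current_step(state_md: str) -> str:
    """Return the step to continue with, or fail with a fixed error message."""
    steps = STEP.findall(state_md)
    if not steps:
        raise PlanError("no execution plan initialized")
    current = [text.strip() for mark, text in steps if mark == ">"]
    if len(current) > 1:
        raise PlanError("multiple current [>] steps")
    if current:
        return current[0]
    pending = [text.strip() for mark, text in steps if mark == " "]
    if not pending:
        raise PlanError("no remaining steps")
    # Assumption: with no step marked current, take the first pending one.
    return pending[0]


print(current_step("- [x] add route\n- [>] wire Stripe client\n- [ ] error UI"))
```

An ambiguous plan, such as two `[>]` steps, raises instead of being resolved by picking one.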

A concrete example

Imagine I am adding a payment flow.

I might start with:

handoff init payment-flow

Then in FEATURE.md:

  • support Stripe checkout
  • keep existing order flow intact
  • show clear errors
  • do not refactor unrelated modules

From there I have two options.

Fast path:

handoff run --copy
handoff next

Reviewable path:

handoff spec --copy
handoff design --copy
handoff tasks --copy
handoff start --copy

Once work is in motion, I can continue with:

handoff continue --copy

The next session gets a prompt grounded in the existing state, not a vague memory of yesterday.

Before closing the work, I can ask for a drift audit:

handoff drift --copy

That prompt checks whether the implementation still matches the saved intent.

Better continuity is not only about prompts

One thing I like about the current shape of handoff is that it is not only a prompt generator anymore.

It also helps surface state, decisions, evidence, and drift.

That is why commands like these matter:

  • handoff status
  • handoff next
  • handoff validate

They make the saved workflow visible instead of burying it inside one long prompt.

That is important when you are trying to answer simple questions like:

  • Is this feature ready for execution?
  • What is the current step?
  • Why is the workflow blocked?
  • What evidence exists for completed work?
  • Did implementation drift from the spec or decisions?
  • What command should I run next?

Why local-first still matters

I wanted this to work with any coding assistant.

That meant the core could not depend on one provider, one editor, or one hosted workflow system.

So handoff stays local-first:

  • Markdown files in your repository
  • prompt generation from a CLI
  • no provider dependency in the core flow
  • no cloud requirement

You can use it with ChatGPT today, Claude tomorrow, Copilot later, or another tool entirely.

The workflow stays yours.

Why repository context matters too

Feature state is only part of the story.

Repository context matters too.

If your README.md and AGENTS.md are missing, stale, or too thin, AI sessions still waste time rediscovering the same project facts.

That is why handoff init can flag missing high-value context, and why there is also a handoff prompt context flow for improving repo-level guidance without writing application code.

That may sound small, but it matters in practice.

A feature plan works much better when the surrounding repository is legible.
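
A check for missing or thin context files might look like the Python sketch below. The function name `context_gaps` and the 200-character threshold for "thin" are arbitrary assumptions. The sketch makes no attempt to detect stale content, and it is not how handoff init is implemented.

```python
from pathlib import Path

CONTEXT_FILES = ("README.md", "AGENTS.md")
MIN_CHARS = 200  # arbitrary cutoff for "too thin", chosen for this sketch


def context_gaps(repo: Path) -> dict[str, str]:
    """Map each problematic context file to "missing" or "thin"."""
    gaps = {}
    for name in CONTEXT_FILES:
        path = repo / name
        if not path.is_file():
            gaps[name] = "missing"
        elif len(path.read_text(encoding="utf-8").strip()) < MIN_CHARS:
            gaps[name] = "thin"
    return gaps
```

An empty result means both files exist and pass the length check. It says nothing about whether their content is still accurate.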

Why I still like the CLI model

There is a lot of value in editor-native workflows, and I may support more of them over time.

But the CLI model has one major strength: portability.

The moment a workflow depends too heavily on one IDE, it becomes harder to reuse across tools, harder to debug, and harder to trust.

handoff keeps the source of truth in files you can inspect directly.

That makes it easier to reason about, easier to version, and easier to carry across environments.

Who this is for

handoff is a good fit if:

  • you build features across multiple AI sessions
  • you switch between assistants or editors
  • you want better task decomposition before coding
  • you want decision history outside chat
  • you want evidence attached to completed work
  • you want drift checks before closing features
  • you care about deterministic state and inspectable workflow
  • you prefer local tools over hosted orchestration

It is probably not for you if you want a full IDE platform with built-in visual workflow management and heavy automation everywhere.

That is fine. The tool is intentionally narrower than that.

The short version

handoff gives AI coding workflows intent, memory, decisions, evidence, and a deterministic continuation path without taking ownership of your editor or your repository.

That is the whole idea.

Try it

If you want to try the current default flow:

handoff init my-feature
# edit .handoff/current/FEATURE.md
handoff run --copy
handoff next
handoff status
handoff drift --copy

If you want to inspect planning in stages first:

handoff init my-feature
# edit .handoff/current/FEATURE.md
handoff spec --copy
handoff design --copy
handoff tasks --copy
handoff start --copy
handoff continue --copy
handoff drift --copy

Project link:

If you are building with AI every day, you already know the pain this solves.

The interesting part is not that the model needs context.

It is that context needs structure.
