Semih ERDOGAN

Posted on May 8 • Edited on Jun 1 • Originally published at semiherdogan.net

handoff: Keeping AI Coding Sessions on Track

#ai #cli #productivity #tooling

AI coding tools are very good at helping with the next step.

They are much worse at reliably carrying the full thread of a feature across multiple sessions.

That was the first problem I kept running into.

I would start a feature with Claude, ChatGPT, Copilot, or another assistant, make real progress, then step away. When I came back, I had the same questions again:

What exactly was I building?
What decisions had already been made?
What was the current step?
What should happen next?

The model had no memory. I had partial memory. The repo had some memory. None of it was structured enough.

But continuity was only part of the issue.

The deeper problem was that intent, requirements, decisions, current progress, and validation evidence were scattered across chat history and code changes.

So I built handoff.

What handoff is

handoff is a local-first CLI for structured AI coding workflows.

It creates a small workspace inside your repository under .handoff/ and uses plain Markdown files as the source of truth for the feature you are working on.

The core workflow now revolves around six files:

FEATURE.md: raw feature intent
SPEC.md: normalized requirements and acceptance criteria
DESIGN.md: optional technical design
DECISIONS.md: durable product and architecture decisions
STATE.md: execution plan, progress, and evidence
SESSION.md: continuation-safe session summary

No cloud sync. No provider lock-in. No hidden agent runtime.

Just files, prompts, and a deterministic workflow from intent to execution.
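To make the layout concrete, here is a minimal Python sketch that scaffolds the six files under `.handoff/current/` in a throwaway directory. It is not handoff's implementation. The file names and the `.handoff/current/` path come from this post. The `scaffold_workspace` helper and the placeholder headings are hypothetical. The sketch creates all six files, including the optional DESIGN.md; the post does not say whether `handoff init` does the same.

```python
from pathlib import Path
import tempfile

# The six file names come from this post; the placeholder headings are
# hypothetical and only illustrate what each file is for.
WORKSPACE_FILES = {
    "FEATURE.md": "Feature intent",
    "SPEC.md": "Requirements and acceptance criteria",
    "DESIGN.md": "Technical design (optional)",
    "DECISIONS.md": "Durable decisions",
    "STATE.md": "Execution plan, progress, and evidence",
    "SESSION.md": "Session summary",
}


def scaffold_workspace(repo_root: Path) -> Path:
    """Hypothetical helper: create <repo_root>/.handoff/current/ with one file per role."""
    current = repo_root / ".handoff" / "current"
    current.mkdir(parents=True, exist_ok=True)
    for name, heading in WORKSPACE_FILES.items():
        path = current / name
        if not path.exists():  # keep any notes that are already there
            path.write_text(f"# {heading}\n", encoding="utf-8")
    return current


if __name__ == "__main__":
    # Demo in a temporary directory so no real repository is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for path in sorted(scaffold_workspace(Path(tmp)).iterdir()):
            print(path.name)
```

Because everything is a plain Markdown file, re-running the scaffold is harmless: existing notes are left alone.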

The real problem was not only memory

At first, I thought continuation was the only missing piece.

After enough AI-assisted coding sessions, I realized there were actually several different problems:

the assistant forgets where the work stopped
the assistant starts implementation before the work is decomposed well enough
important decisions lose their reasoning
completed work lacks evidence
implementation quietly drifts away from the spec

Those later problems matter just as much.

If the feature is vague, the assistant tends to improvise. Sometimes that works. Sometimes it creates drift, partial implementation, or too much code too early.

That is why handoff is intent-aware, planning-aware, decision-aware, and continuation-aware.

The workflow I wanted

I wanted something with the strengths of structured decomposition, but without forcing people into one IDE or one proprietary workflow.

I also did not want a system where users had to manually juggle several tools just to start a feature.

So the result became a hybrid workflow:

simple when you want speed
explicit when you want control

The default path

For most features, the flow starts like this:

handoff init payment-integration

Then you edit .handoff/current/FEATURE.md with the feature request, requirements, and constraints.

After that:

handoff run --copy

That is now the default entry point.

handoff run looks at the saved workspace state and decides what prompt should come next:

if planning is incomplete, it emits a planning prompt
if the execution plan is ready, it emits an execution prompt
if execution is already underway, it emits a continuation prompt

That keeps the simple path simple.
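The selection rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not handoff's code: it assumes STATE.md tracks steps as `- [ ]` (pending), `- [>]` (current), and `- [x]` (done), and it approximates "planning is incomplete" as "STATE.md has no steps yet". The function names `parse_steps` and `choose_prompt` are hypothetical.

```python
import re

# Assumed step syntax for this sketch: "- [ ]" pending, "- [>]" current, "- [x]" done.
STEP_RE = re.compile(r"^\s*[-*]\s+\[([ >xX])\]\s+(.+?)\s*$")


def parse_steps(state_md: str) -> list[tuple[str, str]]:
    """Return (marker, text) pairs for every checklist step found in STATE.md."""
    steps = []
    for line in state_md.splitlines():
        m = STEP_RE.match(line)
        if m:
            steps.append((m.group(1).lower(), m.group(2)))
    return steps


def choose_prompt(state_md: str) -> str:
    """Pick the next prompt kind from saved state alone, without guessing."""
    steps = parse_steps(state_md)
    if not steps:
        return "planning"  # no execution plan yet
    if any(marker in (">", "x") for marker, _ in steps):
        return "continuation"  # at least one step has started or finished
    return "execution"  # a plan exists but nothing has started


if __name__ == "__main__":
    print(choose_prompt(""))  # planning
    print(choose_prompt("- [ ] Add checkout endpoint\n- [ ] Show errors"))  # execution
    print(choose_prompt("- [x] Add checkout endpoint\n- [>] Show errors"))  # continuation
```

The point of the design is that the decision depends only on what is saved in the workspace, so two people running the same command on the same files get the same prompt.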

You can also use:

handoff next
handoff status

to inspect what the tool thinks should happen next without generating another prompt.

The advanced path

Sometimes you want to review the planning before any code is written.

For that, handoff also has:

handoff spec --copy
handoff design --copy
handoff tasks --copy

Those commands let you inspect the work in stages:

spec: turn feature intent into clear requirements
design: map those requirements to a practical implementation approach
tasks: generate an execution-ready task list in STATE.md

Then you can run:

handoff start --copy

and begin implementation from a cleaner plan.

That distinction matters:

run is the default state-aware entry point
start is for direct execution when a valid plan already exists

There is also a closing-loop command:

handoff drift --copy

handoff drift does not modify code. It generates a structured audit prompt that asks an assistant to compare the saved intent, spec, design, decision log, state, session summary, and implementation.

That gives you a final check before you call the feature done.
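One way to picture a deterministic audit prompt is as a fixed-order bundle of the saved artifacts. The Python sketch below assumes that shape; the `build_drift_prompt` name, the instruction wording, and the `--- NAME ---` separators are all hypothetical, and the real prompt handoff emits may look quite different. Comparing against the implementation itself is left to the assistant, as the post describes.

```python
from pathlib import Path

# Artifact order follows this post. The instruction wording is hypothetical.
ARTIFACTS = ["FEATURE.md", "SPEC.md", "DESIGN.md", "DECISIONS.md", "STATE.md", "SESSION.md"]


def build_drift_prompt(current_dir: Path) -> str:
    """Concatenate saved artifacts into one audit prompt: same files in, same text out."""
    parts = [
        "Audit the implementation against the saved artifacts below.",
        "Do not modify any code. For each mismatch, name the artifact and the item it violates.",
    ]
    for name in ARTIFACTS:
        path = current_dir / name
        # Mark absent files explicitly (DESIGN.md is optional) instead of skipping them.
        body = path.read_text(encoding="utf-8").strip() if path.is_file() else "(missing)"
        parts.append(f"--- {name} ---\n{body}")
    return "\n\n".join(parts)
```

Marking a missing file as `(missing)` rather than dropping it keeps the prompt's structure stable, which makes two audits of the same workspace easy to compare.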

Why the file split matters

The value is not just "more files."

The value is that each file has one job:

FEATURE.md captures intent
SPEC.md captures what must be true
DESIGN.md captures how to approach it
DECISIONS.md captures why durable choices were made
STATE.md captures what is being done right now, plus evidence for completed steps
SESSION.md captures what the next session must know

That separation gives the assistant better footing.

It also gives you better reviewability. You can inspect the spec before implementation, challenge the design before code, review decisions later, and check whether the task list actually matches the feature.

Decisions are part of the memory

Requirements are not the only thing worth preserving.

In long-running work, the more expensive loss is often decision history:

Why did we choose this approach?
What alternatives did we reject?
Is this decision still valid?
Should a future assistant re-open this topic?

That is why new feature workspaces include DECISIONS.md.

It is intentionally lightweight. It is not for every small implementation detail. It is for durable product or architecture choices that future sessions should not re-litigate without new evidence.
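As a sketch of how lightweight such an entry can be, the Python below models one decision with fields that mirror the questions above and renders it as Markdown. The `Decision` class, its field names, the rendered layout, and the example content are hypothetical; the post does not specify the format of DECISIONS.md.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One DECISIONS.md entry. Field names and layout are hypothetical."""
    title: str
    choice: str
    reason: str
    rejected: list[str] = field(default_factory=list)
    status: str = "accepted"  # e.g. "accepted" or "superseded"

    def to_markdown(self) -> str:
        rejected = ", ".join(self.rejected) if self.rejected else "none recorded"
        return "\n".join([
            f"## {self.title}",
            f"- Status: {self.status}",
            f"- Decision: {self.choice}",
            f"- Why: {self.reason}",
            f"- Rejected alternatives: {rejected}",
        ])

    def may_reopen(self, has_new_evidence: bool) -> bool:
        """An accepted decision stays closed unless there is new evidence."""
        return self.status != "accepted" or has_new_evidence


if __name__ == "__main__":
    decision = Decision(
        title="Checkout integration",
        choice="Use Stripe Checkout",
        reason="Smallest change to the existing order flow",
        rejected=["Custom payment form"],
    )
    print(decision.to_markdown())
```

Recording the rejected alternatives next to the reason is what lets a later session see that a topic was already weighed, instead of proposing the same alternative again.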

Evidence changes the execution loop

AI-assisted coding often ends with a vague claim that work is done.

That is not enough.

The default execution prompts now ask the assistant to record evidence in STATE.md after completed micro-steps:

changed files
commands or tests run
result
notes or remaining risks

That turns the loop from:

Task -> "done"

into:

Task -> code -> evidence

It is still simple Markdown, but it gives you something concrete to review.
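A minimal Python sketch of one evidence entry follows, using the four fields listed above. The `Evidence` class, the `### Evidence: <step>` heading, the file path, and the test command are hypothetical examples; the post does not define how the default prompts lay evidence out in STATE.md.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Evidence for one completed micro-step; the Markdown layout is hypothetical."""
    step: str
    changed_files: list[str]
    commands: list[str]
    result: str
    notes: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = [f"### Evidence: {self.step}", "- Changed files:"]
        lines += [f"  - `{path}`" for path in self.changed_files]
        lines.append("- Commands or tests run:")
        lines += [f"  - `{command}`" for command in self.commands]
        lines.append(f"- Result: {self.result}")
        lines.append("- Notes or risks: " + ("; ".join(self.notes) or "none"))
        return "\n".join(lines)


if __name__ == "__main__":
    entry = Evidence(
        step="Add checkout endpoint",
        changed_files=["src/payments/checkout.py"],  # hypothetical path
        commands=["pytest tests/payments"],  # hypothetical command
        result="passed",
        notes=["webhook retries not covered yet"],
    )
    print(entry.to_markdown())
```

Keeping each entry keyed by the step's text makes it possible to line up completed steps with their evidence later.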

Drift audit before closing

The last failure mode is silent drift.

A feature can have a good spec, a reasonable plan, and passing tests while still missing part of the original intent.

handoff drift --copy exists for that moment.

It emits a prompt for an audit, not an automatic verdict. That distinction is important. The CLI stays deterministic and provider-agnostic; the assistant does the code inspection using the saved artifacts as the checklist.

The goal is to catch mismatches like:

a requirement in SPEC.md that never reached implementation
an accepted decision in DECISIONS.md that code ignored
a completed task in STATE.md without matching evidence
a session summary that no longer reflects the repo
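Most of these checks need an assistant that actually reads the code. The third one, a completed task without matching evidence, is mechanical enough to sketch in Python. The sketch assumes completed steps are written as `- [x] <step>` and that evidence sits under a heading such as `### Evidence: <step>`. Both formats and the `completed_without_evidence` name are hypothetical; the post does not claim that handoff runs this check itself.

```python
import re

# Assumed formats for this sketch: completed steps are "- [x] <step>" and
# evidence lives under a heading such as "### Evidence: <step>".
DONE_RE = re.compile(r"^\s*[-*]\s+\[[xX]\]\s+(.+?)\s*$")
EVIDENCE_RE = re.compile(r"^#{2,6}\s+Evidence:\s+(.+?)\s*$")


def completed_without_evidence(state_md: str) -> list[str]:
    """Return completed steps that have no evidence heading with the same text."""
    done: list[str] = []
    evidenced: set[str] = set()
    for line in state_md.splitlines():
        if m := DONE_RE.match(line):
            done.append(m.group(1))
        elif m := EVIDENCE_RE.match(line):
            evidenced.add(m.group(1))
    return [step for step in done if step not in evidenced]


if __name__ == "__main__":
    state = (
        "- [x] Add checkout endpoint\n"
        "- [x] Show clear errors\n"
        "- [ ] Update docs\n"
        "\n"
        "### Evidence: Add checkout endpoint\n"
        "- Result: passed\n"
    )
    print(completed_without_evidence(state))  # ['Show clear errors']
```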

Why determinism matters

One of the design goals of handoff is determinism.

That is why handoff continue is guarded.

If the execution plan is invalid, the command fails with a deterministic error instead of pretending everything is fine.

Examples:

no execution plan initialized
multiple current [>] steps
no remaining steps

That behavior is deliberate.

I do not want a workflow that silently fixes state by guessing. I want the handoff to be inspectable and stable.
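The guard can be illustrated with a small Python sketch that raises the three errors listed above rather than repairing the plan. It assumes the same hypothetical checklist syntax (`- [ ]`, `- [>]`, `- [x]`). `PlanError` and `current_step` are illustrative names, not handoff's API. The fallback of continuing from the first unchecked step when no `[>]` marker exists is also an assumption of this sketch.

```python
import re

# Assumed step syntax: "- [ ]" pending, "- [>]" current, "- [x]" done.
STEP_RE = re.compile(r"^\s*[-*]\s+\[([ >xX])\]\s+(.+?)\s*$")


class PlanError(Exception):
    """Raised with a fixed message, so the same broken state always fails the same way."""


def current_step(state_md: str) -> str:
    """Return the step to continue, or raise PlanError instead of repairing the plan."""
    steps = [
        (m.group(1).lower(), m.group(2))
        for line in state_md.splitlines()
        if (m := STEP_RE.match(line))
    ]
    if not steps:
        raise PlanError("no execution plan initialized")
    current = [text for marker, text in steps if marker == ">"]
    if len(current) > 1:
        raise PlanError("multiple current [>] steps")
    if current:
        return current[0]
    pending = [text for marker, text in steps if marker == " "]
    if not pending:
        raise PlanError("no remaining steps")
    # Assumption of this sketch: with no [>] marker, the first unchecked step is next.
    return pending[0]
```

Note what the function refuses to do: with two `[>]` markers it does not pick one, because either choice would be a guess.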

A concrete example

Imagine I am adding a payment flow.

I might start with:

handoff init payment-flow

Then in FEATURE.md:

support Stripe checkout
keep existing order flow intact
show clear errors
do not refactor unrelated modules

From there I have two options.

Fast path:

handoff run --copy
handoff next

Reviewable path:

handoff spec --copy
handoff design --copy
handoff tasks --copy
handoff start --copy

Once work is in motion, I can continue with:

handoff continue --copy

The next session gets a prompt grounded in the existing state, not a vague memory of yesterday.

Before closing the work, I can ask for a drift audit:

handoff drift --copy

That prompt asks the assistant to check whether the implementation still matches the saved intent.

Better continuity is not only about prompts

One thing I like about the current shape of handoff is that it is not only a prompt generator anymore.

It also helps surface state, decisions, evidence, and drift.

That is why commands like these matter:

handoff status
handoff next
handoff validate

They make the saved workflow visible instead of burying it inside one long prompt.

That is important when you are trying to answer simple questions like:

Is this feature ready for execution?
What is the current step?
Why is the workflow blocked?
What evidence exists for completed work?
Did implementation drift from the spec or decisions?
What command should I run next?

Why local-first still matters

I wanted this to work with any coding assistant.

That meant the core could not depend on one provider, one editor, or one hosted workflow system.

So handoff stays local-first:

Markdown files in your repository
prompt generation from a CLI
no provider dependency in the core flow
no cloud requirement

You can use it with ChatGPT today, Claude tomorrow, Copilot later, or another tool entirely.

The workflow stays yours.

Why repository context matters too

Feature state is only part of the story.

Repository context matters too.

If your README.md and AGENTS.md are missing, stale, or too thin, AI sessions still waste time rediscovering the same project facts.

That is why handoff init can flag missing high-value context, and why there is also a handoff prompt context flow for improving repo-level guidance without writing application code.

That may sound small, but it matters in practice.

A feature plan works much better when the surrounding repository is legible.
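A rough idea of what "missing or too thin" could mean is sketched below in Python. Only the two file names come from this post. The `context_warnings` helper and the ten-line threshold are hypothetical, and staleness cannot be detected this way; that still needs a human or an assistant to review the content.

```python
from pathlib import Path

CONTEXT_FILES = ("README.md", "AGENTS.md")  # the two files this post names
MIN_NONBLANK_LINES = 10  # hypothetical cut-off for "too thin"


def context_warnings(repo_root: Path) -> list[str]:
    """Flag repo-level context files that are missing or very short."""
    warnings = []
    for name in CONTEXT_FILES:
        path = repo_root / name
        if not path.is_file():
            warnings.append(f"{name}: missing")
            continue
        text = path.read_text(encoding="utf-8")
        nonblank = [line for line in text.splitlines() if line.strip()]
        if len(nonblank) < MIN_NONBLANK_LINES:
            warnings.append(f"{name}: only {len(nonblank)} non-blank lines")
    return warnings
```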

Why I still like the CLI model

There is a lot of value in editor-native workflows, and I may support more of them over time.

But the CLI model has one major strength: portability.

The moment a workflow depends too heavily on one IDE, it becomes harder to reuse across tools, harder to debug, and harder to trust.

handoff keeps the source of truth in files you can inspect directly.

That makes it easier to reason about, easier to version, and easier to carry across environments.

Who this is for

handoff is a good fit if:

you build features across multiple AI sessions
you switch between assistants or editors
you want better task decomposition before coding
you want decision history outside chat
you want evidence attached to completed work
you want drift checks before closing features
you care about deterministic state and inspectable workflow
you prefer local tools over hosted orchestration

It is probably not for you if you want a full IDE platform with built-in visual workflow management and heavy automation everywhere.

That is fine. The tool is intentionally narrower than that.

The short version

handoff gives AI coding workflows intent, memory, decisions, evidence, and a deterministic continuation path without taking ownership of your editor or your repository.

That is the whole idea.

Try it

If you want to try the current default flow:

handoff init my-feature
# edit .handoff/current/FEATURE.md
handoff run --copy
handoff next
handoff status
handoff drift --copy

If you want to inspect planning in stages first:

handoff init my-feature
# edit .handoff/current/FEATURE.md
handoff spec --copy
handoff design --copy
handoff tasks --copy
handoff start --copy
handoff continue --copy
handoff drift --copy

Project link:

handoff on GitHub

If you are building with AI every day, you already know the pain this solves.

The interesting part is not that the model needs context.

It is that context needs structure.
