This is a submission for the GitHub Finish-Up-A-Thon Challenge
I wired edge-context-mode into my own Claude Code setup. Then I stopped using it. Not because it didn't work — because I didn't understand what I'd built.
Six weeks later, this challenge made me come back. What I found: a tool that was half-finished, a Durable Object secretly lying to me, and — when I finally looked at the code honestly — something actually worth finishing.
What I Built
The problem is simple to describe and annoying to live with.
You start a session with Claude. You read a file, run a command, ask a question. Twenty minutes in, the answers get worse. Forty minutes in, it's forgotten the context. An hour in, you're hitting limits and starting over.
The cause: raw output floods the context window. A cat on a 500-line file puts 500 lines in context. npm list adds 200 more. git log adds more. The context fills with output the LLM will never reference again, and the things that actually matter — decisions, architecture, what you chose and why — get pushed out.
edge-context-mode intercepts that:
Normal: cat large-file.ts → 500 lines flood into context
edge-context: ctx_execute(...) → [ctx:ab3f9x] + "12 line(s): interface User..."
The raw output goes to Cloudflare D1 at the edge. The LLM gets a reference token and a 50-word summary. Context stays clean. Sessions stay coherent.
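As a rough sketch of that flow, the snippet below stores raw output out of band and returns only a reference token and a short summary. It is an illustration under stated assumptions, not the project's code: the `Map` stands in for the D1 table, and `capture`, `newId`, `summarise`, and the 80-character limit are hypothetical names and values.

```typescript
// Hypothetical in-memory stand-in for the D1 table; the real project stores rows in D1.
interface StoredEntry {
  id: string;
  summary: string;
  rawOutput: string;
}

const store = new Map<string, StoredEntry>();

// Assumed id format: 10 lowercase alphanumerics, matching the look of [ctx:kx92ma3b1p].
function newId(): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let id = "";
  for (let i = 0; i < 10; i++) {
    id += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return id;
}

// Count of non-empty lines plus a truncated first line; the 80-char limit is an assumption.
function summarise(raw: string, maxChars = 80): string {
  const lines = raw.split(/\r?\n/).filter((line) => line.trim() !== "");
  const first = (lines[0] ?? "").trim();
  const head = first.length > maxChars ? first.slice(0, maxChars) + "..." : first;
  return `${lines.length} line(s): ${head}`;
}

// Store the raw output out of band and hand back only the token and summary.
function capture(raw: string): { ref: string; summary: string } {
  const id = newId();
  const summary = summarise(raw);
  store.set(id, { id, summary, rawOutput: raw });
  return { ref: `[ctx:${id}]`, summary };
}
```

Only the return value of `capture` would reach the model; the full text stays in the store until something asks for it by reference.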
Search is hybrid: D1's FTS5 gives you BM25 keyword matching out of the box. Pair it with vectorize-mcp-worker — another tool I built — and ctx_search runs semantic vector search on top. Same stored data, same reference tokens. The retrieval layer just gets smarter.
That's the design. What was actually shipped in April was a different story.
This directly solves the compaction problem
If you've used Claude Code for a long session, you've probably seen this:
"Anyone else notice that compaction seems to lose more details than normal? It never seemed to matter before, but I'm seeing it frequently now."
That's the same problem, one layer up. When Claude Code hits context limits, it compacts — auto-summarises the conversation to make room. The details that disappear are the exact things that matter: error messages from 30 minutes ago, what was tried and failed, the architectural choice that explains why the code looks the way it does.
They disappear because they were sitting raw in the context window. Compaction summarises them aggressively and the specifics are gone.
edge-context-mode attacks this in two ways. First, prevention: every ctx_execute call keeps raw output out of the context entirely — only a 50-word summary and a reference token go in. Less in context means compaction triggers less often. Second, survival: everything stored via ctx_execute and ctx_annotate lives in D1, outside the context. Compaction can't touch it. After compaction wipes your conversation:
ctx_history(session_id: "myproject-2026-05-22")
→ full chronological list of everything that happened, pulled from D1
ctx_reflect(session_id: "myproject-2026-05-22")
→ "Session has 14 entries over ~47 min. Fixed D1 FK constraint, added
ctx_get tool, updated README, decided 512KB cap on raw_output..."
The session memory didn't compact. Only the conversation did.
The same day I wrote this, this appeared on X:
@ankkala: "There should be an entire new class of LLM dedicated just to compaction." My reply: "Compression without loss is the oldest hard problem in cognition. Summarization fails the same way bad thinking fails — not because the words are wrong, but because the underlying structure was never identified in the first place. A specialized compaction model doesn't fix that. It just obscures where the reasoning broke down."
The argument for a dedicated compaction model assumes the problem is compression efficiency. It isn't. The problem is that the wrong things are in the context in the first place. A better compressor produces confident-sounding summaries of the wrong things. edge-context-mode doesn't make compaction smarter — it reduces what has to be compacted at all.
One caveat, now partially lifted: edge-context-mode is an MCP server. When I wrote this it was wired only into Claude Code. By the time I finished, I'd registered it in VS Code and GitHub Copilot discovered all 8 tools automatically — same server, zero changes to the code. ChatGPT and Gemini still require function-calling adapters (v1.1). But "Claude Code only" undersold it from the start.
Demo
GitHub: https://github.com/dannwaneri/edge-context-mode
Release: https://github.com/dannwaneri/edge-context-mode/releases/tag/v1.0.0
Local setup — 4 commands, no cloud account:
git clone https://github.com/dannwaneri/edge-context-mode
cd edge-context-mode
npm install
npm run migrate:local
npm run local
Register with Claude Code:
claude mcp add edge-context-mode -- node /path/to/edge-context-mode/src/local.ts
What a session looks like now:
# Run a command — only a reference token enters context
ctx_execute("node -e \"require('./package.json').dependencies\"", "check deps")
→ [ctx:kx92ma3b1p]
→ 1 line(s): { hono: '^4.7.11', '@modelcontextprotocol/sdk': '^1.12.0'...
# Need the actual output? Pull it by reference
ctx_get("[ctx:kx92ma3b1p]")
→ ref: [ctx:kx92ma3b1p]
→ summary: 1 line(s): { hono: '^4.7.11'...
→ --- raw output ---
→ { hono: '^4.7.11', '@modelcontextprotocol/sdk': '^1.12.0', ... }
# Save a decision without running a command
ctx_annotate("decided to cap raw_output at 512KB — D1 row limit is ~1MB, leaving headroom")
→ [ctx:mw71nx4d2q]
# Search past context semantically
ctx_search("D1 storage decisions")
→ [ctx:mw71nx4d2q] [score:1.84] annotation: decided to cap raw_output at 512KB...
# Health check
ctx_doctor
→ { "d1": "ok", "execution_mode": "local-stdio", "vectorize_mcp": "configured", "sessions": 6, "entries": 9 }
The Comeback Story
On April 15th, I shipped the initial release. Two commits, deployed to Cloudflare, wired into my own setup. Then I moved on.
Coming back for this challenge, I read the code properly for the first time. Here's what was actually there.
The Durable Object was returning a placeholder.
This is the one I'm most embarrassed about. ExecutorDO.ts — the Cloudflare Durable Object that was supposed to sandbox execution in Workers mode — had this in production:
return {
  stdout: `[DO received: ${command} ${args.join(" ")}]`,
  exit_code: 0,
  timed_out: false,
};
If you deployed to Cloudflare Workers and called ctx_execute, you'd get back a fake success. No output. No error. Just a quietly wrong result. I'd left a comment: "integrate with a Workers AI function or trusted external runner" — and never did it.
The fix wasn't to build the external runner. Cloudflare Workers genuinely cannot spawn subprocesses, and building a remote execution service in two weeks isn't the right call. The honest fix was to say so: replace the silent stub with a clear error that tells you exactly what to run instead.
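A minimal sketch of what "say so" could look like, assuming the result shape from the stub above. The `stderr` field, the function name, and the exact wording of the message are assumptions for illustration; the shipped error text may differ.

```typescript
// Result shape mirrors the stub quoted above; `stderr` is an assumed extra field.
interface ExecResult {
  stdout: string;
  stderr: string;
  exit_code: number;
  timed_out: boolean;
}

// Hypothetical replacement for the placeholder: fail loudly instead of faking success.
function executeInWorkersMode(command: string, args: string[]): ExecResult {
  const attempted = [command, ...args].join(" ");
  return {
    stdout: "",
    stderr:
      `Cannot run "${attempted}": Cloudflare Workers cannot spawn subprocesses. ` +
      "Run the server in local mode (npm run local) to use ctx_execute.",
    exit_code: 1, // non-zero so callers cannot mistake this for success
    timed_out: false,
  };
}
```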
ctx_get didn't exist.
The entire architecture depends on [ctx:id] references being retrievable. Store a summary, get back a token, pull the original when you need it. That was the design. There was no tool to do the pulling. Every reference was write-only. I'd built half a memory system and hadn't noticed.
Added ctx_get — strips the [ctx:] prefix, queries D1 by ID, checks expiry, returns the summary and raw output. If it's gone: "Entry not found or expired." No crash, no drama.
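The lookup steps can be sketched as follows. This is an assumption-laden illustration: a `Map` replaces the D1 query, and the row shape (`expires_at` in epoch milliseconds, the column names other than `raw_output`) is hypothetical.

```typescript
// Assumed row shape; column names other than raw_output are guesses.
interface EntryRow {
  id: string;
  summary: string;
  raw_output: string | null; // NULL for entries stored before raw output was kept
  expires_at: number; // epoch milliseconds (assumed)
}

// Accepts "[ctx:abc123]", "ctx:abc123", or a bare "abc123".
function parseRef(ref: string): string {
  return ref.trim().replace(/^\[?ctx:/, "").replace(/\]$/, "");
}

// Look up an entry by reference, treating expired rows the same as missing ones.
function ctxGet(rows: Map<string, EntryRow>, ref: string, now: number = Date.now()): string {
  const id = parseRef(ref);
  const row = rows.get(id);
  if (!row || row.expires_at <= now) {
    return "Entry not found or expired.";
  }
  return [
    `ref: [ctx:${row.id}]`,
    `summary: ${row.summary}`,
    "--- raw output ---",
    row.raw_output ?? "(raw output not available)",
  ].join("\n");
}
```

Returning a plain message for missing or expired entries, rather than throwing, keeps a stale reference from derailing the session.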
ctx_annotate didn't exist either.
Context only accumulated through ctx_execute — shell commands. You couldn't save why you made a decision. You couldn't annotate an architectural choice. You couldn't store a note without wrapping it in a fake command. The tool only captured what you ran, not what you thought.
Added ctx_annotate — takes text, stores it as type: "annotation", shows up in ctx_search and ctx_history. The session history now reflects intent, not just execution.
Raw output was never stored.
There's a comment in the original schema that says exactly this: "raw output is NEVER stored here — only the summary." Deliberate design. The problem: ctx_get needs something to return. You can't have a retrieval tool with nothing to retrieve.
Migration 0002_raw_output.sql recreates the table with a raw_output TEXT column and an updated CHECK constraint to include annotation as a valid entry type. Full stdout now stored in D1, capped at 512KB. Old entries retain NULL gracefully.
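One way to enforce a cap like this is to truncate by UTF-8 byte length before the insert. The sketch below assumes the cap is measured in bytes and applied in application code; the constant and function names are hypothetical, and the project may cap differently.

```typescript
const RAW_OUTPUT_MAX_BYTES = 512 * 1024;

// Truncate by UTF-8 byte length, not string length, so multi-byte output cannot overshoot the cap.
function capRawOutput(raw: string, maxBytes: number = RAW_OUTPUT_MAX_BYTES): string {
  const bytes = new TextEncoder().encode(raw);
  if (bytes.length <= maxBytes) return raw;
  // Back up while the first excluded byte is a continuation byte (10xxxxxx),
  // so the cut never lands inside a multi-byte character.
  let end = maxBytes;
  while (end > 0 && ((bytes[end] ?? 0) & 0xc0) === 0x80) end--;
  return new TextDecoder().decode(bytes.subarray(0, end));
}
```

Measuring bytes rather than characters matters because storage limits are expressed in bytes, and a string of non-ASCII output can be several times larger than its character count suggests.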
Setup required three services before anything ran.
The original README listed three prerequisites: a Cloudflare account with the Workers Paid plan, a D1 database, and a separately deployed vectorize-mcp-worker. Three services, three secrets, before the server would start. Most people would give up at step two.
Local mode now works with zero cloud setup. Four commands. The vectorize-mcp-worker is an optional semantic search upgrade — worth deploying once you're running sessions regularly, but not a requirement to get started.
ctx_search had a silent phrase-matching bug — found it live.
This one I found after v1.0.0 shipped, while testing the tools in a real session. I stored an annotation, then searched for words I knew were in it: "Vectorize optional". No results.
The bug was in ftsPhrase() in store.ts. It wrapped the entire query in double quotes:
// Before — strict phrase search
return `"${q.replace(/"/g, '""')}"`;
// "Vectorize optional" only matches if those two words are adjacent
The annotation text said "Vectorize stays optional". One word between them. No match.
The fix: quote each term individually so FTS5 treats them as implicit AND — all terms must appear in the document, anywhere, not consecutively:
// After — per-term quoting
return q.trim().split(/\s+/).filter(Boolean)
  .map(word => `"${word.replace(/"/g, '""')}"`)
  .join(" ");
// "Vectorize" "optional" — matches regardless of what's between them
Special-char safety (hyphens, numbers) preserved. Fixed and shipped as a post-v1.0.0 patch the same day.
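To make the difference concrete, here is a rough emulation of the two matching behaviours on a plain word list. It is not the FTS5 tokenizer or ranking, only an illustration of adjacency versus implicit AND; all function names are hypothetical.

```typescript
// Simplified tokenizer: lowercase, split on anything that is not a letter or digit.
function tokens(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Phrase match: query terms must appear consecutively, in order.
function matchesPhrase(doc: string, query: string): boolean {
  const d = tokens(doc);
  const q = tokens(query);
  if (q.length === 0) return false;
  for (let i = 0; i + q.length <= d.length; i++) {
    if (q.every((term, j) => d[i + j] === term)) return true;
  }
  return false;
}

// AND match: every query term appears somewhere in the document.
function matchesAllTerms(doc: string, query: string): boolean {
  const d = new Set(tokens(doc));
  const q = tokens(query);
  return q.length > 0 && q.every((term) => d.has(term));
}
```

With the annotation "Vectorize stays optional" and the query "Vectorize optional", `matchesPhrase` returns false and `matchesAllTerms` returns true, which mirrors the before and after behaviour described above.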
The migration runner was silently wiping all data on every restart.
This one I found after the article was mostly written, while investigating why ctx_get returned (raw output not available) on every annotation.
The migration runner in local.ts used a try/catch pattern:
try { db.exec(sql); } catch { /* already applied */ }
The assumption: if the SQL fails, the migration was already applied. The problem: migration 0002 starts with DROP TABLE IF EXISTS context_entries. IF EXISTS never throws — so the catch never fires. Every server restart ran 0002 from scratch, dropping the entire context_entries table and recreating it empty. All stored context wiped. Silently.
The fix: a _migrations table that records each .sql file by name. On startup, already-applied files are skipped entirely:
db.exec(`CREATE TABLE IF NOT EXISTS _migrations (name TEXT PRIMARY KEY, applied_at INTEGER)`);
// ...
const already = db.prepare("SELECT 1 FROM _migrations WHERE name = ?").get(f);
if (already) continue;
Data now survives restarts. This is the kind of bug that only shows up when you actually use the thing — not in tests, not in code review, only when you store something, close the terminal, reopen it, and find nothing there.
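The ledger logic, stripped of the database, can be sketched like this. The `applied` set stands in for the `_migrations` table and `run` stands in for `db.exec`; both are hypothetical substitutes so the example stays self-contained, and the `Migration` type is an assumption.

```typescript
interface Migration {
  name: string; // e.g. "0002_raw_output.sql"
  sql: string;
}

// Run each migration at most once, in filename order, and report what ran this time.
function applyPending(
  migrations: Migration[],
  applied: Set<string>,
  run: (sql: string) => void,
): string[] {
  const ranNow: string[] = [];
  const ordered = [...migrations].sort((a, b) => a.name.localeCompare(b.name));
  for (const m of ordered) {
    if (applied.has(m.name)) continue; // already recorded: never re-run
    run(m.sql);
    applied.add(m.name); // record only after the SQL succeeded
    ranNow.push(m.name);
  }
  return ranNow;
}
```

The key property is that the decision to skip depends on the recorded name, not on whether the SQL happens to throw, so a destructive but error-free statement such as DROP TABLE IF EXISTS cannot run twice.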
My Experience with GitHub Copilot
I want to be honest here because vague Copilot praise is exactly what's eroding trust in challenge submissions right now.
The code work in this finish-up (the migration, the new tools, the server fixes) I did with Claude Code. I used Copilot on the part of the project that had been failing silently since April: tests.
vitest was in package.json from the initial commit. Zero tests had ever been written. I'd been aware of this the way you're aware of a leak you haven't fixed.
I opened src/tools/executor.ts in VS Code and gave Copilot Chat this prompt:
"Write vitest unit tests for the validateCommand function in this file. Test: a whitelisted command like 'node' passes, an unknown binary like 'rm' is rejected with COMMAND_NOT_ALLOWED, a path traversal attempt with '../etc' is blocked, and a git subcommand not in the allowlist is rejected."
Copilot opened package.json to check the test configuration, created executor.spec.ts (+51 lines), ran npm test --silent to verify, and reported back: "Tests added and run: all 4 passing." The agentic loop — read config, write file, run tests, confirm results — without me prompting each step.
Then I asked it to look at summarise for missing coverage. It came back with eight specific edge cases, including empty output, whitespace-only lines, Windows line endings where \r could leak into summaries stored in D1, lines with leading spaces, the SUMMARY_MAX_CHARS boundary, and mixed empty and non-empty lines.
The Windows one stopped me. I'm on Windows. \r\n line endings are something I live with and had completely stopped thinking about. A summary with a trailing \r stored in D1 and returned to the LLM is a subtle, real bug I would not have found on my own.
That's what honest Copilot use looks like. It found what I'd been ignoring and made me own it.
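For the Windows line-ending case, a small sketch of one possible guard is below. It assumes the summary is built from the first non-empty line of output; `splitLines` and `firstNonEmptyLine` are hypothetical helpers, not the project's `summarise`.

```typescript
// Split on \r\n, \n, or a lone \r so no carriage return survives into a line.
function splitLines(output: string): string[] {
  return output.split(/\r\n|\n|\r/);
}

// Return the first line that still has content after trimming, or "" if there is none.
function firstNonEmptyLine(output: string): string {
  for (const line of splitLines(output)) {
    const trimmed = line.trim();
    if (trimmed !== "") return trimmed;
  }
  return "";
}
```

Splitting on all three line-ending forms up front means later code never has to remember to strip a trailing \r.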
Post-Publication: It Runs in GitHub Copilot Too
After the article went live, I added two lines to .vscode/mcp.json and restarted VS Code. GitHub Copilot discovered all 8 tools automatically.
Same server. Zero code changes. The claim that this is "Claude Code only" was wrong the moment MCP support shipped in VS Code.
The .vscode/mcp.json config:
{
  "servers": {
    "edge-context-mode": {
      "type": "stdio",
      "command": "cmd",
      "args": ["/c", "C:\\path\\to\\edge-context-mode\\start-mcp.cmd"]
    }
  }
}
What's Next
The question I keep getting: can this work outside Claude Code?
Already answered: Cursor, Windsurf, Claude Desktop, GitHub Copilot in VS Code — anything that speaks MCP works today. "Claude Code only" was underselling it from the start.
Genuinely universal requires one more step: function-calling adapters for OpenAI and Gemini. The storage layer — D1, FTS5, the reference system — doesn't change at all. Same data, same search, same [ctx:id] tokens. You'd just expose the tools as OpenAI function definitions or Gemini tool declarations instead of MCP tool registrations.
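As a hedged sketch of what such an adapter might involve, the function below maps an MCP-style tool descriptor (name, description, inputSchema) onto the shape OpenAI's Chat Completions API accepts in its `tools` array. The `ctx_annotate` schema shown is hypothetical; the real parameter names may differ.

```typescript
type JsonSchema = Record<string, unknown>;

// Shape of a tool as listed by an MCP server.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: JsonSchema;
}

// Shape of one entry in the Chat Completions `tools` array.
interface OpenAiTool {
  type: "function";
  function: { name: string; description: string; parameters: JsonSchema };
}

// The JSON Schema passes through unchanged; only the envelope differs.
function toOpenAiTool(tool: McpTool): OpenAiTool {
  return {
    type: "function",
    function: {
      name: tool.name,
      description: tool.description ?? "",
      parameters: tool.inputSchema,
    },
  };
}

// Hypothetical descriptor for ctx_annotate, for illustration only.
const ctxAnnotateTool: McpTool = {
  name: "ctx_annotate",
  description: "Store a note or decision without running a command.",
  inputSchema: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
};
```

Because both sides describe parameters with JSON Schema, the conversion is mostly a change of wrapper, which is consistent with the claim that the storage layer does not need to change.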
The Workers HTTP mode is already the bridge. Any LLM with tool/function calling can hit the deployed endpoint directly if you wire up the schema on their side.
The seamless part — where raw output is automatically kept out of context before you even think about it — still requires the client to route through edge-context-mode. MCP does that natively. For non-MCP LLMs you'd call ctx_execute manually instead of running commands directly. Less automatic. Still useful. The memory survives either way.
That's v1.1: OpenAI and Gemini adapters, same core, no storage changes. If you're building on a different stack and want to help, the repo is open.
One more gap worth naming: you can't retroactively capture a session that started before edge-context-mode was running. If you worked for an hour before registering the MCP server, that context lived in the Claude Code conversation window only — edge-context-mode never saw it.
The workaround right now is ctx_annotate — manually summarise what happened before the tool was active. It works but it's manual.
I tested this on the very session in which I'm writing this article. It hit context limits once, compacted, and continued. I opened the .jsonl file, found the compaction summary (stored as a type: "user" entry with isCompactSummary: true), and ran one ctx_annotate call:
ctx_annotate("SESSION IMPORT (finishupathon, 2026-05-22) — edge-context-mode v1.0.0 decisions:
ExecutorDO stub → honest error. ctx_get added. ctx_annotate added. raw_output migration.
Local mode: 4 commands, zero cloud. ftsPhrase() per-term fix.")
→ [ctx:8wo1rh1buy]
One compaction, one call. That context is now in D1. If this session compacts again tomorrow, ctx_search("edge-context-mode v1.0.0") will surface exactly what was decided and why.
The proper fix: Claude Code stores every session as a .jsonl file in ~/.claude/projects/. Full conversation, every tool call, every output. A ctx_import command that reads those files and bulk-loads them into D1 would close the gap completely — retroactive context, searchable, surviving all future compaction. The storage layer already handles it. It just needs a reader for the .jsonl format (compact summaries are the isCompactSummary: true entries) and a bulk insert path. That's v1.2.
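The reader half of that idea could start as small as the sketch below, which scans JSONL text for compaction summaries. Only the `isCompactSummary` flag and the `type` field come from the observation above; everything else about the line format is left untyped, and `findCompactSummaries` is a hypothetical name.

```typescript
// Only `type` and `isCompactSummary` are known; other fields are left open.
interface SessionLine {
  type?: string;
  isCompactSummary?: boolean;
  [key: string]: unknown;
}

// Parse JSONL text and keep the compaction summaries, skipping blank or malformed lines.
function findCompactSummaries(jsonl: string): SessionLine[] {
  const found: SessionLine[] = [];
  for (const line of jsonl.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // a truncated final line should not abort the import
    }
    if (typeof parsed === "object" && parsed !== null) {
      const entry = parsed as SessionLine;
      if (entry.isCompactSummary === true) found.push(entry);
    }
  }
  return found;
}
```

The bulk insert path would then take each returned entry and store it the same way an annotation is stored.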
Built with TypeScript, Cloudflare Workers, Durable Objects, D1, and a belated appreciation for what I'd actually made.
Top comments (8)
This nails the thing most "AI memory" tools miss: the bottleneck isn't compression, it's that the wrong things were in the context to begin with. I wrote a piece a while back arguing agents don't need a better model, they need a context layer — and your ctx_execute → reference-token pattern is the cleanest version of that idea I've seen. Keep raw output out, hand the model a token plus a 50-word summary, let it pull the full thing by reference only when it actually needs it. That's how we feed agents in our own stack too: structured context loaded on demand instead of dumping everything and hoping compaction is kind. The part I respect most is the honesty about the silent migration-wipe — DROP TABLE IF EXISTS never throwing, so the catch never fired. That's the canonical bug class: it doesn't fail, it quietly does the wrong thing, and no test catches it because the failure only exists across a restart. A _migrations ledger is exactly the right fix. One question: have you thought about TTL or relevance decay on stored entries? On a long-lived project, "survives compaction" eventually becomes "resurfaces a decision that was true six weeks ago and isn't anymore."
"Canonical bug class" is sharper than how I described it. A crash is findable: stderr tells you where to look. Silent wrong wears correctness as a disguise. The server starts, the table exists, no error anywhere. The only signal is absent data, which you only notice if you're already looking for it. I wasn't.
On TTL: you're pointing at something harder than expiry. The 30-day window handles gone entries but not contradicted ones: an old annotation says X, a newer one says not-X, and search returns both at equal score. The workaround right now is manual: ctx_annotate("SUPERSEDED: …"). The proper fix is weighted search, scoring by recency against BM25. That's v1.2.
"Structured context loaded on demand": what's your storage layer, and do agents in your stack tend to over-request or under-request?
Storage is Postgres + pgvector — embeddings, audit log, the lot, all self-hosted (on a Mac mini, honestly). But the "loaded on demand" part matters more than the store: agents pull context through an MCP server, so retrieval is a tool call scoped to the unit of work, not a dump of the whole repo. On your question — over-request, every single time, if you let them. Hand an agent the full repo and it'll happily burn it; the win was scoping retrieval to the one ticket it's working. Your SUPERSEDED + recency-weighted BM25 plan is exactly right — contradiction, not expiry, is the hard part. Ping me when v1.2 lands, I want to see how the recency weighting behaves on conflicting annotations.
Self-hosted Postgres + pgvector on a Mac mini is the right call for retrieval latency — the store being local matters more than compute being elastic.
"Scoped to the unit of work, not the repo" is the thing I haven't surfaced cleanly in edge-context-mode. session_id is date-based by convention, so a session is roughly a day, not a ticket. That's the wrong granularity for what you're describing. The scope should track the work boundary. Right now that's naming discipline on the user's end; it should be a first-class concept.
Over-request as the default is useful data: it confirms that the scoping problem isn't the agent being greedy, it's the retrieval API not constraining it. Flagging you on v1.2. One question: is ticket scope enforced programmatically (branch name, issue ID passed to the MCP call), or are agents trusted to stay in scope themselves?
Daniel, I think many of us have at least one project sitting around that we keep meaning to come back to someday. Glad you gave this one another shot and got it across the finish line. Looking forward to seeing future updates.
The finish line moved twice while I was crossing it; that's usually the sign it was worth finishing. 👊
Deep stuff, as usual - bookmarked to re-read it later!
Appreciate it, leob. Curious what lands differently on the second read...