Context window math: what MEMORY.md actually costs you

#ai #agents #programming #devtools

Every OpenClaw agent loads MEMORY.md into the system prompt on every single message. It's right there in AGENTS.md: "Read MEMORY.md — this is who you are." The agent doesn't skim it. It doesn't pick the relevant bits. It dumps the whole file into the context window, every turn.

That's fine when your MEMORY.md is 20 lines. It's less fine six months in.

The math

Let's work through real numbers.

A typical MEMORY.md after a few months of use is around 5KB of text. Run that through a tokenizer and you get roughly 1,500 tokens. Not a disaster on its own, but tokens compound.
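If you want to sanity-check that figure for your own file, a rough estimate is enough. The sketch below assumes the ratio implied above (about 300 tokens per 1,000 bytes). The function names and the ratio are illustrative, and a real tokenizer will give a different count depending on the model and the content.

```python
import os

# Assumption: ~300 tokens per 1,000 bytes, the ratio implied by "5KB is roughly 1,500 tokens".
# Real tokenizers vary by model and by content (prose, code, non-English text).
TOKENS_PER_1000_BYTES = 300


def estimate_tokens(size_bytes: int) -> int:
    """Rough token estimate for a text file of the given size in bytes."""
    return round(size_bytes * TOKENS_PER_1000_BYTES / 1000)


def estimate_file_tokens(path: str) -> int:
    """Estimate tokens for a file on disk, e.g. your MEMORY.md."""
    return estimate_tokens(os.path.getsize(os.path.expanduser(path)))


if __name__ == "__main__":
    for kb in (1, 5, 15):
        print(f"{kb}KB -> ~{estimate_tokens(kb * 1000)} tokens")
```

For billing-grade numbers, count tokens with the tokenizer of the model you actually use rather than a byte ratio.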

Claude Sonnet 4 input pricing: $3 per million tokens. Claude Opus 4: $15 per million.

One message with a 5KB MEMORY.md:

Model	MEMORY.md cost per message
Sonnet 4	1,500 × $3/1M = $0.0045
Opus 4	1,500 × $15/1M = $0.0225

That's just the memory file. Your actual prompt, conversation history, tool results — those are on top.
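Each per-message figure is a single multiplication. A minimal sketch using the prices quoted above; the function and dictionary names are illustrative:

```python
# Input prices in USD per million tokens, as quoted in this post.
PRICE_PER_MILLION = {"Sonnet 4": 3.00, "Opus 4": 15.00}


def cost_per_message(tokens: int, price_per_million: float) -> float:
    """USD cost of sending `tokens` input tokens once."""
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    memory_tokens = 1_500  # a ~5KB MEMORY.md
    for model, price in PRICE_PER_MILLION.items():
        print(f"{model}: ${cost_per_message(memory_tokens, price):.4f} per message")
```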

Now scale it. Say you exchange 50 messages a day with your agent (light usage if you're building something):

Model	Daily	Weekly	Monthly (30 days)
Sonnet 4	$0.23	$1.58	$6.75
Opus 4	$1.13	$7.88	$33.75

$33.75/month just to carry around a file. And that's before the memory file grows. I've seen MEMORY.md files hit 15-20KB. Triple the numbers above (15KB is ~4,500 tokens) and you're spending about $100/month on Opus just to re-read the same notes.
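To rerun the projection with your own message volume or file size, the arithmetic reduces to a small function. This is a sketch: the 50 messages/day and 30-day month match the assumptions above, and the names are illustrative.

```python
def memory_cost(tokens: int, price_per_million: float,
                messages_per_day: int = 50) -> dict[str, float]:
    """Project the USD cost of resending `tokens` on every message."""
    daily = tokens * price_per_million / 1_000_000 * messages_per_day
    return {"daily": daily, "weekly": daily * 7, "monthly": daily * 30}


if __name__ == "__main__":
    # A 5KB file (~1,500 tokens) vs. a 15KB file (~4,500 tokens) on Opus 4 ($15/1M).
    for tokens in (1_500, 4_500):
        monthly = memory_cost(tokens, 15.00)["monthly"]
        print(f"{tokens} tokens: ${monthly:.2f}/month")
```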

There's also a less obvious cost: every token your memory file uses is a token unavailable for actual conversation. Context windows are big now, but they're not free. A 15KB MEMORY.md eats ~4,500 tokens of your context budget on every turn. That's space that could hold tool outputs, longer conversations, or — you know — the thing you're actually trying to do.

What MemoClaw does differently

MemoClaw stores memories in a vector database and retrieves only what's relevant to the current query. Instead of injecting 1,500 tokens every message, you get back 200-500 tokens of context that actually matters.

Here's what that looks like:

# Store a memory
memoclaw store "User prefers dark mode and vim keybindings" --tags preferences

# Store something important
memoclaw store "NEVER deploy to prod on Fridays - learned this the hard way 2025-12-19" --importance 0.9 --tags rules

# Recall relevant context
memoclaw recall "what are the user's editor preferences"

When your agent calls recall, MemoClaw runs a semantic search and returns the 5 most relevant memories. Not the whole file. Not everything you've ever noted down. Just the bits that match what you're doing right now.
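If you haven't worked with semantic search before, the retrieval step amounts to ranking stored embeddings by similarity to the query embedding and keeping the top few. The sketch below is a generic illustration with hand-made 3-dimensional vectors, not MemoClaw's actual implementation. The embedding model, similarity metric, and ranking it uses aren't described here, so cosine similarity and every name in the code are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float],
          memories: list[tuple[str, list[float]]],
          k: int = 5) -> list[str]:
    """Return the text of the k memories most similar to the query."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy embeddings (hypothetical): axes loosely mean [editor, deploy, food].
MEMORIES = [
    ("User prefers dark mode and vim keybindings", [0.9, 0.1, 0.0]),
    ("NEVER deploy to prod on Fridays", [0.0, 1.0, 0.0]),
    ("User is vegetarian", [0.0, 0.0, 1.0]),
]

if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]  # stands in for "what are the user's editor preferences"
    print(top_k(query, MEMORIES, k=1))
```

Real embeddings have hundreds or thousands of dimensions and come from a model, but the ranking idea is the same: only the closest matches reach the prompt.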

The token math changes:

Approach	Tokens per message	Cost per message (Opus 4)
MEMORY.md (5KB)	~1,500	$0.0225
MemoClaw recall	~200-500	$0.003-0.0075

That's a 3x to 7.5x reduction in memory-related token costs per message (1,500 tokens down to 500 at worst, 200 at best).

Over a month at 50 messages/day on Opus 4, using MemoClaw recall instead of MEMORY.md:

MEMORY.md: $33.75/month in memory tokens
MemoClaw (avg 350 tokens): $7.88/month in memory tokens + API costs

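The API cost: MemoClaw charges $0.005 per recall. At 50 recalls/day, that's $0.25/day or $7.50/month. So your total memory cost with MemoClaw on Opus is ~$15.38/month vs $33.75 with the file. On Sonnet the picture flips at this file size: recalled tokens cost about $1.58/month, so with the $7.50 in API fees you pay ~$9.08/month vs $6.75 for the file. The flat per-recall fee only pays for itself on Sonnet once MEMORY.md passes roughly 2,000 tokens (about 6.7KB at the ratio used here). Below that you're paying for relevant-only context, not saving money.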
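Because the per-recall fee is fixed while token savings scale with file size and model price, it's worth computing the break-even point before switching. A sketch using only the figures in this post ($0.005 per recall, ~350 recalled tokens on average, one recall per message, 30-day month). It ignores the free first 100 calls, and the function names are illustrative.

```python
RECALL_FEE = 0.005       # USD per recall, as quoted above
AVG_RECALL_TOKENS = 350  # midpoint of the 200-500 token range


def monthly_file_cost(file_tokens: int, price_per_million: float,
                      messages_per_day: int = 50, days: int = 30) -> float:
    """Monthly USD cost of resending MEMORY.md on every message."""
    return file_tokens * price_per_million / 1_000_000 * messages_per_day * days


def monthly_recall_cost(price_per_million: float,
                        messages_per_day: int = 50, days: int = 30) -> float:
    """Monthly USD cost of recalled tokens plus the per-recall fee."""
    per_message = AVG_RECALL_TOKENS * price_per_million / 1_000_000 + RECALL_FEE
    return per_message * messages_per_day * days


def break_even_tokens(price_per_million: float) -> float:
    """MEMORY.md size (in tokens) above which recall is cheaper per message."""
    return AVG_RECALL_TOKENS + RECALL_FEE * 1_000_000 / price_per_million


if __name__ == "__main__":
    for model, price in (("Sonnet 4", 3.00), ("Opus 4", 15.00)):
        print(f"{model}: file ${monthly_file_cost(1_500, price):.2f}/mo, "
              f"recall ${monthly_recall_cost(price):.2f}/mo, "
              f"break-even ~{break_even_tokens(price):.0f} tokens")
```

With these inputs the break-even lands near 2,000 tokens at $3 per million and near 700 tokens at $15 per million, so the pricier the model, the smaller the file at which recall starts paying for itself.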

Your first 100 API calls are free, no payment needed. After that it's $0.005 per store or recall, paid via x402 with USDC on Base.

The real tradeoff: latency

I should be honest about this. MEMORY.md is a file read. It's instant. Zero network latency, zero API calls, zero chance of a service being down.

A MemoClaw recall is an API call to api.memoclaw.com. That means:

Network round trip (typically 100-300ms)
If the API is down, your agent has no memory that turn
You're adding a dependency to an external service

For some setups, that matters. If your agent runs in a tight loop processing hundreds of messages and needs sub-second responses, the latency adds up. If you're having a normal conversation with your agent and sending a message every few minutes, 200ms is invisible.

There's also a hybrid approach: keep a small MEMORY.md with the absolute essentials (5-10 lines, under 500 bytes) and use MemoClaw for everything else. You get instant access to core identity plus semantic search for the long tail.
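The hybrid setup also gives you a natural fallback when the API is slow or unreachable. Here is a minimal sketch of that control flow. It deliberately takes the recall step as a plain callable instead of calling MemoClaw directly, since the client interface isn't shown in this post; `build_memory_context` and `recall_fn` are hypothetical names.

```python
from typing import Callable


def build_memory_context(core_memory: str, query: str,
                         recall_fn: Callable[[str], list[str]]) -> str:
    """Combine the always-loaded core file with recalled memories.

    `recall_fn` is a stand-in for whatever performs the semantic recall
    (hypothetical). If it raises, the agent still gets the core memory.
    """
    try:
        recalled = recall_fn(query)
    except Exception:
        # API down or timed out: degrade to core identity only.
        recalled = []
    sections = [core_memory.strip()]
    if recalled:
        sections.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in recalled))
    return "\n\n".join(sections)


if __name__ == "__main__":
    core = "You are a coding agent. Be concise."  # stands in for a trimmed MEMORY.md

    def fake_recall(query: str) -> list[str]:
        return ["User prefers dark mode and vim keybindings"]

    print(build_memory_context(core, "editor preferences", fake_recall))
```

The point of the design is that a failed recall costs you the long tail for one turn, not the agent's identity.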

Migrating from MEMORY.md

If you've got an existing MEMORY.md, you can migrate it:

memoclaw migrate --file ~/.openclaw/workspace/MEMORY.md

This splits your file into individual memories, generates embeddings, and stores them. Each distinct piece of information becomes a separate, searchable memory.
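How the migration decides where one memory ends and the next begins isn't documented here. As a rough mental model only, a splitter might treat each bullet or paragraph as one memory. The sketch below is that guess, not MemoClaw's actual logic; `split_memories` is a hypothetical helper.

```python
import re


def split_memories(markdown: str) -> list[str]:
    """Split a MEMORY.md-style file into one string per bullet or paragraph.

    Headings are dropped, blank lines end a paragraph, and each bullet
    becomes its own memory.
    """
    memories: list[str] = []
    paragraph: list[str] = []

    def flush() -> None:
        # Emit the paragraph collected so far, if any.
        if paragraph:
            memories.append(" ".join(paragraph))
            paragraph.clear()

    for raw in markdown.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            flush()
            continue
        bullet = re.match(r"^[-*+]\s+(.*)", line)
        if bullet:
            flush()
            memories.append(bullet.group(1))
        else:
            paragraph.append(line)
    flush()
    return memories


if __name__ == "__main__":
    sample = "# Prefs\n- dark mode\n- vim keybindings\n\nNever deploy on\nFridays.\n"
    for memory in split_memories(sample):
        print(memory)
```

Running something like this over your file before migrating is a cheap way to spot entries that only make sense next to their neighbors and should be rewritten to stand alone.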

After migrating, trim your MEMORY.md down to the bare essentials or remove it entirely. Your agent can use memoclaw recall through the MemoClaw skill to pull relevant context on each turn.

When to stick with MEMORY.md

MemoClaw isn't always the right call. Keep using the file if:

Your memory is small (under 1KB / ~300 tokens) and you want zero dependencies
You need guaranteed offline access
Every piece of context is relevant to every conversation (rare, but possible)
You don't want to deal with wallet setup for payments

When to switch

Consider MemoClaw when:

Your MEMORY.md has grown past 3-5KB
You're on Opus or another expensive model where input tokens hurt
Most of your stored context is only relevant some of the time
You want to share memories across agents (same wallet, different namespaces)

The math is straightforward. Count your MEMORY.md tokens, multiply by your daily message count, and check what that costs you per month. If the number makes you uncomfortable, semantic recall is worth a look.

Install the CLI: npm install -g memoclaw — Docs — MemoClaw skill on ClawHub