Somewhere in mid-2025, the conversation shifted. People stopped obsessing over the perfect prompt and started talking about context engineering instead. The term caught fire after Tobi Lütke described it as "the art of providing all the context for the task to be plausibly solvable by the LLM," and Andrej Karpathy followed up calling it the "delicate art and science of filling the context window with just the right information for the next step."
They're both describing the same realization: the prompt is only a fraction of what determines whether your agent succeeds or fails. What matters is everything else in the context window: the system instructions, the conversation history, the retrieved documents, the tool definitions, and, critically, the memories.
The old approach: stuff everything in
If you've built an agent on OpenClaw, you've probably done some version of this. You write a system prompt, and it grows. First it's 200 tokens. Then you add user preferences. Project context. A few examples. Some rules. Before long, your system prompt is 4,000 tokens of carefully organized information, and your agent still forgets that the user prefers dark mode.
The problem isn't the model. It's the approach. Dumping everything into the system prompt treats the context window like a filing cabinet. Throw enough stuff in there and hope the model finds the right folder when it needs it. In practice, models get worse at following instructions as context length increases. Important details get buried. You're paying for tokens the model doesn't need 90% of the time.
Here's what a typical OpenClaw workspace looks like when you're doing manual context management:
~/.openclaw/workspace/
├── AGENTS.md # 800 tokens of personality
├── MEMORY.md # 2000 tokens and growing
├── USER.md # preferences, corrections
├── TOOLS.md # device names, SSH hosts
└── memory/
    ├── 2026-02-14.md # yesterday's session
    └── 2026-02-15.md # today's session
Every session, your agent loads all of this. Every single token, every time. Your agent reads about that SSH host configuration even when you're asking it to draft a tweet.
What context engineering actually means
Context engineering flips the question. Instead of "what should I pre-load?" you ask "what does the agent need right now, for this specific step?"
LangChain's team breaks it into four strategies: write, select, compress, and isolate. Philipp Schmid frames it in terms of the components of context: instructions, short-term memory (conversation history), long-term memory (persistent knowledge), retrieved information (RAG), available tools, and output format.
The common thread: context is dynamic. It changes with every turn of the conversation, every tool call, every new piece of information. A good context engineering setup assembles the right context on the fly rather than front-loading everything.
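As a rough sketch of what "assembling on the fly" means for a single turn: the file names below are illustrative, and the memoclaw recall call anticipates the skill covered later in this post; any semantic retrieval works in that slot.

# Sketch: rebuild the context for each step instead of pre-loading everything
task="$1"                          # the user's request for this turn
{
  cat system.md                    # stable instructions, kept short
  tail -n 40 conversation.log      # short-term memory: recent turns only
  memoclaw recall "$task" --top 5  # long-term memory: only what's relevant now
  printf '\nTask: %s\n' "$task"    # the actual request
} > context.txt                    # this, not a 4,000-token dump, goes to the model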
For agent builders, this is where memory services become interesting.
Memory is the dynamic layer
Think about what a memory service actually does. You store a piece of information with some metadata (tags, importance score). Later, when the agent needs context, it runs a semantic search and pulls back only the relevant memories. The model never sees the irrelevant ones.
That's context engineering. You're selectively loading information into the context window based on what's actually needed.
Compare the two approaches:
Manual (static context):
# MEMORY.md — loaded every session, all 2000 tokens
User prefers dark mode.
User's dog is named Pixel.
Project Alpha deadline: March 15.
SSH host: 192.168.1.100, user: admin
User hates when I use emojis in code reviews.
Last meeting with Sarah: discussed Q2 roadmap.
... (50 more lines)
MemoClaw (dynamic context):
# Agent asks: "what do I need to know about the user's preferences?"
memoclaw recall "user preferences and style" --tags preferences --top 5
# Returns only:
# - User prefers dark mode (importance: 0.6)
# - User hates emojis in code reviews (importance: 0.8)
# - Prefers bullet points over paragraphs (importance: 0.5)
The first approach burns 2,000 tokens whether the agent needs them or not. The second uses maybe 200 tokens and they're all relevant.
Putting it together with OpenClaw
If you're running an OpenClaw agent, the MemoClaw skill plugs into this workflow. Install it:
openclaw skill install anajuliabit/memoclaw
Now your agent can store and recall memories as part of its normal operation. Here's what context engineering looks like in practice.
Storing a correction (high importance, tagged):
memoclaw store "User corrected: deploy target is staging-eu, not production" \
--importance 0.9 \
--tags corrections,deploy
Recalling relevant context before a task:
memoclaw recall "deployment process" --tags deploy --top 3
Using namespaces to isolate project context:
memoclaw recall "architecture decisions" --namespace project-alpha --top 5
The agent pulls in 3-5 relevant memories instead of loading an entire MEMORY.md file. Each memory has an importance score, so the agent can prioritize corrections (0.9) over casual preferences (0.4).
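How that recall actually reaches the agent depends on your setup. One way it could look is a thin wrapper that recalls before the task and hands both to the agent; run-agent below is a stand-in for however you invoke OpenClaw, not a real command.

# Sketch: recall first, then pass memories and task together
task="deploy the latest build"
context="$(memoclaw recall "$task" --tags deploy,corrections --top 5)"
run-agent <<EOF
Relevant memories (recalled for this task, not pre-loaded):
$context

Task: $task
EOF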
Why importance scores matter more than you'd think
Not all memories are equal. With a 128K context window this might not seem to matter, but context engineering isn't just about fitting within limits. It's about signal-to-noise ratio.
Research on long-context behavior consistently shows that models do better with a small amount of highly relevant context than with a large amount of loosely relevant material. A 2,000-token context where every token matters will outperform a 50,000-token context where the answer is buried on page 12.
Importance scores let you control this directly. A user correction ("never deploy to production on Fridays") should always surface. A casual mention ("I had pizza for lunch") probably shouldn't, unless the agent is specifically asked about food preferences.
# This will always surface in deploy-related recalls
memoclaw store "NEVER deploy to production on Fridays - user was very clear" \
--importance 1.0 \
--tags deploy,rules
# This surfaces only when relevant
memoclaw store "User mentioned they like pizza" \
--importance 0.2 \
--tags personal
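To see the difference, try a recall that never mentions food. The output shown here is illustrative, but the idea is that the deploy rule surfaces while the pizza note stays out:

memoclaw recall "anything I should know before deploying?" --top 3
# Illustrative output:
# - NEVER deploy to production on Fridays - user was very clear (importance: 1.0)
# - User corrected: deploy target is staging-eu, not production (importance: 0.9)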
The MEMORY.md problem
If you're using OpenClaw's default workspace setup, you have a MEMORY.md file. It works. I won't pretend it doesn't. But it has a specific failure mode that gets worse over time.
MEMORY.md is append-friendly. You keep adding to it. After a few weeks, it's a thousand lines of mixed context: preferences, project notes, corrections, random observations. Your agent loads all of it, every session. The file becomes a context tax on every single interaction.
Some people manage this by manually pruning MEMORY.md. That works too, but it means you're doing the context engineering by hand. You're the one deciding what's relevant. A memory service automates that decision with semantic search.
The migration path is straightforward:
memoclaw migrate --file ~/.openclaw/workspace/MEMORY.md --namespace default
This imports your existing memories with embeddings, so they become searchable. You can keep MEMORY.md as a fallback while you test, then gradually shift your agent to use recall instead of loading the file.
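Before you retire the file entirely, it's worth spot-checking that the import worked. A crude but effective comparison: recall a fact you know is in MEMORY.md and grep the old file for the same thing.

# The migrated fact should come back from a semantic recall...
memoclaw recall "dark mode" --namespace default --top 3
# ...and still be greppable in the old file while it sticks around as a fallback
grep -i "dark mode" ~/.openclaw/workspace/MEMORY.md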
What this looks like in 2026
Context engineering is still a new discipline. The term barely existed 18 months ago. But the pattern is clear: agents that manage their context dynamically outperform agents that don't. Memory services are one piece of that puzzle, alongside RAG pipelines, tool selection, and conversation compression.
For OpenClaw users specifically, the shift from static files to semantic memory is probably the highest-leverage change you can make. It's not about replacing your entire setup. It's about letting the agent decide what it needs instead of forcing it to read everything every time.
The context window is working memory. Treat it that way.
MemoClaw is free for the first 100 API calls per wallet. No registration required. Install the skill: openclaw skill install anajuliabit/memoclaw