DEV Community

mariatanbobo

We Tried 6 Memory Providers for Hermes Agent — Here's What We Learned

Giving an AI agent persistent memory sounds simple. Store facts. Recall them later. How hard can it be?

Three weeks and six providers later, I have opinions.

This is the story of what broke, what we discarded, and the one thing that finally worked — and why.


The Setup

I run Hermes Agent on a headless VPS with 4GB RAM. Nothing exotic. The goal was straightforward: the agent should remember things across sessions — my preferences, environment details, lessons learned — without me repeating myself every conversation.

Hermes ships with several bundled memory providers and supports third-party ones via plugins. Should be plug-and-play, right?


Phase 1: The Ones That Failed Silently

AgentMemory

The first provider we had. Node.js runtime, Docker container for the iii-engine, 860 memories at peak. It seemed fine.

Then we switched to a different provider to try it out. AgentMemory's ingestion died instantly — but nothing told us. Tools responded normally. No errors in logs. Just… nothing was being stored anymore.

Root cause: Hermes supports exactly one active memory provider. The switch disabled AgentMemory's sync_turn() without a warning. The deadliest failure mode: total silence.
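Here is a minimal sketch of the kind of check that would have caught this. It is not part of Hermes or AgentMemory; the class name, the hook names (on_turn, on_write), and the threshold are all assumptions for illustration. The idea is simply to count conversation turns against confirmed writes and raise a flag when turns keep arriving but nothing is stored.

```python
class IngestionWatchdog:
    """Flags a memory provider that sees turns but stores nothing.

    Hypothetical helper: wire on_turn() and on_write() into whatever
    hooks your own setup exposes.
    """

    def __init__(self, max_unstored_turns=5):
        self.max_unstored_turns = max_unstored_turns
        self.turns_since_write = 0

    def on_turn(self):
        # Call once per conversation turn the agent handles.
        self.turns_since_write += 1

    def on_write(self):
        # Call only after a memory has actually been persisted.
        self.turns_since_write = 0

    def is_stalled(self):
        # True once too many turns have passed without a single write.
        return self.turns_since_write >= self.max_unstored_turns


if __name__ == "__main__":
    watchdog = IngestionWatchdog(max_unstored_turns=3)
    for _ in range(3):
        watchdog.on_turn()
    print("stalled:", watchdog.is_stalled())
```

Counting turns rather than wall-clock time avoids false alarms when the agent is simply idle.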

YantrikDB

Technically, YantrikDB worked. Rust engine, 8 tools, Precision@5 of 0.80. It stored memories. It had a self-maintaining pipeline — deduplication, contradiction detection, recency ranking. We even set up cron jobs to monitor it for updates.

The problem was qualitative. The hooks were too aggressive — it ingested everything, filling up with noise. And when the agent actually needed a memory? YantrikDB was rarely queried at the right moment. The recall was poorly timed, and the stored information was low-signal. It "worked" but never felt useful.

Lesson #1: A memory provider that stores noise and misses the moments that matter is barely better than one that fails silently. Integration quality matters more than feature count.
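To make "signal over noise" concrete, here is a rough sketch of an importance gate placed in front of ingestion. This is not how YantrikDB or any other provider scores turns; the cue phrases, weights, and threshold are assumptions picked for illustration, and a real setup would need tuning.

```python
import re

# Hypothetical cue phrases that tend to mark durable facts or preferences.
DURABLE_CUES = re.compile(
    r"\b(always|never|prefer|remember|my \w+ is|from now on)\b",
    re.IGNORECASE,
)

# File paths or dotted version numbers often signal environment details.
ENV_DETAILS = re.compile(r"[/~][\w./-]+|\b\d+(\.\d+)+\b")


def importance(turn_text):
    """Crude score in [0, 1]; short small talk scores zero."""
    text = turn_text.strip()
    if len(text) < 15:
        return 0.0
    score = 0.2  # baseline for any substantive turn
    if DURABLE_CUES.search(text):
        score += 0.6
    if ENV_DETAILS.search(text):
        score += 0.2
    return min(score, 1.0)


def should_store(turn_text, threshold=0.5):
    # Only turns at or above the threshold reach the memory store.
    return importance(turn_text) >= threshold
```

With these made-up weights, a stated preference clears the bar while a bare acknowledgement does not. The point is the gate, not the heuristic.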


Phase 2: The One That Wouldn't Die (Or Live)

Hindsight

This one looked promising on paper. Bundled with Hermes. 91.4% on the LongMemEval benchmark. Knowledge graphs, reflect synthesis — the "power pick."

It did not go well. But I want to be honest about what was Hindsight's fault and what was ours, because the distinction matters.

What was our fault:

  1. We installed the wrong package. The Hermes plugin only needs hindsight-client, a lightweight Python library. We ran pip install hindsight-all instead, the "All-in-One Bundle" that packages the full API server, an embedding engine, and an embedded PostgreSQL called pg0. We hadn't read the plugin.yaml.

  2. We triggered the pg0 download. hindsight-all pulls in hindsight-api-slim, whose default database is pg0 (embedded PostgreSQL). On first startup it silently downloads and initializes its own database engine. On a 4GB VPS, this hung for 177 seconds. We could have set HINDSIGHT_API_DATABASE_URL to point at our existing system PostgreSQL — the docs document this clearly. We just never read them.

  3. We didn't check LLM compatibility first. Hindsight supports openai, anthropic, gemini, groq, ollama, and lmstudio. We use DeepSeek. There's no HINDSIGHT_API_LLM_BASE_URL setting for pointing the OpenAI-compatible client at DeepSeek's API. We spent time trying to make it work before discovering this was a dead end. If we'd read the docs upfront, we'd have known DeepSeek wasn't supported and might have skipped the whole thing.

What was Hindsight's fault:

  1. Env var caching bug. The daemon cached environment variables across restarts. We'd change HINDSIGHT_API_LLM_API_KEY, restart the daemon, and the old value stayed in effect. We had to kill the process outright and start it again; the daemon didn't re-read its environment on SIGHUP.

  2. Daemon respawn after uninstall (the big one). After full uninstall — pip packages removed, config cleaned, directories deleted, plugin disabled — hindsight-api daemons kept respawning every 2 minutes. The Hermes gateway cached plugin state at startup and kept spawning processes for software that no longer existed on disk.

Breaking the cycle required renaming plugin.yaml to plugin.yaml.disabled, stopping the gateway, killing processes with pkill -9, then restarting. A clean uninstall should not require process hunting.
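If you want to confirm that an uninstall really left nothing running, a small check like the one below helps. It is a sketch that assumes a Linux box where ps -eo pid,args is available; the function names are mine, and "hindsight-api" is just the process name from our case.

```python
import subprocess


def find_lingering(ps_output, needle):
    """Return (pid, command) pairs whose command line contains `needle`."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the header row and blank lines
        pid, command = int(parts[0]), parts[1]
        if needle in command:
            matches.append((pid, command))
    return matches


def list_processes():
    # One "PID COMMAND..." row per process, preceded by a header row.
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling find_lingering(list_processes(), "hindsight-api") a few minutes after uninstalling should return an empty list. If it doesn't, something is still respawning the daemon.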

The bottom line: We were sloppy. We dove into installation without reading what the plugin actually needed, picked the heaviest package, and didn't check whether our LLM provider was supported. But even if we'd done everything right, the env var caching bug and the daemon respawn issue were architectural problems — and the lack of DeepSeek support would have been a dealbreaker regardless.

Lesson #2: Read the plugin.yaml before installing anything. And if uninstallation requires pkill -9, the architecture has a lifecycle problem.


Phase 3: The Evaluation

At this point we had criteria. Real criteria, earned through pain:

  1. Cannot silently fail — if ingestion stops, I need to know
  2. Simple uninstall — no daemon ghosts
  3. Local-first — no cloud dependency, no API key expiry taking down memory
  4. Hermes-specific author instructions — the #1 predictor of whether integration actually works
  5. No double token burn — I'm not paying for inference twice
  6. Signal over noise — if it stores everything, it stores nothing

We surveyed what was available:

| Provider | Verdict | Killer flaw |
| --- | --- | --- |
| Holographic (bundled) | Too simple | sync_turn() is a no-op: no auto-ingestion |
| Supermemory (bundled) | Cloud-only | All cloud. Best benchmarks, but contradicts local-first |
| Mem0 | Double token burn | LLM-Embedded: the agent calls an LLM, Mem0 calls its OWN LLM for fact extraction. Pay twice. |
| MemPalace | Wrong platform | 96.6% LongMemEval, but built for Claude Code, not Hermes |

Phase 4: The One That Worked

Mnemosyne

By AxDSan. Posted directly to r/hermesagent by its author. The README literally says: "The Zero-Dependency, Sub-Millisecond AI Memory System for Hermes Agents."

What makes it different:

In-process Python + SQLite. No separate service. No Docker. No daemon. If the gateway process runs, memory works. There is nothing to fall out of sync with.

Sub-millisecond reads. 0.076ms. 500x faster than the previous-generation providers. You don't feel it.

Three code paths, all verified working:

  • Explicit remember — the agent calls remember() when asked
  • Auto-ingestion — sync_turn captures every conversation turn automatically
  • Context injection — high-importance memories surface in each turn's system prompt
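To show how those three paths can share one in-process SQLite file, here is a toy version. It is not Mnemosyne's code, schema, or API. The names remember and sync_turn come from the list above; the TinyMemory class, the table layout, the default importance values, and context_block are assumptions for illustration.

```python
import sqlite3


class TinyMemory:
    """Illustrative in-process store; not Mnemosyne's real schema or API."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, content TEXT NOT NULL, "
            "importance REAL NOT NULL, source TEXT NOT NULL)"
        )

    def _insert(self, content, importance, source):
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO memories (content, importance, source) "
                "VALUES (?, ?, ?)",
                (content, importance, source),
            )

    def remember(self, content, importance=0.9):
        # Path 1: explicit call made by the agent when asked.
        self._insert(content, importance, "explicit")

    def sync_turn(self, user_text, assistant_text, importance=0.3):
        # Path 2: automatic capture of a conversation turn.
        turn = f"user: {user_text}\nassistant: {assistant_text}"
        self._insert(turn, importance, "turn")

    def context_block(self, min_importance=0.7, limit=5):
        # Path 3: text to add to the system prompt on each turn.
        rows = self.db.execute(
            "SELECT content FROM memories WHERE importance >= ? "
            "ORDER BY importance DESC, id DESC LIMIT ?",
            (min_importance, limit),
        ).fetchall()
        return "\n".join(f"- {content}" for (content,) in rows)
```

Everything lives in the calling process, so there is no second lifecycle to manage: if the process is up, the store is up.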

Installation took three commands:

pip install "mnemosyne-memory[embeddings]"
python -m mnemosyne.install
hermes memory setup  # interactive picker → select "mnemosyne"

No [all] — that pulls ctransformers and downloads 1–4GB of GGUF models. On a 4GB machine, that's OOM territory. The [embeddings] extra adds fastembed (133MB ONNX model) for semantic search, and LLM consolidation routes through your existing API key.

After a week of operation:

  • 362 working memories
  • 29 episodic summaries (auto-consolidation working)
  • 27/27 test suite passing
  • Zero silent failures. Zero daemon hunts. Zero forced kills.

The Pattern

The providers that failed outright shared one architectural decision: an external runtime with its own lifecycle.

AgentMemory had its Node.js runtime and Docker container. Hindsight had a separate API server plus daemon. When the runtime and the gateway fell out of sync, the result was silent failure, ghost processes, and respawn loops.

YantrikDB was different — it was in-process (Rust via PyO3), so it didn't have the lifecycle problem. But it showed a subtler failure mode: hooks that favor quantity over quality. If the memory provider hoovers up every turn indiscriminately, the agent learns to ignore it — and the moments that actually matter get buried in noise.

Mnemosyne's in-process Python + SQLite avoids the lifecycle problem. Its configurable importance scoring and sleep consolidation (summarizing old working memories into episodic ones) avoid the noise problem. It's the simplest thing that could possibly work on both fronts.
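The consolidation idea can be sketched in a few lines. This is my own simplified model, not Mnemosyne's implementation: the Memory dataclass, the age cutoff, and the summarize callback are assumptions. In a real setup the callback would be an LLM call; here it is any function that turns a list of strings into one string.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    age_days: float
    kind: str = "working"


def consolidate(memories, max_age_days, summarize):
    """Fold working memories older than the cutoff into one episodic summary."""
    old, kept = [], []
    for memory in memories:
        is_old = memory.kind == "working" and memory.age_days > max_age_days
        (old if is_old else kept).append(memory)
    if old:
        # Replace the old working memories with a single fresh summary.
        summary = summarize([memory.content for memory in old])
        kept.append(Memory(summary, age_days=0.0, kind="episodic"))
    return kept
```

The store shrinks while the gist survives, which is what keeps recall from drowning in old turns.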


What I'd Tell Someone Starting Today

  1. Read the plugin.yaml first. Before you pip install anything, check what the plugin actually requires. The difference between hindsight-client and hindsight-all is the difference between a library and an entire server stack.
  2. Local-first, single-process. If memory needs a separate service, it will fail in ways you won't notice.
  3. Verify ingestion before trusting it. After installing any memory provider, store a test fact, restart, and ask for it back.
  4. The author matters. Does the provider's README mention your agent platform by name? If not, you're doing integration work the author didn't do.
  5. Check LLM compatibility before installing. If the provider doesn't support your model, no amount of configuration will fix it.
  6. [all] is a trap. Read the install extras. On constrained hardware, the "everything" option downloads models and databases you don't need.
  7. Clean uninstall is a feature. If removing a provider takes more than deleting a directory, the architecture is fragile.
  8. Signal beats volume. A provider that stores everything indiscriminately trains the agent to ignore it. Better to store 50 high-signal facts than 5,000 noise entries.
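The "store a test fact, restart, ask for it back" check from point 3 is easy to script. The sketch below runs against a plain SQLite file as a stand-in; the table and function names are hypothetical, and closing and reopening the connection only simulates a restart. For a real provider, swap store_fact and fact_exists for its own store and recall calls and restart the agent in between.

```python
import os
import sqlite3
import tempfile
import uuid


def store_fact(db_path, fact):
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit the insert before closing
            conn.execute(
                "CREATE TABLE IF NOT EXISTS facts (content TEXT NOT NULL)"
            )
            conn.execute("INSERT INTO facts (content) VALUES (?)", (fact,))
    finally:
        conn.close()  # closing stands in for the process exiting


def fact_exists(db_path, fact):
    conn = sqlite3.connect(db_path)  # fresh connection, as after a restart
    try:
        row = conn.execute(
            "SELECT 1 FROM facts WHERE content = ? LIMIT 1", (fact,)
        ).fetchone()
        return row is not None
    finally:
        conn.close()


def roundtrip_ok(db_path):
    # A unique canary cannot be confused with anything stored earlier.
    canary = f"canary-{uuid.uuid4()}"
    store_fact(db_path, canary)
    return fact_exists(db_path, canary)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("round trip ok:", roundtrip_ok(os.path.join(tmp, "memory.db")))
```

Using a random canary matters: a fixed test string can pass because of a leftover entry from an earlier run.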

I'm @MariaTanBoBo on X. This article was written with Hermes Agent and published via the DEV.to API — yes, an AI agent can publish articles now. The future is weird.
