Description: Built from frustration with Claude/GPT forgetting everything. 100% local, 100% free alternative to Mem0/Zep. Works with 11+ IDEs.
Why AI Assistants Forget Everything (And How I Fixed It with SuperLocalMemory)
I spent 47 minutes yesterday re-explaining my project architecture to Claude.
Not because Claude is bad. But because AI assistants have amnesia. Every new session is a blank slate. Every conversation starts from zero.
So I copied my project context into a text file. Then that file grew to 5,000 words. Then I started forgetting to update it. Then I had three different context files that contradicted each other.
Sound familiar?
After losing count of how many times I'd typed "We use React with TypeScript, JWT for auth, PostgreSQL for the database..." I finally snapped and built SuperLocalMemory V2.
This is the story of building my first real open-source project, the technical challenges I solved, and why you might want it too.
The Problem: AI Assistants Have Goldfish Memory
Why AI Forgets Everything
AI assistants like Claude, GPT-4, and Cursor's AI are stateless by design. Every conversation is isolated:
Session 1:
You: "I prefer React hooks over class components"
Claude: "Got it, I'll use hooks"
Session 2 (next day):
You: "Build me a component"
Claude: *writes a class component*
You: "I SAID HOOKS!"
This happens because:
- No persistent memory - Conversations don't share context
- Token limits - Context windows are finite (even with 200K tokens, you can't paste your entire codebase)
- No learning - AI doesn't remember your preferences, decisions, or patterns
The Copy-Paste Workaround (That Doesn't Scale)
So developers create project-context.txt:
Project: MyApp
Stack: React, TypeScript, FastAPI, PostgreSQL
Auth: JWT tokens (24h expiration)
Style: Functional components, hooks, TypeScript strict mode
Deployment: Docker Compose
...
Then you copy-paste this into every Claude conversation.
Problems:
- File grows to 10,000+ words (eats your token budget)
- Multiple files (frontend-context.txt, backend-context.txt, database-context.txt)
- Constantly outdated ("Wait, we switched from PostgreSQL to MongoDB last week")
- Manual searching ("Where did I document that auth bug fix?")
Existing Solutions (And Their Problems)
| Solution | Problem | Cost |
|---|---|---|
| Mem0 | Cloud-based (privacy risk), usage-based pricing | Starts at $50/mo |
| Zep | Cloud-only, credit system | $50/mo |
| Supermemory | Token/query limits | $19-399/mo |
| Personal.AI | Closed ecosystem, no free tier | $33/mo |
| Manual notes | Doesn't scale, no search, no AI integration | Time |
None of these worked for me because:
- Cloud-based = privacy risk - I work with client code under NDAs
- Subscription costs - I'm building open-source tools, not paying $600/year for memory
- Limited integrations - Works with ChatGPT but not Cursor? Useless.
- Vendor lock-in - What happens when the service shuts down?
I needed something that was:
- 100% local (my machine, my data)
- 100% free (no usage limits, no credit systems)
- Universal (works with Claude, Cursor, Aider, any AI tool)
- Smart (not just keyword search)
That thing didn't exist.
So I built it.
My Solution: SuperLocalMemory V2
TL;DR: Local-first AI memory system that works with 11+ IDEs, learns your patterns, auto-discovers relationships, and costs $0 forever.
GitHub: https://github.com/varun369/SuperLocalMemoryV2
What It Does
SuperLocalMemory sits between you and your AI assistant:
You → SuperLocalMemory → Claude/GPT/Cursor
(remembers everything)
Save memories:
superlocalmemoryv2:remember "Fixed auth bug - JWT tokens were expiring in 1h, changed to 24h. File: src/auth/tokens.py"
Recall instantly:
superlocalmemoryv2:recall "auth bug"
# ✓ Found: "Fixed auth bug - JWT tokens expiring in 1h, changed to 24h"
# Tags: authentication, bug-fix
# Project: myapp
# Cluster: "Authentication & Security" (related: session management, OAuth)
But it's not just a database. It's intelligent.
Deep-Dive: The 4-Layer Architecture
Most "AI memory" systems are just fancy keyword search over a database. SuperLocalMemory V2 implements four layers of intelligence, each adding context without replacing the others.
Layer 1: Raw Storage (SQLite + FTS5 + TF-IDF)
The foundation is blazing-fast local search:
# SQLite with Full-Text Search (FTS5)
CREATE VIRTUAL TABLE memories_fts USING fts5(content, tags);
# TF-IDF vector embeddings (no external APIs!)
def compute_tfidf(text):
    # Pure Python TF-IDF implementation
    # No OpenAI, no sentence-transformers, completely local
    ...
Why SQLite?
- Ships with Python (zero dependencies)
- ACID transactions (your data is safe)
- Full-text search built-in (FTS5 is FAST)
- Single file database (easy backups)
Search speed: 30-45ms for 500 memories. On my laptop. Locally.
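To show how little code the core idea needs, here's a simplified sketch of the FTS5 part, using only the standard library (it assumes your Python's SQLite build includes FTS5, which most do; the table and file names are illustrative, not the repo's exact schema):

import sqlite3

conn = sqlite3.connect("memories.db")
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(content, tags)")
conn.execute(
    "INSERT INTO memories_fts VALUES (?, ?)",
    ("Fixed auth bug - JWT tokens were expiring in 1h, changed to 24h", "authentication,bug-fix"),
)
conn.commit()

# FTS5 MATCH does tokenized full-text search; the hidden rank column orders by relevance
for content, tags in conn.execute(
    "SELECT content, tags FROM memories_fts WHERE memories_fts MATCH ? ORDER BY rank", ("auth",)
):
    print(content, "|", tags)

Everything happens in one local file, which is why searches stay in the tens of milliseconds.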
Layer 2: Hierarchical Index (PageIndex Approach)
Inspired by the PageIndex approach to hierarchical retrieval, memories form a tree:
Project: MyApp
├── Authentication
│ ├── JWT implementation
│ ├── OAuth flow
│ └── Password reset bug fix
├── Database
│ ├── PostgreSQL → MongoDB migration
│ └── Index optimization
└── Frontend
├── React component patterns
└── State management with Zustand
Why hierarchical?
- Search finds not just the memory, but its context (parent/children)
- Breadcrumbs: MyApp → Authentication → JWT implementation
- O(log n) lookups instead of O(n) scans
Code example:
# Create parent-child relationships
store.add("Implemented JWT auth", parent_id=None) # Root
store.add("JWT tokens expire in 24h", parent_id=1) # Child
# Retrieve with context
memory = store.get(2)
print(memory['breadcrumbs'])
# → "MyApp → Authentication → JWT tokens expire in 24h"
Layer 3: Knowledge Graph (GraphRAG Implementation)
This is where it gets magical. The system auto-discovers relationships you didn't know existed.
How?
- TF-IDF Entity Extraction - Finds important terms from your memories:
# Memory: "Fixed JWT token expiration bug in authentication module"
# Entities extracted: [JWT, token, authentication, expiration]
- Leiden Clustering - Groups related memories automatically:
from leidenalg import find_partition
# Builds graph, runs community detection
# Output: Clusters like "Authentication & Security", "Performance", "Frontend"
- Auto-naming - Names clusters from top entities:
# Cluster 1: [JWT, OAuth, session, token, auth]
# Auto-name: "Authentication & Tokens"
Example output:
python ~/.claude-memory/graph_engine.py build
✓ Processed 47 memories
✓ Created 12 clusters:
- "Authentication & Tokens" (8 memories)
Entities: JWT, OAuth, session, authentication
- "React Components" (11 memories)
Entities: React, hooks, components, useState
- "Database Optimization" (5 memories)
Entities: PostgreSQL, index, query, performance
Why this matters:
When you search for "auth", you also get:
- JWT token implementation
- OAuth flow documentation
- Session management decisions
- Password reset bug fixes
Even if you never tagged them together. The graph discovered the relationships.
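If you're curious what that pipeline looks like in miniature, here's a toy version (it assumes python-igraph and leidenalg are installed; the entity extraction here is deliberately naive compared to the real TF-IDF version):

import igraph as ig
import leidenalg

memories = [
    "Fixed JWT token expiration bug in auth module",
    "JWT refresh token flow for auth",
    "Added database index to speed up queries",
    "Optimized slow database queries with a new index",
]

def extract_entities(text):
    # Toy stand-in for TF-IDF entity extraction: lowercase words minus stop words
    stop = {"in", "with", "for", "the", "a", "to", "up"}
    return {w.lower() for w in text.split() if w.lower() not in stop}

entities = [extract_entities(m) for m in memories]

# Build a graph: an edge connects memories that share at least one entity
g = ig.Graph(n=len(memories))
edges, weights = [], []
for i in range(len(memories)):
    for j in range(i + 1, len(memories)):
        shared = entities[i] & entities[j]
        if shared:
            edges.append((i, j))
            weights.append(len(shared))
g.add_edges(edges)
g.es["weight"] = weights

# Leiden community detection groups related memories into clusters
partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition, weights="weight")
for cluster_id, members in enumerate(partition):
    print(f"Cluster {cluster_id}:", [memories[m] for m in members])

Running this splits the four memories into an "auth" cluster and a "database" cluster without any manual tagging, which is exactly the effect described above.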
Layer 4: Pattern Learning (xMemory Approach)
Over time, SuperLocalMemory learns who you are as a developer:
python ~/.claude-memory/pattern_learner.py update
Your Coding Identity:
- Framework: React (73% confidence)
- Language: Python for APIs, TypeScript for frontend (65% confidence)
- Style: Performance over readability (58% confidence)
- Testing: Jest + React Testing Library (65% confidence)
- API design: REST over GraphQL (81% confidence)
- Security: JWT tokens, never store passwords in plain text
How?
- Frequency analysis - "React" mentioned 23 times, "Vue" mentioned 2 times → React preference
- Context extraction - "prefer functional components" → Style pattern
- Confidence scoring - More mentions = higher confidence
Why this matters:
Your AI assistant can now match your preferences automatically:
You: "Build me an API endpoint"
Claude: *reads your identity patterns*
Claude: "Here's a FastAPI endpoint with JWT auth (I know you prefer FastAPI and JWT from your patterns)..."
No more "Actually, I use FastAPI, not Flask" corrections.
How It All Works Together
When you recall a memory, all 4 layers activate:
query = "authentication patterns"
# Layer 1: Fast keyword search (FTS5)
keyword_results = fts5_search(query) # 30ms
# Layer 1b: Semantic search (TF-IDF vectors)
semantic_results = tfidf_search(query) # 45ms
# Layer 3: Graph enhancement
graph_results = related_memories(semantic_results) # 60ms
# Layer 4: Pattern context
patterns = get_identity_patterns()
# Combine results
final_results = merge([
    keyword_results,
    semantic_results,
    graph_results,
]) + patterns
# Total time: ~80ms
You get:
- Exact matches (Layer 1 keyword)
- Conceptually similar memories (Layer 1 semantic)
- Related memories from the graph (Layer 3)
- Your coding preferences (Layer 4)
- Hierarchical context (Layer 2 breadcrumbs)
All in 80 milliseconds. Locally.
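The merge() call above is pseudocode. One way it could work is to de-duplicate by memory id and keep each memory's best score from any layer (the names and scoring here are assumptions, not the repo's exact logic):

def merge(result_lists):
    # Each result list holds (memory_id, score) pairs from one retrieval layer
    best = {}
    for results in result_lists:
        for memory_id, score in results:
            if memory_id not in best or score > best[memory_id]:
                best[memory_id] = score
    # Highest combined score first
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

print(merge([
    [(1, 0.9), (2, 0.4)],   # keyword hits
    [(2, 0.7), (3, 0.6)],   # semantic hits
    [(3, 0.5)],             # graph-expanded hits
]))
# → [(1, 0.9), (2, 0.7), (3, 0.6)]

Max-score fusion is only one option; reciprocal-rank fusion is another common way to combine rankings from different retrievers.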
Universal Integration: It Just Works Everywhere
Here's the problem with most AI memory tools: they only work with one or two apps.
SuperLocalMemory V2 uses three integration methods so it works everywhere:
Method 1: MCP (Model Context Protocol)
For modern IDEs like Cursor, Windsurf, and Claude Desktop:
// Auto-configured by install.sh
{
  "mcpServers": {
    "SuperLocalMemory": {
      "command": "python3",
      "args": ["~/.claude-memory/mcp_server.py"]
    }
  }
}
In Cursor:
You: "@SuperLocalMemory remember that we use FastAPI with async endpoints"
You: "Build me an API endpoint"
Cursor AI: *automatically retrieves your FastAPI patterns and preferences*
No manual commands. The AI just knows.
Method 2: Skills (Slash Commands)
For Claude Code, Continue.dev, and Cody:
/slm-remember "React hooks for state management" --tags frontend
/slm-recall "state management"
/slm-status
Six universal skills that work across multiple AI assistants.
Method 3: CLI (Universal)
For any terminal, any script, any tool:
# Simple, clean syntax
slm remember "Deploy with Docker Compose"
slm recall "deployment"
slm status
# Use in scripts
#!/bin/bash
slm remember "Build started at $(date)"
npm run build
if [ $? -eq 0 ]; then
  slm remember "Build succeeded at $(date)"
fi
The key insight: All three methods write to the same local SQLite database.
No data duplication. No conflicts. One source of truth.
Real-World Usage: Before vs After
Before SuperLocalMemory
Monday morning:
You: "Claude, implement OAuth login"
Claude: "What framework are you using?"
You: "FastAPI. With JWT. PostgreSQL. We went over this last week."
Wednesday:
You: "Why is auth broken?"
Claude: "Let me analyze..."
You: "We fixed this bug on Monday! JWT expiration!"
Claude: "I don't have access to previous conversations"
You: *searches through 15 chat logs manually*
Friday:
You: "Build a new API endpoint"
Claude: "Here's a Flask example"
You: "WE USE FASTAPI!" (3rd time this week)
Time wasted: ~3 hours/week re-explaining context.
After SuperLocalMemory
Monday morning:
slm remember "Implemented OAuth with FastAPI + JWT, tokens expire in 24h, refresh tokens in DB"
Wednesday:
You: "Why is auth broken?"
You: "/slm-recall auth bug"
✓ Found: "JWT tokens expiring too fast - increased to 24h"
✓ Cluster: "Authentication & Tokens"
✓ Related: OAuth implementation, token refresh flow
You: "Check if token expiration is 24h"
Claude: *already has context from memory*
Friday:
You: "Build a new API endpoint"
Claude: *reads your patterns: FastAPI (81% confidence), JWT auth (73% confidence)*
Claude: "Here's a FastAPI endpoint with JWT authentication..."
You: ✓ "Perfect."
Time saved: ~2.5 hours/week. ROI: Install time (5 min) paid back in first week.
Technical Challenges (And How I Solved Them)
Building this wasn't trivial. Here are the hard problems:
Challenge 1: Backward Compatibility
Problem: Users upgrading from v2.0.0 to v2.1.0 shouldn't lose data or experience breaking changes.
Solution: Database migrations with ALTER TABLE checks:
# Add new v2.1.0 columns to existing tables
v2_columns = [
    ('cluster_id', 'INTEGER'),
    ('entity_vector', 'TEXT'),
    ('importance', 'INTEGER DEFAULT 5'),
]
for col_name, col_type in v2_columns:
    try:
        cursor.execute(f'ALTER TABLE memories ADD COLUMN {col_name} {col_type}')
    except sqlite3.OperationalError:
        pass  # Column already exists (v2.1.0 database)
Result: 100% backward compatible. Zero breaking changes. Users just run ./install.sh.
Challenge 2: Graph Clustering Performance
Problem: Clustering cost grows quickly with memory count. Building the pairwise similarity graph is O(n²), so with 1,000+ memories a full graph rebuild takes 60+ seconds.
Solution: Progressive profiling + clear documentation:
# For >1000 memories, recommend profile splitting
if memory_count > 1000:
    print("TIP: Consider splitting into profiles for better performance")
    print("Example: slm switch-profile work")
Alternative solution (v2.2.0): Incremental graph updates (still in development).
Challenge 3: Zero External Dependencies
Problem: Most semantic search systems require sentence-transformers (downloads 500MB model). I wanted zero mandatory dependencies.
Solution: Pure Python TF-IDF fallback:
# Try advanced method first
try:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(texts)
except ImportError:
    # Fall back to pure Python TF-IDF
    embeddings = compute_tfidf_vectors(texts)
Result: System works out-of-the-box with zero pip installs. Optional dependencies improve performance but aren't required.
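For the curious, a bare-bones pure-Python TF-IDF looks something like this (a simplified stand-in for the real compute_tfidf_vectors, standard library only):

import math
from collections import Counter

def compute_tfidf_vectors(texts):
    docs = [t.lower().split() for t in texts]
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log((1 + n) / (1 + df[term]))
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    # Cosine similarity over sparse dict vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vecs = compute_tfidf_vectors([
    "JWT tokens expire in 24h",
    "Fixed JWT expiration bug",
    "React hooks for state management",
])
print(cosine(vecs[0], vecs[1]))  # higher: both mention JWT
print(cosine(vecs[0], vecs[2]))  # lower: unrelated topics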
Challenge 4: Cross-Platform Support
Problem: Users on Mac, Linux, and Windows expect it to "just work."
Solution:
- install.sh for Mac/Linux (bash)
- install.ps1 for Windows (PowerShell)
- Path detection for 11+ IDEs
- Shell detection (bash vs zsh)
# Auto-detect shell and configure PATH
if [ -f ~/.bashrc ]; then
  echo 'export PATH="$HOME/.claude-memory/bin:$PATH"' >> ~/.bashrc
elif [ -f ~/.zshrc ]; then
  echo 'export PATH="$HOME/.claude-memory/bin:$PATH"' >> ~/.zshrc
fi
Result: One command, any platform.
Challenge 5: MCP Server Integration
Problem: Different IDEs implement MCP differently (Cursor vs Windsurf vs Claude Desktop).
Solution: Universal MCP server with auto-configuration:
# mcp_server.py implements MCP spec
# 6 tools, 4 resources, 2 prompts
@server.tool()
async def remember(content: str, tags: str = "", project: str = ""):
    """Save memory with context"""
    store.add(content, tags=tags, project=project)
    return {"status": "success"}

@server.resource("memory://recent")
async def list_recent():
    """List recent memories"""
    return store.list_all(limit=10)
Installation detects IDEs and configures automatically:
./install.sh
✓ Detected: Claude Desktop
✓ Configured: ~/.config/claude/claude_desktop_config.json
✓ Detected: Cursor
✓ Configured: ~/.cursor/mcp.json
✓ Detected: Windsurf
✓ Configured: ~/.windsurf/mcp.json
Result: Zero manual configuration. It just works.
What's Next: Roadmap
Current version: v2.1.0-universal
Planned features (v2.2.0):
- Incremental graph updates (no full rebuild)
- Auto-compression based on access patterns
- Web UI for graph visualization
- Real-time pattern updates
- Multi-language entity extraction
Long-term (v3.0.0):
- npm distribution: npm install -g superlocalmemory
- Same features as V2, easier installation
- Windows installer (.exe)
See full roadmap: https://github.com/varun369/SuperLocalMemoryV2/wiki/Roadmap
Try It Yourself
Installation (5 minutes)
# Clone the repo
git clone https://github.com/varun369/SuperLocalMemoryV2.git
cd SuperLocalMemoryV2
# Run installer (Mac/Linux)
./install.sh
# Or Windows (PowerShell)
.\install.ps1
First Memory
# Save your first memory
slm remember "I prefer React with TypeScript for frontend projects" --tags preferences,frontend
# Build the knowledge graph
slm build-graph
# Check system status
slm status
# Search for it
slm recall "react"
Usage in Claude Code
/slm-remember "FastAPI for APIs, PostgreSQL for database" --tags stack
/slm-recall "database"
/slm-status
Usage in Cursor (MCP)
You: "Remember that we use Docker Compose for deployment"
Cursor AI: *automatically saves to SuperLocalMemory*
You: "How do we deploy this?"
Cursor AI: *retrieves from SuperLocalMemory* "You deploy using Docker Compose..."
Performance Benchmarks
Tested on MacBook Pro M1, 16GB RAM:
| Operation | Time | Dataset Size |
|---|---|---|
| Add memory | <10ms | N/A |
| Search (hybrid) | 80ms | 500 memories |
| Graph build | 2s | 100 memories |
| Graph build | 15s | 500 memories |
| Pattern learning | <2s | 100 memories |
Storage efficiency:
- Tier 1 (active): Full content
- Tier 2 (warm, 30-90 days): 60% compression
- Tier 3 (cold, 90+ days): 96% compression
Example: 1,000 memories = ~15MB (vs 380MB uncompressed).
Comparison with Alternatives
vs Mem0
| Feature | Mem0 | SuperLocalMemory V2 |
|---|---|---|
| Hosting | Cloud (privacy risk) | 100% local |
| Price | Usage-based (~$50/mo) | $0 forever |
| Setup | API keys, cloud account | 5-min install |
| IDE support | Limited | 11+ IDEs |
| Pattern learning | ❌ | ✅ Full |
| Knowledge graphs | ✅ Cloud-based | ✅ Local |
| Data ownership | Vendor | You |
vs Zep
| Feature | Zep | SuperLocalMemory V2 |
|---|---|---|
| Hosting | Cloud-only | 100% local |
| Price | $50/mo | $0 forever |
| Credit system | Yes (limits) | Unlimited |
| Universal CLI | ❌ | ✅ |
| Multi-profile | ❌ | ✅ |
| Open source | Partial | MIT License |
vs Personal.AI
| Feature | Personal.AI | SuperLocalMemory V2 |
|---|---|---|
| Free tier | ❌ None | ✅ Unlimited |
| Price | $33/mo | $0 forever |
| Local-first | ❌ | ✅ |
| IDE integration | ❌ | 11+ IDEs |
| Developer-focused | ❌ | ✅ |
Conclusion: SuperLocalMemory V2 is the only solution that's:
- 100% local (privacy-first)
- 100% free (no limits)
- Universal (works everywhere)
Common Questions
"Why not just use text files?"
Text files don't:
- Auto-discover relationships
- Learn your patterns
- Provide instant search
- Integrate with AI assistants
- Scale beyond 100 notes
"Why not use Notion/Obsidian?"
Notion and Obsidian are great for humans reading notes. SuperLocalMemory is built for AI assistants retrieving context:
- APIs for programmatic access
- TF-IDF semantic search
- Knowledge graph integration
- MCP protocol support
- Pattern learning
Different tools, different purposes.
"Is this secure?"
Yes:
- 100% local (data never leaves your machine)
- No telemetry, no tracking, no external API calls
- Standard filesystem permissions
- SQLite ACID transactions
- Open-source (audit the code yourself)
Because data never leaves your machine, there's no third-party processor to account for in GDPR/HIPAA assessments — the data is yours.
"Does it work with ChatGPT?"
Yes! ChatGPT Desktop supports MCP. See setup guide: https://github.com/varun369/SuperLocalMemoryV2/blob/main/docs/MCP-MANUAL-SETUP.md
Also works with: Claude, Cursor, Windsurf, Continue.dev, Cody, Aider, Perplexity, Zed, OpenCode, Antigravity.
"How is this different from RAG?"
SuperLocalMemory uses RAG (Retrieval-Augmented Generation) but adds:
- Knowledge graphs (relationships)
- Pattern learning (identity)
- Hierarchical indexing (context)
- Multi-method search (semantic + keyword + graph)
RAG is one layer. SuperLocalMemory is the full stack.
Call-to-Action
If you're tired of re-explaining your project to AI assistants every single day...
If you've spent hours managing context files that never stay updated...
If you want an AI assistant that actually remembers you...
Try SuperLocalMemory V2:
- ⭐ Star on GitHub: https://github.com/varun369/SuperLocalMemoryV2
- 📖 Read the docs: https://github.com/varun369/SuperLocalMemoryV2/wiki
- 🚀 Install in 5 minutes: https://github.com/varun369/SuperLocalMemoryV2/wiki/Installation
- 💬 Ask questions: https://github.com/varun369/SuperLocalMemoryV2/issues
- ☕ Buy me a coffee: https://buymeacoffee.com/varunpratah
100% local. 100% free. 100% yours.