Why AI Assistants Forget Everything (And How I Fixed It with SuperLocalMemory)

Description: Built from frustration with Claude/GPT forgetting everything. 100% local, 100% free alternative to Mem0/Zep. Works with 11+ IDEs.


I spent 47 minutes yesterday re-explaining my project architecture to Claude.

Not because Claude is bad. But because AI assistants have amnesia. Every new session is a blank slate. Every conversation starts from zero.

So I copied my project context into a text file. Then that file grew to 5,000 words. Then I started forgetting to update it. Then I had three different context files that contradicted each other.

Sound familiar?

After losing count of how many times I'd typed "We use React with TypeScript, JWT for auth, PostgreSQL for the database..." I finally snapped and built SuperLocalMemory V2.

This is the story of building my first real open-source project, the technical challenges I solved, and why you might want it too.


The Problem: AI Assistants Have Goldfish Memory

Why AI Forgets Everything

AI assistants like Claude, GPT-4, and Cursor's AI are stateless by design. Every conversation is isolated:

Session 1:
You: "I prefer React hooks over class components"
Claude: "Got it, I'll use hooks"

Session 2 (next day):
You: "Build me a component"
Claude: *writes a class component*
You: "I SAID HOOKS!"

This happens because:

  1. No persistent memory - Conversations don't share context
  2. Token limits - Context windows are finite (even with 200K tokens, you can't paste your entire codebase)
  3. No learning - AI doesn't remember your preferences, decisions, or patterns

The Copy-Paste Workaround (That Doesn't Scale)

So developers create project-context.txt:

Project: MyApp
Stack: React, TypeScript, FastAPI, PostgreSQL
Auth: JWT tokens (24h expiration)
Style: Functional components, hooks, TypeScript strict mode
Deployment: Docker Compose
...

Then you copy-paste this into every Claude conversation.

Problems:

  • File grows to 10,000+ words (eats your token budget)
  • Multiple files (frontend-context.txt, backend-context.txt, database-context.txt)
  • Constantly outdated ("Wait, we switched from PostgreSQL to MongoDB last week")
  • Manual searching ("Where did I document that auth bug fix?")

Existing Solutions (And Their Problems)

| Solution | Problem | Cost |
| --- | --- | --- |
| Mem0 | Cloud-based (privacy risk), usage-based pricing | Starts at $50/mo |
| Zep | Cloud-only, credit system | $50/mo |
| Supermemory | Token/query limits | $19-399/mo |
| Personal.AI | Closed ecosystem, no free tier | $33/mo |
| Manual notes | Doesn't scale, no search, no AI integration | Time |

None of these worked for me because:

  1. Cloud-based = privacy risk - I work with client code under NDAs
  2. Subscription costs - I'm building open-source tools, not paying $600/year for memory
  3. Limited integrations - Works with ChatGPT but not Cursor? Useless.
  4. Vendor lock-in - What happens when the service shuts down?

I needed something that was:

  • 100% local (my machine, my data)
  • 100% free (no usage limits, no credit systems)
  • Universal (works with Claude, Cursor, Aider, any AI tool)
  • Smart (not just keyword search)

That thing didn't exist.

So I built it.


My Solution: SuperLocalMemory V2

TL;DR: Local-first AI memory system that works with 11+ IDEs, learns your patterns, auto-discovers relationships, and costs $0 forever.

GitHub: https://github.com/varun369/SuperLocalMemoryV2

What It Does

SuperLocalMemory sits between you and your AI assistant:

You → SuperLocalMemory → Claude/GPT/Cursor
     (remembers everything)

Save memories:

superlocalmemoryv2:remember "Fixed auth bug - JWT tokens were expiring in 1h, changed to 24h. File: src/auth/tokens.py"

Recall instantly:

superlocalmemoryv2:recall "auth bug"
# ✓ Found: "Fixed auth bug - JWT tokens expiring in 1h, changed to 24h"
#   Tags: authentication, bug-fix
#   Project: myapp
#   Cluster: "Authentication & Security" (related: session management, OAuth)

But it's not just a database. It's intelligent.


Deep-Dive: The 4-Layer Architecture

Most "AI memory" systems are just fancy keyword search over a database. SuperLocalMemory V2 implements four layers of intelligence, each adding context without replacing the others.

Layer 1: Raw Storage (SQLite + FTS5 + TF-IDF)

The foundation is blazing-fast local search:

# SQLite with Full-Text Search (FTS5)
CREATE VIRTUAL TABLE memories_fts USING fts5(content, tags);

# TF-IDF vector embeddings (no external APIs!)
def compute_tfidf(text):
    # Pure Python TF-IDF implementation
    # No OpenAI, no sentence-transformers, completely local
    ...
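If you're curious what a dependency-free TF-IDF looks like, here's a minimal sketch (illustrative only, assuming a naive whitespace tokenizer; the project's actual compute_tfidf may differ):

import math
from collections import Counter

def tfidf_vectors(docs):
    """One {term: weight} dict per document (sketch, not the shipped code)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: how many memories mention each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors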

Why SQLite?

  • Ships with Python (zero dependencies)
  • ACID transactions (your data is safe)
  • Full-text search built-in (FTS5 is FAST)
  • Single file database (easy backups)

Search speed: 30-45ms for 500 memories. On my laptop. Locally.
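
Querying FTS5 is close to a one-liner. A rough illustration using the table from the snippet above (the database filename is hypothetical, and the real query likely adds filters and ranking tweaks):

import sqlite3

conn = sqlite3.connect("memories.db")  # hypothetical filename; the real DB lives under ~/.claude-memory/
rows = conn.execute(
    "SELECT content FROM memories_fts WHERE memories_fts MATCH ? ORDER BY rank LIMIT 10",
    ("auth",),
).fetchall()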

Layer 2: Hierarchical Index (PageIndex Approach)

Inspired by Meta AI's PageIndex research, memories form a tree:

Project: MyApp
├── Authentication
│   ├── JWT implementation
│   ├── OAuth flow
│   └── Password reset bug fix
├── Database
│   ├── PostgreSQL → MongoDB migration
│   └── Index optimization
└── Frontend
    ├── React component patterns
    └── State management with Zustand

Why hierarchical?

  • Search finds not just the memory, but its context (parent/children)
  • Breadcrumbs: MyApp → Authentication → JWT implementation
  • O(log n) lookups instead of O(n) scans

Code example:

# Create parent-child relationships
store.add("Implemented JWT auth", parent_id=None)  # Root
store.add("JWT tokens expire in 24h", parent_id=1)  # Child

# Retrieve with context
memory = store.get(2)
print(memory['breadcrumbs'])
# → "MyApp → Authentication → JWT tokens expire in 24h"
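Breadcrumb resolution itself is just a walk up the parent chain. A minimal sketch, assuming an id/title/parent_id schema (column names here are illustrative, not the project's exact schema):

def breadcrumbs(conn, memory_id):
    """Walk parent_id links from a memory up to its root and join the titles."""
    trail = []
    current = memory_id
    while current is not None:
        row = conn.execute(
            "SELECT title, parent_id FROM memories WHERE id = ?", (current,)
        ).fetchone()
        if row is None:
            break
        trail.append(row[0])
        current = row[1]
    return " → ".join(reversed(trail))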

Layer 3: Knowledge Graph (GraphRAG Implementation)

This is where it gets magical. The system auto-discovers relationships you didn't know existed.

How?

  1. TF-IDF Entity Extraction - Finds important terms from your memories:

   # Memory: "Fixed JWT token expiration bug in authentication module"
   # Entities extracted: [JWT, token, authentication, expiration]

  2. Leiden Clustering - Groups related memories automatically:

   from leidenalg import find_partition

   # Builds graph, runs community detection
   # Output: Clusters like "Authentication & Security", "Performance", "Frontend"

  3. Auto-naming - Names clusters from top entities:

   # Cluster 1: [JWT, OAuth, session, token, auth]
   # Auto-name: "Authentication & Tokens"

Example output:

python ~/.claude-memory/graph_engine.py build

✓ Processed 47 memories
✓ Created 12 clusters:
  - "Authentication & Tokens" (8 memories)
    Entities: JWT, OAuth, session, authentication
  - "React Components" (11 memories)
    Entities: React, hooks, components, useState
  - "Database Optimization" (5 memories)
    Entities: PostgreSQL, index, query, performance
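If you want a feel for the clustering step, here's a hedged sketch using igraph + leidenalg, with edge weights taken from shared entities (this shows the idea, not the actual graph_engine.py):

import igraph as ig
import leidenalg

def cluster_memories(memory_entities):
    """memory_entities: one set of extracted entities per memory."""
    n = len(memory_entities)
    g = ig.Graph(n)
    edges, weights = [], []
    for i in range(n):
        for j in range(i + 1, n):
            shared = memory_entities[i] & memory_entities[j]
            if shared:
                edges.append((i, j))
                weights.append(len(shared))  # more shared entities = stronger link
    g.add_edges(edges)
    partition = leidenalg.find_partition(
        g, leidenalg.ModularityVertexPartition, weights=weights
    )
    return list(partition)  # e.g. [[0, 3, 7], [1, 2], ...] = memory indices per cluster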

Why this matters:

When you search for "auth", you also get:

  • JWT token implementation
  • OAuth flow documentation
  • Session management decisions
  • Password reset bug fixes

Even if you never tagged them together. The graph discovered the relationships.

Layer 4: Pattern Learning (xMemory Approach)

Over time, SuperLocalMemory learns who you are as a developer:

python ~/.claude-memory/pattern_learner.py update

Your Coding Identity:
- Framework: React (73% confidence)
- Language: Python for APIs, TypeScript for frontend (65% confidence)
- Style: Performance over readability (58% confidence)
- Testing: Jest + React Testing Library (65% confidence)
- API design: REST over GraphQL (81% confidence)
- Security: JWT tokens, never store passwords in plain text

How?

  1. Frequency analysis - "React" mentioned 23 times, "Vue" mentioned 2 times → React preference
  2. Context extraction - "prefer functional components" → Style pattern
  3. Confidence scoring - More mentions = higher confidence
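
A toy version of that frequency-plus-confidence idea, purely to show the mechanics (the real pattern_learner.py is more involved):

from collections import Counter

def framework_preference(memories, candidates=("react", "vue", "angular")):
    """Count mentions of each candidate framework and report the winner with a confidence score."""
    counts = Counter()
    for text in memories:
        lowered = text.lower()
        for name in candidates:
            counts[name] += lowered.count(name)
    total = sum(counts.values())
    if total == 0:
        return None
    name, hits = counts.most_common(1)[0]
    return {"framework": name, "confidence": round(hits / total, 2)}

# framework_preference(["We use React hooks", "React + Zustand", "Tried Vue once"])
# → {'framework': 'react', 'confidence': 0.67}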

Why this matters:

Your AI assistant can now match your preferences automatically:

You: "Build me an API endpoint"
Claude: *reads your identity patterns*
Claude: "Here's a FastAPI endpoint with JWT auth (I know you prefer FastAPI and JWT from your patterns)..."

No more "Actually, I use FastAPI, not Flask" corrections.


How It All Works Together

When you recall a memory, all 4 layers activate:

query = "authentication patterns"

# Layer 1: Fast keyword search (FTS5)
keyword_results = fts5_search(query)  # 30ms

# Layer 1b: Semantic search (TF-IDF vectors)
semantic_results = tfidf_search(query)  # 45ms

# Layer 3: Graph enhancement
graph_results = related_memories(semantic_results)  # 60ms

# Layer 4: Pattern context
patterns = get_identity_patterns()

# Combine results
final_results = merge([
    keyword_results,
    semantic_results,
    graph_results
]) + patterns

# Total time: ~80ms
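The merge step can be as simple as score-weighted deduplication. One way it could look (an assumption on my part, not the shipped implementation):

def merge(result_lists, weights=(1.0, 0.8, 0.6)):
    """Combine ranked result lists, de-duplicating by memory id via weighted reciprocal-rank scores."""
    scores = {}
    by_id = {}
    for result_list, weight in zip(result_lists, weights):
        for rank, memory in enumerate(result_list):
            # Earlier rank = larger contribution
            scores[memory["id"]] = scores.get(memory["id"], 0.0) + weight / (rank + 1)
            by_id[memory["id"]] = memory
    ranked_ids = sorted(scores, key=scores.get, reverse=True)
    return [by_id[mid] for mid in ranked_ids]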

You get:

  1. Exact matches (Layer 1 keyword)
  2. Conceptually similar memories (Layer 1 semantic)
  3. Related memories from the graph (Layer 3)
  4. Your coding preferences (Layer 4)
  5. Hierarchical context (Layer 2 breadcrumbs)

All in 80 milliseconds. Locally.


Universal Integration: It Just Works Everywhere

Here's the problem with most AI memory tools: they only work with one or two apps.

SuperLocalMemory V2 uses three integration methods so it works everywhere:

Method 1: MCP (Model Context Protocol)

For modern IDEs like Cursor, Windsurf, and Claude Desktop:

// Auto-configured by install.sh
{
  "mcpServers": {
    "SuperLocalMemory": {
      "command": "python3",
      "args": ["~/.claude-memory/mcp_server.py"]
    }
  }
}

In Cursor:

You: "@SuperLocalMemory remember that we use FastAPI with async endpoints"
You: "Build me an API endpoint"
Cursor AI: *automatically retrieves your FastAPI patterns and preferences*

No manual commands. The AI just knows.

Method 2: Skills (Slash Commands)

For Claude Code, Continue.dev, and Cody:

/slm-remember "React hooks for state management" --tags frontend
/slm-recall "state management"
/slm-status

Six universal skills that work across multiple AI assistants.

Method 3: CLI (Universal)

For any terminal, any script, any tool:

# Simple, clean syntax
slm remember "Deploy with Docker Compose"
slm recall "deployment"
slm status

# Use in scripts
#!/bin/bash
slm remember "Build started at $(date)"
npm run build
if [ $? -eq 0 ]; then
  slm remember "Build succeeded at $(date)"
fi

The key insight: All three methods write to the same local SQLite database.

No data duplication. No conflicts. One source of truth.


Real-World Usage: Before vs After

Before SuperLocalMemory

Monday morning:

You: "Claude, implement OAuth login"
Claude: "What framework are you using?"
You: "FastAPI. With JWT. PostgreSQL. We went over this last week."

Wednesday:

You: "Why is auth broken?"
Claude: "Let me analyze..."
You: "We fixed this bug on Monday! JWT expiration!"
Claude: "I don't have access to previous conversations"
You: *searches through 15 chat logs manually*

Friday:

You: "Build a new API endpoint"
Claude: "Here's a Flask example"
You: "WE USE FASTAPI!" (3rd time this week)

Time wasted: ~3 hours/week re-explaining context.

After SuperLocalMemory

Monday morning:

slm remember "Implemented OAuth with FastAPI + JWT, tokens expire in 24h, refresh tokens in DB"

Wednesday:

You: "Why is auth broken?"
You: "/slm-recall auth bug"

✓ Found: "JWT tokens expiring too fast - increased to 24h"
✓ Cluster: "Authentication & Tokens"
✓ Related: OAuth implementation, token refresh flow

You: "Check if token expiration is 24h"
Claude: *already has context from memory*

Friday:

You: "Build a new API endpoint"
Claude: *reads your patterns: FastAPI (81% confidence), JWT auth (73% confidence)*
Claude: "Here's a FastAPI endpoint with JWT authentication..."
You: ✓ "Perfect."

Time saved: ~2.5 hours/week. ROI: Install time (5 min) paid back in first week.


Technical Challenges (And How I Solved Them)

Building this wasn't trivial. Here are the hard problems:

Challenge 1: Backward Compatibility

Problem: Users upgrading from v2.0.0 to v2.1.0 shouldn't lose data or experience breaking changes.

Solution: Database migrations with ALTER TABLE checks:

# Add new v2.1.0 columns to existing tables
v2_columns = [
    ('cluster_id', 'INTEGER'),
    ('entity_vector', 'TEXT'),
    ('importance', 'INTEGER DEFAULT 5'),
]

for col_name, col_type in v2_columns:
    try:
        cursor.execute(f'ALTER TABLE memories ADD COLUMN {col_name} {col_type}')
    except sqlite3.OperationalError:
        pass  # Column already exists (v2.1.0 database)

Result: 100% backward compatible. Zero breaking changes. Users just run ./install.sh.

Challenge 2: Graph Clustering Performance

Problem: Leiden clustering is O(n²) worst case. With 1,000+ memories, it takes 60+ seconds.

Solution: Progressive profiling + clear documentation:

# For >1000 memories, recommend profile splitting
if memory_count > 1000:
    print("TIP: Consider splitting into profiles for better performance")
    print("Example: slm switch-profile work")

Alternative solution (v2.2.0): Incremental graph updates (still in development).

Challenge 3: Zero External Dependencies

Problem: Most semantic search systems require sentence-transformers (downloads 500MB model). I wanted zero mandatory dependencies.

Solution: Pure Python TF-IDF fallback:

# Try advanced method first
try:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(texts)
except ImportError:
    # Fall back to pure Python TF-IDF
    embeddings = compute_tfidf_vectors(texts)

Result: System works out-of-the-box with zero pip installs. Optional dependencies improve performance but aren't required.

Challenge 4: Cross-Platform Support

Problem: Users on Mac, Linux, and Windows expect it to "just work."

Solution:

  • install.sh for Mac/Linux (bash)
  • install.ps1 for Windows (PowerShell)
  • Path detection for 11+ IDEs
  • Shell detection (bash vs zsh)

# Auto-detect shell and configure PATH
if [ -f ~/.bashrc ]; then
    echo 'export PATH="$HOME/.claude-memory/bin:$PATH"' >> ~/.bashrc
elif [ -f ~/.zshrc ]; then
    echo 'export PATH="$HOME/.claude-memory/bin:$PATH"' >> ~/.zshrc
fi

Result: One command, any platform.

Challenge 5: MCP Server Integration

Problem: Different IDEs implement MCP differently (Cursor vs Windsurf vs Claude Desktop).

Solution: Universal MCP server with auto-configuration:

# mcp_server.py implements MCP spec
# 6 tools, 4 resources, 2 prompts
@server.tool()
async def remember(content: str, tags: str = "", project: str = ""):
    """Save memory with context"""
    store.add(content, tags=tags, project=project)
    return {"status": "success"}

@server.resource("memory://recent")
async def list_recent():
    """List recent memories"""
    return store.list_all(limit=10)

Installation detects IDEs and configures automatically:

./install.sh

✓ Detected: Claude Desktop
✓ Configured: ~/.config/claude/claude_desktop_config.json
✓ Detected: Cursor
✓ Configured: ~/.cursor/mcp.json
✓ Detected: Windsurf
✓ Configured: ~/.windsurf/mcp.json

Result: Zero manual configuration. It just works.


What's Next: Roadmap

Current version: v2.1.0-universal

Planned features (v2.2.0):

  • Incremental graph updates (no full rebuild)
  • Auto-compression based on access patterns
  • Web UI for graph visualization
  • Real-time pattern updates
  • Multi-language entity extraction

Long-term (v3.0.0):

  • npm distribution: npm install -g superlocalmemory
  • Same features as V2, easier installation
  • Windows installer (.exe)

See full roadmap: https://github.com/varun369/SuperLocalMemoryV2/wiki/Roadmap


Try It Yourself

Installation (5 minutes)

# Clone the repo
git clone https://github.com/varun369/SuperLocalMemoryV2.git
cd SuperLocalMemoryV2

# Run installer (Mac/Linux)
./install.sh

# Or Windows (PowerShell)
.\install.ps1

First Memory

# Save your first memory
slm remember "I prefer React with TypeScript for frontend projects" --tags preferences,frontend

# Build the knowledge graph
slm build-graph

# Check system status
slm status

# Search for it
slm recall "react"

Usage in Claude Code

/slm-remember "FastAPI for APIs, PostgreSQL for database" --tags stack
/slm-recall "database"
/slm-status

Usage in Cursor (MCP)

You: "Remember that we use Docker Compose for deployment"
Cursor AI: *automatically saves to SuperLocalMemory*

You: "How do we deploy this?"
Cursor AI: *retrieves from SuperLocalMemory* "You deploy using Docker Compose..."

Performance Benchmarks

Tested on MacBook Pro M1, 16GB RAM:

| Operation | Time | Dataset Size |
| --- | --- | --- |
| Add memory | <10ms | N/A |
| Search (hybrid) | 80ms | 500 memories |
| Graph build | 2s | 100 memories |
| Graph build | 15s | 500 memories |
| Pattern learning | <2s | 100 memories |

Storage efficiency:

  • Tier 1 (active): Full content
  • Tier 2 (warm, 30-90 days): 60% compression
  • Tier 3 (cold, 90+ days): 96% compression

Example: 1,000 memories = ~15MB (vs 380MB uncompressed).
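
How might the tiering work mechanically? A hedged sketch with age-based zlib compression (thresholds taken from the list above; the tier/content_blob columns and Unix timestamps are assumptions, not the actual schema):

import time
import zlib

WARM_AFTER = 30 * 86400   # 30 days
COLD_AFTER = 90 * 86400   # 90 days

def compress_old_memories(conn):
    """Assign warm/cold tiers by age and store zlib-compressed content (illustrative only)."""
    now = time.time()
    rows = conn.execute(
        "SELECT id, content, created_at FROM memories WHERE tier = 1"
    ).fetchall()
    for mem_id, content, created_at in rows:
        age = now - created_at
        if age < WARM_AFTER:
            continue
        tier = 3 if age > COLD_AFTER else 2
        blob = zlib.compress(content.encode("utf-8"), 9 if tier == 3 else 6)
        conn.execute(
            "UPDATE memories SET content_blob = ?, tier = ? WHERE id = ?",
            (blob, tier, mem_id),
        )
    conn.commit()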


Comparison with Alternatives

vs Mem0

| Feature | Mem0 | SuperLocalMemory V2 |
| --- | --- | --- |
| Hosting | Cloud (privacy risk) | 100% local |
| Price | Usage-based (~$50/mo) | $0 forever |
| Setup | API keys, cloud account | 5-min install |
| IDE support | Limited | 11+ IDEs |
| Pattern learning | | ✅ Full |
| Knowledge graphs | ✅ Cloud-based | ✅ Local |
| Data ownership | Vendor | You |

vs Zep

| Feature | Zep | SuperLocalMemory V2 |
| --- | --- | --- |
| Hosting | Cloud-only | 100% local |
| Price | $50/mo | $0 forever |
| Credit system | Yes (limits) | Unlimited |
| Universal CLI | | |
| Multi-profile | | |
| Open source | Partial | MIT License |

vs Personal.AI

| Feature | Personal.AI | SuperLocalMemory V2 |
| --- | --- | --- |
| Free tier | ❌ None | ✅ Unlimited |
| Price | $33/mo | $0 forever |
| Local-first | | |
| IDE integration | | 11+ IDEs |
| Developer-focused | | |

Conclusion: SuperLocalMemory V2 is the only solution that's:

  • 100% local (privacy-first)
  • 100% free (no limits)
  • Universal (works everywhere)

Common Questions

"Why not just use text files?"

Text files don't:

  • Auto-discover relationships
  • Learn your patterns
  • Provide instant search
  • Integrate with AI assistants
  • Scale beyond 100 notes

"Why not use Notion/Obsidian?"

Notion and Obsidian are great for humans reading notes. SuperLocalMemory is built for AI assistants retrieving context:

  • APIs for programmatic access
  • TF-IDF semantic search
  • Knowledge graph integration
  • MCP protocol support
  • Pattern learning

Different tools, different purposes.

"Is this secure?"

Yes:

  • 100% local (data never leaves your machine)
  • No telemetry, no tracking, no external API calls
  • Standard filesystem permissions
  • SQLite ACID transactions
  • Open-source (audit the code yourself)

GDPR/HIPAA compliant by default (data is yours).

"Does it work with ChatGPT?"

Yes! ChatGPT Desktop supports MCP. See setup guide: https://github.com/varun369/SuperLocalMemoryV2/blob/main/docs/MCP-MANUAL-SETUP.md

Also works with: Claude, Cursor, Windsurf, Continue.dev, Cody, Aider, Perplexity, Zed, OpenCode, Antigravity.

"How is this different from RAG?"

SuperLocalMemory uses RAG (Retrieval-Augmented Generation) but adds:

  • Knowledge graphs (relationships)
  • Pattern learning (identity)
  • Hierarchical indexing (context)
  • Multi-method search (semantic + keyword + graph)

RAG is one layer. SuperLocalMemory is the full stack.


Call-to-Action

If you're tired of re-explaining your project to AI assistants every single day...

If you've spent hours managing context files that never stay updated...

If you want an AI assistant that actually remembers you...

Try SuperLocalMemory V2:

100% local. 100% free. 100% yours.

