Three weeks ago, I wrote about knowledge collapse - how our best technical insights are dying in private AI chats while Stack Overflow bleeds 78% of its traffic.
Hundreds of developers agreed we need a solution.
So I built one. And it's running in production right now.
Here's the live demo: https://chat-knowledge-api.fpl-test.workers.dev
Here's the source code: https://github.com/dannwaneri/chat-knowledge
Here's the complete build session you're reading from: 107 chunks, 174 messages, imported via HTML
Let me show you how it works.
The Problem (Quick Recap)
Knowledge collapse is happening right now:
- Your best debugging solutions live in private Claude chats
- No attribution, no discovery, no commons
- Stack Overflow traffic down 78% since ChatGPT launched
- We're optimizing ourselves into a knowledge dead-end
We need "Stack Overflow for AI conversations" - but decentralized, privacy-first, and developer-owned.
What I Actually Built
1. HTML Import System
The workflow is dead simple:
- Have a valuable Claude conversation
- Press Ctrl+S (save as HTML)
- Import it: node dist/cli/import-html.js chat.html
- Done - it's searchable
Tested on this article's build session:
- File size: 4.6MB
- Messages: 136 parsed
- Chunks created: 91
- Time: < 2 seconds
No complex setup. No API keys. Just save and import.
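To make the "messages in, chunks out" step concrete, here is a minimal sketch of one way parsed messages could be packed into chunks. It is not the repo's actual chunker: the `ChatMessage` and `ChatChunk` shapes, the `chunkMessages` name, and the 2000-character budget are all assumptions for illustration.

```typescript
// Hypothetical message shape; the real parser's types may differ.
interface ChatMessage {
  role: "human" | "assistant";
  text: string;
}

// Hypothetical chunk shape used only in this sketch.
interface ChatChunk {
  index: number;
  text: string;
  messageCount: number;
}

// Pack consecutive messages into chunks of roughly `maxChars` characters.
// The budget counts message characters only (not the blank-line separators),
// and a single message longer than the budget becomes its own oversized chunk.
function chunkMessages(messages: ChatMessage[], maxChars = 2000): ChatChunk[] {
  const chunks: ChatChunk[] = [];
  let buffer: string[] = [];
  let size = 0;

  // Emit the buffered messages as one chunk and reset the buffer.
  const flush = (): void => {
    if (buffer.length === 0) return;
    chunks.push({
      index: chunks.length,
      text: buffer.join("\n\n"),
      messageCount: buffer.length,
    });
    buffer = [];
    size = 0;
  };

  for (const message of messages) {
    const line = `${message.role}: ${message.text}`;
    // Start a new chunk when adding this message would exceed the budget.
    if (size + line.length > maxChars) flush();
    buffer.push(line);
    size += line.length;
  }
  flush();
  return chunks;
}
```

Grouping whole messages keeps each chunk readable on its own, which matters when a search result is shown without the rest of the conversation.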
2. Security Scanner (The Critical Feature)
Before any chat goes public, it runs through auto-detection:
🔴 CRITICAL (auto-block):
- API keys, Bearer tokens
- Private URLs (localhost, .internal domains)
- Credentials, passwords
Real results from my build session scan:
Total issues detected: 599
Critical blocks: 3 (2 Bearer tokens, 1 API key)
High severity: Multiple localhost URLs
Safe to share: FALSE ✅ (exactly as designed)
This is the difference between "share everything" and "share safely." One leaked API key costs more than this entire system.
By The Numbers
This Build Session:
- Start: Problem identified (Jan 15)
- Build: 107 conversation chunks
- Messages: 174 total
- File size: 4.6MB HTML
- Parse time: < 2 seconds
- Security scan: 599 issues detected (3 critical auto-blocks)
- Ship date: Feb 8 (24 days from problem to production)
That's faster than most companies decide what to build.
3. Semantic Search
Not keyword matching - actual understanding.
Query: "how to handle vectorize embeddings"
Found: Content about "dimension reduction" and "optimization"
Relevance score: 0.78
The system understood WHAT I meant, not just what I typed.
Tech stack:
- Workers AI (@cf/baai/bge-base-en-v1.5) - generates embeddings
- Vectorize - stores 768-dimension vectors
- D1 - metadata and chat structure
- Cosine similarity search across all imported conversations
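For readers who want to see what a cosine similarity score like 0.78 actually measures, here is a small self-contained sketch. In the deployed system Vectorize does this ranking; the brute-force `rankBySimilarity` helper and the `StoredVector` shape below are hypothetical and exist only to illustrate the math.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1;
// values near 1 mean the two embeddings point in nearly the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as "not similar".
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical stored item: a chunk id plus its embedding.
interface StoredVector {
  id: string;
  vector: number[];
}

// Brute-force ranking for illustration; a vector index avoids scanning every item.
function rankBySimilarity(
  query: number[],
  candidates: StoredVector[],
  topK = 5
): { id: string; score: number }[] {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Because the score depends only on direction, not magnitude, a long chunk and a short query can still score highly if they are about the same thing.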
4. Federation Protocol (ActivityPub)
This isn't just personal knowledge management. It's designed to federate.
Live endpoints:
- NodeInfo: /api/federation/nodeinfo (200 OK ✅)
- WebFinger: /api/federation/.well-known/webfinger
- Inbox/Outbox: ActivityPub standard
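As a rough idea of what a WebFinger endpoint hands back, here is a sketch that builds the JSON document (a JRD, per RFC 7033) for an `acct:` lookup. The `buildWebFingerResponse` name and the `/api/federation/actors/` path are assumptions, not taken from the repo; only the `subject` and `links` fields and the `application/activity+json` media type come from the WebFinger and ActivityPub conventions.

```typescript
interface WebFingerLink {
  rel: string;
  type?: string;
  href?: string;
}

interface WebFingerResponse {
  subject: string;
  links: WebFingerLink[];
}

// Build a WebFinger document for a resource such as "acct:alice@example.com".
// Returns null when the resource is malformed or belongs to another host,
// which a real endpoint would typically turn into a 400 or 404 response.
function buildWebFingerResponse(resource: string, host: string): WebFingerResponse | null {
  const match = /^acct:([^@\s]+)@([^@\s]+)$/.exec(resource);
  if (!match) return null;

  const user = match[1];
  const domain = match[2];
  if (domain.toLowerCase() !== host.toLowerCase()) return null;

  // Hypothetical actor URL layout; the real instance may use a different path.
  const actorUrl = `https://${host}/api/federation/actors/${encodeURIComponent(user)}`;

  return {
    subject: resource,
    links: [{ rel: "self", type: "application/activity+json", href: actorUrl }],
  };
}
```

The `self` link is what lets another server go from a handle like `alice@example.com` to the actor document it needs for federation.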
What federation means:
- You run your instance
- I run my instance
- We search across ALL of them
- No single point of control
- No corporate overlord
Exactly like Mastodon, but for developer knowledge.
Technical Architecture (How It Actually Works)
The Stack
Frontend: HTML → Parser → Chunks
Backend: Cloudflare Workers (edge-native)
Database: D1 (SQLite at the edge)
Vector Store: Vectorize (768-dim embeddings)
AI: Workers AI (BGE-base-en-v1.5)
Protocol: ActivityPub (federation standard)
The Flow
1. IMPORT
HTML file → Parse messages → Chunk content → Generate embeddings
2. SECURITY
Scan for secrets → Flag risks → Require review → Safe by default
3. STORAGE
Chunks → D1 (metadata)
Embeddings → Vectorize (semantic search)
4. SEARCH
Query → Generate embedding → Cosine similarity → Ranked results
5. FEDERATION (coming)
Public chats → ActivityPub → Federated timeline → Cross-instance search
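Step 4 of the flow can be sketched in a few lines. The two interfaces below are narrowed stand-ins modeled on the Workers AI `run` call and the Vectorize `query` call, trimmed to the fields this sketch needs; `semanticSearch` itself is a hypothetical helper, not the repo's code.

```typescript
// Narrowed stand-in for the Workers AI binding (assumption: only `data` is used here).
interface EmbeddingRunner {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}

interface VectorMatch {
  id: string;
  score: number;
}

// Narrowed stand-in for a Vectorize index binding.
interface VectorIndexLike {
  query(vector: number[], options: { topK: number }): Promise<{ matches: VectorMatch[] }>;
}

const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

// Query -> embedding -> nearest vectors -> ranked results.
async function semanticSearch(
  ai: EmbeddingRunner,
  index: VectorIndexLike,
  query: string,
  maxResults = 5
): Promise<VectorMatch[]> {
  const embedding = await ai.run(EMBEDDING_MODEL, { text: [query] });
  const vector = embedding.data[0];
  if (!vector) throw new Error("Embedding model returned no vector");

  const result = await index.query(vector, { topK: maxResults });
  // Defensive sort so callers always get highest score first.
  return result.matches.slice().sort((a, b) => b.score - a.score);
}
```

Passing the bindings in as parameters keeps the function easy to test with fakes, without needing a deployed Worker.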
What Was Hard
1. HTML Parsing
Claude's HTML export format isn't documented. Had to reverse-engineer:
- Message boundaries
- Code block preservation
- Artifact handling
- Nested content structure
2. Security Scanner
Can't just regex for "API key" - need to understand context:
- Is this a code example or real credential?
- Is localhost URL in docs or actual endpoint?
- Balance: too strict = false positives, too loose = leaks
3. Federation Protocol
ActivityPub is designed for social posts, not Q&A:
- How to represent "question" vs "answer"?
- Vote federation across instances?
- Spam prevention without centralized moderation?
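One possible answer to the question-versus-answer problem, sketched here as an assumption rather than the project's chosen design: model both as ActivityStreams `Note` objects and link the answer to the question with `inReplyTo`. ActivityStreams does define a `Question` type, but Mastodon uses it for polls, so plain notes may federate more predictably. The helper names and the `/activity` id suffix are hypothetical.

```typescript
const AS_CONTEXT = "https://www.w3.org/ns/activitystreams";
const AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public";

interface QaNote {
  "@context": string;
  id: string;
  type: "Note";
  attributedTo: string;
  content: string;
  to: string[];
  inReplyTo?: string; // present only on answers
}

interface CreateActivity {
  "@context": string;
  id: string;
  type: "Create";
  actor: string;
  to: string[];
  object: QaNote;
}

// A question is a public note with no parent.
function questionNote(id: string, actor: string, content: string): QaNote {
  return { "@context": AS_CONTEXT, id, type: "Note", attributedTo: actor, content, to: [AS_PUBLIC] };
}

// An answer is a public note that replies to the question's id.
function answerNote(id: string, actor: string, content: string, questionId: string): QaNote {
  return { ...questionNote(id, actor, content), inReplyTo: questionId };
}

// Wrap a note in the Create activity that would be delivered to other instances.
function wrapInCreate(note: QaNote): CreateActivity {
  return {
    "@context": AS_CONTEXT,
    id: `${note.id}/activity`, // hypothetical id scheme
    type: "Create",
    actor: note.attributedTo,
    to: note.to,
    object: note,
  };
}
```

This leaves vote federation and spam prevention unanswered, but it means existing Fediverse software would at least render an answer as a reply thread.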
4. Edge-Native Architecture
Cloudflare Workers have constraints:
- 10ms CPU limit per request (on the free plan)
- No filesystem
- Async-only database access
Working within constraints = better architecture.
The Security Scanner in Action
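// Real security detection from the codebase
const patterns = {
  // "Bearer <token>" as it appears in Authorization headers
  bearerToken: /Bearer\s+[A-Za-z0-9\-._~+/]+=*/gi,
  // api_key / api-key / apikey assignments with a value of 20+ characters
  apiKey: /['\"]?api[_-]?key['\"]?\s*[:=]\s*['\"]?[A-Za-z0-9-_]{20,}['\"]?/gi,
  // Loopback URLs
  localhost: /https?:\/\/(localhost|127\.0\.0\.1|::1)/gi,
  // Private-looking hostname suffixes; .dev is also a public TLD, so expect false positives
  internalDomain: /https?:\/\/[a-z0-9.-]+\.(local|internal|corp|dev)/gi
}
// Scan returns: { safe: boolean, issues: Issue[] }
// Auto-blocks if critical issues found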
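To show how patterns like these could turn into the `{ safe, issues }` result, here is a minimal sketch. It is an assumption about the shape of the scanner, not the repo's implementation: `scanText`, `ScanIssue`, and the severity mapping (tokens and keys critical, localhost URLs high) are illustrative, with the mapping loosely following the scan results reported earlier in this post.

```typescript
type Severity = "critical" | "high";

interface ScanIssue {
  kind: string;
  severity: Severity;
  match: string;
  index: number;
}

interface ScanResult {
  safe: boolean;
  issues: ScanIssue[];
}

// Assumed severity per rule; the real scanner may classify differently.
const scanRules: { kind: string; severity: Severity; pattern: RegExp }[] = [
  { kind: "bearerToken", severity: "critical", pattern: /Bearer\s+[A-Za-z0-9\-._~+\/]+=*/gi },
  {
    kind: "apiKey",
    severity: "critical",
    pattern: /['"]?api[_-]?key['"]?\s*[:=]\s*['"]?[A-Za-z0-9_-]{20,}['"]?/gi,
  },
  { kind: "localhost", severity: "high", pattern: /https?:\/\/(localhost|127\.0\.0\.1|::1)/gi },
];

// Collect every match, then block sharing if any critical issue was found.
function scanText(text: string): ScanResult {
  const issues: ScanIssue[] = [];
  for (const rule of scanRules) {
    rule.pattern.lastIndex = 0; // global regexes keep state between calls
    let m: RegExpExecArray | null;
    while ((m = rule.pattern.exec(text)) !== null) {
      issues.push({ kind: rule.kind, severity: rule.severity, match: m[0], index: m.index });
    }
  }
  const safe = !issues.some((issue) => issue.severity === "critical");
  return { safe, issues };
}
```

Note that a plain sentence like "Bearer tokens are risky" would also match the first rule, which is exactly the false-positive tradeoff described under "What Was Hard".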
Database Schema (12 tables, 11 listed below)
chats -- Core chat storage
chunks -- Content chunks for search
pre_share_scans -- Security scanner results
chunk_redactions -- Auto-redaction tracking
share_approvals -- Sharing workflow
federated_instances -- Federation network
federated_knowledge -- Cross-instance content
federation_activities -- ActivityPub events
knowledge_analytics -- Usage tracking
collections -- Knowledge curation
collection_items -- Collection membership
This is production-grade infrastructure, not a proof of concept.
Why I Built This Now
Timing matters.
Stack Overflow traffic is down 78% and still falling. Every day, thousands of valuable debugging sessions happen in private AI chats and disappear forever.
We're not just losing knowledge - we're losing the HABIT of knowledge sharing.
Building this now means:
- Early adopters shape the protocol
- Federation standards emerge organically
- We avoid corporate capture (no VC, no "pivot to paid")
- Developers own the infrastructure from day one
The best time to rebuild the knowledge commons was before Stack Overflow collapsed.
The second-best time is now.
Why This Matters
It Solves Knowledge Collapse
- ✅ Insights stay discoverable - Semantic search finds relevant content
- ✅ Attribution preserved - Source tracking built-in
- ✅ Privacy respected - Security scanner catches leaks
- ✅ No platform risk - Self-hosted, you control your data
It's Actually Decentralized
- ActivityPub = proven federation protocol (powers Mastodon's 10M+ users)
- Developer-owned instances - Run your own, connect with others
- No "rug pull" risk - Open source, MIT licensed
It's Viable at Edge Scale
Cloudflare Workers handles massive scale:
- Edge-native architecture
- D1 database at the edge
- Vectorize for semantic search
- Workers AI for embeddings
The same tech stack I use for production apps serving thousands of users.
From Discussion to Infrastructure
Three weeks ago, Richard Pascoe asked in the comments: "Could Mastodon servers like Fosstodon help foster a knowledge sharing platform?"
I said yes and built it.
This isn't theoretical infrastructure. It's ActivityPub-compatible, meaning it is designed to federate with Mastodon, Fosstodon, and the rest of the Fediverse.
Richard's question became the bridge between diagnosis and solution.
@richardpascoe - your instance is ready when you are. 🚀
Real Use Cases (What This Enables)
For Individual Developers
- Portfolio of problem-solving - Your best debugging sessions, searchable
- Learning in public - Share solutions, get feedback, build reputation
- Future reference - "I solved this before, where was that chat?"
For Teams
- Institutional knowledge - Team's collective debugging history
- Onboarding - New devs search team's past solutions
- Pattern recognition - See recurring problems across conversations
For Communities
- Niche expertise - Rust specialists, Cloudflare devs, etc. share domain knowledge
- Federated discovery - Find experts across instances
- Attribution - Credit flows to who actually solved it
For The Commons
- Stack Overflow alternative - But decentralized and community-owned
- AI training data - High-quality, attributed conversations
- Knowledge archaeology - Insights don't die with platforms
What's Next
Immediate (This Week)
- ✅ Open source on GitHub (MIT license)
- ✅ Documentation for self-hosting
- ✅ Production deployment live
Short-term (Next Month)
- Import 50+ historical Claude chats (build the corpus)
- MCP extension ("share this chat publicly" from Claude Code)
- First federation test with another developer
Long-term (3-6 Months)
- 10+ federated instances
- Collections feature (curate knowledge by topic)
- Analytics (which insights are most valuable)
- Cross-instance search
This Is The Foundation in Action
Two weeks ago, we created @the-foundation to preserve developer knowledge publicly.
Richard Pascoe published our first collaborative post on fundamentals.
This is our second: working infrastructure.
The Foundation isn't just writing about the problem. We're shipping solutions.
Join The Foundation
This isn't a solo project. It's infrastructure.
For Developers
- Clone the repo: https://github.com/dannwaneri/chat-knowledge
- Run your own instance - Full setup guide in README
- Contribute to the protocol - Issues and PRs welcome
For Writers
- Import your best AI conversations - Build your knowledge portfolio
- Share safely - Security scanner protects you
- Get discovered - Federated search makes your insights findable
For The Curious
- Star the repo ⭐ - Show you care about preserving knowledge
- Share this article - Help spread the word
- Join the discussion - Comment below with your thoughts
The knowledge commons doesn't rebuild itself. But we can build it together.
Installation (5 Minutes)
# Clone the repo
git clone https://github.com/dannwaneri/chat-knowledge.git
cd chat-knowledge
# Install dependencies
npm install
# Login to Cloudflare (free tier works)
wrangler login
# Create infrastructure
wrangler d1 create chat-knowledge-db
wrangler vectorize create chat-knowledge-embeddings --dimensions=768 --metric=cosine
# Run migrations
wrangler d1 execute chat-knowledge-db --remote --file=migrations/migration-federation.sql
wrangler d1 execute chat-knowledge-db --remote --file=migrations/migration-sanitizer.sql
# Deploy
npm run deploy
That's it. You now have your own federated knowledge instance.
Try It Right Now
Import a chat:
# Save any Claude conversation as HTML (Ctrl+S)
npm run build
node dist/cli/import-html.js path/to/chat.html "My First Import"
Search it:
curl -X POST https://your-worker.workers.dev/search \
-H "Content-Type: application/json" \
-d '{"query": "debugging tips", "maxResults": 5}'
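The same request from TypeScript, as a small sketch. The request body mirrors the curl call above; the response shape is not documented in this post, so it is returned as `unknown`. The `searchKnowledge` helper and the injected `SearchFetch` type are assumptions made so the function can run anywhere a fetch-compatible function is available.

```typescript
interface SearchHttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Minimal fetch-compatible signature; the built-in fetch satisfies it.
type SearchFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<SearchHttpResponse>;

// POST { query, maxResults } to <baseUrl>/search and return the parsed JSON.
async function searchKnowledge(
  fetchImpl: SearchFetch,
  baseUrl: string,
  query: string,
  maxResults = 5
): Promise<unknown> {
  const url = `${baseUrl.replace(/\/+$/, "")}/search`; // tolerate a trailing slash
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, maxResults }),
  });
  if (!response.ok) {
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (in a runtime with a global fetch):
// searchKnowledge(fetch, "https://your-worker.workers.dev", "debugging tips").then(console.log);
```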
Scan for secrets:
node dist/cli/safe-share.js <chat-id>
# Shows what would leak before you share
The Meta Moment
I wrote about the problem three weeks ago.
Now the solution is running in production.
From observation to shipped product in 21 days.
That's the power of:
- Cloudflare Workers (deploy in seconds)
- AI embeddings (semantic search out of the box)
- ActivityPub (proven federation protocol)
- Building in public (accountability + feedback)
Your move, Stack Overflow. 👊
Related Articles
- My Chrome Tabs Tell a Story - The observation that started it all
- We're Creating a Knowledge Collapse - The problem statement (12K+ views)
- Above the API: What Developers Contribute When AI Can Code - What skills actually matter
- You're here - The solution
Let's Build This Together
GitHub: https://github.com/dannwaneri/chat-knowledge
Live Demo: https://chat-knowledge-api.fpl-test.workers.dev
Twitter: @dannwaneri
If you believe in preserving developer knowledge, star the repo ⭐ and let's make this real.
The foundation is laid. Now we need builders.
Are you in?
Top comments (26)
Brilliant! As @richardpascoe said: Quite simply mind blown! This is the kind of article I love to read and share: a problem and its solution.
appreciate that pascal.
"problem and its solution" is exactly the structure i was going for. too many articles identify problems without shipping anything.
if you share it, curious what audience youre thinking - linkedin? twitter? developer communities?
always learning what resonates
You're welcome Daniel! I share it with some developers I use to work with, for example… I've also taken it into my own wallabag list, to have it at hand when needed.
“Absolutely! Articles like this are gold—clear problem, actionable solution, and real insights. Definitely worth sharing with anyone who wants to learn something meaningful.”
Done & dusted in 3 weeks - that's quicker than it takes most companies to come up with a rough outline of a plan, haha - of course you did benefit from your prior experience with the techniques that you mentioned (as per your previous dev.to articles), but still - epic!
Eager to start looking at it and playing with it .......
😂 exactly most companies would still be in planning meetings.
advantage of building in public. no committees, no bureaucracy, just ship and iterate.
"eager to start looking at it". let me know if you hit any setup issues. trying to get first 5-10 instances deployed this week to test federation.
would love your feedback on the protocol
Yeah having the vision and the skills (and knowing what you want and need) beats any corporate design/planning committee, lol ... brilliant work, excited to check it out!
This is honestly one of the more concrete responses I’ve seen to the “knowledge collapse” problem. Most discussions stop at diagnosing it, not shipping infrastructure.
The security scanner is the part that really stands out to me. That’s the missing piece in almost every “share your AI chats” idea: people want to share insights, but one leaked token or internal URL is enough to shut the whole thing down. Treating safety as a first-class concern instead of an afterthought makes this feel production-minded rather than experimental.
I also like that you’re leaning on ActivityPub instead of inventing a new federation protocol. Reusing something battle-tested (even if it’s awkward for Q&A semantics) feels like the right tradeoff if the goal is adoption rather than purity.
One question I’m curious about as this scales:
how do you see moderation and trust evolving across federated instances, especially once cross-instance search is live? Is it more “local rules, local reputation,” or do you imagine shared signals emerging over time?
Either way, shipping this end-to-end in ~3 weeks (parsing, embeddings, security, federation) is impressive. This feels like real infrastructure, not a demo. I’ll be digging into the repo.
this is THE question. youre right to ask it early.
my thinking. layered moderation
local instance rules: each sets its own policy (strict vs permissive)
shared blocklists: subscribe to curated spam/bad actor lists (activitypub pattern from mastodon).
web of trust: reputation doesnt federate centrally. instead, your instance tracks which OTHER instances you trust. transitive trust emerges (like pgp key signing).
the hard part: cross-instance search
whose rules apply when searching federated content?
current plan:
this means:
does that match your mental model or am i missing something?
"digging into the repo". would love feedback on protocol design, especially moderation hooks.
Really cool project! 🙌 I love how this tackles decentralized knowledge sharing and puts control back in users’ hands. Looking forward to seeing how the federated AI knowledge commons grows and gets adopted — especially for collaborative learning and research!
This is the logical next step in the conversation about Digital Sovereignty.
I recently wrote about the importance of 'Owning Your Keys' (CMKs) to secure data at rest against vendor lock-in. Seeing this concept applied to the knowledge layer via Federated AI is fascinating.
Centralization offers convenience, but decentralized 'Knowledge Commons' offer true ownership. Excellent work on this architecture.
appreciate you connecting this to digital sovereignty. that's exactly the conversation this needs.
centralization = convenience at the cost of ownership.
federation = ownership at the cost of bootstrapping.
but the cost equation is changing:
the next wave isn't "build vs buy" - it's "own vs rent"
your CMK work on data at rest + this on knowledge at rest = complete digital sovereignty stack
what does the infrastructure layer look like when enterprises want both?
appreciate that richard. your question about fosstodon was the spark.
"living breathing resource" is exactly right. this only works if its adopted by people who care about preserving knowledge publicly.
next step. lets get your instance running.
ive got the setup down to 5 minutes. if youre interested, we can test federation between our instances - prove the protocol works cross-server.
would love your feedback on the activitypub implementation. youve got way more fediverse experience than i do.
lets make this real.
The security-first approach and federation angle make this feel thoughtful, not rushed. Turning private AI chats into safe, searchable, shared knowledge is something a lot of us have felt missing.
Big respect for building this in public and getting it into production so quickly.
appreciate that. the security scanner was non-negotiable.
one leaked api key costs more than this entire system. had to be safe by default.
"something a lot of us have felt missing" - exactly. we all knew this was a problem, someone just had to build it.
i am curious. what would you want to see next? collections? mcp extension? better federation protocol?
this is going to be great. first federation test with someone who actually understands activitypub protocol.
ive documented the deployment but let me know if anything is unclear. we can do a live test once yours is running.
also if you find protocol issues, raise them early. better to fix before 10+ instances deploy.
excited to see fosstodon community connected to this
That sounds really interesting! Can you share a bit about the approach you used to tackle knowledge collapse and what kind of results you saw in those 21 days?
happy to break it down.
problem: stack overflow down 78%. best debugging sessions trapped in private AI chats.
approach:
results in 21 days:
key: make sharing safe + easy while respecting privacy
what interests you most about this?
Hi, I'm Justin, a full-stack engineer with a strong interest in backend systems, async programming, and AI.
I'm currently working a lot with APIs, distributed systems, and LLM-based projects.
Looking forward to learning from you all and contributing where I can.😉
Really interesting perspective. The idea of tackling knowledge collapse in such a short timeframe is impressive. Looking forward to diving deeper into this.