DEV Community

Daniel Nwaneri for The Foundation

I Shipped the Solution to Knowledge Collapse in 21 Days

Three weeks ago, I wrote about knowledge collapse - how our best technical insights are dying in private AI chats while Stack Overflow bleeds 78% of its traffic.

Hundreds of developers agreed we need a solution.

So I built one. And it's running in production right now.

Here's the live demo: https://chat-knowledge-api.fpl-test.workers.dev

Here's the source code: https://github.com/dannwaneri/chat-knowledge

Here's the complete build session you're reading from: 107 chunks, 174 messages, imported via HTML

Let me show you how it works.


The Problem (Quick Recap)

Knowledge collapse is happening right now:

  • Your best debugging solutions live in private Claude chats
  • No attribution, no discovery, no commons
  • Stack Overflow traffic down 78% since ChatGPT launched
  • We're optimizing ourselves into a knowledge dead-end

We need "Stack Overflow for AI conversations" - but decentralized, privacy-first, and developer-owned.


What I Actually Built

1. HTML Import System

The workflow is dead simple:

  1. Have a valuable Claude conversation
  2. Press Ctrl+S (save as HTML)
  3. Import it: node dist/cli/import-html.js chat.html
  4. Done - it's searchable

Tested on this article's build session:

  • File size: 4.6MB
  • Messages: 136 parsed
  • Chunks created: 91
  • Time: < 2 seconds

No complex setup. No API keys. Just save and import.
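To make the "chunks created" number concrete, here is a minimal sketch of what a chunking step could look like. It is not the repository's parser: the `Message` and `Chunk` shapes, the `chunkMessages` name, and the 1,000-character budget are all assumptions made for illustration. It groups consecutive messages into chunks without splitting a message.

```typescript
// Hypothetical shapes: the real parser's types are not shown in this article.
interface Message {
  role: "human" | "assistant";
  text: string;
}

interface Chunk {
  index: number;
  text: string;
  messageCount: number;
}

// Group consecutive messages into chunks of at most `maxChars` characters.
// A single message longer than `maxChars` becomes its own oversized chunk.
function chunkMessages(messages: Message[], maxChars: number = 1000): Chunk[] {
  const chunks: Chunk[] = [];
  let buffer: string[] = [];
  let size = 0; // length of buffer.join("\n\n")

  const flush = (): void => {
    if (buffer.length === 0) return;
    chunks.push({
      index: chunks.length,
      text: buffer.join("\n\n"),
      messageCount: buffer.length,
    });
    buffer = [];
    size = 0;
  };

  for (const message of messages) {
    const line = `${message.role}: ${message.text}`;
    // Start a new chunk when adding this message (plus separator) would exceed the budget.
    if (buffer.length > 0 && size + 2 + line.length > maxChars) {
      flush();
    }
    size += (buffer.length > 0 ? 2 : 0) + line.length;
    buffer.push(line);
  }
  flush();
  return chunks;
}
```

Keeping whole messages together is a deliberate choice in this sketch: a search hit then always points at a complete question or answer rather than half of one.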

2. Security Scanner (The Critical Feature)

Before any chat goes public, it runs through auto-detection:

🔴 CRITICAL (auto-block):

  • API keys, Bearer tokens
  • Private URLs (localhost, .internal domains)
  • Credentials, passwords

Real results from my build session scan:

Total issues detected: 599
Critical blocks: 3 (2 Bearer tokens, 1 API key)
High severity: Multiple localhost URLs
Safe to share: FALSE ✅ (exactly as designed)

This is the difference between "share everything" and "share safely." One leaked API key can cost more than this entire system.

By The Numbers

This Build Session:

  • Start: Problem identified (Jan 15)
  • Build: 107 conversation chunks
  • Messages: 174 total
  • File size: 4.6MB HTML
  • Parse time: < 2 seconds
  • Security scan: 599 issues detected (3 critical auto-blocks)
  • Ship date: Feb 8 (24 days from problem to production)

That's faster than most companies decide what to build.

3. Semantic Search

Not keyword matching - actual understanding.

Query: "how to handle vectorize embeddings"

Found: Content about "dimension reduction" and "optimization"

Relevance score: 0.78

The system understood WHAT I meant, not just what I typed.

Tech stack:

  • Workers AI (@cf/baai/bge-base-en-v1.5) - generates embeddings
  • Vectorize - stores 768-dimension vectors
  • D1 - metadata and chat structure
  • Cosine similarity search across all imported conversations
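For readers who haven't worked with embeddings, here is a small sketch of the metric behind that relevance score. In the deployed system Vectorize does this ranking itself; the `cosineSimilarity` and `rankBySimilarity` helpers and the toy 2-dimension vectors below are illustrative assumptions, not code from the repo.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // The metric is undefined for zero vectors; this sketch treats them as "no match".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shapes for a stored chunk and a search hit.
interface StoredChunk {
  id: string;
  embedding: number[];
}

interface SearchHit {
  id: string;
  score: number;
}

// Score every chunk against the query embedding and keep the best matches.
function rankBySimilarity(query: number[], chunks: StoredChunk[], maxResults: number): SearchHit[] {
  return chunks
    .map((chunk): SearchHit => ({ id: chunk.id, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}
```

A score near 1 means the two texts point in almost the same direction in embedding space, which is why a query about "vectorize embeddings" can surface content about "dimension reduction" without sharing keywords.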

4. Federation Protocol (ActivityPub)

This isn't just personal knowledge management. It's designed to federate.

Live endpoints:

  • NodeInfo: /api/federation/nodeinfo (200 OK ✅)
  • WebFinger: /api/federation/.well-known/webfinger
  • Inbox/Outbox: ActivityPub standard

What federation means:

  • You run your instance
  • I run my instance
  • We search across ALL of them
  • No single point of control
  • No corporate overlord

Exactly like Mastodon, but for developer knowledge.
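To give a feel for what a WebFinger endpoint returns, here is a hedged sketch of building the JSON document for an `acct:` lookup. The `subject` and `links` fields follow the WebFinger format that Mastodon-style servers use; the `buildWebFinger` helper and the `/api/federation/actors/...` actor path are assumptions for illustration and may not match the repo.

```typescript
interface WebFingerLink {
  rel: string;
  type: string;
  href: string;
}

interface WebFingerResponse {
  subject: string;
  links: WebFingerLink[];
}

// Split "acct:user@host" into its parts, or return null if it is malformed.
function parseAcct(resource: string): { user: string; host: string } | null {
  const match = /^acct:([^@\s]+)@([^@\s]+)$/.exec(resource);
  if (match === null) return null;
  return { user: match[1], host: match[2] };
}

// Build the WebFinger document for an account on this instance.
// Returns null for malformed resources or accounts that belong to another host.
function buildWebFinger(resource: string, instanceHost: string): WebFingerResponse | null {
  const acct = parseAcct(resource);
  if (acct === null || acct.host.toLowerCase() !== instanceHost.toLowerCase()) {
    return null;
  }
  return {
    subject: `acct:${acct.user}@${acct.host}`,
    links: [
      {
        rel: "self",
        type: "application/activity+json",
        // Hypothetical actor URL layout, chosen only for this example.
        href: `https://${instanceHost}/api/federation/actors/${encodeURIComponent(acct.user)}`,
      },
    ],
  };
}
```

Another instance that knows only `daniel@example.test` can use a document like this to discover the actor URL, which is the first step before any inbox or outbox traffic.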


Technical Architecture (How It Actually Works)

The Stack

Frontend: HTML → Parser → Chunks
Backend: Cloudflare Workers (edge-native)
Database: D1 (SQLite at the edge)
Vector Store: Vectorize (768-dim embeddings)
AI: Workers AI (BGE-base-en-v1.5)
Protocol: ActivityPub (federation standard)

The Flow

1. IMPORT
   HTML file → Parse messages → Chunk content → Generate embeddings

2. SECURITY
   Scan for secrets → Flag risks → Require review → Safe by default

3. STORAGE
   Chunks → D1 (metadata)
   Embeddings → Vectorize (semantic search)

4. SEARCH
   Query → Generate embedding → Cosine similarity → Ranked results

5. FEDERATION (coming)
   Public chats → ActivityPub → Federated timeline → Cross-instance search

What Was Hard

1. HTML Parsing
Claude's HTML export format isn't documented. Had to reverse-engineer:

  • Message boundaries
  • Code block preservation
  • Artifact handling
  • Nested content structure

2. Security Scanner
Can't just regex for "API key" - need to understand context:

  • Is this a code example or a real credential?
  • Is a localhost URL in docs or an actual endpoint?
  • Balance: too strict = false positives, too loose = leaks
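One way to approach the "example or real credential" question is a placeholder heuristic. The sketch below is an assumption about how such a filter could work, not the scanner's actual logic; `looksLikePlaceholder` and the hint list are hypothetical.

```typescript
// Substrings that commonly appear in documentation placeholders (illustrative list).
const PLACEHOLDER_HINTS: string[] = ["your", "example", "placeholder", "changeme", "dummy", "xxxx"];

// Heuristic: does a matched secret value look like a documentation placeholder?
function looksLikePlaceholder(value: string): boolean {
  // Angle-bracket or templated values such as <token> or ${API_KEY}.
  if (/^<.*>$/.test(value) || /^\$\{.*\}$/.test(value)) return true;
  // A single repeated character, e.g. "aaaaaaaaaaaaaaaaaaaa".
  if (/^(.)\1+$/.test(value)) return true;
  const lower = value.toLowerCase();
  for (const hint of PLACEHOLDER_HINTS) {
    if (lower.indexOf(hint) !== -1) return true;
  }
  return false;
}
```

A real key can happen to contain one of these substrings, so a filter like this is safer as a reason to downgrade severity and ask for review than as a reason to skip the finding entirely.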

3. Federation Protocol
ActivityPub is designed for social posts, not Q&A:

  • How to represent "question" vs "answer"?
  • Vote federation across instances?
  • Spam prevention without centralized moderation?
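For the "question vs answer" problem, one possible mapping is sketched below using ActivityStreams vocabulary: a `Question` object for the question and a `Note` with `inReplyTo` for each answer. This is an illustration, not the project's settled design. The `KnowledgeObject` shape and helper names are assumptions, and Mastodon uses `Question` for polls, so other servers may render it differently.

```typescript
const AS_CONTEXT = "https://www.w3.org/ns/activitystreams";

// Minimal subset of ActivityStreams properties used in this sketch.
interface KnowledgeObject {
  "@context": string;
  id: string;
  type: "Question" | "Note";
  attributedTo: string; // actor URL of the author
  content: string;
  name?: string; // question title
  inReplyTo?: string; // id of the question an answer responds to
}

function toQuestion(id: string, actor: string, title: string, body: string): KnowledgeObject {
  return { "@context": AS_CONTEXT, id, type: "Question", attributedTo: actor, name: title, content: body };
}

function toAnswer(id: string, actor: string, body: string, questionId: string): KnowledgeObject {
  return { "@context": AS_CONTEXT, id, type: "Note", attributedTo: actor, content: body, inReplyTo: questionId };
}
```

Linking answers through `inReplyTo` keeps the thread structure readable by existing Fediverse software, while attribution travels with each object in `attributedTo`.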

4. Edge-Native Architecture
Cloudflare Workers have constraints:

  • 10ms CPU limit per request (on the free plan)
  • No filesystem
  • Async-only database access

Working within constraints = better architecture.

The Security Scanner in Action

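// Real security detection from the codebase
const patterns = {
  // "Bearer <token>" values, as found in Authorization headers
  bearerToken: /Bearer\s+[A-Za-z0-9\-._~+/]+=*/gi,
  // api_key / api-key / apikey assignments followed by 20+ key-like characters
  apiKey: /['\"]?api[_-]?key['\"]?\s*[:=]\s*['\"]?[A-Za-z0-9-_]{20,}['\"]?/gi,
  // Loopback URLs
  localhost: /https?:\/\/(localhost|127\.0\.0\.1|::1)/gi,
  // Hosts with internal-looking suffixes (note: .dev is also a public TLD, so those URLs are flagged too)
  internalDomain: /https?:\/\/[a-z0-9.-]+\.(local|internal|corp|dev)/gi
}

// Scan returns: { safe: boolean, issues: Issue[] }
// Auto-blocks if critical issues found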
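Here is a hedged sketch of how patterns like these could be turned into the `{ safe, issues }` result. The regexes are copied from the snippet above; everything else (the `Rule` shape, `scanText`, and the severity assigned to each pattern) is an assumption based on the scan results reported earlier, not the repository's implementation.

```typescript
type Severity = "critical" | "high";

interface Issue {
  type: string;
  severity: Severity;
  match: string;
}

interface ScanResult {
  safe: boolean;
  issues: Issue[];
}

interface Rule {
  type: string;
  severity: Severity; // assumed mapping, not confirmed by the article
  pattern: RegExp;
}

const RULES: Rule[] = [
  { type: "bearerToken", severity: "critical", pattern: /Bearer\s+[A-Za-z0-9\-._~+\/]+=*/gi },
  { type: "apiKey", severity: "critical", pattern: /['"]?api[_-]?key['"]?\s*[:=]\s*['"]?[A-Za-z0-9_-]{20,}['"]?/gi },
  { type: "localhost", severity: "high", pattern: /https?:\/\/(localhost|127\.0\.0\.1|::1)/gi },
  { type: "internalDomain", severity: "high", pattern: /https?:\/\/[a-z0-9.-]+\.(local|internal|corp|dev)/gi },
];

// Collect every match for every rule; the text is unsafe if any match is critical.
function scanText(text: string): ScanResult {
  const issues: Issue[] = [];
  for (const rule of RULES) {
    rule.pattern.lastIndex = 0; // global regexes keep state between calls
    let match: RegExpExecArray | null = rule.pattern.exec(text);
    while (match !== null) {
      issues.push({ type: rule.type, severity: rule.severity, match: match[0] });
      match = rule.pattern.exec(text);
    }
  }
  const hasCritical = issues.some((issue) => issue.severity === "critical");
  return { safe: !hasCritical, issues };
}
```

In this sketch only critical findings flip `safe` to false, while high-severity findings are still reported so a reviewer can decide.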

Database Schema (11 of the 12 tables)

chats                  -- Core chat storage
chunks                 -- Content chunks for search
pre_share_scans        -- Security scanner results
chunk_redactions       -- Auto-redaction tracking
share_approvals        -- Sharing workflow
federated_instances    -- Federation network
federated_knowledge    -- Cross-instance content
federation_activities  -- ActivityPub events
knowledge_analytics    -- Usage tracking
collections            -- Knowledge curation
collection_items       -- Collection membership

This is production-grade infrastructure, not a proof of concept.


Why I Built This Now

Timing matters.

Stack Overflow traffic is down 78% and still falling. Every day, thousands of valuable debugging sessions happen in private AI chats and disappear forever.

We're not just losing knowledge - we're losing the HABIT of knowledge sharing.

Building this now means:

  • Early adopters shape the protocol
  • Federation standards emerge organically
  • We avoid corporate capture (no VC, no "pivot to paid")
  • Developers own the infrastructure from day one

The best time to rebuild the knowledge commons was before Stack Overflow collapsed.

The second-best time is now.


Why This Matters

It Solves Knowledge Collapse

  • Insights stay discoverable - Semantic search finds relevant content
  • Attribution preserved - Source tracking built-in
  • Privacy respected - Security scanner catches leaks
  • No platform risk - Self-hosted, you control your data

It's Actually Decentralized

  • ActivityPub = proven federation protocol (powers Mastodon's 10M+ users)
  • Developer-owned instances - Run your own, connect with others
  • No "rug pull" risk - Open source, MIT licensed

It's Viable at Edge Scale

Cloudflare Workers handles massive scale:

  • Edge-native architecture
  • D1 database at the edge
  • Vectorize for semantic search
  • Workers AI for embeddings

The same tech stack I use for production apps serving thousands of users.


From Discussion to Infrastructure

Three weeks ago, Richard Pascoe asked in the comments: "Could Mastodon servers like Fosstodon help foster a knowledge sharing platform?"

I said yes and built it.

This isn't theoretical infrastructure. It's built on ActivityPub, so it's designed to federate with Mastodon, Fosstodon, and the rest of the Fediverse.

Richard's question became the bridge between diagnosis and solution.

@richardpascoe - your instance is ready when you are. 🚀


Real Use Cases (What This Enables)

For Individual Developers

  • Portfolio of problem-solving - Your best debugging sessions, searchable
  • Learning in public - Share solutions, get feedback, build reputation
  • Future reference - "I solved this before, where was that chat?"

For Teams

  • Institutional knowledge - Team's collective debugging history
  • Onboarding - New devs search team's past solutions
  • Pattern recognition - See recurring problems across conversations

For Communities

  • Niche expertise - Rust specialists, Cloudflare devs, etc. share domain knowledge
  • Federated discovery - Find experts across instances
  • Attribution - Credit flows to who actually solved it

For The Commons

  • Stack Overflow alternative - But decentralized and community-owned
  • AI training data - High-quality, attributed conversations
  • Knowledge archaeology - Insights don't die with platforms

What's Next

Immediate (This Week)

  • ✅ Open source on GitHub (MIT license)
  • ✅ Documentation for self-hosting
  • ✅ Production deployment live

Short-term (Next Month)

  • Import 50+ historical Claude chats (build the corpus)
  • MCP extension ("share this chat publicly" from Claude Code)
  • First federation test with another developer

Long-term (3-6 Months)

  • 10+ federated instances
  • Collections feature (curate knowledge by topic)
  • Analytics (which insights are most valuable)
  • Cross-instance search

This Is The Foundation in Action

Two weeks ago, we created @the-foundation to preserve developer knowledge publicly.

Richard Pascoe published our first collaborative post on fundamentals.

This is our second: working infrastructure.

The Foundation isn't just writing about the problem. We're shipping solutions.


Join The Foundation

This isn't a solo project. It's infrastructure.

For Developers

For Writers

  • Import your best AI conversations - Build your knowledge portfolio
  • Share safely - Security scanner protects you
  • Get discovered - Federated search makes your insights findable

For The Curious

  • Star the repo ⭐ - Show you care about preserving knowledge
  • Share this article - Help spread the word
  • Join the discussion - Comment below with your thoughts

The knowledge commons doesn't rebuild itself. But we can build it together.


Installation (5 Minutes)

# Clone the repo
git clone https://github.com/dannwaneri/chat-knowledge.git
cd chat-knowledge

# Install dependencies
npm install

# Login to Cloudflare (free tier works)
wrangler login

# Create infrastructure
wrangler d1 create chat-knowledge-db
wrangler vectorize create chat-knowledge-embeddings --dimensions=768 --metric=cosine

# Run migrations
wrangler d1 execute chat-knowledge-db --remote --file=migrations/migration-federation.sql
wrangler d1 execute chat-knowledge-db --remote --file=migrations/migration-sanitizer.sql

# Deploy
npm run deploy

That's it. You now have your own federated knowledge instance.


Try It Right Now

Import a chat:

# Save any Claude conversation as HTML (Ctrl+S)
npm run build
node dist/cli/import-html.js path/to/chat.html "My First Import"

Search it:

curl -X POST https://your-worker.workers.dev/search \
  -H "Content-Type: application/json" \
  -d '{"query": "debugging tips", "maxResults": 5}'
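If you'd rather call the search endpoint from code, the curl command above translates to something like the sketch below. The `/search` path and the `query` / `maxResults` fields come from that command; the `buildSearchRequest` helper and `PreparedRequest` shape are hypothetical, and the response format isn't documented here, so it is left out.

```typescript
interface SearchRequestBody {
  query: string;
  maxResults: number;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// Build the same request the curl example sends, without performing any network call.
function buildSearchRequest(baseUrl: string, query: string, maxResults: number = 5): PreparedRequest {
  const body: SearchRequestBody = { query, maxResults };
  return {
    url: baseUrl.replace(/\/+$/, "") + "/search", // tolerate a trailing slash on the base URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}
```

The result can be passed to `fetch(request.url, { method: request.method, headers: request.headers, body: request.body })` in any runtime that provides `fetch`.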

Scan for secrets:

node dist/cli/safe-share.js <chat-id>
# Shows what would leak before you share

The Meta Moment

I wrote about the problem three weeks ago.

Now the solution is running in production.

From observation to shipped product in 21 days.

That's the power of:

  • Cloudflare Workers (deploy in seconds)
  • AI embeddings (semantic search out of the box)
  • ActivityPub (proven federation protocol)
  • Building in public (accountability + feedback)

Your move, Stack Overflow. 👊


Related Articles

  1. My Chrome Tabs Tell a Story - The observation that started it all
  2. We're Creating a Knowledge Collapse - The problem statement (12K+ views)
  3. Above the API: What Developers Contribute When AI Can Code - What skills actually matter
  4. You're here - The solution

Let's Build This Together

GitHub: https://github.com/dannwaneri/chat-knowledge

Live Demo: https://chat-knowledge-api.fpl-test.workers.dev

Twitter: @dannwaneri

If you believe in preserving developer knowledge, star the repo ⭐ and let's make this real.

The foundation is laid. Now we need builders.

Are you in?

Top comments (26)

Pascal CESCATO • Edited

Brilliant! As @richardpascoe said: Quite simply mind blown! This is the kind of article I love to read and share: a problem and its solution.

Daniel Nwaneri The Foundation

appreciate that pascal.

"problem and its solution" is exactly the structure i was going for. too many articles identify problems without shipping anything.

if you share it, curious what audience youre thinking - linkedin? twitter? developer communities?

always learning what resonates

Pascal CESCATO

You're welcome, Daniel! I'm sharing it with some developers I work with, for example… I've also added it to my own wallabag list, to have it at hand when needed.

Harsh

“Absolutely! Articles like this are gold—clear problem, actionable solution, and real insights. Definitely worth sharing with anyone who wants to learn something meaningful.”

leob • Edited

Done & dusted in 3 weeks - that's quicker than it takes most companies to come up with a rough outline of a plan, haha - of course you did benefit from your prior experience with the techniques that you mentioned (as per your previous dev.to articles), but still - epic!

Eager to start looking at it and playing with it .......

Daniel Nwaneri The Foundation

😂 exactly most companies would still be in planning meetings.

advantage of building in public. no committees, no bureaucracy, just ship and iterate.

"eager to start looking at it". let me know if you hit any setup issues. trying to get first 5-10 instances deployed this week to test federation.

would love your feedback on the protocol

leob

Yeah having the vision and the skills (and knowing what you want and need) beats any corporate design/planning committee, lol ... brilliant work, excited to check it out!

myroslav mokhammad abdeljawwad

This is honestly one of the more concrete responses I’ve seen to the “knowledge collapse” problem. Most discussions stop at diagnosing it, not shipping infrastructure.

The security scanner is the part that really stands out to me. That’s the missing piece in almost every “share your AI chats” idea: people want to share insights, but one leaked token or internal URL is enough to shut the whole thing down. Treating safety as a first-class concern instead of an afterthought makes this feel production-minded rather than experimental.

I also like that you’re leaning on ActivityPub instead of inventing a new federation protocol. Reusing something battle-tested (even if it’s awkward for Q&A semantics) feels like the right tradeoff if the goal is adoption rather than purity.

One question I’m curious about as this scales:
how do you see moderation and trust evolving across federated instances, especially once cross-instance search is live? Is it more “local rules, local reputation,” or do you imagine shared signals emerging over time?

Either way, shipping this end-to-end in ~3 weeks (parsing, embeddings, security, federation) is impressive. This feels like real infrastructure, not a demo. I’ll be digging into the repo.

Daniel Nwaneri The Foundation

this is THE question. youre right to ask it early.

my thinking. layered moderation

local instance rules: each sets its own policy (strict vs permissive)

shared blocklists: subscribe to curated spam/bad actor lists (activitypub pattern from mastodon).

web of trust: reputation doesnt federate centrally. instead, your instance tracks which OTHER instances you trust. transitive trust emerges (like pgp key signing).

the hard part: cross-instance search

whose rules apply when searching federated content?

current plan:

  • search returns ALL matches across instances
  • YOUR instance filters locally based on YOUR policy
  • you see what your instance allows

this means:

  • no central authority on "truth"
  • communities maintain own standards
  • federation = discovery, not enforcement

does that match your mental model or am i missing something?

"digging into the repo" . would love feedback on protocol design, especially moderation hooks.

Cyber Safety Zone

Really cool project! 🙌 I love how this tackles decentralized knowledge sharing and puts control back in users’ hands. Looking forward to seeing how the federated AI knowledge commons grows and gets adopted — especially for collaborative learning and research!

Ali-Funk

This is the logical next step in the conversation about Digital Sovereignty.
I recently wrote about the importance of 'Owning Your Keys' (CMKs) to secure data at rest against vendor lock-in. Seeing this concept applied to the knowledge layer via Federated AI is fascinating.
Centralization offers convenience, but decentralized 'Knowledge Commons' offer true ownership. Excellent work on this architecture.

Daniel Nwaneri The Foundation

appreciate you connecting this to digital sovereignty. that's exactly the conversation this needs.

centralization = convenience at the cost of ownership.
federation = ownership at the cost of bootstrapping.

but the cost equation is changing:

  • cloudflare workers = $5/month for edge deployment
  • activitypub = proven at mastodon scale
  • security scanner = automated trust infrastructure

the next wave isn't "build vs buy" - it's "own vs rent"

your CMK work on data at rest + this on knowledge at rest = complete digital sovereignty stack

what does the infrastructure layer look like when enterprises want both?

Daniel Nwaneri The Foundation

appreciate that richard. your question about fosstodon was the spark.

"living breathing resource" is exactly right. this only works if its adopted by people who care about preserving knowledge publicly.

next step. lets get your instance running.
ive got the setup down to 5 minutes. if youre interested, we can test federation between our instances - prove the protocol works cross-server.

would love your feedback on the activitypub implementation. youve got way more fediverse experience than i do.

lets make this real.

Frozen Blood

The security-first approach and federation angle make this feel thoughtful, not rushed. Turning private AI chats into safe, searchable, shared knowledge is something a lot of us have felt missing.

Big respect for building this in public and getting it into production so quickly.

Daniel Nwaneri The Foundation

appreciate that. the security scanner was non-negotiable.

one leaked api key costs more than this entire system. had to be safe by default.

"something a lot of us have felt missing" - exactly. we all knew this was a problem, someone just had to build it.

I am curious. what would you want to see next? collections? mcp extension? better federation protocol?

 
Daniel Nwaneri The Foundation

this is going to be great. first federation test with someone who actually understands activitypub protocol.

ive documented the deployment but let me know if anything is unclear. we can do a live test once yours is running.

also if you find protocol issues, raise them early. better to fix before 10+ instances deploy.

excited to see fosstodon community connected to this

sharon oliva

That sounds really interesting! Can you share a bit about the approach you used to tackle knowledge collapse and what kind of results you saw in those 21 days?

Daniel Nwaneri The Foundation

happy to break it down.

problem: stack overflow down 78%. best debugging sessions trapped in private AI chats.

approach:

  • save claude conversations as HTML
  • security scanner auto-blocks API keys/secrets
  • semantic search across all imported chats
  • activitypub federation (like mastodon for Q&A)

results in 21 days:

  • production system live
  • 107 chunks imported, searchable
  • security scanner caught 599 issues (3 critical blocks)
  • richard deploying second instance (first federation test incoming)
  • open source: github.com/dannwaneri/chat-knowledge

key: make sharing safe + easy while respecting privacy

what interests you most about this?

Justin Elliott

Hi, I'm Justin, a full-stack engineer with a strong interest in backend systems, async programming, and AI.
I'm currently working a lot with APIs, distributed systems, and LLM-based projects.
Looking forward to learning from you all and contributing where I can.😉

Harsh

Really interesting perspective. The idea of tackling knowledge collapse in such a short timeframe is impressive. Looking forward to diving deeper into this.
