Daniel Nwaneri for The Foundation

Posted on Feb 9

I Shipped the Solution to Knowledge Collapse in 21 Days

#ai #opensource #activitypub #discuss

Three weeks ago, I wrote about knowledge collapse - how our best technical insights are dying in private AI chats while Stack Overflow bleeds 78% of its traffic.

Hundreds of developers agreed we need a solution.

So I built one. And it's running in production right now.

Here's the live demo: https://chat-knowledge-api.fpl-test.workers.dev

Here's the source code: https://github.com/dannwaneri/chat-knowledge

Here's the complete build session behind this article: 107 chunks, 174 messages, imported via HTML

Let me show you how it works.

The Problem (Quick Recap)

Knowledge collapse is happening right now:

Your best debugging solutions live in private Claude chats
No attribution, no discovery, no commons
Stack Overflow traffic down 78% since ChatGPT launched
We're optimizing ourselves into a knowledge dead-end

We need "Stack Overflow for AI conversations" - but decentralized, privacy-first, and developer-owned.

What I Actually Built

1. HTML Import System

The workflow is dead simple:

Have a valuable Claude conversation
Press Ctrl+S (save as HTML)
Import it: node dist/cli/import-html.js chat.html
Done - it's searchable

Tested on this article's build session:

File size: 4.6MB
Messages: 136 parsed
Chunks created: 91
Time: < 2 seconds

No complex setup. No API keys. Just save and import.
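The post doesn't spell out how parsed messages become chunks. As a rough illustration only, here's a minimal TypeScript sketch that assumes a hypothetical `ParsedMessage` shape and a simple per-chunk character budget (`maxChars`). The repo's real chunking rules may differ.

```typescript
// Hypothetical shape of one parsed message; the repo's parser may produce something else.
interface ParsedMessage {
  role: "user" | "assistant";
  text: string;
}

interface Chunk {
  index: number;
  text: string;
  messageCount: number;
}

// Pack consecutive whole messages into chunks of at most `maxChars` characters.
// A single message longer than the budget becomes its own oversized chunk,
// so a code block inside one message is never split across chunks.
function chunkMessages(messages: ParsedMessage[], maxChars = 2000): Chunk[] {
  const chunks: Chunk[] = [];
  let parts: string[] = [];
  let length = 0;

  const flush = (): void => {
    if (parts.length === 0) return;
    chunks.push({ index: chunks.length, text: parts.join("\n\n"), messageCount: parts.length });
    parts = [];
    length = 0;
  };

  for (const message of messages) {
    const part = `${message.role}: ${message.text}`;
    // Joining adds a 2-character "\n\n" separator before every message except the first.
    if (length > 0 && length + 2 + part.length > maxChars) flush();
    length += (length > 0 ? 2 : 0) + part.length;
    parts.push(part);
  }
  flush();
  return chunks;
}
```

Packing whole messages trades uneven chunk sizes for never cutting a code block in half, which keeps each search hit readable on its own.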

2. Security Scanner (The Critical Feature)

Before any chat goes public, it runs through auto-detection:

🔴 CRITICAL (auto-block):

API keys, Bearer tokens
Private URLs (localhost, .internal domains)
Credentials, passwords

Real results from my build session scan:

Total issues detected: 599
Critical blocks: 3 (2 Bearer tokens, 1 API key)
High severity: Multiple localhost URLs
Safe to share: FALSE ✅ (exactly as designed)

This is the difference between "share everything" and "share safely." One leaked API key costs more than this entire system.

By The Numbers

This Build Session:

Start: Problem identified (Jan 15)
Build: 107 conversation chunks
Messages: 174 total
File size: 4.6MB HTML
Parse time: < 2 seconds
Security scan: 599 issues detected (3 critical auto-blocks)
Ship date: Feb 8 (24 days from problem to production)

That's less time than most companies take to decide what to build.

3. Semantic Search

Not keyword matching - actual understanding.

Query: "how to handle vectorize embeddings"

Found: Content about "dimension reduction" and "optimization"

Relevance score: 0.78

The system understood WHAT I meant, not just what I typed.

Tech stack:

Workers AI (@cf/baai/bge-base-en-v1.5) - generates embeddings
Vectorize - stores 768-dimension vectors
D1 - metadata and chat structure
Cosine similarity search across all imported conversations
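Vectorize does the cosine-similarity ranking inside the index, so the app never computes it by hand. To show what that ranking means, here's an in-memory TypeScript sketch; the `StoredChunk` type and the `topK` helper are hypothetical and not part of the Vectorize API.

```typescript
// Hypothetical in-memory record: a chunk id plus its embedding (768 dims for bge-base-en-v1.5).
interface StoredChunk {
  id: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every stored chunk against the query embedding and keep the best k.
function topK(query: number[], chunks: StoredChunk[], k: number): { id: string; score: number }[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A relevance score like the 0.78 above is this kind of quantity: a query and a chunk whose embeddings point in similar directions score close to 1 even when they share no keywords.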

4. Federation Protocol (ActivityPub)

This isn't just personal knowledge management. It's designed to federate.

Live endpoints:

NodeInfo: /api/federation/nodeinfo (200 OK ✅)
WebFinger: /api/federation/.well-known/webfinger
Inbox/Outbox: ActivityPub standard
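For other Fediverse servers to discover an instance, RFC 7033 serves WebFinger at `/.well-known/webfinger` on the host root and answers with a JSON Resource Descriptor (JRD). Mastodon queries it with `?resource=acct:user@domain` and follows the `self` link to the ActivityPub actor. The sketch below builds that response; the `/api/federation/actors/{user}` actor path is a hypothetical placeholder, not the repo's actual route.

```typescript
// JSON Resource Descriptor (RFC 7033), reduced to the fields ActivityPub discovery uses.
interface Jrd {
  subject: string;
  links: { rel: string; type: string; href: string }[];
}

// Build the WebFinger answer for ?resource=acct:user@domain.
// Returns null when the resource is malformed or belongs to another domain.
function buildWebFinger(resource: string, domain: string): Jrd | null {
  const match = /^acct:([^@\s]+)@([^@\s]+)$/.exec(resource);
  if (!match) return null;
  const user = match[1];
  const host = match[2];
  if (host.toLowerCase() !== domain.toLowerCase()) return null;
  return {
    subject: `acct:${user}@${domain}`,
    links: [
      {
        rel: "self",
        type: "application/activity+json",
        // Hypothetical actor route; substitute the instance's real actor URL.
        href: `https://${domain}/api/federation/actors/${encodeURIComponent(user)}`,
      },
    ],
  };
}
```

In a Worker, the route handler would return `JSON.stringify(jrd)` with `Content-Type: application/jrd+json`, or a 404 when the builder returns `null`.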

What federation means:

You run your instance
I run my instance
We search across ALL of them
No single point of control
No corporate overlord

Exactly like Mastodon, but for developer knowledge.
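Cross-instance search is still on the roadmap, but one way to picture "we search across ALL of them" without a central authority: each instance returns its own ranked hits, the lists are merged by score, and your instance drops whatever its local policy excludes. The `RemoteHit` and `LocalPolicy` types below are hypothetical.

```typescript
// Hypothetical search hit returned by one federated instance.
interface RemoteHit {
  instance: string; // e.g. "knowledge.example.org"
  chunkId: string;
  score: number; // cosine similarity reported by that instance
}

// Hypothetical local moderation policy, decided by each instance for itself.
interface LocalPolicy {
  blockedInstances: string[];
  minScore: number;
}

// Merge per-instance result lists, apply the local policy, keep the top `limit`.
function mergeFederatedResults(
  resultsByInstance: RemoteHit[][],
  policy: LocalPolicy,
  limit: number,
): RemoteHit[] {
  const all: RemoteHit[] = [];
  for (const hits of resultsByInstance) all.push(...hits);
  return all
    .filter((hit) => policy.blockedInstances.indexOf(hit.instance) === -1)
    .filter((hit) => hit.score >= policy.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Filtering after the merge means remote instances never need to know your rules: federation handles discovery, enforcement stays local. Comparing raw scores across instances only makes sense if every instance embeds with the same model.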

Technical Architecture (How It Actually Works)

The Stack

Frontend: HTML → Parser → Chunks
Backend: Cloudflare Workers (edge-native)
Database: D1 (SQLite at the edge)
Vector Store: Vectorize (768-dim embeddings)
AI: Workers AI (BGE-base-en-v1.5)
Protocol: ActivityPub (federation standard)

The Flow

1. IMPORT
   HTML file → Parse messages → Chunk content → Generate embeddings

2. SECURITY
   Scan for secrets → Flag risks → Require review → Safe by default

3. STORAGE
   Chunks → D1 (metadata)
   Embeddings → Vectorize (semantic search)

4. SEARCH
   Query → Generate embedding → Cosine similarity → Ranked results

5. FEDERATION (coming)
   Public chats → ActivityPub → Federated timeline → Cross-instance search
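The SECURITY step's "safe by default" rule can be read as a publishing gate: a chat stays private unless its latest scan found no critical issues and a human approved it. A minimal sketch; the `ScanSummary` and `Approval` types are hypothetical, named after the `pre_share_scans` and `share_approvals` tables in the schema but with invented fields.

```typescript
// Hypothetical summary of one pre-share scan.
interface ScanSummary {
  criticalCount: number;
  // High-severity findings don't block on their own; they are shown to the reviewer.
  highCount: number;
}

// Hypothetical reviewer decision for sharing a chat.
type Approval = "pending" | "approved" | "rejected";

type ShareDecision =
  | { publish: true }
  | { publish: false; reason: string };

// Safe by default: every path that isn't explicitly cleared keeps the chat private.
function decideShare(scan: ScanSummary | null, approval: Approval): ShareDecision {
  if (scan === null) return { publish: false, reason: "not scanned yet" };
  if (scan.criticalCount > 0) return { publish: false, reason: "critical issues found" };
  if (approval !== "approved") return { publish: false, reason: `review ${approval}` };
  return { publish: true };
}
```

With the build session's scan (3 critical blocks), a gate like this returns `publish: false` whatever the reviewer decides, matching the "Safe to share: FALSE" result reported earlier.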

What Was Hard

1. HTML Parsing
Claude's HTML export format isn't documented. Had to reverse-engineer:

Message boundaries
Code block preservation
Artifact handling
Nested content structure

2. Security Scanner
Can't just regex for "API key" - need to understand context:

Is this a code example or a real credential?
Is a localhost URL in the docs, or an actual endpoint?
Balance: too strict = false positives, too loose = leaks
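One cheap way to separate documentation examples from real credentials is to demote matches whose value looks like a placeholder. The placeholder list below is an assumption for illustration, not the repo's rule set, and a heuristic like this can still miss real keys or flag fake ones.

```typescript
type Severity = "critical" | "info";

// Illustrative placeholder markers common in docs and tutorials (assumed list).
const PLACEHOLDER_PATTERNS: RegExp[] = [
  /^<[^>]+>$/, // <YOUR_API_KEY>
  /^\$\{?[A-Z_]+\}?$/, // $API_KEY or ${API_KEY}
  /^(x+|\*+|0+)$/i, // xxxxxxxx, ********, 00000000
  /(your|example|sample|dummy|placeholder|changeme)/i,
];

function looksLikePlaceholder(value: string): boolean {
  return PLACEHOLDER_PATTERNS.some((pattern) => pattern.test(value));
}

// Keep real-looking secrets critical; demote obvious placeholders to "info".
function classifySecret(value: string): Severity {
  return looksLikePlaceholder(value) ? "info" : "critical";
}
```

Demoting rather than dropping keeps the finding visible to the reviewer, so mistakes err toward the "too strict" side of the balance.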

3. Federation Protocol
ActivityPub is designed for social posts, not Q&A:

How to represent "question" vs "answer"?
Vote federation across instances?
Spam prevention without centralized moderation?
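ActivityStreams does define a `Question` type, but Mastodon uses it for polls. One possible mapping for Q&A is a `Note` for the question and `Note` replies linked through `inReplyTo` for answers, each wrapped in a `Create` activity. This sketches that idea only; the URLs and the `#question` hashtag convention are hypothetical, and votes are left out.

```typescript
// Minimal ActivityStreams 2.0 shapes used below.
interface Note {
  id: string;
  type: "Note";
  attributedTo: string;
  content: string;
  inReplyTo?: string;
  tag?: { type: "Hashtag"; name: string }[];
}

interface CreateActivity {
  "@context": string;
  id: string;
  type: "Create";
  actor: string;
  object: Note;
}

const AS_CONTEXT = "https://www.w3.org/ns/activitystreams";

// Wrap a note in a Create activity from its author.
function createActivity(note: Note): CreateActivity {
  return { "@context": AS_CONTEXT, id: `${note.id}/activity`, type: "Create", actor: note.attributedTo, object: note };
}

// Hypothetical actor and object URLs for illustration.
const actor = "https://knowledge.example.org/users/dan";
const question: Note = {
  id: "https://knowledge.example.org/notes/1",
  type: "Note",
  attributedTo: actor,
  content: "How many dimensions does a Vectorize index need for bge-base-en-v1.5?",
  tag: [{ type: "Hashtag", name: "#question" }], // hypothetical convention marking a question
};
const answer: Note = {
  id: "https://knowledge.example.org/notes/2",
  type: "Note",
  attributedTo: actor,
  content: "768, with the cosine metric.",
  inReplyTo: question.id, // an answer is an ordinary reply to the question
};
const activities = [createActivity(question), createActivity(answer)];
```

Answers then appear in Mastodon as ordinary threaded replies; the cost is that "this is an answer" lives in convention rather than in the protocol.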

4. Edge-Native Architecture
Cloudflare Workers have constraints:

10ms CPU limit per request (on the free plan)
No filesystem
Async-only database access

Working within constraints = better architecture.

The Security Scanner in Action

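// Security detection patterns, adapted from the codebase
const patterns = {
  // Require 20+ token characters so prose like "Bearer tokens" isn't flagged
  bearerToken: /Bearer\s+[A-Za-z0-9\-._~+/]{20,}=*/gi,
  apiKey: /['"]?api[_-]?key['"]?\s*[:=]\s*['"]?[A-Za-z0-9_-]{20,}['"]?/gi,
  // The IPv6 loopback appears in URLs in brackets: http://[::1]
  localhost: /https?:\/\/(localhost|127\.0\.0\.1|\[::1\])/gi,
  // .dev is a public TLD (workers.dev included), so it is not treated as internal
  internalDomain: /https?:\/\/[a-z0-9.-]+\.(local|internal|corp)\b/gi
}

// Scan returns: { safe: boolean, issues: Issue[] }
// Auto-blocks if critical issues found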
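The `// Scan returns` comments can be filled in roughly as follows. This is a self-contained sketch: the severity assigned to each rule and the `Issue` fields are assumptions (localhost is treated as high, matching the scan results reported earlier), and the patterns are adapted from the list above.

```typescript
type Severity = "critical" | "high";

interface Issue {
  rule: string;
  severity: Severity;
  match: string;
}

interface ScanResult {
  safe: boolean;
  issues: Issue[];
}

// Illustrative rule table; severities are assumptions.
const RULES: { name: string; severity: Severity; pattern: RegExp }[] = [
  { name: "bearerToken", severity: "critical", pattern: /Bearer\s+[A-Za-z0-9\-._~+/]{20,}=*/gi },
  { name: "apiKey", severity: "critical", pattern: /['"]?api[_-]?key['"]?\s*[:=]\s*['"]?[A-Za-z0-9_-]{20,}['"]?/gi },
  { name: "localhost", severity: "high", pattern: /https?:\/\/(localhost|127\.0\.0\.1|\[::1\])/gi },
  { name: "internalDomain", severity: "high", pattern: /https?:\/\/[a-z0-9.-]+\.(local|internal|corp)\b/gi },
];

function scan(text: string): ScanResult {
  const issues: Issue[] = [];
  for (const rule of RULES) {
    // String.match with a /g regex returns every match and resets lastIndex,
    // so the shared regex objects are safe to reuse across scans.
    const matches = text.match(rule.pattern) || [];
    for (const match of matches) {
      issues.push({ rule: rule.name, severity: rule.severity, match });
    }
  }
  // Auto-block: any critical issue makes the chat unsafe to share.
  const safe = !issues.some((issue) => issue.severity === "critical");
  return { safe, issues };
}
```

Deriving `safe` from severities rather than from the raw issue count lets a session with many low-stakes findings become shareable once its few critical ones are redacted.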

Database Schema (12 tables)

chats                  -- Core chat storage
chunks                 -- Content chunks for search
pre_share_scans        -- Security scanner results
chunk_redactions       -- Auto-redaction tracking
share_approvals        -- Sharing workflow
federated_instances    -- Federation network
federated_knowledge    -- Cross-instance content
federation_activities  -- ActivityPub events
knowledge_analytics    -- Usage tracking
collections            -- Knowledge curation
collection_items       -- Collection membership

This is production-grade infrastructure, not a proof of concept.

Why I Built This Now

Timing matters.

Stack Overflow traffic is down 78% and still falling. Every day, thousands of valuable debugging sessions happen in private AI chats and disappear forever.

We're not just losing knowledge - we're losing the HABIT of knowledge sharing.

Building this now means:

Early adopters shape the protocol
Federation standards emerge organically
We avoid corporate capture (no VC, no "pivot to paid")
Developers own the infrastructure from day one

The best time to rebuild the knowledge commons was before Stack Overflow collapsed.

The second-best time is now.

Why This Matters

It Solves Knowledge Collapse

✅ Insights stay discoverable - Semantic search finds relevant content
✅ Attribution preserved - Source tracking built-in
✅ Privacy respected - Security scanner catches leaks
✅ No platform risk - Self-hosted, you control your data

It's Actually Decentralized

ActivityPub = proven federation protocol (powers Mastodon's 10M+ users)
Developer-owned instances - Run your own, connect with others
No "rug pull" risk - Open source, MIT licensed

It's Viable at Edge Scale

Cloudflare Workers handles massive scale:

Edge-native architecture
D1 database at the edge
Vectorize for semantic search
Workers AI for embeddings

The same tech stack I use for production apps serving thousands of users.

From Discussion to Infrastructure

Three weeks ago, Richard Pascoe asked in the comments: "Could Mastodon servers like Fosstodon help foster a knowledge sharing platform?"

I said yes and built it.

This isn't theoretical infrastructure. It's ActivityPub-compatible, meaning it federates with Mastodon, Fosstodon, and the entire Fediverse network.

Richard's question became the bridge between diagnosis and solution.

@richardpascoe - your instance is ready when you are. 🚀

Real Use Cases (What This Enables)

For Individual Developers

Portfolio of problem-solving - Your best debugging sessions, searchable
Learning in public - Share solutions, get feedback, build reputation
Future reference - "I solved this before, where was that chat?"

For Teams

Institutional knowledge - Team's collective debugging history
Onboarding - New devs search team's past solutions
Pattern recognition - See recurring problems across conversations

For Communities

Niche expertise - Rust specialists, Cloudflare devs, etc. share domain knowledge
Federated discovery - Find experts across instances
Attribution - Credit flows to who actually solved it

For The Commons

Stack Overflow alternative - But decentralized and community-owned
AI training data - High-quality, attributed conversations
Knowledge archaeology - Insights don't die with platforms

What's Next

Immediate (This Week)

✅ Open source on GitHub (MIT license)
✅ Documentation for self-hosting
✅ Production deployment live

Short-term (Next Month)

Import 50+ historical Claude chats (build the corpus)
MCP extension ("share this chat publicly" from Claude Code)
First federation test with another developer

Long-term (3-6 Months)

10+ federated instances
Collections feature (curate knowledge by topic)
Analytics (which insights are most valuable)
Cross-instance search

This Is The Foundation in Action

Two weeks ago, we created @the-foundation to preserve developer knowledge publicly.

Richard Pascoe published our first collaborative post on fundamentals.

This is our second: working infrastructure.

The Foundation isn't just writing about the problem. We're shipping solutions.

Join The Foundation

This isn't a solo project. It's infrastructure.

For Developers

Clone the repo: https://github.com/dannwaneri/chat-knowledge
Run your own instance - Full setup guide in README
Contribute to the protocol - Issues and PRs welcome

For Writers

Import your best AI conversations - Build your knowledge portfolio
Share safely - Security scanner protects you
Get discovered - Federated search makes your insights findable

For The Curious

Star the repo ⭐ - Show you care about preserving knowledge
Share this article - Help spread the word
Join the discussion - Comment below with your thoughts

The knowledge commons doesn't rebuild itself. But we can build it together.

Installation (5 Minutes)

# Clone the repo
git clone https://github.com/dannwaneri/chat-knowledge.git
cd chat-knowledge

# Install dependencies
npm install

# Login to Cloudflare (free tier works)
npx wrangler login

# Create infrastructure
npx wrangler d1 create chat-knowledge-db
npx wrangler vectorize create chat-knowledge-embeddings --dimensions=768 --metric=cosine
# Copy the database_id printed by "d1 create" into the D1 binding of the repo's Wrangler config

# Run migrations
npx wrangler d1 execute chat-knowledge-db --remote --file=migrations/migration-federation.sql
npx wrangler d1 execute chat-knowledge-db --remote --file=migrations/migration-sanitizer.sql

# Deploy
npm run deploy

That's it. You now have your own federated knowledge instance.

Try It Right Now

Import a chat:

# Save any Claude conversation as HTML (Ctrl+S)
npm run build
node dist/cli/import-html.js path/to/chat.html "My First Import"

Search it:

curl -X POST https://your-worker.workers.dev/search \
  -H "Content-Type: application/json" \
  -d '{"query": "debugging tips", "maxResults": 5}'
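The same search call from TypeScript, for example in a script or another Worker. The `/search` path and the `query`/`maxResults` body come from the curl example; the response shape isn't documented in the post, so it is typed loosely as `unknown`, and `buildSearchRequest` is a hypothetical helper.

```typescript
// Request body accepted by the /search endpoint (fields taken from the curl example).
interface SearchRequest {
  query: string;
  maxResults: number;
}

// Build the URL and fetch options without sending anything, so the request can be inspected.
function buildSearchRequest(baseUrl: string, body: SearchRequest) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Send the request with the global fetch (Workers, Node 18+, browsers).
async function search(baseUrl: string, body: SearchRequest): Promise<unknown> {
  const { url, init } = buildSearchRequest(baseUrl, body);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`search failed: ${response.status}`);
  return response.json();
}
```

`await search("https://your-worker.workers.dev", { query: "debugging tips", maxResults: 5 })` mirrors the curl command.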

Scan for secrets:

node dist/cli/safe-share.js <chat-id>
# Shows what would leak before you share

The Meta Moment

I wrote about the problem three weeks ago.

Now the solution is running in production.

From observation to shipped product in 21 days.

That's the power of:

Cloudflare Workers (deploy in seconds)
AI embeddings (semantic search out of the box)
ActivityPub (proven federation protocol)
Building in public (accountability + feedback)

Your move, Stack Overflow. 👊

Related Articles

My Chrome Tabs Tell a Story - The observation that started it all
We're Creating a Knowledge Collapse - The problem statement (12K+ views)
Above the API: What Developers Contribute When AI Can Code - What skills actually matter
You're here - The solution

Let's Build This Together

GitHub: https://github.com/dannwaneri/chat-knowledge

Live Demo: https://chat-knowledge-api.fpl-test.workers.dev

Twitter: @dannwaneri

If you believe in preserving developer knowledge, star the repo ⭐ and let's make this real.

The foundation is laid. Now we need builders.

Are you in?

Top comments (26)

Pascal CESCATO • Feb 9 • Edited

Brilliant! As @richardpascoe said: Quite simply mind blown! This is the kind of article I love to read and share: a problem and its solution.

Daniel Nwaneri The Foundation • Feb 9

appreciate that pascal.

"problem and its solution" is exactly the structure i was going for. too many articles identify problems without shipping anything.

if you share it, curious what audience youre thinking - linkedin? twitter? developer communities?

always learning what resonates

Pascal CESCATO • Feb 9

You're welcome Daniel! I'm sharing it with some developers I usually work with, for example… I've also saved it in my wallabag list to have it at hand when needed.

Harsh • Feb 9

“Absolutely! Articles like this are gold—clear problem, actionable solution, and real insights. Definitely worth sharing with anyone who wants to learn something meaningful.”

leob • Feb 9 • Edited

Done & dusted in 3 weeks - that's quicker than it takes most companies to come up with a rough outline of a plan, haha - of course you did benefit from your prior experience with the techniques that you mentioned (as per your previous dev.to articles), but still - epic!

Eager to start looking at it and playing with it .......

Daniel Nwaneri The Foundation • Feb 9

😂 exactly most companies would still be in planning meetings.

advantage of building in public. no committees, no bureaucracy, just ship and iterate.

"eager to start looking at it". let me know if you hit any setup issues. trying to get first 5-10 instances deployed this week to test federation.

would love your feedback on the protocol

leob • Feb 9

Yeah having the vision and the skills (and knowing what you want and need) beats any corporate design/planning committee, lol ... brilliant work, excited to check it out!

myroslav mokhammad abdeljawwad • Feb 9

This is honestly one of the more concrete responses I’ve seen to the “knowledge collapse” problem. Most discussions stop at diagnosing it instead of shipping infrastructure.

The security scanner is the part that really stands out to me. That’s the missing piece in almost every “share your AI chats” idea: people want to share insights, but one leaked token or internal URL is enough to shut the whole thing down. Treating safety as a first-class concern instead of an afterthought makes this feel production-minded rather than experimental.

I also like that you’re leaning on ActivityPub instead of inventing a new federation protocol. Reusing something battle-tested (even if it’s awkward for Q&A semantics) feels like the right tradeoff if the goal is adoption rather than purity.

One question I’m curious about as this scales:
how do you see moderation and trust evolving across federated instances, especially once cross-instance search is live? Is it more “local rules, local reputation,” or do you imagine shared signals emerging over time?

Either way, shipping this end-to-end in ~3 weeks (parsing, embeddings, security, federation) is impressive. This feels like real infrastructure, not a demo. I’ll be digging into the repo.

Daniel Nwaneri The Foundation • Feb 9

this is THE question. youre right to ask it early.

my thinking: layered moderation

local instance rules: each sets its own policy (strict vs permissive)

shared blocklists: subscribe to curated spam/bad actor lists (activitypub pattern from mastodon).

web of trust: reputation doesnt federate centrally. instead, your instance tracks which OTHER instances you trust. transitive trust emerges (like pgp key signing).

the hard part: cross-instance search

whose rules apply when searching federated content?

current plan:

search returns ALL matches across instances
YOUR instance filters locally based on YOUR policy
you see what your instance allows

this means:

no central authority on "truth"
communities maintain own standards
federation = discovery, not enforcement

does that match your mental model or am i missing something?

re "digging into the repo": would love feedback on protocol design, especially moderation hooks.

Cyber Safety Zone • Feb 10

Really cool project! 🙌 I love how this tackles decentralized knowledge sharing and puts control back in users’ hands. Looking forward to seeing how the federated AI knowledge commons grows and gets adopted — especially for collaborative learning and research!

Ali-Funk • Feb 10

This is the logical next step in the conversation about Digital Sovereignty.
I recently wrote about the importance of 'Owning Your Keys' (CMKs) to secure data at rest against vendor lock-in. Seeing this concept applied to the knowledge layer via Federated AI is fascinating.
Centralization offers convenience, but decentralized 'Knowledge Commons' offer true ownership. Excellent work on this architecture.

Daniel Nwaneri The Foundation • Feb 10

appreciate you connecting this to digital sovereignty. that's exactly the conversation this needs.

centralization = convenience at the cost of ownership.
federation = ownership at the cost of bootstrapping.

but the cost equation is changing:

cloudflare workers = $5/month for edge deployment
activitypub = proven at mastodon scale
security scanner = automated trust infrastructure

the next wave isn't "build vs buy" - it's "own vs rent"

your CMK work on data at rest + this on knowledge at rest = complete digital sovereignty stack

what does the infrastructure layer look like when enterprises want both?

Daniel Nwaneri The Foundation • Feb 9

appreciate that richard. your question about fosstodon was the spark.

"living breathing resource" is exactly right. this only works if its adopted by people who care about preserving knowledge publicly.

next step. lets get your instance running.
ive got the setup down to 5 minutes. if youre interested, we can test federation between our instances - prove the protocol works cross-server.

would love your feedback on the activitypub implementation. youve got way more fediverse experience than i do.

lets make this real.

Frozen Blood • Feb 9

The security-first approach and federation angle make this feel thoughtful, not rushed. Turning private AI chats into safe, searchable, shared knowledge is something a lot of us have felt missing.

Big respect for building this in public and getting it into production so quickly.

Daniel Nwaneri The Foundation • Feb 9

appreciate that. the security scanner was non-negotiable.

one leaked api key costs more than this entire system. had to be safe by default.

"something a lot of us have felt missing" - exactly. we all knew this was a problem, someone just had to build it.

I am curious: what would you want to see next? collections? mcp extension? better federation protocol?

Daniel Nwaneri The Foundation • Feb 9

this is going to be great. first federation test with someone who actually understands activitypub protocol.

ive documented the deployment but let me know if anything is unclear. we can do a live test once yours is running.

also if you find protocol issues, raise them early. better to fix before 10+ instances deploy.

excited to see fosstodon community connected to this

sharon oliva • Feb 10

That sounds really interesting! Can you share a bit about the approach you used to tackle knowledge collapse and what kind of results you saw in those 21 days?

Daniel Nwaneri The Foundation • Feb 10

happy to break it down.

problem: stack overflow down 78%. best debugging sessions trapped in private AI chats.

approach:

save claude conversations as HTML
security scanner auto-blocks API keys/secrets
semantic search across all imported chats
activitypub federation (like mastodon for Q&A)

results in 21 days:

production system live
107 chunks imported, searchable
security scanner caught 599 issues (3 critical blocks)
richard deploying second instance (first federation test incoming)
open source: github.com/dannwaneri/chat-knowledge

key: make sharing safe + easy while respecting privacy

what interests you most about this?

Justin Elliott • Feb 12

Hi, I'm Justin, a full-stack engineer with a strong interest in backend systems, async programming, and AI.
I'm currently working a lot with APIs, distributed systems, and LLM-based projects.
Looking forward to learning from you all and contributing where I can.😉

Harsh • Feb 10

Really interesting perspective. The idea of tackling knowledge collapse in such a short timeframe is impressive. Looking forward to diving deeper into this.
