Kaushikcoderpy

🧠 NeuroDoc: From Broken Prototype to Production-Ready Async AI Documentation Engine

GitHub "Finish-Up-A-Thon" Challenge Submission

This is a submission for the GitHub Finish-Up-A-Thon Challenge

I abandoned this project. Then I resurrected it. Here's how a fragile CLI script became a full-stack async web dashboard with RAG capabilities.

System Execution Timeline

| Time Interval | Activity Description |
| --- | --- |
| 00:00 - 00:15 | Code Runner Demo execution |
| 00:15 - 00:35 | System command execution sequence |
| 00:35 - 00:45 | Dependency analysis performed on aiohttp |
| 00:45 - 00:55 | RAG (Retrieval-Augmented Generation) task initiated |
| 00:55 | System telemetry dashboard display |
| 00:55 - 01:20 | Waiting for RAG results |
| 01:21 - 01:28 | Backend log visualization |
| 01:29 - 01:41 | RAG query results display |

🧨 The Problem: Why It Was Abandoned

NeuroDoc started as an ambitious idea: a single tool to fetch, scrape, process, and summarize documentation across Python, scikit-learn, PyTorch, and TensorFlow, powered by NLP and multi-core processing.

But it hit a wall fast.

# The villain: a blocking synchronous loop that froze everything
while True:
    query = input("Enter query: ")  # 🚫 BLOCKS the main thread
    result = fetch_docs(query)      # 🚫 BLOCKS background workers
    print(result)

The original prototype had three fatal flaws:

| Problem | Impact |
| --- | --- |
| input() loop on main thread | Blocked all background scraping workers |
| In-memory task queue | All pending jobs vanished on crash |
| Brittle core resolver | Failed silently on dynamic imports |

Long-running doc crawls would stall. A single crash wiped the entire task queue. It was a house of cards: impressive from a distance, terrifying up close.

So I shelved it.


💡 The Comeback: What Changed

Months later, I came back with a clear head and a plan. The rewrite wasn't incremental; it was architectural. Three shifts made everything click:

1. 🔄 Full Async Rewrite with asyncio + aiohttp

Out went the blocking loop. In came a proper async event loop that lets scraping, processing, and serving happen concurrently without stepping on each other.

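import asyncio

import aiohttp

# DocResult and process_content are defined elsewhere in NeuroDoc.
async def fetch_documentation(url: str, session: aiohttp.ClientSession) -> DocResult:
    # Give up on any single page after 30 seconds in total.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as response:
        content = await response.text()
        return await process_content(content)

async def run_pipeline(queries: list[str]) -> list[DocResult | BaseException]:
    # Each query is passed to fetch_documentation as the URL to fetch.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_documentation(q, session) for q in queries]
        # return_exceptions=True hands back failures as exception objects
        # instead of aborting the whole batch.
        return await asyncio.gather(*tasks, return_exceptions=True)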

No more frozen terminals. No more stalled workers.
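
Because `run_pipeline` passes `return_exceptions=True`, failed fetches come back as exception objects mixed in with the successful results. A minimal sketch of separating the two follows; `split_results` and `fake_fetch` are hypothetical helpers, not part of NeuroDoc, and plain strings stand in for `DocResult`.

```python
import asyncio


def split_results(results):
    """Partition the output of asyncio.gather(..., return_exceptions=True)."""
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures


async def fake_fetch(name: str) -> str:
    # Stand-in for fetch_documentation: fails for one input on purpose.
    await asyncio.sleep(0)
    if name == "broken":
        raise ValueError(f"could not fetch {name}")
    return f"docs for {name}"


async def demo():
    results = await asyncio.gather(
        *(fake_fetch(n) for n in ["os", "broken", "json"]),
        return_exceptions=True,
    )
    return split_results(results)


successes, failures = asyncio.run(demo())
print(successes)      # ['docs for os', 'docs for json']
print(len(failures))  # 1
```

Keeping the failures around, rather than discarding them, makes it possible to log or retry the affected URLs later.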


2. 🗄️ Database-Backed Task Queue (Goodbye, Volatile Memory)

The in-memory queue was replaced with a persistent, database-backed task queue using PostgreSQL and asyncpg. Now if the server crashes at 3 AM while crawling PyTorch docs, no work is lost. Tasks resume exactly where they left off.

import uuid
from datetime import datetime

# DocumentationTask and TaskStatus are defined elsewhere in NeuroDoc;
# self.db is an asyncpg connection or pool.
class TaskQueue:
    async def enqueue(self, task: DocumentationTask) -> str:
        task_id = str(uuid.uuid4())
        # asyncpg takes query arguments positionally, matching $1..$4.
        await self.db.execute(
            "INSERT INTO tasks (id, status, payload, created_at) VALUES ($1, $2, $3, $4)",
            task_id, TaskStatus.PENDING, task.to_json(), datetime.utcnow(),
        )
        return task_id

    async def get_next(self) -> DocumentationTask | None:
        # fetchrow returns the oldest pending row, or None if the queue is empty.
        row = await self.db.fetchrow(
            "SELECT * FROM tasks WHERE status = 'pending' ORDER BY created_at LIMIT 1"
        )
        return DocumentationTask.from_row(row) if row else None
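
To make the crash-recovery idea concrete, here is a minimal sketch of a persistent queue that claims a task before working on it and requeues interrupted tasks on restart. It uses the standard-library `sqlite3` module as a stand-in so it runs without a database server; the `seq` column and the `claim_next` and `recover` helpers are assumptions for illustration, not NeuroDoc's actual schema or API.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    id      TEXT UNIQUE NOT NULL,
    status  TEXT NOT NULL,
    payload TEXT NOT NULL
)
"""


def enqueue(db: sqlite3.Connection, payload: str) -> str:
    task_id = str(uuid.uuid4())
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO tasks (id, status, payload) VALUES (?, 'pending', ?)",
            (task_id, payload),
        )
    return task_id


def claim_next(db: sqlite3.Connection):
    """Mark the oldest pending task as running and return (id, payload)."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM tasks WHERE status = 'pending' "
            "ORDER BY seq LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
    return row


def recover(db: sqlite3.Connection) -> int:
    """After a crash, put interrupted tasks back in the queue."""
    with db:
        cur = db.execute("UPDATE tasks SET status = 'pending' WHERE status = 'running'")
    return cur.rowcount


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
enqueue(db, "crawl pytorch")
enqueue(db, "crawl sklearn")
first = claim_next(db)
requeued = recover(db)  # simulate a restart while the first task was running
again = claim_next(db)
print(first[1], requeued, again[1])
```

In this sketch an interrupted task restarts from its beginning; resuming mid-task would need the worker to store progress in the payload as well.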

3. 🔍 RAG: Retrieval-Augmented Generation Layer

This is where NeuroDoc levels up from "scraper" to "intelligent documentation assistant."

Instead of returning raw docs, it:

  1. Chunks scraped content into semantic segments
  2. Embeds them into a vector store
  3. Retrieves the most relevant chunks for a query
  4. Generates a focused, context-aware summary

class RAGPipeline:
    async def query(self, user_query: str) -> RAGResponse:
        # Step 1: Embed the query
        query_embedding = await self.embedder.embed(user_query)

        # Step 2: Retrieve top-k relevant chunks
        relevant_chunks = await self.vector_store.similarity_search(
            query_embedding, top_k=5
        )

        # Step 3: Generate grounded summary
        context = "\n\n".join(chunk.text for chunk in relevant_chunks)
        summary = await self.llm.generate(
            prompt=f"Answer based on this documentation:\n{context}\n\nQuery: {user_query}"
        )

        return RAGResponse(summary=summary, sources=relevant_chunks)
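
The chunk, embed, and retrieve steps can be illustrated end to end with a toy example. This is only a sketch under loud assumptions: fixed-size word windows stand in for semantic chunking, a bag-of-words count vector stands in for a real embedding model, and all function names are hypothetical rather than NeuroDoc's.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 8) -> list[str]:
    """Split text into fixed-size word windows (stand-in for semantic chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by cosine similarity to the query and keep the best top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


docs = [
    "asyncio.gather runs awaitables concurrently and returns their results",
    "json.dumps serializes a Python object to a JSON formatted string",
    "os.path.join joins path components using the platform separator",
]
top = retrieve("how do I run awaitables concurrently", docs, top_k=1)
print(top)
```

The retrieved chunks would then be joined into the prompt context, as the `RAGPipeline.query` method above does.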

🏗️ Architecture Overview

┌─────────────────────────────────────────────────────┐
│               Web Dashboard (FastAPI)               │
│              ┌──────────┬──────────────┐            │
│              │  Submit  │   Results    │            │
│              │  Query   │   Viewer     │            │
│              └────┬─────┴──────┬───────┘            │
└───────────────────┼────────────┼────────────────────┘
                    │            │
          ┌─────────▼────────────▼──────────┐
          │      Async Task Dispatcher      │
          │    (asyncio + DB task queue)    │
          └──────┬──────────────────┬───────┘
                 │                  │
    ┌────────────▼────┐    ┌────────▼────────────┐
    │  Multi-core     │    │   RAG Pipeline      │
    │  Doc Scraper    │    │  (Embed → Retrieve  │
    │  (aiohttp)      │    │   → Generate)       │
    └────────┬────────┘    └────────┬────────────┘
             │                      │
    ┌────────▼──────────────────────▼─────────────┐
    │                PostgreSQL DB                │
    │   (tasks · chunks · embeddings · results)   │
    └─────────────────────────────────────────────┘

📦 Supported Documentation Sources

| Library | Sections Scraped | NLP Processing |
| --- | --- | --- |
| 🐍 Python | stdlib, builtins, language ref | Code extraction, summaries |
| 🤖 scikit-learn | API reference, user guide | Table parsing, param docs |
| 🔥 PyTorch | Tensor ops, nn, autograd | Code snippets, examples |
| 🌊 TensorFlow | Keras, tf.data, layers | API signatures, guides |

🚀 Getting Started

# Clone the repo
git clone https://github.com/kaushikcoderpy1/neurodoc
cd neurodoc

# Install dependencies
pip install -r requirements.txt

# Initialize the database
python -m neurodoc.db init

# Start the async dashboard
uvicorn neurodoc.app:app --reload --port 8000

Then open http://localhost:8000 and start querying.


🧪 Key Technical Decisions, and Why

Why asyncio over threading?
Doc scraping is I/O-bound (waiting on HTTP). asyncio handles thousands of concurrent requests on a single thread, with no GIL contention and far fewer race conditions, because tasks only switch at explicit await points.
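
A quick way to see the benefit for I/O-bound work is to simulate requests with `asyncio.sleep`. This is a sketch only: `fake_request` is a hypothetical stand-in for an HTTP call, and the request count and delay are arbitrary.

```python
import asyncio
import time


async def fake_request(delay: float) -> float:
    # asyncio.sleep stands in for waiting on an HTTP response.
    await asyncio.sleep(delay)
    return delay


async def main(n: int = 200, delay: float = 0.05) -> float:
    start = time.perf_counter()
    # All n waits overlap on one thread instead of running back to back.
    await asyncio.gather(*(fake_request(delay) for _ in range(n)))
    return time.perf_counter() - start


elapsed = asyncio.run(main())
print(f"200 simulated requests took {elapsed:.2f}s")
```

Run one after another, the same 200 waits would take about 10 seconds; overlapped on the event loop, the total stays close to the duration of a single wait.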

Why SQLite for the task queue instead of Redis?
Zero infrastructure. NeuroDoc is a dev tool, and adding a Redis dependency just to persist a queue adds friction. SQLite WAL mode handles concurrent reads/writes cleanly for this use case.
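
For a SQLite-backed queue, WAL is switched on per database file with a single pragma. A minimal sketch using the standard-library `sqlite3` module; the file name `queue.db` is an assumption for illustration.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; an in-memory one reports "memory" instead.
path = os.path.join(tempfile.mkdtemp(), "queue.db")
db = sqlite3.connect(path)

# The pragma returns the journal mode that is now in effect.
mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)
db.close()
```

The setting is stored in the database file, so later connections to the same file use WAL without repeating the pragma.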

Why RAG over fine-tuning?
Documentation changes constantly. RAG retrieves from live-scraped content. A fine-tuned model would be stale in weeks.



🤖 How GitHub Copilot Saved NeuroDoc: 4 Critical Bugs It Helped Crush

This section is the heart of the comeback story. NeuroDoc didn't just get rewritten; it got debugged at a deep architectural level with Copilot as a true pair programmer. Here are four real, production-blocking bugs it helped resolve.


🐛 Bug 1: Async Database Connection Pool Leaks Under Multi-Core Batches

The failure: Under high-concurrency loads via asyncio.gather, edge-case exceptions inside sub-coroutines bypassed connection release hooks, leaving asyncpg pool sockets exhausted and the app hanging silently.

Standard try/finally cleanup blocks failed because they referenced stale async contexts. The pool hit max capacity and froze.

How Copilot helped:

Copilot introduced a strict connection acquisition pattern bound directly to local transaction lifecycles, with absolute timeout guards:

# Copilot-suggested acquisition pattern
async with pool.acquire() as connection:
    async with connection.transaction():
        result = await asyncio.wait_for(
            connection.fetch(query, *args),
            timeout=5.0  # Hard boundary: no silent hangs
        )

It also added global exception wrappers that translate raw driver errors into clean structured responses, guaranteeing connection cleanup even if the downstream scraping pipeline crashed.
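
The reason the `async with` pattern closes the leak can be shown without a database. In this sketch, `TinyPool` is a hypothetical stand-in for a connection pool (it is not asyncpg); half of the jobs raise while holding a slot, and the pool still ends up empty.

```python
import asyncio
from contextlib import asynccontextmanager


class TinyPool:
    """Hypothetical stand-in for a connection pool with a fixed number of slots."""

    def __init__(self, size: int) -> None:
        self.in_use = 0
        self._slots = asyncio.Semaphore(size)

    @asynccontextmanager
    async def acquire(self):
        await self._slots.acquire()
        self.in_use += 1
        try:
            yield object()  # placeholder for a real connection
        finally:
            # Runs on success, exception, and cancellation alike.
            self.in_use -= 1
            self._slots.release()


async def job(pool: TinyPool, i: int) -> int:
    async with pool.acquire():
        if i % 2:
            raise RuntimeError(f"job {i} failed")
        # Hard upper bound on the work done while holding the slot.
        await asyncio.wait_for(asyncio.sleep(0.01), timeout=1.0)
        return i


async def main():
    pool = TinyPool(size=2)
    results = await asyncio.gather(
        *(job(pool, i) for i in range(6)), return_exceptions=True
    )
    return pool.in_use, results


in_use, results = asyncio.run(main())
print(in_use)  # 0: every slot was returned despite the failures
```

Because release happens in the context manager's `finally` block, no code path inside `job` can forget to give the slot back.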


🐛 Bug 2: SpecifierSet .contains() AttributeError Across Packaging Versions

The failure: formatter.py runs dependency diagnostics via DependencyAnalyzer. On environments with older packaging library versions, calling .contains() on a SpecifierSet threw:

AttributeError: 'SpecifierSet' object has no attribute 'contains'

This crashed the entire diagnostic panel before it could render, silently breaking environment validation for a large chunk of users.

How Copilot helped:

Copilot identified that .contains() is version-specific, but the native in operator is universally backward-compatible across all historical releases of packaging:

# ❌ Old failing code
elif not raw_spec.contains(local):

# ✅ Copilot's robust fix: works on every packaging version
elif local not in raw_spec:

One operator swap. Zero crashes across all environments.
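
For reference, this is how the `in` operator behaves on a `SpecifierSet`. The sketch assumes the third-party `packaging` library is installed, and the specifier string is a made-up requirement, not one taken from NeuroDoc.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=3.8,<4.0")  # hypothetical requirement

inside = Version("3.9.1") in spec
outside = Version("4.0.0") in spec
from_string = "3.8" in spec  # plain strings are parsed as versions

print(inside, outside, from_string)  # True False True
```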


🐛 Bug 3: Implicit String Mappings Breaking Single-Core CLI Dispatch

The failure: In neurodoc.py, CLI input like neurodoc fetch os passed the core ID "1" as a raw string into isinstance(core, Core1PythonBasics) checks. Since "1" is a string, every check silently fell through with:

Unknown core type for str

Worse, the topic "os" was passed into the batch resolver without list wrapping, so it iterated over the characters 'o' and 's' separately instead of treating "os" as a unified module name.

How Copilot helped:

Copilot introduced dynamic string dereferencing that maps string IDs back to their live handler instances, plus list-wrapping for topic encapsulation:

# Dynamic dereference: string → live core handler
if isinstance(core, str):
    core = self.command_handler.available_cores.get(core)

# Topic wrapped as list: no more character iteration
return await self.call_backend("core1", topics=[topic_f], flags=flags)
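
Both halves of this bug are easy to reproduce in isolation. The sketch below uses hypothetical names throughout (`PythonBasicsCore`, `AVAILABLE_CORES`, `resolve_core`, `normalize_topics`); it illustrates the two fixes and is not NeuroDoc's actual dispatch code.

```python
from typing import Iterable, Union


class PythonBasicsCore:
    """Hypothetical handler standing in for Core1PythonBasics."""

    def fetch(self, topics: list[str]) -> list[str]:
        return [f"docs:{t}" for t in topics]


AVAILABLE_CORES = {"1": PythonBasicsCore()}  # assumed registry: CLI id -> handler


def resolve_core(core):
    """Accept either a handler instance or its string id from the CLI."""
    if isinstance(core, str):
        try:
            return AVAILABLE_CORES[core]
        except KeyError:
            raise ValueError(f"Unknown core id: {core!r}") from None
    return core


def normalize_topics(topics: Union[str, Iterable[str]]) -> list[str]:
    """Wrap a lone string so it is not iterated character by character."""
    return [topics] if isinstance(topics, str) else list(topics)


print(list("os"))  # ['o', 's']: what the unwrapped string turned into
print(resolve_core("1").fetch(normalize_topics("os")))  # ['docs:os']
```

Raising on an unknown id, instead of returning `None` as a plain `.get()` would, surfaces a bad CLI argument at the point of lookup.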

🐛 Bug 4: NLP Tensor Dimension Mismatch in Cross-Encoder Similarity

The failure: nlp_with_cos.py calculates semantic similarity across documentation topics using PyTorch/TensorFlow models. Queries of varying lengths produced tensors with mismatched dimensions, throwing:

RuntimeError: Tensors must be of the same shape

This crashed deep multi-core fetches completely, and those are the most expensive operation in the entire pipeline.

How Copilot helped:

Copilot suggested a preprocessing step using dynamic zero-padding and truncation to align all input vectors before the cosine similarity matrix calculation:

# Copilot's shape-alignment fix
inputs = tokenizer(
    text,
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="pt"
)

All tensors now enter the similarity layer at identical dimensions: no shape mismatches, no crashes.
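
What padding and truncation do to the inputs can be sketched in plain Python. `pad_or_truncate` is a hypothetical helper and the token ids are made up; a real tokenizer does this internally and also returns an attention mask marking which positions are padding.

```python
def pad_or_truncate(token_ids: list[int], max_length: int, pad_id: int = 0) -> list[int]:
    """Return exactly max_length ids: cut long inputs, right-pad short ones."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


batch = [
    [101, 7, 8, 102],
    [101, 5, 102],
    [101, 1, 2, 3, 4, 5, 6, 102],
]
aligned = [pad_or_truncate(ids, max_length=6) for ids in batch]
for row in aligned:
    print(row)
# [101, 7, 8, 102, 0, 0]
# [101, 5, 102, 0, 0, 0]
# [101, 1, 2, 3, 4, 5]
```

Every row now has the same length, so the batch can be stacked into one tensor. Note that this naive truncation drops the closing token of the long input, which is one reason to rely on the tokenizer's own `truncation=True` handling in practice.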


💬 What Copilot Actually Felt Like as a Pair Programmer

These weren't simple autocomplete suggestions. Copilot reasoned about async lifecycle boundaries, cross-version API compatibility, type system edge cases, and linear algebra constraints: the kind of bugs that take hours of debugging to even locate, let alone fix.

The biggest unlock: it didn't just fix the symptom. For each bug, it explained why the original approach was fragile and offered a pattern that would hold up under production conditions.

That's the difference between a tool and a collaborator.


🔭 What's Next

  • [ ] Browser extension for one-click doc lookup
  • [ ] Streaming responses via WebSockets
  • [ ] Support for Hugging Face, Pandas, NumPy docs
  • [ ] Self-hosted embedding model (no API key required)
  • [ ] Export summaries as Jupyter notebooks

🔗 Links


Built for the DEV.to hackathon. Powered by stubbornness, async Python, and too much coffee.
