Harish Kotra (he/him)
Building a Private, Local-First AI Scribe with RunAnywhere SDK

In the world of healthcare, data privacy isn't just a "nice to have" - it's a legal and ethical imperative. Yet, most modern AI solutions rely on the cloud, sending sensitive patient audio and transcripts to distant servers for processing.

Today, I'm sharing how I built a 100% Offline Medical Scribe that transcribes consultations and generates structured SOAP notes entirely within the browser. No cloud, no API keys, no latency—just pure on-device power.

The Vision: Privacy at the Edge

The goal was simple: Create a tool that allows a doctor to record a consultation and walk away with a professional SOAP note (Subjective, Objective, Assessment, Plan) without a single byte of data leaving the room.

To achieve this, I leveraged the RunAnywhere SDK, a high-performance WASM and WebGPU-based AI runtime that brings "heavy" AI models into the web environment.

The Architecture: A Multi-Model Orchestration

A medical scribe isn't just one model; it's a pipeline. The application coordinates three distinct AI components:

  1. Voice Activity Detection (VAD): I used Silero VAD (running on ONNX) to detect when speech is happening. This saves processing power and improves transcription accuracy by ignoring silence and background noise.
  2. Speech-to-Text (STT): For transcription, I implemented Whisper Zipformer (via sherpa-onnx). This model is highly optimized for streaming, allowing me to show the doctor a live transcript as they speak.
  3. Large Language Model (LLM): Once the consultation is over, an optimized Llama 3.2 1B model analyzes the transcript to generate the structured medical note.
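Conceptually, the orchestration of those three stages looks like the sketch below. This is my own minimal TypeScript composition, not the RunAnywhere SDK's actual API: the VAD, STT, and LLM are passed in as plain functions so the flow is easy to follow.

```typescript
// Illustrative pipeline sketch; these type names and the function are mine,
// not the RunAnywhere SDK API.
type VAD = (chunk: Float32Array) => boolean;
type STT = (chunk: Float32Array) => Promise<string>;
type LLM = (transcript: string) => Promise<string>;

async function runConsultation(
  chunks: Float32Array[],
  vad: VAD,
  stt: STT,
  llm: LLM
): Promise<{ transcript: string; note: string }> {
  const parts: string[] = [];
  for (const chunk of chunks) {
    if (!vad(chunk)) continue;     // VAD gate: skip silence and background noise
    parts.push(await stt(chunk));  // stream partial transcripts to the UI
  }
  const transcript = parts.join(" ");
  const note = await llm(transcript); // one LLM pass generates the SOAP note
  return { transcript, note };
}
```

The key design point is that each stage only runs when the previous one says it should: the VAD gates the (expensive) STT, and the LLM runs once at the end rather than on every chunk.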

Breaking the Storage Barrier with OPFS

Shipping GB-sized AI models to a browser is usually a nightmare. The app solves this with the Origin Private File System (OPFS).

Unlike standard browser storage, OPFS provides a dedicated, native-speed filesystem. The RunAnywhere SDK uses this to cache models locally. After a doctor downloads the models the first time, the app loads them at near-native disk speed on every subsequent visit.
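Here's a minimal sketch of that caching pattern using the standard OPFS APIs. This shows the general technique, not the SDK's internals; `cacheNameFor` and `loadModelCached` are my own illustrative helpers.

```typescript
// General OPFS caching pattern (standard browser APIs). `cacheNameFor` and
// `loadModelCached` are illustrative helpers, not the SDK's real internals.
function cacheNameFor(url: string): string {
  // Derive a stable filename from the model URL.
  return url.split("/").pop() ?? "model.bin";
}

async function loadModelCached(url: string): Promise<ArrayBuffer> {
  // OPFS root; typed loosely so this also compiles outside the browser.
  const root = await (globalThis as any).navigator.storage.getDirectory();
  const name = cacheNameFor(url);
  try {
    // Cache hit: read the model back at near-native disk speed.
    const handle = await root.getFileHandle(name);
    return await (await handle.getFile()).arrayBuffer();
  } catch {
    // Cache miss: download once, then persist for every later visit.
    const bytes = await (await fetch(url)).arrayBuffer();
    const handle = await root.getFileHandle(name, { create: true });
    const writable = await handle.createWritable();
    await writable.write(bytes);
    await writable.close();
    return bytes;
  }
}
```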

Solving the "Sloppy JSON" Problem

One challenge of using smaller, 1B-parameter models locally is that they can sometimes be "imperfect" with their formatting. They might cut off a quote or forget a closing brace.

To make this production-ready, I built a Resilient JSON Repair layer. Instead of crashing when the AI output is malformed, the app uses a "Keyword Seek" algorithm that finds the core medical data regardless of the surrounding syntax. This makes the experience rock-solid for the clinician.
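Here's one way such a keyword-seek layer can work; this is my own sketch of the idea, not the app's exact code. Rather than trusting `JSON.parse`, it locates each known SOAP key and captures the text up to the next key, tolerating stray quotes and missing braces.

```typescript
// Sketch of a keyword-seek repair layer (my interpretation, not the app's code):
// locate each known SOAP key in the raw LLM output and capture the text up to
// the next key, instead of requiring well-formed JSON.
const SOAP_KEYS = ["subjective", "objective", "assessment", "plan"] as const;

function extractSoap(raw: string): Record<string, string> {
  const note: Record<string, string> = {};
  const lower = raw.toLowerCase();
  for (const key of SOAP_KEYS) {
    const start = lower.indexOf(key);
    if (start === -1) continue;               // key missing: leave the field out
    // The value runs from just after the key to the next known key (or the end).
    let end = raw.length;
    for (const other of SOAP_KEYS) {
      const pos = lower.indexOf(other, start + key.length);
      if (pos !== -1 && pos < end) end = pos;
    }
    note[key] = raw
      .slice(start + key.length, end)
      .replace(/^[\s":]+|[\s",}\]{]+$/g, ""); // strip surrounding JSON debris
  }
  return note;
}
```

Even for input with an unescaped quote and a missing closing brace, the medical content survives, which is the whole point of the repair layer.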

Why RunAnywhere SDK?

Building this from scratch would have required months of complex WASM glue code and worker management. The RunAnywhere SDK made it trivial by:

  • Handling the multi-threading of WASM modules.
  • Automatically optimizing for WebGPU if available, with CPU fallback.
  • Providing a unified API for STT, VAD, and LLM services.
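The WebGPU-with-CPU-fallback decision boils down to a capability check like the one below. This is a simplified sketch of the underlying check (the SDK performs this selection internally); in a browser you would pass in `navigator.gpu`.

```typescript
// Simplified backend selection; the SDK does this internally. `GPU` mirrors
// the shape of the real `navigator.gpu` object.
type GPU = { requestAdapter(): Promise<unknown | null> };

async function pickBackend(gpu: GPU | undefined): Promise<"webgpu" | "wasm-cpu"> {
  // navigator.gpu only exists in WebGPU-capable browsers; a null adapter
  // also means no usable GPU, so fall back to the WASM CPU path.
  const adapter = gpu ? await gpu.requestAdapter() : null;
  return adapter ? "webgpu" : "wasm-cpu";
}

// In the browser: const backend = await pickBackend((navigator as any).gpu);
```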

The Offline Medical Scribe is more than just a demo; it's a showcase for the future of specialized AI. By moving the processing to the edge, we eliminate privacy risks, reduce server costs to zero, and create a resilient tool that works even in hospitals with poor connectivity.

Check out the code and start building your own private AI tools today!

