In the world of healthcare, data privacy isn't just a "nice to have" - it's a legal and ethical imperative. Yet, most modern AI solutions rely on the cloud, sending sensitive patient audio and transcripts to distant servers for processing.
Today, I'm sharing how I built a 100% Offline Medical Scribe that transcribes consultations and generates structured SOAP notes entirely within the browser. No cloud, no API keys, no network latency: just pure on-device power.
The Vision: Privacy at the Edge
The goal was simple: Create a tool that allows a doctor to record a consultation and walk away with a professional SOAP note (Subjective, Objective, Assessment, Plan) without a single byte of data leaving the room.
To achieve this, I leveraged the RunAnywhere SDK, a high-performance WASM and WebGPU-based AI runtime that brings "heavy" AI models into the web environment.
The Architecture: A Multi-Model Orchestration
A medical scribe isn't just one model; it's a pipeline. The application coordinates three distinct AI components (a sketch of the full flow follows the list):
- Voice Activity Detection (VAD): I used Silero VAD (running on ONNX) to detect when speech is happening. This saves processing power and improves transcription accuracy by ignoring silence and background noise.
- Speech-to-Text (STT): For transcription, I implemented a Zipformer model (via sherpa-onnx). This model is highly optimized for streaming, allowing me to show the doctor a live transcript as they speak.
- Large Language Model (LLM): Once the consultation is over, an optimized Llama 3.2 1B model analyzes the transcript to generate the structured medical note.
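To make the orchestration concrete, here is a minimal sketch of how the three stages chain together. The `VadService`, `SttService`, and `LlmService` interfaces are hypothetical stand-ins for illustration, not the actual RunAnywhere SDK API:

```typescript
// Hypothetical interfaces standing in for the real VAD/STT/LLM services.
interface VadService { isSpeech(frame: Float32Array): Promise<boolean>; }
interface SttService { transcribe(frame: Float32Array): Promise<string>; }
interface LlmService { generate(prompt: string): Promise<string>; }

async function runConsultation(
  frames: AsyncIterable<Float32Array>, // PCM chunks from the microphone
  vad: VadService,
  stt: SttService,
  llm: LlmService,
): Promise<string> {
  const transcriptParts: string[] = [];

  for await (const frame of frames) {
    // Stage 1: VAD skips silence and background noise.
    if (!(await vad.isSpeech(frame))) continue;

    // Stage 2: speech frames stream into the transcriber,
    // so a live transcript can be shown as the doctor speaks.
    transcriptParts.push(await stt.transcribe(frame));
  }

  // Stage 3: after the consultation, the LLM turns the
  // full transcript into a structured SOAP note.
  const transcript = transcriptParts.join(" ");
  return llm.generate(
    `Generate a SOAP note as JSON from this consultation transcript:\n${transcript}`,
  );
}
```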
Breaking the Storage Barrier with OPFS
Shipping gigabyte-sized AI models to a browser is usually a nightmare. The Origin Private File System (OPFS) solves this.
Unlike standard browser storage, OPFS provides a dedicated, native-speed filesystem. The RunAnywhere SDK uses this to cache models locally. After a doctor downloads the models once, the app loads them at near-native disk speeds on every subsequent visit.
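The caching pattern itself is plain Web platform code. Here's a minimal sketch of a download-once OPFS model cache (the function name and URL handling are illustrative, not the SDK's internals):

```typescript
// Cache a model file in the Origin Private File System, downloading it
// only if it is not already present.
async function getCachedModel(name: string, url: string): Promise<ArrayBuffer> {
  const root = await navigator.storage.getDirectory();

  try {
    // Cache hit: read the previously stored model at native disk speed.
    const handle = await root.getFileHandle(name);
    const file = await handle.getFile();
    return file.arrayBuffer();
  } catch {
    // Cache miss: download once, then persist to OPFS for next time.
    const bytes = await (await fetch(url)).arrayBuffer();
    const handle = await root.getFileHandle(name, { create: true });
    const writable = await handle.createWritable();
    await writable.write(bytes);
    await writable.close();
    return bytes;
  }
}
```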
Solving the "Sloppy JSON" Problem
One challenge of running smaller 1B-parameter models locally is that they can sometimes be "imperfect" with their formatting. They might cut off a quote or forget a closing brace.
To make this production-ready, I built a Resilient JSON Repair layer. Instead of crashing when the AI output is malformed, the app uses a "Keyword Seek" algorithm that finds the core medical data regardless of the surrounding syntax. This makes the experience rock-solid for the clinician.
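The post doesn't reproduce the repo's repair code, but the idea can be sketched like this: try strict parsing first, then fall back to seeking each expected SOAP key by name and reading its value even if the JSON around it is broken. The `extractField` helper below is an illustrative assumption, not the actual implementation:

```typescript
// Fallback extraction: locate a known key (e.g. "assessment") anywhere in
// the model output and read its string value, tolerating broken syntax.
function extractField(raw: string, key: string): string | null {
  const keyIndex = raw.indexOf(`"${key}"`);
  if (keyIndex === -1) return null;

  // Find the opening quote of the value after the colon.
  const colon = raw.indexOf(":", keyIndex);
  const start = raw.indexOf('"', colon + 1);
  if (colon === -1 || start === -1) return null;

  // Read until an unescaped closing quote, or the end of the output
  // if the model cut the string off mid-value.
  let value = "";
  for (let i = start + 1; i < raw.length; i++) {
    if (raw[i] === '"' && raw[i - 1] !== "\\") break;
    value += raw[i];
  }
  return value;
}

function parseSoapNote(raw: string): Record<string, string> {
  try {
    return JSON.parse(raw); // happy path: the model emitted valid JSON
  } catch {
    // Keyword seek: recover each section independently.
    const note: Record<string, string> = {};
    for (const key of ["subjective", "objective", "assessment", "plan"]) {
      note[key] = extractField(raw, key) ?? "";
    }
    return note;
  }
}
```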
Why RunAnywhere SDK?
Building this from scratch would have required months of complex WASM glue code and worker management. The RunAnywhere SDK made it trivial by:
- Handling the multi-threading of WASM modules.
- Automatically optimizing for WebGPU if available, with CPU fallback (see the detection sketch below).
- Providing a unified API for STT, VAD, and LLM services.
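The WebGPU fallback boils down to a capability check like the following. This is a generic sketch of the pattern using standard Web APIs, not the SDK's internal code:

```typescript
// Detect WebGPU support; fall back to the WASM/CPU path otherwise.
// (Assumes @webgpu/types for the `navigator.gpu` typing.)
async function pickBackend(): Promise<"webgpu" | "cpu"> {
  if (!navigator.gpu) return "cpu";

  // requestAdapter() resolves to null when no suitable GPU is available.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter ? "webgpu" : "cpu";
}
```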
The Offline Medical Scribe is more than just a demo; it's a showcase for the future of specialized AI. By moving the processing to the edge, we eliminate privacy risks, reduce server costs to zero, and create a resilient tool that works even in hospitals with poor connectivity.
Check out the code and start building your own private AI tools today!
Credits
- GitHub Repo: https://github.com/harishkotra/offline-medical-scribe
- AI Runtime: RunAnywhere SDK