DEV Community

trantanhau


How I built privacy-first file tools that run AI models directly in your browser

Most online file tools work the same way: you upload your file, their server processes it, they send it back. Simple — but every upload is a privacy risk.

I wanted to build something different. The result is LocalMediaKit: 25+ file-processing tools where, for most operations, your file never leaves your browser.

Here's what I learned building it.

Running AI models in the browser

The background removal tool uses U2-Net-P (a lightweight variant of U2-Net) via ONNX Runtime Web. The model runs entirely in the browser — no GPU server, no API calls.

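import * as ort from 'onnxruntime-web';

// Serve the ONNX Runtime WASM binary and its JS loader from our own origin
ort.env.wasm.wasmPaths = {
  wasm: '/models/ort-wasm-simd-threaded.wasm',
  mjs: '/models/ort-wasm-simd-threaded.mjs',
};

// modelBuffer holds the U2-Net-P .onnx bytes (an ArrayBuffer or Uint8Array)
// 'wasm' selects the CPU WebAssembly backend, so no GPU is required
const session = await ort.InferenceSession.create(modelBuffer, {
  executionProviders: ['wasm'],
});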

The model is downloaded once and cached in IndexedDB rather than the Service Worker Cache API, because IndexedDB is better suited to large binary files. Subsequent uses skip the download and read the model straight from IndexedDB.
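A minimal sketch of that download-once pattern, assuming a single object store keyed by model name and version. The database name, store name, key format, and function names here are hypothetical, not LocalMediaKit's actual code.

```javascript
// Hypothetical names: the database, store, and key format are illustrative only.
const DB_NAME = 'model-cache';
const STORE = 'models';

// Build a versioned key so a new model release does not reuse stale bytes.
function modelKey(name, version) {
  return `${name}@${version}`;
}

// Open the database, creating the object store on first use.
function openDb() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1);
    req.onupgradeneeded = () => req.result.createObjectStore(STORE);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Run a single request against the store and resolve with its result.
function runRequest(db, mode, makeRequest) {
  return new Promise((resolve, reject) => {
    const store = db.transaction(STORE, mode).objectStore(STORE);
    const req = makeRequest(store);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Return the model bytes, hitting the network only when nothing is cached.
async function loadModel(url, name, version) {
  const db = await openDb();
  const key = modelKey(name, version);
  const cached = await runRequest(db, 'readonly', (s) => s.get(key));
  if (cached) return cached;

  const res = await fetch(url);
  if (!res.ok) throw new Error(`Model download failed: ${res.status}`);
  const bytes = await res.arrayBuffer();
  await runRequest(db, 'readwrite', (s) => s.put(bytes, key));
  return bytes;
}
```

The returned ArrayBuffer can be passed to ort.InferenceSession.create in place of modelBuffer. Putting the version in the key is one simple way to invalidate the cache when the model file changes.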

For large images, tile-based processing prevents out-of-memory crashes: split the image into overlapping tiles, process each one independently, then stitch the results back together.
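The tile layout can be sketched as a small pure function. This is an illustration under assumed parameters (the tile size and overlap are picked by the caller, and computeTiles is a hypothetical name), not the exact tiling LocalMediaKit uses.

```javascript
// Hypothetical helper: compute overlapping tile rectangles that cover an image.
// Interior tiles advance by (tileSize - overlap); the last tile in each row and
// column is shifted back so it ends exactly at the image border.
function computeTiles(width, height, tileSize, overlap) {
  if (overlap >= tileSize) throw new Error('overlap must be smaller than tileSize');
  const step = tileSize - overlap;

  // Start offsets along one axis of the given length.
  const starts = (length) => {
    const out = [];
    for (let pos = 0; ; pos += step) {
      if (pos + tileSize >= length) {
        out.push(Math.max(0, length - tileSize));
        break;
      }
      out.push(pos);
    }
    return out;
  };

  const w = Math.min(tileSize, width);
  const h = Math.min(tileSize, height);
  const tiles = [];
  for (const y of starts(height)) {
    for (const x of starts(width)) {
      tiles.push({ x, y, w, h });
    }
  }
  return tiles;
}
```

Each rectangle can be cropped with a canvas, run through the model on its own, and written back at the same offset. How the overlapping strips are blended when stitching is a separate choice and is left out here.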

PDF processing in the browser

Merge, split, compress, sign — all handled client-side with pdf-lib. The PDF never leaves the browser:

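import { PDFDocument } from 'pdf-lib';

// files: the File objects the user selected, in merge order
const merged = await PDFDocument.create();
for (const file of files) {
  // Parse each PDF from memory; nothing is uploaded
  const doc = await PDFDocument.load(await file.arrayBuffer());
  // Copy every page into the output document, preserving order
  const pages = await merged.copyPages(doc, doc.getPageIndices());
  pages.forEach(page => merged.addPage(page));
}
// Serialize the merged document to a Uint8Array
const output = await merged.save();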

Where I had to compromise

Office→PDF and PDF→Word require LibreOffice, and running it inside a browser tab isn't practical.

For these tools, files go through a server I control. Files are processed and immediately discarded, never stored. I'm upfront about this in the UI so users can make an informed choice.

Three things I learned

1. WASM multithreading headers are tricky.
SharedArrayBuffer (needed for multi-threaded WASM) is only available on cross-origin isolated pages, which typically means sending Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp. The catch: these headers can break OAuth popups and third-party embeds. Scope them to the routes that need WASM threading instead of setting them globally.
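One way to scope the headers is a small framework-agnostic helper that the server calls per request. The route prefix, the function names, and the thread cap of 4 below are assumptions for illustration; only the header names and values are standard.

```javascript
// Hypothetical route prefixes: only these pages load multi-threaded WASM.
const ISOLATED_PREFIXES = ['/tools/remove-background'];

// Headers that make a page cross-origin isolated, which enables SharedArrayBuffer.
const ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Return the extra headers for a request path, or an empty object elsewhere.
function isolationHeadersFor(pathname) {
  const needsIsolation = ISOLATED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + '/')
  );
  return needsIsolation ? { ...ISOLATION_HEADERS } : {};
}

// Client side: use one thread when the page is not isolated.
// The cap of 4 threads is an arbitrary illustrative limit.
function pickThreadCount(isIsolated, cores) {
  return isIsolated ? Math.max(1, Math.min(cores, 4)) : 1;
}
```

On the page, the result of pickThreadCount(self.crossOriginIsolated, navigator.hardwareConcurrency) could be assigned to ort.env.wasm.numThreads before creating the session, so routes without the headers quietly fall back to single-threaded WASM.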

2. IndexedDB beats Cache API for large models.
The Service Worker Cache API is subject to per-origin storage limits and eviction policies that can remove large model files unexpectedly. IndexedDB falls under the same per-origin quota, but it gives more direct control over how large binary assets are stored and read back.
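The browser's Storage API can help with both concerns: navigator.storage.estimate() reports usage and quota for the origin, and navigator.storage.persist() asks the browser not to evict the origin's data automatically. A hedged sketch follows; the helper names and the 50 MB headroom are assumptions.

```javascript
// Pure check: is there room for a model of `modelBytes` plus some headroom?
// `estimate` has the shape returned by navigator.storage.estimate().
function hasRoomFor(estimate, modelBytes, headroomBytes = 50 * 1024 * 1024) {
  const free = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  return free >= modelBytes + headroomBytes;
}

// Request persistent storage (the browser may decline) and check free space.
async function prepareStorage(modelBytes) {
  if (typeof navigator === 'undefined' || !navigator.storage) {
    // Storage API unavailable: space is unknown, so do not block the download.
    return { persisted: false, hasRoom: true };
  }
  const persisted = navigator.storage.persist
    ? await navigator.storage.persist()
    : false;
  const estimate = await navigator.storage.estimate();
  return { persisted, hasRoom: hasRoomFor(estimate, modelBytes) };
}
```

Calling prepareStorage before the first model download lets the UI warn the user when space is tight, rather than failing partway through a write.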

3. Vanilla JS scales better than expected.
No hydration overhead, no virtual DOM, no bundle size fighting. The whole app stays lean and fast without a framework.


If you're curious about the client-side AI pipeline, the tile-based processing approach, or the tradeoffs of running LibreOffice on a controlled server vs full client-side alternatives, I'm happy to go deeper in the comments.

LocalMediaKit is free to try.
