Most file conversion tools upload your files to a remote server, process them, and send them back. That means your data leaves your device — which can be a problem when you’re working with sensitive documents.
I wanted to explore a more privacy‑friendly approach: do as much as possible directly in the browser, and only fall back to server processing when the web platform can’t do the job well. This “Hybrid (Browser‑First)” model is what I used to build FastlyConvert — a multi‑format conversion & compression suite.
- Website: https://www.fastlyconvert.com
- Privacy policy: https://www.fastlyconvert.com/privacy
The Architecture: What Runs Where
Not every conversion can happen client‑side. Browsers are great at image operations, but heavier workloads (e.g., large video transcoding, speech recognition) still require server compute for good UX and reliability.
Here’s how I split the work today:
| Conversion Type | Where it Runs | Technology |
|---|---|---|
| Image format (JPG ↔ PNG ↔ WebP) | 100% browser | Canvas API + toBlob() |
| Image resize / compress | 100% browser | Canvas API + (OffscreenCanvas where available) |
| HEIC → JPG/PNG | 100% browser | WebAssembly (e.g., heic2any) |
| Video compression / conversion | Server‑side | FFmpeg |
| Audio format conversion | Server‑side | FFmpeg |
| Audio/Video → Text | Server‑side | Whisper (speech‑to‑text) |
| Text → Speech | Server‑side | OpenAI TTS (MP3 output) |
Rule of thumb: if the Web API can handle it natively, keep it in the browser. Everything else goes to the server with strict privacy controls (e.g., temporary storage + auto‑deletion).
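One way to make that rule concrete is a small capability check on the client, falling back to an upload when the needed Web APIs are missing. This is a sketch of the idea rather than FastlyConvert's actual routing code, and the task names are invented for illustration:

// Decide where a task should run. Task names here are illustrative.
function canRunInBrowser(task) {
  switch (task) {
    case "image-convert":
    case "image-resize":
      // Canvas encoding is all we need for basic image work.
      return typeof HTMLCanvasElement !== "undefined" &&
        typeof HTMLCanvasElement.prototype.toBlob === "function";
    case "heic-decode":
      // The HEIC decoder ships as WebAssembly.
      return typeof WebAssembly !== "undefined";
    default:
      // Video, audio and AI workloads go to the server.
      return false;
  }
}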
Client‑Side Image Conversion with Canvas API
The simplest conversion — say JPG → PNG — needs surprisingly little code:
async function convertImage(file, targetFormat) {
  // Canvas encoders expect "jpeg", but users often type "jpg".
  const mimeType = `image/${targetFormat === "jpg" ? "jpeg" : targetFormat}`;
  const url = URL.createObjectURL(file);
  const img = new Image();

  return new Promise((resolve, reject) => {
    img.onload = () => {
      const canvas = document.createElement("canvas");
      canvas.width = img.naturalWidth;
      canvas.height = img.naturalHeight;
      const ctx = canvas.getContext("2d");
      ctx.drawImage(img, 0, 0);
      canvas.toBlob(
        (blob) => {
          URL.revokeObjectURL(url);
          blob ? resolve(blob) : reject(new Error("Encoding failed"));
        },
        mimeType,
        0.92 // quality: only honored for lossy formats (JPEG/WebP)
      );
    };
    img.onerror = () => {
      URL.revokeObjectURL(url);
      reject(new Error("Could not decode image"));
    };
    img.src = url;
  });
}
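Wiring that into the page is just as small. A hypothetical usage sketch (the #file-input element id is made up for this example):

// Convert the selected file to PNG and trigger a download.
document.querySelector("#file-input").addEventListener("change", async (e) => {
  const blob = await convertImage(e.target.files[0], "png");
  const link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = "converted.png";
  link.click();
  // Revoke a moment later so the download has a chance to start.
  setTimeout(() => URL.revokeObjectURL(link.href), 1000);
});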
Key gotcha: the quality parameter only applies to lossy formats like JPEG and WebP. PNG is always lossless, so the quality argument is simply ignored for PNG output.
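Resizing and compression, which the table above also keeps fully client-side, follow the same pattern. Here is a rough sketch using createImageBitmap plus OffscreenCanvas when available, with a regular canvas fallback; the exact split is my reading of the "where available" note, not FastlyConvert's production code:

async function resizeImage(file, maxWidth, quality = 0.8) {
  const bitmap = await createImageBitmap(file);
  const scale = Math.min(1, maxWidth / bitmap.width);
  const width = Math.round(bitmap.width * scale);
  const height = Math.round(bitmap.height * scale);

  if (typeof OffscreenCanvas !== "undefined") {
    // OffscreenCanvas can encode without touching the DOM.
    const canvas = new OffscreenCanvas(width, height);
    canvas.getContext("2d").drawImage(bitmap, 0, 0, width, height);
    return canvas.convertToBlob({ type: "image/jpeg", quality });
  }

  // Fallback: a regular canvas element + toBlob.
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  canvas.getContext("2d").drawImage(bitmap, 0, 0, width, height);
  return new Promise((resolve) => canvas.toBlob(resolve, "image/jpeg", quality));
}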
Handling HEIC (iPhone Photos) in the Browser
HEIC has been the default photo format on iPhones for years, but most browsers can’t decode HEIC natively. For this, a WebAssembly approach works well.
I used heic2any (WASM‑based):
https://github.com/alexcorvi/heic2any
import heic2any from "heic2any";
async function convertHeic(file) {
const blob = await heic2any({
blob: file,
toType: "image/jpeg",
quality: 0.92,
});
return blob;
}
This runs entirely in the browser — no server upload needed — which is exactly the kind of task where browser‑first shines.
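One way to tie the two image paths together is to sniff HEIC up front and pick a decoder. A small sketch reusing the helpers above (the MIME/extension checks are an assumption about how HEIC files show up in practice):

// Some browsers report an empty MIME type for HEIC, so check the extension too.
function isHeic(file) {
  return (
    file.type === "image/heic" ||
    file.type === "image/heif" ||
    /\.(heic|heif)$/i.test(file.name)
  );
}

async function toJpeg(file) {
  return isHeic(file) ? convertHeic(file) : convertImage(file, "jpeg");
}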
Server‑Side: Video/Audio Processing with FFmpeg
For large video compression and audio transcoding, the browser can do it in theory (WASM FFmpeg exists), but in practice it’s often:
- too slow on low‑end devices,
- too memory‑heavy for big files,
- and hard to make reliable across browsers.
So I run FFmpeg server‑side for video/audio tasks, and focus on:
- clear presets (e.g., quality vs size),
- predictable outputs (e.g., MP4 H.264 as the default),
- and privacy policies (auto‑deletion).
Example UX pattern that worked well: give users 3–4 “compression modes” (High Quality / Balanced / Max Compress) instead of asking them to tune bitrate and CRF on day one.
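As a sketch of what those modes can map to under the hood, here is a Node-flavored wrapper around the ffmpeg CLI. The CRF and preset values are illustrative assumptions, not FastlyConvert's production settings, and it assumes ffmpeg is available on the server's PATH:

const { spawn } = require("node:child_process");

// Each user-facing mode maps to a fixed H.264 recipe.
const PRESETS = {
  "high-quality": ["-c:v", "libx264", "-crf", "20", "-preset", "slow"],
  "balanced": ["-c:v", "libx264", "-crf", "23", "-preset", "medium"],
  "max-compress": ["-c:v", "libx264", "-crf", "28", "-preset", "fast"],
};

function compressVideo(inputPath, outputPath, mode = "balanced") {
  return new Promise((resolve, reject) => {
    const args = ["-i", inputPath, ...PRESETS[mode], "-c:a", "aac", "-y", outputPath];
    const ff = spawn("ffmpeg", args);
    ff.on("error", reject);
    ff.on("close", (code) =>
      code === 0 ? resolve(outputPath) : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}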
Server‑Side: AI Transcription with Whisper (Audio/Video → Text)
For speech recognition, browser‑only options still don’t match Whisper’s quality and language coverage at scale.
The key architectural decisions I made:
- Process only what the user requests (no extra analysis).
- Auto‑delete uploaded files after a short retention window.
- Keep the API simple, and return clean text + language metadata.
Pseudo‑code sketch:
import whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
whisper_model = whisper.load_model("base")  # model size is illustrative

# save_temp, schedule_delete and detect_language are placeholders for
# whatever temp-file and cleanup strategy the backend uses.

@app.post("/api/transcribe")
async def transcribe(file: UploadFile):
    temp_path = save_temp(file)
    # Schedule deletion so uploads never outlive the retention window.
    schedule_delete(temp_path, hours=24)
    result = whisper_model.transcribe(
        temp_path,
        task="transcribe",
        language=detect_language(temp_path),
    )
    return {"text": result["text"], "language": result["language"]}
i18n: Supporting 7 Languages Without a Framework
FastlyConvert is plain HTML + vanilla JS (no React/Next). For i18n, I used a simple attribute‑based approach:
document.querySelectorAll("[data-i18n]").forEach((el) => {
const key = el.getAttribute("data-i18n");
el.textContent = translations[currentLang][key] || el.textContent;
});
This supports:
- English (en)
- French (fr)
- Japanese (ja)
- Spanish (es)
- Portuguese (pt)
- Simplified Chinese (zh-CN)
- Traditional Chinese (zh-TW)
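The translation data itself is just a nested object keyed by language code, and switching languages re-runs the same loop. The dictionary shape and setLanguage helper below are assumptions for illustration; only the [data-i18n] lookup comes from the snippet above:

// Illustrative structure; real keys and strings will differ.
const translations = {
  en: { "hero.title": "Convert files in your browser" },
  fr: { "hero.title": "Convertissez vos fichiers dans le navigateur" },
  // ...ja, es, pt, "zh-CN", "zh-TW"
};

let currentLang = localStorage.getItem("lang") || "en";

function setLanguage(lang) {
  if (!translations[lang]) return;
  currentLang = lang;
  localStorage.setItem("lang", lang);
  document.querySelectorAll("[data-i18n]").forEach((el) => {
    const key = el.getAttribute("data-i18n");
    el.textContent = translations[lang][key] || el.textContent;
  });
}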
No build step. No runtime framework. Just static pages + lightweight scripts.
Performance: Why No Build Step
Yes — it’s “old school”, but it’s extremely effective for SEO‑driven tool pages:
- Users land directly on specific tools (e.g., /video-compressor, /text-to-speech)
- Static files are edge-cached (Vercel/CDN)
- No framework hydration overhead
- Fewer moving parts = fewer deploy surprises
The tradeoff is duplication across many HTML pages, but for a conversion suite where speed + SEO matter most, it’s a tradeoff I’m happy with.
Key Takeaways
- Use Canvas API for common image conversions — no server needed.
- WebAssembly (like heic2any) unlocks formats browsers can't decode.
- For heavy tasks, a hybrid browser-first approach gives better UX than "all-server" or "all-WASM".
- Auto‑deletion policies are non‑negotiable for file processing products.
- Vanilla HTML/JS can still win on performance for tool‑style websites.
If you’re building something similar, feel free to check out FastlyConvert and see these patterns in action:
https://www.fastlyconvert.com
What’s your approach to client‑side file processing? Have you shipped a WASM converter in production?