I Built an AI Reply, Enhance & Translate Assistant, Browser Extension + Web App + Mobile (Open Source)

Mahmud Rahman

Most developers have, at some point, written the same reply template for the tenth time that week. Or slowly rewritten a perfectly fine email into something that sounds more professional. Or copied text into Google Translate, copied the output back, and wondered why this is still a manual process in 2025.

I got tired of it. So I built something.

Smart Reply, Enhance & Translate is a full-stack AI tool suite — a browser extension, a React web app, and a Flutter mobile app — all backed by a single Node.js server that talks to LLMs via OpenRouter. The whole project is open source, and this post is a walkthrough of what it does, how it's built, and why I made some of the architectural decisions I did.


What It Does

The project has three modes, each available across all three client interfaces:

Smart Reply — you paste a message you've received, and the tool generates 4 context-aware reply suggestions. Useful for support tickets, emails, DMs, anything where you're responding to someone else.

Smart Enhance — you paste your own draft text and get 4 improved variations. Think grammar correction, better clarity, tighter structure, more natural flow. Like Grammarly, but generating alternatives instead of correcting inline.

Smart Translate — paste text, pick a target language, and get 4 translated variations in your chosen tone. The tool supports English, Spanish, French, German, Chinese, Arabic, Bengali, and more.

For all three modes, you can pick from 6 response formats: Professional, Friendly, Casual, Formal, Flirty, and Romantic. Each one changes not just vocabulary but the entire register and feel of the output.


Architecture Overview

smart-reply/
├── backend/           # Node.js + Express — the shared API layer
├── extension/         # Chrome/Firefox (Manifest V3) — vanilla JS
├── frontend/          # React + Zustand + Tailwind + Framer Motion
└── smart_reply_app/   # Flutter + Provider — Android

Three clients, one backend. Every client hits the same three endpoints, and the backend handles model selection, prompt construction, caching, and response formatting.

This was an intentional decision. I didn't want three separate AI integrations to maintain. The backend owns all the LLM logic, and the clients stay thin.


The Backend

The Express server exposes three endpoints:

POST /api/suggest-reply     — Smart Reply mode
POST /api/enhance-text      — Smart Enhance mode
POST /api/translate-text    — Smart Translate mode
GET  /health                — Health check

Each endpoint receives a text payload and a format string, constructs a structured prompt, calls OpenRouter, and returns an array of suggestions/enhancements/translations.
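
In sketch form, each handler looks something like this. The helper names (buildReplyPrompt, callOpenRouter) are illustrative, not copied from the repo:

// Minimal sketch of the shared endpoint shape; helper names are
// illustrative, not taken from the actual source.
app.post('/api/suggest-reply', async (req, res) => {
  const { message, format } = req.body;
  if (!message || !format) {
    return res.status(400).json({ error: 'message and format are required' });
  }
  const prompt = buildReplyPrompt(message, format); // structured prompt per mode + format
  const suggestions = await callOpenRouter(prompt); // returns an array of 4 variations
  res.json({ suggestions });
});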

Model Selection Strategy

One thing I wanted to avoid was routing every request to the same model. Different operations benefit from different model strengths. In v0.4, the backend selects from a priority-ordered list of free OpenRouter models per operation type:

// Simplified — actual implementation uses a weighted selector
const models = {
  suggest:   ['xiaomi/mimo-v2-flash:free', 'tngtech/deepseek-r1t2-chimera:free', 'openai/gpt-oss-20b:free'],
  enhance:   ['tngtech/deepseek-r1t2-chimera:free', 'xiaomi/mimo-v2-flash:free', 'openai/gpt-oss-20b:free'],
  translate: ['openai/gpt-oss-20b:free', 'xiaomi/mimo-v2-flash:free', 'tngtech/deepseek-r1t2-chimera:free'],
};

The system is LLM-agnostic. If you have a paid OpenRouter key, you can swap in any model you want. The backend doesn't care.
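
The lists above only show priority order. A minimal sketch of how priority-ordered fallback could work (the actual implementation adds weighting on top of this; callOpenRouter is a placeholder):

// Sketch: walk the priority list for an operation, falling back to the
// next model when one errors out. The real selector weights the choice;
// this shows only the fallback behavior.
async function callWithFallback(operation, prompt) {
  for (const modelId of models[operation]) {
    try {
      return await callOpenRouter(prompt, modelId); // placeholder OpenRouter client
    } catch (err) {
      console.warn(`${modelId} failed, trying next model:`, err.message);
    }
  }
  throw new Error(`All models failed for operation: ${operation}`);
}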


The Caching System (v0.4)

This was the most meaningful engineering work in v0.4. Before it, every request hit OpenRouter — even if the user had just asked the exact same question 30 seconds ago.

The new caching layer uses SHA256 hashing to generate collision-resistant cache keys from the full prompt + model identifier. Entries expire after 5 minutes (TTL).

const crypto = require('crypto');
const sha256 = (s) => crypto.createHash('sha256').update(s).digest('hex');

const cacheKey = sha256(prompt + modelId);

if (cache.has(cacheKey)) {
  return cache.get(cacheKey); // instant response, no OpenRouter call
}

const result = await callOpenRouter(prompt, modelId);
cache.set(cacheKey, result, { ttl: 300 }); // expire after 5 minutes
return result;

Why SHA256 and not just a string hash?

Simple string hashes (a 32-bit checksum, say) collide at scale: two different prompts can map to the same key, which means one user could get another user's cached response. With SHA256, the collision probability is negligible in practice.

Results from v0.4:

Metric                 Before    After
API Cache Hits         0%        ~60%
Response Size          100%      ~32%
Cache Key Collisions   High      0%
React Re-renders       High      −40–50%
Zombie Requests        Common    Eliminated

The Browser Extension

The extension uses Manifest V3 with vanilla JavaScript — no framework. This was deliberate. Extensions need to be small, fast, and predictable. A React bundle in an extension popup is overkill and adds startup latency.

Key extension features:

  • Direct text injection — click a suggestion and it inserts directly into the active text field (email composer, chat input, etc.)
  • Context menu integration — right-click selected text to translate it instantly
  • Keyboard shortcut — Ctrl+Shift+T triggers the translation flow without opening the popup
  • Auto-detection — the extension detects selected text on the page and pre-populates the input

The extension communicates with the backend via the user-configured API URL, stored in chrome.storage.sync. This means settings persist across devices when signed into Chrome.
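
Reading that setting and calling the backend is just the standard chrome.storage API. A sketch (the 'apiUrl' key, the default, and the surrounding names are assumptions, not the actual source):

// Sketch: load the user-configured backend URL from synced storage and
// call the shared API. 'apiUrl' is an assumed key name; selectedText and
// showSuggestions are placeholders.
chrome.storage.sync.get({ apiUrl: 'http://localhost:5006/api' }, ({ apiUrl }) => {
  fetch(`${apiUrl}/suggest-reply`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: selectedText, format: 'professional' }),
  })
    .then((res) => res.json())
    .then((data) => showSuggestions(data)); // placeholder UI hook
});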


The React Web App

The web frontend uses:

  • React for the component tree
  • Zustand for state management — lightweight, no boilerplate, perfect for this scale
  • Tailwind CSS for styling — utility-first, consistent, fast to iterate
  • Framer Motion for animations — the mode-switching transition and result card animations make the experience feel polished
  • Lucide React for icons

One detail I spent time on: the textarea auto-resizes as you type. It sounds trivial but it matters for long inputs — a fixed-height box that scrolls feels unfinished.

Keyboard shortcut Ctrl/Cmd + Enter submits the form, which keeps the interaction flow close to how developers already work.
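
Both details fit in a few lines. A sketch of the pattern (component and prop names are illustrative, not taken from the repo):

// Sketch: auto-resizing textarea plus Ctrl/Cmd+Enter submit.
import { useRef } from 'react';

function PromptInput({ onSubmit }) {
  const ref = useRef(null);

  const autoResize = () => {
    const el = ref.current;
    el.style.height = 'auto';                 // reset so the box can shrink
    el.style.height = `${el.scrollHeight}px`; // then grow to fit the content
  };

  const handleKeyDown = (e) => {
    // Ctrl+Enter on Windows/Linux, Cmd+Enter on macOS
    if ((e.ctrlKey || e.metaKey) && e.key === 'Enter') {
      onSubmit(ref.current.value);
    }
  };

  return <textarea ref={ref} onInput={autoResize} onKeyDown={handleKeyDown} />;
}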


The Flutter Mobile App

The mobile app uses Flutter with Provider for state management. It hits the same backend endpoints, so every feature available on web is available on mobile.

One configuration note: the Android emulator uses 10.0.2.2 to reach the host machine's localhost. Physical devices need the actual LAN IP of your dev machine.

// lib/utils/constants.dart
const String baseUrl = 'http://10.0.2.2:5006/api'; // emulator
// const String baseUrl = 'http://192.168.1.X:5006/api'; // physical device

Running It Locally

Backend (Option A — local):

cd backend
# Create .env with OPENROUTER_API_KEY and PORT=5006
npm install && npm start

Backend (Option B — Docker):

docker build -t smart-reply-backend ./backend
docker run -d -p 5006:5006 --env OPENROUTER_API_KEY=your_key smart-reply-backend

Web frontend:

cd frontend
# Create .env with VITE_API_ENDPOINT=http://localhost:5006/api
npm install && npm run dev

Browser extension: Load unpacked from chrome://extensions → select the extension/ folder.

Flutter app:

cd smart_reply_app
flutter pub get && flutter run

API Quick Reference

All endpoints accept JSON and return arrays of suggestions:

# Smart Reply
curl -X POST http://localhost:5006/api/suggest-reply \
  -H "Content-Type: application/json" \
  -d '{"message": "Can we push the deadline?", "format": "professional"}'

# Smart Enhance
curl -X POST http://localhost:5006/api/enhance-text \
  -H "Content-Type: application/json" \
  -d '{"text": "plz fix the bug its urgent", "format": "professional"}'

# Smart Translate
curl -X POST http://localhost:5006/api/translate-text \
  -H "Content-Type: application/json" \
  -d '{"text": "Good morning!", "format": "friendly", "language": "spanish"}'
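A successful response carries the array of variations. Roughly like this, with only two of the four shown (the exact field name is illustrative and may differ from the repo):

{
  "suggestions": [
    "Could you please prioritize this fix? It's urgent for us.",
    "Hi team, this bug is blocking our work. Could we get it fixed soon?"
  ]
}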

Security (v0.4 Additions)

A few security hardening steps went in with v0.4 that are worth calling out:

  • Security headers — XSS protection, X-Frame-Options (clickjacking prevention), Content-Type enforcement
  • Strict input validation — all inputs are validated server-side before being passed to the LLM prompt
  • Request cancellation — in-flight requests are cancelled when a new request comes in, preventing race conditions and stale responses ("zombie requests"); the client-side pattern is sketched after this list
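
For the cancellation piece, the standard AbortController API covers it. A minimal sketch (variable and function names are illustrative):

// Sketch: abort the previous in-flight request before firing a new one,
// so a slow old response can never overwrite a newer one.
let controller = null;

async function requestSuggestions(payload) {
  if (controller) controller.abort(); // cancel the stale request
  controller = new AbortController();

  const res = await fetch('/api/suggest-reply', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    signal: controller.signal, // fetch rejects with an AbortError if aborted
  });
  return res.json();
}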

What's Next

A few things I'm thinking about for future versions:

  • Streaming responses — returning suggestions one by one as they generate rather than waiting for the full array
  • A history panel in the web app — saved past suggestions with search
  • Firefox extension compatibility improvements (Manifest V3 support in Firefox is still a bit fragile)
  • More languages for translation
  • A Claude / Gemini / local model option alongside OpenRouter

Contributing

This is fully open source and contributions are welcome. The codebase is clean and reasonably well-organized, and the README covers everything you need to get started.

If you find a bug, want to add a language, improve the caching logic, or build out a new client, open an issue or a PR.

🔗 GitHub: github.com/mahmud-r-farhan/smart-reply

🌐 Live Demo: smart-reply-delta.vercel.app


Closing Thought

The most satisfying tools to build are the ones that remove friction from things you do every day. This project isn't trying to be a platform. It's trying to be genuinely useful in the 30 seconds you spend writing a reply you've written before.

If you find it useful, a ⭐ on GitHub goes a long way.

And if you've built something similar or have ideas for where this could go, drop a comment. I read them all.
