Dmytro Oriekhov

I migrated my Python backend to Cloudflare Workers in 4 hours and got 12x speedup

Two days ago I launched JobPilot AI on Product Hunt: a privacy-first job-search PWA. The frontend runs 100% in the browser; the backend is a small Python service that aggregates 11 job boards.

After the launch I watched the first real users hit "Live search" and wait. And wait. Then 30 seconds later the results came back.

Render's free tier sleeps after 15 minutes of inactivity. The first request after sleep takes 30-60 seconds for the container to spin up. For a product whose entire value prop is "fast and private", that's a fatal first impression.

So I decided to migrate to Cloudflare Workers. This is the story of that 4-hour port.

The numbers (after the migration)

Same query, same data sources, fresh test:

| Endpoint | Render (free, cold) | CF Worker | Speedup |
|---|---|---|---|
| /api/search | 30.9 sec | 2.6 sec | 12x |
| /api/parse-url | 22.8 sec | 1.4 sec | 16x |

And the bonus:

  • Cold start: gone entirely. Workers boot in ~10ms.
  • Errors after 24 hours of production traffic: 0
  • Cost: $0. Free tier gives 100k requests/day, I'm at ~50/day.
  • Global: 320 edge locations vs Render's 1 US region.

What had to change

The Python backend was 670 lines covering 11 job sources, SSRF protection, rate limiting, caching, and HTML parsing. Here's how each piece translated.

ThreadPoolExecutor → Promise.all

Python:

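from concurrent.futures import ThreadPoolExecutor

# Submit every source at once, then collect results in submission order.
with ThreadPoolExecutor(max_workers=12) as pool:
    for f in [pool.submit(t) for t in tasks]:
        name, jobs, err = f.result()  # each task returns (name, jobs, err)
        ...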

Workers JS:

const tasks = [
  fetchRemotive(kw, blocked),
  fetchArbeitnow(kw, blocked),
  fetchHimalayas(kw, blocked),
  // ...
];
const results = await Promise.all(tasks);

One-liner. fetch() doesn't block the event loop, so 12 parallel HTTP calls finish in roughly the time of the slowest one, not the sum of all 12.
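
One caveat with Promise.all: it rejects as soon as any input promise rejects, so a single failing job board could sink the whole search unless each fetcher handles its own errors. The post doesn't show how the fetchers deal with failures, so the following is only a sketch, assuming a wrapper that mirrors the Python (name, jobs, err) tuple. The names settle and searchAll are hypothetical.

```javascript
// Hypothetical helper: turns one source's failure into data instead of a rejection.
// `name` labels the source; `run` is any function returning a promise of jobs.
async function settle(name, run) {
  try {
    return { name, jobs: await run(), err: null };
  } catch (e) {
    return { name, jobs: [], err: String(e?.message ?? e) };
  }
}

// Fan out to every source at once; each wrapped promise always fulfils,
// so Promise.all never rejects here.
async function searchAll(sources) {
  const results = await Promise.all(
    sources.map(([name, run]) => settle(name, run))
  );
  return {
    jobs: results.flatMap((r) => r.jobs),
    errors: results.filter((r) => r.err).map((r) => `${r.name}: ${r.err}`),
  };
}
```

Called as searchAll([["remotive", () => fetchRemotive(kw, blocked)], ...]), a source that returns a 403 would show up in errors while the remaining boards still return results. Promise.allSettled would be another way to get the same effect.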

urllib.request → fetch

Python's urllib doesn't decode compressed responses for you, so gzip is handled manually:

import gzip

raw = r.read(2_000_000)  # r is the urlopen() response; read at most 2 MB
enc = r.headers.get("Content-Encoding", "")
if enc == "gzip":
    raw = gzip.decompress(raw)  # urllib does not undo Content-Encoding itself

Workers fetch handles compression for you:

const r = await fetch(url, { headers });
const text = await r.text();  // already decompressed
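
One difference from the Python version is worth noting: r.read(2_000_000) capped the download, while r.text() buffers the whole body. The post doesn't show whether the Worker applies a similar cap at this point, so here is a minimal sketch of one way to do it with the standard streams API. The helper name readCapped is hypothetical.

```javascript
// Hypothetical helper: reads at most `maxBytes` of a Response body as text.
async function readCapped(response, maxBytes = 2_000_000) {
  if (!response.body) return "";
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let text = "";
  let received = 0;
  while (received < maxBytes) {
    const { done, value } = await reader.read();
    if (done) break;
    // Trim the final chunk so we never keep more than maxBytes.
    const chunk = value.subarray(0, maxBytes - received);
    received += chunk.byteLength;
    text += decoder.decode(chunk, { stream: true });
  }
  await reader.cancel(); // stop downloading anything past the cap
  return text + decoder.decode(); // flush any buffered partial character
}
```

Usage would be const text = await readCapped(await fetch(url, { headers }), 80_000). The cap applies to the decompressed bytes, since fetch has already undone the compression by the time the body is read.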

HTMLParser → regex

Most pages needed only a handful of fields: <title>, og:title, og:description, JSON-LD hiringOrganization. For that, a regex is fine and 10x simpler to read than a stateful parser. Workers do ship an HTMLRewriter API for streaming HTML, but with pages capped at 80KB there is little to gain from streaming.

function extractMeta(html) {
  const title = html.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i)?.[1];
  const ogTitle = pickAttr(html, "meta", "property", "og:title");
  // ...
}
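
The snippet above calls pickAttr without showing it. What follows is a hypothetical implementation, not the author's: it assumes the helper returns the content attribute of the first matching tag, and it accepts attributes in any order with either quote style.

```javascript
// Hypothetical implementation: the post calls pickAttr but does not show it.
// Returns the `content` attribute of the first <tag> whose attrName equals attrValue.
function pickAttr(html, tag, attrName, attrValue) {
  // `tag` is interpolated as-is, so pass plain tag names only.
  const tagRe = new RegExp(`<${tag}\\b[^>]*>`, "gi");
  for (const [element] of html.matchAll(tagRe)) {
    const attrs = {};
    // Collect name="value" and name='value' pairs in any order.
    for (const m of element.matchAll(/([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g)) {
      attrs[m[1].toLowerCase()] = m[2] ?? m[3];
    }
    if (attrs[attrName.toLowerCase()] === attrValue) return attrs.content ?? null;
  }
  return null;
}
```

This sketch does not decode HTML entities, ignores unquoted attribute values, and would cut a tag short if an attribute value contained a literal ">". For the handful of meta tags involved that is usually acceptable, but it is the kind of edge case HTMLRewriter would handle.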

In-memory cache → Cloudflare Cache API

Python kept results in a dict guarded by a threading.Lock. That dies on every cold start. Workers have a built-in edge cache (entries are local to the data center that stored them):

async function cacheGet(key) {
  const req = new Request(`https://cache.local/${encodeURIComponent(key)}`);
  const hit = await caches.default.match(req);
  return hit ? await hit.json() : null;
}

async function cacheSet(key, value, ctx) {
  const req = new Request(`https://cache.local/${encodeURIComponent(key)}`);
  const res = new Response(JSON.stringify(value), {
    headers: { "Cache-Control": "max-age=300" },
  });
  ctx.waitUntil(caches.default.put(req, res));
}

The synthetic URL is just a stable key. ctx.waitUntil lets the cache write happen after the response is already on its way to the user.
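
To show how the two helpers might fit into a request handler, here is a read-through sketch. It is an illustration under assumptions: searchKey and readThrough are hypothetical names, blocked is assumed to be an array of strings, and the store is passed in so the same logic can run against cacheGet/cacheSet in a Worker or against a plain Map elsewhere.

```javascript
// Hypothetical key builder: normalises inputs so equivalent searches share one entry.
function searchKey(keyword, blocked = []) {
  const kw = keyword.trim().toLowerCase();
  const bl = [...blocked].map((b) => b.trim().toLowerCase()).sort();
  return `search:${kw}:${bl.join(",")}`;
}

// Hypothetical read-through wrapper. `store.get` / `store.set` follow the
// cacheGet / cacheSet signatures above; `compute` produces a fresh value.
async function readThrough(key, compute, ctx, store) {
  const cached = await store.get(key);
  if (cached !== null) return { value: cached, cached: true };
  const value = await compute();
  store.set(key, value, ctx); // not awaited: cacheSet defers the write via ctx.waitUntil
  return { value, cached: false };
}
```

In a Worker this would be called as readThrough(searchKey(kw, blocked), () => runSearch(kw, blocked), ctx, { get: cacheGet, set: cacheSet }), where runSearch stands in for whatever performs the fan-out.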

SSRF protection: simpler in Workers

In Python I had a manual SSRF guard checking DNS-resolved IPs against private ranges. In Workers, this is largely unnecessary: the runtime cannot reach private IP space at all. I still validate scheme and hostname to reject file://, localhost, and .internal hostnames, but the heavy lifting is done for me by the platform.

function isSafeUrl(urlStr) {
  if (!urlStr || urlStr.length > 2048) return false;
  let u;
  try { u = new URL(urlStr); } catch { return false; }
  if (u.protocol !== "http:" && u.protocol !== "https:") return false;
  if (u.hostname === "localhost") return false;
  if (u.hostname.endsWith(".internal")) return false;
  // ...
}
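
The rest of isSafeUrl is elided in the post. As an optional extra layer, and not necessarily what the author does, a guard like this could also reject hostnames that are literal private IPv4 addresses. The function name is hypothetical.

```javascript
// Hypothetical extra check: true when hostname is an IPv4 literal in an
// unspecified, private, loopback, or link-local range.
function isPrivateIPv4Literal(hostname) {
  const m = hostname.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // not a dotted-quad literal
  const [a, b] = m.slice(1).map(Number);
  return (
    a === 0 ||                            // 0.0.0.0/8 "this network"
    a === 10 ||                           // 10.0.0.0/8
    a === 127 ||                          // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||           // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}
```

Passing u.hostname from the parsed URL matters here: the URL parser normalises forms such as http://2130706433/ to 127.0.0.1 before the check runs. The sketch does not cover IPv6 literals or public hostnames that resolve to private addresses.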

Rate limiting: dropped (for now)

Python had a per-IP sliding window with a threading.Lock. In Workers, in-process state is per-isolate and unreliable. The proper move is Durable Objects, but those aren't free anymore. For an indie product at 50 req/day, Cloudflare's built-in DDoS protection is enough — I'll add real rate limiting when traffic grows.
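
For reference, this is roughly what a per-IP sliding window looks like in JS, and why it is only best-effort in a Worker. It is a sketch of the technique and not code from the project; createSlidingWindow is a hypothetical name.

```javascript
// Hypothetical per-isolate limiter: at most `limit` hits per `windowMs` per key.
// State lives in this isolate only, so counts reset whenever the isolate is
// evicted, and concurrent isolates each keep their own separate counts.
function createSlidingWindow(limit, windowMs) {
  const hits = new Map(); // key (e.g. client IP) -> timestamps of recent hits
  return function allow(key, now = Date.now()) {
    // Keep only the timestamps that still fall inside the window.
    const recent = (hits.get(key) ?? []).filter((t) => now - t < windowMs);
    const ok = recent.length < limit;
    if (ok) recent.push(now);
    hits.set(key, recent);
    return ok;
  };
}
```

In a Worker the key would typically be request.headers.get("CF-Connecting-IP"). The Map here also grows without bound as new keys arrive, so a real version would need eviction on top of shared state.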

The deploy story

I did NOT cut over straight to the Worker. Instead:

  1. Built the Worker in parallel under a new URL (jobpilot-api.dima-orehov-id.workers.dev).
  2. Added a feature flag in the frontend: localStorage.api_backend = "worker" routes to the new backend; absence routes to Render.
  3. A/B tested with myself for an hour — verified that all 10 sources return the same shape of data.
  4. Flipped the default in code: the frontend now defaults to Worker, with a localStorage.api_backend = "render" escape hatch.
  5. Kept the Render service alive for one week as a hot fallback.

If the Worker had blown up after step 4, the rollback would have been a one-line code change plus a redeploy, with total recovery time under 10 minutes. Keeping Render up meant the escape hatch actually worked.
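
The flag logic from steps 2 and 4 can be sketched in a few lines. This is an illustration and not the project's code: the Worker host comes from the list above, the Render host is a placeholder because the post does not give it, and apiBase is a hypothetical name.

```javascript
// Backend base URLs. The Worker host comes from the post; the Render host is a
// placeholder because the post does not give it.
const BACKENDS = {
  worker: "https://jobpilot-api.dima-orehov-id.workers.dev",
  render: "https://render-backend.example",
};

// Step 4 behaviour: default to the Worker unless the override says "render".
function apiBase(storage) {
  let choice = null;
  try {
    choice = storage.getItem("api_backend");
  } catch {
    // Storage access can throw in some privacy modes; fall back to the default.
  }
  return choice === "render" ? BACKENDS.render : BACKENDS.worker;
}
```

In the browser this would be called as apiBase(window.localStorage). Flipping the default, or rolling it back, is then the one-line change to the final return statement.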

What I'd do differently

  • Open source the migration diff. I kept the repo private during launch. In hindsight, an open worker.js would have been a stronger artifact for this very article.
  • Test with Cyrillic queries earlier. I auto-translate Cyrillic search terms to English using MyMemory's free API; that path wasn't smoke-tested until later.
  • Drop dead sources first. Three job boards (Findwork 401, No Fluff Jobs 403, EuroJobs 404) were broken before the migration; I ported them anyway, then ripped them out and replaced them with Working Nomads + RemoteOK API + Jobicy v2 JSON. Cleaner result.
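
For the Cyrillic point, the detection step is small enough to smoke-test on its own. This is a minimal sketch and not the author's code: hasCyrillic and translateUrl are hypothetical names, and the "uk|en" language pair is an assumption, since the post does not say which source language it sends to MyMemory.

```javascript
// True when the query contains at least one Cyrillic letter.
function hasCyrillic(text) {
  return /\p{Script=Cyrillic}/u.test(text);
}

// Builds a MyMemory request URL. The "uk|en" language pair is an assumption;
// the post does not say which source language it sends.
function translateUrl(query, langpair = "uk|en") {
  const u = new URL("https://api.mymemory.translated.net/get");
  u.searchParams.set("q", query);
  u.searchParams.set("langpair", langpair);
  return u.toString();
}
```

A handful of assertions on mixed queries such as "senior розробник" would be enough to exercise this path before launch, without calling the translation API at all.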

Cost comparison

| | Render free | CF Workers free |
|---|---|---|
| Daily request budget | "until it sleeps" | 100,000 |
| Cold start | 30-60 sec | 0 |
| Global edge | 1 region | 320 locations |
| In-built KV/Cache | No | Yes |
| Cost above free | $7/mo | $5/mo (10M req) |

For low-traffic indie products, Workers free tier is borderline absurd. 100k requests/day is enough for 10k daily users at 10 req/user. By the time you outgrow it, you've already won.

Try it

Live demo: jobpilot-ai.pages.dev — click "Live search", open DevTools → Network, watch the request hit *.workers.dev.

Happy to answer questions about the port, the privacy model, or running production on free tiers.

Top comments (2)

Harjot Singh

12x is a great result, and the honest read is that a lot of it is the edge/cold-start model rather than raw compute - Workers killing the per-request container spin-up and running close to the user is where the latency win usually comes from, not Python-vs-JS execution speed. Worth being clear on that for readers, because the gain is real but it comes from the deployment topology (V8 isolates, no cold start, edge locality) more than the language swap, and that framing helps people predict whether their own workload will see the same multiple.

The 4-hours part is the underrated headline though - the migration being fast is the deploy/runtime model being simpler, not the code being trivial. That "make the boring infra part fast and correct" is exactly the gap I work on with Moonshift - a multi-agent pipeline that takes a prompt to a deployed SaaS on your own GitHub + Vercel, with the deploy/runtime wiring handled as verified defaults so the last mile isn't where you lose the day. Multi-model routing keeps a full build ~$3 flat, first run's free no card. Solid writeup. Did you hit the Workers CPU-time / no-long-running-process limits at all, or did your workload fit the request-scoped model cleanly? That's usually the deciding factor on whether the migration holds up past the demo.

Dmytro Oriekhov

Hey, thanks for the thoughtful read. You're 100% right and I should have framed it better in the post — the real driver is the deployment topology, not Python-vs-JS.

Specifically:

  1. Render free-tier had ~30 sec container cold-start after 15 min of idle. Workers boot V8 isolates in ~5ms with no container at all. That alone is most of the 12×.

  2. Edge locality matters too — Render is single-region (US), Workers run on the closest CF POP to the user. For my EU/UA traffic that's another ~200-400ms saved per request.

  3. Language execution is actually a wash — both Python and JS make the same outbound fetch() calls to job board APIs, which are network-bound. Interpreter overhead is microseconds either way.

So readers whose workload is already on a hot, edge-distributed runtime won't see the same multiple. The win is migrating from a sleepy free-tier container to an always-warm edge runtime. Good point to highlight.

To your CPU / no-long-running question: fits cleanly. The workload is fully request-scoped — fan out to 11 job board APIs via Promise.all, dedupe + filter, return. Almost all wall time is network I/O (fetch waiting), which doesn't count against CPU budget. Actual CPU per request is ~5-10ms. Caching via caches.default with 5-min TTL. Anything that would need a background process (per-IP rate limit counters, etc.) goes to KV or got dropped intentionally.

What would break this model: stateful scraping sessions, or anything >50ms CPU per request (e.g. on-device LLM inference vs the Workers AI binding I'm using). I stayed shy of both on purpose.

Re: Moonshift — appreciate the share, will take a look. The "verified defaults for the last mile" pitch resonates; that was exactly where I was thrashing on day 2.