Edge infrastructure work in India comes down to one thing: cut round trips.
Put Cloudflare CDN in front, run Cloudflare Workers for routing/auth short-circuits and cache control, keep your origin in AWS Mumbai, and use Redis for hot state.
Measure p95 by city/ISP, then tighten cache keys, warm critical paths, and cap retry storms.
Start with a latency budget you can defend
If you don’t set a budget per hop, you’ll “optimize” the wrong layer.
Define your target like an SRE, not a slide deck:
- User SLO: p95 end-to-end latency (and p99 if you have real SLAs).
- Breakdown: DNS + TLS + TTFB + payload.
- Scope: split by metro and ISP. India is not one network.
Minimum measurement plan
- RUM (Real User Monitoring) from browsers/apps. Tag requests with city, asn, isp if your RUM tool supports it.
- Synthetics from at least: Delhi NCR, Mumbai, Bengaluru, Chennai, Hyderabad, Kolkata.
- Server timing headers from origin so you can isolate backend time vs network time.
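The Server-Timing header from that last bullet is the cheapest way to separate backend time from network time. A minimal sketch, assuming an Express origin; loadConfig is a stand-in for your real hot-path lookup:

```js
// Emit a Server-Timing header so browsers, RUM, and synthetics can see origin
// compute separately from network time. Express is an assumption here; the
// header itself is standard HTTP.
const express = require("express");
const app = express();

// Stand-in for your real hot-path lookup (DB/Redis/config service).
async function loadConfig() {
  return { featureX: true };
}

app.get("/api/config", async (req, res) => {
  const t0 = process.hrtime.bigint();
  const config = await loadConfig();
  const appMs = Number(process.hrtime.bigint() - t0) / 1e6;

  // Shows up in devtools and most RUM tools as a named timing.
  res.set("Server-Timing", `app;dur=${appMs.toFixed(1)}`);
  res.json(config);
});

app.listen(3000);
```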
What you want to see on a single chart
- Edge TTFB (Cloudflare)
- Origin TTFB (Mumbai)
- Redis time (if used)
- App compute time
If you can’t see those separately, you’re flying blind.
Reference architecture: Cloudflare → Workers → AWS Mumbai → Redis
Lock the shape first so each knob has a clear home.
Here’s the stack you picked, with crisp ownership boundaries.
```
Client (India ISP)
      |
      |  DNS + TLS + HTTP
      v
Cloudflare Edge (CDN)
      |
      |  Worker (routing, cache policy, auth short-circuit)
      v
AWS Mumbai Origin (ALB/NLB -> app)
      |
      |  hot state / rate limits / sessions
      v
Redis (ElastiCache or self-managed)
```
What runs where (don’t mix this up)
| Layer | Do | Don’t |
| --- | --- | --- |
| Cloudflare CDN | cache static + cacheable API responses, terminate TLS, absorb spikes | run business logic that needs DB writes |
| Workers | route, normalize headers, enforce cache keys, cheap auth gates, redirects | call 5 downstream services from the edge |
| AWS Mumbai origin | serve uncached requests, durable logic, writes | depend on “edge will save us” if origin is slow |
| Redis | sessions, rate limits, feature flags, hot lookups | treat it like a source of truth |
Configure Cloudflare CDN like you mean it
CDN defaults are generic; your production app needs explicit cache rules.
The #1 reason “edge didn’t help” is this: you didn’t make responses cacheable. The difference between a generic CDN setup and a secure one is intentional cache design and strict isolation.

Step 1: Classify endpoints
You can’t cache what you haven’t categorized.
Make three buckets:
- Static: JS/CSS/images/fonts. Cache hard.
- Semi-static: config, feature flags, catalog, “home feed” variants. Cache with short TTL + SWR.
- Dynamic: personalized, writes, payments. No cache.
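One way to keep the buckets honest is a small policy map that the Worker or build pipeline consults; the paths and TTLs below are placeholders, not recommendations:

```js
// Illustrative cache-policy map for the three buckets. Every route group gets
// an explicit policy; anything unknown defaults to "no cache".
const CACHE_POLICY = [
  { prefix: "/static/",      ttl: 31536000, swr: 0,   note: "immutable, fingerprinted assets" },
  { prefix: "/api/config",   ttl: 60,       swr: 300, note: "semi-static: short TTL + SWR" },
  { prefix: "/api/catalog",  ttl: 120,      swr: 300, note: "semi-static: short TTL + SWR" },
  { prefix: "/api/checkout", ttl: 0,        swr: 0,   note: "dynamic: never cache" },
];

// Resolve the policy for a request path; unknown paths are treated as dynamic.
function policyFor(pathname) {
  return CACHE_POLICY.find(p => pathname.startsWith(p.prefix)) ?? { ttl: 0, swr: 0 };
}
```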
Step 2: Control cache keys
Bad cache keys destroy hit ratio and spike origin load.
Rules of thumb:
- Strip tracking params (utm_*, fbclid, gclid) from cache keys.
- Don’t vary on cookies unless you must.
- If you must vary, vary on a small whitelist (e.g., plan_tier, locale), not the full cookie blob.
Step 3: Turn on “stale while revalidate” behavior
SWR converts origin spikes into background refresh.
If Cloudflare features are available in your plan, configure:
- short TTL for semi-static API responses (e.g., 30–120s)
- allow stale serve during refresh
This is how you keep p95 stable during origin deploys and brief Mumbai hiccups.
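If you’d rather drive this from the origin than from edge config, the standard Cache-Control directives express the same intent; whether the CDN honors every directive depends on your plan and cache rules, and the numbers are placeholders:

```js
// Origin-side sketch (Express-style res.set assumed): short freshness plus
// stale-while-revalidate for semi-static responses. Values are examples.
function setSemiStaticCaching(res) {
  res.set(
    "Cache-Control",
    // 60s fresh, serve stale up to 300s while refreshing in the background,
    // tolerate origin errors for up to 600s.
    "public, max-age=60, stale-while-revalidate=300, stale-if-error=600"
  );
}
```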
Step 4: Avoid cache poisoning and auth leakage
One sloppy header can cache private data for strangers.
Hard rules:
- Never cache responses that depend on Authorization unless you fully control the cache key and isolation model.
- Set Cache-Control: private for truly user-specific payloads.
- For “public but user-aware” endpoints, issue explicit cache keys based on a safe token (not raw auth headers).
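At the origin, that mostly means being explicit in the response headers. A sketch assuming Express-style handlers; header and field names are illustrative:

```js
// Truly user-specific payloads: shared caches must never store them.
function sendUserPayload(res, payload) {
  res.set("Cache-Control", "private");
  res.json(payload);
}

// "Public but user-aware": cacheable per coarse segment (e.g., plan tier),
// keyed on a safe token rather than the Authorization header. Including that
// token in the edge cache key is a plan-dependent Cloudflare feature.
function sendSegmentPayload(res, payload, planTier) {
  res.set("Cache-Control", "public, max-age=60");
  res.set("x-plan-tier", planTier);
  res.json(payload);
}
```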
Use Workers for short-circuits and cache policy, not heroics
Workers are the glue. Keep them small so you can reason about failure.
Workers shine for:
- request normalization (headers, query params)
- cheap routing decisions
- edge auth gating (basic, not deep)
- explicit cache behavior via caches.default
A Worker pattern that helps latency
Route and cache at the edge, then fall back cleanly to Mumbai.
```js
export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);

    // Normalize cache-busting junk.
    ["utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"]
      .forEach(p => url.searchParams.delete(p));

    // Cheap gate. Block obvious abuse before it hits Mumbai.
    const apiKey = request.headers.get("x-api-key");
    if (url.pathname.startsWith("/api/") && !apiKey) {
      return new Response("missing api key", { status: 401 });
    }

    // Only cache safe GETs.
    if (request.method !== "GET") {
      return fetch(new Request(url.toString(), request));
    }

    // Cache semi-static API endpoints for a short TTL.
    const isCacheableApi =
      url.pathname.startsWith("/api/catalog") ||
      url.pathname.startsWith("/api/config");
    if (!isCacheableApi) {
      return fetch(new Request(url.toString(), request));
    }

    const cache = caches.default;

    // Build a safe cache key. Keep it small. The Cache API keys on the URL,
    // so encode the locale into the key URL rather than a request header.
    const locale = request.headers.get("accept-language")?.split(",")[0] ?? "en";
    const keyUrl = new URL(url.toString());
    keyUrl.searchParams.set("__locale", locale);
    const cacheKey = new Request(keyUrl.toString(), { method: "GET" });

    let resp = await cache.match(cacheKey);
    if (resp) return resp;

    // Fetch origin, then cache it.
    resp = await fetch(new Request(url.toString(), request), {
      cf: { cacheTtl: 60, cacheEverything: true }
    });

    // Don’t cache errors.
    if (resp.status >= 200 && resp.status < 300) {
      ctx.waitUntil(cache.put(cacheKey, resp.clone()));
    }
    return resp;
  }
};
```
What this does
- It routes only what’s safe.
- It caches only what you explicitly allow.
- It avoids caching auth-bound content.
- It keeps origin clean for truly dynamic calls.
Canary Workers safely
Edge bugs are global bugs.
Canary patterns that work:
- enable Worker only on a subset of paths
- enable by header: x-edge-canary: 1
- enable by % rollout based on a stable hash (cookie/session id)
Keep a kill switch. Script it. Don’t rely on “we can revert fast” during an incident.
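A sketch of the %-rollout bucket using the Web Crypto API available in Workers; the cookie name and the 10% default are assumptions:

```js
// Stable %-rollout: hash a sticky identifier so the same user always lands in
// the same bucket. Cookie name, IP fallback, and percentage are illustrative.
async function inCanary(request, percent = 10) {
  const cookie = request.headers.get("cookie") ?? "";
  const match = cookie.match(/session_id=([^;]+)/);
  const id = match ? match[1] : (request.headers.get("cf-connecting-ip") ?? "anon");

  // SHA-256 via Web Crypto; the first byte gives a 0-255 bucket.
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(id));
  const bucket = new Uint8Array(digest)[0];

  return (bucket / 256) * 100 < percent;
}

// Manual override for testing, matching the header flag above.
function forcedCanary(request) {
  return request.headers.get("x-edge-canary") === "1";
}
```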
Redis: keep hot state hot, and make it disposable
Redis saves RTT when used right; it becomes your bottleneck when used wrong.
Put the right data in Redis
Cache the things that are read-heavy and safe to lose.
Good Redis candidates:
- session tokens / session metadata
- rate limiting counters
- feature flags/config snapshots
- small lookup tables (tenant → plan, user → segment)
Bad Redis candidates:
- “primary database but faster”
- large blobs
- unbounded key growth
TTL strategy decides p95
TTL is a latency control knob.
Rules:
- Use short TTL for rapidly changing data (seconds to minutes).
- Use long TTL only if invalidation is correct.
- Never ship “no TTL” in a multi-tenant production system unless you enjoy OOM incidents.
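A small ioredis sketch of the same rules; key names and TTLs are placeholders:

```js
const Redis = require("ioredis");
const redis = new Redis(); // host/port/credentials omitted

async function writeHotState() {
  // Rapidly changing data: seconds-to-minutes TTL.
  await redis.set("rate:user:42:minute", 1, "EX", 60);

  // Session metadata: TTL tied to the session lifetime; never "no TTL".
  await redis.set("session:abc123", JSON.stringify({ userId: 42 }), "EX", 3600);
}
```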
Avoid hot keys and stampedes
One hot key can take down your whole cache layer.
Fixes:
- shard counters (key:{user}:{bucket}) instead of one global counter
- add jitter to TTL to avoid synchronized expiry
- use a single-flight pattern (only one origin fetch per key)
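The jitter and single-flight fixes are a few lines each; sketched here against an ioredis client, with the lock naming and wait times as assumptions:

```js
// Jittered TTL: spread expiry so a whole class of keys doesn't refresh at once.
function jitteredTtl(baseSeconds, spreadSeconds = 30) {
  return baseSeconds + Math.floor(Math.random() * spreadSeconds);
}

// Single-flight via a short-lived lock: only one caller refreshes a cold key;
// everyone else waits briefly and re-reads. Assumes an ioredis client.
async function getOrRefresh(redis, key, fetchFromOrigin) {
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached);

  // NX = only set if the lock doesn't exist yet; EX = auto-release after 10s.
  const gotLock = await redis.set(`lock:${key}`, "1", "EX", 10, "NX");
  if (gotLock) {
    const fresh = await fetchFromOrigin();
    await redis.set(key, JSON.stringify(fresh), "EX", jitteredTtl(60));
    await redis.del(`lock:${key}`);
    return fresh;
  }

  // Someone else is refreshing: wait briefly, re-check the cache, and fall
  // back to origin so we never hang on the lock.
  await new Promise(r => setTimeout(r, 100));
  const retry = await redis.get(key);
  return retry !== null ? JSON.parse(retry) : fetchFromOrigin();
}
```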
Add circuit breakers
When Redis is slow, fail fast and move on.
- Set client timeouts.
- Cap retries.
- If Redis is down, serve from edge cache or hit origin directly depending on endpoint criticality.
Don’t let Redis become a distributed queue by accident.
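ioredis exposes the fail-fast knobs directly; the values and the degrade path below are illustrative:

```js
// Fail-fast Redis client config: bounded timeouts, capped retries, and a
// degrade path when Redis is slow or down. All values are examples.
const Redis = require("ioredis");

const redis = new Redis({
  host: "redis.internal",          // placeholder host
  connectTimeout: 200,             // ms to establish a connection
  commandTimeout: 100,             // ms per command before it errors
  maxRetriesPerRequest: 1,         // don't queue endless retries
  retryStrategy: (attempts) => (attempts > 3 ? null : 50), // stop reconnecting after 3 tries
  enableOfflineQueue: false,       // fail immediately instead of buffering commands
});

// Degrade instead of hanging: on any Redis error, return a safe default.
async function getFlag(name, fallback = false) {
  try {
    const value = await redis.get(`flag:${name}`);
    return value !== null ? value === "1" : fallback;
  } catch {
    return fallback; // Redis slow or down: keep the request moving
  }
}
```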
Harden the AWS Mumbai origin so edge doesn’t mask real slowness
Edge can cut distance; it can’t fix a slow backend.
Connection reuse matters
Most “Mumbai is slow” tickets are handshake overhead plus pool starvation.
Do the basics:
- keep-alive between ALB and pods/instances
- HTTP/2 where it makes sense
- right-size connection pools to DB and Redis
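If the origin fans out to internal services over HTTP, connection reuse in Node is mostly an agent setting; pool sizes below are examples, not recommendations:

```js
// Reuse upstream TCP/TLS connections instead of paying a handshake per request.
const https = require("https");

const keepAliveAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 100,        // per-origin cap; tune against downstream limits
  maxFreeSockets: 10,     // idle sockets kept warm
});

// Pass the agent on every upstream call (axios/got/node-fetch all accept one).
// Example with the built-in https module and a placeholder internal host:
https.get(
  { host: "internal-service.local", path: "/health", agent: keepAliveAgent },
  (res) => {
    res.resume(); // drain the body so the socket returns to the pool
  }
);
```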
Make origin cacheable too
CDN misses still happen. Origin should be fast on repeat reads.
- Add in-process caches for ultra-hot config.
- Cache DB reads where correctness allows.
- Precompute expensive aggregates.
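A tiny in-process cache sketch for ultra-hot config; the TTL is an assumption, and the cache is per instance, not shared:

```js
// In-process cache: avoids a Redis/DB round trip on every request for config
// that changes rarely. Staleness window is bounded by the TTL.
const hotConfig = new Map();

async function getConfig(key, loader, ttlMs = 5000) {
  const hit = hotConfig.get(key);
  if (hit && hit.expires > Date.now()) return hit.value;

  const value = await loader(key);               // e.g., Redis or DB read
  hotConfig.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```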
Scale the right layer
Scaling pods won’t fix a saturated DB pool.
Watch:
- request queue depth at load balancer
- DB connection wait time
- Redis latency percentiles
- CPU throttling if you set tight limits
Scale compute only when compute is the constraint. Everything else is noise.
Observability that catches edge failures fast
“Cache hit ratio” is not an SLO.
Track these as first-class metrics:
- p95/p99 latency at the edge (client-facing)
- origin TTFB for uncached routes
- cache hit/miss per route group
- Redis p95 latency and error rate
- retry rate (gRPC/HTTP clients)
- 5xx rate at edge and origin
Correlation trick that saves hours
- Add a request ID at the edge.
- Pass it to origin as a header.
- Log it in app + Redis calls.
- Now you can grep one request across layers.
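In the Worker this is a few lines; the x-request-id header name is a convention, not something Cloudflare mandates:

```js
// Attach a request ID at the edge, forward it to origin, and echo it back to
// the client so one ID ties the edge, app, and Redis logs together.
export default {
  async fetch(request, env, ctx) {
    const requestId = request.headers.get("x-request-id") ?? crypto.randomUUID();

    // Incoming request headers are immutable; clone before mutating.
    const forwarded = new Request(request);
    forwarded.headers.set("x-request-id", requestId);

    const response = await fetch(forwarded);

    // Echo the ID so client-side RUM and support tickets can quote it.
    const withId = new Response(response.body, response);
    withId.headers.set("x-request-id", requestId);
    return withId;
  }
};
```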
Rollout plan that won’t torch production
Edge changes ship fast; that’s good until it isn’t.
A safe rollout looks like this:
- Deploy Worker behind a header flag.
- Canary with internal traffic and one low-risk route group.
- Measure p95 and origin offload.
- Scale rollout by % in steps.
- Rollback instantly if p95 moves the wrong way.
Also: test from India, not from your laptop in Europe. Pipe synthetic checks from real metros.
India-specific gotchas you should design for
India traffic punishes extra round trips and large payloads.
Common realities:
- Mobile networks with variable loss and jitter.
- ISPs with inconsistent peering.
- DNS resolution variance.
Practical fixes:
- keep payloads small (compress, trim JSON, avoid chatty endpoints)
- avoid multi-call fanout on the critical path
- cache aggressively where safe
- fail fast on slow dependencies to protect p99
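The “fail fast on slow dependencies” point maps to a hard per-call budget; the 800 ms figure and the fallback payload below are assumptions:

```js
// Hard deadline on a downstream call so one slow dependency can't drag p99
// for everyone. AbortSignal.timeout is available in Workers and Node 18+.
async function fetchWithBudget(url, budgetMs = 800) {
  try {
    const resp = await fetch(url, { signal: AbortSignal.timeout(budgetMs) });
    return await resp.json();
  } catch {
    // Timed out or failed: serve a degraded default instead of blocking.
    return { degraded: true };
  }
}
```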
The build checklist
If you can’t tick these off, you don’t have an edge setup—you have a proxy.
- RUM + synthetics split by metro/ISP
- Explicit cache rules for static + semi-static endpoints
- Worker only does routing/cache/auth short-circuit
- Redis keys have TTL, no hot-key stampedes
- Origin in AWS Mumbai is tuned for keep-alive and fast reads
- Kill switch for Worker rollout
- Dashboards show edge p95/p99 + origin TTFB + Redis p95