Edge infrastructure work in India comes down to one thing: cut round trips.
Put Cloudflare CDN in front, run Cloudflare Workers for routing/auth short-circuits and cache control, keep your origin in AWS Mumbai, and use Redis for hot state.
Measure p95 by city/ISP, then tighten cache keys, warm critical paths, and cap retry storms.
Start with a latency budget you can defend
If you don’t set a budget per hop, you’ll “optimize” the wrong layer.
Define your target like an SRE, not a slide deck:
- User SLO: p95 end-to-end latency (and p99 if you have real SLAs).
- Breakdown: DNS + TLS + TTFB + payload.
- Scope: split by metro and ISP. India is not one network.
Minimum measurement plan
- RUM (Real User Monitoring) from browsers/apps. Tag requests with city, asn, isp if your RUM tool supports it.
- Synthetics from at least: Delhi NCR, Mumbai, Bengaluru, Chennai, Hyderabad, Kolkata.
- Server timing headers from origin so you can isolate backend time vs network time.
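The Server-Timing header from that last bullet is the cheapest way to separate backend time from network time. A minimal sketch, assuming an Express origin; loadConfig is a stand-in for your real hot-path lookup:

```js
// Emit a Server-Timing header so browsers, RUM, and synthetics can see origin
// compute separately from network time. Express is an assumption here; the
// header itself is standard HTTP.
const express = require("express");
const app = express();

// Stand-in for your real hot-path lookup (DB/Redis/config service).
async function loadConfig() {
  return { featureX: true };
}

app.get("/api/config", async (req, res) => {
  const t0 = process.hrtime.bigint();
  const config = await loadConfig();
  const appMs = Number(process.hrtime.bigint() - t0) / 1e6;

  // Shows up in devtools and most RUM tools as a named timing.
  res.set("Server-Timing", `app;dur=${appMs.toFixed(1)}`);
  res.json(config);
});

app.listen(3000);
```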
What you want to see on a single chart
- Edge TTFB (Cloudflare)
- Origin TTFB (Mumbai)
- Redis time (if used)
- App compute time
If you can’t see those separately, you’re flying blind.
Reference architecture: Cloudflare → Workers → AWS Mumbai → Redis
Lock the shape first so each knob has a clear home.
Here’s the stack you picked, with crisp ownership boundaries.
```
Client (India ISP)
      |
      |  DNS + TLS + HTTP
      v
Cloudflare Edge (CDN)
      |
      |  Worker (routing, cache policy, auth short-circuit)
      v
AWS Mumbai Origin (ALB/NLB -> app)
      |
      |  hot state / rate limits / sessions
      v
Redis (ElastiCache or self-managed)
```
What runs where (don’t mix this up)
| Layer | Do | Don’t |
| --- | --- | --- |
| Cloudflare CDN | cache static + cacheable API responses, terminate TLS, absorb spikes | run business logic that needs DB writes |
| Workers | route, normalize headers, enforce cache keys, cheap auth gates, redirects | call 5 downstream services from the edge |
| AWS Mumbai origin | serve uncached requests, durable logic, writes | depend on “edge will save us” if origin is slow |
| Redis | sessions, rate limits, feature flags, hot lookups | treat it like a source of truth |
Configure Cloudflare CDN like you mean it
CDN defaults are generic; your production app needs explicit cache rules.
The #1 reason “edge didn’t help” is this: you didn’t make responses cacheable. The difference between a generic CDN setup and a secure one is intentional cache design and strict isolation.

Step 1: Classify endpoints
You can’t cache what you haven’t categorized.
Make three buckets:
- Static: JS/CSS/images/fonts. Cache hard.
- Semi-static: config, feature flags, catalog, “home feed” variants. Cache with short TTL + SWR.
- Dynamic: personalized, writes, payments. No cache.
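One way to keep the buckets honest is a small policy map that the Worker or build pipeline consults; the paths and TTLs below are placeholders, not recommendations:

```js
// Illustrative cache-policy map for the three buckets. Every route group gets
// an explicit policy; anything unknown defaults to "no cache".
const CACHE_POLICY = [
  { prefix: "/static/",      ttl: 31536000, swr: 0,   note: "immutable, fingerprinted assets" },
  { prefix: "/api/config",   ttl: 60,       swr: 300, note: "semi-static: short TTL + SWR" },
  { prefix: "/api/catalog",  ttl: 120,      swr: 300, note: "semi-static: short TTL + SWR" },
  { prefix: "/api/checkout", ttl: 0,        swr: 0,   note: "dynamic: never cache" },
];

// Resolve the policy for a request path; unknown paths are treated as dynamic.
function policyFor(pathname) {
  return CACHE_POLICY.find(p => pathname.startsWith(p.prefix)) ?? { ttl: 0, swr: 0 };
}
```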
Step 2: Control cache keys
Bad cache keys destroy hit ratio and spike origin load.
Rules of thumb:
- Strip tracking params (utm_*, fbclid, gclid) from cache keys.
- Don’t vary on cookies unless you must.
- If you must vary, vary on a small whitelist (e.g., plan_tier, locale), not the full cookie blob.
Step 3: Turn on “stale while revalidate” behavior
SWR converts origin spikes into background refresh.
If Cloudflare features are available in your plan, configure:
- short TTL for semi-static API responses (e.g., 30–120s)
- allow stale serve during refresh
This is how you keep p95 stable during origin deploys and brief Mumbai hiccups.
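If you’d rather drive this from the origin than from edge config, the standard Cache-Control directives express the same intent; whether the CDN honors every directive depends on your plan and cache rules, and the numbers are placeholders:

```js
// Origin-side sketch (Express-style res.set assumed): short freshness plus
// stale-while-revalidate for semi-static responses. Values are examples.
function setSemiStaticCaching(res) {
  res.set(
    "Cache-Control",
    // 60s fresh, serve stale up to 300s while refreshing in the background,
    // tolerate origin errors for up to 600s.
    "public, max-age=60, stale-while-revalidate=300, stale-if-error=600"
  );
}
```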
Step 4: Avoid cache poisoning and auth leakage
One sloppy header can cache private data for strangers.
Hard rules:
- Never cache responses that depend on Authorization unless you fully control the cache key and isolation model.
- Set Cache-Control: private for truly user-specific payloads.
- For “public but user-aware” endpoints, issue explicit cache keys based on a safe token (not raw auth headers).
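At the origin, that mostly means being explicit in the response headers. A sketch assuming Express-style handlers; header and field names are illustrative:

```js
// Truly user-specific payloads: shared caches must never store them.
function sendUserPayload(res, payload) {
  res.set("Cache-Control", "private");
  res.json(payload);
}

// "Public but user-aware": cacheable per coarse segment (e.g., plan tier),
// keyed on a safe token rather than the Authorization header. Including that
// token in the edge cache key is a plan-dependent Cloudflare feature.
function sendSegmentPayload(res, payload, planTier) {
  res.set("Cache-Control", "public, max-age=60");
  res.set("x-plan-tier", planTier);
  res.json(payload);
}
```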
Use Workers for short-circuits and cache policy, not heroics
Workers are the glue. Keep them small so you can reason about failure.
Workers shine for:
- request normalization (headers, query params)
- cheap routing decisions
- edge auth gating (basic, not deep)
- explicit cache behavior via caches.default
A Worker pattern that helps latency
Route and cache at the edge, then fall back cleanly to Mumbai.
```js
export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);

    // Normalize cache-busting junk.
    ["utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"]
      .forEach(p => url.searchParams.delete(p));

    // Cheap gate. Block obvious abuse before it hits Mumbai.
    const apiKey = request.headers.get("x-api-key");
    if (url.pathname.startsWith("/api/") && !apiKey) {
      return new Response("missing api key", { status: 401 });
    }

    // Only cache safe GETs.
    if (request.method !== "GET") {
      return fetch(new Request(url.toString(), request));
    }

    // Cache semi-static API endpoints for a short TTL.
    const isCacheableApi =
      url.pathname.startsWith("/api/catalog") ||
      url.pathname.startsWith("/api/config");
    if (!isCacheableApi) {
      return fetch(new Request(url.toString(), request));
    }

    const cache = caches.default;

    // Build a safe cache key. Keep it small. The Cache API keys on the URL,
    // so encode the locale into the key URL rather than a request header.
    const locale = request.headers.get("accept-language")?.split(",")[0] ?? "en";
    const keyUrl = new URL(url.toString());
    keyUrl.searchParams.set("__locale", locale);
    const cacheKey = new Request(keyUrl.toString(), { method: "GET" });

    let resp = await cache.match(cacheKey);
    if (resp) return resp;

    // Fetch origin, then cache it.
    resp = await fetch(new Request(url.toString(), request), {
      cf: { cacheTtl: 60, cacheEverything: true }
    });

    // Don’t cache errors.
    if (resp.status >= 200 && resp.status < 300) {
      ctx.waitUntil(cache.put(cacheKey, resp.clone()));
    }
    return resp;
  }
};
```
What this does
- It routes only what’s safe.
- It caches only what you explicitly allow.
- It avoids caching auth-bound content.
- It keeps origin clean for truly dynamic calls.
Canary Workers safely
Edge bugs are global bugs.
Canary patterns that work:
- enable Worker only on a subset of paths
- enable by header: x-edge-canary: 1
- enable by % rollout based on a stable hash (cookie/session id)
Keep a kill switch. Script it. Don’t rely on “we can revert fast” during an incident.
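A sketch of the %-rollout bucket using the Web Crypto API available in Workers; the cookie name and the 10% default are assumptions:

```js
// Stable %-rollout: hash a sticky identifier so the same user always lands in
// the same bucket. Cookie name, IP fallback, and percentage are illustrative.
async function inCanary(request, percent = 10) {
  const cookie = request.headers.get("cookie") ?? "";
  const match = cookie.match(/session_id=([^;]+)/);
  const id = match ? match[1] : (request.headers.get("cf-connecting-ip") ?? "anon");

  // SHA-256 via Web Crypto; the first byte gives a 0-255 bucket.
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(id));
  const bucket = new Uint8Array(digest)[0];

  return (bucket / 256) * 100 < percent;
}

// Manual override for testing, matching the header flag above.
function forcedCanary(request) {
  return request.headers.get("x-edge-canary") === "1";
}
```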
Redis: keep hot state hot, and make it disposable
Redis saves RTT when used right; it becomes your bottleneck when used wrong.
Put the right data in Redis
Cache the things that are read-heavy and safe to lose.
Good Redis candidates:
- session tokens / session metadata
- rate limiting counters
- feature flags/config snapshots
- small lookup tables (tenant → plan, user → segment)
Bad Redis candidates:
- “primary database but faster”
- large blobs
- unbounded key growth
TTL strategy decides p95
TTL is a latency control knob.
Rules:
- Use short TTL for rapidly changing data (seconds to minutes).
- Use long TTL only if invalidation is correct.
- Never ship “no TTL” in a multi-tenant production system unless you enjoy OOM incidents.
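A small ioredis sketch of the same rules; key names and TTLs are placeholders:

```js
const Redis = require("ioredis");
const redis = new Redis(); // host/port/credentials omitted

async function writeHotState() {
  // Rapidly changing data: seconds-to-minutes TTL.
  await redis.set("rate:user:42:minute", 1, "EX", 60);

  // Session metadata: TTL tied to the session lifetime; never "no TTL".
  await redis.set("session:abc123", JSON.stringify({ userId: 42 }), "EX", 3600);
}
```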
Avoid hot keys and stampedes
One hot key can take down your whole cache layer.
Fixes:
- shard counters (key:{user}:{bucket}) instead of one global counter
- add jitter to TTL to avoid synchronized expiry
- use a single-flight pattern (only one origin fetch per key)
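The jitter and single-flight fixes are a few lines each; sketched here against an ioredis client, with the lock naming and wait times as assumptions:

```js
// Jittered TTL: spread expiry so a whole class of keys doesn't refresh at once.
function jitteredTtl(baseSeconds, spreadSeconds = 30) {
  return baseSeconds + Math.floor(Math.random() * spreadSeconds);
}

// Single-flight via a short-lived lock: only one caller refreshes a cold key;
// everyone else waits briefly and re-reads. Assumes an ioredis client.
async function getOrRefresh(redis, key, fetchFromOrigin) {
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached);

  // NX = only set if the lock doesn't exist yet; EX = auto-release after 10s.
  const gotLock = await redis.set(`lock:${key}`, "1", "EX", 10, "NX");
  if (gotLock) {
    const fresh = await fetchFromOrigin();
    await redis.set(key, JSON.stringify(fresh), "EX", jitteredTtl(60));
    await redis.del(`lock:${key}`);
    return fresh;
  }

  // Someone else is refreshing: wait briefly, re-check the cache, and fall
  // back to origin so we never hang on the lock.
  await new Promise(r => setTimeout(r, 100));
  const retry = await redis.get(key);
  return retry !== null ? JSON.parse(retry) : fetchFromOrigin();
}
```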
Add circuit breakers
When Redis is slow, fail fast and move on.
- Set client timeouts.
- Cap retries.
- If Redis is down, serve from edge cache or hit origin directly depending on endpoint criticality.
Don’t let Redis become a distributed queue by accident.
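ioredis exposes the fail-fast knobs directly; the values and the degrade path below are illustrative:

```js
// Fail-fast Redis client config: bounded timeouts, capped retries, and a
// degrade path when Redis is slow or down. All values are examples.
const Redis = require("ioredis");

const redis = new Redis({
  host: "redis.internal",          // placeholder host
  connectTimeout: 200,             // ms to establish a connection
  commandTimeout: 100,             // ms per command before it errors
  maxRetriesPerRequest: 1,         // don't queue endless retries
  retryStrategy: (attempts) => (attempts > 3 ? null : 50), // stop reconnecting after 3 tries
  enableOfflineQueue: false,       // fail immediately instead of buffering commands
});

// Degrade instead of hanging: on any Redis error, return a safe default.
async function getFlag(name, fallback = false) {
  try {
    const value = await redis.get(`flag:${name}`);
    return value !== null ? value === "1" : fallback;
  } catch {
    return fallback; // Redis slow or down: keep the request moving
  }
}
```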
Harden the AWS Mumbai origin so edge doesn’t mask real slowness
Edge can cut distance; it can’t fix a slow backend.
Connection reuse matters
Most “Mumbai is slow” tickets are handshake overhead plus pool starvation.
Do the basics:
- keep-alive between ALB and pods/instances
- HTTP/2 where it makes sense
- right-size connection pools to DB and Redis
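If the origin fans out to internal services over HTTP, connection reuse in Node is mostly an agent setting; pool sizes below are examples, not recommendations:

```js
// Reuse upstream TCP/TLS connections instead of paying a handshake per request.
const https = require("https");

const keepAliveAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 100,        // per-origin cap; tune against downstream limits
  maxFreeSockets: 10,     // idle sockets kept warm
});

// Pass the agent on every upstream call (axios/got/node-fetch all accept one).
// Example with the built-in https module and a placeholder internal host:
https.get(
  { host: "internal-service.local", path: "/health", agent: keepAliveAgent },
  (res) => {
    res.resume(); // drain the body so the socket returns to the pool
  }
);
```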
Make origin cacheable too
CDN misses still happen. Origin should be fast on repeat reads.
- Add in-process caches for ultra-hot config.
- Cache DB reads where correctness allows.
- Precompute expensive aggregates.
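A tiny in-process cache sketch for ultra-hot config; the TTL is an assumption, and the cache is per instance, not shared:

```js
// In-process cache: avoids a Redis/DB round trip on every request for config
// that changes rarely. Staleness window is bounded by the TTL.
const hotConfig = new Map();

async function getConfig(key, loader, ttlMs = 5000) {
  const hit = hotConfig.get(key);
  if (hit && hit.expires > Date.now()) return hit.value;

  const value = await loader(key);               // e.g., Redis or DB read
  hotConfig.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```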
Scale the right layer
Scaling pods won’t fix a saturated DB pool.
Watch:
- request queue depth at load balancer
- DB connection wait time
- Redis latency percentiles
- CPU throttling if you set tight limits
Scale compute only when compute is the constraint. Everything else is noise.
Observability that catches edge failures fast
“Cache hit ratio” is not an SLO.
Track these as first-class metrics:
- p95/p99 latency at the edge (client-facing)
- origin TTFB for uncached routes
- cache hit/miss per route group
- Redis p95 latency and error rate
- retry rate (gRPC/HTTP clients)
- 5xx rate at edge and origin
Correlation trick that saves hours
- Add a request ID at the edge.
- Pass it to origin as a header.
- Log it in app + Redis calls.
- Now you can grep one request across layers.
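In the Worker this is a few lines; the x-request-id header name is a convention, not something Cloudflare mandates:

```js
// Attach a request ID at the edge, forward it to origin, and echo it back to
// the client so one ID ties the edge, app, and Redis logs together.
export default {
  async fetch(request, env, ctx) {
    const requestId = request.headers.get("x-request-id") ?? crypto.randomUUID();

    // Incoming request headers are immutable; clone before mutating.
    const forwarded = new Request(request);
    forwarded.headers.set("x-request-id", requestId);

    const response = await fetch(forwarded);

    // Echo the ID so client-side RUM and support tickets can quote it.
    const withId = new Response(response.body, response);
    withId.headers.set("x-request-id", requestId);
    return withId;
  }
};
```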
Rollout plan that won’t torch production
Edge changes ship fast; that’s good until it isn’t.
A safe rollout looks like this:
- Deploy Worker behind a header flag.
- Canary with internal traffic and one low-risk route group.
- Measure p95 and origin offload.
- Scale rollout by % in steps.
- Rollback instantly if p95 moves the wrong way.
Also: test from India, not from your laptop in Europe. Pipe synthetic checks from real metros.
India-specific gotchas you should design for
India traffic punishes extra round trips and large payloads.
Common realities:
- Mobile networks with variable loss and jitter.
- ISPs with inconsistent peering.
- DNS resolution variance.
Practical fixes:
- keep payloads small (compress, trim JSON, avoid chatty endpoints)
- avoid multi-call fanout on the critical path
- cache aggressively where safe
- fail fast on slow dependencies to protect p99
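The “fail fast on slow dependencies” point maps to a hard per-call budget; the 800 ms figure and the fallback payload below are assumptions:

```js
// Hard deadline on a downstream call so one slow dependency can't drag p99
// for everyone. AbortSignal.timeout is available in Workers and Node 18+.
async function fetchWithBudget(url, budgetMs = 800) {
  try {
    const resp = await fetch(url, { signal: AbortSignal.timeout(budgetMs) });
    return await resp.json();
  } catch {
    // Timed out or failed: serve a degraded default instead of blocking.
    return { degraded: true };
  }
}
```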
The build checklist
If you can’t tick these off, you don’t have an edge setup—you have a proxy.
- RUM + synthetics split by metro/ISP
- Explicit cache rules for static + semi-static endpoints
- Worker only does routing/cache/auth short-circuit
- Redis keys have TTL, no hot-key stampedes
- Origin in AWS Mumbai is tuned for keep-alive and fast reads
- Kill switch for Worker rollout
- Dashboards show edge p95/p99 + origin TTFB + Redis p95