Think of a popular bakery that only has enough oven space to make one hundred loaves of bread every hour. If two hundred people show up at once, the system breaks. The baker gets overwhelmed, some customers leave with nothing, and everything grinds to a halt. In the world of software, our servers are the baker, and requests for data are the customers. A distributed rate limiter is the system that manages the line, ensuring everyone gets served fairly and the bakery doesn't catch fire.
I build these systems to protect applications. Without them, a sudden flood of traffic or a malicious attack can exhaust resources, causing outages for everyone. The goal is to be firm but fair: allow normal traffic to flow smoothly while politely saying "please wait a moment" to anything excessive.
At the heart of many modern solutions is an idea called the token bucket. Picture a bucket that holds tokens. Every time a request comes in, it needs to take a token from the bucket to proceed. The bucket is constantly being refilled at a steady rate, say ten tokens per second. This gives us a smooth, average limit. The clever part is the bucket's size. If it's large, it can hold many tokens, allowing for brief bursts of traffic above the average rate when the bucket is full. This matches real-world usage, where users might click quickly several times in a row before pausing.
Here is a basic, single-server version of that bucket.
type LocalTokenBucket struct {
	capacity   float64 // Max tokens the bucket can hold
	tokens     float64 // Current token count
	fillRate   float64 // Tokens added per second
	lastRefill time.Time
	mu         sync.Mutex
}

func (b *LocalTokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	// Calculate how many tokens to add since the last check
	elapsed := now.Sub(b.lastRefill).Seconds()
	b.tokens = math.Min(b.capacity, b.tokens+(elapsed*b.fillRate))
	b.lastRefill = now
	if b.tokens >= 1.0 {
		b.tokens -= 1.0
		return true // Token available, request allowed
	}
	return false // No tokens, request denied
}
This works perfectly on one machine. But today's applications are rarely on just one machine. They run across dozens or hundreds of servers. If each server has its own bucket, a user could send requests to ten different servers and get ten tokens instantly, bypassing the limit entirely. We need all our servers to share one common understanding of the bucket's level. This is where Redis, a fast in-memory data store, becomes essential. It acts as our shared brain.
The challenge is coordination. We cannot have two servers check the token count at the exact same time and both decide a token is available. That would be a classic race condition. We need the check for a token and the act of taking one to be a single, unbreakable operation. In Redis, we use Lua scripts to achieve this atomicity. The script runs completely inside Redis, guaranteeing no other command interrupts it.
Here is the core of our distributed token bucket, implemented as a Lua script executed in Redis.
func (drl *DistributedRateLimiter) tokenBucketLuaScript() string {
	return `
local key = KEYS[1]                 -- e.g., 'ratelimit:user123'
local now = tonumber(ARGV[1])       -- Current timestamp (seconds)
local fill_rate = tonumber(ARGV[2]) -- Tokens per second
local capacity = tonumber(ARGV[3])  -- Bucket size
local ttl = tonumber(ARGV[4])       -- How long to keep this key

-- Get the bucket's current state from Redis
local bucket = redis.call('HMGET', key, 'tokens', 'last_update')
local tokens = tonumber(bucket[1])
local last_update = tonumber(bucket[2])

-- If this is a new bucket, fill it to capacity
if tokens == nil then
    tokens = capacity
    last_update = now
end

-- Add new tokens based on time passed
local elapsed = now - last_update
local new_tokens = elapsed * fill_rate
if new_tokens > 0 then
    tokens = math.min(capacity, tokens + new_tokens)
    last_update = now
end

-- Can we take a token?
local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

-- Save the updated state either way (HSET replaces the deprecated HMSET)
redis.call('HSET', key, 'tokens', tokens, 'last_update', last_update)
redis.call('EXPIRE', key, ttl)
return allowed -- 1 = allowed, 0 = denied
`
}
In this script, everything happens in sequence inside Redis. It reads the current state, calculates new tokens, decides if one can be taken, and writes back the new state—all before any other command from any other server can touch the same key. This gives us our consistent, distributed limit.
The token bucket is excellent for smooth, burstable limits. But sometimes you need different behavior. A common alternative is the fixed window. Imagine a counter that resets every minute. It allows up to one hundred requests per minute, no matter when they arrive. The problem is that if a hundred requests come at 59 seconds past the minute, and another hundred at 0 seconds of the next minute, you get two hundred requests in two seconds. This is a boundary issue.
We implement a fixed window in Redis quite simply.
func (drl *DistributedRateLimiter) fixedWindow(ctx context.Context, key string, limit int64, window time.Duration) (bool, error) {
	// Create a key that includes the current window's start time
	windowStart := time.Now().Truncate(window)
	redisKey := fmt.Sprintf("%s:window:%d", key, windowStart.Unix())
	// Increment the counter for this window
	currentCount, err := drl.redisClient.Incr(ctx, redisKey).Result()
	if err != nil {
		return false, err
	}
	// If this is the first request in the window, set its expiry
	if currentCount == 1 {
		drl.redisClient.Expire(ctx, redisKey, window+10*time.Second)
	}
	return currentCount <= limit, nil
}
To solve the boundary issue of fixed windows, we can use a sliding window. It's more complex but more accurate. It keeps a record of individual request timestamps over the last time window. To check a request, we remove all timestamps older than the window, then count how many are left.
This is implemented using a Redis Sorted Set, where the score is the timestamp.
func (drl *DistributedRateLimiter) slidingWindow(ctx context.Context, key string, limit int64, window time.Duration) (bool, error) {
	redisKey := fmt.Sprintf("%s:sliding", key)
	now := time.Now()
	cutoff := now.Add(-window).UnixNano()
	// Note: these three Redis calls are not atomic. Two servers can pass the
	// count check at the same moment; for strict accuracy, move the sequence
	// into a Lua script like the token bucket above.
	// 1. Remove all requests older than the window
	if err := drl.redisClient.ZRemRangeByScore(ctx, redisKey, "0", fmt.Sprintf("%d", cutoff)).Err(); err != nil {
		return false, err
	}
	// 2. Count requests still in the window
	count, err := drl.redisClient.ZCard(ctx, redisKey).Result()
	if err != nil {
		return false, err
	}
	if count >= limit {
		return false, nil // Over the limit
	}
	// 3. Add the new request's timestamp; the member must be unique so that
	// concurrent requests in the same nanosecond are not collapsed into one
	drl.redisClient.ZAdd(ctx, redisKey, &redis.Z{
		Score:  float64(now.UnixNano()),
		Member: fmt.Sprintf("%d:%s", now.UnixNano(), randomString(8)),
	})
	// Keep the key alive slightly longer than the window
	drl.redisClient.Expire(ctx, redisKey, window+10*time.Second)
	return true, nil
}
Calling Redis for every single request can become a bottleneck. To reduce load and latency, I add a local cache on each application server. If we just made a decision for user "X", a request arriving a few milliseconds later will almost certainly get the same answer. We can cache that decision, allowed or denied, locally for a short time.
type LocalCache struct {
	mu      sync.RWMutex
	entries map[string]*CacheEntry
}

type CacheEntry struct {
	Allowed  bool
	CachedAt time.Time
}

// NewLocalCache initializes the map; writing to a nil map would panic.
func NewLocalCache() *LocalCache {
	return &LocalCache{entries: make(map[string]*CacheEntry)}
}

func (c *LocalCache) Get(key string) (bool, bool) {
	c.mu.RLock()
	entry, exists := c.entries[key]
	c.mu.RUnlock()
	if !exists {
		return false, false // Not in cache
	}
	if time.Since(entry.CachedAt) > 500*time.Millisecond { // Short TTL
		return false, false // Cache entry is stale
	}
	return entry.Allowed, true // Return cached decision
}

func (c *LocalCache) Set(key string, allowed bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// Simple eviction policy: if the cache grows too big, clear it
	if len(c.entries) > 10000 {
		c.entries = make(map[string]*CacheEntry)
	}
	c.entries[key] = &CacheEntry{Allowed: allowed, CachedAt: time.Now()}
}
What happens if Redis becomes unavailable? A good system needs a fallback. In a failure scenario, each server can fall back to its own local token bucket. It won't be perfectly coordinated across servers, but it's better than either completely failing open (allowing all traffic) or failing closed (blocking all traffic). The local bucket uses a conservative estimate to try and maintain some control until Redis is healthy again.
Finally, we need to see what's happening. I add simple counters to track the behavior of the limiter. This data is invaluable for tuning limits and diagnosing problems.
type Metrics struct {
	totalRequests   uint64
	allowedRequests uint64
	cachedDecisions uint64
	redisErrors     uint64
}

func (m *Metrics) Allow(requestAllowed, fromCache bool) {
	atomic.AddUint64(&m.totalRequests, 1)
	if requestAllowed {
		atomic.AddUint64(&m.allowedRequests, 1)
	}
	if fromCache {
		atomic.AddUint64(&m.cachedDecisions, 1)
	}
}

// RecordRedisError counts Redis failures so we can see how often the
// limiter is running on its local fallback.
func (m *Metrics) RecordRedisError() {
	atomic.AddUint64(&m.redisErrors, 1)
}
Putting it all together, the main Allow function for our distributed limiter orchestrates these pieces. It tries the cache, chooses the right algorithm, calls Redis with the atomic script, updates the cache, and tracks metrics.
func (limiter *DistributedRateLimiter) Allow(userKey string) bool {
	// 1. Quick local cache check
	if allowed, found := limiter.localCache.Get(userKey); found {
		limiter.metrics.Allow(allowed, true)
		return allowed
	}
	// 2. Determine the limit strategy for this user/key
	strategy := limiter.getStrategy(userKey)
	// 3. Apply the chosen algorithm via Redis
	allowed, err := limiter.executeRedisScript(userKey, strategy)
	if err != nil {
		// 4. Redis failed! Use the local fallback.
		allowed = limiter.localFallbackBucket.Allow()
		limiter.metrics.RecordRedisError()
	}
	// 5. Cache the fresh decision for a very short time
	limiter.localCache.Set(userKey, allowed)
	// 6. Record metrics
	limiter.metrics.Allow(allowed, false)
	return allowed
}
Building this involves careful trade-offs. The token bucket with Redis synchronization offers a good balance of accuracy, efficiency, and burst tolerance. Adding a local cache with a short lifespan drastically reduces the load on Redis without significantly compromising consistency. The fallback mechanism provides crucial resilience.
The end result is a system that can handle tens of thousands of decisions per second. It protects your application's resources, ensures fair usage, and maintains stability even under heavy load or partial infrastructure failures. You start by defining what "normal" traffic looks like for your service, then you set your limits as a protective fence just beyond that. The system quietly does its job, letting good traffic flow while gently holding back the flood.