Building Distributed Rate Limiters: Redis Token Buckets for High-Traffic Applications

Think of a popular bakery that only has enough oven space to make one hundred loaves of bread every hour. If two hundred people show up at once, the system breaks. The baker gets overwhelmed, some customers leave with nothing, and everything grinds to a halt. In the world of software, our servers are the baker, and requests for data are the customers. A distributed rate limiter is the system that manages the line, ensuring everyone gets served fairly and the bakery doesn't catch fire.

I build these systems to protect applications. Without them, a sudden flood of traffic or a malicious attack can exhaust resources, causing outages for everyone. The goal is to be firm but fair: allow normal traffic to flow smoothly while politely saying "please wait a moment" to anything excessive.

At the heart of many modern solutions is an idea called the token bucket. Picture a bucket that holds tokens. Every time a request comes in, it needs to take a token from the bucket to proceed. The bucket is constantly being refilled at a steady rate, say ten tokens per second. This gives us a smooth, average limit. The clever part is the bucket's size. If it's large, it can hold many tokens, allowing for brief bursts of traffic above the average rate when the bucket is full. This matches real-world usage, where users might click quickly several times in a row before pausing.

Here is a basic, single-server version of that bucket.

type LocalTokenBucket struct {
    capacity     float64 // Max tokens the bucket can hold
    tokens       float64 // Current token count
    fillRate     float64 // Tokens added per second
    lastRefill   time.Time
    mu           sync.Mutex
}

func (b *LocalTokenBucket) Allow() bool {
    b.mu.Lock()
    defer b.mu.Unlock()

    now := time.Now()
    // Calculate how many tokens to add since last check
    elapsed := now.Sub(b.lastRefill).Seconds()
    b.tokens = math.Min(b.capacity, b.tokens + (elapsed * b.fillRate))
    b.lastRefill = now

    if b.tokens >= 1.0 {
        b.tokens -= 1.0
        return true // Token available, request allowed
    }
    return false // No tokens, request denied
}

This works perfectly on one machine. But today's applications are rarely on just one machine. They run across dozens or hundreds of servers. If each server has its own bucket, a user could send requests to ten different servers and get ten tokens instantly, bypassing the limit entirely. We need all our servers to share one common understanding of the bucket's level. This is where Redis, a fast in-memory data store, becomes essential. It acts as our shared brain.

The challenge is coordination. We cannot have two servers check the token count at the exact same time and both decide a token is available. That would be a classic race condition. We need the check for a token and the act of taking one to be a single, unbreakable operation. In Redis, we use Lua scripts to achieve this atomicity. The script runs completely inside Redis, guaranteeing no other command interrupts it.

Here is the core of our distributed token bucket, implemented as a Lua script executed in Redis.

func (drl *DistributedRateLimiter) tokenBucketLuaScript() string {
    return `
        local key = KEYS[1]              -- e.g., 'ratelimit:user123'
        local now = tonumber(ARGV[1])    -- Current timestamp
        local fill_rate = tonumber(ARGV[2]) -- Tokens per second
        local capacity = tonumber(ARGV[3])  -- Bucket size
        local ttl = tonumber(ARGV[4])      -- How long to keep this key

        -- Get the bucket's current state from Redis
        local bucket = redis.call('HMGET', key, 'tokens', 'last_update')
        local tokens = tonumber(bucket[1])
        local last_update = tonumber(bucket[2])

        -- If this is a new bucket, fill it to capacity
        if tokens == nil then
            tokens = capacity
            last_update = now
        end

        -- Add new tokens based on time passed
        local elapsed = now - last_update
        local new_tokens = elapsed * fill_rate
        if new_tokens > 0 then
            tokens = math.min(capacity, tokens + new_tokens)
            last_update = now
        end

        -- Can we take a token?
        if tokens >= 1.0 then
            tokens = tokens - 1.0
            -- Save the updated state back to Redis
            redis.call('HMSET', key, 'tokens', tokens, 'last_update', last_update)
            redis.call('EXPIRE', key, ttl)
            return 1  -- Allowed
        end

        -- Not enough tokens, but still update the state
        redis.call('HMSET', key, 'tokens', tokens, 'last_update', last_update)
        redis.call('EXPIRE', key, ttl)
        return 0  -- Denied
    `
}

In this script, everything happens in sequence inside Redis. It reads the current state, calculates new tokens, decides if one can be taken, and writes back the new state—all before any other command from any other server can touch the same key. This gives us our consistent, distributed limit.
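
On the application side, each server loads this script once and then runs it per request. The sketch below shows one way to wire that up with the go-redis client; the method name, the key prefix, and the struct fields it touches are illustrative assumptions rather than a fixed API.

func (drl *DistributedRateLimiter) executeTokenBucket(ctx context.Context, userKey string, fillRate, capacity float64, ttl time.Duration) (bool, error) {
    // Current time in seconds with sub-second precision, passed as ARGV[1]
    now := float64(time.Now().UnixNano()) / 1e9

    // Run sends EVALSHA and falls back to EVAL if Redis hasn't cached the script yet
    script := redis.NewScript(drl.tokenBucketLuaScript())
    result, err := script.Run(ctx, drl.redisClient,
        []string{"ratelimit:" + userKey},            // KEYS[1]
        now, fillRate, capacity, int(ttl.Seconds()), // ARGV[1..4]
    ).Int()
    if err != nil {
        return false, err
    }
    return result == 1, nil
}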

The token bucket is excellent for smooth, burstable limits. But sometimes you need different behavior. A common alternative is the fixed window. Imagine a counter that resets every minute. It allows up to one hundred requests per minute, no matter when they arrive. The problem is that if a hundred requests come at 59 seconds past the minute, and another hundred at 0 seconds of the next minute, you get two hundred requests in two seconds. This is a boundary issue.

We implement a fixed window in Redis quite simply.

func (drl *DistributedRateLimiter) fixedWindow(ctx context.Context, key string, limit int64, window time.Duration) (bool, error) {
    // Create a key that includes the current window's start time
    windowStart := time.Now().Truncate(window)
    redisKey := fmt.Sprintf("%s:window:%d", key, windowStart.Unix())

    // Increment the counter for this window
    currentCount, err := drl.redisClient.Incr(ctx, redisKey).Result()
    if err != nil {
        return false, err
    }

    // If this is the first request in the window, set its expiry
    if currentCount == 1 {
        drl.redisClient.Expire(ctx, redisKey, window + 10*time.Second)
    }

    return currentCount <= limit, nil
}

To solve the boundary issue of fixed windows, we can use a sliding window. It's more complex but more accurate. It keeps a record of individual request timestamps over the last time window. To check a request, we remove all timestamps older than the window, then count how many are left.

This is implemented using a Redis Sorted Set, where the score is the timestamp.

func (drl *DistributedRateLimiter) slidingWindow(ctx context.Context, key string, limit int64, window time.Duration) (bool, error) {
    redisKey := fmt.Sprintf("%s:sliding", key)
    now := time.Now()
    cutoff := now.Add(-window).UnixNano()

    // 1. Remove all requests older than the window
    drl.redisClient.ZRemRangeByScore(ctx, redisKey, "0", fmt.Sprintf("%d", cutoff))

    // 2. Count requests still in the window
    count, err := drl.redisClient.ZCard(ctx, redisKey).Result()
    if err != nil {
        return false, err
    }

    if count >= limit {
        return false, nil // Over the limit
    }

    // 3. Add the new request's timestamp
    drl.redisClient.ZAdd(ctx, redisKey, &redis.Z{
        Score:  float64(now.UnixNano()),
        Member: fmt.Sprintf("%d:%s", now.UnixNano(), randomString(8)), // Unique member
    })
    // Keep the key alive slightly longer than the window
    drl.redisClient.Expire(ctx, redisKey, window + 10*time.Second)
    return true, nil
}
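
One caveat worth noting: unlike the token bucket script, these three Redis calls are separate round trips, so two servers racing on the same key can both pass the count check before either adds its timestamp. If that level of precision matters, the same Lua-script technique shown earlier can fold the remove, count, and add steps into a single atomic operation.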

Calling Redis for every single request can become a bottleneck. To reduce load and latency, I add a local cache on each application server. If we recently allowed a request for user "X", we'll probably allow the next one very soon after. We can cache that "allowed" decision locally for a short time.

type LocalCache struct {
    mu      sync.RWMutex
    entries map[string]*CacheEntry
}

type CacheEntry struct {
    Allowed  bool
    CachedAt time.Time
}

// NewLocalCache initializes the map so Set has somewhere to write.
func NewLocalCache() *LocalCache {
    return &LocalCache{entries: make(map[string]*CacheEntry)}
}

func (c *LocalCache) Get(key string) (bool, bool) {
    c.mu.RLock()
    entry, exists := c.entries[key]
    c.mu.RUnlock()

    if !exists {
        return false, false // Not in cache
    }
    if time.Since(entry.CachedAt) > 500*time.Millisecond { // Short TTL
        return false, false // Cache entry is stale
    }
    return entry.Allowed, true // Return cached decision
}

func (c *LocalCache) Set(key string, allowed bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    // Simple eviction policy: if cache is too big, clear it
    if len(c.entries) > 10000 {
        c.entries = make(map[string]*CacheEntry)
    }
    c.entries[key] = &CacheEntry{Allowed: allowed, CachedAt: time.Now()}
}

What happens if Redis becomes unavailable? A good system needs a fallback. In a failure scenario, each server can fall back to its own local token bucket. It won't be perfectly coordinated across servers, but it's better than either completely failing open (allowing all traffic) or failing closed (blocking all traffic). The local bucket uses a conservative estimate to try and maintain some control until Redis is healthy again.
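
As a rough illustration, one way to make the fallback conservative is to give each server a slice of the global budget based on an estimated fleet size. The constructor below sketches that idea; the estimatedServers value is an assumption you would configure, not something the limiter discovers on its own.

// newFallbackBucket sizes a per-server bucket as a conservative slice of the
// global limit, so the whole fleet stays roughly within budget even while
// uncoordinated. estimatedServers is an assumed, configured value.
func newFallbackBucket(globalFillRate, globalCapacity float64, estimatedServers int) *LocalTokenBucket {
    n := float64(estimatedServers)
    return &LocalTokenBucket{
        capacity:   globalCapacity / n,
        tokens:     globalCapacity / n, // start full so the fallback doesn't reject everything immediately
        fillRate:   globalFillRate / n,
        lastRefill: time.Now(),
    }
}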

Finally, we need to see what's happening. I add simple counters to track the behavior of the limiter. This data is invaluable for tuning limits and diagnosing problems.

type Metrics struct {
    totalRequests   uint64
    allowedRequests uint64
    cachedDecisions uint64
    redisErrors     uint64
}

func (m *Metrics) Allow(requestAllowed, fromCache bool) {
    atomic.AddUint64(&m.totalRequests, 1)
    if requestAllowed {
        atomic.AddUint64(&m.allowedRequests, 1)
    }
    if fromCache {
        atomic.AddUint64(&m.cachedDecisions, 1)
    }
}

// RecordRedisError is called from the main Allow path when a Redis call fails.
func (m *Metrics) RecordRedisError() {
    atomic.AddUint64(&m.redisErrors, 1)
}

Putting it all together, the main Allow function for our distributed limiter orchestrates these pieces. It tries the cache, chooses the right algorithm, calls Redis with the atomic script, updates the cache, and tracks metrics.

func (limiter *DistributedRateLimiter) Allow(userKey string) bool {
    // 1. Quick local cache check
    if allowed, found := limiter.localCache.Get(userKey); found {
        limiter.metrics.Allow(allowed, true)
        return allowed
    }

    // 2. Determine the limit strategy for this user/key
    strategy := limiter.getStrategy(userKey)

    // 3. Apply the chosen algorithm via Redis
    allowed, err := limiter.executeRedisScript(userKey, strategy)
    if err != nil {
        // 4. Redis failed! Use local fallback.
        allowed = limiter.localFallbackBucket.Allow()
        limiter.metrics.RecordRedisError()
    }

    // 5. Cache the fresh decision for a very short time
    limiter.localCache.Set(userKey, allowed)

    // 6. Record metrics
    limiter.metrics.Allow(allowed, false)
    return allowed
}
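
For reference, the pieces used above could hang together on a struct along these lines. This is one plausible wiring sketched for illustration, assuming the go-redis client; helpers like getStrategy and executeRedisScript are left to your own configuration.

type DistributedRateLimiter struct {
    redisClient         *redis.Client     // shared Redis connection
    localCache          *LocalCache       // short-lived per-server decision cache
    localFallbackBucket *LocalTokenBucket // local bucket used when Redis is unreachable
    metrics             *Metrics          // atomic request counters
}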

Building this involves careful trade-offs. The token bucket with Redis synchronization offers a good balance of accuracy, efficiency, and burst tolerance. Adding a local cache with a short lifespan drastically reduces the load on Redis without significantly compromising consistency. The fallback mechanism provides crucial resilience.

The end result is a system that can handle tens of thousands of decisions per second. It protects your application's resources, ensures fair usage, and maintains stability even under heavy load or partial infrastructure failures. You start by defining what "normal" traffic looks like for your service, then you set your limits as a protective fence just beyond that. The system quietly does its job, letting good traffic flow while gently holding back the flood.
