Implementing Token Bucket Rate Limiting for High-Volume Inventory APIs

#aws #microservices #devops #systemdesign

When you expose inventory or checkout endpoints to public-facing front-ends or third-party webhooks, safeguarding those APIs from brute-force scripts, scraping bots, and inventory-hoarding algorithms becomes a critical requirement. Without defensive rate limiting, a single coordinated script can exhaust your database connections.

The Problem with Simple Counter Resets

A common mistake when setting up basic API protection is using a rigid "Fixed Window" counter (e.g., allowing 100 requests per minute, with the counter resetting exactly on the minute). The flaw sits at the window boundary: a client can send 100 requests at 11:59:59 and another 100 at 12:00:01, and all 200 are accepted within about two seconds. That is double the burst the limit was meant to allow, and it can cause severe performance dips.
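To make the boundary problem concrete, here is a minimal in-memory sketch of a fixed-window counter. The class name `FixedWindowLimiter` and the timestamps are illustrative assumptions, not code from a real service; time is passed in as milliseconds so the behavior is easy to reproduce.

```javascript
// Fixed-window counter: the count resets whenever a request lands in a new window.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = null; // start of the window currently being counted
    this.count = 0;
  }

  allow(nowMs) {
    // Align the window to the clock, e.g. to the start of the current minute.
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (windowStart !== this.windowStart) {
      this.windowStart = windowStart;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// 100 requests per minute; send 100 one second before the boundary
// and 100 one second after it.
const fixedLimiter = new FixedWindowLimiter(100, 60000);
let acceptedAtBoundary = 0;
for (let i = 0; i < 100; i++) if (fixedLimiter.allow(59000)) acceptedAtBoundary++;
for (let i = 0; i < 100; i++) if (fixedLimiter.allow(61000)) acceptedAtBoundary++;

console.log(acceptedAtBoundary); // 200
```

Both batches pass because the counter is wiped at the 60-second mark, so the limiter never sees the two bursts as part of the same two-second span.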

To handle uneven burst traffic safely without crashing your database, a widely used approach is the token bucket algorithm.

The Token Bucket Pattern

The token bucket algorithm maintains a bucket, typically one per client key and stored somewhere every API instance can reach, that holds up to a fixed maximum number of tokens. Tokens are added back at a constant rate until the bucket is full. Each incoming API request consumes exactly one token. If the bucket is empty, the request is rejected immediately with a 429 Too Many Requests status code before it does any database work. The capacity bounds the largest burst a client can send, and the refill rate sets the sustained request rate.
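The refill-and-consume logic described above can be sketched in memory before bringing Redis into the picture. This is a single-process illustration only; the class name `TokenBucket`, the method `tryRemoveToken`, and the numbers used are assumptions made for the example.

```javascript
// In-memory token bucket: refills lazily on each call based on elapsed time.
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // a new bucket starts full
    this.lastRefill = nowMs;
  }

  // Returns true if a token was consumed, false if the bucket is empty.
  tryRemoveToken(nowMs = Date.now()) {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = nowMs;

    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Capacity 5, refilling 1 token per second, with time supplied explicitly.
const demoBucket = new TokenBucket(5, 1, 0);
const burst = [];
for (let i = 0; i < 6; i++) burst.push(demoBucket.tryRemoveToken(0));
console.log(burst); // [ true, true, true, true, true, false ]

// Two seconds later, two tokens have been refilled.
const statusAfterWait = demoBucket.tryRemoveToken(2000) ? 200 : 429;
console.log(statusAfterWait); // 200
```

A burst of five is absorbed at once, the sixth request maps to a 429, and after two seconds the client has earned two more requests. Unlike the fixed window, there is no clock boundary at which the whole allowance resets.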


javascript
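// Redis-based token bucket sketch. Assumes `redis` is an ioredis-style client;
// CAPACITY and REFILL_PER_MS are illustrative values, not recommendations.
const CAPACITY = 100;
const REFILL_PER_MS = 100 / 60000; // 100 tokens per minute

async function isRateLimited(userId) {
  const key = `rate:${userId}`;
  const now = Date.now();

  // Caution: this read and the write below are separate commands, so the
  // check-and-update is NOT atomic; concurrent requests can race.
  const [tokens, lastRefill] = await redis.hmget(key, 'tokens', 'lastRefill');

  // Replenish based on elapsed time, capped at capacity; a new key starts full
  const elapsed = lastRefill === null ? 0 : now - Number(lastRefill);
  const current = tokens === null ? CAPACITY : Number(tokens);
  const available = Math.min(CAPACITY, current + elapsed * REFILL_PER_MS);
  if (available < 1) return true; // bucket empty: caller should respond 429

  await redis.hset(key, 'tokens', available - 1, 'lastRefill', now);
  return false;
}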
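
In the snippet above, the read of the bucket state and the write that follows it are separate round-trips, so two concurrent requests can observe the same token count. One way to make the refill-and-decrement atomic is to run it as a single Lua script inside Redis. The sketch below assumes an ioredis-style client whose `eval(script, numKeys, ...keys, ...args)` method is available and a Redis version that accepts multiple field/value pairs in `HSET` (4.0 or later); the names `TOKEN_BUCKET_LUA` and `takeToken` are hypothetical.

```javascript
// Lua runs atomically inside Redis, so no other command can interleave
// between reading the bucket and writing it back.
const TOKEN_BUCKET_LUA = `
local capacity      = tonumber(ARGV[1])
local refill_per_ms = tonumber(ARGV[2])
local now           = tonumber(ARGV[3])

local state  = redis.call('HMGET', KEYS[1], 'tokens', 'lastRefill')
local tokens = tonumber(state[1]) or capacity  -- missing key: start full
local last   = tonumber(state[2]) or now

tokens = math.min(capacity, tokens + math.max(0, now - last) * refill_per_ms)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'lastRefill', now)
return allowed
`;

// Resolves to true when the request may proceed, false when it should get a 429.
// `client` is assumed to be an ioredis-style client passed in by the caller.
async function takeToken(client, key, capacity, refillPerSecond, nowMs = Date.now()) {
  const result = await client.eval(
    TOKEN_BUCKET_LUA,
    1, // number of KEYS passed to the script
    key,
    capacity,
    refillPerSecond / 1000, // the script works in tokens per millisecond
    nowMs
  );
  return result === 1;
}
```

The key is left to the caller, so the same function can limit per user (`rate:${userId}`) or per request source. The timestamp here comes from the application server; if clock skew between API instances is a concern, reading the time inside the script is an alternative worth evaluating.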

Top comments (1)

arun rajkumar • Jun 10

Token bucket over fixed windows is the right call. One trap on the Redis version: the read-then-write in that snippet isn't atomic, so under real concurrency two requests can both read the same token count and both pass. We push the whole refill-and-decrement into a single Lua script — one atomic round-trip — otherwise the limiter leaks exactly when you're under the load you built it for. For public webhooks we also key on the source rather than the user, since the abusive caller usually isn't authenticated. What refill granularity are you running?