Case Study: Distributed Rate Limiter

Rate limiting is a critical defense mechanism that controls how many requests a client can make to a service within a given time window. While a single-server rate limiter is straightforward, designing one that works correctly across a distributed fleet of servers introduces significant complexity. The system must make sub-millisecond decisions on whether to allow or reject each incoming request, all while maintaining accurate counts across multiple nodes.

What makes this case study particularly interesting is the tension between accuracy and performance. A perfectly accurate distributed counter requires coordination between nodes, but that coordination adds latency -- exactly what a rate limiter cannot afford. The system must handle millions of requests per second, apply complex rules (per-user, per-endpoint, per-IP), and degrade gracefully under extreme load. Choosing the right algorithm (fixed window, sliding window log, sliding window counter, or token bucket) has deep implications for both fairness and resource consumption.

The challenge extends beyond counting. A production rate limiter needs a flexible rule engine that operators can update in real time, clear signaling to clients via HTTP headers, and careful handling of edge cases like clock skew between nodes and race conditions on shared counters.

Key Challenges

Distributed counters: Maintaining accurate request counts across multiple servers without introducing unacceptable latency or single points of failure.
Sliding window algorithms: Selecting and implementing a rate limiting algorithm that balances accuracy, memory efficiency, and computational cost.
Rule engine: Supporting flexible, dynamically configurable rules that can target different dimensions (user, IP, endpoint, API key) with different limits.
Low latency: Ensuring the rate limiting check adds minimal overhead to each request, typically under 1 millisecond.
Consistency under failures: Deciding how the system behaves when the central counter store becomes unavailable -- fail open (allow all) or fail closed (deny all).

Prerequisites

01-fundamentals -- networking, latency, and throughput concepts underlying every request path.
02-scalability -- horizontal scaling patterns and the coordination challenges they introduce.
06-caching-strategies -- in-memory stores like Redis that serve as the backbone for distributed counters.