Distributed Caching
Overview
Distributed caching spreads cached data across multiple servers, providing more memory capacity, higher throughput, and fault tolerance than a single cache node. It is essential infrastructure for any system serving thousands of requests per second, but introduces challenges around data distribution, consistency, and failure handling.
Redis
Redis is an in-memory data structure store used as a cache, message broker, and database. It is the most popular distributed caching solution.
Key Features
Data structures:
Strings: Simple key-value (cache entries, counters)
Hashes: Field-value maps (user profiles, object attributes)
Lists: Ordered collections (message queues, activity feeds)
Sets: Unique collections (tags, followers, online users)
Sorted Sets: Ranked collections (leaderboards, priority queues)
Streams: Append-only logs (event sourcing, message queues)
Operational features:
- Sub-millisecond latency for most operations
- Persistence options (RDB snapshots, AOF append-only file)
- Replication (primary-replica for read scaling)
- Clustering (automatic sharding across nodes)
- Lua scripting for atomic multi-step operations
- Pub/Sub for real-time messaging
- TTL and eviction policies built in
Redis Cluster
Redis Cluster divides the keyspace into 16,384 hash slots.
Each node owns a subset of slots.
Key assignment:
slot = CRC16(key) % 16384
"user:123" -> slot 5649 -> Node B
"user:456" -> slot 12103 -> Node C
Cluster topology (6 nodes, 3 primary + 3 replica):
Node A (primary): slots 0-5460 | Node D (replica of A)
Node B (primary): slots 5461-10922 | Node E (replica of B)
Node C (primary): slots 10923-16383 | Node F (replica of C)
Automatic failover:
If Node B fails, Node E is promoted to primary.
Cluster continues serving all slots.
Limitations:
- Multi-key operations must use keys on the same slot
- Use hash tags to force keys to the same slot: {user:123}:profile
- No cross-slot transactions
- Cluster overhead increases with more nodes
When to Use Redis
- Session storage with rich data structures
- Real-time leaderboards and counters
- Rate limiting
- Pub/Sub messaging
- Caching with TTL and eviction
- Distributed locks (with Redlock for stronger guarantees)
Memcached
Memcached is a simpler, high-performance distributed cache focused purely on key-value storage.
Key Features
Design philosophy: Simple, fast, and focused.
- Key-value only (no data structures)
- No persistence (pure in-memory cache)
- No replication (each node is independent)
- Multi-threaded (better CPU utilization per node)
- Slab-based memory allocation (predictable memory usage)
Operations:
GET key Retrieve a value
SET key value ttl Store a value with TTL
DELETE key Remove a value
INCR/DECR key Atomic counter operations
CAS key value Compare-and-swap (optimistic concurrency)
Redis vs Memcached
Choose Redis when:
- You need data structures (lists, sets, sorted sets)
- You need persistence or replication
- You need pub/sub or streams
- You need atomic multi-key operations (Lua scripts)
- Your use case goes beyond simple caching
Choose Memcached when:
- Simple key-value caching is sufficient
- You need the simplest possible architecture
- Multi-threaded performance matters (Memcached uses threads;
Redis uses single-threaded event loop per shard)
- You want predictable memory usage (slab allocator)
- You are already running Memcached and it works
Real-world:
Facebook: Memcached for caching (simplicity at massive scale)
Twitter: Redis for timelines, counters, and caching
GitHub: Redis for background jobs, caching, and real-time features
Instagram: Redis for news feed, Memcached for general caching
Consistent Hashing for Cache Sharding
When distributing cache entries across multiple servers, you need a strategy that minimizes disruption when servers are added or removed.
The Problem with Simple Hashing
Simple modulo hashing:
server = hash(key) % number_of_servers
With 3 servers:
hash("user:1") % 3 = 0 -> Server A
hash("user:2") % 3 = 1 -> Server B
hash("user:3") % 3 = 2 -> Server C
Add a 4th server (now mod 4):
hash("user:1") % 4 = 1 -> Server B (was A, MOVED)
hash("user:2") % 4 = 2 -> Server C (was B, MOVED)
hash("user:3") % 4 = 0 -> Server A (was C, MOVED)
Almost every key remaps to a different server.
All cached data is effectively invalidated.
Database receives the full query load (cache stampede).
How Consistent Hashing Works
Consistent hashing uses a hash ring:
1. Hash each server onto a ring (0 to 2^32):
Server A: position 1000
Server B: position 4000
Server C: position 7000
2. Hash each key onto the same ring:
"user:1": position 2500 -> walks clockwise -> Server B
"user:2": position 5500 -> walks clockwise -> Server C
"user:3": position 8500 -> walks clockwise -> Server A (wraps)
3. Adding Server D at position 5000:
"user:1": position 2500 -> still Server B (unchanged)
"user:2": position 5500 -> now Server C (unchanged, 5500 > 5000)
Only keys between 4000 and 5000 move to Server D.
Approximately 1/N of keys are remapped (N = number of servers).
4. Virtual nodes (vnodes):
Each physical server maps to multiple positions on the ring.
Server A: positions 1000, 3500, 6500, 9000
This ensures more even key distribution.
Typical: 100-200 virtual nodes per physical server.
Real-World Usage
Amazon DynamoDB uses consistent hashing for partition key distribution across storage nodes.
Apache Cassandra uses consistent hashing with virtual nodes for data distribution across the cluster.
Memcached client libraries (libmemcached, Ketama) use consistent hashing to distribute keys across a pool of Memcached servers.
Cache Stampede
A cache stampede occurs when a popular cache entry expires and many concurrent requests simultaneously attempt to regenerate it, overwhelming the database.
The Problem
Scenario:
Popular item "trending:homepage" expires.
1,000 concurrent requests arrive for the same key.
All 1,000 see a cache miss.
All 1,000 query the database simultaneously.
Database overloaded, latency spikes, possible failure.
Timeline:
T0: Cache entry expires
T1: 1,000 requests check cache, all miss
T2: 1,000 identical database queries execute
T3: Database chokes, responses time out
T4: Cascading failure spreads to other services
Solutions
Solution 1: Locking (mutex)
On cache miss, acquire a lock for the key.
Only one request queries the database.
Other requests wait for the lock or return stale data.
request arrives -> cache miss -> try lock
Lock acquired: query database, populate cache, release lock
Lock not acquired: wait and retry, or return stale data
Solution 2: Early expiration (probabilistic)
Items "expire" slightly before their real TTL.
A random request triggers a background refresh.
Other requests still see the cached (slightly old) data.
effective_ttl = ttl - random(0, ttl * 0.1)
If current_time > effective_ttl: refresh in background
Return existing cached value to the current request
Solution 3: External refresh (background job)
A separate process refreshes popular cache entries
before they expire. Application never causes a miss
for known hot keys.
Solution 4: Stale-while-revalidate
Serve stale data while refreshing in the background.
Similar to the HTTP cache-control directive.
One request triggers refresh; all others get stale data.
Thundering Herd
The thundering herd problem is similar to cache stampede but can also occur when a cache node fails or restarts, and all requests previously served by that node suddenly hit the database.
The Problem
Scenario 1: Cache node failure
Cache node B served 30% of all keys.
Node B crashes.
All requests for those keys become cache misses.
Database load jumps 30% instantly.
Scenario 2: Application restart
Application process restarts with cold in-process cache.
All requests from this process hit the distributed cache or database.
If many processes restart simultaneously (rolling deploy):
each restart adds load in a wave.
Scenario 3: Cache flush
Operator accidentally flushes the cache.
100% of requests become cache misses simultaneously.
Database receives the full read load.
Solutions
Solution 1: Request coalescing (deduplication)
When multiple identical requests arrive at the same time,
only one actually fetches from the database.
The rest wait for and share the result.
Also called: single-flight, request collapsing
Solution 2: Circuit breaker
When database error rate exceeds a threshold:
- Stop sending requests to the database
- Return cached (potentially stale) data or a degraded response
- Periodically test if database has recovered
Solution 3: Graceful degradation
When cache is unavailable:
- Serve stale data from backup cache or local cache
- Return partial results instead of complete failure
- Queue requests and process when cache recovers
Solution 4: Rate limiting database queries
Limit the number of concurrent database queries per key or globally.
Excess requests wait in a queue or return an error.
Prevents database from being overwhelmed.
Solution 5: Redundant cache nodes
Multiple replicas of each cache shard.
If one node fails, replicas serve requests.
Redis Sentinel or Redis Cluster provide this.
Cache Invalidation at Scale
"There are only two hard things in Computer Science:
cache invalidation and naming things." - Phil Karlton
Invalidation strategies:
TTL-based:
Simple: set TTL when caching, let it expire.
Pro: No active invalidation needed.
Con: Stale data until TTL expires.
Event-based:
When data changes, publish an event.
Cache subscribers invalidate affected keys.
Pro: Near-instant invalidation.
Con: Must track all cache keys affected by each change.
Version-based:
Include a version in cache keys: "user:123:v7"
On update, increment version: "user:123:v8"
Old entries expire naturally via TTL.
Pro: No explicit invalidation.
Con: Must store and look up current version.
Write-through invalidation:
Database change triggers cache update.
Using CDC (Change Data Capture) or triggers.
Pro: Database is the single source of truth for invalidation.
Con: Adds latency and complexity to write path.
Real-World Distributed Caching
Facebook operates the world's largest Memcached deployment. They partition data across regions, use lease tokens to prevent cache stampede, and built McRouter as a proxy layer for connection management and routing.
Twitter uses Redis clusters for timeline caching. Each user's home timeline is pre-computed and stored in Redis. When a tweet is posted, it is fanned out to followers' timeline caches.
Pinterest uses a combination of Memcached for object caching and Redis for specialized data structures (sorted sets for feeds, HyperLogLog for counting).
Common Pitfalls
- Single cache node as single point of failure: Always deploy cache with replication or clustering. A cache failure should degrade performance, not cause an outage.
- Not using consistent hashing: Adding or removing cache nodes with modulo hashing invalidates most of the cache. Use consistent hashing from the start.
- Ignoring hot keys: If one key receives disproportionate traffic, a single cache node becomes a bottleneck. Replicate hot keys across nodes or use local caching for the hottest keys.
- No cache stampede protection: Popular items expire and hammer the database. Implement locking, stale-while-revalidate, or background refresh.
- Over-relying on cache: If the cache is down and your system falls over completely, your architecture is wrong. The system should degrade gracefully, not fail catastrophically.
- Network as an afterthought: Distributed cache adds a network round trip. If your cache is in a different availability zone, that round trip might be 1-5ms instead of sub-millisecond.
Key Takeaways
- Redis is the most versatile distributed cache, offering rich data structures, persistence, and clustering. Memcached is simpler and sufficient for pure key-value caching.
- Consistent hashing is essential for distributed caches. It ensures that adding or removing nodes only remaps a small fraction of keys.
- Cache stampede and thundering herd are real operational risks. Implement request coalescing, locking, or stale-while-revalidate to protect your database.
- Hot keys require special handling: local caching, key replication, or read replicas for the cache itself.
- Plan for cache failure. Your system should degrade gracefully (slower, not broken) when the cache is unavailable.
- Monitor cache node health, hit rates per node, and key distribution to detect hot spots and imbalances early.