Distributed Caching

Overview

Distributed caching spreads cached data across multiple servers, providing more memory capacity, higher throughput, and fault tolerance than a single cache node. It is essential infrastructure for any system serving thousands of requests per second, but introduces challenges around data distribution, consistency, and failure handling.

Redis

Redis is an in-memory data structure store used as a cache, message broker, and database. It is the most popular distributed caching solution.

Key Features

Data structures:
  Strings:     Simple key-value (cache entries, counters)
  Hashes:      Field-value maps (user profiles, object attributes)
  Lists:       Ordered collections (message queues, activity feeds)
  Sets:        Unique collections (tags, followers, online users)
  Sorted Sets: Ranked collections (leaderboards, priority queues)
  Streams:     Append-only logs (event sourcing, message queues)

Operational features:
  - Sub-millisecond latency for most operations
  - Persistence options (RDB snapshots, AOF append-only file)
  - Replication (primary-replica for read scaling)
  - Clustering (automatic sharding across nodes)
  - Lua scripting for atomic multi-step operations
  - Pub/Sub for real-time messaging
  - TTL and eviction policies built in

Redis Cluster

Redis Cluster divides the keyspace into 16,384 hash slots.
Each node owns a subset of slots.

Key assignment:
  slot = CRC16(key) % 16384
  "user:123" -> slot 5649 -> Node B
  "user:456" -> slot 12103 -> Node C

Cluster topology (6 nodes, 3 primary + 3 replica):
  Node A (primary): slots 0-5460       | Node D (replica of A)
  Node B (primary): slots 5461-10922   | Node E (replica of B)
  Node C (primary): slots 10923-16383  | Node F (replica of C)

Automatic failover:
  If Node B fails, Node E is promoted to primary.
  Cluster continues serving all slots.

Limitations:
  - Multi-key operations must use keys on the same slot
  - Use hash tags to force keys to the same slot: {user:123}:profile
  - No cross-slot transactions
  - Cluster overhead increases with more nodes

When to Use Redis

Session storage with rich data structures
Real-time leaderboards and counters
Rate limiting
Pub/Sub messaging
Caching with TTL and eviction
Distributed locks (with Redlock for stronger guarantees)

Memcached

Memcached is a simpler, high-performance distributed cache focused purely on key-value storage.

Key Features

Design philosophy: Simple, fast, and focused.
  - Key-value only (no data structures)
  - No persistence (pure in-memory cache)
  - No replication (each node is independent)
  - Multi-threaded (better CPU utilization per node)
  - Slab-based memory allocation (predictable memory usage)

Operations:
  GET key            Retrieve a value
  SET key value ttl  Store a value with TTL
  DELETE key         Remove a value
  INCR/DECR key     Atomic counter operations
  CAS key value      Compare-and-swap (optimistic concurrency)

Redis vs Memcached

Choose Redis when:
  - You need data structures (lists, sets, sorted sets)
  - You need persistence or replication
  - You need pub/sub or streams
  - You need atomic multi-key operations (Lua scripts)
  - Your use case goes beyond simple caching

Choose Memcached when:
  - Simple key-value caching is sufficient
  - You need the simplest possible architecture
  - Multi-threaded performance matters (Memcached uses threads;
    Redis uses single-threaded event loop per shard)
  - You want predictable memory usage (slab allocator)
  - You are already running Memcached and it works

Real-world:
  Facebook: Memcached for caching (simplicity at massive scale)
  Twitter: Redis for timelines, counters, and caching
  GitHub: Redis for background jobs, caching, and real-time features
  Instagram: Redis for news feed, Memcached for general caching

Consistent Hashing for Cache Sharding

When distributing cache entries across multiple servers, you need a strategy that minimizes disruption when servers are added or removed.

The Problem with Simple Hashing

Simple modulo hashing:
  server = hash(key) % number_of_servers

  With 3 servers:
    hash("user:1") % 3 = 0 -> Server A
    hash("user:2") % 3 = 1 -> Server B
    hash("user:3") % 3 = 2 -> Server C

  Add a 4th server (now mod 4):
    hash("user:1") % 4 = 1 -> Server B  (was A, MOVED)
    hash("user:2") % 4 = 2 -> Server C  (was B, MOVED)
    hash("user:3") % 4 = 0 -> Server A  (was C, MOVED)

  Almost every key remaps to a different server.
  All cached data is effectively invalidated.
  Database receives the full query load (cache stampede).

How Consistent Hashing Works

Consistent hashing uses a hash ring:

1. Hash each server onto a ring (0 to 2^32):
   Server A: position 1000
   Server B: position 4000
   Server C: position 7000

2. Hash each key onto the same ring:
   "user:1": position 2500 -> walks clockwise -> Server B
   "user:2": position 5500 -> walks clockwise -> Server C
   "user:3": position 8500 -> walks clockwise -> Server A (wraps)

3. Adding Server D at position 5000:
   "user:1": position 2500 -> still Server B (unchanged)
   "user:2": position 5500 -> now Server C (unchanged, 5500 > 5000)
   Only keys between 4000 and 5000 move to Server D.
   Approximately 1/N of keys are remapped (N = number of servers).

4. Virtual nodes (vnodes):
   Each physical server maps to multiple positions on the ring.
   Server A: positions 1000, 3500, 6500, 9000
   This ensures more even key distribution.
   Typical: 100-200 virtual nodes per physical server.

Real-World Usage

Amazon DynamoDB uses consistent hashing for partition key distribution across storage nodes.

Apache Cassandra uses consistent hashing with virtual nodes for data distribution across the cluster.

Memcached client libraries (libmemcached, Ketama) use consistent hashing to distribute keys across a pool of Memcached servers.

Cache Stampede

A cache stampede occurs when a popular cache entry expires and many concurrent requests simultaneously attempt to regenerate it, overwhelming the database.

The Problem

Scenario:
  Popular item "trending:homepage" expires.
  1,000 concurrent requests arrive for the same key.
  All 1,000 see a cache miss.
  All 1,000 query the database simultaneously.
  Database overloaded, latency spikes, possible failure.

Timeline:
  T0: Cache entry expires
  T1: 1,000 requests check cache, all miss
  T2: 1,000 identical database queries execute
  T3: Database chokes, responses time out
  T4: Cascading failure spreads to other services

Solutions

Solution 1: Locking (mutex)
  On cache miss, acquire a lock for the key.
  Only one request queries the database.
  Other requests wait for the lock or return stale data.

  request arrives -> cache miss -> try lock
    Lock acquired: query database, populate cache, release lock
    Lock not acquired: wait and retry, or return stale data

Solution 2: Early expiration (probabilistic)
  Items "expire" slightly before their real TTL.
  A random request triggers a background refresh.
  Other requests still see the cached (slightly old) data.

  effective_ttl = ttl - random(0, ttl * 0.1)
  If current_time > effective_ttl: refresh in background
  Return existing cached value to the current request

Solution 3: External refresh (background job)
  A separate process refreshes popular cache entries
  before they expire. Application never causes a miss
  for known hot keys.

Solution 4: Stale-while-revalidate
  Serve stale data while refreshing in the background.
  Similar to the HTTP cache-control directive.
  One request triggers refresh; all others get stale data.

Thundering Herd

The thundering herd problem is similar to cache stampede but can also occur when a cache node fails or restarts, and all requests previously served by that node suddenly hit the database.

The Problem

Scenario 1: Cache node failure
  Cache node B served 30% of all keys.
  Node B crashes.
  All requests for those keys become cache misses.
  Database load jumps 30% instantly.

Scenario 2: Application restart
  Application process restarts with cold in-process cache.
  All requests from this process hit the distributed cache or database.
  If many processes restart simultaneously (rolling deploy):
  each restart adds load in a wave.

Scenario 3: Cache flush
  Operator accidentally flushes the cache.
  100% of requests become cache misses simultaneously.
  Database receives the full read load.

Solutions

Solution 1: Request coalescing (deduplication)
  When multiple identical requests arrive at the same time,
  only one actually fetches from the database.
  The rest wait for and share the result.

  Also called: single-flight, request collapsing

Solution 2: Circuit breaker
  When database error rate exceeds a threshold:
  - Stop sending requests to the database
  - Return cached (potentially stale) data or a degraded response
  - Periodically test if database has recovered

Solution 3: Graceful degradation
  When cache is unavailable:
  - Serve stale data from backup cache or local cache
  - Return partial results instead of complete failure
  - Queue requests and process when cache recovers

Solution 4: Rate limiting database queries
  Limit the number of concurrent database queries per key or globally.
  Excess requests wait in a queue or return an error.
  Prevents database from being overwhelmed.

Solution 5: Redundant cache nodes
  Multiple replicas of each cache shard.
  If one node fails, replicas serve requests.
  Redis Sentinel or Redis Cluster provide this.

Cache Invalidation at Scale

"There are only two hard things in Computer Science:
cache invalidation and naming things." - Phil Karlton

Invalidation strategies:

TTL-based:
  Simple: set TTL when caching, let it expire.
  Pro: No active invalidation needed.
  Con: Stale data until TTL expires.

Event-based:
  When data changes, publish an event.
  Cache subscribers invalidate affected keys.
  Pro: Near-instant invalidation.
  Con: Must track all cache keys affected by each change.

Version-based:
  Include a version in cache keys: "user:123:v7"
  On update, increment version: "user:123:v8"
  Old entries expire naturally via TTL.
  Pro: No explicit invalidation.
  Con: Must store and look up current version.

Write-through invalidation:
  Database change triggers cache update.
  Using CDC (Change Data Capture) or triggers.
  Pro: Database is the single source of truth for invalidation.
  Con: Adds latency and complexity to write path.

Real-World Distributed Caching

Facebook operates the world's largest Memcached deployment. They partition data across regions, use lease tokens to prevent cache stampede, and built McRouter as a proxy layer for connection management and routing.

Twitter uses Redis clusters for timeline caching. Each user's home timeline is pre-computed and stored in Redis. When a tweet is posted, it is fanned out to followers' timeline caches.

Pinterest uses a combination of Memcached for object caching and Redis for specialized data structures (sorted sets for feeds, HyperLogLog for counting).

Common Pitfalls

Single cache node as single point of failure: Always deploy cache with replication or clustering. A cache failure should degrade performance, not cause an outage.
Not using consistent hashing: Adding or removing cache nodes with modulo hashing invalidates most of the cache. Use consistent hashing from the start.
Ignoring hot keys: If one key receives disproportionate traffic, a single cache node becomes a bottleneck. Replicate hot keys across nodes or use local caching for the hottest keys.
No cache stampede protection: Popular items expire and hammer the database. Implement locking, stale-while-revalidate, or background refresh.
Over-relying on cache: If the cache is down and your system falls over completely, your architecture is wrong. The system should degrade gracefully, not fail catastrophically.
Network as an afterthought: Distributed cache adds a network round trip. If your cache is in a different availability zone, that round trip might be 1-5ms instead of sub-millisecond.

Key Takeaways

Redis is the most versatile distributed cache, offering rich data structures, persistence, and clustering. Memcached is simpler and sufficient for pure key-value caching.
Consistent hashing is essential for distributed caches. It ensures that adding or removing nodes only remaps a small fraction of keys.
Cache stampede and thundering herd are real operational risks. Implement request coalescing, locking, or stale-while-revalidate to protect your database.
Hot keys require special handling: local caching, key replication, or read replicas for the cache itself.
Plan for cache failure. Your system should degrade gracefully (slower, not broken) when the cache is unavailable.
Monitor cache node health, hit rates per node, and key distribution to detect hot spots and imbalances early.