Distributed Caching
Overview
Caching in distributed systems reduces latency and load on backend stores by keeping frequently accessed data in fast, in-memory storage closer to the application. The challenge is not just caching itself -- it is maintaining consistency across multiple cache nodes and invalidating stale data reliably in a system where failures and network partitions are normal.
Cache Topologies
Local (In-Process) Cache
Each application instance maintains its own cache in memory (e.g., a HashMap with TTL).
- Fastest access (no network hop)
- No consistency between instances -- each has its own copy
- Limited by the application's memory
- Good for: immutable data, configuration, short-lived request-scoped caches
Centralized Cache
A dedicated cache cluster (e.g., Redis, Memcached) shared by all application instances.
- Single source of truth for cached data
- Network round-trip on every access (typically sub-millisecond on a local network)
- The cache cluster itself must be scaled and made highly available
- Good for: shared session state, frequently queried database results, rate limiting counters
Two-Tier (Local + Remote)
Application instances maintain a small local L1 cache backed by a shared remote L2 cache.
- L1 handles the hottest data with zero network cost
- L2 prevents redundant database queries across instances
- Invalidation must propagate to both tiers
- Good for: high-throughput systems where even cache-hit latency matters
Caching Patterns
Cache-Aside (Lazy Loading)
The application checks the cache first. On a miss, it reads from the database, populates the cache, and returns the result.
STRUCTURE CacheEntry<V>
value : V
inserted_at : Timestamp
ttl : Duration
PROCEDURE CacheEntry.IS_EXPIRED() → Boolean
RETURN ELAPSED(self.inserted_at) > self.ttl
/// A cache-aside implementation with TTL-based expiration.
STRUCTURE CacheAside<V>
store : Map<String, CacheEntry<V>>
default_ttl : Duration
/// Attempt to read from cache. Returns NULL on miss or expiration.
PROCEDURE CacheAside.GET(key) → Optional<V>
entry ← self.store[key]
IF entry IS NOT NULL THEN
IF NOT entry.IS_EXPIRED() THEN
RETURN entry.value
END IF
DELETE self.store[key]
END IF
RETURN NULL
/// Populate the cache after a database read.
PROCEDURE CacheAside.SET(key, value)
self.store[key] ← CacheEntry {
value ← value,
inserted_at ← CURRENT_TIME(),
ttl ← self.default_ttl
}
/// Invalidate a cache entry, typically after a write to the database.
PROCEDURE CacheAside.INVALIDATE(key)
DELETE self.store[key]
- Most common pattern in practice
- Risk of thundering herd: many clients simultaneously miss and all query the database
- Mitigation: use a "single-flight" or lock mechanism so only one client fetches on a miss
Write-Through
Every write goes to both the cache and the database atomically.
- Ensures the cache is always consistent with the database
- Higher write latency (two writes on every mutation)
- Good for: read-heavy workloads where consistency matters and writes are infrequent
Write-Behind (Write-Back)
Writes go to the cache first and are asynchronously flushed to the database in batches.
- Very low write latency
- Risk of data loss if the cache node fails before flushing
- Complexity in ensuring ordering and handling flush failures
- Good for: write-heavy workloads that can tolerate eventual durability (e.g., view counters, analytics)
Cache Invalidation Across Nodes
Cache invalidation is famously one of the two hard problems in computer science. In a distributed system, it gets harder because invalidation must propagate across multiple cache nodes and application instances.
Strategies
TTL-based expiration: Every entry has a time-to-live. After it expires, the next read triggers a fresh fetch. Simple and reliable but allows stale reads up to the TTL duration.
Event-driven invalidation: When data changes, the writing service publishes an event (e.g., via Kafka, Redis Pub/Sub). Subscribers invalidate or update their caches. Lower staleness than TTL but adds messaging infrastructure.
Lease-based invalidation (Facebook's approach): The cache issues a "lease" (a token) when a client gets a miss. The client can only write back to the cache if the lease is still valid. If another client invalidates the key while the first is fetching, the lease is revoked and the stale write is rejected.
Version-stamped values: Each cached value carries a version number. Writers increment the version. Readers accept only values with a version >= what they expect. Prevents older values from overwriting newer ones.
Redis Cluster
Redis Cluster is the most widely used distributed caching system in modern architectures.
Architecture:
- Data is split across 16,384 hash slots
- Each master node owns a subset of hash slots
- Each master can have one or more replicas for failover
- Clients are hash-slot-aware: they compute
CRC16(key) mod 16384and route directly to the correct node
Key behaviors:
- No multi-key operations across slots (unless keys are on the same node via hash tags:
{user:123}.profileand{user:123}.settingsshare a slot) - Gossip protocol for cluster state propagation
- Automatic failover when a master fails, a replica is promoted
- MOVED and ASK redirects handle rebalancing transparently to clients
Real-World Examples
Facebook Memcache (TAO and Memcached at Scale)
Facebook operates one of the largest Memcached deployments in the world, serving billions of requests per second. Key insights from their published architecture:
- Regional pools: Memcached instances are deployed in regional clusters. Reads are served locally; writes go to the primary region and are replicated.
- Lease mechanism: To prevent thundering herd and stale sets, Facebook introduced leases. When a cache miss occurs, the client receives a lease token. If the key is invalidated before the client returns with the data, the lease is revoked and the stale data is not written.
- Gutter pools: Dedicated "gutter" Memcached instances absorb load when primary cache instances fail. Rather than sending all misses to the database, failures are redirected to gutter servers with short TTLs.
- McRouter: A Memcached protocol-compatible routing layer that handles consistent hashing, replication, failover, and connection pooling. Applications talk to McRouter as if it were a single Memcached instance.
- Delete-based invalidation: On writes, Facebook deletes the cache key rather than updating it. This is simpler and avoids race conditions where an update with stale data overwrites a newer value.
Netflix EVCache
Netflix's EVCache is a distributed caching solution built on top of Memcached, designed for their AWS infrastructure:
- Zone-aware replication: Data is replicated across AWS availability zones. Reads are served from the local zone; if the local cache is down, the client falls back to another zone.
- Write-all, read-local: On writes, EVCache writes to all zones. On reads, it only reads from the local zone. This ensures low read latency while maintaining cross-zone durability.
- Tunable consistency: Applications can choose to read from one zone (fast, potentially stale) or all zones (consistent but slower).
- Near-cache (L1): For extremely hot keys, EVCache supports an in-process L1 cache in front of the Memcached L2. This eliminates network hops for the most frequently accessed data.
- Integration with Chaos Monkey: EVCache is tested continuously with fault injection. Cache node failures are expected and handled gracefully by fallback to other zones.
Common Pitfalls
-
Cache stampede / thundering herd: When a popular key expires, many clients simultaneously miss and all query the database. Use leases, single-flight patterns, or staggered TTLs with jitter.
-
Inconsistency between cache and database: Race conditions between write and invalidation paths can leave stale data in the cache. Prefer delete-on-write over update-on-write to reduce the window for races.
-
Unbounded cache growth: Without eviction policies (LRU, LFU, TTL), caches grow until they consume all available memory. Always configure max memory limits and an eviction policy.
-
Caching negative results (or not): If you do not cache misses (negative caching), repeated queries for nonexistent keys always hit the database. Cache misses with a short TTL to prevent this.
-
Hot key problem: A single key accessed millions of times per second can overwhelm a single cache node. Solutions include replicating hot keys across nodes, using local caches for hot keys, or splitting the key into sub-keys.
Key Takeaways
- Cache-aside is the right default pattern. Use write-through only when consistency is critical, and write-behind only when you can tolerate eventual durability.
- Invalidation is harder than caching. Design your invalidation strategy before you start caching. Prefer delete over update for safer semantics.
- Two-tier caching (local L1 + remote L2) gives the best of both worlds for high-throughput systems but adds invalidation complexity.
- Redis Cluster is the dominant choice for new systems. Memcached remains relevant at Facebook/Meta scale and for simpler use cases.
- Always plan for cache failure. The system must function (perhaps at degraded performance) when the cache is unavailable.