Case Study: Content Delivery Network
A Content Delivery Network (CDN) is a geographically distributed system of servers designed to deliver web content, media, and application assets to users with minimal latency. Providers such as Cloudflare, Akamai, and Amazon CloudFront operate networks of edge servers spanning hundreds of locations worldwide, serving as the performance backbone for much of the modern internet. Designing a CDN touches nearly every layer of the networking stack and requires balancing consistency, performance, and cost at global scale.
From a system design perspective, the central difficulty in CDN design is the tension between data freshness and access speed. A CDN exists to serve cached copies of content from locations close to users, which makes it a distributed caching problem at very large scale. When origin content changes, the CDN must propagate invalidations across hundreds of edge nodes, and until that propagation converges, different nodes may serve different versions of the same object to different users.
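To make the freshness versus speed trade-off concrete, here is a minimal sketch of a single edge node that caches objects with a TTL and supports explicit purges. The names `EdgeCache`, `CachedObject`, `fetch_origin`, and the injectable `clock` are hypothetical and chosen only for illustration; a real CDN would also honor HTTP cache headers and bound its storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedObject:
    body: str
    expires_at: float  # absolute time after which the copy is treated as stale


class EdgeCache:
    """Hypothetical single-node edge cache with TTL expiry and explicit purge."""

    def __init__(self, fetch_origin: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._fetch_origin = fetch_origin  # assumed callback that reads the origin
        self._ttl = ttl_seconds
        self._clock = clock                # injectable so the example is deterministic
        self._store: Dict[str, CachedObject] = {}

    def get(self, path: str) -> str:
        entry = self._store.get(path)
        if entry is not None and self._clock() < entry.expires_at:
            return entry.body  # hit: fast, but possibly behind the origin
        body = self._fetch_origin(path)  # miss or expired: go back to the origin
        self._store[path] = CachedObject(body, self._clock() + self._ttl)
        return body

    def purge(self, path: str) -> None:
        # Invalidation: drop the local copy so the next request refetches.
        self._store.pop(path, None)


# Usage: the edge keeps serving "v1" until the TTL expires or a purge arrives.
origin = {"/logo.png": "v1"}
now = [0.0]
edge = EdgeCache(lambda p: origin[p], ttl_seconds=60.0, clock=lambda: now[0])

first = edge.get("/logo.png")    # "v1", fetched from the origin
origin["/logo.png"] = "v2"       # the origin changes
stale = edge.get("/logo.png")    # still "v1": inside the convergence window
edge.purge("/logo.png")          # the invalidation reaches this node
fresh = edge.get("/logo.png")    # "v2"
```

A longer TTL raises the hit rate but widens the window in which `stale` can differ from the origin; a purge closes that window early at the cost of an extra origin fetch.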
Beyond basic caching, a production CDN must handle geographic routing to direct users to the nearest healthy edge node, origin shielding to protect backend servers from cache-miss stampedes, TLS termination at the edge for security, and real-time traffic analytics. The system must gracefully handle edge node failures, origin outages, and sudden traffic spikes while keeping operational costs proportional to actual usage.
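As a rough illustration of routing users to the nearest healthy edge node, the sketch below picks the lowest-latency node that is healthy and has spare capacity. The `EdgeNode` fields, the `pick_edge` function, and the `max_load` threshold are assumptions made for this example, not a description of how any particular CDN routes traffic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EdgeNode:
    name: str
    rtt_ms: float   # assumed measured round-trip time from the client's region
    healthy: bool   # result of the most recent health check
    load: float     # fraction of capacity in use, 0.0 to 1.0


def pick_edge(nodes: Iterable[EdgeNode], max_load: float = 0.9) -> Optional[EdgeNode]:
    """Return the lowest-latency healthy node, preferring nodes with spare capacity."""
    healthy = [n for n in nodes if n.healthy]
    candidates = [n for n in healthy if n.load < max_load]
    if not candidates:
        # Every healthy node is near capacity: degrade rather than fail.
        candidates = healthy
    if not candidates:
        return None  # no healthy node at all; the caller must handle this case
    return min(candidates, key=lambda n: n.rtt_ms)


# Usage: "lhr" is closest but unhealthy, "fra" is overloaded, so "ams" wins.
nodes = [
    EdgeNode("fra", rtt_ms=12.0, healthy=True, load=0.95),
    EdgeNode("ams", rtt_ms=18.0, healthy=True, load=0.40),
    EdgeNode("lhr", rtt_ms=9.0, healthy=False, load=0.10),
]
chosen = pick_edge(nodes)  # the "ams" node
```

In practice this decision is usually made by DNS-based routing or anycast rather than by a single function call, but the inputs (latency, health, capacity) are the same.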
Key Challenges
- Edge caching strategy: Designing cache hierarchies across edge, regional, and origin tiers with appropriate TTLs, eviction policies, and storage allocation to maximize hit rates while minimizing stale content delivery.
- Cache invalidation: Propagating content updates and purge requests across a globally distributed edge network quickly and reliably, handling both targeted invalidation and bulk purges without overwhelming the control plane.
- Origin shielding: Protecting origin servers from thundering-herd cache misses by introducing an intermediate caching layer that collapses concurrent requests for the same uncached object into a single origin fetch.
- Geographic routing: Directing user requests to the optimal edge node based on latency, server health, and capacity using DNS-based routing, anycast, or application-layer routing, while handling failover when nodes become unavailable.
- Cost and performance optimization: Balancing bandwidth costs, storage at the edge, and origin egress charges while meeting latency SLAs, including decisions about what to cache, where to cache it, and when to serve stale content.
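The request collapsing described under origin shielding can be sketched as follows: the first request for an uncached object becomes the leader and fetches from the origin, while concurrent requests for the same key wait for that result. `ShieldCache`, `_InFlight`, `fetch_origin`, and `demo` are hypothetical names for this illustration, and the sketch omits TTLs and eviction.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class _InFlight:
    """Shared state for one origin fetch that is still in progress."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.body: Optional[str] = None
        self.error: Optional[BaseException] = None


class ShieldCache:
    """Hypothetical shield tier that collapses concurrent misses per key."""

    def __init__(self, fetch_origin: Callable[[str], str]) -> None:
        self._fetch_origin = fetch_origin
        self._lock = threading.Lock()
        self._cache: Dict[str, str] = {}
        self._in_flight: Dict[str, _InFlight] = {}

    def get(self, key: str) -> str:
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            flight = self._in_flight.get(key)
            leader = flight is None
            if leader:
                flight = _InFlight()
                self._in_flight[key] = flight
        if leader:
            try:
                flight.body = self._fetch_origin(key)  # the single origin fetch
                with self._lock:
                    self._cache[key] = flight.body
            except Exception as exc:
                flight.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                flight.done.set()  # wake every waiting follower
        else:
            flight.done.wait()
        if flight.error is not None:
            raise flight.error
        return flight.body


def demo(workers: int = 8) -> Tuple[List[str], List[str]]:
    """Fire concurrent requests for one key; return (origin calls, results)."""
    calls: List[str] = []
    results: List[str] = []

    def slow_origin(key: str) -> str:
        calls.append(key)
        time.sleep(0.05)  # simulate origin latency
        return "body:" + key

    shield = ShieldCache(slow_origin)
    threads = [
        threading.Thread(target=lambda: results.append(shield.get("/video.mp4")))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return calls, results
```

Calling `demo()` issues eight concurrent requests for the same object, yet the origin is contacted once; the cache entry is written before the in-flight record is removed, so a late arrival always finds one or the other.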
Prerequisites
- 06-caching-strategies -- Cache eviction policies, TTL management, write-through vs. write-back patterns, and cache coherence concepts that are central to CDN operation.
- 01-fundamentals -- Networking fundamentals including DNS, TCP/IP, TLS, HTTP semantics, and latency characteristics that underpin content delivery at the protocol level.