Design a Content Delivery Network

A content delivery network places copies of content on servers close to end users, reducing latency and offloading traffic from origin servers. Cloudflare, CloudFront, and Akamai serve trillions of requests daily across thousands of edge locations. This design covers the architecture from edge to origin, including caching, routing, security, and invalidation.

Content delivery network architecture

Functional Requirements

Cache and serve static content (images, videos, CSS, JS, fonts) from edge servers
Support dynamic content acceleration (proxying with optimized routes)
Provide cache invalidation via API (purge by URL, prefix, or tag)
Route users to the nearest edge server using anycast and GeoDNS
Terminate TLS at the edge with managed certificates
Protect origins from DDoS attacks and malicious traffic
Support configurable cache-control policies per customer
Provide real-time analytics on cache hit rates, latency, and bandwidth

Non-Functional Requirements

Sub-50ms response time for cache hits at the edge
99.999% availability (five nines)
Support 10 million requests per second globally
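Provision 100+ Tbps of aggregate network capacity (well above average egress, leaving headroom for traffic peaks and volumetric DDoS attacks)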
Operate in 200+ edge locations across 50+ countries

Estimation

Traffic

10 million requests/second globally
~200 edge locations -> ~50,000 requests/second per location average
Cache hit ratio target: 95%+
Edge misses: 5% of 10M = 500K requests/second forwarded upstream (to shields or origins; shields collapse many of these before they reach origins)

Storage

Average cached object: 100 KB
Unique objects per edge: ~5 million (hot working set)
Per-edge storage: 5M * 100 KB = 500 GB
Total across 200 edges: 100 TB (with significant overlap)
Origin storage: managed by customer (not CDN's concern)

Bandwidth

Average response size: 100 KB
Egress at 10M req/sec: 10M * 100 KB = 1 TB/s = 8 Tbps (aggregate)
Per-edge bandwidth: 8 Tbps / 200 = ~40 Gbps average, ~160 Gbps at a 4x peak
Backbone bandwidth (edge-to-origin): 5% misses * 8 Tbps = ~400 Gbps aggregate
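A short script makes the unit conversions explicit, in particular bytes to bits: 100 KB per response at 10M requests/second is 1 TB/s, which is 8 Tbps. This is a minimal sketch that assumes decimal units (1 KB = 1,000 bytes) and uses integer arithmetic so the results are exact; the constant names are illustrative.

```python
# Back-of-envelope estimates for the CDN, using decimal units (1 KB = 1,000 bytes).
REQUESTS_PER_SEC = 10_000_000
EDGE_LOCATIONS = 200
HIT_PERCENT = 95
AVG_OBJECT_BYTES = 100_000          # 100 KB average object / response
HOT_OBJECTS_PER_EDGE = 5_000_000    # hot working set per edge

per_location_rps = REQUESTS_PER_SEC // EDGE_LOCATIONS                      # 50,000 req/s
edge_miss_rps = REQUESTS_PER_SEC * (100 - HIT_PERCENT) // 100              # 500,000 req/s
per_edge_storage_gb = HOT_OBJECTS_PER_EDGE * AVG_OBJECT_BYTES // 10**9     # 500 GB
total_storage_tb = per_edge_storage_gb * EDGE_LOCATIONS // 1_000           # 100 TB

egress_bits_per_sec = REQUESTS_PER_SEC * AVG_OBJECT_BYTES * 8              # bytes -> bits
egress_tbps = egress_bits_per_sec // 10**12                                # 8 Tbps
per_edge_gbps = egress_bits_per_sec // EDGE_LOCATIONS // 10**9             # 40 Gbps
upstream_gbps = egress_bits_per_sec * (100 - HIT_PERCENT) // 100 // 10**9  # 400 Gbps

if __name__ == "__main__":
    print(f"per-location load:   {per_location_rps:,} req/s")
    print(f"edge misses:         {edge_miss_rps:,} req/s")
    print(f"per-edge storage:    {per_edge_storage_gb} GB ({total_storage_tb} TB total)")
    print(f"aggregate egress:    {egress_tbps} Tbps")
    print(f"per-edge egress:     {per_edge_gbps} Gbps")
    print(f"upstream bandwidth:  {upstream_gbps} Gbps")
```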

High-Level Design

The CDN has four major components: the edge layer, the mid-tier shield layer, the origin connectivity layer, and the control plane.

Edge Layer

Edge servers sit in points of presence (PoPs) close to users. Each PoP contains a cluster of servers that cache content, terminate TLS, and apply security rules. Requests that miss the edge cache are forwarded to the shield layer or directly to the origin.

Shield Layer (Mid-Tier Cache)

Origin shield servers sit between edge and origin. Multiple edge locations share a common shield, which collapses duplicate cache misses into a single origin fetch. This dramatically reduces origin load during cache fills and invalidations.

Origin Connectivity

The CDN maintains persistent, optimized connections to customer origins. These connections use keepalive pools, connection multiplexing, and optimized TCP settings to reduce round trips.

Control Plane

The control plane manages configuration distribution, cache invalidation, certificate provisioning, and analytics collection. Configuration changes propagate to all edges within seconds.

Detailed Design

Request Routing

When a user requests cdn.example.com/images/logo.png, the request must reach the nearest edge server. Two complementary mechanisms handle this.

Anycast Routing

All edge locations announce the same IP prefixes via BGP anycast. Internet routers deliver each packet to the announcing location that is closest in BGP terms (the shortest or most preferred AS path), which usually, though not always, is also the geographically nearest one. This is the primary routing mechanism and requires no DNS tricks.

User in Tokyo -> BGP routes to Tokyo PoP (same IP as all other PoPs)
User in London -> BGP routes to London PoP (same IP)
User in Sao Paulo -> BGP routes to Sao Paulo PoP (same IP)

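Anycast works well for TCP because BGP routes are stable enough that all packets of a connection usually reach the same PoP; a route change mid-connection can move packets to a PoP that has no state for that connection, which resets it. Anycast also handles failover automatically: if a PoP goes down, its route is withdrawn and, once BGP converges (usually within seconds), traffic shifts to the next nearest PoP.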

Cloudflare's entire network is built on anycast: every one of its 300+ data centers announces the same IP ranges, so any server in the network can handle any customer's request. This architecture means Cloudflare does not need per-customer DNS configuration or traffic steering logic; BGP routing handles it inherently. It also makes Cloudflare's DDoS mitigation highly effective, because attack traffic is automatically spread across the entire network rather than concentrated on a single location.

GeoDNS

For more granular control, GeoDNS resolves the same hostname to different IPs based on the resolver's location. This is used when different PoPs have different IP ranges or when traffic needs to be steered away from overloaded locations.

cdn.example.com query from US-East resolver -> 198.51.100.1 (Virginia PoP)
cdn.example.com query from EU-West resolver -> 198.51.100.2 (Frankfurt PoP)

In practice, most large CDNs use anycast as the primary mechanism and GeoDNS for special cases (regional overrides, traffic engineering).

Edge Server Architecture

Each edge server runs a reverse proxy (similar to Nginx or Envoy) with a multi-tier local cache.

Request flow within an edge server:
  1. TLS termination
  2. DDoS/WAF filtering
  3. Cache lookup (memory -> SSD -> miss)
  4. If hit: serve from cache
  5. If miss: forward to shield or origin
  6. Store response in cache (if cacheable)
  7. Return to client

Memory Cache (L1): Hot objects in RAM. Fast but limited in capacity, typically holding the top 1-5% of objects by request frequency. Implemented as a hash map with LRU or LFU eviction.

SSD Cache (L2): Warm objects on NVMe SSDs. Larger capacity (several TB per server). Objects evicted from L1 fall here. Random read latency under 100 microseconds.
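The L1/L2 interplay can be sketched as two LRU tiers where L1 evictions are demoted to L2 and L2 hits are promoted back to L1. This is a minimal sketch that assumes capacities counted in objects rather than bytes and uses an in-memory dict as a stand-in for the SSD tier; the class names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class LRUTier:
    """One cache tier: a hash map kept in least- to most-recently-used order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: bytes) -> Optional[Tuple[str, bytes]]:
        """Insert or refresh a value; return the evicted (key, value) pair, if any."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # drop the least recently used entry
        return None

    def pop(self, key: str) -> Optional[bytes]:
        return self.items.pop(key, None)


class TwoTierCache:
    """Hypothetical L1 (RAM) + L2 (SSD) cache: L1 evictions are demoted to L2."""

    def __init__(self, l1_capacity: int, l2_capacity: int) -> None:
        self.l1 = LRUTier(l1_capacity)
        self.l2 = LRUTier(l2_capacity)

    def get(self, key: str) -> Optional[bytes]:
        value = self.l1.get(key)
        if value is not None:
            return value
        value = self.l2.pop(key)
        if value is not None:
            self.put(key, value)  # promote the warm object back into RAM
        return value  # None means a miss in both tiers

    def put(self, key: str, value: bytes) -> None:
        self.l2.pop(key)  # avoid keeping a stale duplicate in L2
        evicted = self.l1.put(key, value)
        if evicted is not None:
            self.l2.put(*evicted)  # L2's own LRU victim, if any, leaves the cache
```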

Cache Key Construction:

Default key: HTTP method + Host header + URL path + query string
Custom keys: can include headers, cookies, or device type
Example: GET|cdn.example.com|/images/logo.png|?v=3

Vary headers are respected: if the origin responds with Vary: Accept-Encoding, a request with Accept-Encoding: gzip produces a different cache entry than one without.
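A minimal sketch of this key scheme, assuming the fields are joined with `|` as in the example and that each header named in the origin's Vary response header is appended as `name=value`. The function name and the appended-header format are illustrative, not any particular CDN's format.

```python
from typing import Mapping, Sequence


def cache_key(
    method: str,
    host: str,
    path: str,
    query: str,
    request_headers: Mapping[str, str],
    vary: Sequence[str] = (),
) -> str:
    """Build a key like GET|cdn.example.com|/images/logo.png|?v=3, plus Vary dimensions."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [method.upper(), host.lower(), path, f"?{query}" if query else ""]
    # Sort so that "Vary: A, B" and "Vary: B, A" produce the same key.
    for name in sorted(h.lower() for h in vary):
        parts.append(f"{name}={headers.get(name, '')}")
    return "|".join(parts)
```

With no Vary header this reproduces the example key; with `vary=["Accept-Encoding"]`, gzip and non-gzip requests map to distinct entries.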

Origin Shielding

Without a shield, every edge PoP that gets a cache miss sends a request to the origin. For a popular new object, 200 PoPs each send a request simultaneously, creating a thundering herd.

The shield layer solves this:

Without shield:
  200 edge PoPs -> 200 requests to origin (thundering herd)

With shield:
  200 edge PoPs -> 3-5 shield regions -> 3-5 requests to origin (at most one per shield region)

Each edge location is mapped to a shield region. On a cache miss, the edge asks its shield first. The shield coalesces concurrent requests for the same object (request collapsing), so only one request per shield reaches the origin.

Netflix takes the idea of caching close to users to an extreme with its Open Connect program. Rather than relying solely on a traditional CDN, Netflix deploys custom appliances called Open Connect Appliances (OCAs) directly inside ISP networks. These appliances are essentially dedicated cache servers pre-loaded with Netflix content during off-peak hours, so that during peak viewing, the vast majority of video traffic never leaves the ISP's own network. This approach reduces Netflix's transit costs while delivering better quality to users, since the content is served from a server just one or two network hops away.

Request collapsing at the shield:
  Edge-Tokyo requests /video/abc.mp4    -> shield queues it
  Edge-Osaka requests /video/abc.mp4    -> shield queues it (coalesced)
  Edge-Seoul requests /video/abc.mp4    -> shield queues it (coalesced)
  Shield fetches from origin once
  Shield responds to all three edges simultaneously
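Request collapsing is essentially a per-key "single flight": the first request for a key becomes the leader and fetches from the origin, while concurrent requests for the same key wait for the leader's result. The sketch below uses Python threads and assumes a caller-supplied `fetch` function; `RequestCollapser` is a hypothetical name, and a real shield would also write the result into its cache.

```python
import threading
from typing import Callable, Dict, Optional


class _InFlight:
    """State shared by the leader and followers of one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Optional[bytes] = None
        self.error: Optional[Exception] = None


class RequestCollapser:
    """Coalesce concurrent requests for the same key into a single upstream fetch."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._inflight: Dict[str, _InFlight] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:
                call = _InFlight()
                self._inflight[key] = call
        if is_leader:
            try:
                call.result = self._fetch(key)  # the only origin request for this key
            except Exception as exc:
                call.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]  # later misses start a new fetch
                call.done.set()  # wake every waiting follower
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```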

Cache Invalidation

Invalidation is one of the hardest problems in CDN design. The system supports three invalidation methods.

Purge by URL: Removes a single object from all edges and shields.

POST /v1/purge
{
  "type": "url",
  "url": "https://cdn.example.com/images/logo.png"
}

Purge by Prefix: Removes all objects matching a URL prefix.

POST /v1/purge
{
  "type": "prefix",
  "prefix": "https://cdn.example.com/images/"
}

Purge by Cache Tag: Objects are tagged at ingestion (via a Cache-Tag response header from the origin). Purging a tag invalidates all objects with that tag.

Origin response header: Cache-Tag: product-123, category-shoes
Purge request: {"type": "tag", "tag": "product-123"}

Invalidation propagates via a pub/sub system. The control plane publishes the purge command to a topic; every edge server subscribes and processes purges locally. A reasonable target is global propagation in under 5 seconds, though advertised purge times vary widely among commercial CDNs.

For URL and prefix purges, edges can act immediately (delete from local cache). For tag-based purges, edges maintain a tag-to-URL index or use soft purges (mark as stale, revalidate on next request).
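Edge-side handling of the three purge types might look like the sketch below. It assumes purge messages shaped like the JSON bodies above, plus a hypothetical `"soft": true` flag that marks entries stale instead of deleting them. The tag-to-URL index is built from the origin's Cache-Tag header at store time. The prefix purge uses a linear scan for brevity; a production edge would more likely keep a sorted or trie index.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Entry:
    body: bytes
    tags: Set[str]
    stale: bool = False  # soft-purged: must be revalidated before being served as fresh


class EdgeStore:
    """Hypothetical edge-local cache that applies purge messages from the control plane."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}
        self.tag_index: Dict[str, Set[str]] = {}  # tag -> URLs carrying that tag

    def store(self, url: str, body: bytes, cache_tag_header: str = "") -> None:
        """Cache a response, indexing the tags from its Cache-Tag header."""
        tags = {t.strip() for t in cache_tag_header.split(",") if t.strip()}
        self._delete(url)  # replace any older copy and its index entries
        self.entries[url] = Entry(body, tags)
        for tag in tags:
            self.tag_index.setdefault(tag, set()).add(url)

    def _delete(self, url: str) -> None:
        entry = self.entries.pop(url, None)
        if entry is None:
            return
        for tag in entry.tags:
            urls = self.tag_index.get(tag, set())
            urls.discard(url)
            if not urls:
                self.tag_index.pop(tag, None)  # drop empty index buckets

    def apply_purge(self, msg: dict) -> int:
        """Apply one purge message; return the number of entries it matched."""
        kind = msg["type"]
        if kind == "url":
            matched = {msg["url"]} if msg["url"] in self.entries else set()
        elif kind == "prefix":
            # Linear scan for brevity; a sorted or trie index avoids touching every entry.
            matched = {u for u in self.entries if u.startswith(msg["prefix"])}
        elif kind == "tag":
            matched = set(self.tag_index.get(msg["tag"], set()))
        else:
            raise ValueError(f"unknown purge type: {kind}")
        for url in matched:
            if msg.get("soft"):
                self.entries[url].stale = True  # revalidate on next request
            else:
                self._delete(url)
        return len(matched)
```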

Cache-Control Strategies

The CDN respects standard HTTP cache-control headers from the origin, with customer-configurable overrides.

Cache-Control header hierarchy:
  1. CDN edge rule overrides (customer-configured)
  2. s-maxage directive (shared cache TTL)
  3. max-age directive (general cache TTL)
  4. Expires header (legacy)
  5. CDN default TTL (e.g., 24 hours for static assets)
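The precedence list above translates into a short TTL resolver. This sketch assumes response headers arrive as a plain dict and that a matching customer rule has been reduced to an optional TTL override. It also returns 0 for `private` and `no-store`, which are not in the list but forbid storage in a shared cache. `resolve_ttl` is a hypothetical name.

```python
import re
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

DEFAULT_TTL = 24 * 3600  # CDN default TTL for static assets (24 hours)


def _directive(cache_control: str, name: str) -> Optional[int]:
    """Extract a numeric Cache-Control directive such as max-age=600."""
    match = re.search(rf"(?:^|,)\s*{re.escape(name)}\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def resolve_ttl(headers: Mapping[str, str], rule_ttl: Optional[int] = None) -> int:
    """Return the edge TTL in seconds, following the precedence list."""
    h = {name.lower(): value for name, value in headers.items()}
    cc = h.get("cache-control", "")
    if rule_ttl is not None:                        # 1. customer edge rule wins
        return rule_ttl
    if re.search(r"\b(?:private|no-store)\b", cc, re.IGNORECASE):
        return 0                                    # never store in a shared cache
    for name in ("s-maxage", "max-age"):            # 2-3. shared TTL, then general TTL
        seconds = _directive(cc, name)
        if seconds is not None:
            return seconds
    if "expires" in h and "date" in h:              # 4. legacy Expires, relative to Date
        try:
            delta = parsedate_to_datetime(h["expires"]) - parsedate_to_datetime(h["date"])
        except (TypeError, ValueError):
            return 0                                # unparseable Expires: treat as expired
        return max(0, int(delta.total_seconds()))
    return DEFAULT_TTL                              # 5. CDN default
```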

Stale-while-revalidate:
  Cache-Control: max-age=3600, stale-while-revalidate=86400
  -> Serve stale content while fetching fresh copy in background
  -> Within that 24-hour stale window, users never wait for the origin during revalidation

Stale-if-error:
  Cache-Control: max-age=3600, stale-if-error=86400
  -> If origin is down, serve stale content for up to 24 hours
  -> Graceful degradation when origins fail
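Combining max-age with the two stale extensions yields a small decision function. This sketch assumes the edge already knows the object's age in seconds and whether the origin is currently failing; in practice stale-if-error applies after a failed fetch rather than from a health flag. The action names and `decide` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Freshness:
    max_age: int                     # seconds the object is fresh
    stale_while_revalidate: int = 0  # extra seconds it may be served while refreshing
    stale_if_error: int = 0          # extra seconds it may be served if the origin fails


def decide(age: int, policy: Freshness, origin_healthy: bool) -> str:
    """Return what the edge does with a cached object of the given age (seconds)."""
    if age < policy.max_age:
        return "serve_fresh"
    staleness = age - policy.max_age
    if staleness < policy.stale_while_revalidate:
        return "serve_stale_and_revalidate"  # respond now, refresh in the background
    if not origin_healthy and staleness < policy.stale_if_error:
        return "serve_stale_on_error"        # origin down: degrade gracefully
    return "fetch_from_origin"               # client waits for a fresh copy
```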

Customers configure cache rules in the control plane:

Rule: Match path /api/* -> bypass cache (pass to origin)
Rule: Match path /static/* -> cache 30 days, ignore query string
Rule: Match extension .html -> cache 1 hour, stale-while-revalidate 4 hours
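Customer rules like these can be evaluated first-match-wins against the request path. This sketch assumes glob patterns matched with Python's `fnmatch` (where `*` also matches `/`) and uses rule order as the tie-breaker; `CacheRule` and its fields are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence


@dataclass(frozen=True)
class CacheRule:
    pattern: str                    # glob on the path, e.g. "/static/*" or "*.html"
    bypass: bool = False            # pass straight to origin
    ttl: Optional[int] = None       # seconds
    ignore_query: bool = False      # drop the query string from the cache key
    stale_while_revalidate: int = 0


RULES = (
    CacheRule("/api/*", bypass=True),
    CacheRule("/static/*", ttl=30 * 86400, ignore_query=True),
    CacheRule("*.html", ttl=3600, stale_while_revalidate=4 * 3600),
)


def match_rule(path: str, rules: Sequence[CacheRule] = RULES) -> Optional[CacheRule]:
    """Return the first matching rule, or None to fall back to origin headers."""
    for rule in rules:
        if fnmatchcase(path, rule.pattern):
            return rule
    return None
```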

TLS Termination

TLS is terminated at the edge so that the handshake's round trips travel only as far as a nearby PoP instead of all the way to the origin.

Client <--TLS--> Edge Server <--TLS (optional)--> Origin

Edge responsibilities:
  - Manage certificates (automated provisioning via Let's Encrypt or customer upload)
  - Support TLS 1.3 with 0-RTT resumption
  - OCSP stapling to avoid extra lookups
  - Cipher suite selection optimized per client

Certificate provisioning is automated. When a customer onboards a domain, the control plane provisions a certificate, distributes it to all edges, and handles renewal. Shared certificates using Subject Alternative Names (SANs) reduce the number of certificates for small customers. Large customers get dedicated certificates.

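Session tickets and TLS 1.3 resumption cut handshake overhead on repeat connections, and 0-RTT lets a resuming client send its first request without waiting for the handshake to complete. Because 0-RTT early data can be replayed by an attacker, edges typically accept it only for idempotent requests such as GET. These savings add up when users load pages with dozens of CDN-served assets.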

DDoS Protection

The CDN is inherently DDoS-resistant because traffic is distributed across hundreds of PoPs. Additional protections include:

Network-Layer (L3/L4) Protection: Anycast absorbs volumetric attacks across all PoPs. Each PoP applies rate limiting and drops packets matching known attack signatures (SYN floods, UDP amplification). Hardware-level filtering (FPGA/ASIC) handles high packet rates without burdening the application layer.

Application-Layer (L7) Protection: The edge proxy inspects HTTP requests for attack patterns. A Web Application Firewall (WAF) applies rule sets (OWASP Top 10, custom rules). Challenge pages (JavaScript challenges, CAPTCHAs) filter bot traffic. Rate limiting per IP, per session, and per endpoint prevents abuse.

Akamai, an early mover in edge computing, extends its edge beyond security and caching with the EdgeWorkers platform, which allows customers to run custom JavaScript logic directly on edge servers. This enables use cases such as A/B testing, request routing, authentication checks, and response transformation, all executed at the edge without a round trip to the origin. Akamai reports that its platform delivers a large share of global web traffic, which makes this distributed compute layer one of the most widely deployed serverless environments.

DDoS mitigation pipeline:
  1. BGP anycast distributes attack traffic globally
  2. Network ACLs drop known-bad traffic at the NIC
  3. SYN cookies absorb SYN floods without keeping per-connection state
  4. Rate limiter caps requests per source IP
  5. WAF inspects HTTP layer for attack signatures
  6. Bot detection challenges suspicious clients
  7. Clean traffic reaches cache/origin
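Step 4, the per-source-IP rate limiter, is commonly built as a token bucket. The sketch below assumes one bucket per key held in a dict and an injectable clock for testing; `TokenBucketLimiter` is a hypothetical name, and a real edge would also expire idle buckets and coordinate limits across servers in a PoP.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-key token bucket: each key may burst up to `burst` requests,
    refilled continuously at `rate` requests per second."""

    def __init__(self, rate: float, burst: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_update)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))  # new keys start full
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill since last call
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0  # spend one token for this request
        self.buckets[key] = (tokens, now)
        return allowed
```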

Analytics & Monitoring

Every edge server emits request logs and metrics. These feed into a central analytics pipeline.

Metrics per PoP:
  - Request rate (requests/sec)
  - Cache hit ratio (%)
  - Bandwidth (Gbps)
  - Error rate (4xx, 5xx)
  - P50/P95/P99 latency

Metrics per customer:
  - Bandwidth consumed
  - Cache hit ratio
  - Top requested URLs
  - Origin response times

Logs are sampled (e.g., 1% sample rate) and shipped to a centralized store for querying. Real-time metrics are pre-aggregated on each server and collected by a metrics system, either pushed (StatsD) or scraped (Prometheus, which is pull-based).

Trade-offs & Alternatives

Push vs Pull Caching

Pull (lazy) caching fetches from origin on first request. Push (proactive) caching preloads content onto edges before users request it. Pull is simpler and handles the long tail naturally. Push is useful when you know demand will spike, such as a major video release or the start of a live event. Most CDNs default to pull with optional push for premium customers.

Consistent Hashing vs Full Replication

Within a PoP, you can replicate cached content across all servers (every server has every object) or use consistent hashing (each object lives on a subset of servers). Full replication wastes memory but simplifies request routing. Consistent hashing is more memory-efficient but requires an internal load balancer that is cache-aware. Large CDNs use consistent hashing within PoPs.
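Consistent hashing within a PoP can be sketched as a hash ring with virtual nodes: each server owns many points on the ring, and a cache key belongs to the first server point at or after the key's hash. Removing a server remaps only the keys that server owned. The SHA-256-based hash and 100 virtual nodes per server are illustrative choices, and `HashRing` is a hypothetical name.

```python
import bisect
import hashlib
from typing import List, Tuple


def _point(value: str) -> int:
    # First 8 bytes of SHA-256: stable across processes, unlike Python's built-in hash().
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes, mapping cache keys to servers in a PoP."""

    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_point(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring point at or after the key's hash."""
        if not self._ring:
            raise LookupError("no servers in ring")
        idx = bisect.bisect_left(self._ring, (_point(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around past the last point
```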

Anycast vs DNS-Based Routing

Anycast is simpler and handles failover automatically via BGP. DNS-based routing offers finer control (weighted traffic steering, latency-based routing) but has TTL-related propagation delays. The best approach combines both: anycast for default routing, DNS overrides for traffic engineering.

Single-Tier vs Multi-Tier Caching

A single-tier cache (edge only) is simpler. A multi-tier cache (edge + shield) reduces origin load dramatically but adds latency on shield misses. At scale, the origin load reduction justifies the extra tier. Cloudflare and CloudFront both use multi-tier caching.

Bottlenecks & Scaling

Cache Stampede on Popular New Content

When a viral piece of content is first published, edge servers across hundreds of PoPs may request it simultaneously. Origin shielding with request collapsing handles most cases. For extreme scenarios (live event start), pre-warm the shield layer by pushing content before the event.

Global Configuration Propagation

Purge and config changes must reach all 200+ PoPs within seconds. A pub/sub system (Kafka, or a custom gossip protocol) distributes changes. Each edge confirms receipt. If an edge is unreachable, it catches up when it reconnects via a persistent log.
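The catch-up behavior rests on each edge tracking an offset into an ordered change log, much as a Kafka consumer tracks its position in a partition. In this sketch an in-memory list stands in for the persistent log, and the changes are hypothetical key/value config updates; an edge that misses changes while offline replays everything after its last applied offset, and the returned offset doubles as its acknowledgment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangeLog:
    """Append-only, ordered log of config/purge changes (control-plane side)."""
    entries: List[dict] = field(default_factory=list)

    def publish(self, change: dict) -> int:
        self.entries.append(change)
        return len(self.entries)  # log length after this change

    def read_from(self, offset: int) -> List[dict]:
        return self.entries[offset:]


@dataclass
class EdgeSubscriber:
    """An edge that applies changes in order and remembers how far it has read."""
    name: str
    applied_offset: int = 0
    config: Dict[str, str] = field(default_factory=dict)

    def sync(self, log: ChangeLog) -> int:
        """Apply every change after our offset; return the offset to acknowledge."""
        for change in log.read_from(self.applied_offset):
            self.config[change["key"]] = change["value"]
            self.applied_offset += 1
        return self.applied_offset
```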

TLS Certificate Distribution

Managing millions of certificates across hundreds of PoPs requires a reliable distribution mechanism. Certificates are pushed from the control plane, not pulled, to ensure edges have certs before traffic arrives. A versioned certificate store on each edge allows atomic swaps.

Origin Overload During Invalidation

A mass purge (purge by prefix matching millions of objects) causes a surge of origin fetches as edges re-fill their caches. Stale-while-revalidate helps by serving old content while fetching new. Soft purges (mark stale instead of delete), combined with stale-while-revalidate, let edges keep answering from the stale copy while they revalidate, so users are not left waiting on a struggling origin.

Edge Server Failures

Individual server failures within a PoP are handled by the PoP's internal load balancer, which health-checks servers and removes unhealthy ones. Entire PoP failures are handled by anycast BGP withdrawal, redirecting traffic to the next closest PoP. Capacity planning ensures neighboring PoPs can absorb the traffic.

Common Pitfalls

Caching content with user-specific data: a response containing a session cookie or personal information must never be cached on a shared edge. Always check Cache-Control (private, no-store), Set-Cookie, and Vary headers before caching.
Ignoring cache key collisions: if two different resources produce the same cache key, users receive the wrong content. Ensure cache keys account for all relevant dimensions (query params, accept-encoding, device type).
TTLs that are too long: aggressive caching improves hit rates but makes updates slow to propagate. Use stale-while-revalidate rather than extremely long TTLs to balance freshness and performance.
No origin shield: without a shield layer, the origin bears the full brunt of cache misses multiplied by the number of edge locations. This is the most common scaling mistake in CDN design.
Treating all content the same: static images, dynamic API responses, and streaming video have fundamentally different caching characteristics. Apply different cache policies per content type.
Skipping connection reuse to origin: opening a new TCP + TLS connection to the origin for every cache miss adds hundreds of milliseconds. Maintain persistent connection pools between edges, shields, and origins.

Key Takeaways

A CDN is a globally distributed reverse proxy with caching. The core value proposition is reducing latency by serving content from nearby edge servers.
Anycast routing is the simplest and most resilient way to direct users to the nearest edge. GeoDNS supplements it for fine-grained traffic control.
Origin shielding with request collapsing is essential at scale. Without it, the origin becomes the bottleneck during cache fills and invalidations.
Cache invalidation is the hardest operational problem. Support multiple invalidation strategies (URL, prefix, tag) and propagate globally in seconds.
TLS termination at the edge is non-negotiable for performance. TLS 1.3, with session resumption and 0-RTT, keeps the handshake cost low on repeat connections.
The CDN's distributed nature provides inherent DDoS resistance. Layer additional protections (WAF, rate limiting, bot detection) at the edge where capacity is abundant.