System Design Case Studies

The System Design Process

Every system design follows the same framework, whether you are designing a URL shortener or a global chat platform. The steps are:

Requirements clarification — What exactly are we building? What are the constraints?
Back-of-the-envelope estimation — How much storage, bandwidth, and compute do we need?
High-level architecture — What are the major components and how do they communicate?
Data model — What does the schema look like? Which database(s) do we use?
API design — What endpoints do clients call?
Deep dives — Scale bottlenecks, edge cases, failure modes

This document walks through three classic systems using this framework, then examines real-world architectures from Twitter and WhatsApp.

Case Study 1: URL Shortener

Requirements

Functional:

Given a long URL, generate a short URL (e.g., https://short.ly/abc123)
Given a short URL, redirect to the original long URL
Optional: custom short codes, expiration, click analytics

Non-functional:

Very high read throughput (redirects are far more common than creates)
Low latency redirects (under 50ms)
High availability (a broken shortener means broken links everywhere)

Scale assumptions:

100 million URLs created per month
Read:write ratio of 100:1 (10 billion redirects per month)

Estimation

Storage:

Per URL: long URL (500 bytes avg) + short code (7 bytes) + metadata (50 bytes) = ~560 bytes
100M URLs/month x 12 months x 5 years = 6 billion URLs
6B x 560 bytes = ~3.4 TB total storage over 5 years
Verdict: Fits on a single database with SSD. No sharding needed initially.

Throughput:

Writes: 100M/month = ~40 writes/sec (very low)
Reads: 10B/month = ~3,800 reads/sec average, ~10,000 reads/sec peak
Verdict: A single server with caching handles this easily.

Short code space:

Using Base62 (a-z, A-Z, 0-9) with 7 characters: 62^7 = 3.5 trillion possible codes
At 100M URLs/month, the code space lasts roughly 2,900 years, so running out of codes is not a concern.
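
The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the 30-day month and the variable names are assumptions made for illustration, and the inputs are the figures stated in this section.

```python
# Back-of-the-envelope numbers for the URL shortener.
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month

urls_per_month = 100_000_000
read_write_ratio = 100
bytes_per_url = 500 + 7 + 50         # long URL + short code + metadata
retention_months = 12 * 5            # 5 years

total_urls = urls_per_month * retention_months
storage_tb = total_urls * bytes_per_url / 1e12
writes_per_sec = urls_per_month / SECONDS_PER_MONTH
reads_per_sec = writes_per_sec * read_write_ratio
code_space = 62 ** 7                 # 7-character Base62 codes
years_to_exhaust = code_space / (urls_per_month * 12)

print(f"total URLs:        {total_urls:,}")
print(f"storage:           {storage_tb:.2f} TB")
print(f"writes/sec:        {writes_per_sec:.0f}")
print(f"reads/sec (avg):   {reads_per_sec:.0f}")
print(f"code space:        {code_space:,}")
print(f"years to exhaust:  {years_to_exhaust:,.0f}")
```

The script uses the exact 557 bytes per URL rather than the rounded 560, so its storage figure comes out slightly below 3.4 TB.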

Architecture

┌─────────┐     ┌──────────┐     ┌────────────┐     ┌──────────────┐
│  Client  │────→│   CDN    │────→│ API Server │────→│   Database   │
│          │     │ (cached  │     │            │     │ (PostgreSQL) │
│          │←────│ redirects│←────│  + Redis   │←────│              │
└─────────┘     └──────────┘     └────────────┘     └──────────────┘

Components:

CDN: Caches redirect responses (301/302) at the edge. For popular short URLs, the CDN serves the redirect without ever hitting the origin.
API Server: Handles URL creation and redirect lookups. Stateless — any instance can handle any request.
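Redis Cache: Caches short-code-to-URL mappings. The full data set is 3.4 TB, but traffic follows a power-law distribution (a small fraction of URLs get most requests). A 50 GB Redis instance holds on the order of 90 million mappings at ~560 bytes each (ignoring Redis overhead), which is enough for the hot set. A hit rate above 95% is plausible, but the real figure depends on how skewed the traffic is.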
PostgreSQL: Stores all URL mappings, metadata, and analytics data.
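
The interaction between the API server, Redis, and PostgreSQL on the redirect path is a cache-aside lookup. The sketch below illustrates the pattern only: the RedirectResolver class is hypothetical, a plain dict stands in for Redis, and the db_lookup callable stands in for a PostgreSQL query.

```python
from typing import Callable, Dict, Optional


class RedirectResolver:
    """Cache-aside lookup: try the cache, fall back to the database, then fill the cache."""

    def __init__(self, db_lookup: Callable[[str], Optional[str]]):
        self._db_lookup = db_lookup          # hypothetical stand-in for a PostgreSQL query
        self._cache: Dict[str, str] = {}     # hypothetical stand-in for Redis
        self.hits = 0
        self.misses = 0

    def resolve(self, short_code: str) -> Optional[str]:
        cached = self._cache.get(short_code)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        long_url = self._db_lookup(short_code)
        if long_url is not None:
            # Only known codes are cached; unknown codes hit the database each time.
            self._cache[short_code] = long_url
        return long_url


fake_db = {"abc123": "https://example.com/very/long/path?with=params"}
resolver = RedirectResolver(fake_db.get)
first = resolver.resolve("abc123")    # miss: loaded from the database
second = resolver.resolve("abc123")   # hit: served from the cache
missing = resolver.resolve("nope")    # miss: unknown code, nothing cached
```

A production version would also set a TTL on cached entries and might cache negative results briefly so that repeated lookups of unknown codes do not reach the database.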

Data Model

CREATE TABLE urls (
    id          BIGSERIAL PRIMARY KEY,
    short_code  VARCHAR(10) UNIQUE NOT NULL,
    long_url    TEXT NOT NULL,
    created_at  TIMESTAMP NOT NULL DEFAULT NOW(),
    expires_at  TIMESTAMP,         -- NULL means never expires
    user_id     BIGINT,            -- NULL for anonymous
    click_count BIGINT DEFAULT 0
);

-- No separate CREATE INDEX is needed: the UNIQUE constraint on short_code
-- already creates a unique B-tree index that serves redirect lookups.

The short_code column has a unique index for fast lookups during redirects. The primary key is a BIGSERIAL for internal use and efficient JOINs.

API Design

Create a short URL:

POST /api/v1/urls
Content-Type: application/json

{
    "long_url": "https://example.com/very/long/path?with=params",
    "custom_code": "mylink",     // optional
    "expires_in_days": 30        // optional
}

Response 201:
{
    "short_url": "https://short.ly/mylink",
    "short_code": "mylink",
    "expires_at": "2026-04-23T00:00:00Z"
}

Redirect:

GET /abc123

Response 301:
Location: https://example.com/very/long/path?with=params

Use 301 (permanent redirect) if you want browsers to cache the redirect (fewer server hits). Use 302 (temporary redirect) if you need to track every click (browsers will hit the server each time).

Short Code Generation

Two approaches:

Approach 1: Base62 encode an auto-increment ID.

URL gets ID 12345 in the database
Base62(12345) = "dnh"
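Pad to minimum length with the alphabet's zero digit ('a' in the a-z, A-Z, 0-9 ordering): "aaaadnh". Padding with "0" would change the decoded value, because "0" is digit 52 in this alphabet.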
Pros: No collisions, simple, sequential
Cons: Predictable (users can guess other URLs by incrementing)

Approach 2: Random generation with collision check.

Generate a random 7-character Base62 string
Check if it exists in the database; if so, regenerate
With 3.5 trillion possible codes and 6 billion URLs stored, a freshly generated code collides with probability of at most 6B / 3.5T, or about 0.17%, so a retry is rarely needed
Pros: Unpredictable
Cons: Requires a uniqueness check (cheap with an index)
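
Both approaches can be sketched in a few lines. The function names, the retry limit, and the exists callback (a stand-in for the database uniqueness check) are assumptions made for illustration. The alphabet follows the a-z, A-Z, 0-9 ordering used above, so 'a' is the zero digit and is also the padding character.

```python
import secrets
from typing import Callable

ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE = len(ALPHABET)  # 62


def encode(n: int, min_length: int = 7) -> str:
    """Approach 1: Base62-encode a non-negative integer ID, left-padded with the zero digit."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    code = "".join(reversed(digits))
    return code.rjust(max(min_length, 1), ALPHABET[0])


def decode(code: str) -> int:
    """Inverse of encode; leading zero digits do not change the value."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def random_code(exists: Callable[[str], bool], length: int = 7, max_attempts: int = 5) -> str:
    """Approach 2: draw random codes until one is unused (exists is a hypothetical DB check)."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(candidate):
            return candidate
    raise RuntimeError("could not find a free short code")


taken = set()
code = random_code(taken.__contains__)
taken.add(code)
print(encode(12345, min_length=1), encode(12345), decode(encode(12345)))
```

In a real service the uniqueness check and the insert should be a single atomic step, for example by relying on the UNIQUE constraint and retrying on a constraint violation, since a separate check followed by an insert can race.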

Key Design Decisions

Read-heavy workload means caching is the primary scaling lever. Redis + CDN handles the vast majority of traffic.
301 vs 302 depends on whether you need click analytics. 302 forces every click through your server.
Database choice is simple: this is a key-value lookup pattern. PostgreSQL works. DynamoDB or Cassandra work if you need global distribution.

Case Study 2: Real-Time Chat System

Requirements

Functional:

One-on-one messaging and group chats (up to 500 members)
Message delivery (online and offline users)
Message history (scrollback)
Online/offline status (presence)
Read receipts

Non-functional:

Real-time delivery (under 200ms for online users)
1 million concurrent WebSocket connections
Messages must not be lost (at-least-once delivery)
Message ordering within a conversation

Scale assumptions:

500 million users, 100 million daily active
1 million concurrent connections
50 billion messages per day (average 500 messages per active user)

Estimation

Message storage:

Average message: 200 bytes (text) + 100 bytes (metadata) = 300 bytes
50B messages/day x 300 bytes = 15 TB/day
1 year retention: ~5.5 PB
Verdict: Needs a distributed database optimized for writes (Cassandra, ScyllaDB)

Connection handling:

1M concurrent WebSocket connections
Each connection uses ~10 KB of memory
1M x 10 KB = 10 GB RAM for connections alone
A single server can handle ~50K-100K WebSocket connections
Verdict: Need 10-20 WebSocket gateway servers

Message throughput:

50B messages/day = ~580K messages/sec
Verdict: Needs a high-throughput message bus (Kafka) for fanout
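
The chat estimates follow the same arithmetic pattern. The sketch below recomputes them from the stated assumptions; decimal units (1 TB = 10^12 bytes) and the variable names are choices made for illustration.

```python
import math

SECONDS_PER_DAY = 86_400

messages_per_day = 50_000_000_000
bytes_per_message = 200 + 100            # text + metadata
concurrent_connections = 1_000_000
kb_per_connection = 10
connections_per_server = (50_000, 100_000)   # low and high end of the stated range

storage_tb_per_day = messages_per_day * bytes_per_message / 1e12
storage_pb_per_year = storage_tb_per_day * 365 / 1000
connection_memory_gb = concurrent_connections * kb_per_connection / 1e6
gateways_needed = [math.ceil(concurrent_connections / c) for c in connections_per_server]
messages_per_sec = messages_per_day / SECONDS_PER_DAY

print(f"storage per day:   {storage_tb_per_day:.0f} TB")
print(f"storage per year:  {storage_pb_per_year:.2f} PB")
print(f"connection memory: {connection_memory_gb:.0f} GB")
print(f"gateways needed:   {min(gateways_needed)}-{max(gateways_needed)}")
print(f"messages per sec:  {messages_per_sec:,.0f}")
```

These are raw figures: replication in the message store and headroom on the gateways would multiply the storage and server counts in practice.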

Architecture

WebSocket Gateway: Maintains persistent WebSocket connections with clients. Each gateway instance handles 50-100K connections. Stateless in the sense that any gateway can serve any user: the mapping of user-to-gateway is stored in Redis.

Message Service: Receives messages from senders, persists them, and publishes them to Kafka for delivery to recipients.

Kafka: Distributes messages to the correct WebSocket gateway instances. When User A sends a message to User B, the message goes through Kafka to the gateway instance holding User B's connection.

Cassandra: Stores message history. Cassandra is ideal here because:

Write-heavy workload (messages are written far more than read)
Natural partitioning by conversation ID
Time-ordered data within a partition (messages in a chat sorted by timestamp)
Horizontal scaling by adding nodes

Redis: Stores session data (which gateway holds each user's connection) and presence information (online/offline status with TTL).

Data Model (Cassandra)

-- Messages partitioned by conversation, ordered by time
CREATE TABLE messages (
    conversation_id UUID,
    message_id      TIMEUUID,
    sender_id       UUID,
    content         TEXT,
    message_type    TEXT,         -- 'text', 'image', 'file'
    created_at      TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Conversations per user (for the chat list screen)
CREATE TABLE user_conversations (
    user_id             UUID,
    last_message_at     TIMESTAMP,
    conversation_id     UUID,
    conversation_name   TEXT,
    last_message_preview TEXT,
    unread_count        INT,
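    -- conversation_id is part of the key so that two conversations updated
    -- at the same instant do not overwrite each other
    PRIMARY KEY (user_id, last_message_at, conversation_id)
) WITH CLUSTERING ORDER BY (last_message_at DESC, conversation_id ASC);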

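The messages table partitions by conversation_id, so all messages in a chat live in one partition and are served by the same replica nodes. Within a partition, messages are sorted by time in descending order, which makes "load recent messages" queries efficient. Very active or long-lived conversations can grow into oversized partitions; a common mitigation is to add a time bucket (for example, the month) to the partition key.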

Message Delivery Flow

Sending a message (User A to User B):

User A's client sends the message over WebSocket to a gateway
The gateway forwards it to the Message Service, which persists it in Cassandra
The Message Service publishes the message to Kafka (key: conversation_id)
A consumer looks up User B's gateway in Redis
If online: push over WebSocket. If offline: send a push notification (APNs/FCM)

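Group messages follow the same flow: the consumer repeats the gateway lookup and delivery step for every member of the group. Kafka provides the durable, ordered stream that this fanout reads from.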
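
The delivery flow can be sketched as an in-memory simulation. Everything here is a hypothetical stand-in: the ChatRouter class, a list in place of Cassandra, a dict in place of the Redis session registry, and outbox lists in place of WebSocket pushes and APNs/FCM. The Kafka hop is collapsed into a direct method call to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Message:
    conversation_id: str
    sender_id: str
    content: str


@dataclass
class ChatRouter:
    members: Dict[str, List[str]]                                  # conversation_id -> user ids
    sessions: Dict[str, str] = field(default_factory=dict)         # user_id -> gateway_id (Redis stand-in)
    store: List[Message] = field(default_factory=list)             # message store (Cassandra stand-in)
    gateway_outbox: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    push_outbox: List[Tuple[str, str]] = field(default_factory=list)

    def send(self, msg: Message) -> None:
        # Persist before delivering, so a crash after this point cannot lose the message.
        self.store.append(msg)
        for user_id in self.members[msg.conversation_id]:
            if user_id == msg.sender_id:
                continue
            gateway_id = self.sessions.get(user_id)
            if gateway_id is not None:
                # Online: hand the message to the gateway holding the connection.
                self.gateway_outbox.setdefault(gateway_id, []).append((user_id, msg.content))
            else:
                # Offline: fall back to a push notification.
                self.push_outbox.append((user_id, msg.content))


router = ChatRouter(members={"c1": ["alice", "bob", "carol"]}, sessions={"bob": "gw-2"})
router.send(Message("c1", "alice", "hi"))
print(router.gateway_outbox, router.push_outbox)
```

The same loop covers one-on-one and group chats, since a one-on-one conversation is simply a conversation with two members.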

Presence System

Online/offline status uses Redis with TTL. On connect, set presence:{user_id} = {gateway_id} with a 30-second TTL. The client heartbeats every 20 seconds, resetting the TTL. On disconnect or crash, the key expires and the user appears offline.
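
The TTL-based presence scheme can be illustrated without Redis. The PresenceStore class below is a hypothetical in-memory stand-in that mimics a key with an expiry; the injectable clock is an assumption added so the behavior can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PresenceStore:
    """In-memory stand-in for presence:{user_id} keys with a TTL."""

    def __init__(self, ttl_seconds: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}   # user_id -> (gateway_id, expires_at)

    def heartbeat(self, user_id: str, gateway_id: str) -> None:
        # Connect and heartbeat are the same operation: (re)set the key and its TTL.
        self._entries[user_id] = (gateway_id, self._clock() + self._ttl)

    def gateway_for(self, user_id: str) -> Optional[str]:
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        gateway_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]     # expired: the user now appears offline
            return None
        return gateway_id

    def is_online(self, user_id: str) -> bool:
        return self.gateway_for(user_id) is not None


now = [0.0]                                # fake clock for the demonstration
presence = PresenceStore(clock=lambda: now[0])
presence.heartbeat("u1", "gw-1")           # connect at t=0, key expires at t=30
now[0] = 20.0
presence.heartbeat("u1", "gw-1")           # heartbeat at t=20, key now expires at t=50
now[0] = 49.0
online_before_expiry = presence.is_online("u1")
now[0] = 51.0                              # no heartbeat since t=20
online_after_expiry = presence.is_online("u1")
```

The 10-second gap between the 20-second heartbeat interval and the 30-second TTL tolerates one delayed heartbeat before the user is shown as offline.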

API Design

WebSocket messages: JSON frames for send, receive, read receipts, and typing indicators — each with a type field, conversation_id, and relevant payload.

REST endpoints (non-real-time):

GET  /api/v1/conversations                    -- List user's conversations
GET  /api/v1/conversations/{id}/messages      -- Message history (paginated)
POST /api/v1/conversations                    -- Create a group conversation

Case Study 3: News Feed System

Requirements

Functional:

Users see a feed of posts from people they follow
Users can post text, images, and videos
Feed is personalized (ranked, not purely chronological)

Non-functional:

Feed loads in under 500ms
500 million users, 200 million daily active
Average user follows 200 accounts
Peak: 50,000 feed requests per second

Scale assumptions:

200M DAU x average 10 feed loads per day = 2 billion feed loads per day
10M new posts per day
Average post seen by 100-500 followers

The Core Problem: Fan-Out

When User A creates a post, how do we get it into all their followers' feeds? This is the fan-out problem, and it has two fundamental approaches.

Fan-Out on Write (Push Model)

When a user posts, immediately write that post reference into every follower's pre-computed feed cache.

Pro: Feed reads are instant (pre-computed cache lookup)
Con: Celebrity problem — a user with 50M followers generates 50M writes per post. Wasted work for inactive followers.

Fan-Out on Read (Pull Model)

When a user opens their feed, compute it on the fly by fetching recent posts from all followed users and merging them.

Pro: No wasted computation for inactive users. Celebrity posts written once.
Con: Slow feed generation (fetch and merge from 200+ users). High database load at read time.

Hybrid Approach (What Facebook/Twitter/Instagram Use)

The production solution combines both approaches based on the user who is posting:

Regular users (< 10K followers): Fan-out on write. Pre-compute followers' feeds at post time. This covers the vast majority of users.
Celebrities/influencers (10K followers or more): Fan-out on read. Do NOT write into millions of feeds. Instead, when a follower loads their feed, merge the pre-computed feed with real-time queries for celebrity posts.

User opens feed
    │
    ├──→ Read pre-computed feed from cache (posts from regular users)
    │
    ├──→ Fetch recent posts from followed celebrities (on-the-fly)
    │
    ▼
Merge, rank, return

This hybrid eliminates the celebrity problem while keeping feed reads fast for the common case (most content comes from regular users).
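
A minimal in-memory sketch of the hybrid model follows. The HybridFeed class and its method names are hypothetical, plain dicts stand in for the feed cache, post store, and social graph, and the threshold is a constructor parameter so the demonstration can use a tiny value instead of 10K.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[int, str]   # (timestamp, post_id)


class HybridFeed:
    def __init__(self, celebrity_threshold: int = 10_000):
        self.threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)     # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)     # user -> authors
        self.posts_by_author: Dict[str, List[Post]] = defaultdict(list)
        self.feed_cache: Dict[str, List[Post]] = defaultdict(list)

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.threshold

    def post(self, author: str, post_id: str, timestamp: int) -> None:
        self.posts_by_author[author].append((timestamp, post_id))
        if not self.is_celebrity(author):
            # Fan-out on write: push the reference into every follower's feed.
            for follower in self.followers[author]:
                self.feed_cache[follower].append((timestamp, post_id))

    def read_feed(self, user: str, limit: int = 20) -> List[str]:
        # Start from the pre-computed feed, then pull celebrity posts at read time.
        items = set(self.feed_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts_by_author[author])
        newest_first = sorted(items, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]


feed = HybridFeed(celebrity_threshold=2)
feed.follow("u1", "bob")
feed.follow("u1", "star")
feed.follow("u2", "star")          # "star" now has 2 followers and counts as a celebrity
feed.post("bob", "p1", timestamp=1)
feed.post("star", "p2", timestamp=2)
print(feed.read_feed("u1"), feed.read_feed("u2"))
```

Merging through a set removes duplicates for an author who crosses the threshold after some of their posts were already pushed. This sketch orders by timestamp only; ranking would replace the final sort.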

Architecture

┌─────────┐     ┌──────────────┐     ┌─────────────┐
│  Client  │────→│  API Gateway │────→│ Feed Service│
└─────────┘     └──────────────┘     └──────┬──────┘
                                            │
                    ┌───────────────────────┤
                    │           │            │
                    ▼           ▼            ▼
             ┌──────────┐ ┌─────────┐ ┌──────────┐
             │  Feed    │ │  Post   │ │  Social  │
             │  Cache   │ │  Store  │ │  Graph   │
             │ (Redis)  │ │ (DB)    │ │  Service │
             └──────────┘ └─────────┘ └──────────┘

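Feed Cache (Redis): Stores pre-computed feeds as sorted sets. Each user's feed is a sorted set where the score is the post timestamp (or ranking score). ZREVRANGE user:123:feed 0 19 returns the 20 highest-scored (most recent) items in O(log N + M) time. On Redis 6.2 and later, the equivalent is ZRANGE user:123:feed 0 19 REV, since ZREVRANGE is deprecated.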

Post Store: Stores the actual post content. PostgreSQL or Cassandra depending on scale. Posts are fetched by ID after the feed service determines which post IDs to show.

Social Graph Service: Stores follow relationships. Used to determine a user's followers (for fan-out on write) and a user's following list (for fan-out on read of celebrity posts).

Feed Ranking

Modern feeds are ranked, not chronological. The ranking pipeline: candidate generation (collect posts from cache + celebrity queries), scoring (ML model based on relevance, recency, relationship, content type), filtering (already seen, muted, policy violations), and final ranking with diversity rules. This pipeline runs in under 200ms for the typical case.
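
The stages of the ranking pipeline can be sketched as follows. The Candidate fields, the weights in score, and the per-author cap are all hypothetical: the hand-written scoring function is only a placeholder for the ML model. This sketch filters before scoring, which avoids scoring posts that would be dropped anyway.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    post_id: str
    author_id: str
    age_hours: float
    affinity: float          # hypothetical 0..1 strength of the viewer-author relationship


def score(c: Candidate) -> float:
    """Placeholder for the ML model: weighted mix of relationship and recency."""
    recency = 1.0 / (1.0 + c.age_hours)
    return 0.6 * c.affinity + 0.4 * recency


def rank(candidates: Iterable[Candidate], seen: Set[str], muted_authors: Set[str],
         limit: int = 20, max_per_author: int = 2) -> List[str]:
    # Filtering: drop already-seen posts and muted authors.
    eligible = [c for c in candidates
                if c.post_id not in seen and c.author_id not in muted_authors]
    # Scoring and ordering.
    eligible.sort(key=score, reverse=True)
    # Diversity rule: cap how many posts a single author can contribute.
    result: List[str] = []
    per_author: Dict[str, int] = {}
    for c in eligible:
        if per_author.get(c.author_id, 0) >= max_per_author:
            continue
        per_author[c.author_id] = per_author.get(c.author_id, 0) + 1
        result.append(c.post_id)
        if len(result) == limit:
            break
    return result


candidates = [
    Candidate("p1", "a", age_hours=0, affinity=0.9),
    Candidate("p2", "a", age_hours=1, affinity=0.9),
    Candidate("p3", "a", age_hours=3, affinity=0.9),
    Candidate("p4", "b", age_hours=0, affinity=0.1),
    Candidate("p5", "c", age_hours=0, affinity=1.0),   # author is muted
    Candidate("p6", "b", age_hours=0, affinity=0.5),   # already seen
]
ranked = rank(candidates, seen={"p6"}, muted_authors={"c"})
print(ranked)
```

In the example, the third post from author "a" scores higher than the post from author "b" but is dropped by the per-author cap, which is the kind of trade-off a diversity rule makes.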

Real-World Deep Dives

Twitter's Timeline Architecture

Twitter's timeline went through multiple architectural generations:

Generation 1, pull model (2006-2009): When you opened Twitter, a query fetched tweets from all accounts you follow, sorted by time. As Twitter grew, this became too slow: a user following 500 accounts triggered 500 database lookups per feed load.

Generation 2, push model (2009-2012): Fan-out on write. When a user tweets, the tweet ID is written into every follower's home timeline cache (a Redis list). Feed reads became a simple Redis lookup. This worked until accounts with tens of millions of followers emerged (Lady Gaga is the commonly cited example). A single tweet from such an account triggered tens of millions of Redis writes, taking minutes to fan out.

Generation 3, hybrid (2012-present): Regular users: fan-out on write (push into follower timelines). Celebrity users (configurable threshold): fan-out on read (fetch their tweets at read time and merge with the pre-computed timeline). The threshold is dynamic and based on the follower count.

Key optimization: Timelines store tweet IDs only (not full objects). Tweet content is fetched separately and cached. Timelines are pre-computed to a fixed depth (~800 tweets). Muted/blocked accounts are filtered at read time rather than maintained in the push pipeline.

WhatsApp Architecture: 2 Billion Users, Minimal Servers

WhatsApp's architecture is a case study in efficiency and simplicity.

Technology choices: Erlang/OTP for the messaging backend (designed for telecom, excels at millions of concurrent lightweight processes — a single node handles 2-3M connections), FreeBSD tuned for high connection counts, and Mnesia (Erlang's built-in distributed database) for session data.

Architecture principles:

Store and forward: Messages are stored temporarily on the server and forwarded to the recipient. Once delivered, they are deleted. WhatsApp does not store long-term message history — messages live on devices.
End-to-end encryption (Signal Protocol): The server never sees message content, simplifying architecture to a pure routing layer for opaque encrypted blobs.
Connection efficiency: Single persistent connection per user with minimal keepalive packets, delta-based presence updates, and compressed payloads.
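
The store-and-forward principle can be illustrated with a small relay that holds opaque blobs only until the recipient acknowledges them. This is a sketch of the general pattern, not WhatsApp's implementation; the StoreAndForwardRelay class and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class StoreAndForwardRelay:
    """Holds encrypted blobs per recipient and deletes each one once it is acknowledged."""

    def __init__(self) -> None:
        # recipient -> {message_id: ciphertext}; dicts preserve insertion order.
        self._pending: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def accept(self, recipient: str, message_id: str, ciphertext: bytes) -> None:
        # The relay never inspects the payload; it only stores and routes it.
        self._pending[recipient][message_id] = ciphertext

    def deliverable(self, recipient: str) -> List[Tuple[str, bytes]]:
        # Called when the recipient connects; unacknowledged messages are re-sent.
        return list(self._pending.get(recipient, {}).items())

    def ack(self, recipient: str, message_id: str) -> None:
        # Delivery confirmed: the server copy is no longer needed.
        self._pending.get(recipient, {}).pop(message_id, None)

    def stored_count(self) -> int:
        return sum(len(queue) for queue in self._pending.values())


relay = StoreAndForwardRelay()
relay.accept("bob", "m1", b"\x01\x02")
relay.accept("bob", "m2", b"\x03\x04")
batch = relay.deliverable("bob")      # both messages, oldest first
relay.ack("bob", "m1")                # bob's device confirmed m1
remaining = relay.deliverable("bob")  # only m2 is still held
```

Because a message is deleted only after an acknowledgement, a dropped connection leads to redelivery rather than loss, and server storage stays proportional to undelivered messages instead of total history.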

Scale numbers: at the time of the Facebook acquisition in 2014, WhatsApp served roughly 450 million monthly users with a team of about 50 people and a few hundred servers. It has since grown to about 2 billion users and on the order of 100 billion messages per day. The efficiency comes from Erlang's concurrency model, store-and-forward with deletion (minimal storage), and the server doing minimal per-message processing.

Lesson: Choosing the right technology (Erlang), the right architecture (store-and-forward), and deliberately limiting scope (no server-side search or long-term storage) enables extraordinary scale with minimal infrastructure.

Cross-Cutting Design Considerations

These apply to all three case studies and to most system designs:

Idempotency: Clients will retry failed requests. Use client-generated idempotency keys so the server deduplicates on retry (no double-sent messages, no duplicate short URLs).
Rate limiting: All public APIs need rate limiting at the API gateway (token bucket is standard).
Monitoring: Track latency percentiles (p50, p95, p99), error rates, queue/consumer lag, cache hit rates, and resource saturation for every service.
Graceful degradation: When a component fails, degrade rather than crash. If the presence service is down, messages still deliver. If ranking is down, show a chronological feed.
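
Two of these considerations are small enough to sketch directly. The IdempotentHandler and TokenBucket classes below are hypothetical illustrations: the first keeps results in a plain dict (a real service would use a shared store with expiry), and the second takes an injectable clock so its behavior can be checked deterministically.

```python
import time
from typing import Any, Callable, Dict


class IdempotentHandler:
    """Runs the wrapped handler once per idempotency key and replays the result on retries."""

    def __init__(self, handler: Callable[[Any], Any]):
        self._handler = handler
        self._results: Dict[str, Any] = {}

    def handle(self, idempotency_key: str, request: Any) -> Any:
        if idempotency_key not in self._results:
            self._results[idempotency_key] = self._handler(request)
        return self._results[idempotency_key]


class TokenBucket:
    """Token bucket: allows bursts up to capacity, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


created = []
create_url = IdempotentHandler(lambda req: created.append(req) or len(created))
first_id = create_url.handle("key-1", "https://example.com/a")
retry_id = create_url.handle("key-1", "https://example.com/a")   # retry: no second insert

now = [0.0]                                    # fake clock for the demonstration
bucket = TokenBucket(capacity=2, refill_per_sec=1.0, clock=lambda: now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]   # third request exceeds the burst
now[0] = 1.0                                   # one second later, one token has refilled
after_refill = bucket.allow()
```

In a gateway, one bucket would typically be kept per client or API key, and a rejected request would be answered with HTTP 429.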

Key Takeaways

Start every design with requirements and estimation. The numbers determine the architecture.
The fan-out problem (push vs pull vs hybrid) is central to any social/feed system. The hybrid approach is the industry standard.
Technology choices matter enormously. WhatsApp's use of Erlang is not incidental: it is a major reason a small team could serve hundreds of millions of users.
Real systems evolve through multiple architectures. Twitter's timeline went through three generations. Design for the current scale with a clear path to the next.
Caching is the primary scaling lever for read-heavy systems (URL shortener, news feed). Message queues and streaming are the primary scaling lever for write-heavy systems (chat).
Every system design involves trade-offs. Document them explicitly so future engineers understand why decisions were made.