Common Questions

There are roughly 10-15 system design questions that appear repeatedly across companies. They are popular not because interviewers lack imagination, but because each one tests a different set of distributed systems concepts. Understanding what each question really tests — not just how to answer it — lets you pattern-match new questions to familiar territory.

The Top 10 Questions

1. URL Shortener (Design TinyURL)

What it really tests: Hashing, key generation, read-heavy systems, database choice, caching, analytics at scale.

Core challenge:
  Generate a short, unique key for each URL.
  Handle billions of reads for popular links.
  Track click analytics.

Key decisions:
  - Key generation: hash-based vs counter-based vs pre-generated key pool
  - Hash collisions: how to detect and handle them
  - Read/write ratio: heavily read-skewed, so caching matters
  - Database: key-value store (DynamoDB, Cassandra) is natural
  - Analytics: click tracking at high QPS needs async processing
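
As an illustration of the counter-based option above, here is a minimal sketch that turns a numeric ID into a short key with base-62 encoding. The names encode_base62 and decode_base62 are hypothetical, and the sketch assumes some other component (a database sequence or an ID service) hands out unique integers.

```python
import string

# 62 URL-safe characters: 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(key: str) -> int:
    """Invert encode_base62 (raises ValueError on unknown characters)."""
    n = 0
    for ch in key:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Because every counter value is unique, the keys are unique without any collision check. The tradeoff is that sequential keys are guessable, which is one reason hash-based or pre-generated key pools are sometimes preferred. Seven characters cover 62**7 keys, roughly 3.5 trillion.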

Scale numbers:
  100M new URLs/month, 10B redirects/month
  Write QPS: ~40/sec, Read QPS: ~4,000/sec
  Storage: 100M * 1KB ~ 100GB/month
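
The arithmetic behind these numbers can be checked with a few lines. This is a back-of-the-envelope sketch that assumes a 30-day month, perfectly even traffic, and 1 KB = 1,000 bytes; real traffic has peaks, so these are averages only.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month


def per_second(events_per_month: float) -> float:
    """Average events per second, assuming evenly spread traffic."""
    return events_per_month / SECONDS_PER_MONTH


write_qps = per_second(100e6)  # about 39 writes/sec, rounded to ~40 above
read_qps = per_second(10e9)    # about 3,900 reads/sec, rounded to ~4,000 above
read_write_ratio = read_qps / write_qps  # 100 reads per write

# 100M records at roughly 1 KB each, expressed in GB.
storage_gb_per_month = 100e6 * 1_000 / 1e9
```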

This is widely regarded as the most approachable system design question and is often the first one candidates practice. If the interviewer picks it, expect them to push deep on one area, usually the key generation strategy, cache invalidation, or the analytics pipeline.

2. Chat System (Design WhatsApp/Messenger)

What it really tests: Real-time communication, WebSockets, message ordering, presence, offline delivery, fan-out.

Core challenge:
  Deliver messages in real-time with guaranteed ordering.
  Handle offline users gracefully.
  Scale WebSocket connections to millions of concurrent users.

Key decisions:
  - Protocol: WebSocket for real-time, fallback to long polling
  - Message ordering: timestamp-based or sequence numbers per conversation
  - Storage: write-heavy, time-series access pattern (Cassandra, HBase)
  - Delivery: push via WebSocket if online, store-and-forward if offline
  - Group chat: fan-out on write vs fan-out on read
  - Presence: heartbeat-based, eventual consistency is fine

Scale numbers:
  50M DAU, 40 messages/user/day
  Write QPS: ~23,000/sec
  Concurrent WebSocket connections: ~5M at peak

Group chat introduces the fan-out problem. If a group has 500 members, do you write 500 copies of each message (fan-out on write) or write once and query for each reader (fan-out on read)? The answer depends on group size distribution.
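
The two strategies can be compared with a small in-memory sketch. Everything here is hypothetical (the GroupChat class, its method names, the tuple layout); a real system would back these dictionaries with a database. Each send method returns the number of writes it performed, and messages carry a per-conversation sequence number for ordering.

```python
from collections import defaultdict


class GroupChat:
    """Toy model of group messaging with both fan-out strategies."""

    def __init__(self):
        self.members = defaultdict(set)     # group_id -> set of user_ids
        self.inboxes = defaultdict(list)    # user_id -> messages (fan-out on write)
        self.group_log = defaultdict(list)  # group_id -> messages (fan-out on read)
        self._next_seq = defaultdict(int)   # group_id -> last sequence number used

    def join(self, group_id, user_id):
        self.members[group_id].add(user_id)

    def _make_message(self, group_id, sender, text):
        # Sequence numbers are per conversation, so ordering within a
        # group does not depend on wall-clock timestamps.
        self._next_seq[group_id] += 1
        return (group_id, self._next_seq[group_id], sender, text)

    def send_fanout_on_write(self, group_id, sender, text):
        """Copy the message into every member's inbox. Returns write count."""
        msg = self._make_message(group_id, sender, text)
        for user_id in self.members[group_id]:
            self.inboxes[user_id].append(msg)
        return len(self.members[group_id])

    def send_fanout_on_read(self, group_id, sender, text):
        """Append once to the group log. Returns write count (always 1)."""
        msg = self._make_message(group_id, sender, text)
        self.group_log[group_id].append(msg)
        return 1

    def read_group(self, group_id, after_seq=0):
        """Readers pull everything newer than the last sequence they saw."""
        return [m for m in self.group_log[group_id] if m[1] > after_seq]
```

With 500 members, send_fanout_on_write performs 500 writes per message while send_fanout_on_read performs one, at the cost of a query per reader per group.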

3. News Feed (Design Facebook/Twitter Feed)

What it really tests: Fan-out, ranking, caching, push vs pull architecture, eventually consistent reads.

Core challenge:
  Aggregate posts from hundreds of followed users.
  Rank by relevance or chronology.
  Deliver within seconds of posting.

Key decisions:
  - Fan-out on write: pre-compute feeds when a post is created
    Good when the author has a modest follower count (e.g., < 1000 followers)
    Expensive for celebrities with millions of followers
  - Fan-out on read: compute feed at read time
    Cheap to publish for celebrities, but each read must merge posts on demand
  - Hybrid: fan-out on write for normal users, fan-out on read for
    celebrities (the "Twitter approach")
  - Ranking: chronological is simple, ML-based adds complexity
  - Caching: pre-computed feeds in Redis/Memcached

Scale numbers:
  500M DAU, average 300 follows/user
  Feed read QPS: ~500,000/sec
  New post QPS: ~50,000/sec

This question is fundamentally about the fan-out tradeoff. If you can articulate when to use push vs pull vs hybrid, you have nailed the core of it.
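
A minimal sketch of the hybrid approach, assuming an in-memory store and a hypothetical follower-count cutoff (celebrity_threshold). The FeedService class and its methods are illustrative names, not any product's API. Posts from ordinary authors are pushed into follower feeds at write time; posts from celebrities are pulled and merged at read time.

```python
from collections import defaultdict


class FeedService:
    """Toy hybrid feed: push for normal authors, pull for celebrities."""

    def __init__(self, celebrity_threshold=1000):
        self.celebrity_threshold = celebrity_threshold  # assumed cutoff
        self.followers = defaultdict(set)    # author -> users following them
        self.following = defaultdict(set)    # user -> authors they follow
        self.posts = defaultdict(list)       # author -> [(timestamp, post_id)]
        self.feed_cache = defaultdict(list)  # user -> precomputed [(timestamp, post_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, author, timestamp, post_id):
        self.posts[author].append((timestamp, post_id))
        if not self.is_celebrity(author):
            # Fan-out on write: one cache append per follower.
            for user in self.followers[author]:
                self.feed_cache[user].append((timestamp, post_id))

    def read_feed(self, user, limit=10):
        # Start from the precomputed feed, then pull celebrity posts.
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts were already pushed.
        items = set(self.feed_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts[author])
        newest_first = sorted(items, reverse=True)[:limit]
        return [post_id for _, post_id in newest_first]
```

This sketch ranks chronologically and ignores cache trimming and unfollows; the point is only where the work happens for each class of author.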

4. Notification System

What it really tests: Multi-channel delivery (push, SMS, email), rate limiting, user preferences, reliability, priority queues.

Core challenge:
  Deliver notifications across multiple channels reliably.
  Respect user preferences and do not spam.
  Handle high throughput with varying priority.

Key decisions:
  - Message queue per channel (push, SMS, email) for isolation
  - Priority levels: critical (password reset) vs marketing
  - Rate limiting: per-user, per-channel limits
  - Deduplication: same event should not trigger duplicate notifications
  - Template system: separate content from delivery logic
  - Retry with backoff: transient failures in downstream providers
  - User preferences: stored separately, checked before delivery

Architecture pattern:
  Event source -> Notification service -> Channel queues -> Workers -> Providers
  Each step is decoupled and independently scalable.

The interesting part of this question is reliability. What happens when the SMS provider is down? You need retry queues, dead letter queues, and fallback providers.
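
One way to sketch that reliability path is shown below. The TransientError type, the deliver function, and the (name, send) provider pairs are all assumptions made for illustration; real provider SDKs raise their own error types. The sketch retries each provider with exponential backoff and jitter, falls back to the next provider, and parks the message on a dead letter list if every provider fails.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error for failures worth retrying (timeouts, 5xx)."""


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def deliver(message, providers, dead_letters, max_attempts=3, sleep=time.sleep):
    """Try providers in order; return the name of the one that succeeded.

    `providers` is a list of (name, send_callable) pairs. If all providers
    exhaust their attempts, the message is appended to `dead_letters`
    and None is returned.
    """
    for name, send in providers:
        for attempt in range(max_attempts):
            try:
                send(message)
                return name
            except TransientError:
                # Wait before retrying, but not after the final attempt.
                if attempt < max_attempts - 1:
                    sleep(backoff_delay(attempt))
    dead_letters.append(message)
    return None
```

Non-retryable errors (for example an invalid phone number) are deliberately not caught here, so they surface immediately instead of consuming retries.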

5. Rate Limiter

What it really tests: Algorithm design, distributed counting, consistency in distributed systems, API gateway patterns.

Core challenge:
  Limit requests to N per time window per user/IP/API key.
  Work correctly across multiple servers.
  Add minimal latency to the request path.

Key algorithms:
  - Token bucket (most common): capacity C, refill rate R tokens/sec
  - Sliding window counter: good balance of accuracy and memory
  - Fixed window: simple but spiky at boundaries
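
A single-process sketch of the token bucket, using the capacity C and refill rate R from the list above. The TokenBucket class and the injectable clock are assumptions for illustration; a distributed version would keep the token count and timestamp in a shared store.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity, refill_rate, now=time.monotonic):
        self.capacity = capacity        # C: maximum burst size
        self.refill_rate = refill_rate  # R: tokens added per second
        self._now = now                 # injectable clock, useful in tests
        self.tokens = float(capacity)   # start full
        self.last_refill = now()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        t = self._now()
        elapsed = t - self.last_refill
        # Lazily add the tokens earned since the last call, up to capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The bucket permits bursts of up to C requests while holding the long-run average to R per second, which is the property that makes it a common default.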

Key decisions:
  - Where: API gateway, middleware, or application layer
  - Storage: Redis (INCR + EXPIRE, Lua scripts for atomicity)
  - Race conditions: the core distributed systems challenge

This is deceptively simple. The algorithms are straightforward. The hard part is making it work correctly across multiple servers with minimal latency.
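
To show how straightforward the single-node algorithm is, here is a sketch of the sliding window counter. The SlidingWindowCounter class is a hypothetical name, and the state lives in local memory. In a multi-server deployment the two counters would sit in a shared store such as Redis, and the read-check-increment sequence would need to run atomically, which is where the difficulty lies.

```python
class SlidingWindowCounter:
    """Approximates a sliding window using two fixed-window counters."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_window = 0  # index of the fixed window being counted
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now):
        """`now` is a timestamp in seconds; returns True if within limit."""
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # Roll over. The old count only matters if it belongs to the
            # immediately preceding window.
            if window_index == self.current_window + 1:
                self.previous_count = self.current_count
            else:
                self.previous_count = 0
            self.current_count = 0
            self.current_window = window_index

        # Weight the previous window by how much of it still overlaps
        # the sliding window that ends at `now`.
        elapsed_fraction = (now % self.window) / self.window
        estimated = self.previous_count * (1 - elapsed_fraction) + self.current_count
        if estimated + 1 <= self.limit:
            self.current_count += 1
            return True
        return False
```

Unlike a fixed window, a burst at the end of one window still counts against the start of the next, which smooths the boundary spike at the cost of an approximation.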

6. Web Crawler

What it really tests: Distributed systems, politeness, deduplication, URL frontier, fault tolerance, scale.

Core challenge:
  Crawl billions of web pages efficiently and politely.
  Do not re-crawl pages unnecessarily.
  Handle malformed HTML, infinite loops, and traps.

Key decisions:
  - URL frontier: priority queue with politeness constraints
    (do not hit same domain more than X times/second)
  - Deduplication: URL dedup (exact match + normalization)
    and content dedup (simhash or min-hash)
  - DNS resolution: cache aggressively, it is a bottleneck
  - Distributed coordination: partition URLs by domain
  - Robots.txt: fetch and cache per domain, respect it
  - Trap detection: URL pattern detection, depth limits

Architecture:
  URL frontier -> DNS resolver -> Fetcher -> Parser -> Dedup -> Storage
  Each component runs on many machines.
  The frontier is the brain of the crawler.

The URL frontier design is what separates good answers from mediocre ones. It needs to be distributed, prioritized (important pages first), and polite (respect per-domain rate limits).
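
A single-machine sketch of a frontier that is both prioritized and polite. The PoliteFrontier class is hypothetical; it assumes a fixed minimum delay per domain, exact-match URL dedup (no normalization), and that lower priority numbers mean more important pages. A distributed frontier would partition the per-domain queues across machines.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlsplit


class PoliteFrontier:
    """Per-domain priority queues plus a schedule of when each domain is due."""

    def __init__(self, min_delay_seconds=1.0):
        self.min_delay = min_delay_seconds
        self.queues = defaultdict(list)  # domain -> heap of (priority, seq, url)
        self.ready = []                  # heap of (next_allowed_time, domain)
        self.scheduled = set()           # domains currently in `ready`
        self.next_allowed = {}           # domain -> earliest next fetch time
        self.seen = set()                # exact-match URL dedup
        self._seq = 0                    # tie-breaker keeps insertion order

    def add(self, url, priority=0):
        """Queue a URL. Returns False if it was already seen."""
        if url in self.seen:
            return False
        self.seen.add(url)
        domain = urlsplit(url).netloc
        heapq.heappush(self.queues[domain], (priority, self._seq, url))
        self._seq += 1
        if domain not in self.scheduled:
            due = self.next_allowed.get(domain, 0.0)
            heapq.heappush(self.ready, (due, domain))
            self.scheduled.add(domain)
        return True

    def next_url(self, now):
        """Return the next URL whose domain may be fetched at `now`, or None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, domain = heapq.heappop(self.ready)
        _, _, url = heapq.heappop(self.queues[domain])
        # Record when this domain may be contacted again.
        self.next_allowed[domain] = now + self.min_delay
        if self.queues[domain]:
            heapq.heappush(self.ready, (self.next_allowed[domain], domain))
        else:
            self.scheduled.discard(domain)
        return url
```

Priority here only orders URLs within a domain; ordering across domains is by when each domain is next allowed, which is a simplification.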

7. Search Autocomplete (Typeahead)

What it really tests: Trie data structure, ranking, caching, latency optimization, data freshness.

Core challenge:
  Return the top 5 suggestions within 100ms as the user types.
  Suggestions ranked by popularity.
  Update rankings as search trends change.

Key decisions:
  - Data structure: trie with top-k results cached at each node
  - Storage: in-memory for speed, replicated for availability
  - Updates: do not update the trie on every search
    Aggregate search logs, rebuild/update trie periodically (every 15 min)
  - Caching: browser caches results for recent prefixes
    CDN caches popular prefixes
  - Multi-language: different tries per language
  - Personalization: blend global popularity with user history

Optimization:
  "fac" matches "facebook", "face mask", "factory"
  Precompute top results at each trie node.
  Serving is O(prefix length) to find the node, then O(1) to return results.
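
The precomputation can be sketched as follows. The function names build_trie and suggest are hypothetical, and the input is assumed to be a dictionary of already-aggregated query counts. The build step is meant to run offline, so it favors simplicity over memory efficiency.

```python
class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.top = []       # cached [(count, phrase)] for this prefix


def build_trie(counts, k=5):
    """Build a trie where every node caches its k most popular phrases."""
    root = TrieNode()
    for phrase, count in counts.items():
        node = root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((count, phrase))
    # Trim every node's candidate list down to its top k.
    stack = [root]
    while stack:
        node = stack.pop()
        # Highest count first; ties broken alphabetically for determinism.
        node.top.sort(key=lambda item: (-item[0], item[1]))
        del node.top[k:]
        stack.extend(node.children.values())
    return root


def suggest(root, prefix):
    """Walk the prefix, then return the cached list without any search."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return [phrase for _, phrase in node.top]
```

For example, with hypothetical counts:

```python
counts = {"facebook": 100, "face mask": 40, "factory": 25, "fashion": 10}
trie = build_trie(counts, k=2)
suggest(trie, "fac")  # ["facebook", "face mask"]
```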

The interviewer often pushes on the update mechanism. You cannot rebuild a trie with billions of entries every time someone searches. The answer is offline aggregation with periodic updates and real-time blending for trending topics.

8. Video Streaming (Design YouTube/Netflix)

What it really tests: CDN, encoding pipeline, adaptive bitrate, storage at massive scale, metadata service.

Core challenge:
  Upload, encode, store, and stream video at scale.
  Adaptive quality based on bandwidth.
  Global distribution with low latency.

Key decisions:
  - Upload: chunked, resumable for reliability
  - Encoding: transcode to multiple resolutions/codecs via job queue
  - Storage: originals and encoded renditions in blob storage (S3), served via CDN
  - Streaming: adaptive bitrate (HLS/DASH), client picks quality
  - CDN: popular videos at edge, long-tail from origin

The encoding pipeline is the most interesting part. A single video generates dozens of versions. Processing is asynchronous and can take minutes.
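
Those multiple versions are what make adaptive bitrate possible: the client measures throughput and picks a rendition that fits. Below is a simplified sketch of that choice. The bitrate ladder values, the safety factor, and the pick_rendition name are illustrative assumptions; real players also consider buffer level and screen size.

```python
# Hypothetical bitrate ladder: (vertical resolution, video bitrate in kbps),
# sorted from lowest to highest quality.
LADDER = [(240, 400), (360, 800), (480, 1200), (720, 2500), (1080, 5000)]


def pick_rendition(throughput_kbps, safety=0.8, ladder=LADDER):
    """Pick the best rendition whose bitrate fits the measured throughput.

    Only a fraction (`safety`) of the measured throughput is budgeted so
    that small bandwidth dips do not immediately stall playback. Falls
    back to the lowest rendition when nothing fits.
    """
    budget = throughput_kbps * safety
    best = ladder[0]
    for rendition in ladder:
        if rendition[1] <= budget:
            best = rendition
    return best
```

The client would re-run this choice for each segment it requests, which is how quality shifts up or down mid-stream.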

9. Ride Sharing

What it really tests: Geospatial indexing, real-time matching, location updates at scale, ETA calculation, supply-demand balancing.

Core challenge:
  Match riders to nearby drivers in real-time.
  Handle hundreds of thousands of location updates per second.
  Compute ETAs and routes.

Key decisions:
  - Geospatial index: geohash or quadtree for nearby driver queries
  - Location updates: drivers send GPS every 3-5 seconds (~250K writes/sec with ~1M active drivers)
  - Matching: find available drivers within radius, rank by ETA
  - Surge pricing: real-time supply/demand by geohash region
  - Trip state machine: requested -> matched -> en route -> completed

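Geospatial indexing is the core challenge. Geohash converts 2D coordinates into a 1D string that preserves locality. Nearby locations usually share prefixes, enabling range queries on a standard database. The exception is at cell boundaries, where two close points can have different prefixes, so a nearby-driver query should also check the neighboring cells.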
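
A compact sketch of geohash encoding, to make the prefix property concrete. It follows the standard scheme (alternate longitude and latitude bits, then base-32 encode five bits per character); the function name geohash_encode is an illustrative choice.

```python
# Geohash base-32 alphabet (digits plus letters, omitting a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash of `precision` chars."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon

    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)
```

Each extra character narrows the cell, so truncating a geohash gives the enclosing larger cell. That is why a prefix match on an indexed string column can serve as a coarse proximity query.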

10. Payment System (Design Stripe/PayPal)

What it really tests: Exactly-once processing, idempotency, consistency, audit trails, reconciliation, security.

Core challenge:
  Process payments reliably. Never lose money.
  Never charge a customer twice. Never fail silently.

Key decisions:
  - Idempotency: every payment request has a unique idempotency key
    If the same key is received twice, return the original result
  - Double-entry bookkeeping: every transaction has a debit and credit
    The ledger must always balance
  - State machine: pending -> processing -> completed/failed
    Transitions are atomic and logged
  - Reconciliation: periodically compare internal ledger with bank records
  - Retry logic: distinguish retryable (network timeout) from
    non-retryable (insufficient funds) failures
  - PCI compliance: never store raw card numbers
    Use tokenization (Stripe, Braintree)
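
The idempotency decision can be sketched in a few lines. This is an in-memory illustration with hypothetical names (PaymentService, charge_fn); a production system would persist the key and result in a database with a unique constraint rather than rely on a process-local lock.

```python
import threading


class PaymentService:
    """Returns the stored result when an idempotency key is replayed."""

    def __init__(self, charge_fn):
        self._charge = charge_fn  # hypothetical call to the payment processor
        self._results = {}        # idempotency_key -> result of first attempt
        self._lock = threading.Lock()

    def pay(self, idempotency_key, customer_id, amount_cents):
        # The lock makes check-then-charge atomic within this process.
        # It is deliberately coarse: it serializes all payments, which is
        # acceptable only in a sketch.
        with self._lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]
            result = self._charge(customer_id, amount_cents)
            self._results[idempotency_key] = result
            return result
```

A client that times out and retries with the same key gets the original result back, and the processor is called only once. The sketch does not verify that a replayed key carries the same amount, which a fuller version should check.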

Architecture:
  API -> Payment service -> Payment processor (Stripe/bank)
  Every state change written to append-only ledger.
  Async reconciliation job compares ledger with external records.
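
The ledger and state machine can be illustrated together. This is a minimal sketch with assumed names (Entry, Ledger, transition) and an assumed sign convention: debits are stored as positive amounts and credits as negative, so a balanced ledger sums to zero.

```python
from dataclasses import dataclass

# Allowed payment state transitions, taken from the state machine above.
ALLOWED = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def transition(current, new):
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


@dataclass(frozen=True)
class Entry:
    txn_id: str
    account: str
    amount_cents: int  # positive = debit, negative = credit (assumed convention)


class Ledger:
    """Append-only, double-entry ledger: entries are added, never edited."""

    def __init__(self):
        self._entries = []

    def post(self, txn_id, debit_account, credit_account, amount_cents):
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        # Both sides are appended together so the ledger stays balanced.
        self._entries.append(Entry(txn_id, debit_account, amount_cents))
        self._entries.append(Entry(txn_id, credit_account, -amount_cents))

    def balance(self, account):
        return sum(e.amount_cents for e in self._entries if e.account == account)

    def is_balanced(self):
        return sum(e.amount_cents for e in self._entries) == 0
```

Amounts are integers in the smallest currency unit to avoid floating-point rounding. A reconciliation job would compare these entries against the processor's records and flag any mismatch.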

This question is different from the others because it prioritizes correctness over performance. The interviewer wants to hear about idempotency, exactly-once semantics, and how you prevent double-charging even during failures.

Mapping Questions to Concepts

Question              Primary concepts tested
URL shortener         Hashing, caching, key generation
Chat system           WebSocket, message ordering, fan-out
News feed             Fan-out, ranking, push vs pull
Notification system   Multi-channel delivery, reliability, queuing
Rate limiter          Distributed counting, algorithms, atomicity
Web crawler           Distributed coordination, dedup, politeness
Search autocomplete   Trie, caching, latency optimization
Video streaming       CDN, encoding pipeline, adaptive bitrate
Ride sharing          Geospatial indexing, real-time matching
Payment system        Idempotency, consistency, exactly-once

If you face a question you have not seen, map it to the closest one above. "Design a food delivery app" is ride sharing + a restaurant service. "Design Google Docs" is a chat system (real-time sync) + conflict resolution (CRDTs or OT). "Design a ticketing system" is a payment system + inventory management with concurrency.

How to Practice

Do not memorize architectures. Instead, for each question:

- Identify the 2-3 core technical challenges
- Know why specific technologies fit (not just which ones)
- Practice articulating tradeoffs out loud
- Time yourself: 35 minutes for the design, 10 minutes for Q&A

Practice with a partner if possible. System design is a conversation, and practicing alone builds different muscles than practicing with someone who pushes back.

Common Pitfalls

- Memorizing a single "correct" design. The interviewer has seen every blog post you have read. They care about your reasoning, not your recall.
- Going too broad. Listing cache, load balancer, CDN, message queue, and monitoring without explaining why any of them is needed. Each component must solve a specific problem.
- Ignoring the unique challenge. Every question has a core tension. URL shortener is about key uniqueness. Chat is about real-time delivery. Payment is about correctness. Miss the core tension and the rest does not matter.
- Not knowing the numbers. If you cannot estimate storage or QPS within an order of magnitude, you cannot make informed design decisions.
- Skipping the data model. Many candidates draw boxes and arrows but never define what is stored. The data model drives query patterns, indexing, partitioning, and caching.

Key Takeaways

- The top 10 questions cover the most important distributed systems concepts: fan-out, consistency, caching, real-time communication, geospatial indexing, idempotency, and distributed coordination.
- Each question has a core challenge that the interviewer is testing. Identify it early and make it the focus of your design.
- New questions are combinations of familiar patterns. "Design X" where X is unfamiliar becomes manageable when you decompose it into known building blocks.
- Depth on the core challenge matters more than breadth across all components. A deep discussion of fan-out strategy in a news feed design is worth more than mentioning 15 services without explanation.
- Practice articulating tradeoffs. "I chose X because of Y, at the cost of Z" is the sentence pattern that demonstrates engineering judgment.