System Design

System design defines the architecture of large-scale software systems so that they meet scalability, reliability, and performance requirements.

Scalability

Vertical Scaling (Scale Up)

Add more resources to a single machine (more CPU, RAM, faster disk).

Limits: Hardware ceilings. Single point of failure. Cost increases non-linearly.

Horizontal Scaling (Scale Out)

Add more machines. Distribute load across them.

Challenges: Data distribution, consistency, network communication overhead, coordination.

Stateless services: Easy to scale horizontally — any instance can handle any request. Store state externally (database, cache, session store).

Load Balancing

Distribute requests across multiple servers. Covered in networking topic.

Key patterns: Round-robin, least connections, consistent hashing (for sticky sessions/caching), health checks.
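Two of the patterns above, round-robin and least connections, can be sketched in a few lines. This is a minimal illustration, not a production balancer: the class names and the `acquire`/`release` methods are hypothetical, and health checks are left out.

```python
import itertools


class RoundRobinBalancer:
    """Cycles through the servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        # Hypothetical bookkeeping: server name -> number of open connections.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1
```

Round-robin ignores how busy each server is; least connections adapts when some requests take much longer than others.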

Caching

Store frequently accessed data in a fast layer to reduce load on the primary data store.

Caching Strategies

Cache-aside (lazy loading): Application checks cache first. On miss, query database, store in cache.

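FUNCTION GET_USER(id)
    # Cache hit: serve from the cache and skip the database
    IF CACHE.HAS(id) THEN RETURN CACHE.GET(id)
    # Cache miss: load from the database, then populate the cache
    user ← db.FIND_USER(id)
    CACHE.SET(id, user, ttl: 300)    # ttl assumed to be in seconds
    RETURN user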

Read-through: Cache itself handles misses by querying the database.

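Write-through: Each write goes to the cache and the database synchronously, and completes only when both succeed. Cache stays consistent with the database for data written through it, at the cost of slower writes.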

Write-behind (write-back): Writes go to cache immediately. Cache asynchronously writes to database. Faster writes, but writes not yet flushed are lost if the cache fails.
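The difference between write-through and write-behind can be sketched as follows. This is a simplified illustration that assumes plain dicts stand in for the cache and the database; the class names and the explicit `flush` method are hypothetical (a real write-behind cache would flush in the background).

```python
from collections import deque


class WriteThroughCache:
    """Every write updates the database and the cache before returning."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value      # durable write happens synchronously
        self.cache[key] = value


class WriteBehindCache:
    """Writes land in the cache first and reach the database later."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.pending = deque()    # writes not yet persisted

    def write(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self):
        # Anything still in `pending` is lost if the process dies before this runs.
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
```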

Cache Invalidation

"There are only two hard things in computer science: cache invalidation and naming things" (a quip attributed to Phil Karlton).

TTL-based: Entries expire after a time period. Simple, but data can be stale until the TTL expires.

Event-based: Invalidate when the underlying data changes. More complex but fresher data.

Versioning: Include a version in cache keys. New version = new key = no stale reads.
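Versioned keys can be sketched like this. It is an illustrative example only: the `versions` counter store and the key layout `entity:id:vN` are assumptions, and in practice the counter would live in a shared store rather than in process memory.

```python
cache = {}
versions = {}  # hypothetical per-entity version counters


def cache_key(entity, entity_id):
    """Build a key that embeds the entity's current version."""
    version = versions.get((entity, entity_id), 0)
    return f"{entity}:{entity_id}:v{version}"


def bump_version(entity, entity_id):
    """Call on every update so readers switch to a fresh key."""
    versions[(entity, entity_id)] = versions.get((entity, entity_id), 0) + 1
```

Entries under old versions are never read again, so they still need a TTL or an eviction policy to be reclaimed.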

Cache Systems

Redis: In-memory data structure store. Strings, hashes, lists, sets, sorted sets. Persistence options (RDB, AOF). Clustering. Pub/sub.

Memcached: Simple key-value cache. Multi-threaded. No persistence. Simpler than Redis.

Application-level: In-process cache (HashMap, LRU cache). Fastest but not shared across instances.

Message Queues

Decouple producers and consumers. Enable async processing, load leveling, and reliability.

Systems

RabbitMQ: Traditional message broker. AMQP protocol. Routing, priorities, acknowledgments. Good for task queues.

Apache Kafka: Distributed event streaming platform. Append-only log. Consumer groups. High throughput (millions of messages/sec). Replay capability. Good for event sourcing, data pipelines.

NATS: Lightweight, high-performance. Simple pub/sub and request-reply. JetStream for persistence. Good for microservices.

Patterns

Work queue: Multiple consumers compete for messages. Each message processed by one consumer. Load distribution.

Pub/Sub: Publishers broadcast to topics. Every subscriber to a topic receives every message published to it. Event notification.

Request-Reply: Synchronous-like communication over async messaging.

Dead letter queue (DLQ): Messages that fail processing go to a separate queue for manual inspection/retry.
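A dead letter queue can be sketched with an in-memory queue. This is a toy model under stated assumptions: `process_with_dlq`, `handler`, and `max_attempts` are hypothetical names, retries are immediate, and a real broker would handle redelivery and the DLQ itself.

```python
from collections import deque


def process_with_dlq(messages, handler, max_attempts=3):
    """Run handler on each message; park messages that keep failing."""
    queue = deque((message, 0) for message in messages)
    dead_letters = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                # Give up and keep the message with its last error for inspection.
                dead_letters.append((message, repr(exc)))
            else:
                queue.append((message, attempts))
    return dead_letters
```

Moving poison messages aside keeps one bad message from blocking the rest of the queue.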

Rate Limiting

Protect services from being overwhelmed.

Token bucket: Bucket holds up to N tokens. Each request consumes a token; requests are rejected (or delayed) when the bucket is empty. Tokens refill at rate R, capped at N. Allows bursts up to N.
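A token bucket can be sketched as below. This is a minimal single-process version; the `clock` parameter is an assumption added so the time source can be replaced in tests, and a shared limiter would keep this state in something like Redis instead.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity          # N: maximum burst size
        self.refill_rate = refill_rate    # R: tokens added per second
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill lazily based on elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```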

Sliding window: Count requests in a rolling time window. Reject if count exceeds limit.
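One way to implement a sliding window is to keep a log of recent request timestamps. The sketch below assumes a single process and an injectable `clock` (an assumption for testability); it stores one timestamp per accepted request, so memory grows with the limit.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.timestamps = deque()   # accepted requests, oldest first

    def allow(self):
        now = self.clock()
        # Drop requests that have slid out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```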

Common limits: Per user, per API key, per IP, per endpoint.

Circuit Breaker

Prevent cascading failures by stopping requests to a failing service.

States:
CLOSED    → (failures exceed threshold) → OPEN
OPEN      → (timeout)                   → HALF-OPEN
HALF-OPEN → (success)                   → CLOSED
HALF-OPEN → (failure)                   → OPEN

CLOSED: Normal operation. Track failures. If failures exceed threshold → OPEN.

OPEN: All requests immediately fail (or return fallback). No load on the failing service. After timeout → HALF-OPEN.

HALF-OPEN: Allow one test request. If successful → CLOSED. If failure → OPEN again.
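The three states can be sketched as a small wrapper around a call. This is a single-threaded illustration under stated assumptions: it counts consecutive failures, opens when the count reaches `failure_threshold`, and takes an injectable `clock`; all names are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "HALF_OPEN"   # timeout elapsed: let one trial request through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.state = "CLOSED"
```

A caller would typically catch `CircuitOpenError` and return a fallback response instead of propagating the error.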

API Design

REST

Resource-oriented. HTTP methods map to CRUD operations.

GET    /users          → list users
GET    /users/42       → get user 42
POST   /users          → create user
PUT    /users/42       → replace user 42
PATCH  /users/42       → partially update user 42
DELETE /users/42       → delete user 42

Principles: Stateless, uniform interface, resource-based URLs, proper HTTP status codes, HATEOAS (optional).

GraphQL

Client specifies exactly what data it needs. Single endpoint. Strongly typed schema.

Best for: Mobile clients (bandwidth-sensitive), complex data requirements, multiple client types.

gRPC

Binary protocol (protobuf) over HTTP/2. Streaming support. Code generation.

Best for: Service-to-service communication, high throughput, polyglot environments.

Database Patterns

Read Replicas

Primary handles writes. Replicas handle reads. Scales read-heavy workloads.

Write → Primary → Replicate → Replica 1 (reads)
                            → Replica 2 (reads)

Replication lag: Replicas may be seconds behind. Application must handle stale reads.

Sharding

Split data across multiple databases by a shard key. Covered in distributed databases.

Challenges: Cross-shard queries, rebalancing, hot shards.

Consistent Hashing

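Distribute data across nodes by placing both keys and nodes on a hash ring; each key belongs to the next node clockwise. Adding/removing one of N nodes only remaps ~1/N of the keys, whereas hash(key) mod N remaps almost all of them.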

Used in: Distributed caches (Memcached), distributed databases (Cassandra, DynamoDB), load balancers.
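A hash ring can be sketched as a sorted list of positions. This is a simplified version, not how any particular system implements it: the `HashRing` class, the use of SHA-256 for placement, and 100 virtual nodes per server are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def stable_hash(value):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server, to even out the load
        self._ring = []        # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get(self, key):
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a node is added, the only keys that change owner are the ones that move to the new node; every other key keeps its placement.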

Distributed Locking

Coordinate access to shared resources across multiple instances.

Redis-based (Redlock): Acquire the lock, with a TTL, on a majority of independent Redis instances. Release when done. The TTL frees the lock if the holder crashes.

ZooKeeper/etcd: Distributed coordination with strong consistency guarantees.

Caution: Distributed locks are tricky. Clock skew, network partitions, and process pauses can cause issues. Use fencing tokens for safety.
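Fencing tokens can be sketched as follows. The idea is that the lock service hands out an increasing number with each grant and the protected resource rejects writes carrying an older number. `LockService` and `FencedStorage` are hypothetical stand-ins, not the API of Redis, ZooKeeper, or etcd.

```python
class LockService:
    """Hypothetical lock service: each grant carries a larger token."""

    def __init__(self):
        self._token = 0

    def acquire(self):
        self._token += 1
        return self._token


class FencedStorage:
    """Resource that refuses writes from holders of an outdated lock."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, key, value, token):
        if token < self.highest_token:
            raise PermissionError(f"stale fencing token {token}")
        self.highest_token = token
        self.data[key] = value
```

If a client pauses past its lock TTL and wakes up after another client has taken the lock, its late write arrives with a smaller token and is rejected.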

Idempotency

An operation is idempotent if executing it multiple times produces the same result as executing it once.

Why it matters: In distributed systems, messages can be delivered more than once (at-least-once delivery). Idempotent operations are safe to retry.

Implementation: Use an idempotency key (unique request ID). Store processed keys. On duplicate → return cached result.

POST /payments
Idempotency-Key: abc-123
Body: { "amount": 100, "to": "bob" }

# First call: process payment, store result for key abc-123
# Second call (retry): detect abc-123 already processed, return cached result
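The flow above can be sketched on the server side like this. It is an in-memory illustration only: `PaymentService` and its fields are hypothetical, and a real service would store keys durably and make the check-and-store step atomic (for example with a unique constraint).

```python
class PaymentService:
    def __init__(self):
        self._results = {}   # idempotency key -> stored response
        self.charges = []    # side effects actually performed

    def create_payment(self, idempotency_key, amount, to):
        # Duplicate request: return the stored response, do not charge again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append((amount, to))
        result = {"status": "ok", "payment_id": len(self.charges)}
        self._results[idempotency_key] = result
        return result
```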

Retry Patterns

Exponential backoff: Wait 1s, 2s, 4s, 8s, ... between retries. Jitter (random delay) spreads retries out so clients do not all retry at the same moment (thundering herd).

delay ← base_delay * 2^attempt + RANDOM_JITTER()

Circuit breaker + retry: Don't retry if the circuit is open.

Retry budget: Limit total retries to prevent amplifying failures.
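The backoff formula above can be sketched as a small retry helper. The `max_delay` cap, the `max_attempts` limit, and the injectable `sleep` and `rng` parameters are assumptions added for illustration; the jitter here is a random fraction of `base_delay` added to the exponential term.

```python
import random
import time


def backoff_delay(attempt, base_delay=1.0, max_delay=30.0, rng=random.random):
    """Exponential delay for a 0-based attempt number, capped, plus jitter."""
    exponential = min(max_delay, base_delay * 2 ** attempt)
    return exponential + rng() * base_delay


def retry(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
          sleep=time.sleep, rng=random.random):
    """Call func until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise   # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base_delay, max_delay, rng))
```

Capping the number of attempts per call is the simplest form of a retry limit; a shared retry budget would additionally bound retries across all callers.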

System Design Process

Clarify requirements: Functional (what the system does) and non-functional (scale, latency, availability).
Estimate scale: Users, requests/sec, data size, growth rate.
Define API: Endpoints, data model.
High-level design: Components and their interactions.
Detailed design: Database schema, caching strategy, algorithms.
Bottlenecks and tradeoffs: Identify and address scaling challenges.
Monitoring and alerting: How to know when things go wrong.
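The "estimate scale" step is usually back-of-envelope arithmetic. The sketch below shows one way to do it; the function name, the input figures in the usage note, and the peak factor of 2 are made-up assumptions, not measurements.

```python
SECONDS_PER_DAY = 86_400


def estimate_scale(daily_users, reads_per_user, writes_per_user,
                   bytes_per_write, peak_factor=2.0):
    """Rough request rate and storage growth from per-user daily activity."""
    requests_per_day = daily_users * (reads_per_user + writes_per_user)
    avg_rps = requests_per_day / SECONDS_PER_DAY
    return {
        "avg_rps": avg_rps,
        "peak_rps": avg_rps * peak_factor,
        "storage_bytes_per_day": daily_users * writes_per_user * bytes_per_write,
    }
```

For example, 1,000,000 daily users making 18 reads and 2 writes each, at 1,000 bytes per write, works out to about 231 requests/sec on average and 2 GB of new data per day.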

Applications in CS

Interview preparation: System design interviews test architecture thinking at scale.
Production systems: Every web service needs load balancing, caching, and monitoring.
Startup engineering: Make pragmatic tradeoffs — start simple, scale as needed.
Platform engineering: Build internal platforms (databases-as-service, deployment pipelines, observability).