Scaling Patterns

As systems grow, naive architectures break down. This file covers advanced patterns that let systems scale beyond what straightforward request-response architectures can handle: CQRS, event sourcing, materialized views, async processing, and back-pressure.

These patterns are not needed at every scale. Each adds complexity. Use them when a specific bottleneck demands it.

CQRS (Command Query Responsibility Segregation)

CQRS separates read operations (queries) from write operations (commands) into distinct models or even distinct services.

The Problem CQRS Solves

In a traditional architecture, the same database schema and the same service handle both reads and writes. This creates tension:

Reads want denormalized, pre-joined data for fast queries
Writes want normalized data for consistency and simplicity
Read and write workloads have different scaling characteristics

How It Works

Traditional:
  [API] -> [Single Model] -> [Single Database]

CQRS:
  Write path: [API] -> [Command Handler] -> [Write Store]
  Read path:  [API] -> [Query Handler]   -> [Read Store]

  An event or sync mechanism keeps the read store updated from the write store.

Benefits

Scale reads and writes independently (add read replicas without affecting write path)
Optimize read and write models separately (normalized writes, denormalized reads)
Different storage technologies for each side (PostgreSQL for writes, Elasticsearch for reads)

Real-World: Twitter Timelines

Twitter's home timeline is a classic CQRS example. When a user tweets (write), it is stored in the tweet table. A fan-out service then writes the tweet ID into each follower's timeline cache (read store). When a user opens their timeline (read), it reads from the precomputed cache, not by querying all followed users' tweets.

For users with millions of followers (celebrities), the fan-out is too expensive. Twitter uses a hybrid: celebrity tweets are fetched at read time and merged with the precomputed timeline.

When to Use CQRS

Read and write workloads differ by orders of magnitude
Read queries require complex denormalized views
You need different storage technologies for reads and writes
You're already using event sourcing (CQRS pairs naturally with it)

When to Avoid

Simple CRUD applications where reads and writes are balanced
Small teams that don't need the operational overhead of two data paths

Event Sourcing

Event sourcing stores every state change as an immutable event rather than overwriting the current state.

Traditional vs Event-Sourced

Traditional:
  UPDATE accounts SET balance = 150 WHERE id = 42;
  (Previous balance is lost)

Event sourced:
  Event 1: AccountCreated { id: 42, balance: 0 }
  Event 2: MoneyDeposited { id: 42, amount: 200 }
  Event 3: MoneyWithdrawn { id: 42, amount: 50 }
  (Current balance derived by replaying events: 0 + 200 - 50 = 150)

Benefits

Full audit trail. Every change is recorded. Critical for finance, healthcare, and compliance.
Time travel. Rebuild state at any point in time by replaying events up to that moment.
Event replay. If you add a new feature that needs historical data, replay all events through the new logic.
Debugging. Reproduce bugs by replaying the exact sequence of events.

Challenges

Event schema evolution. Changing the structure of events is hard when you have billions of immutable events. Use versioned schemas.
Replay time. Replaying millions of events to rebuild state is slow. Use snapshots (periodically save current state) to truncate replay.
Eventual consistency. Read models built from events are eventually consistent with the write model.

Real-World: Event Sourcing at Scale

Banking systems use event sourcing for the immutable audit trail of every transaction.
Kafka is often the event store — events are appended to topics and consumed by services that build read models.
Datomic is a database built around immutable facts (events) with time-travel queries.

Materialized Views

A materialized view is a precomputed query result stored as a table. Instead of running an expensive query at read time, you compute it once and serve the stored result.

How They Work

Base tables:
  orders (id, user_id, product_id, amount, created_at)
  products (id, name, category)

Materialized view:
  daily_sales_by_category (date, category, total_amount, order_count)

Instead of:
  SELECT category, SUM(amount), COUNT(*)
  FROM orders JOIN products ON ...
  WHERE created_at >= today
  GROUP BY category

Serve directly from the materialized view: single row lookup per category.

Refresh Strategies

Full refresh: Recompute the entire view periodically (every 5 minutes, hourly). Simple but expensive.
Incremental refresh: Update only the rows affected by new data. Faster but more complex.
Stream-based: Consume change events from the source tables and update the view in real time. Used in Kafka Streams, ksqlDB, and Flink.

In CQRS Architecture

Materialized views are the natural read model in a CQRS system. The write store emits events; consumers build materialized views optimized for specific query patterns.

Write store: normalized order events
Materialized view 1: orders-by-user (for "my orders" page)
Materialized view 2: orders-by-product (for product analytics)
Materialized view 3: daily-revenue (for dashboard)

Real-World: LinkedIn Feed

LinkedIn precomputes feed data into materialized views so that opening the app is a fast read rather than an expensive real-time computation across connections, posts, and relevance scores.

Async Processing

Asynchronous processing moves non-critical work out of the request path. The user gets a fast response; background workers handle the rest.

Synchronous vs Asynchronous

Synchronous (slow):
  User places order -> validate -> charge payment -> send email
                    -> update inventory -> generate invoice -> return 200
  Total: 800ms

Asynchronous (fast):
  User places order -> validate -> charge payment -> return 200
  Background: send email, update inventory, generate invoice
  Total visible to user: 200ms

Message Queues

The backbone of async processing. Producers enqueue work; consumers process it.

[API Server] -> [Queue: email-tasks] -> [Email Worker]
             -> [Queue: inventory-updates] -> [Inventory Worker]
             -> [Queue: invoice-generation] -> [Invoice Worker]

Benefits

Lower latency. Users don't wait for non-critical work.
Resilience. If the email service is down, messages stay in the queue and are processed when it recovers.
Load leveling. A traffic spike fills the queue; workers process at their own pace.
Independent scaling. Scale workers independently based on queue depth.

Patterns

Fire and forget: Enqueue and move on. No result expected.
Request-reply: Enqueue with a correlation ID. Worker posts the result to a reply queue. Used for long-running tasks where the client polls for completion.
Dead letter queue: Messages that fail processing N times are moved to a DLQ for investigation instead of blocking the queue.

Real-World: Shopify

Shopify processes order webhooks asynchronously. When an order is placed, the synchronous path handles payment and inventory reservation. Everything else (shipping label generation, merchant notification, analytics events) goes through background jobs backed by Redis queues.

Back-Pressure

Back-pressure is a mechanism for a system to signal that it is overloaded and slow down the rate of incoming work.

Why It Matters

Without back-pressure, an overloaded system accepts work faster than it can process, eventually running out of memory, crashing, or producing cascading failures.

Without back-pressure:
  Producer: 10,000 msg/sec
  Consumer: 5,000 msg/sec capacity
  Result: queue grows unboundedly -> OOM crash

With back-pressure:
  Producer: 10,000 msg/sec
  Consumer signals overload -> producer slows to 5,000 msg/sec
  Result: system stays healthy at reduced throughput

Back-Pressure Strategies

Bounded queues: Set a maximum queue size. When the queue is full, producers block or receive an error.

Queue capacity: 10,000 messages
When full:
  Option A: Producer blocks until space is available
  Option B: Producer receives 429 Too Many Requests
  Option C: Oldest messages are dropped (acceptable for metrics, not for orders)

Rate limiting: Limit the number of requests per second from each client. Return 429 when the limit is exceeded.

Load shedding: When the system is overwhelmed, drop low-priority requests and process only high-priority ones.

Reactive streams: Frameworks like Reactive Streams (Java), RxJS (JavaScript), and Akka Streams implement back-pressure at the library level. Consumers request only as many items as they can handle.

TCP-Level Back-Pressure

TCP has built-in back-pressure via flow control. The receiver advertises a window size; the sender cannot send more data than the window allows. This naturally slows down fast senders.

Real-World: Netflix

Netflix uses back-pressure extensively. Its Zuul gateway applies rate limiting and load shedding. If a backend service is slow, Zuul reduces traffic to it rather than queueing unboundedly. Client libraries (Hystrix, now Resilience4j) implement circuit breakers and bulkheads that create back-pressure at the service-to-service level.

Combining the Patterns

These patterns work together:

1. API receives a write command
2. Command handler validates and writes to the event store (event sourcing)
3. Events are published to a message queue (async processing)
4. Consumers build materialized views (CQRS read model)
5. Read API serves from materialized views
6. If consumers fall behind, back-pressure slows the event publishers

This architecture handles millions of writes per second with low-latency reads, full audit trails, and resilience to component failures.

Common Pitfalls

CQRS everywhere. Not every service needs separate read and write models. Start with a simple architecture and introduce CQRS only where read/write asymmetry is extreme.
Event sourcing without snapshots. Replaying millions of events to rebuild state is prohibitively slow. Take periodic snapshots.
Stale materialized views. If the refresh lag is too long, users see outdated data. Monitor refresh latency and alert on excessive lag.
Async without idempotency. Messages can be delivered more than once. Consumers must handle duplicates gracefully (idempotent processing).
No dead letter queue. Poison messages (messages that always fail) block the queue forever. Always configure a DLQ.
Ignoring back-pressure. Unbounded queues are a ticking time bomb. Set limits and handle the overflow deliberately.
Over-engineering for future scale. These patterns add complexity. If your system handles its current and near-future load without them, wait.

Key Takeaways

CQRS separates read and write paths, enabling independent optimization and scaling.
Event sourcing captures every state change as an immutable event, providing audit trails and time-travel debugging.
Materialized views precompute expensive queries, trading write-time computation for read-time speed.
Async processing moves non-critical work out of the request path, improving latency and resilience.
Back-pressure prevents overloaded systems from crashing by slowing producers when consumers can't keep up.
These patterns are powerful but complex. Introduce them incrementally when a specific bottleneck demands it.