Database Scaling

The database is the most common bottleneck in a growing system. Application servers are easy to scale horizontally — they are stateless. Databases hold state, which makes scaling them fundamentally harder. This file covers the techniques for scaling databases from a single instance to a globally distributed fleet.

Read Replicas

The simplest database scaling technique: replicate data from a primary (read-write) node to one or more secondary (read-only) nodes.

How It Works

Writes -> Primary DB -> Replication stream -> Replica 1 (reads)
                                           -> Replica 2 (reads)
                                           -> Replica 3 (reads)

The primary handles all writes. Replicas receive a continuous stream of changes and serve read queries. Since most applications are read-heavy (90%+ reads), replicas absorb the majority of query load. Facebook, for example, runs MySQL with a single primary and dozens of read replicas per cluster, routing the vast majority of their social graph reads to replicas while all writes go through the primary.

Replication Modes

Asynchronous: The primary doesn't wait for replicas to confirm. Fastest writes but replicas may lag behind (replication lag).
Synchronous: The primary waits for at least one replica to confirm before acknowledging the write. Stronger consistency but higher write latency.
Semi-synchronous: The primary waits for one replica (not all) to confirm. A practical middle ground used by MySQL's semi-sync replication.

Handling Replication Lag

Asynchronous replicas can be seconds behind the primary. This causes stale reads:

1. User updates their profile (write to primary)
2. User refreshes the page (read from replica)
3. Replica hasn't received the update yet -> user sees old data

Solutions:

Read-your-writes consistency: After a write, route that user's reads to the primary for a short window.
Causal consistency: Track the write's position in the replication log. Read from a replica only after it has caught up to that position.
Accept the lag: For non-critical reads (dashboards, feeds), a few seconds of staleness is often fine.

When Read Replicas Aren't Enough

Read replicas don't help with write-heavy workloads. If the primary is the bottleneck because of write volume, you need sharding.

Sharding Strategies

Sharding splits data across multiple independent database instances (shards). Each shard holds a subset of the data.

Hash-Based Sharding

Apply a hash function to the shard key, then mod by the number of shards.

shard_id = hash(user_id) % num_shards

user_id 1001 -> hash -> 7432 % 4 = 0 -> Shard 0
user_id 1002 -> hash -> 2891 % 4 = 3 -> Shard 3
user_id 1003 -> hash -> 5614 % 4 = 2 -> Shard 2

Pros: Even distribution across shards. Cons: Adding or removing shards remaps most keys. Range queries span all shards. Use consistent hashing to reduce remapping (see 03-data-partitioning).

Instagram uses a variant of hash-based sharding for their PostgreSQL databases, where each logical shard is mapped to one of several thousand PostgreSQL databases. Their shard IDs are embedded directly into generated primary keys (alongside a timestamp and auto-increment sequence), so the application can determine which shard holds a record just by looking at its ID -- no lookup service required.

Range-Based Sharding

Assign contiguous ranges of the shard key to each shard.

Shard 0: user_id 1 - 1,000,000
Shard 1: user_id 1,000,001 - 2,000,000
Shard 2: user_id 2,000,001 - 3,000,000

Pros: Range queries on the shard key are efficient (hit one shard). Easy to understand. Cons: Uneven distribution if key ranges have different densities. New users all hit the latest shard, creating a hot spot.

Geographic Sharding

Shard by region or country.

Shard US: users in North America
Shard EU: users in Europe
Shard APAC: users in Asia-Pacific

Pros: Data residency compliance (GDPR keeps EU data in EU). Lower latency (shard is co-located with users). Cons: Users who travel across regions. Uneven shard sizes if most users are in one region.

Directory-Based Sharding

A lookup service maps each key to its shard.

Lookup table:
  user_id 1001 -> Shard 2
  user_id 1002 -> Shard 0
  user_id 1003 -> Shard 1

Pros: Maximum flexibility — can rebalance by updating the directory. Cons: The directory is a single point of failure and a performance bottleneck. Must be cached aggressively.

Pinterest adopted a directory-based approach when migrating from a single MySQL instance to a sharded architecture. They maintain a configuration that maps each logical shard to a physical database host, allowing them to move shards between hosts for rebalancing without changing application logic. This migration was one of the most well-documented examples of scaling a consumer application from a monolithic database to a sharded fleet, and the directory approach let them do it incrementally rather than all at once.

Choosing a Shard Key

The shard key is the most important decision in a sharded system. Guidelines:

High cardinality: The key should have many possible values for even distribution.
Even distribution: Avoid keys that cluster (e.g., date-based keys concentrate writes on today's shard).
Query alignment: The most common query should include the shard key so it hits a single shard.

Real-world example: Instagram shards by a composite ID that encodes the timestamp, shard ID, and auto-increment sequence. This ensures even distribution and time-ordered IDs.

Connection Pooling

Every database connection consumes memory (typically 5-10 MB per connection in PostgreSQL). When you have hundreds of application servers, each opening multiple connections, the database runs out of memory before it runs out of CPU.

How Connection Pools Work

Without pooling:
  100 app servers x 20 connections = 2,000 connections to DB

With pooling (PgBouncer):
  100 app servers x 20 connections -> PgBouncer -> 200 actual DB connections

A connection pool maintains a set of open connections and multiplexes application requests onto them. When a request finishes, the connection returns to the pool.

Pool Types

Application-level pool: Built into the ORM or driver (HikariCP for Java, SQLAlchemy pool for Python). Each app instance has its own pool.
External proxy pool: A separate process (PgBouncer, ProxySQL) sits between app servers and the database. Consolidates connections from all app instances.

Pool Sizing

Too few connections: requests queue waiting for a free connection, adding latency. Too many connections: database is overwhelmed with context switching.

A good starting point for PostgreSQL:

optimal_connections = (2 * cpu_cores) + effective_spindle_count

For a 4-core SSD machine: (2 * 4) + 1 = 9 connections

This is per-database, not per-application-instance.

Real-World: PgBouncer at Heroku

Heroku runs PgBouncer in front of every PostgreSQL database. Without it, thousands of dynos would each open direct connections, quickly exhausting the database's connection limit. PgBouncer reduced actual database connections by 10-20x.

Query Optimization

Before adding hardware, optimize the queries hitting the database.

Index the Right Columns

Check slow query logs for full table scans. Add indexes on columns used in WHERE, JOIN, and ORDER BY clauses.

-- Slow: full table scan
SELECT * FROM orders WHERE customer_id = 42;

-- Fast: index lookup
CREATE INDEX idx_orders_customer ON orders(customer_id);

Use EXPLAIN

Every relational database has an EXPLAIN command that shows the query execution plan.

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Look for:
- Seq Scan (sequential scan) -> needs an index
- Nested Loop on large tables -> consider a hash join
- High actual rows vs estimated rows -> statistics are stale, run ANALYZE

Avoid N+1 Queries

A common ORM anti-pattern: fetch a list of records, then loop and fetch related records one by one.

Bad (N+1):
  SELECT * FROM orders;                      -- 1 query
  SELECT * FROM items WHERE order_id = 1;    -- N queries
  SELECT * FROM items WHERE order_id = 2;
  ...

Good (JOIN or batch):
  SELECT o.*, i.* FROM orders o
  JOIN items i ON o.id = i.order_id;         -- 1 query

Denormalization

Normalization reduces redundancy but requires joins. At high scale, joins become expensive. Denormalization stores precomputed or duplicated data to avoid joins.

Normalized:
  orders table + customers table + JOIN for every query

Denormalized:
  orders table includes customer_name and customer_email
  Faster reads, but you must update all copies when customer data changes

Caching Query Results

For expensive queries that don't change often (leaderboards, reports, aggregations), cache the result in Redis with a TTL.

1. Check Redis for key "leaderboard:daily"
2. Cache hit -> return result
3. Cache miss -> run expensive query -> store in Redis with 5-minute TTL -> return

Scaling Timeline

A practical progression for database scaling as load grows:

Stage 1: Single database, optimize queries & indexes
Stage 2: Add connection pooling (PgBouncer/ProxySQL)
Stage 3: Add caching layer (Redis) for hot queries
Stage 4: Add read replicas for read-heavy workloads
Stage 5: Vertical scaling (bigger instance)
Stage 6: Shard for write-heavy workloads
Stage 7: Move to purpose-built databases for specific workloads
         (Elasticsearch for search, Redis for sessions, Cassandra for time-series)

YouTube's Vitess project is a real-world embodiment of this scaling timeline. Originally built to scale YouTube's MySQL databases, Vitess adds a proxy layer that handles connection pooling, query routing, and transparent sharding on top of standard MySQL. It allows teams to start with a single MySQL instance and incrementally shard as load grows, without rewriting application queries. Vitess is now open source and used in production by Slack, GitHub, and others as a general-purpose MySQL scaling solution.

Common Pitfalls

Sharding too early. Sharding is operationally expensive (migrations, cross-shard joins, rebalancing). Exhaust read replicas, caching, and query optimization first.
Wrong shard key. A bad shard key creates hot spots or forces cross-shard queries. Choose a key that matches your primary access pattern and distributes evenly.
Ignoring connection limits. A database with a 200-connection limit serving 50 app servers at 10 connections each is already over capacity. Use a pooler.
No read-your-writes consistency. After a write, routing the same user to a lagging replica causes confusion. Implement read-your-writes for interactive workflows.
Over-denormalizing. Denormalization speeds reads but creates update anomalies. Denormalize strategically for hot paths, not everywhere.
Missing indexes on foreign keys. Joins and cascading deletes do full table scans without indexes on foreign key columns.

Key Takeaways

Read replicas are the first scaling lever for read-heavy workloads. Understand replication lag trade-offs.
Sharding is necessary for write scaling but adds significant complexity. Choose the shard key carefully.
Connection pooling is often the cheapest way to extend a database's capacity.
Query optimization (indexes, EXPLAIN, avoiding N+1) should come before any infrastructure scaling.
Follow the scaling timeline: optimize, pool, cache, replicate, then shard.
Use the right database for the right workload. Polyglot persistence is standard at scale.