Database Scaling
The database is the most common bottleneck in a growing system. Application servers are easy to scale horizontally — they are stateless. Databases hold state, which makes scaling them fundamentally harder. This file covers the techniques for scaling databases from a single instance to a globally distributed fleet.
Read Replicas
The simplest database scaling technique: replicate data from a primary (read-write) node to one or more secondary (read-only) nodes.
How It Works
Writes -> Primary DB -> Replication stream -> Replica 1 (reads)
-> Replica 2 (reads)
-> Replica 3 (reads)
The primary handles all writes. Replicas receive a continuous stream of changes and serve read queries. Since most applications are read-heavy (90%+ reads), replicas absorb the majority of query load. Facebook, for example, runs MySQL with a single primary and dozens of read replicas per cluster, routing the vast majority of their social graph reads to replicas while all writes go through the primary.
Replication Modes
- Asynchronous: The primary doesn't wait for replicas to confirm. Fastest writes but replicas may lag behind (replication lag).
- Synchronous: The primary waits for at least one replica to confirm before acknowledging the write. Stronger consistency but higher write latency.
- Semi-synchronous: The primary waits for one replica (not all) to confirm. A practical middle ground used by MySQL's semi-sync replication.
Handling Replication Lag
Asynchronous replicas can be seconds behind the primary. This causes stale reads:
1. User updates their profile (write to primary)
2. User refreshes the page (read from replica)
3. Replica hasn't received the update yet -> user sees old data
Solutions:
- Read-your-writes consistency: After a write, route that user's reads to the primary for a short window.
- Causal consistency: Track the write's position in the replication log. Read from a replica only after it has caught up to that position.
- Accept the lag: For non-critical reads (dashboards, feeds), a few seconds of staleness is often fine.
When Read Replicas Aren't Enough
Read replicas don't help with write-heavy workloads. If the primary is the bottleneck because of write volume, you need sharding.
Sharding Strategies
Sharding splits data across multiple independent database instances (shards). Each shard holds a subset of the data.
Hash-Based Sharding
Apply a hash function to the shard key, then mod by the number of shards.
shard_id = hash(user_id) % num_shards
user_id 1001 -> hash -> 7432 % 4 = 0 -> Shard 0
user_id 1002 -> hash -> 2891 % 4 = 3 -> Shard 3
user_id 1003 -> hash -> 5614 % 4 = 2 -> Shard 2
Pros: Even distribution across shards. Cons: Adding or removing shards remaps most keys. Range queries span all shards. Use consistent hashing to reduce remapping (see 03-data-partitioning).
Instagram uses a variant of hash-based sharding for their PostgreSQL databases, where each logical shard is mapped to one of several thousand PostgreSQL databases. Their shard IDs are embedded directly into generated primary keys (alongside a timestamp and auto-increment sequence), so the application can determine which shard holds a record just by looking at its ID -- no lookup service required.
Range-Based Sharding
Assign contiguous ranges of the shard key to each shard.
Shard 0: user_id 1 - 1,000,000
Shard 1: user_id 1,000,001 - 2,000,000
Shard 2: user_id 2,000,001 - 3,000,000
Pros: Range queries on the shard key are efficient (hit one shard). Easy to understand. Cons: Uneven distribution if key ranges have different densities. New users all hit the latest shard, creating a hot spot.
Geographic Sharding
Shard by region or country.
Shard US: users in North America
Shard EU: users in Europe
Shard APAC: users in Asia-Pacific
Pros: Data residency compliance (GDPR keeps EU data in EU). Lower latency (shard is co-located with users). Cons: Users who travel across regions. Uneven shard sizes if most users are in one region.
Directory-Based Sharding
A lookup service maps each key to its shard.
Lookup table:
user_id 1001 -> Shard 2
user_id 1002 -> Shard 0
user_id 1003 -> Shard 1
Pros: Maximum flexibility — can rebalance by updating the directory. Cons: The directory is a single point of failure and a performance bottleneck. Must be cached aggressively.
Pinterest adopted a directory-based approach when migrating from a single MySQL instance to a sharded architecture. They maintain a configuration that maps each logical shard to a physical database host, allowing them to move shards between hosts for rebalancing without changing application logic. This migration was one of the most well-documented examples of scaling a consumer application from a monolithic database to a sharded fleet, and the directory approach let them do it incrementally rather than all at once.
Choosing a Shard Key
The shard key is the most important decision in a sharded system. Guidelines:
- High cardinality: The key should have many possible values for even distribution.
- Even distribution: Avoid keys that cluster (e.g., date-based keys concentrate writes on today's shard).
- Query alignment: The most common query should include the shard key so it hits a single shard.
Real-world example: Instagram shards by a composite ID that encodes the timestamp, shard ID, and auto-increment sequence. This ensures even distribution and time-ordered IDs.
Connection Pooling
Every database connection consumes memory (typically 5-10 MB per connection in PostgreSQL). When you have hundreds of application servers, each opening multiple connections, the database runs out of memory before it runs out of CPU.
How Connection Pools Work
Without pooling:
100 app servers x 20 connections = 2,000 connections to DB
With pooling (PgBouncer):
100 app servers x 20 connections -> PgBouncer -> 200 actual DB connections
A connection pool maintains a set of open connections and multiplexes application requests onto them. When a request finishes, the connection returns to the pool.
Pool Types
- Application-level pool: Built into the ORM or driver (HikariCP for Java, SQLAlchemy pool for Python). Each app instance has its own pool.
- External proxy pool: A separate process (PgBouncer, ProxySQL) sits between app servers and the database. Consolidates connections from all app instances.
Pool Sizing
Too few connections: requests queue waiting for a free connection, adding latency. Too many connections: database is overwhelmed with context switching.
A good starting point for PostgreSQL:
optimal_connections = (2 * cpu_cores) + effective_spindle_count
For a 4-core SSD machine: (2 * 4) + 1 = 9 connections
This is per-database, not per-application-instance.
Real-World: PgBouncer at Heroku
Heroku runs PgBouncer in front of every PostgreSQL database. Without it, thousands of dynos would each open direct connections, quickly exhausting the database's connection limit. PgBouncer reduced actual database connections by 10-20x.
Query Optimization
Before adding hardware, optimize the queries hitting the database.
Index the Right Columns
Check slow query logs for full table scans. Add indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
-- Slow: full table scan
SELECT * FROM orders WHERE customer_id = 42;
-- Fast: index lookup
CREATE INDEX idx_orders_customer ON orders(customer_id);
Use EXPLAIN
Every relational database has an EXPLAIN command that shows the query execution plan.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
Look for:
- Seq Scan (sequential scan) -> needs an index
- Nested Loop on large tables -> consider a hash join
- High actual rows vs estimated rows -> statistics are stale, run ANALYZE
Avoid N+1 Queries
A common ORM anti-pattern: fetch a list of records, then loop and fetch related records one by one.
Bad (N+1):
SELECT * FROM orders; -- 1 query
SELECT * FROM items WHERE order_id = 1; -- N queries
SELECT * FROM items WHERE order_id = 2;
...
Good (JOIN or batch):
SELECT o.*, i.* FROM orders o
JOIN items i ON o.id = i.order_id; -- 1 query
Denormalization
Normalization reduces redundancy but requires joins. At high scale, joins become expensive. Denormalization stores precomputed or duplicated data to avoid joins.
Normalized:
orders table + customers table + JOIN for every query
Denormalized:
orders table includes customer_name and customer_email
Faster reads, but you must update all copies when customer data changes
Caching Query Results
For expensive queries that don't change often (leaderboards, reports, aggregations), cache the result in Redis with a TTL.
1. Check Redis for key "leaderboard:daily"
2. Cache hit -> return result
3. Cache miss -> run expensive query -> store in Redis with 5-minute TTL -> return
Scaling Timeline
A practical progression for database scaling as load grows:
Stage 1: Single database, optimize queries & indexes
Stage 2: Add connection pooling (PgBouncer/ProxySQL)
Stage 3: Add caching layer (Redis) for hot queries
Stage 4: Add read replicas for read-heavy workloads
Stage 5: Vertical scaling (bigger instance)
Stage 6: Shard for write-heavy workloads
Stage 7: Move to purpose-built databases for specific workloads
(Elasticsearch for search, Redis for sessions, Cassandra for time-series)
YouTube's Vitess project is a real-world embodiment of this scaling timeline. Originally built to scale YouTube's MySQL databases, Vitess adds a proxy layer that handles connection pooling, query routing, and transparent sharding on top of standard MySQL. It allows teams to start with a single MySQL instance and incrementally shard as load grows, without rewriting application queries. Vitess is now open source and used in production by Slack, GitHub, and others as a general-purpose MySQL scaling solution.
Common Pitfalls
- Sharding too early. Sharding is operationally expensive (migrations, cross-shard joins, rebalancing). Exhaust read replicas, caching, and query optimization first.
- Wrong shard key. A bad shard key creates hot spots or forces cross-shard queries. Choose a key that matches your primary access pattern and distributes evenly.
- Ignoring connection limits. A database with a 200-connection limit serving 50 app servers at 10 connections each is already over capacity. Use a pooler.
- No read-your-writes consistency. After a write, routing the same user to a lagging replica causes confusion. Implement read-your-writes for interactive workflows.
- Over-denormalizing. Denormalization speeds reads but creates update anomalies. Denormalize strategically for hot paths, not everywhere.
- Missing indexes on foreign keys. Joins and cascading deletes do full table scans without indexes on foreign key columns.
Key Takeaways
- Read replicas are the first scaling lever for read-heavy workloads. Understand replication lag trade-offs.
- Sharding is necessary for write scaling but adds significant complexity. Choose the shard key carefully.
- Connection pooling is often the cheapest way to extend a database's capacity.
- Query optimization (indexes, EXPLAIN, avoiding N+1) should come before any infrastructure scaling.
- Follow the scaling timeline: optimize, pool, cache, replicate, then shard.
- Use the right database for the right workload. Polyglot persistence is standard at scale.