Data Partitioning

Data partitioning splits a dataset across multiple nodes so that no single node holds everything. It is the primary mechanism for scaling write-heavy workloads and managing datasets that exceed a single machine's storage or memory. Done well, partitioning is invisible to users. Done poorly, it creates hot spots, cross-partition bottlenecks, and operational headaches.

Why Partition

Write scalability. A single database node has a finite write throughput. Partitioning distributes writes across N nodes.
Storage capacity. When data exceeds one machine's disk, partitioning spreads it across many.
Query performance. Queries that target a single partition read less data than queries scanning the full dataset.
Isolation. A problem on one partition (slow query, hardware failure) doesn't affect others.

Partition Strategies

Hash Partitioning

Apply a hash function to the partition key and assign the result to a partition.

partition = hash(key) % num_partitions

key "order:5001" -> hash -> 82734 % 8 = 6 -> Partition 6
key "order:5002" -> hash -> 11209 % 8 = 1 -> Partition 1

Pros: Even distribution regardless of key patterns. Cons: Range queries require scatter-gather across all partitions. Adding or removing partitions remaps most keys (unless using consistent hashing).

Range Partitioning

Assign contiguous ranges of the partition key to each partition.

Partition 0: keys "a" - "f"
Partition 1: keys "g" - "m"
Partition 2: keys "n" - "s"
Partition 3: keys "t" - "z"

Pros: Range queries on the partition key hit only the relevant partitions. Data is naturally ordered within each partition. Cons: Uneven distribution if the key space is skewed. Sequential keys (timestamps, auto-increment IDs) concentrate writes on one partition.

List Partitioning

Assign specific key values to specific partitions.

Partition US: country IN ('US', 'CA', 'MX')
Partition EU: country IN ('DE', 'FR', 'GB', 'IT', 'ES')
Partition APAC: country IN ('JP', 'KR', 'AU', 'IN')

Pros: Full control over placement. Good for data residency requirements. Cons: Manual management. Uneven partition sizes.

Composite Partitioning

Combine strategies. A common pattern is range-then-hash: first range-partition by date, then hash-partition within each date range.

Partition by month (range), then by user_id (hash) within each month.

January + hash(user_id) % 4 -> one of 4 January sub-partitions
February + hash(user_id) % 4 -> one of 4 February sub-partitions

This supports time-range queries (which month) while distributing load within each range (hash).

Consistent Hashing

Standard hash partitioning (hash mod N) has a major flaw: changing N (adding or removing nodes) remaps nearly all keys. Consistent hashing fixes this.

How It Works

Map the hash output space to a ring (0 to 2^32 - 1).
Place each node at one or more points on the ring (using virtual nodes).
To find the partition for a key, hash the key and walk clockwise to the first node.

Ring: 0 -------- Node A -------- Node B -------- Node C -------- 0

Key X hashes to a point between A and B -> assigned to Node B
Key Y hashes to a point between B and C -> assigned to Node C

Adding a Node

When Node D is added between A and B, only the keys between A and D are remapped (from B to D). All other keys stay put.

Before: 0 -- A ---------- B -------- C -- 0
After:  0 -- A ---- D ---- B -------- C -- 0

Only keys in range (A, D] move from B to D

Virtual Nodes

With only one point per node on the ring, distribution can be uneven. Virtual nodes place each physical node at many points (e.g., 150 virtual nodes per physical node), creating a more uniform distribution.

Real-World: Amazon DynamoDB

DynamoDB uses consistent hashing to distribute partition keys across storage nodes. When capacity is added, only adjacent partitions are split. This allows DynamoDB to scale storage and throughput without redistributing the entire dataset.

Real-World: Apache Cassandra

Cassandra uses consistent hashing with virtual nodes (vnodes). Each physical node owns many token ranges on the ring. When a new node joins, it takes over a portion of each existing node's token ranges, distributing the rebalancing work evenly.

Hot Spots

A hot spot is a partition that receives disproportionately more traffic than others.

Causes

Celebrity problem: One key (a popular user, a viral post) gets millions of reads or writes.
Temporal skew: Time-based partition keys concentrate all current writes on the "now" partition.
Poor key choice: Low-cardinality keys (country, status) create few, large partitions.

Mitigations

Key salting: Append a random suffix to hot keys to spread them across partitions.

Without salting:
  key "celeb:12345" -> always hits Partition 3

With salting:
  key "celeb:12345:0" -> Partition 3
  key "celeb:12345:1" -> Partition 7
  key "celeb:12345:2" -> Partition 1
  (Read all 3, merge results)

Write sharding with read fan-out: Split a hot counter across N sub-counters on different partitions. To read, sum all N. To write, pick one at random.

Dedicated partition: Move extremely hot data to its own high-capacity node.

Caching: Put a cache in front of the hot partition. Reads hit the cache; only writes reach the partition.

Rebalancing

As data grows or traffic patterns change, partitions become uneven. Rebalancing redistributes data to restore balance.

Fixed Number of Partitions

Create many more partitions than nodes at the start (e.g., 1000 partitions across 10 nodes = 100 partitions per node). When a node is added, it takes partitions from existing nodes. No data needs to be re-partitioned, only moved.

Before (10 nodes, 1000 partitions):
  Node 1: partitions 0-99
  Node 2: partitions 100-199
  ...

After adding Node 11:
  Each existing node gives ~9 partitions to Node 11
  Node 11: ~90 partitions spread across the key space

Used by: Elasticsearch, Riak, Couchbase, early Kafka.

Dynamic Partitioning

Start with one partition. When it exceeds a size threshold, split it into two. When it shrinks, merge with a neighbor.

Used by: HBase, DynamoDB (auto-split).

Pros: Adapts to data volume automatically. Cons: A brand-new table starts with one partition (bottleneck). HBase supports pre-splitting to mitigate this.

Partition-Proportional

The number of partitions is proportional to the number of nodes. Each node holds a fixed number of partitions. When a node joins, it splits some existing partitions and takes half.

Used by: Cassandra with vnodes.

Partition-Tolerant Design

Design your application to work correctly even when partitioning introduces complexity.

Single-Partition Queries

Structure data so the most common queries hit a single partition. This means choosing a partition key that matches the primary access pattern.

Chat application:
  Partition key: channel_id
  All messages in a channel are on the same partition
  "Get recent messages for channel X" -> single partition query

Cross-Partition Queries

When a query must span partitions, use scatter-gather: send the query to all relevant partitions in parallel, then merge results.

"Find all orders across all users in the last hour"
  -> Fan out to all partitions
  -> Each partition returns its results
  -> Coordinator merges and returns

Cross-partition queries are expensive. If they are frequent, consider maintaining a secondary index or materialized view that is partitioned differently.

Secondary Indexes on Partitioned Data

Two approaches:

Local index (document-partitioned): Each partition maintains its own index covering only its data. Reads require scatter-gather across all partitions.
Global index (term-partitioned): The index itself is partitioned by the indexed term. Reads hit a single index partition, but writes must update the index partition (which may be on a different node).

Common Pitfalls

Choosing a low-cardinality partition key. A key with only 10 possible values gives you at most 10 partitions. Choose keys with millions of unique values.
Ignoring access patterns. A partition key that distributes data evenly but forces every query to scatter-gather is worse than a slightly uneven key that supports single-partition queries.
No plan for hot spots. Even a good partition key can have outliers. Monitor partition-level metrics and have a mitigation strategy (salting, caching, dedicated partitions).
Rebalancing during peak traffic. Moving data between nodes consumes network and disk I/O. Schedule rebalancing during low-traffic windows or use throttled background processes.
Assuming partitions are equal. Data and traffic are almost never perfectly even. Monitor per-partition size and latency.
Cross-partition transactions. Distributed transactions across partitions (two-phase commit) are slow and fragile. Design to avoid them when possible.

Key Takeaways

Hash partitioning gives even distribution; range partitioning supports range queries. Choose based on your access pattern.
Consistent hashing minimizes data movement when nodes are added or removed. Virtual nodes improve distribution.
Hot spots are inevitable. Plan for them with salting, caching, or dedicated capacity.
Design queries to hit a single partition whenever possible. Cross-partition queries are expensive.
Rebalancing strategy matters as much as the initial partitioning scheme. Prefer fixed-partition-count approaches for operational simplicity.
The partition key is the most important design decision. It determines query efficiency, load distribution, and scalability ceiling.