Consistency Models
Overview
Consistency models define what values a read can return in a replicated system. Stronger models are easier to reason about but more expensive to implement.
Strongest Weakest
│ │
▼ ▼
Linearizability → Sequential → Causal → Eventual
│ Consistency Consistency Consistency
│ │
└── "appears as a single copy" └── "replicas may diverge temporarily"
Linearizability (Strong Consistency)
The gold standard. Every operation appears to take effect atomically at some point between its invocation and completion. The system behaves as if there is a single copy of the data.
Real time ──────────────────────────────────────>
Client A: |── write(x=1) ──|
Client B: |── read(x) ──| must return 1
Client C: |── read(x) ──| must return 1
The write "takes effect" at some point within its duration.
All subsequent reads (in real time) must see it.
Key properties:
- Respects real-time ordering
- Once a read returns a value, all subsequent reads must return that value or a later one
- Composable: linearizable operations on different objects are each linearizable
// Linearizable read in a Raft-based system:
// Option 1: Read through the leader (readIndex)
ASYNC PROCEDURE LINEARIZABLE_READ(key) → value
// Leader confirms it's still leader by contacting a quorum
read_index ← AWAIT leader.GET_COMMIT_INDEX()
AWAIT leader.WAIT_FOR_APPLY(read_index)
RETURN state_machine.GET(key)
// Option 2: Read lease (if bounded clock skew is assumed)
ASYNC PROCEDURE LEASE_BASED_READ(key) → value
// Leader holds a time-based lease; reads are local during lease
ASSERT(leader.LEASE_VALID())
RETURN state_machine.GET(key)
Cost: Requires coordination (consensus or quorum reads). Cannot be available during network partitions (CAP theorem).
Sequential Consistency
All operations appear to execute in some sequential order, and each process's operations appear in program order. Does NOT need to respect real-time ordering.
Actual execution:
P1: write(x=1) write(x=3)
P2: write(x=2) read(x)→?
Linearizable: read must return 3 (real-time order)
Sequentially consistent: read could return 2
Valid sequential order: write(x=1), write(x=3), write(x=2), read(x)→2
(P1's operations maintain their relative order,
P2's operations maintain their relative order,
but interleaving can differ from real-time)
Used by: some multiprocessor memory models. Rarely used in distributed storage because it's still expensive but gives weaker guarantees than linearizability.
Causal Consistency
Operations that are causally related must be seen by all processes in the same order. Concurrent operations may be seen in different orders by different processes.
P1: write(x=1)
P2: read(x)→1, write(y=2) ← y=2 causally depends on x=1
All processes must see x=1 before y=2.
But if P3 independently writes z=3, different processes
may order z=3 differently relative to x=1 and y=2.
Causal violation (forbidden):
P1: write(x, "hello")
P2: read(x)→"hello", write(y, "world")
P3: read(y)→"world", read(x)→null ← violates causality!
Implementation: Vector clocks or dependency tracking. Each operation carries metadata about its causal predecessors.
Used by: MongoDB (causal sessions), COPS (Cluster of Order-Preserving Servers).
Eventual Consistency
If no new updates are made, all replicas will eventually converge to the same value. No ordering guarantees during convergence.
Write(x=1) at t=0
Replica A: x=1 x=1 x=1 x=1
Replica B: x=? x=? x=1 x=1 ← eventually propagates
Replica C: x=? x=? x=? x=1
Time ──────────────────────────>
↑
convergence
Problems in practice:
- Reads may return stale data
- Concurrent writes create conflicts (need resolution strategy)
- No upper bound on convergence time (in theory)
Used by: DNS, Amazon DynamoDB (default), Cassandra (at low consistency levels).
Strong Eventual Consistency (SEC)
Eventual consistency plus a guarantee: replicas that have received the same set of updates are in the same state, regardless of order.
Eventual: Same updates, same order → same state
Strong Eventual: Same updates, ANY order → same state
Requires commutative, associative, idempotent merge (CRDTs).
Session Guarantees
Practical guarantees scoped to a single client session. Can be layered on top of eventual consistency.
Read-Your-Writes
A client always sees its own writes.
Client C:
write(x=5) to Replica A
read(x) from Replica B → must return 5 (or later)
Implementation: track write timestamp,
route reads to replicas at least as up-to-date
Monotonic Reads
Once a client reads a value, subsequent reads never return older values.
Client C:
read(x) → v2
read(x) → must return v2 or later (never v1)
Implementation: track highest read version,
ensure subsequent reads go to sufficiently up-to-date replicas
Monotonic Writes
A client's writes are applied in order at each replica.
Client C:
write(x=1)
write(x=2)
Every replica applies write(x=1) before write(x=2)
Writes-Follow-Reads
If a client reads a value and then writes, the write is ordered after the read.
Client C:
read(x) → v3 (sees version 3)
write(y=10) (must be ordered after x=v3 at all replicas)
Tunable Consistency
Systems like Cassandra and DynamoDB allow per-operation consistency level selection.
Cassandra consistency levels:
ONE: Write/read to 1 replica (fastest, weakest)
QUORUM: Write/read to majority (balanced)
ALL: Write/read to all replicas (slowest, strongest)
LOCAL_QUORUM: Quorum within local datacenter
EACH_QUORUM: Quorum in each datacenter
Strong consistency: W + R > N
e.g., QUORUM writes + QUORUM reads with RF=3:
W=2, R=2, N=3 → 2+2=4 > 3 ✓
ENUMERATION ConsistencyLevel ← {ONE, QUORUM, ALL, LOCAL_QUORUM}
PROCEDURE REQUIRED_ACKS(level, replication_factor) → integer
IF level = ONE THEN RETURN 1
IF level = QUORUM THEN RETURN replication_factor / 2 + 1
IF level = ALL THEN RETURN replication_factor
IF level = LOCAL_QUORUM THEN RETURN replication_factor / 2 + 1
STRUCTURE WriteOperation
key: string
value: bytes
consistency: ConsistencyLevel
ASYNC PROCEDURE WRITE(op, replicas) → success or error
required ← REQUIRED_ACKS(op.consistency, LENGTH(replicas))
acks ← 0
results ← AWAIT ALL IN PARALLEL(
FOR EACH r IN replicas: r.WRITE(op.key, op.value)
)
FOR EACH result IN results DO
IF result IS success THEN
acks ← acks + 1
IF acks ≥ required THEN
RETURN success
RETURN error(InsufficientAcks)
Consistency Model Hierarchy
Linearizability
│
Sequential Consistency
│
Causal Consistency
/ │ \
Causal+ PRAM Writes-follow-reads
\ │ /
\ │ /
Read-your-writes Monotonic-reads
│
Eventual Consistency
│
(no guarantees)
↑ Stronger: easier to program, higher latency, lower availability
↓ Weaker: harder to program, lower latency, higher availability
Jepsen Testing
Kyle Kingsbury's Jepsen framework tests real databases for consistency violations under network partitions and failures.
Common findings from Jepsen tests:
- MongoDB: stale reads under default configuration
- Cassandra: lightweight transactions can lose data
- Redis Sentinel: data loss during failover
- Elasticsearch: split-brain leading to data loss
- CockroachDB: generally passes (linearizable)
- etcd: generally passes (linearizable via Raft)
Choosing a Consistency Model
| Use Case | Recommended Model | Why | |---|---|---| | Bank transfers | Linearizable | Must not lose or duplicate money | | User profile | Read-your-writes | User expects to see their changes | | Social media feed | Causal | Comments should follow posts | | DNS | Eventual | Staleness is tolerable | | Shopping cart | Strong eventual (CRDTs) | Availability over consistency | | Distributed lock | Linearizable | Mutual exclusion requires it | | Analytics/metrics | Eventual | Approximate counts are fine |
Real-World Systems
| System | Default Consistency | Strongest Available | |---|---|---| | Spanner | External consistency (linearizable+) | External consistency | | CockroachDB | Serializable | Serializable | | DynamoDB | Eventual | Strong (per-read) | | Cassandra | ONE (eventual) | ALL (linearizable for single key) | | MongoDB | Causal (sessions) | Linearizable (readConcern) | | Redis | Eventual (async replication) | N/A (not truly linearizable) |
Key Takeaways
- Linearizability is the intuitive "correct" behavior but comes with latency and availability costs.
- Causal consistency is the strongest model achievable with full availability during partitions.
- Session guarantees (read-your-writes, monotonic reads) are practical middle grounds that cover most user-facing needs.
- Tunable consistency lets you choose per-operation, but understanding the implications requires careful thought about W, R, and N.
- Always test your consistency claims. Jepsen has shown that many databases don't provide the guarantees they advertise.