Consistency Models

Overview

Consistency models define what values a read can return in a replicated system. Stronger models are easier to reason about but more expensive to implement.

Strongest                                              Weakest
   │                                                      │
   ▼                                                      ▼
Linearizability → Sequential → Causal → Eventual
   │               Consistency   Consistency  Consistency
   │                                              │
   └── "appears as a single copy"                 └── "replicas may diverge temporarily"

Linearizability (Strong Consistency)

The gold standard. Every operation appears to take effect atomically at some point between its invocation and completion. The system behaves as if there is a single copy of the data.

Real time ──────────────────────────────────────>

Client A:  |── write(x=1) ──|
Client B:           |── read(x) ──|  must return 1
Client C:                     |── read(x) ──|  must return 1

The write "takes effect" at some point within its duration.
All subsequent reads (in real time) must see it.

Key properties:

Respects real-time ordering
Once a read returns a value, all subsequent reads must return that value or a later one
Composable: linearizable operations on different objects are each linearizable

// Linearizable read in a Raft-based system:
// Option 1: Read through the leader (readIndex)
ASYNC PROCEDURE LINEARIZABLE_READ(key) → value
    // Leader confirms it's still leader by contacting a quorum
    read_index ← AWAIT leader.GET_COMMIT_INDEX()
    AWAIT leader.WAIT_FOR_APPLY(read_index)
    RETURN state_machine.GET(key)

// Option 2: Read lease (if bounded clock skew is assumed)
ASYNC PROCEDURE LEASE_BASED_READ(key) → value
    // Leader holds a time-based lease; reads are local during lease
    ASSERT(leader.LEASE_VALID())
    RETURN state_machine.GET(key)

Cost: Requires coordination (consensus or quorum reads). Cannot be available during network partitions (CAP theorem).

Sequential Consistency

All operations appear to execute in some sequential order, and each process's operations appear in program order. Does NOT need to respect real-time ordering.

Actual execution:
  P1: write(x=1)          write(x=3)
  P2:        write(x=2)          read(x)→?

Linearizable: read must return 3 (real-time order)

Sequentially consistent: read could return 2
  Valid sequential order: write(x=1), write(x=3), write(x=2), read(x)→2
  (P1's operations maintain their relative order,
   P2's operations maintain their relative order,
   but interleaving can differ from real-time)

Used by: some multiprocessor memory models. Rarely used in distributed storage because it's still expensive but gives weaker guarantees than linearizability.

Causal Consistency

Operations that are causally related must be seen by all processes in the same order. Concurrent operations may be seen in different orders by different processes.

P1: write(x=1)
P2: read(x)→1, write(y=2)    ← y=2 causally depends on x=1

All processes must see x=1 before y=2.

But if P3 independently writes z=3, different processes
may order z=3 differently relative to x=1 and y=2.

Causal violation (forbidden):
  P1: write(x, "hello")
  P2: read(x)→"hello", write(y, "world")
  P3: read(y)→"world", read(x)→null    ← violates causality!

Implementation: Vector clocks or dependency tracking. Each operation carries metadata about its causal predecessors.

Used by: MongoDB (causal sessions), COPS (Cluster of Order-Preserving Servers).

Eventual Consistency

If no new updates are made, all replicas will eventually converge to the same value. No ordering guarantees during convergence.

Write(x=1) at t=0

  Replica A: x=1  x=1  x=1  x=1
  Replica B: x=?  x=?  x=1  x=1    ← eventually propagates
  Replica C: x=?  x=?  x=?  x=1

  Time ──────────────────────────>
                              ↑
                         convergence

Problems in practice:

Reads may return stale data
Concurrent writes create conflicts (need resolution strategy)
No upper bound on convergence time (in theory)

Used by: DNS, Amazon DynamoDB (default), Cassandra (at low consistency levels).

Strong Eventual Consistency (SEC)

Eventual consistency plus a guarantee: replicas that have received the same set of updates are in the same state, regardless of order.

Eventual:        Same updates, same order → same state
Strong Eventual: Same updates, ANY order  → same state

Requires commutative, associative, idempotent merge (CRDTs).

Session Guarantees

Practical guarantees scoped to a single client session. Can be layered on top of eventual consistency.

Read-Your-Writes

A client always sees its own writes.

Client C:
  write(x=5) to Replica A
  read(x) from Replica B → must return 5 (or later)

Implementation: track write timestamp,
  route reads to replicas at least as up-to-date

Monotonic Reads

Once a client reads a value, subsequent reads never return older values.

Client C:
  read(x) → v2
  read(x) → must return v2 or later (never v1)

Implementation: track highest read version,
  ensure subsequent reads go to sufficiently up-to-date replicas

Monotonic Writes

A client's writes are applied in order at each replica.

Client C:
  write(x=1)
  write(x=2)

  Every replica applies write(x=1) before write(x=2)

Writes-Follow-Reads

If a client reads a value and then writes, the write is ordered after the read.

Client C:
  read(x) → v3   (sees version 3)
  write(y=10)     (must be ordered after x=v3 at all replicas)

Tunable Consistency

Systems like Cassandra and DynamoDB allow per-operation consistency level selection.

Cassandra consistency levels:

  ONE:          Write/read to 1 replica       (fastest, weakest)
  QUORUM:       Write/read to majority        (balanced)
  ALL:          Write/read to all replicas    (slowest, strongest)
  LOCAL_QUORUM: Quorum within local datacenter
  EACH_QUORUM:  Quorum in each datacenter

  Strong consistency: W + R > N
    e.g., QUORUM writes + QUORUM reads with RF=3:
    W=2, R=2, N=3 → 2+2=4 > 3 ✓

ENUMERATION ConsistencyLevel ← {ONE, QUORUM, ALL, LOCAL_QUORUM}

PROCEDURE REQUIRED_ACKS(level, replication_factor) → integer
    IF level = ONE THEN RETURN 1
    IF level = QUORUM THEN RETURN replication_factor / 2 + 1
    IF level = ALL THEN RETURN replication_factor
    IF level = LOCAL_QUORUM THEN RETURN replication_factor / 2 + 1

STRUCTURE WriteOperation
    key: string
    value: bytes
    consistency: ConsistencyLevel

ASYNC PROCEDURE WRITE(op, replicas) → success or error
    required ← REQUIRED_ACKS(op.consistency, LENGTH(replicas))
    acks ← 0

    results ← AWAIT ALL IN PARALLEL(
        FOR EACH r IN replicas: r.WRITE(op.key, op.value)
    )

    FOR EACH result IN results DO
        IF result IS success THEN
            acks ← acks + 1
            IF acks ≥ required THEN
                RETURN success
    RETURN error(InsufficientAcks)

Consistency Model Hierarchy

                  Linearizability
                       │
              Sequential Consistency
                       │
               Causal Consistency
              /         │         \
   Causal+            PRAM        Writes-follow-reads
             \         │         /
              \        │        /
            Read-your-writes  Monotonic-reads
                       │
             Eventual Consistency
                       │
              (no guarantees)

  ↑ Stronger: easier to program, higher latency, lower availability
  ↓ Weaker:   harder to program, lower latency, higher availability

Jepsen Testing

Kyle Kingsbury's Jepsen framework tests real databases for consistency violations under network partitions and failures.

Common findings from Jepsen tests:
- MongoDB: stale reads under default configuration
- Cassandra: lightweight transactions can lose data
- Redis Sentinel: data loss during failover
- Elasticsearch: split-brain leading to data loss
- CockroachDB: generally passes (linearizable)
- etcd: generally passes (linearizable via Raft)

Choosing a Consistency Model

Use Case	Recommended Model	Why
Bank transfers	Linearizable	Must not lose or duplicate money
User profile	Read-your-writes	User expects to see their changes
Social media feed	Causal	Comments should follow posts
DNS	Eventual	Staleness is tolerable
Shopping cart	Strong eventual (CRDTs)	Availability over consistency
Distributed lock	Linearizable	Mutual exclusion requires it
Analytics/metrics	Eventual	Approximate counts are fine

Real-World Systems

System	Default Consistency	Strongest Available
Spanner	External consistency (linearizable+)	External consistency
CockroachDB	Serializable	Serializable
DynamoDB	Eventual	Strong (per-read)
Cassandra	ONE (eventual)	ALL (linearizable for single key)
MongoDB	Causal (sessions)	Linearizable (readConcern)
Redis	Eventual (async replication)	N/A (not truly linearizable)

Key Takeaways

Linearizability is the intuitive "correct" behavior but comes with latency and availability costs.
Causal consistency is the strongest model achievable with full availability during partitions.
Session guarantees (read-your-writes, monotonic reads) are practical middle grounds that cover most user-facing needs.
Tunable consistency lets you choose per-operation, but understanding the implications requires careful thought about W, R, and N.
Always test your consistency claims. Jepsen has shown that many databases don't provide the guarantees they advertise.