Consistency Models

Overview

Consistency models define the rules about when and how updates become visible to readers in a distributed system. Understanding these models is essential because every distributed system makes a consistency trade-off, whether the designers realize it or not.

Strong Consistency

Strong consistency guarantees that once a write has been acknowledged, every subsequent read returns that value or a newer one. The system behaves as if there is a single copy of the data.

How It Works

Client A writes: balance = 500
  -> Write acknowledged

Client B reads: balance
  -> Returns 500 (guaranteed)

There is no window where Client B could see the old value
after Client A's write has been acknowledged.

Implementation Approaches

Single-leader replication with synchronous followers: The leader waits for all replicas to confirm before acknowledging the write.
Consensus protocols (Raft, Paxos): A majority of nodes must agree on every write.
Two-phase commit (2PC): A coordinator ensures that all participants either commit or abort.
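
The first approach above, a leader with synchronous followers, can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real replication protocol: the class names `Follower` and `SyncLeader` and the `reachable` flag are assumptions made for the example, and a real system would also need timeouts, retries, and handling of partially applied writes.

```python
class Follower:
    """A replica that applies writes forwarded by the leader."""

    def __init__(self, name):
        self.name = name
        self.reachable = True  # hypothetical flag standing in for network health
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class SyncLeader:
    """Leader that acknowledges a write only if every follower can confirm it."""

    def __init__(self, followers):
        self.followers = followers
        self.data = {}

    def write(self, key, value):
        # Simplification: check reachability up front instead of waiting on acks.
        if not all(f.reachable for f in self.followers):
            return False  # refuse the write: availability is traded for consistency
        self.data[key] = value
        for f in self.followers:
            f.apply(key, value)
        return True  # acknowledged only after all replicas hold the value


followers = [Follower("f1"), Follower("f2")]
leader = SyncLeader(followers)

acked = leader.write("balance", 500)
values_after_ack = [f.data["balance"] for f in followers]

followers[1].reachable = False          # simulate one follower dropping out
blocked = leader.write("balance", 600)  # the write is refused, not half-applied
```

Once the first write is acknowledged, a read from any replica returns 500. The second write shows the cost: a single unreachable follower is enough to block writes.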

Real-World Usage

Google Spanner provides strong consistency globally using synchronized clocks (the TrueTime API), Paxos replication, and two-phase commit for transactions that span multiple shards. It is used for Google Ads and Google Play.

CockroachDB implements serializable isolation across distributed nodes using a combination of timestamp ordering and consensus.

Trade-Offs

Higher latency (must wait for coordination)
Lower availability during network partitions
Reduced throughput due to coordination overhead

Eventual Consistency

Eventual consistency guarantees that if no new updates are made, all replicas will eventually converge to the same value. There is no bound on how long "eventually" takes.

How It Works

Client A writes: balance = 500 (to replica 1)
  -> Write acknowledged immediately

Client B reads from replica 2: balance
  -> Might return 400 (old value)
  -> Will return 500 after replication catches up

The convergence window depends on:
  - Network latency between replicas
  - Replication lag
  - System load
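
The stale-read window above can be reproduced with a small in-memory model. This is only a sketch: `EventualStore` and its explicit `replicate()` step are hypothetical names, and the pending queue stands in for replication lag that a real system would experience as network and processing delay.

```python
from collections import deque


class EventualStore:
    """Two replicas; writes land on replica 1 and reach replica 2 later."""

    def __init__(self, initial):
        self.replicas = {1: dict(initial), 2: dict(initial)}
        self.pending = deque()  # writes not yet applied to replica 2

    def write(self, key, value):
        self.replicas[1][key] = value
        self.pending.append((key, value))
        return "ack"  # acknowledged before replica 2 has seen the write

    def read(self, replica_id, key):
        return self.replicas[replica_id][key]

    def replicate(self):
        # Drain the backlog in order; stands in for replication catching up.
        while self.pending:
            key, value = self.pending.popleft()
            self.replicas[2][key] = value


store = EventualStore({"balance": 400})
store.write("balance", 500)

stale = store.read(2, "balance")  # replica 2 has not caught up yet
store.replicate()
fresh = store.read(2, "balance")  # replicas have converged
```

Here `stale` is the old value 400 and `fresh` is 500, matching the trace above.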

Where It Works Well

Social media feeds (seeing a post a few seconds late is acceptable)
Product view counts and analytics
DNS propagation
Shopping cart contents (before checkout)

Where It Fails

Bank account balances (overdrafts from stale reads)
Inventory counts (overselling from stale stock levels)
Anything involving money or limited resources

Real-World Usage

Amazon DynamoDB defaults to eventually consistent reads (cheaper and faster) but offers strongly consistent reads at double the cost.

Cassandra is eventually consistent by default but allows tunable consistency per query.

Causal Consistency

Causal consistency guarantees that causally related operations are seen in the same order by all nodes. Operations with no causal relationship (concurrent operations) may be seen in different orders by different nodes.

How It Works

Causally related operations:
  Client A posts: "Anyone want to grab lunch?"     (message 1)
  Client B replies: "Sure, where?"                  (message 2)

  Every reader must see message 1 before message 2.
  They are causally related (reply depends on original).

Concurrent operations (no causal relation):
  Client C posts: "Meeting moved to 3pm"            (message 3)

  Message 3 has no causal relationship to messages 1 or 2.
  It may appear before, between, or after them.

Implementation

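Vector clocks: Each node maintains a vector of counters, one per node. Comparing two vectors shows whether one event happened before the other or whether they are concurrent.
Lamport timestamps: Simpler than vector clocks (a single counter). The ordering they produce is consistent with causality, but two timestamps alone cannot tell you whether the events are causally related or concurrent.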
Explicit dependency tracking: Application declares which operations depend on which.
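
As a rough illustration of the vector clock approach, the sketch below represents each clock as a dict from node name to counter and replays the lunch example from "How It Works". The function names are made up for this example, and details such as pruning old entries are left out.

```python
def vc_increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def vc_merge(a, b):
    """Element-wise maximum; used when a node receives another node's event."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def _less_or_equal(a, b):
    return all(a.get(n, 0) <= b.get(n, 0) for n in a.keys() | b.keys())


def happened_before(a, b):
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return _less_or_equal(a, b) and not _less_or_equal(b, a)


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return not _less_or_equal(a, b) and not _less_or_equal(b, a)


# Client A posts message 1.
msg1 = vc_increment({}, "A")
# Client B has seen message 1, so it merges that clock before replying.
msg2 = vc_increment(vc_merge({}, msg1), "B")
# Client C posts message 3 without having seen either message.
msg3 = vc_increment({}, "C")

reply_after_post = happened_before(msg1, msg2)
unrelated_to_post = concurrent(msg3, msg1)
unrelated_to_reply = concurrent(msg3, msg2)
```

All three results are True: the reply depends on the original post, while message 3 is concurrent with both and may be delivered in any position relative to them.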

Real-World Usage

MongoDB offers causal consistency sessions. Within a session, reads reflect prior writes from that session, even across replica set members, provided the operations use majority read and write concerns.

COPS (Clusters of Order-Preserving Servers), a research system from Princeton, Carnegie Mellon, and Intel Labs, demonstrated causal consistency across data centers.

Linearizability

Linearizability is the strongest single-object consistency model. Every operation appears to take effect atomically at some point between its invocation and completion. The system behaves as though there is a single copy of the data and all operations are serial.

How It Differs from Strong Consistency

Strong consistency: An informal term. In this guide it means that
  after a write is acknowledged, subsequent reads return the new value.

Linearizability: The formal model, stated in terms of real time.
  If operation A completes before operation B starts,
  B must see A's effects.

Key difference:
  "Strong consistency" is used loosely, often as a synonym for
  linearizability, so check what a given system actually promises.
  Linearizability precisely constrains the global ordering of all
  clients' operations against real time.

When You Need It

Leader election (exactly one leader at a time)
Distributed locks (mutual exclusion)
Unique constraint enforcement (no duplicate usernames)
Atomic compare-and-swap operations
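
The use cases above all reduce to "exactly one caller wins". The sketch below shows that property with a compare-and-swap register in a single Python process, where a lock makes each operation take effect atomically. It is a stand-in only: `AtomicRegister` and `campaign` are hypothetical names, and a distributed system needs a consensus protocol rather than a local lock to give the same guarantee.

```python
import threading


class AtomicRegister:
    """Single-process register whose operations take effect atomically."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # The check and the update happen as one indivisible step.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


leader = AtomicRegister()  # None means "no leader yet"
results = {}


def campaign(name):
    # Every candidate tries to move the register from None to its own name.
    results[name] = leader.compare_and_swap(None, name)


threads = [threading.Thread(target=campaign, args=(f"node-{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [name for name, won in results.items() if won]
```

Whichever thread runs first wins, and every later attempt fails because the register no longer holds `None`, so `winners` always has exactly one entry.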

Cost of Linearizability

Requires coordination on every operation
Cannot be provided while remaining fully available during network partitions (per the CAP theorem)
Significantly higher latency than weaker models
Most applications do not need it for most operations

Real-World Usage

ZooKeeper provides linearizable writes (all go through the leader) but only sequential consistency for reads by default. Clients can request a sync operation before a read to get linearizability.

etcd provides linearizable reads and writes, which is why it is used as the coordination backbone for Kubernetes.

CAP Theorem

The CAP theorem states that a distributed system can provide at most two of three guarantees simultaneously:

C - Consistency: Every read receives the most recent write
A - Availability: Every request receives a response (not an error)
P - Partition tolerance: The system continues operating despite
    network partitions between nodes

Since network partitions are inevitable in distributed systems,
the real choice is between C and A during a partition:

CP systems: During a partition, refuse requests that cannot
  guarantee consistency. The system is unavailable to some clients.
  Examples: HBase, MongoDB (with majority write concern), etcd

AP systems: During a partition, continue serving requests but
  some responses may be stale or conflicting.
  Examples: Cassandra, DynamoDB, CouchDB
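
The difference between the two choices can be shown with a toy replica that has lost contact with its peers. This is a sketch under simplifying assumptions: `PartitionedReplica`, its `mode` field, and the `Unavailable` exception are invented for the example, and real systems detect partitions through timeouts and quorum loss rather than a boolean flag.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale reply."""


class PartitionedReplica:
    def __init__(self, mode, value):
        self.mode = mode          # "CP" or "AP" (hypothetical setting)
        self.value = value        # last value this replica knows about
        self.partitioned = False  # True when cut off from the other replicas

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the latest value, so refuse.
            raise Unavailable("cannot guarantee consistency during partition")
        # AP mode (or no partition): answer with whatever is stored locally.
        return self.value


cp = PartitionedReplica("CP", 400)
ap = PartitionedReplica("AP", 400)

# A partition starts; meanwhile the other side accepts balance = 500,
# which neither of these replicas can see.
cp.partitioned = True
ap.partitioned = True

ap_result = ap.read()  # responds, but with the old value
try:
    cp_result = cp.read()
except Unavailable:
    cp_result = "unavailable"
```

The AP replica stays responsive and returns the stale 400; the CP replica gives up availability instead of returning a value it cannot vouch for.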

Common Misconceptions

Misconception 1: "You must pick two and give up one forever"
  Reality: The trade-off only matters during network partitions.
  In normal operation, you can have all three.

Misconception 2: "CAP means you can't have consistency and availability"
  Reality: You can't have both during a partition. Most of the
  time there are no partitions and the system provides both.

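Misconception 3: "My system is AP so it's always available"
  Reality: CAP availability only means that every non-failing node
  responds to requests. It says nothing about node crashes, overload,
  or slow responses, so an AP system can still have outages.
  Separately, a single-node system can be consistent and available
  because it has no partitions to tolerate.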

PACELC Theorem

PACELC extends CAP by addressing the trade-off that exists even when the system is running normally (no partition).

PACELC: If Partition, choose Availability or Consistency;
        Else, choose Latency or Consistency.

PAC portion: Same as CAP - during partitions, choose A or C.

ELC portion: During normal operation (no partition), there is
  still a trade-off between latency and consistency. Stronger
  consistency requires more coordination, which adds latency.

System classifications:
  PA/EL: Prioritize availability and latency (DynamoDB, Cassandra)
  PA/EC: Available during partitions, consistent otherwise (rare)
  PC/EL: Consistent during partitions, fast otherwise (rare)
  PC/EC: Always prioritize consistency (Google Spanner, VoltDB)

Why PACELC Matters More Than CAP

Network partitions are rare in well-managed data centers. The latency vs consistency trade-off (ELC) affects every single request, every day. This is the trade-off that most impacts user experience.

Example: DynamoDB
  - During partition: chooses availability (AP)
  - Normal operation: chooses latency (EL)
  - Classification: PA/EL
  - This is why DynamoDB defaults to eventually consistent reads:
    it prioritizes low latency in the common case.

Example: Google Spanner
  - During partition: chooses consistency (CP)
  - Normal operation: still chooses consistency (EC)
  - Classification: PC/EC
  - Writes go through Paxos consensus even in normal operation.
  - Latency is higher, but correctness is guaranteed.
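
The latency side of the ELC trade-off comes down to simple arithmetic: the more replicas a request must hear from, the slower the replica it ends up waiting for. The sketch below assumes requests are sent to all replicas in parallel; the function name and the round-trip times are made-up illustrative values, not measurements.

```python
def ack_latency_ms(replica_rtts_ms, acks_required):
    """Time until `acks_required` replicas have answered, given parallel requests."""
    if not 1 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 1 and the replica count")
    # The k-th fastest replica determines when the k-th ack arrives.
    return sorted(replica_rtts_ms)[acks_required - 1]


# Hypothetical round-trip times: same rack, same region, remote region.
rtts = [2, 15, 80]

one = ack_latency_ms(rtts, 1)         # wait for the fastest replica only
majority = ack_latency_ms(rtts, 2)    # wait for a majority of three
everything = ack_latency_ms(rtts, 3)  # wait for the slowest replica
```

With these numbers the request completes after 2 ms, 15 ms, or 80 ms depending on how many acknowledgements it waits for, even though no partition is involved.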

Tunable Consistency

Many modern databases let you choose the consistency level per operation rather than for the entire system.

Cassandra consistency levels:
  ONE:    Write/read to one replica. Fastest, least consistent.
  QUORUM: Write/read to majority. Good balance.
  ALL:    Write/read to every replica. Slowest, most consistent.
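
A common rule of thumb for combining these levels is that a read is guaranteed to overlap the latest acknowledged write when R + W > N, where N is the replication factor and R and W are the number of replicas contacted for reads and writes. The helper names below are hypothetical, and the check ignores failure handling and conflict resolution details.

```python
def quorum(replication_factor):
    """Smallest majority of `replication_factor` replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(n, w, r):
    """True when every set of r replicas shares a member with every set of w."""
    return r + w > n


n = 3
q = quorum(n)

one_one = read_overlaps_write(n, w=1, r=1)        # ONE / ONE
quorum_quorum = read_overlaps_write(n, w=q, r=q)  # QUORUM / QUORUM
all_one = read_overlaps_write(n, w=n, r=1)        # write ALL, read ONE
```

With three replicas, ONE/ONE gives no overlap guarantee, while QUORUM/QUORUM and ALL/ONE both do. Writing at ALL and reading at ONE moves the coordination cost from reads to writes.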

DynamoDB:
  Eventually consistent read: Cheaper, faster, may be stale.
  Strongly consistent read: 2x cost, reflects all writes acknowledged before the read.

Practical pattern:
  - Use eventual consistency for product listings (stale is OK)
  - Use strong consistency for inventory checks at checkout
  - Use strong consistency for payment processing
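
One way to apply this pattern is to keep the choice in a single table rather than scattering it through the code. The sketch below is an assumption about how an application might organize it: the operation names, `POLICY`, and the helper functions are all hypothetical. The only real API detail is `ConsistentRead`, the boolean parameter DynamoDB read operations accept to request a strongly consistent read.

```python
from enum import Enum


class Consistency(Enum):
    EVENTUAL = "eventual"
    STRONG = "strong"


# Hypothetical operation names mapped to the level each one needs.
POLICY = {
    "list_products": Consistency.EVENTUAL,
    "check_inventory_at_checkout": Consistency.STRONG,
    "process_payment": Consistency.STRONG,
}


def consistency_for(operation):
    # Design choice: unknown operations fall back to STRONG (slower but safe).
    return POLICY.get(operation, Consistency.STRONG)


def read_options(operation):
    """Keyword arguments to pass along with a DynamoDB-style read request."""
    return {"ConsistentRead": consistency_for(operation) is Consistency.STRONG}


listing_opts = read_options("list_products")
checkout_opts = read_options("check_inventory_at_checkout")
```

Defaulting to the stronger level means a forgotten entry costs latency rather than correctness.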

Real-World Consistency Choices

Amazon uses eventual consistency for the shopping cart. If you add an item on your phone and immediately check on your laptop, you might not see it for a few seconds. This is acceptable for the user experience and allows the cart service to be highly available.

Google Docs uses operational transformation with causal consistency. Edits from different users are applied in an order that preserves causality, but concurrent edits may be seen in different orders by different clients before converging.

Stripe uses strong consistency for payment processing. When a charge is created, the API does not return success until the data is durably stored and replicated.

Common Pitfalls

Assuming strong consistency is always correct: It is the most expensive option. Use it only where correctness requires it.
Ignoring consistency in the design phase: Bolting on stronger consistency later often requires architectural changes.
Confusing database consistency with application consistency: Your database may be strongly consistent, but if your application caches stale data, the user sees eventual consistency anyway.
Not testing with network delays: Inject artificial latency and partition failures in testing to verify your system's behavior under real-world conditions.
Treating CAP as a permanent binary choice: The trade-off only applies during partitions. Design your system to provide strong consistency in normal operation and degrade gracefully during failures.
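
The caching pitfall above is easy to reproduce. In the sketch below a plain dict stands in for a strongly consistent database, and a read-through cache with a time-to-live sits in front of it. `TtlCache` and the injected `clock` callable are assumptions made so the example runs deterministically.

```python
class TtlCache:
    """Read-through cache; entries are served until their TTL expires."""

    def __init__(self, store, ttl, clock):
        self.store = store  # dict standing in for a strongly consistent database
        self.ttl = ttl      # seconds an entry stays valid
        self.clock = clock  # callable returning the current time in seconds
        self.entries = {}   # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and now < cached[1]:
            return cached[0]  # served from cache, possibly stale
        value = self.store[key]
        self.entries[key] = (value, now + self.ttl)
        return value


now = [0.0]  # fake clock, advanced by hand
db = {"stock": 5}
cache = TtlCache(db, ttl=30, clock=lambda: now[0])

first = cache.get("stock")  # cache miss: reads the database
db["stock"] = 0             # the database is updated immediately
stale = cache.get("stock")  # the cache still answers with the old value
now[0] = 31.0               # the TTL passes
fresh = cache.get("stock")  # the entry expired, so the database is read again
```

The database never returned a stale value, yet the reader saw `stock == 5` after it had dropped to 0. The TTL is effectively the staleness bound of the whole system.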

Key Takeaways

Consistency models are a spectrum from eventual to linearizable. Pick the weakest model that meets your correctness requirements.
The CAP theorem is about behavior during network partitions. PACELC better captures the everyday latency vs consistency trade-off.
Tunable consistency lets you choose the consistency level per operation instead of for the whole system.
Most applications need strong consistency for a small number of critical operations (payments, inventory) and can tolerate eventual consistency for everything else.
Test your consistency assumptions under failure. Inject partitions and delays to verify behavior.
Consistency is a system-level property. A strongly consistent database behind an aggressive cache is effectively eventually consistent.