Transactions

A transaction is a sequence of database operations that form a single logical unit of work. Transactions provide the ACID guarantees that make databases reliable.

ACID Properties

Atomicity

All or nothing. Either all operations in the transaction commit, or none do. If a failure occurs mid-transaction, all changes are rolled back.

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- both updates take effect, or neither does

Implementation: Write-ahead logging (WAL). Changes are logged before applied. On crash: redo committed transactions, undo uncommitted ones.

Consistency

The database moves from one valid state to another. All constraints (primary keys, foreign keys, CHECK constraints) are satisfied after the transaction.

Note: Application-level consistency (business rules) is the application's responsibility. The database enforces declared constraints.

Isolation

Concurrent transactions don't interfere with each other. Each transaction appears to execute in isolation, as if it were the only transaction running.

Implementation: Concurrency control mechanisms (locks, MVCC, timestamps).

Durability

Once committed, changes survive any subsequent failure (power loss, crash, disk failure).

Implementation: WAL flushed to disk on commit. Redundant storage (RAID, replication).

Transaction States

Active → Partially Committed → Committed
  ↓                ↓
Failed → Aborted (rolled back)

Active: Transaction is executing. Partially committed: All operations done, awaiting commit (WAL flush). Committed: WAL flushed. Changes are permanent. Failed: Error occurred. Must roll back. Aborted: All changes undone. Transaction can be restarted or abandoned.

Serializability

Schedule

A schedule is an interleaving of operations from multiple concurrent transactions.

Schedule S:
T1: R(A)  W(A)            R(B)  W(B)
T2:            R(A)  W(A)             R(B)  W(B)

Conflict Serializability

Two operations conflict if they:

Belong to different transactions
Access the same data item
At least one is a write

A schedule is conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting adjacent operations.

Testing: Build a precedence graph (serialization graph). Nodes = transactions. Edge Tᵢ → Tⱼ if Tᵢ has a conflicting operation before Tⱼ's. The schedule is conflict serializable iff the graph is acyclic.

View Serializability

A weaker condition: the schedule produces the same result as some serial schedule for all reads and final writes. More permissive but harder to test (NP-complete).

Recoverability

Recoverable Schedule

A transaction Tⱼ reads data written by Tᵢ → Tᵢ must commit before Tⱼ commits. Otherwise, if Tᵢ aborts after Tⱼ commits, Tⱼ's committed result depends on rolled-back data.

Cascadeless Schedule

No transaction reads uncommitted data. Prevents cascading aborts (one abort triggering a chain of aborts).

Strict Schedule

No transaction reads OR writes uncommitted data. Simplifies recovery (undo is straightforward).

Hierarchy: Strict ⊂ Cascadeless ⊂ Recoverable.

Concurrency Control

Two-Phase Locking (2PL)

Rule: A transaction acquires all locks before releasing any lock.

Phase 1 (Growing): Acquire locks. No releases. Phase 2 (Shrinking): Release locks. No new acquisitions.

Guarantees conflict serializability.

Lock types:

Shared lock (S): For reading. Multiple transactions can hold S locks simultaneously.
Exclusive lock (X): For writing. Only one transaction can hold an X lock.

Compatibility:

     | S    | X
  S  | ✓    | ✗
  X  | ✗    | ✗

Variants of 2PL

Basic 2PL: Releases locks during the shrinking phase. Guarantees serializability but not recoverability.

Strict 2PL: Hold exclusive locks until commit/abort. Guarantees strict recoverability. The standard in practice.

Rigorous 2PL: Hold all locks until commit/abort. Simplest, strongest guarantees.

Timestamp Ordering

Assign a timestamp to each transaction at start. Operations must follow timestamp order.

Rule: If Tᵢ (older) and Tⱼ (newer) conflict, the older transaction's operation should appear first. If violated, abort and restart the younger transaction.

Thomas's Write Rule: If Tᵢ writes a value that Tⱼ (newer) has already written, ignore Tᵢ's write (it would be overwritten anyway). Improves concurrency.

MVCC (Multi-Version Concurrency Control)

Maintain multiple versions of each data item. Readers access the version consistent with their start time. Writers create new versions.

Readers never block writers. Writers never block readers.

How it works: Each row has version metadata (creation timestamp, deletion timestamp or transaction ID). A read at time T sees the latest version created before T that hasn't been deleted before T.

PostgreSQL: Each row has xmin (creating transaction) and xmax (deleting/updating transaction). Visibility check: is the creating transaction committed and before my snapshot? Is the deleting transaction not committed or after my snapshot?

Garbage collection (VACUUM in PostgreSQL): Old versions that no active transaction can see are cleaned up.

Optimistic Concurrency Control (OCC)

Assume no conflicts. Process the transaction without locking. At commit time, validate that no conflicts occurred.

Phases:

Read: Read from the database. Write to a private workspace.
Validate: Check if any concurrent transaction conflicted (wrote to data this transaction read).
Write: If validation passes, apply changes. If fails, abort and retry.

Best for: Read-heavy workloads with rare conflicts. Avoids lock overhead and deadlocks.

Isolation Levels

Transaction isolation levels and anomalies they prevent

SQL standard defines four isolation levels, trading consistency for performance.

Read Uncommitted

No isolation. Can read uncommitted data (dirty reads). Rarely used.

Read Committed

Only read committed data. Prevents dirty reads. But: a query may see different committed data if re-executed within the same transaction (non-repeatable reads).

PostgreSQL default. Each statement sees the latest committed data.

Repeatable Read

Within a transaction, a row read once will always return the same value. Prevents non-repeatable reads. But: new rows inserted by other transactions may appear (phantom reads).

PostgreSQL: Implements Repeatable Read using MVCC snapshots (no phantoms either — actually provides Snapshot Isolation).

Serializable

Full isolation. Transactions behave as if executed serially. No anomalies.

PostgreSQL: Serializable Snapshot Isolation (SSI). Detects dangerous patterns and aborts transactions to prevent anomalies.

Anomalies Summary

Anomaly	Read Uncommitted	Read Committed	Repeatable Read	Serializable
Dirty read	Possible	✗	✗	✗
Non-repeatable read	Possible	Possible	✗	✗
Phantom	Possible	Possible	Possible*	✗
Write skew	Possible	Possible	Possible	✗

*PostgreSQL's RR actually prevents phantoms (uses snapshot isolation).

Snapshot Isolation (SI)

Each transaction sees a consistent snapshot of the database as of its start time. Writes create new versions. Conflicts detected at commit time (first-committer wins).

Write skew: Two transactions read the same data and make different modifications based on what they read. Both commit successfully but the combined result violates an invariant.

Example: Two doctors on call. Rule: at least one must be on call. Both check the DB, see two on call, and take themselves off. Both commit → zero on call. Violated!

SSI (Serializable Snapshot Isolation): Detects read-write dependencies that could cause write skew. Aborts one transaction to prevent the anomaly.

Keep transactions short.
Access resources in a consistent order.
Use the lowest isolation level that's correct for your use case.
Use SELECT ... FOR UPDATE to explicitly lock rows you'll update.

Applications in CS

Web applications: Each HTTP request typically runs in one transaction. Connection pools manage transaction lifecycle.
Financial systems: Strict ACID required. Serializable isolation for account transfers.
Microservices: Distributed transactions are hard. Saga pattern (compensating transactions) is the alternative.
Analytics: Read Committed or Snapshot Isolation for consistent reporting without blocking writes.
Testing: Transactions + rollback provide clean test isolation (wrap each test in a transaction, roll back after).