Deadlocks
A deadlock occurs when a set of processes are each waiting for a resource held by another process in the set — no process can proceed.
Deadlock Conditions (Coffman, 1971)
All four conditions must hold simultaneously for a deadlock:
- Mutual exclusion: At least one resource is non-sharable (only one process can use it at a time).
- Hold and wait: A process holding resources is waiting to acquire additional resources.
- No preemption: Resources cannot be forcibly taken from a process — only voluntarily released.
- Circular wait: A circular chain of processes, each waiting for a resource held by the next.
To prevent deadlock: Break any one of these conditions.
Resource Allocation Graph
A directed graph showing resource allocations and requests:
- Process nodes (circles): P₁, P₂, ...
- Resource nodes (rectangles): R₁, R₂, ... (with dots for instances)
- Request edge: Pᵢ → Rⱼ (process requests resource)
- Assignment edge: Rⱼ → Pᵢ (resource assigned to process)
P1 → R1 → P2 → R2 → P1 (DEADLOCK — cycle!)
With single-instance resources: Cycle in RAG ⟺ deadlock.
With multi-instance resources: Cycle is necessary but not sufficient. Need further analysis.
Deadlock Prevention
Break one of the four conditions by design.
Break Mutual Exclusion
Make resources sharable when possible (e.g., read-only files). Not always possible (printers, write locks).
Break Hold and Wait
Option 1: Request all resources at once before starting. Problem: Low utilization (resources held but unused). May not know all needs in advance.
Option 2: Release all held resources before requesting new ones. Problem: May lose work.
Break No Preemption
If a process can't get a resource, force it to release all held resources. Retry later.
Works for resources whose state can be saved/restored (CPU registers, memory pages). Doesn't work for printers, database locks.
Break Circular Wait
Total ordering: Assign a global order to all resource types. Processes must request resources in increasing order only.
Resources: R1 < R2 < R3 < R4
Process must acquire R1 before R2, R2 before R3, etc.
Can't hold R3 and request R1 → no circular wait possible.
Most practical prevention strategy. Lock ordering discipline (e.g., "always acquire mutex A before mutex B").
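The lock-ordering discipline can be sketched in Python (the lock names, the rank table, and the helper functions are illustrative, not a standard API): every thread acquires locks in a fixed global order regardless of the order the caller lists them, so a circular wait can never form.

```python
import threading

# Global order: each lock gets a fixed rank; lower ranks are acquired first.
lock_a = threading.Lock()
lock_b = threading.Lock()
LOCK_RANK = {id(lock_a): 1, id(lock_b): 2}

def acquire_in_order(*locks):
    """Acquire the given locks sorted by global rank, breaking circular wait."""
    ordered = sorted(locks, key=lambda lock: LOCK_RANK[id(lock)])
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(locks):
    """Release in reverse acquisition order."""
    for lock in reversed(locks):
        lock.release()

# Both callers end up taking lock_a before lock_b, whatever order they ask in.
held = acquire_in_order(lock_b, lock_a)
release_all(held)
```

In real code the same effect is usually achieved by convention (documented lock hierarchy) rather than a runtime sort, but the invariant is identical.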
Deadlock Avoidance
Allow all four conditions but use runtime checks to ensure the system never enters an unsafe state.
Safe State
A state is safe if there exists a sequence of processes that can all finish (each process can eventually get all its needed resources).
Unsafe ≠ Deadlocked: Unsafe means deadlock is possible, not that it has occurred.
Banker's Algorithm (Dijkstra, 1965)
For multiple resource types with multiple instances.
Data structures:
- Available[j]: Available instances of resource j
- Max[i][j]: Maximum demand of process i for resource j
- Allocation[i][j]: Currently allocated to process i
- Need[i][j] = Max[i][j] - Allocation[i][j]
Safety check: Find an order to satisfy all processes.
1. Work = Available; Finish = [false, ..., false]
2. Find i where Finish[i] == false and Need[i] ≤ Work
3. If found: Work += Allocation[i]; Finish[i] = true; goto 2
4. If all Finish[i] == true → SAFE. Otherwise → UNSAFE.
Resource request algorithm: When process i requests resources:
- Check Request ≤ Need (not asking more than declared max).
- Check Request ≤ Available (resources available).
- Pretend to allocate. Check if resulting state is safe (run safety algorithm).
- If safe → grant. If unsafe → deny (process must wait).
Time: O(m × n²) per request, where m = number of resource types and n = number of processes.
Limitations: Requires knowing maximum resource needs in advance. Conservative (may deny safe requests). Not practical for general-purpose OS (too expensive, max needs unknown).
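The safety check and the request algorithm above can be sketched in Python (vectors are plain lists; the matrix values in the usage comment are the classic textbook example, not from this document):

```python
def is_safe(available, allocation, need):
    """Banker's safety check: True iff some order lets every process finish."""
    work = available[:]
    finish = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, done in enumerate(finish):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # Pretend process i runs to completion and returns its resources.
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                progressed = True
    return all(finish)

def request_resources(i, request, available, allocation, need):
    """Grant process i's request only if the resulting state is safe."""
    if any(r > n for r, n in zip(request, need[i])):
        raise ValueError("request exceeds declared maximum")
    if any(r > a for r, a in zip(request, available)):
        return False  # must wait: not enough instances available right now
    # Pretend to allocate, then run the safety check on the hypothetical state.
    new_avail = [a - r for a, r in zip(available, request)]
    new_alloc = [row[:] for row in allocation]
    new_alloc[i] = [a + r for a, r in zip(allocation[i], request)]
    new_need = [row[:] for row in need]
    new_need[i] = [n - r for n, r in zip(need[i], request)]
    if is_safe(new_avail, new_alloc, new_need):
        available[:], allocation[i], need[i] = new_avail, new_alloc[i], new_need[i]
        return True
    return False  # unsafe: deny, process waits

# Usage: need is derived as Max - Allocation, then requests are vetted:
#   need = [[m - a for m, a in zip(mr, ar)] for mr, ar in zip(maximum, allocation)]
#   request_resources(1, [1, 0, 2], available, allocation, need)
```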
Deadlock Detection
Allow deadlocks to occur, then detect and recover.
Detection for Single-Instance Resources
Build a wait-for graph (derived from RAG by removing resource nodes). Periodically check for cycles. Cycle = deadlock.
Time: O(V + E) using DFS.
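The check can be sketched as standard DFS cycle detection in Python (the example graph encodes the P1 → R1 → P2 → R2 → P1 chain from the RAG section above, with resource nodes collapsed away):

```python
def has_cycle(wait_for):
    """DFS cycle detection on a wait-for graph in O(V + E)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / done
    color = {p: WHITE for p in wait_for}

    def dfs(p):
        color[p] = GRAY
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GRAY:
                return True  # back edge to the current path: cycle = deadlock
            if color.get(q, WHITE) == WHITE and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in wait_for)

# P1 waits for P2 and P2 waits for P1 (resource nodes removed from the RAG).
assert has_cycle({"P1": ["P2"], "P2": ["P1"]})
assert not has_cycle({"P1": ["P2"], "P2": []})
```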
Detection for Multi-Instance Resources
Similar to Banker's safety algorithm but applied to the current state (not a hypothetical state).
1. Work = Available; Finish[i] = true if Allocation[i] == 0
2. Find i where Finish[i] == false and Request[i] ≤ Work
3. If found: Work += Allocation[i]; Finish[i] = true; goto 2
4. Processes with Finish[i] == false are deadlocked.
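The detection algorithm differs from the Banker's safety check only in its inputs: it uses the outstanding Request matrix instead of declared maximums, and processes holding nothing are marked finished up front. A sketch in Python:

```python
def detect_deadlock(available, allocation, request):
    """Return the indices of deadlocked processes (empty list if none)."""
    work = available[:]
    # A process holding no resources cannot be part of a deadlock.
    finish = [all(a == 0 for a in alloc) for alloc in allocation]
    progressed = True
    while progressed:
        progressed = False
        for i, done in enumerate(finish):
            if not done and all(r <= w for r, w in zip(request[i], work)):
                # Optimistically assume process i finishes and releases its resources.
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                progressed = True
    return [i for i, done in enumerate(finish) if not done]

# Two processes each hold the only instance the other is requesting:
# detect_deadlock([0], [[1], [1]], [[1], [1]]) reports both as deadlocked.
```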
When to Run Detection
- On every resource request: Immediate detection. High overhead (O(n²) per request).
- Periodically (e.g., every 5 minutes): Lower overhead. Deadlock persists until detected.
- When CPU utilization drops: Deadlock reduces utilization. Detect when utilization falls below threshold.
Deadlock Recovery
Process Termination
- Abort all deadlocked processes: Simple but loses all work.
- Abort one at a time: Abort the "cheapest" process (least progress, fewest resources, lowest priority). Re-run detection after each abort. Higher overhead but more targeted.
Resource Preemption
Forcibly take resources from a deadlocked process.
Challenges:
- Select victim: Which process to preempt? Minimize cost.
- Rollback: Preempted process must be rolled back (checkpointed state).
- Starvation: Ensure the same process isn't always the victim.
Livelock
Processes keep changing state in response to each other but make no progress. Like two people trying to pass each other in a hallway, each stepping aside the same way.
Example: Two processes each grab one lock, fail to get the other, back off, and retry in lockstep:
P1: lock A, try B → fail, release A. Retry → fail, release...
P2: lock B, try A → fail, release B. Retry → fail, release...
Solution: Randomized backoff (Ethernet CSMA/CD uses exponential backoff to resolve contention).
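A try-lock-with-randomized-backoff pattern can be sketched in Python (lock names and timings are illustrative): two threads each need both locks but start from opposite ends; the random sleep after a failed try-lock breaks the lockstep retry pattern that would otherwise livelock.

```python
import random
import threading
import time

lock_a, lock_b = threading.Lock(), threading.Lock()
done = []

def worker(first, second, name):
    """Hold one lock, try-lock the other; on failure release and back off."""
    while True:
        with first:
            if second.acquire(blocking=False):
                done.append(name)  # got both locks: do the critical work
                second.release()
                return
        # Couldn't get the second lock: we released the first, now back off.
        # The *randomized* sleep desynchronizes the two threads' retries.
        time.sleep(random.uniform(0, 0.01))

t1 = threading.Thread(target=worker, args=(lock_a, lock_b, "P1"))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, "P2"))
t1.start(); t2.start()
t1.join(); t2.join()
```

With a fixed backoff interval the threads can retry in lockstep indefinitely; randomization makes sustained livelock vanishingly unlikely.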
Starvation
A process is perpetually denied resources while others proceed. Not a deadlock (other processes ARE making progress), but unfair.
Solutions: Aging (increase priority over time), fair queuing, bounded waiting guarantees.
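Aging can be sketched as a scheduler whose effective priority grows with waiting time, so a starved process eventually wins (the tick-based model and the boost constant are illustrative):

```python
def schedule_with_aging(base_priorities, ticks, boost_per_tick=1):
    """Each tick, run the process with highest effective priority
    (base + age); every waiting process ages, the chosen one resets."""
    age = [0] * len(base_priorities)
    history = []
    for _ in range(ticks):
        effective = [p + a for p, a in zip(base_priorities, age)]
        chosen = max(range(len(effective)), key=lambda i: effective[i])
        history.append(chosen)
        age[chosen] = 0
        for i in range(len(age)):
            if i != chosen:
                age[i] += boost_per_tick
    return history

# Process 1 (base priority 0) is eventually scheduled despite process 0
# (base priority 10): once age[1] exceeds 10 it wins a tick.
history = schedule_with_aging([10, 0], ticks=30)
assert 1 in history
```

Without the aging term, process 1 would never run; with it, starvation is bounded by the priority gap divided by the boost rate.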
Priority Inversion
A high-priority process is blocked by a low-priority process holding a needed resource. A medium-priority process preempts the low-priority process, indirectly blocking the high-priority one.
Timeline:
Low:    [lock A][running...]                        [resumes][release A]
Medium:              [preempts Low!...running...]
High:        [wants A...blocked.....................blocked...]
Famous case: Mars Pathfinder (1997) experienced priority inversion causing system resets.
Solutions
Priority inheritance: Temporarily boost the low-priority process's priority to that of the highest-priority blocked process. The low-priority process runs at high priority until it releases the lock.
Priority ceiling: Each lock has a priority ceiling (highest priority of any process that might acquire it). A process can only acquire a lock if its priority is higher than the ceiling of all locks held by other processes.
Practical Deadlock Handling
| Strategy | Cost | Used When |
|---|---|---|
| Prevention (lock ordering) | Design-time discipline | Systems programming, databases |
| Avoidance (Banker's) | Runtime overhead | Rarely (too expensive, requires max info) |
| Detection + Recovery | Periodic overhead | Databases (transaction rollback) |
| Ignore (ostrich algorithm) | Zero overhead | Most general-purpose OS (UNIX, Windows) |
Ostrich algorithm: Pretend deadlocks don't happen. Reboot if they do. Used by most OS because deadlocks are rare and the cost of prevention/detection isn't justified.
Databases: Use detection + recovery (roll back transactions). Transactions are designed to be restartable.
Applications in CS
- Databases: Deadlock detection with waits-for graph. Victim selection and transaction rollback.
- Operating systems: Lock ordering conventions in kernel code. Linux lockdep tool detects potential deadlocks at runtime.
- Distributed systems: Distributed deadlock detection (more complex — no global state). Timeouts as a pragmatic approach.
- Concurrent programming: Lock ordering discipline. Try-lock with backoff. Lock-free alternatives.
- Resource managers: Thread pools, connection pools — avoid deadlock through careful resource ordering and timeouts.