Theory of Constraints

Eliyahu Goldratt introduced the Theory of Constraints (TOC) in his 1984 novel The Goal. The premise is deceptively simple: every system has at least one constraint, and at any given time one of them limits overall throughput. The system can only move as fast as its slowest part. Optimizing anything other than the constraint is an illusion of progress. You feel productive. The metrics on the non-bottleneck look great. But the system output does not change. TOC gives engineers a disciplined method for finding the real bottleneck, exploiting it, and then moving on to the next one.

The Five Focusing Steps

Goldratt formalized TOC into five repeating steps:

1. IDENTIFY    - Find the constraint (the slowest step)
2. EXPLOIT     - Get maximum throughput from the constraint as-is
3. SUBORDINATE - Align everything else to support the constraint
4. ELEVATE     - Invest to increase the constraint's capacity
5. REPEAT      - The constraint has moved; find the new one

This is a cycle, not a one-time fix. Once you elevate a constraint far enough, something else becomes the bottleneck. The system never stops having a constraint. What changes is which part of the system it lives in.
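The cycle can be sketched in a few lines of Python. This is a toy model, not Goldratt's own formulation: the stage names and capacities are hypothetical, and steps 2 through 4 are collapsed into the single assumption that working on the constraint triples its capacity.

```python
def identify(capacity):
    """Step 1: the constraint is the stage with the lowest capacity."""
    return min(capacity, key=capacity.get)


def system_throughput(capacity):
    """A serial pipeline finishes work only as fast as its slowest stage."""
    return min(capacity.values())


# Hypothetical capacities in features per sprint (illustrative numbers only).
capacity = {"backend": 20, "review": 12, "qa": 5, "deploy": 30}

history = []
for _ in range(3):
    constraint = identify(capacity)
    history.append((constraint, system_throughput(capacity)))
    # Assumption: exploiting and elevating the constraint triples its capacity.
    capacity[constraint] *= 3

for constraint, throughput in history:
    print(f"constraint={constraint:<8} throughput={throughput}/sprint")
```

In this run the constraint starts at QA, moves to review, and then returns to QA at a higher capacity. That is the behavior step 5 asks you to watch for.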

Why Engineers Get This Wrong

Most engineering teams optimize whatever is in front of them. The frontend team makes the UI faster. The backend team refactors the API layer. The infrastructure team upgrades the database. Everyone is busy. Everyone is shipping. But if the actual constraint is the deployment pipeline taking 45 minutes, none of that work changes how fast value reaches users.

Before identifying the constraint:

  Code Review (2h) -> CI Build (30min) -> Manual QA (5 days) -> Deploy (2min)

  Total cycle time: ~5.1 days
  Actual bottleneck: Manual QA

  Team A optimizes CI from 30min to 10min.
  New total: ~5.09 days
  Improvement: ~0.3%

  Team B automates 80% of QA checks.
  New total: ~1.1 days
  Improvement: 78%

Team A worked hard. Team B worked on the constraint.
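The comparison can be checked with a short calculation. The sketch below makes two assumptions that are not stated in the example: a "day" means 24 elapsed hours, and automating 80% of QA checks cuts QA time by 80%.

```python
# Stage durations in minutes, taken from the example above.
# Assumption: one day = 24 elapsed hours, not an 8-hour working day.
DAY = 24 * 60
baseline = {"code_review": 120, "ci_build": 30, "manual_qa": 5 * DAY, "deploy": 2}


def cycle_time(stages):
    """Total minutes for one change to pass through every stage in series."""
    return sum(stages.values())


def improvement(before, after):
    """Fraction of the total cycle time that was removed."""
    return (cycle_time(before) - cycle_time(after)) / cycle_time(before)


team_a = {**baseline, "ci_build": 10}        # CI build: 30min -> 10min
team_b = {**baseline, "manual_qa": 1 * DAY}  # assumed: 80% less QA time

for name, plan in (("Team A", team_a), ("Team B", team_b)):
    days = cycle_time(plan) / DAY
    print(f"{name}: {days:.2f} days, {improvement(baseline, plan):.1%} faster")
```

The same two functions work for any serial pipeline: replace the `baseline` dictionary with measured stage durations and compare candidate changes before committing to one.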

Identifying the Constraint in Software Systems

Constraints show up in predictable places in engineering organizations:

Build & Deploy Pipeline

Symptom:  Engineers complain about slow feedback loops
Look for: Longest stage in CI/CD pipeline
Common:   Test suites that grew unchecked, sequential builds
          that could be parallel, manual approval gates

Code Review

Symptom:  PRs sit open for days
Look for: Review queue depth, number of qualified reviewers
Common:   One senior engineer reviews everything, no review
          rotation, unclear ownership

Decision Making

Symptom:  Work stalls waiting for "alignment" or "approval"
Look for: How many people must agree before work starts
Common:   Architecture review boards that meet monthly,
          managers who must approve every technical decision

External Dependencies

Symptom:  Teams blocked on other teams
Look for: Cross-team request queues, API contract disputes
Common:   Platform team with 30 consumers and no self-serve,
          shared database with no ownership model
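One way to turn the "Look for" rows above into a measurement is to record how long each work item spends in each stage and compare the averages. The sketch below assumes such records can be exported from your tooling. The record format, item names, and numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (work item, stage, hours from entering to leaving the stage).
records = [
    ("PR-1", "review", 30.0), ("PR-1", "ci", 0.6), ("PR-1", "approval", 4.0),
    ("PR-2", "review", 52.0), ("PR-2", "ci", 0.5), ("PR-2", "approval", 2.0),
    ("PR-3", "review", 41.0), ("PR-3", "ci", 0.7), ("PR-3", "approval", 6.0),
]


def mean_hours_by_stage(records):
    """Average time spent in each stage, waiting time included."""
    by_stage = defaultdict(list)
    for _item, stage, hours in records:
        by_stage[stage].append(hours)
    return {stage: mean(hours) for stage, hours in by_stage.items()}


def likely_constraint(records):
    """The stage where items spend the most time on average."""
    averages = mean_hours_by_stage(records)
    return max(averages, key=averages.get)


print(likely_constraint(records))
```

Average time in stage is only a first signal. A stage can look slow because it is starved by the stage before it, so confirm the result against queue depth before acting on it.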

Exploit Before You Elevate

Step 2 (Exploit) is the most underused step. Engineers jump straight to "elevate" — throw money or people at the problem. But exploitation means getting maximum output from the constraint without adding resources.

Constraint: QA team can test 5 features per sprint

Exploit:
  - Prioritize which features actually need QA
  - Give QA automated smoke tests so manual testing
    focuses on edge cases
  - Move QA earlier in the process (shift left)
  - Write better acceptance criteria so QA does not
    have to guess

Elevate (only after exploit is exhausted):
  - Hire more QA engineers
  - Buy better testing infrastructure
  - Build a dedicated test environment

Exploitation is almost free. Elevation costs real money and time. Do exploitation first.

Subordination: The Hardest Step

Subordination means telling non-constraints to slow down if they are creating waste. This is politically difficult. If the backend team can produce 20 features per sprint but QA can only test 5, the backend team should not produce 20. Producing 15 features that sit in a queue is not productivity — it is inventory. In software, inventory is work-in-progress: unmerged branches, untested features, unreleased code. It has carrying costs: merge conflicts, context switching, stale implementations.

Without subordination:
  Backend ships 20 features -> QA queue grows -> 15 features
  sit untested -> merge conflicts multiply -> rework increases

With subordination:
  Backend ships 5 features -> QA tests 5 features -> 5 features
  reach production -> backend uses remaining time to write
  automated tests, reduce QA burden, improve documentation

The backend team feels "slower" but the system moves faster.
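A small simulation makes the queue growth concrete. It is a deliberately simple model using the numbers above: the six-sprint horizon is an assumption, and carrying costs such as merge conflicts and rework are not modeled.

```python
def simulate(sprints, backend_output, qa_capacity):
    """Return (features released, features still waiting for QA)."""
    queue = 0
    released = 0
    for _ in range(sprints):
        queue += backend_output           # backend hands finished features to QA
        tested = min(queue, qa_capacity)  # QA tests at most its capacity
        queue -= tested
        released += tested
    return released, queue


SPRINTS = 6  # assumed horizon, not from the text

without_sub = simulate(SPRINTS, backend_output=20, qa_capacity=5)
with_sub = simulate(SPRINTS, backend_output=5, qa_capacity=5)

print("without subordination (released, queued):", without_sub)
print("with subordination    (released, queued):", with_sub)
```

Both runs release the same number of features, because QA sets the pace either way. The only thing the extra backend output produces in this model is a longer queue, and the real-world costs of that queue come on top.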

Real-World Engineering Examples

Example: Microservice Latency

A request travels through five services. Service C takes 800ms. Services A, B, D, and E take 50ms each. Total latency: 1000ms.

A (50ms) -> B (50ms) -> C (800ms) -> D (50ms) -> E (50ms)

Optimizing A from 50ms to 10ms: total = 960ms (4% improvement)
Optimizing C from 800ms to 200ms: total = 400ms (60% improvement)

Always profile before optimizing. Always optimize the constraint.
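The same arithmetic also gives a ceiling for each service: the most you could gain if that service took no time at all. The sketch below assumes the five calls run strictly in sequence, as the diagram suggests. The helper names are illustrative.

```python
# Per-service latency in milliseconds, from the example above.
latency_ms = {"A": 50, "B": 50, "C": 800, "D": 50, "E": 50}


def total(latencies):
    """End-to-end latency of a strictly sequential call chain."""
    return sum(latencies.values())


def speedup(latencies, service, new_ms):
    """Fractional reduction in end-to-end latency if one service changes."""
    after = {**latencies, service: new_ms}
    return (total(latencies) - total(after)) / total(latencies)


def ceiling(latencies, service):
    """Best case: the service responds in 0ms."""
    return speedup(latencies, service, 0)


for service in latency_ms:
    print(f"{service}: at most {ceiling(latency_ms, service):.0%} faster")
```

Even a perfect rewrite of service A cannot improve the request by more than 5%, while service C has 80% of the total available. Checking the ceiling first is a cheap way to rule out work on non-constraints.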

Example: Hiring Pipeline

Your team needs to grow from 5 to 10 engineers. The pipeline:

Sourcing (50 candidates/week) -> Phone Screen (20/week) ->
  Technical Interview (3/week) -> Offer (2/week)

Constraint: Technical interviews (3/week)
  - 20/week pass the phone screen but only 3/week can be
    interviewed
  - Only 2 interviewers qualified
  - Each interview takes 2 hours including write-up

Exploit: Train 3 more interviewers, use structured rubrics
         to speed write-ups, batch interviews on 2 days
Subordinate: Stop sourcing 50/week when you can only
             interview 3. Reduce sourcing, increase quality.
Elevate: Hire a recruiting coordinator, build interview
         tooling

Example: Incident Response

Mean time to resolution (MTTR) is about 4 hours. The breakdown:

Detection (5min) -> Alerting (2min) -> Triage (15min) ->
  Diagnosis (3h) -> Fix (30min) -> Deploy (10min)

Constraint: Diagnosis (3 hours)

Exploit: Better logging, runbooks, dashboards that surface
         the probable cause automatically
Subordinate: Do not invest in faster alerting (2min is fine)
Elevate: Build anomaly detection, invest in observability
         tooling

TOC & Work-in-Progress Limits

TOC and WIP limits rest on the same logic. Kanban and lean manufacturing grew out of the Toyota Production System and predate The Goal, but they reach the same conclusion: if you limit work-in-progress to match the constraint's capacity, you stop building inventory and start finishing work.

Without WIP limits:
  10 things started, 2 things finished per week
  8 things aging in the queue

With WIP limits (matched to constraint):
  3 things started, 2-3 things finished per week
  Near-zero queue, faster cycle time
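Little's Law gives a rough way to quantify the difference: in a stable system, average cycle time equals average work-in-progress divided by average throughput. The sketch below applies it to the numbers above, assuming throughput of 2 items per week in both cases. It is an approximation, since the law describes long-run averages.

```python
def average_cycle_time(wip, throughput_per_week):
    """Little's Law: average time in system = average WIP / average throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week


# Assumption: the constraint finishes 2 items per week in both scenarios.
no_limit = average_cycle_time(wip=10, throughput_per_week=2)
with_limit = average_cycle_time(wip=3, throughput_per_week=2)

print(f"without WIP limit: {no_limit} weeks per item")
print(f"with WIP limit:    {with_limit} weeks per item")
```

Throughput is unchanged because the constraint still sets the pace. What the WIP limit changes is how long each item waits before it is finished.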

Common Pitfalls

  • Optimizing non-constraints: The most common mistake. It feels productive but does not improve system throughput. Measure the whole system, not individual components.
  • Skipping exploitation: Jumping to "hire more people" or "buy better tools" before extracting full value from the current constraint. Exploitation is cheaper and faster.
  • Ignoring that constraints move: After you fix one bottleneck, another emerges. Teams celebrate the fix and stop looking. TOC is a continuous cycle.
  • Confusing busy with productive: A team at 100% utilization is not necessarily productive. If they are producing work that sits in a queue downstream, they are creating inventory, not value.
  • Political resistance to subordination: Telling a team to slow down can feel career-threatening in many organizations. Frame it as "redirect capacity to support the constraint" instead.

Key Takeaways

  • At any given time, one constraint limits a system's throughput. Find it before you optimize anything.
  • The five focusing steps (Identify, Exploit, Subordinate, Elevate, Repeat) are a cycle, not a checklist.
  • Exploit the constraint before investing in elevation. Get maximum throughput from what you have.
  • Subordinate non-constraints to the constraint. Overproduction upstream creates inventory and waste.
  • Once you fix a constraint, it moves. Immediately look for the new one.
  • Measure system throughput, not local efficiency. A fast component feeding a slow component is waste.