All Models Are Wrong

Overview

George Box, the statistician, stated it plainly: "All models are wrong, but some are useful." Every model simplifies reality. Big O notation ignores constants and lower-order terms. Architecture diagrams omit implementation details. Sprint velocity is a rough guide, not a prediction. The models we use in engineering are approximations, and that is fine as long as we remember they are approximations.

The danger is not using models. The danger is forgetting that a model is a model and treating it as reality.

What a Model Is

A model is a simplified representation of something complex.

Reality: Your production system has 47 services, 12 databases,
  message queues, caches, load balancers, CDNs, third-party APIs,
  and thousands of network connections between them.

Model: An architecture diagram with 6 boxes and 8 arrows.

The diagram is useful because it shows the major components
and their relationships. It is wrong because it omits connection
pooling, retry logic, failover paths, DNS resolution, TLS
handshakes, and everything else that affects real behavior.

The diagram is a map. The production system is the territory.
The map helps you navigate. But the map is not the territory.

Engineering Models and What They Simplify Away

Big O Notation

What it tells you:
  Binary search is O(log n). Linear search is O(n).
  Binary search scales better.

What it simplifies away:
  Access cost: The O(log n) bound assumes O(1) random access.
    On a linked list, where each access is O(n), a naive
    binary search is O(n log n). The data structure matters.
  Cache behavior: On a small array, linear search can beat
    binary search because sequential access is friendly to
    the cache and the branch predictor.
  Per-step overhead: For very small n (roughly n < 20), the
    overhead of binary search (index arithmetic, unpredictable
    branches) can exceed the cost of just scanning linearly.

When the simplification matters:
  You profile your "O(log n)" search and it's slower than
  the "O(n)" approach it replaced, because n = 50 and the
  constants dominate. Big O told you about asymptotic behavior.
  Your data set is not in the asymptotic range.

The useful model:
  Big O is the right tool for comparing algorithms at scale.
  It is the wrong tool for optimizing code where n is small
  or where constants and cache effects dominate.
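
One way to check whether Big O is the right model for a given input size is to time both approaches at that size. The sketch below is a minimal illustration, not a rigorous benchmark: the function names, the input sizes, and the repeat counts are assumptions chosen for the example, and the timings depend on the interpreter and hardware.

```python
import bisect
import timeit


def linear_search(items, target):
    """Scan left to right; O(n) comparisons."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1


def binary_search(items, target):
    """Halve the sorted range each step; O(log n) comparisons."""
    i = bisect.bisect_left(items, target)
    if i < len(items) and items[i] == target:
        return i
    return -1


def benchmark(n, repeats=20_000):
    """Time both searches on a sorted list of n integers (total seconds)."""
    items = list(range(n))
    target = n // 2
    return {
        name: timeit.timeit(lambda: fn(items, target), number=repeats)
        for name, fn in (("linear", linear_search), ("binary", binary_search))
    }


if __name__ == "__main__":
    # Hypothetical sizes: tiny, "n = 50", and comfortably large.
    for n in (8, 50, 5_000):
        print(n, benchmark(n, repeats=1_000))
```

Which search wins at small n may differ between languages and machines. In CPython, for instance, bisect is implemented in C while the loop above is interpreted, which is itself a constant factor that Big O does not describe.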

Story Points and Velocity

What they tell you:
  The team completes about 40 story points per sprint.
  This feature is estimated at 8 points.
  It should take about one-fifth of a sprint.

What they simplify away:
  Story points are subjective and inconsistent across teams.
  Velocity varies sprint to sprint (sickness, on-call, context switching).
  Estimation accuracy tends to degrade as tasks get larger
    (for example, over 5 points).
  Dependencies on other teams are not captured in points.
  The definition of "done" shifts over time.

When the simplification matters:
  Management sees "velocity is 40 points" and plans 6 months
  of work assuming 40 points every two weeks. Three sprints in,
  velocity drops to 25 because the team is now working on
  unfamiliar code with more unknowns. At 25 points per sprint,
  the remaining work takes 60% longer than planned.

The useful model:
  Velocity is useful for rough capacity planning over the next
  1-2 sprints. It is unreliable for long-term planning because
  it treats future work as if it has the same characteristics
  as past work.
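
A forecast that reports a range instead of a single number makes the uncertainty visible. The sketch below is one possible approach, assuming the best and worst recent sprints bound the near future, which is itself a simplification. The function name forecast_sprints and the sample numbers are hypothetical.

```python
import math


def forecast_sprints(remaining_points, recent_velocities):
    """Return (best_case, worst_case) sprints needed to finish the remaining work."""
    if not recent_velocities or min(recent_velocities) <= 0:
        raise ValueError("need at least one positive velocity")
    best = math.ceil(remaining_points / max(recent_velocities))
    worst = math.ceil(remaining_points / min(recent_velocities))
    return best, worst


# Hypothetical numbers: 360 points remaining, last three sprints at 40, 38, and 25.
print(forecast_sprints(360, [40, 38, 25]))  # (9, 15)
```

Presenting "9 to 15 sprints" instead of "9 sprints" tells the reader how much the model can be trusted.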

Architecture Diagrams

What they tell you:
  The frontend talks to the API gateway. The API gateway routes
  to three backend services. Each service has its own database.

What they simplify away:
  How errors propagate between services.
  What happens when one service is slow or down.
  The actual network topology (VPCs, subnets, firewalls).
  Data consistency between the three databases.
  Authentication and authorization flow.
  Logging, monitoring, and tracing infrastructure.

When the simplification matters:
  A new engineer looks at the diagram and assumes the system is
  clean and modular. In reality, Service A calls Service B, which
  calls Service C, which calls back into Service A for a different
  purpose.
  The diagram shows three independent services. The territory
  shows a tightly coupled system with circular dependencies.

The useful model:
  Architecture diagrams are useful for high-level communication
  and onboarding. They should include a disclaimer: "This diagram
  shows the intended architecture. The actual system may differ."
  Better yet, generate the diagram from actual traffic data.
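
Deriving the dependency graph from observed calls can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a list of hypothetical (caller, callee) pairs, such as might be extracted from tracing data, and the service names are invented. Real trace formats vary.

```python
from collections import defaultdict


def build_graph(calls):
    """Turn observed (caller, callee) pairs into an adjacency map."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
    return graph


def find_cycle(graph):
    """Return one dependency cycle as a list of services, or None if there is none."""
    visiting, done = set(), set()
    path = []

    def visit(node):
        visiting.add(node)
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if nxt in visiting:
                # nxt is already on the current path: the cycle closes here.
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in sorted(graph):
        if start not in done:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Hypothetical observed traffic.
calls = [
    ("frontend", "gateway"),
    ("gateway", "A"),
    ("A", "B"),
    ("B", "C"),
    ("C", "A"),
]
print(find_cycle(build_graph(calls)))  # ['A', 'B', 'C', 'A']
```

A cycle reported here is a place where the territory disagrees with a diagram that shows independent services.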

The Test Pyramid

What it tells you:
  Write many unit tests, fewer integration tests, and even
  fewer end-to-end tests. This gives fast feedback and broad
  coverage.

What it simplifies away:
  Some systems have most of their complexity at the integration
  layer (API gateways, data pipelines, workflow engines).
  Unit testing these in isolation misses the bugs that matter.
  The "right" shape of your testing strategy depends on where
  your system's complexity lives.

When the simplification matters:
  A team building a data pipeline writes hundreds of unit tests
  for individual transformation functions. All pass. In production,
  the pipeline fails because the transformation order is wrong.
  No integration test caught it because the team followed the
  pyramid shape instead of testing where the risk actually is.

The useful model:
  The test pyramid is a useful default. Deviate from it when
  your system's risk profile doesn't match the pyramid's
  assumptions. A data pipeline might need an inverted pyramid
  with more integration tests than unit tests.
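
The failure described above, where every unit test passes but the composition is wrong, can be reproduced in a few lines. The two transformations and the pipeline names below are hypothetical, chosen only to show an ordering bug.

```python
def lowercase_emails(emails):
    """Normalize case so that equal addresses compare equal."""
    return [e.lower() for e in emails]


def dedupe(emails):
    """Drop exact duplicates while preserving order."""
    return list(dict.fromkeys(emails))


def pipeline_as_shipped(emails):
    # Wrong order: dedupe runs before case is normalized.
    return lowercase_emails(dedupe(emails))


def pipeline_intended(emails):
    return dedupe(lowercase_emails(emails))


# Unit tests: each transformation is correct in isolation.
assert lowercase_emails(["Ann@Example.com"]) == ["ann@example.com"]
assert dedupe(["a@x.com", "a@x.com"]) == ["a@x.com"]

# Integration check: only this exercises the ordering.
raw = ["Ann@example.com", "ann@example.com"]
print(pipeline_as_shipped(raw))  # ['ann@example.com', 'ann@example.com']
print(pipeline_intended(raw))    # ['ann@example.com']
```

Only a test that runs the whole pipeline on realistic input would have caught the duplicate.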

SLAs, SLOs, and SLIs

What they tell you:
  The service has 99.9% availability (three nines).
  That's about 8.76 hours of downtime per year. Acceptable.

What they simplify away:
  When the downtime occurs matters. 8 hours on Christmas Day
  is different from 8 hours spread across 480 separate
  one-minute blips.
  Partial degradation is not captured. The service might
  be "up" but returning errors for 10% of users.
  Dependent services are not included. Your service is up
  but the payment provider it depends on is down.
  The measurement window matters. 99.9% this month could
  mean 100% for three weeks and about 99.6% for the fourth.

When the simplification matters:
  A team achieves 99.95% availability and celebrates. But
  the 0.05% downtime occurred during peak hours for their
  largest customer, who experienced it as "the service is
  always down when we need it." The SLO was met. The
  customer experience was not.

The useful model:
  SLOs are useful for setting targets and triggering alerts.
  Complement them with distribution analysis (when does
  downtime occur?), user-segment analysis (who is affected?),
  and customer feedback (does the number match the experience?).
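
The arithmetic behind an availability target, and the way an aggregate can hide one segment's experience, can be sketched as below. The segment names and request counts are invented for illustration, and availability is assumed here to mean successful requests divided by total requests, which is only one of several possible definitions.

```python
def downtime_budget_hours(target, period_hours=365 * 24):
    """Hours of allowed downtime for an availability target over a period."""
    return (1 - target) * period_hours


def availability(successes, total):
    """Fraction of requests that succeeded."""
    return successes / total


# Hypothetical request counts per segment: (successful, total).
segments = {
    "everyone else": (900_000, 900_000),
    "largest customer": (99_500, 100_000),
}

total_ok = sum(ok for ok, _ in segments.values())
total = sum(n for _, n in segments.values())

print(f"budget at 99.9%: {downtime_budget_hours(0.999):.2f} h/year")  # 8.76 h/year
print(f"aggregate: {availability(total_ok, total):.2%}")              # 99.95%
for name, (ok, n) in segments.items():
    print(f"{name}: {availability(ok, n):.2%}")
```

In this made-up data the aggregate meets a 99.95% target while one customer sees 99.50%, ten times the aggregate error rate.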

The Danger of Treating Models as Reality

Pattern: Model replaces reality

1. You create a model (a metric, a diagram, an estimate)
2. The model is useful, so people rely on it
3. Over time, people forget it's a model
4. Decisions are made based on the model, not reality
5. When the model and reality diverge, people trust the model

Examples:

  "Our coverage is 90%, so our code is well-tested."
  → Coverage measures lines executed, not behavior validated.

  "Our velocity is stable, so we're predictable."
  → Velocity measures completed points, not delivered value.

  "The architecture diagram shows clean separation."
  → The diagram shows intended design, not actual dependencies.

  "Big O says this is optimal."
  → Big O says this is asymptotically optimal. Your input
    size may not be in the asymptotic range.

  "Our SLO is met."
  → The SLO is an aggregate. Individual users may be suffering.

Knowing What the Model Simplifies Away

The skill is not abandoning models. It is knowing what each model does not capture.

For every model you use, know:

  1. What assumptions does it make?
     Big O describes behavior as input size grows without bound
     and typically assumes uniform-cost operations.
     Story points assume consistent team composition and velocity.
     Architecture diagrams assume clean boundaries.

  2. When do those assumptions break?
     Big O breaks for small n or when cache effects dominate.
     Story points break when the team changes or faces new domains.
     Architecture diagrams break when runtime behavior doesn't
     match design-time intent.

  3. What would you need to check to validate the model?
     Big O: benchmark with real data sizes and hardware.
     Story points: compare estimates to actuals over time.
     Architecture diagrams: trace actual production traffic.

  4. What would the model fail to warn you about?
     Big O won't warn you about memory allocation patterns.
     Story points won't warn you about team morale.
     Architecture diagrams won't warn you about cascading failures.
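
The validation step for estimates (comparing estimates to actuals over time) can be sketched as a simple ratio check. The data below is invented, and the unit (days) and the choice of median and worst-case ratio as summaries are assumptions made for the example.

```python
from statistics import median


def estimate_ratios(tickets):
    """Actual effort divided by estimated effort for each (estimate, actual) pair."""
    return [actual / estimate for estimate, actual in tickets if estimate > 0]


def summarize(tickets):
    """Summarize how far actuals ran over (ratio > 1) or under (ratio < 1) estimates."""
    ratios = estimate_ratios(tickets)
    return {"median_ratio": median(ratios), "worst_ratio": max(ratios)}


# Hypothetical history: (estimated days, actual days).
history = [(2, 2), (3, 4), (5, 9), (8, 20)]
print(summarize(history))
```

A ratio that grows with the size of the estimate, as it does in this made-up history, suggests the model is least reliable exactly where the stakes are highest.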

Using Models Responsibly

Rule 1: State the model's limitations when presenting it
  Don't say: "This takes O(n log n)"
  Say: "This takes O(n log n) for the sorting step.
       For our data size of 10K records, the serialization
       cost will likely dominate."

Rule 2: Use multiple models for important decisions
  Don't rely solely on Big O for performance decisions.
  Use Big O AND benchmarks AND profiling.
  Don't rely solely on velocity for planning.
  Use velocity AND team input AND historical comparison.

Rule 3: Regularly compare the model to reality
  If your velocity model says the project takes 3 months
  and you're at month 2 with 30% complete, the model is
  wrong. Update it instead of hoping reality will catch up.

Rule 4: Replace the model when it stops being useful
  If code coverage no longer correlates with defect rates
  on your team, stop using it as a quality signal.
  Find a better model (mutation testing, defect escape rate).
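
The comparison in Rule 3 can be made explicit with a one-line projection. The sketch below assumes progress continues at the pace observed so far, which is itself a model with its own limitations. The function name projected_total is hypothetical.

```python
def projected_total(elapsed, fraction_complete):
    """Project total duration by extrapolating the pace observed so far."""
    if not 0 < fraction_complete <= 1:
        raise ValueError("fraction_complete must be in (0, 1]")
    return elapsed / fraction_complete


planned = 3  # months, from the original plan
projection = projected_total(elapsed=2, fraction_complete=0.30)
print(f"planned {planned} months, projected {projection:.1f} months")
# planned 3 months, projected 6.7 months
```

A projection more than double the plan is a signal to update the plan now, not at month 3.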

Common Pitfalls

Treating the model as reality: The most fundamental error. Code coverage is not quality. Velocity is not productivity. The architecture diagram is not the system. Every model is a lossy compression of reality. Treat it as such.
Discarding models because they are imperfect: "All models are wrong" does not mean "all models are useless." Big O notation is imperfect, but it has reliably guided countless algorithm choices. The right response to an imperfect model is to understand its limitations, not to abandon it.
Using a single model for complex decisions: No single model captures enough of reality for a high-stakes decision. Use multiple models (quantitative and qualitative, theoretical and empirical) and look for where they agree and disagree.
Not updating models as context changes: A model that was accurate six months ago may be misleading today. Team composition changed, traffic patterns shifted, requirements evolved. Models have expiration dates. Review them regularly.
Confusing precision with accuracy: A model that says "this will take 47.3 story points" is precise, but that does not make it accurate. Precision implies confidence that the model does not warrant. Round numbers (about 50 points, roughly 3 months) more honestly represent the model's uncertainty.

Key Takeaways

All models are simplifications. Big O ignores constants, architecture diagrams omit details, velocity ignores context, and SLOs aggregate away individual experience. This is what makes models useful and also what makes them wrong.
The danger is forgetting a model is a model. When decisions are made on the model instead of reality (shipping because coverage is 90%, planning based on stable velocity), the model has replaced the territory.
For every model you rely on, know its assumptions, when those assumptions break, and what it cannot warn you about. This knowledge is the difference between using a model and being used by one.
Use multiple models for important decisions. Where they agree, you have signal. Where they disagree, you have discovered something important about the problem.
Models have expiration dates. A model that was useful in one context may be misleading in another. Review your models regularly and replace them when they stop correlating with the outcomes you care about.