Pre-Mortems

Overview

A post-mortem asks "what went wrong?" after a failure. A pre-mortem asks "what went wrong?" before you start. The technique, developed by psychologist Gary Klein, works like this: imagine your project has already failed. Now explain why it failed.

This simple shift in framing is surprisingly powerful. It counteracts the optimism bias that plagues planning. When a team is asked "what could go wrong?", people tend to downplay risks to avoid looking negative. When asked "the project failed, why?", they compete to identify the most plausible cause. The framing gives people permission to be honest about risks they already see but hesitate to voice.

How to Run a Pre-Mortem

Step 1: Gather the team
  Everyone involved in the project. Include engineers, product
  managers, designers, and anyone who will be affected by the
  outcome. 5-8 people is ideal.

Step 2: Set the frame
  "It is six months from now. This project has failed completely.
  It did not ship, or it shipped and was a disaster.
  Take 5 minutes. Write down independently why it failed."

  Key word: independently. No discussion yet. This prevents
  groupthink and ensures the loudest person doesn't anchor
  everyone else's thinking.

Step 3: Collect and share
  Go around the room. Each person reads their failure reasons.
  No judgment, no rebuttals. Just collect.

Step 4: Cluster and prioritize
  Group similar failure reasons. Vote on which are most likely
  and most impactful.

Step 5: Mitigate
  For the top 3-5 risks, define concrete actions:
  → What would prevent this failure mode?
  → What early warning signal would tell us this is happening?
  → Who owns the mitigation, and by when?

Step 6: Revisit
  Check back on these risks at regular intervals (monthly or
  at each milestone). Were we right? Have new risks appeared?

Why Pre-Mortems Work Better Than Risk Assessment

Traditional risk assessment:
  "What could go wrong?"
  → People minimize risks to appear confident
  → The planning fallacy kicks in: we assume the best case
  → Risks feel hypothetical and distant
  → The team has already committed to the plan emotionally

Pre-mortem:
  "The project failed. Why?"
  → People compete to identify the most compelling failure
  → The framing breaks the optimism bias
  → Risks feel concrete because the failure has "already happened"
  → No one has to be the pessimist — everyone is explaining
    what went wrong

A 1989 study of prospective hindsight (imagining that an
event has already occurred), the idea behind the pre-mortem,
found that it increases the ability to correctly identify
reasons for future outcomes by 30%.

Pre-Mortem for a System Migration

Project: Migrate the payment service from a monolith to
a standalone microservice

Pre-mortem framing: "It's three months later. The migration
failed. We had to roll back to the monolith. Why?"

Team member responses (written independently):

Engineer A:
  "We didn't account for the 47 places in the monolith that
   directly query the payments table. The new service had an API
   but half the callers were still hitting the database directly."

Engineer B:
  "The data migration had a bug. Some payment records were
   duplicated. We didn't discover this until customers were
   charged twice."

Product Manager:
  "We set a deadline based on the executive presentation date,
   not based on the actual complexity. We cut testing to hit
   the deadline."

Engineer C:
  "The new service couldn't handle the traffic spike on the
   first of the month (billing day). We load tested with
   average traffic, not peak traffic."

QA Engineer:
  "We tested the happy path but not the failure paths. When
   the new service went down, the monolith had no fallback.
   In the old architecture, payments just worked because
   everything was in-process."

Top risks after clustering and voting:
  1. Hidden database dependencies (Engineer A)
  2. Data migration errors (Engineer B)
  3. Deadline-driven scope cuts (Product Manager)
  4. Load testing gaps (Engineer C)

Mitigations:
  1. Audit all database queries for payments tables before
     starting. No migration until we have a complete dependency map.
  2. Run data migration on a copy of production data first.
     Reconcile every record. Automate the reconciliation check.
  3. Define a minimum viable migration. Anything below that
     scope means we delay, not cut quality.
  4. Load test with 2x peak traffic, not average traffic.
     Run the test monthly leading up to cutover.
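
The automated reconciliation check in mitigation 2 could look something like the sketch below. It is a minimal illustration under stated assumptions, not the team's actual tooling: it assumes each payment record can be reduced to a (payment_id, amount_cents) pair, and the reconcile function and its report keys are hypothetical names. It flags exactly the failure Engineer B described (duplicated records), plus missing, unexpected, and altered ones.

```python
from collections import Counter

# Hypothetical record shape: (payment_id, amount_cents).
Record = tuple[str, int]


def reconcile(source: list[Record], migrated: list[Record]) -> dict[str, list[str]]:
    """Compare payment records before and after a migration run."""
    source_amounts = dict(source)
    migrated_amounts = dict(migrated)
    migrated_counts = Counter(payment_id for payment_id, _ in migrated)

    source_ids = set(source_amounts)
    migrated_ids = set(migrated_amounts)

    return {
        # In the source but never copied.
        "missing": sorted(source_ids - migrated_ids),
        # In the target with no matching source record.
        "unexpected": sorted(migrated_ids - source_ids),
        # Copied more than once (the double-charge scenario).
        "duplicated": sorted(pid for pid, n in migrated_counts.items() if n > 1),
        # Present on both sides but with a different amount.
        "amount_mismatch": sorted(
            pid for pid in source_ids & migrated_ids
            if source_amounts[pid] != migrated_amounts[pid]
        ),
    }


source = [("p1", 1000), ("p2", 2500), ("p3", 400)]
migrated = [("p1", 1000), ("p2", 2500), ("p2", 2500), ("p4", 999)]

report = reconcile(source, migrated)
is_clean = not any(report.values())
print(report)
print("safe to cut over:", is_clean)
```

Running a check like this against a copy of production data, and failing the migration job whenever the report is not empty, turns "reconcile every record" from an intention into a gate.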

Pre-Mortem for a Sprint

Pre-mortems are not just for large projects. A 10-minute pre-mortem at the start of a sprint can be one of the highest-value planning investments a team makes.

Sprint goal: Ship the new user onboarding flow

Pre-mortem framing: "The sprint ended. We didn't ship the
onboarding flow. Why?"

Team responses:

  "The design wasn't finalized. We started building based on
   wireframes and the final designs required significant changes
   midway through the sprint."

  "We underestimated the email integration. The email service
   has a 3-day SLA for new template approvals and we didn't
   submit the templates before sprint start."

  "The A/B testing framework had a bug that took two days to
   diagnose. We assumed it worked because the last experiment
   was months ago."

  "Two of the three engineers had on-call shifts this sprint.
   One incident took 8 hours."

Mitigations:
  → Get design sign-off before sprint starts (or timebox
    design changes to day 1-2 only)
  → Submit email templates today, before sprint officially begins
  → Test the A/B framework with a dummy experiment on day 1
  → Account for on-call load in sprint capacity planning
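
The last mitigation comes down to simple arithmetic. As a rough sketch, assuming a 10-day sprint and assuming on-call duty consumes about 25% of an engineer's time (both numbers are illustrative assumptions, as is the function name), the team's real capacity can be estimated like this:

```python
def available_engineer_days(
    engineers: int,
    sprint_days: int,
    on_call_engineers: int,
    on_call_fraction: float,
) -> float:
    """Estimate engineer-days left for sprint work after reserving on-call time."""
    if not 0 <= on_call_fraction <= 1:
        raise ValueError("on_call_fraction must be between 0 and 1")
    if on_call_engineers > engineers:
        raise ValueError("on_call_engineers cannot exceed engineers")

    total = engineers * sprint_days
    reserved = on_call_engineers * sprint_days * on_call_fraction
    return total - reserved


# Two of the three engineers are on call, as in the example above.
capacity = available_engineer_days(
    engineers=3, sprint_days=10, on_call_engineers=2, on_call_fraction=0.25
)
print(capacity)  # 25.0 engineer-days instead of the nominal 30
```

The right fraction depends on the team's incident history. The point is to plan against the reduced number rather than the nominal one.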

Pre-Mortem for a Product Launch

Project: Launch the public API for external developers

Pre-mortem framing: "The API launched three months ago. It's
considered a failure. Adoption is near zero and the few users
we have are angry. Why?"

Team responses:

  "The documentation was auto-generated from code comments.
   It was technically accurate but had no tutorials, no quick
   start guide, no example applications. Developers couldn't
   figure out how to get started."

  "We didn't version the API from day one. We made breaking
   changes in the first month. Early adopters had to rewrite
   their integrations. They posted about it publicly."

  "Rate limits were too aggressive. Developers hit the limit
   during normal usage and got cryptic 429 errors with no
   explanation of the limit or when it resets."

  "The authentication flow required six steps. Competitors
   required two. Developers gave up during onboarding."

  "We had no way to communicate with API users. When we
   had planned downtime, they found out when their calls
   started failing."

Top risks:
  1. Documentation quality
  2. API versioning
  3. Developer experience (rate limits, auth, errors)
  4. Communication channel

Mitigations:
  1. Write a quick start guide and three example apps before
     launch. Have an external developer test the docs.
  2. Version the API from day one. Define the breaking change
     policy. Commit to 6-month deprecation windows.
  3. Test the full developer experience end-to-end. Simplify
     auth. Make rate limit errors include the limit, current
     usage, and reset time.
  4. Create a developer changelog, status page, and mailing
     list before launch.
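
To make mitigation 3 concrete, here is a minimal sketch of a rate limit error that carries the limit, current usage, and reset time. The function name and the JSON field names are hypothetical. Retry-After is a standard HTTP header; the X-RateLimit-* headers are a widely used convention rather than a standard, so treat those names as an assumption.

```python
import json
import math
import time
from typing import Optional


def rate_limit_response(
    limit: int,
    used: int,
    reset_epoch: int,
    now: Optional[float] = None,
) -> tuple[int, dict[str, str], str]:
    """Build a 429 response that explains the limit and when to retry."""
    now = time.time() if now is None else now
    # Round up so clients never retry a moment too early.
    retry_after = max(0, math.ceil(reset_epoch - now))

    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),  # standard header, value in seconds
        "X-RateLimit-Limit": str(limit),  # conventional, not standardized
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    body = {
        "error": "rate_limit_exceeded",
        "message": f"Rate limit of {limit} requests exceeded. "
                   f"Retry in {retry_after} seconds.",
        "limit": limit,
        "used": used,
        "reset_epoch": reset_epoch,
        "retry_after_seconds": retry_after,
    }
    return 429, headers, json.dumps(body)


status, headers, body = rate_limit_response(
    limit=100, used=100, reset_epoch=1_700_000_060, now=1_700_000_000
)
print(status, headers["Retry-After"])
print(body)
```

A developer who receives this response can see what the limit is, that they have exhausted it, and exactly how long to wait, which addresses the "cryptic 429" complaint directly.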

Pre-Mortem for Architecture Decisions

Decision: Adopt event-driven architecture for the order system

Pre-mortem framing: "We adopted event-driven architecture a
year ago. The team regrets it. Why?"

Team responses:

  "Debugging is a nightmare. When an order fails, we have to
   trace events across five services. There's no single place
   to see the full order lifecycle."

  "Event ordering issues cause subtle bugs. A 'payment received'
   event sometimes arrives before the 'order created' event.
   The code handles this most of the time but not always."

  "The team doesn't understand event-driven patterns. Most
   engineers came from a request-response background. They're
   building request-response patterns on top of events, getting
   the worst of both worlds."

  "We have no way to replay events. When we need to fix a bug
   in an event handler and reprocess, we have to write a
   one-off script every time."

Mitigations:
  → Build distributed tracing from day one, not after problems appear
  → Design for out-of-order events explicitly. Every handler
    must be idempotent and handle events arriving in any order.
  → Invest in training. Run workshops on event-driven patterns
    before starting implementation.
  → Build event replay capability into the infrastructure
    before writing the first event handler.
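
The second mitigation is easier to reason about with a small example. The sketch below is one simplified way to build a handler that is idempotent and tolerant of out-of-order events. It assumes every event carries a unique event_id and an order_id, keeps its state in memory, and uses hypothetical class and event names. A real system would persist this state and would likely need more than two event types.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    created: bool = False
    paid: bool = False


@dataclass
class OrderProjection:
    """In-memory view of orders, built from events in whatever order they arrive."""
    orders: dict[str, Order] = field(default_factory=dict)
    seen_event_ids: set[str] = field(default_factory=set)

    def handle(self, event: dict) -> None:
        # Idempotency: an event that was already applied (redelivery or
        # replay) is skipped, so processing it twice changes nothing.
        if event["event_id"] in self.seen_event_ids:
            return

        # Out-of-order tolerance: if 'payment_received' arrives before
        # 'order_created', start a placeholder order instead of failing.
        order = self.orders.setdefault(event["order_id"], Order(event["order_id"]))

        if event["type"] == "order_created":
            order.created = True
        elif event["type"] == "payment_received":
            order.paid = True

        self.seen_event_ids.add(event["event_id"])

    def ready_to_fulfil(self, order_id: str) -> bool:
        order = self.orders.get(order_id)
        return order is not None and order.created and order.paid


events = [
    {"event_id": "e2", "order_id": "o1", "type": "payment_received"},  # arrives first
    {"event_id": "e1", "order_id": "o1", "type": "order_created"},
    {"event_id": "e2", "order_id": "o1", "type": "payment_received"},  # duplicate
]

projection = OrderProjection()
for event in events:
    projection.handle(event)

print(projection.ready_to_fulfil("o1"))  # True
```

Because each handler records what it has already applied, the same event log can be fed through a fresh projection to rebuild state, which is also what makes the replay capability in the last mitigation practical.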

Common Pre-Mortem Failure Categories

When running a pre-mortem, prompt the team to consider these categories if they get stuck.

Category              Example failures
──────────────────────────────────────────────────────────────
Technical risk        "The third-party API we depend on has no SLA"
People risk           "The only person who knows the auth system
                       is on paternity leave during the migration"
Scope risk            "Requirements will change midway through"
Timeline risk         "The deadline assumes everything goes right"
Dependency risk       "We need the platform team to deploy X first
                       and they haven't committed to a date"
Knowledge risk        "Nobody on the team has built this before"
Operational risk      "We have no runbook for the new system"
Data risk             "Migration could corrupt or lose records"
Communication risk    "Stakeholders expect X but we're building Y"

Making Pre-Mortems a Habit

When to run a pre-mortem:

  Always:
    → Before any project longer than 2 weeks
    → Before a major migration or architectural change
    → Before a product launch

  Recommended:
    → At the start of each sprint (10-minute version)
    → Before any decision that is hard to reverse
    → When joining a project that's already in progress
      ("if this project fails from here, why?")

  Format options:
    → Full session: 30-60 minutes for major projects
    → Quick round: 10 minutes at sprint planning
    → Async: Post the prompt in Slack, collect responses
      over 24 hours, discuss the top risks synchronously

  The async format works well for distributed teams. The key
  is the independent writing step — people must write their
  own failure reasons before seeing others' responses.
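
If the async round runs through a tool rather than a chat thread, the "write before you read" rule can be enforced rather than requested. The following is a minimal in-memory sketch of that idea. The class, its method names, and the participant names are hypothetical, and it does not talk to Slack or any other real service.

```python
class AsyncPreMortem:
    """Collects written failure reasons and hides them from anyone who has not submitted."""

    def __init__(self, prompt: str) -> None:
        self.prompt = prompt
        self._responses: dict[str, list[str]] = {}

    def submit(self, person: str, reasons: list[str]) -> None:
        if not reasons:
            raise ValueError("submit at least one failure reason")
        self._responses[person] = list(reasons)

    def responses_visible_to(self, person: str) -> dict[str, list[str]]:
        # Enforce independence: no submission, no visibility.
        if person not in self._responses:
            return {}
        return {name: list(reasons) for name, reasons in self._responses.items()}


session = AsyncPreMortem("It's three months later. The migration failed. Why?")
session.submit("ana", ["Hidden callers still query the payments table directly"])

before = session.responses_visible_to("ben")  # ben has not written anything yet
session.submit("ben", ["We load tested with average traffic, not peak traffic"])
after = session.responses_visible_to("ben")

print(before)
print(sorted(after))
```

In a plain chat channel, a similar effect can be had by asking people to send their responses privately to the facilitator, who posts them together once the window closes.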

Common Pitfalls

Skipping the independent writing step: If you just ask the room "why did it fail?" the first person to speak anchors everyone else. The power of pre-mortems comes from independent, parallel thinking. People must write their reasons before any discussion.
Not acting on the results: A pre-mortem that produces a list of risks and no mitigations is a waste of time. Every pre-mortem must end with specific actions assigned to specific people with specific deadlines.
Only considering technical risks: Projects fail for people and organizational reasons, not just technical ones. Make sure the pre-mortem covers communication gaps, knowledge concentration, unrealistic timelines, and organizational dependencies.
Running it too late: A pre-mortem after the architecture is locked in and the deadline is set has limited value. Run it early enough that you can actually change the plan based on what you learn.
Treating it as a one-time event: The risks you identify at the start of a project are not the only risks. Revisit the pre-mortem results at each milestone. Add new risks as the project progresses. Some original risks will have been mitigated; new ones will have emerged.
Dismissing "unlikely" risks: The value of a pre-mortem is surfacing risks that people feel but don't voice. If someone raises a risk that seems unlikely, don't dismiss it. Discuss what the early warning signs would be and what a low-cost mitigation looks like.

Key Takeaways

A pre-mortem imagines the project has already failed and asks why. This framing breaks optimism bias and gives people permission to voice risks they already see but hesitate to raise.
The independent writing step is essential. Everyone writes down their failure reasons before discussion. This prevents groupthink and ensures diverse perspectives surface.
Apply pre-mortems to projects, sprints, migrations, architecture decisions, and product launches. Scale the format from 10-minute quick rounds to 60-minute full sessions based on the stakes.
Every pre-mortem must produce specific mitigations with owners and deadlines. A list of risks without actions is planning theater.
Revisit pre-mortem results regularly. Risks evolve as the project progresses. Some are mitigated, new ones emerge, and the likelihood of others changes based on what you learn during execution.