Reproduce First

You cannot fix what you cannot reproduce. This is the most violated rule in debugging. Engineers jump straight to reading code, forming theories, and making changes — without ever confirming they can reliably trigger the bug. A fix applied to a bug you cannot reproduce is a guess. You will not know if you fixed it, you will not know if you broke something else, and you will not know if it comes back.

Why Reproduction Comes First

Reproduction is not a preliminary step. It is the core of debugging. When you can reliably trigger a bug, every later step gets shorter. Compare the two workflows:

Without reproduction:
  1. Read code and form a theory
  2. Make a change based on the theory
  3. Deploy the change
  4. Wait to see if the bug stops happening
  5. It does not stop happening
  6. Go back to step 1 with a different theory
  7. Repeat for days

With reproduction:
  1. Trigger the bug
  2. Read the error, inspect the state
  3. Form a theory
  4. Make a change
  5. Trigger the bug again
  6. Confirm the fix
  7. Done

The difference is not incremental. It is the difference between systematic progress and wandering in the dark.

The Minimal Reproduction

A minimal reproduction is the smallest set of steps, inputs, and conditions that triggers the bug. "Minimal" is the key word. The bug report says "the app crashes when I upload a file." That is not a reproduction. A minimal reproduction is:

Environment: macOS 14.2, Chrome 120, Node 20.10
Steps:
  1. Start the server with: npm run dev
  2. Navigate to /upload
  3. Select a PNG file larger than 5MB
  4. Click "Upload"
Expected: File uploads successfully
Actual: Server returns 500, logs show "PayloadTooLargeError"

Every detail matters. The file size matters. The file type might matter. The browser might matter. You do not know which details matter until you start removing them.

How to Minimize a Reproduction

Start with the full scenario that triggers the bug, then remove variables one at a time:

Minimization process:
  1. Start with the exact conditions from the bug report
  2. Remove one variable
  3. Does the bug still happen?
     Yes -> that variable was irrelevant, keep it removed
     No  -> that variable is part of the cause, put it back
  4. Repeat until you cannot remove anything else

Example:
  "Bug happens when user with admin role uploads a CSV
   file over 10MB on the /admin/import page after logging
   in through SSO on Firefox."

  Remove: admin role -> bug still happens -> not role-specific
  Remove: CSV format -> try PNG -> bug still happens -> not format-specific
  Remove: over 10MB -> try 1KB file -> bug goes away -> SIZE MATTERS
  Remove: /admin/import -> try /upload -> bug goes away -> ROUTE MATTERS
  Remove: SSO login -> try password login -> bug still happens -> not auth-specific
  Remove: Firefox -> try Chrome -> bug still happens -> not browser-specific

  Minimal repro: upload any file over 10MB on /admin/import

Now you know exactly where to look: the /admin/import route has a different file size limit or a different upload handler than /upload.
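The one-variable-at-a-time loop above can be sketched in a few lines of Python. This is a minimal illustration, not a general tool: `minimize`, the scenario dictionaries, and the `simulated_bug` callback are hypothetical names, and the callback only simulates the example bug rather than driving a real application.

```python
from typing import Callable, Dict

Scenario = Dict[str, str]


def minimize(
    failing: Scenario,
    neutral: Scenario,
    triggers_bug: Callable[[Scenario], bool],
) -> Scenario:
    """Return the variables that must keep their failing value.

    `failing` holds the conditions from the bug report; `neutral` holds a
    harmless substitute for each one.
    """
    current = dict(failing)
    essential: Scenario = {}
    for name in failing:
        trial = dict(current)
        trial[name] = neutral[name]  # "remove" one variable
        if triggers_bug(trial):
            current = trial  # irrelevant: keep it removed
        else:
            essential[name] = failing[name]  # part of the cause: put it back
    return essential


# Hypothetical stand-in for "run the steps and see whether the bug appears".
def simulated_bug(s: Scenario) -> bool:
    return s["size"] == "11MB" and s["route"] == "/admin/import"


report = {
    "role": "admin", "format": "CSV", "size": "11MB",
    "route": "/admin/import", "auth": "SSO", "browser": "Firefox",
}
substitutes = {
    "role": "regular", "format": "PNG", "size": "1KB",
    "route": "/upload", "auth": "password", "browser": "Chrome",
}

minimal = minimize(report, substitutes, simulated_bug)
print(minimal)  # {'size': '11MB', 'route': '/admin/import'}
```

In real use, `triggers_bug` would run the actual reproduction steps, which is usually the slow part of each iteration.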

Environment-Specific Bugs

Some bugs only happen in certain environments. These are the hardest to reproduce and the most important to isolate.

Common environmental factors:
  - Operating system (Linux vs macOS vs Windows)
  - Runtime version (Node 18 vs Node 20)
  - Database state (empty vs seeded vs production-like)
  - Configuration (dev defaults vs production settings)
  - Network conditions (localhost vs remote, fast vs slow)
  - Time and timezone (bugs that only happen at midnight UTC)
  - Concurrency (single user vs multiple simultaneous users)
  - Memory/disk (bugs that only appear under resource pressure)

The "Works on My Machine" Problem

When a bug happens in production but not locally, the difference IS the bug. Systematically compare the environments:

Comparison checklist:
  1. Runtime version (exact patch version, not just major)
  2. Environment variables (especially secrets and config)
  3. Database schema and data (migrations, seed data)
  4. Network topology (localhost vs load balancer vs CDN)
  5. File system (case sensitivity, permissions, disk space)
  6. Dependencies (lockfile differences, native modules)
  7. OS and architecture (x86 vs ARM, Linux vs macOS)

The fastest way to find the difference is to make your local environment as close to production as possible. Docker helps. Production-like seed data helps. Replaying production traffic helps most of all.
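Part of the comparison checklist can be automated by capturing a snapshot on each machine and diffing the two. The sketch below assumes a Python runtime purely for illustration; `snapshot` and `diff_snapshots` are hypothetical helpers, and the sample values are made up. The same idea applies to any runtime that can report its own version, OS, and architecture.

```python
import platform
import sys
from typing import Dict, Tuple


def snapshot() -> Dict[str, str]:
    """Collect a few facts about the current runtime."""
    return {
        "python_version": platform.python_version(),  # exact patch version
        "os": platform.system(),
        "architecture": platform.machine(),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


def diff_snapshots(
    local: Dict[str, str], remote: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return only the keys whose values differ, as (local, remote) pairs."""
    keys = sorted(set(local) | set(remote))
    return {
        k: (local.get(k, "<missing>"), remote.get(k, "<missing>"))
        for k in keys
        if local.get(k) != remote.get(k)
    }


# Made-up values for illustration; real snapshots would come from running
# snapshot() on each machine.
local_env = {"python_version": "3.12.1", "os": "Darwin", "architecture": "arm64"}
prod_env = {
    "python_version": "3.12.0", "os": "Linux",
    "architecture": "x86_64", "TZ": "UTC",
}

for key, (mine, theirs) in diff_snapshots(local_env, prod_env).items():
    print(f"{key}: local={mine} production={theirs}")
```

Every line this prints is a candidate cause, which turns "works on my machine" into a finite list to work through.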

Reproducing Intermittent Bugs

Intermittent bugs — bugs that happen "sometimes" — are not random. They are deterministic bugs with a trigger you have not identified yet. Common hidden triggers:

Race conditions:
  The bug depends on the ordering of concurrent operations.
  Reproduction: run the operation in a tight loop 1000 times.
  Or add artificial delays to force a specific ordering.

Timing-dependent:
  The bug depends on how fast something happens.
  Reproduction: add network latency, slow down disk I/O,
  throttle CPU. Tools like tc (traffic control) and
  stress-ng help here.

State-dependent:
  The bug depends on accumulated state over time.
  Reproduction: do not start fresh. Run the operation
  after the system has been running for a while, or
  after specific prior operations.

Resource-dependent:
  The bug depends on memory pressure, disk space, or
  connection pool exhaustion.
  Reproduction: artificially limit resources. Run with
  less memory, fill up disk to 95%, set connection pool
  size to 1.

If you cannot reproduce an intermittent bug after exhausting these approaches, add more logging and wait. Ship instrumentation, not a guess-fix.

Reproduction in Different Contexts

Backend Bugs

Best tools for backend reproduction:
  - curl or httpie for API endpoints
  - A test database with known state
  - Docker Compose for the full service stack
  - Request replay from production logs (sanitized)
  - Unit tests that call the function directly

Write the reproduction as a test. Even before you fix the bug, write a failing test that demonstrates it. This gives you a reliable reproduction that also serves as a regression test.
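Request replay can start as a small script that turns one sanitized log entry into a request against a local server. The following is a sketch under stated assumptions: the log entry's field names, the `SENSITIVE_HEADERS` list, the `build_replay_request` helper, and the `http://localhost:3000` address are all hypothetical, and a real log format will differ.

```python
import json
import urllib.request
from typing import Any, Dict

# Example headers that should never be copied out of production logs.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def build_replay_request(
    entry: Dict[str, Any], base_url: str
) -> urllib.request.Request:
    """Turn one logged request into a Request aimed at a local server."""
    headers = {
        name: value
        for name, value in entry.get("headers", {}).items()
        if name.lower() not in SENSITIVE_HEADERS
    }
    body = entry.get("body")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=base_url.rstrip("/") + entry["path"],
        data=data,
        headers=headers,
        method=entry["method"],
    )


# Hypothetical log entry; the field names are assumptions, not a real format.
logged = {
    "method": "POST",
    "path": "/admin/import",
    "headers": {
        "Content-Type": "application/json",
        "Authorization": "Bearer secret",
    },
    "body": {"rows": 3},
}

request = build_replay_request(logged, "http://localhost:3000")
print(request.get_method(), request.full_url)
# To send it against a running local server: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sanitizing step easy to check before anything touches the network.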

Frontend Bugs

Best tools for frontend reproduction:
  - Browser DevTools (Network tab for request/response)
  - Specific browser versions (not just "Chrome")
  - Device emulation for mobile bugs
  - Slow network simulation (DevTools throttling)
  - Specific screen sizes and zoom levels
  - Accessibility tools for screen reader bugs

Frontend bugs are often viewport-specific or interaction-timing-specific. Record the exact sequence of clicks, not just "click the button."

Data Bugs

Best tools for data reproduction:
  - A snapshot of the problematic data (anonymized if needed)
  - SQL queries that create the minimal bad state
  - Seed scripts that set up the reproduction
  - Database dumps from before and after the bug

Data bugs are often the hardest because the reproduction requires specific data. If you cannot get production data, work backward from the error to construct the minimal data state that triggers it.
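Working backward from an error to a minimal data state might look like the sketch below, which uses an in-memory SQLite database. The schema, the `order_total_cents` function, and the NULL-quantity trigger are all invented for illustration; the point is that a single hand-written row can stand in for a production snapshot.

```python
import sqlite3


def order_total_cents(conn: sqlite3.Connection, order_id: int) -> int:
    """Hypothetical function under investigation: sums price * quantity."""
    rows = conn.execute(
        "SELECT price_cents, quantity FROM order_items WHERE order_id = ?",
        (order_id,),
    ).fetchall()
    return sum(price * quantity for price, quantity in rows)


def make_minimal_bad_state() -> sqlite3.Connection:
    """Create the smallest database that triggers the failure."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_items ("
        "order_id INTEGER, price_cents INTEGER, quantity INTEGER)"
    )
    # One row is enough: the NULL quantity is the suspected trigger.
    conn.execute("INSERT INTO order_items VALUES (1, 500, NULL)")
    return conn


conn = make_minimal_bad_state()
try:
    order_total_cents(conn, 1)
except TypeError as exc:
    print(f"reproduced: {exc}")
```

Once the failure reproduces from one row, the same setup can become the seed script for a regression test.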

The Reproduction As a Test

The best reproduction is a test that fails:

import io

# This test should pass but currently fails due to bug #1234
def test_upload_large_file_on_admin_import(client):
    # `client` is assumed to be an HTTP test-client fixture for the app
    # Setup: build an in-memory file just over the 10MB threshold
    large_file = io.BytesIO(b"\0" * (11 * 1024 * 1024))

    # Act: upload via the admin import endpoint
    response = client.post("/admin/import", files={"file": ("large.bin", large_file)})

    # Assert: should succeed
    assert response.status_code == 200

    # Currently fails with: 500 PayloadTooLargeError

This test is now your reproduction, your documentation, and your regression guard. When the test passes, the bug is fixed. When the test stays in the suite, the bug cannot come back without being caught.

When You Cannot Reproduce

Sometimes, after genuine effort, you cannot reproduce the bug. That failure is still information, and there are productive next steps:

If you cannot reproduce:
  1. Add logging/instrumentation to capture more context
     next time it happens
  2. Set up alerts for the specific error pattern
  3. Ask the reporter for more details (screen recordings,
     network traces, exact timestamps)
  4. Check if it was a transient infrastructure issue
     (deploy in progress, database failover, network blip)
  5. Document what you tried and what you ruled out
  6. Do NOT push a speculative fix. Wait for more data.

A speculative fix to a bug you cannot reproduce is worse than no fix. It gives false confidence, clutters the code, and leaves the bug in place because you never actually understood it.
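Shipping instrumentation instead of a guess can be as small as a decorator that records the inputs of a failing call and then re-raises. This is a minimal sketch: `capture_context`, the `repro` logger name, and `send_report` are hypothetical, and real code should take care not to log secrets or personal data that appear in the arguments.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger("repro")


def capture_context(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log the arguments of a failing call, then re-raise unchanged."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception(
                "%s failed with args=%r kwargs=%r", func.__name__, args, kwargs
            )
            raise

    return wrapper


@capture_context
def send_report(user_id: int, day: str) -> str:
    """Hypothetical operation that fails under some unknown condition."""
    if day == "Tuesday":
        raise RuntimeError("report not scheduled")
    return f"sent to {user_id}"
```

Because the decorator re-raises, behavior is unchanged for callers; the only difference is that the next occurrence leaves behind the arguments needed to attempt a reproduction.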

Real-World Example: The Tuesday Bug

A team had a bug that only happened on Tuesdays. Users reported that scheduled reports were not being sent. The team spent two weeks adding retries, increasing timeouts, and improving error handling. None of it helped. The bug kept happening on Tuesdays.

Finally, someone tried to reproduce it by setting their system clock to a Tuesday. They discovered that the cron expression for Tuesday was wrong: it was set to run at 25:00 instead of 02:00 (a typo in the 24-hour time). The scheduler silently skipped invalid times. The fix was a one-line change.

The lesson: two weeks of speculative fixes versus one hour of actual reproduction.

Common Pitfalls

Fixing without reproducing: making changes based on theory alone. You cannot confirm your fix works if you never confirmed the bug exists in your environment.
Reproducing in the wrong environment: trying to reproduce in staging a bug that only happens in production. The environments are different; that difference might be the cause.
Accepting "it just happens sometimes": intermittent bugs have deterministic causes. "Sometimes" means "under conditions you have not identified yet."
Over-complicating the reproduction: trying to reproduce the exact production scenario instead of minimizing. Start big, then strip away variables.
Not writing the reproduction down: you reproduce the bug, fix it, and move on. Six months later the same bug comes back and nobody remembers how to trigger it. Write the reproduction as a test.

Key Takeaways

Reproduction is not a preliminary step. It is the main work of debugging. Without reproduction, you are guessing.
Minimal reproduction means removing every variable that is not necessary to trigger the bug. Each variable you remove narrows the cause.
Environment-specific bugs are caused by the difference between environments. Systematically compare them to find the difference.
Intermittent bugs are not random. They have deterministic triggers you have not identified — race conditions, timing, state accumulation, or resource pressure.
Write the reproduction as a failing test. It serves as reproduction, documentation, and regression prevention simultaneously.
If you cannot reproduce the bug, add instrumentation and wait. Do not push speculative fixes.