Unintended Side Effects

Overview

A widely told (and possibly apocryphal) story from British India: the colonial government wanted to reduce the cobra population, so it offered a bounty for every dead cobra brought in. It worked at first. Then people started breeding cobras to collect the bounty. When the government caught on and cancelled the program, the breeders released their now-worthless cobras. The cobra population ended up larger than before.

This is the cobra effect: a well-intentioned incentive that produces the opposite of the desired outcome. In engineering, cobra effects appear whenever you create metrics, incentives, or systems without thinking about how people will adapt their behavior in response.

Goodhart's Law

The core principle behind unintended side effects is Goodhart's Law, named after the economist Charles Goodhart and usually quoted in Marilyn Strathern's phrasing: "When a measure becomes a target, it ceases to be a good measure."

The pattern:

  1. You observe a useful metric
     → Teams that write more tests tend to have fewer bugs

  2. You make it a target
     → "All teams must achieve 90% code coverage"

  3. The metric gets gamed
     → Engineers write tests that assert nothing meaningful
     → Tests pass but don't catch bugs
     → Coverage is 95% but bug rate is unchanged

  4. The metric is now useless
     → High coverage no longer correlates with fewer bugs
     → You've lost the signal AND spent engineering time on junk tests
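
Step 3 of the pattern can be made concrete with a small sketch. The function apply_discount and both tests are hypothetical, invented only for illustration. Both tests execute every line of the function, so a line-coverage tool would report the same coverage for each, but only one of them can fail when the logic is wrong.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount (hypothetical example)."""
    return price_cents * (100 - percent) // 100


def test_gamed():
    # Runs the code, so it counts toward coverage,
    # but asserts nothing about the result.
    apply_discount(10_000, 20)


def test_meaningful():
    # Same coverage, but this fails if the arithmetic is wrong.
    assert apply_discount(10_000, 20) == 8_000
```

If the coverage target is the only thing checked, these two tests are indistinguishable on the dashboard.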

The Cobra Effect in Engineering

Rewarding Lines of Code

Incentive: Track lines of code per developer as a productivity metric

Intended effect:
  Engineers write more code, more features get shipped

Actual effect:
  Engineers write verbose code instead of concise code
  Copy-paste increases because it inflates the count
  Refactoring (which reduces lines) is punished
  Engineers avoid deleting dead code

End result:
  The codebase grows 3x faster than feature count
  Maintenance cost skyrockets
  The most "productive" engineers created the most tech debt

Rewarding Bug Fixes

Incentive: Engineers get recognition for fixing production bugs

Intended effect:
  Bugs get fixed quickly, production is more stable

Actual effect:
  Engineers who introduce bugs get more opportunities to fix them
  Some engineers delay reporting bugs they know about so they
    can "discover" and fix them during high-visibility periods
  Prevention work (testing, code review) is unrewarded
  Engineers prefer firefighting over fire prevention

End result:
  The team that fixes the most bugs is not necessarily the most
  reliable; it may simply be the one introducing the most bugs

Optimizing for Deployment Frequency

Incentive: Track deployments per week as a measure of team velocity

Intended effect:
  Teams ship smaller changes more frequently, reducing risk

Actual effect:
  Teams split changes into artificially small deploys
  Config changes and whitespace fixes count as deploys
  Risky changes get bundled into "just one deploy" at the end
    of the week, so the count comes from trivial deploys while
    the real risk ships in one large batch
  Teams avoid large refactors because they count as one deploy

End result:
  Deployment count is high but meaningful throughput is unchanged
  The metric tracks busywork, not value delivery

Categories of Unintended Side Effects

Perverse Incentives

The incentive directly rewards the wrong behavior.

Metric: Mean time to resolve incidents (MTTR)
Gaming: Classify incidents as lower severity to avoid the clock
        Close incidents before they're actually resolved
        Avoid declaring incidents at all

Metric: Sprint velocity (story points completed)
Gaming: Inflate story point estimates
        Avoid taking on risky or uncertain work
        Declare stories "done" before edge cases are handled

Metric: Response time SLA (99th percentile under 500ms)
Gaming: Time out slow requests at 499ms (they fail, but failed
        requests are excluded from the latency percentile)
        Move slow operations to async (user still waits,
        but the API responds fast)
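
The latency example can be sketched numerically. The sample data, the 499 ms cutoff handling, and the p99 helper below are assumptions made for illustration, not a description of any particular monitoring system. The point is that dropping timed-out requests from the percentile makes the SLA look healthy while the failures show up only if someone also looks at the error count.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list (integer math)."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


# Hypothetical sample: 97 fast requests and 3 that would take 2 seconds.
true_latencies = [100] * 97 + [2000] * 3

# Gamed view: anything slower than the cutoff is aborted and
# excluded from the latency statistics.
TIMEOUT_MS = 499
gamed_latencies = [t for t in true_latencies if t <= TIMEOUT_MS]
failed_requests = len(true_latencies) - len(gamed_latencies)

honest_p99 = p99(true_latencies)   # includes the slow requests
gamed_p99 = p99(gamed_latencies)   # slow requests silently removed
```

Here honest_p99 breaches the 500 ms target while gamed_p99 passes it comfortably, and the only trace of the difference is failed_requests.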

Cobra Farming

People create the problem in order to solve it.

Example: A company rewards engineers for on-call heroics.
  → Engineers are disincentivized from fixing root causes
  → The team with the most incidents gets the most recognition
  → Reliability degrades because prevention is unrewarded

Example: A team measures "customer issues resolved per month."
  → Support engineers avoid creating self-service documentation
  → Automating solutions away reduces their metric
  → The team optimizes for a high volume of manual resolution

Example: A promotion process rewards "impact."
  → Engineers create complex systems that require ongoing heroics
  → Simple, reliable systems don't generate enough "impact stories"
  → The incentive selects for complexity over simplicity

Displacement Effects

The problem moves rather than being solved.

Example: You add strict linting rules to prevent bad code.
  → Engineers suppress lint warnings with ignore comments
  → The code is just as bad, but the linter says it's clean
  → You've moved the problem from "visible bad code" to
    "hidden bad code with suppressed warnings"

Example: You add a code review requirement for the main branch.
  → Engineers create a secondary branch with no review requirement
  → "Hotfixes" bypass the review process
  → The percentage of unreviewed code is the same, just relabeled

Example: You limit the size of pull requests to 200 lines.
  → Engineers create chains of dependent PRs
  → Each PR is under 200 lines but only makes sense in context
  → Reviewers now need to understand 5 PRs instead of 1
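
One way to notice the linting displacement is to track suppressions alongside lint results. The following is a minimal sketch, assuming Python sources and three common suppression comment styles (noqa, type: ignore, pylint: disable); the function name and sample text are hypothetical.

```python
import re

# Comment patterns that silence a linter or type checker on one line.
SUPPRESSION = re.compile(r"#\s*(noqa|type:\s*ignore|pylint:\s*disable)")


def count_suppressions(source: str) -> int:
    """Count lines that carry a lint or type-check suppression comment."""
    return sum(1 for line in source.splitlines() if SUPPRESSION.search(line))


# Hypothetical file contents, held as text and never executed.
sample = """import os  # noqa: F401
value = load()  # type: ignore
total = 1 + 1
# a normal comment
"""

suppressed = count_suppressions(sample)
```

A suppression count that climbs right after a rule is introduced suggests the problem moved rather than went away.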

Designing Better Incentives

Principle 1: Measure Outcomes, Not Activities

Bad metric: Number of deploys per week
  → Measures activity, not value
  → Can be inflated without delivering anything

Better metric: Time from commit to production for user-facing changes
  → Measures the actual pipeline efficiency
  → Harder to game because it tracks real changes

Bad metric: Lines of code written
  → Measures volume, not quality

Better metric: Customer problems solved per quarter
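  → Measures outcome, provided you count distinct problems
    eliminated rather than tickets closed (ticket counts invite
    the cobra farming described earlier)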
  → Aligns engineering effort with business value
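
As a rough sketch of the commit-to-production metric, assume each change is recorded with a commit time, a deploy time, and a flag for whether it is user-facing. The record layout and the function name are assumptions for illustration; real data would come from your version control and deployment tooling.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: (committed_at, deployed_at, user_facing)
changes = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15), True),    # 6 hours
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9), True),     # 24 hours
    (datetime(2024, 1, 2, 10), datetime(2024, 1, 2, 11), False),  # whitespace fix
    (datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 21), True),    # 12 hours
]


def median_lead_time_hours(records):
    """Median commit-to-production time for user-facing changes only."""
    hours = [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed, user_facing in records
        if user_facing
    ]
    return median(hours)


lead_time = median_lead_time_hours(changes)
```

Filtering on user-facing changes is what keeps trivial deploys from improving the number; the median is used so that one unusually slow change does not dominate.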

Principle 2: Use Paired Metrics

Always pair a quantity metric with a quality metric so that gaming one degrades the other.

Pair: Deployment frequency + Change failure rate
  → Deploying junk increases frequency but also failure rate
  → Gaming both at once is much harder than gaming either alone

Pair: Code coverage + Mutation testing score
  → Writing empty tests increases coverage but not mutation score
  → Mutation testing checks if tests actually catch bugs

Pair: Sprint velocity + Customer satisfaction
  → Inflating velocity doesn't help if users are unhappy
  → Forces the team to deliver actual value

Pair: Bug fix count + Bug introduction rate
  → Fixing many bugs looks bad if you're also introducing many
  → Rewards prevention alongside response
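
The first pair can be sketched as a report that never shows deployment frequency without change failure rate next to it. The Deploy record, its caused_incident flag, and the two team names are hypothetical; how a deploy gets marked as a failure is an assumption left to your incident process.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Deploy:
    team: str
    caused_incident: bool  # hypothetical flag set during incident review


def paired_report(deploys):
    """Return {team: (deploy_count, change_failure_rate)}."""
    counts = defaultdict(int)
    failures = defaultdict(int)
    for d in deploys:
        counts[d.team] += 1
        if d.caused_incident:
            failures[d.team] += 1
    return {team: (n, failures[team] / n) for team, n in counts.items()}


# Hypothetical week: "steady" ships 10 deploys with 1 failure,
# "inflated" ships 30 deploys with 12 failures.
deploys = (
    [Deploy("steady", i == 0) for i in range(10)]
    + [Deploy("inflated", i < 12) for i in range(30)]
)
report = paired_report(deploys)
```

On frequency alone the second team looks three times as productive; with the failure rate beside it, the picture reverses.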

Principle 3: Watch for Behavioral Changes

When you introduce a new metric or incentive, watch for a progression
roughly like this (the timings are illustrative and vary by team):

  Week 1-2: Baseline behavior (people haven't adapted yet)
  Week 3-4: Early optimization (people start working toward the metric)
  Month 2-3: Gaming begins (people find shortcuts)
  Month 4+:  Goodhart's Law in full effect (metric is divorced from reality)

Set a calendar reminder to review the metric at month 3.
Ask: "Is this metric still measuring what we intended?"
Look for: sudden jumps in the metric without corresponding
  improvement in the outcome you actually care about.
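
The review question can be turned into a simple check. This is a minimal sketch under stated assumptions: both numbers are "higher is better", baselines are nonzero, and the 20% and 5% thresholds are arbitrary placeholders to tune, not recommended values.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (before must be nonzero)."""
    return (after - before) / before


def looks_gamed(metric_before, metric_after, outcome_before, outcome_after,
                jump_threshold=0.20, outcome_threshold=0.05):
    """Flag a metric that jumped while the outcome it stands for barely moved.

    Assumes higher is better for both the metric and the outcome.
    """
    metric_gain = relative_change(metric_before, metric_after)
    outcome_gain = relative_change(outcome_before, outcome_after)
    return metric_gain >= jump_threshold and outcome_gain < outcome_threshold


# Hypothetical review at month 3: coverage went from 60% to 95%, while the
# share of releases without a production bug went from 80% to 81%.
suspicious = looks_gamed(60, 95, 80, 81)

# Coverage went from 60% to 90% and bug-free releases from 80% to 92%.
healthy = looks_gamed(60, 90, 80, 92)
```

A flag from a check like this is a prompt to go and look at what people are actually doing, not proof of gaming.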

Principle 4: Make Gaming Harder Than Doing the Work

If it's easier to game the metric than to do the actual work,
the metric will be gamed.

Example: Code review quality
  Bad approach: Count number of review comments (easy to game:
    leave trivial comments)
  Better approach: Track bugs caught in review vs. bugs found
    in production. Harder to fake.

Example: Testing quality
  Bad approach: Require 80% code coverage (easy to game:
    write assertion-free tests)
  Better approach: Run mutation testing. Generate mutants
    automatically. If your tests don't catch them, coverage
    is meaningless. Harder to fake.
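
The idea behind mutation testing can be shown by hand with a single mutant. Everything below is a hypothetical illustration: real tools (mutmut for Python and PIT for Java are two examples) generate and run mutants automatically, and this sketch does not use or imitate their interfaces.

```python
def is_adult(age: int) -> bool:
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Hand-written mutant: '>=' changed to '>'."""
    return age > 18


def assertion_free_test(fn):
    # Gives full line coverage of fn but checks nothing.
    fn(18)


def boundary_test(fn):
    # Checks both sides of the boundary.
    assert fn(18) is True
    assert fn(17) is False


def kills(test, mutant) -> bool:
    """Return True if the test fails when run against the mutant."""
    try:
        test(mutant)
    except AssertionError:
        return True
    return False


gamed_kills = kills(assertion_free_test, is_adult_mutant)
real_kills = kills(boundary_test, is_adult_mutant)
```

Both tests cover the same line, but only the boundary test notices that the behavior changed, which is the difference a mutation score exposes.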

Real-World Cobra Effects Beyond Software

Understanding the pattern in other domains helps you spot it in engineering.

Hanoi rat bounty (1902):
  French colonial government paid for rat tails.
  People cut off tails and released rats to breed more.
  Rat population increased.
  Lesson: if the proof of "solving" the problem is cheap to
  manufacture, the incentive is broken.

Wells Fargo accounts (made public in 2016):
  Employees were given aggressive sales targets for new accounts opened.
  Employees opened millions of accounts without customer authorization.
  Lesson: if the metric can be satisfied without serving
  the customer, it will be.

Standardized testing in schools:
  Schools rewarded for high test scores.
  Teachers taught to the test instead of teaching comprehension.
  Scores rose without a matching gain in understanding.
  Lesson: the metric improved but the outcome it was supposed
  to represent did not.

Spotting Cobra Effects Before They Happen

Ask these questions before implementing any metric or incentive:

  1. If someone wanted to game this metric, how would they do it?
     → The easier it is to answer, the worse the metric

  2. Can someone improve this metric without improving the
     underlying outcome?
     → If yes, the metric will diverge from reality

  3. What behavior does this incentive punish?
     → Rewarding bug fixes punishes prevention
     → Rewarding velocity punishes careful work
     → Rewarding individual output punishes collaboration

  4. What will people stop doing because of this incentive?
     → People respond to incentives by reallocating effort
     → Time spent optimizing the metric comes from somewhere

  5. What happens if this metric reaches its target?
     → If 100% coverage is reached, does quality actually improve?
     → If deployment frequency doubles, do users actually benefit?

Common Pitfalls

Assuming people will not game the metric: They will. Not out of malice, but because incentives shape behavior. People optimize for what is measured and rewarded, consciously or not.
Blaming individuals instead of the system: When someone games a metric, the problem is usually the metric, not the person. A well-designed incentive makes the easiest path the right path.
Adding more metrics to fix gaming: Each new metric creates new gaming opportunities. You end up with a dashboard of thirty metrics, none of which tells you how the team is actually doing. Fewer, better metrics beat more metrics.
Measuring what is easy instead of what matters: Lines of code, story points, and deploy counts are easy to track. Customer impact, code maintainability, and system reliability are hard to track. The easy metrics are almost always the wrong ones.
Keeping bad metrics because "we've always tracked this": If a metric is being gamed and no longer reflects reality, remove it. A useless metric is worse than no metric because it creates false confidence.

Key Takeaways

The cobra effect occurs when incentives produce the opposite of the intended outcome. In engineering, this happens with metrics like lines of code, bug fix counts, code coverage targets, and deployment frequency.
Goodhart's Law states that when a measure becomes a target, it ceases to be a good measure. Every metric you set as a goal will be optimized for, often at the expense of the actual outcome you care about.
Design incentives by measuring outcomes (customer problems solved) rather than activities (deploys per week), pairing quantity metrics with quality metrics, and explicitly asking "how would someone game this?"
Watch for behavioral changes after introducing metrics. Gaming often appears within two to three months, though the timing varies by team. Review your metrics regularly and retire the ones that have diverged from the outcomes they were meant to represent.
When you see gaming, fix the incentive, not the person. The system created the behavior. A well-designed incentive makes doing the right thing the easiest thing.