Unintended Side Effects
Overview
According to a widely repeated anecdote, the colonial government in British India wanted to reduce the cobra population, so it offered a bounty for every dead cobra brought in. It worked at first. Then people started breeding cobras to collect the bounty. When the government caught on and cancelled the program, the breeders released their now-worthless cobras, and the cobra population ended up larger than before.
This is the cobra effect: a well-intentioned incentive that produces the opposite of the desired outcome. In engineering, cobra effects appear whenever you create metrics, incentives, or systems without thinking about how people will adapt their behavior in response.
Goodhart's Law
The core principle behind unintended side effects is named after the economist Charles Goodhart. Its best-known wording, a later paraphrase by Marilyn Strathern, is: "When a measure becomes a target, it ceases to be a good measure."
The pattern:
1. You observe a useful metric
→ Teams that write more tests tend to have fewer bugs
2. You make it a target
→ "All teams must achieve 90% code coverage"
3. The metric gets gamed
→ Engineers write tests that assert nothing meaningful
→ Tests pass but don't catch bugs
→ Coverage is 95% but bug rate is unchanged
4. The metric is now useless
→ High coverage no longer correlates with fewer bugs
→ You've lost the signal AND spent engineering time on junk tests
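Step 3 of the pattern can be made concrete with a minimal Python sketch. The function `apply_discount` and both tests are hypothetical, invented purely for illustration. The function contains a deliberate bug, and the two tests execute exactly the same lines, so a coverage tool would score them identically.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by `percent`. Hypothetical example with a deliberate bug."""
    return price + price * percent / 100  # BUG: should subtract, not add


def test_gamed() -> bool:
    """Runs the code, so coverage counts it, but checks nothing about the result."""
    apply_discount(100.0, 10.0)
    return True  # always "passes"


def test_meaningful() -> bool:
    """Checks the actual result, so the bug makes this test fail."""
    return apply_discount(100.0, 10.0) == 90.0


gamed_passes = test_gamed()             # True: the bug goes unnoticed
meaningful_passes = test_meaningful()   # False: the bug is caught
```

Both tests produce the same coverage number. Only the second one says anything about whether the code is correct.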
The Cobra Effect in Engineering
Rewarding Lines of Code
Incentive: Track lines of code per developer as a productivity metric
Intended effect:
Engineers write more code, more features get shipped
Actual effect:
Engineers write verbose code instead of concise code
Copy-paste increases because it inflates the count
Refactoring (which reduces lines) is punished
Engineers avoid deleting dead code
End result:
The codebase grows much faster than the feature count
Maintenance cost skyrockets
The most "productive" engineers created the most tech debt
Rewarding Bug Fixes
Incentive: Engineers get recognition for fixing production bugs
Intended effect:
Bugs get fixed quickly, production is more stable
Actual effect:
Engineers who introduce bugs get more opportunities to fix them
Some engineers delay reporting bugs they know about so they
can "discover" and fix them during high-visibility periods
Prevention work (testing, code review) is unrewarded
Engineers prefer firefighting over fire prevention
End result:
The team that fixes the most bugs is not necessarily the most reliable;
it may simply be the one introducing the most bugs
Optimizing for Deployment Frequency
Incentive: Track deployments per week as a measure of team velocity
Intended effect:
Teams ship smaller changes more frequently, reducing risk
Actual effect:
Teams split changes into artificially small deploys
Config changes and whitespace fixes count as deploys
Risky changes get held back and bundled into one large deploy at the
end of the week, while trivial deploys keep the count up
Teams avoid large refactors because they count as one deploy
End result:
Deployment count is high but meaningful throughput is unchanged
The metric tracks busywork, not value delivery
Categories of Unintended Side Effects
Perverse Incentives
The incentive directly rewards the wrong behavior.
Metric: Mean time to resolve incidents (MTTR)
Gaming: Classify incidents as lower severity to avoid the clock
Close incidents before they're actually resolved
Avoid declaring incidents at all
Metric: Sprint velocity (story points completed)
Gaming: Inflate story point estimates
Avoid taking on risky or uncertain work
Declare stories "done" before edge cases are handled
Metric: Response time SLA (99th percentile under 500ms)
Gaming: Timeout slow requests at 499ms (they fail but don't
count against the SLA)
Move slow operations to async (user still waits,
but the API responds fast)
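The response-time example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the traffic data, the function names, and the choice of a nearest-rank percentile. The sketch compares an SLA check that only looks at successful requests with one that treats a forced timeout as a bad request.

```python
from typing import List, Tuple

Request = Tuple[int, bool]  # (latency in ms, succeeded)


def p99(latencies_ms: List[int]) -> int:
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(latencies_ms)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integer math
    return ordered[rank - 1]


def sla_met_gamed(requests: List[Request], threshold_ms: int = 500) -> bool:
    """Only successful requests count, so forced timeouts vanish from the metric."""
    successful = [ms for ms, ok in requests if ok]
    return p99(successful) < threshold_ms


def sla_met_honest(requests: List[Request], threshold_ms: int = 500,
                   target: float = 0.99) -> bool:
    """A request is good only if it succeeded AND finished under the threshold."""
    good = sum(1 for ms, ok in requests if ok and ms < threshold_ms)
    return good / len(requests) >= target


# Hypothetical traffic: 95 fast successes, 5 slow requests killed at 499 ms.
requests = [(120, True)] * 95 + [(499, False)] * 5

gamed_result = sla_met_gamed(requests)    # True: the dashboard looks green
honest_result = sla_met_honest(requests)  # False: 5% of users got an error
```

Defining "good" as "succeeded and fast" is one possible design choice. It closes the timeout loophole, but not the async one, which needs a metric measured from the user's side.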
Cobra Farming
People create the problem in order to solve it.
Example: A company rewards engineers for on-call heroics.
→ Engineers are disincentivized from fixing root causes
→ The team with the most incidents gets the most recognition
→ Reliability degrades because prevention is unrewarded
Example: A team measures "customer issues resolved per month."
→ Support engineers avoid creating self-service documentation
→ Automating solutions away reduces their metric
→ The team optimizes for a high volume of manual resolution
Example: A promotion process rewards "impact."
→ Engineers create complex systems that require ongoing heroics
→ Simple, reliable systems don't generate enough "impact stories"
→ The incentive selects for complexity over simplicity
Displacement Effects
The problem moves rather than being solved.
Example: You add strict linting rules to prevent bad code.
→ Engineers suppress lint warnings with ignore comments
→ The code is just as bad, but the linter says it's clean
→ You've moved the problem from "visible bad code" to
"hidden bad code with suppressed warnings"
Example: You add a code review requirement for the main branch.
→ Engineers create a secondary branch with no review requirement
→ "Hotfixes" bypass the review process
→ The percentage of unreviewed code is the same, just relabeled
Example: You limit the size of pull requests to 200 lines.
→ Engineers create chains of dependent PRs
→ Each PR is under 200 lines but only makes sense in context
→ Reviewers now need to understand 5 PRs instead of 1
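One way to notice the lint displacement described above is to track suppression comments alongside the lint result. The sketch below is a rough heuristic, not a real linter integration: it scans source text for a few common suppression markers (`# noqa`, `# type: ignore`, `eslint-disable`). The function name and the sample snippet are hypothetical.

```python
import re

# Common suppression markers; extend this for the tools a team actually uses.
SUPPRESSION_PATTERN = re.compile(r"#\s*noqa\b|#\s*type:\s*ignore\b|eslint-disable")


def count_suppressions(source: str) -> int:
    """Count the lines that contain at least one suppression marker."""
    return sum(1 for line in source.splitlines() if SUPPRESSION_PATTERN.search(line))


# Hypothetical snippet mixing Python and JavaScript style comments.
sample = "\n".join([
    "import os  # noqa: F401",
    "value = compute()  # type: ignore",
    "// eslint-disable-next-line no-console",
    "total = 1 + 1",
])

suppressed_lines = count_suppressions(sample)  # 3
```

A clean lint report with a rising suppression count suggests the problem has moved rather than been solved.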
Designing Better Incentives
Principle 1: Measure Outcomes, Not Activities
Bad metric: Number of deploys per week
→ Measures activity, not value
→ Can be inflated without delivering anything
Better metric: Time from commit to production for user-facing changes
→ Measures the actual pipeline efficiency
→ Harder to game because it tracks real changes
Bad metric: Lines of code written
→ Measures volume, not quality
Better metric: Customer problems solved per quarter
→ Measures outcome, as long as "solved" is judged by the customer
rather than by a ticket count
→ Aligns engineering effort with business value
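As a rough illustration of the commit-to-production metric, the Python sketch below computes the median lead time over user-facing changes only. The record layout, the `user_facing` flag, and the sample data are all assumptions. In practice these values would come from version control and the deploy pipeline.

```python
from datetime import datetime
from statistics import median
from typing import Dict, List

# Hypothetical change records.
changes: List[Dict] = [
    {"committed": datetime(2024, 1, 1, 9, 0), "deployed": datetime(2024, 1, 1, 15, 0),
     "user_facing": True},    # 6 hours
    {"committed": datetime(2024, 1, 2, 9, 0), "deployed": datetime(2024, 1, 2, 9, 30),
     "user_facing": False},   # whitespace fix, ignored by the metric
    {"committed": datetime(2024, 1, 3, 9, 0), "deployed": datetime(2024, 1, 4, 9, 0),
     "user_facing": True},    # 24 hours
    {"committed": datetime(2024, 1, 5, 8, 0), "deployed": datetime(2024, 1, 5, 18, 0),
     "user_facing": True},    # 10 hours
]


def median_lead_time_hours(records: List[Dict]) -> float:
    """Median hours from commit to production, counting user-facing changes only."""
    hours = [
        (r["deployed"] - r["committed"]).total_seconds() / 3600
        for r in records
        if r["user_facing"]
    ]
    return median(hours)


lead_time = median_lead_time_hours(changes)  # 10.0
```

Filtering on user-facing changes means trivial deploys cannot pull the number down, although the classification itself then becomes something to audit.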
Principle 2: Use Paired Metrics
Always pair a quantity metric with a quality metric so that gaming one degrades the other.
Pair: Deployment frequency + Change failure rate
→ Deploying junk increases frequency but also failure rate
→ Gaming one metric shows up as a regression in the other
Pair: Code coverage + Mutation testing score
→ Writing empty tests increases coverage but not mutation score
→ Mutation testing checks if tests actually catch bugs
Pair: Sprint velocity + Customer satisfaction
→ Inflating velocity doesn't help if users are unhappy
→ Forces the team to deliver actual value
Pair: Bug fix count + Bug introduction rate
→ Fixing many bugs looks bad if you're also introducing many
→ Rewards prevention alongside response
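The first pair above can be sketched as a small report that always shows both numbers together. The team names and deploy data are made up, and `paired_report` is a hypothetical helper rather than part of any real tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def paired_report(deploys: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, float]]:
    """Map each team to (deploy count, change failure rate)."""
    counts: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for team, caused_incident in deploys:
        counts[team] += 1
        if caused_incident:
            failures[team] += 1
    return {team: (counts[team], failures[team] / counts[team]) for team in counts}


# Hypothetical week: each entry is (team, did this deploy cause an incident?).
deploys = (
    [("alpha", False)] * 9 + [("alpha", True)] * 1
    + [("beta", False)] * 18 + [("beta", True)] * 12
)

report = paired_report(deploys)
alpha_count, alpha_failure_rate = report["alpha"]  # 10 deploys, 10% failed
beta_count, beta_failure_rate = report["beta"]     # 30 deploys, 40% failed
```

On frequency alone, team beta looks three times as productive. With the failure rate next to it, the picture changes.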
Principle 3: Watch for Behavioral Changes
When you introduce a new metric or incentive, watch how behavior changes over time. A rough timeline:
Week 1-2: Baseline behavior (people haven't adapted yet)
Week 3-4: Early optimization (people start working toward the metric)
Month 2-3: Gaming begins (people find shortcuts)
Month 4+: Goodhart's Law in full effect (metric is divorced from reality)
Set a calendar reminder to review the metric at month 3.
Ask: "Is this metric still measuring what we intended?"
Look for: sudden jumps in the metric without corresponding
improvement in the outcome you actually care about.
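The "sudden jump without corresponding improvement" check can be written down as a simple comparison. The Python sketch below is one possible heuristic. The thresholds (a 20% metric gain, a 5% outcome gain) and the sample numbers are arbitrary assumptions, not recommended values.

```python
def relative_improvement(before: float, after: float, higher_is_better: bool = True) -> float:
    """Relative change from `before` to `after`, signed so that positive means better."""
    change = (after - before) / before
    return change if higher_is_better else -change


def diverged(metric_gain: float, outcome_gain: float,
             min_metric_gain: float = 0.20, min_outcome_gain: float = 0.05) -> bool:
    """Flag a metric that jumped while the outcome it stands for barely moved."""
    return metric_gain >= min_metric_gain and outcome_gain < min_outcome_gain


# Hypothetical review at month 3: coverage rose from 60% to 95%,
# while production bugs per release stayed at 20.
coverage_gain = relative_improvement(60.0, 95.0)
bug_gain = relative_improvement(20.0, 20.0, higher_is_better=False)
flagged = diverged(coverage_gain, bug_gain)  # True: worth investigating

# Same coverage jump, but bugs per release fell from 20 to 12.
healthy = diverged(coverage_gain, relative_improvement(20.0, 12.0, higher_is_better=False))  # False
```

A flag from a check like this is a prompt for a conversation, not proof of gaming.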
Principle 4: Make Gaming Harder Than Doing the Work
If it's easier to game the metric than to do the actual work,
the metric will be gamed.
Example: Code review quality
Bad approach: Count number of review comments (easy to game:
leave trivial comments)
Better approach: Track bugs caught in review vs. bugs found
in production. Harder to fake.
Example: Testing quality
Bad approach: Require 80% code coverage (easy to game:
write assertion-free tests)
Better approach: Run mutation testing. Generate mutants
automatically. If your tests don't catch them, coverage
is meaningless. Harder to fake.
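The testing-quality example can be shown with a toy, hand-rolled version of mutation testing. Real mutation testing tools generate the mutants automatically. Here the mutants are written by hand, and the function, the two test suites, and the scoring helper are hypothetical names chosen for the sketch.

```python
from typing import Callable, List

Impl = Callable[[int], bool]


def is_adult(age: int) -> bool:
    """Original implementation under test."""
    return age >= 18


# Hand-written mutants: small, deliberate breakages of is_adult.
MUTANTS: List[Impl] = [
    lambda age: age > 18,    # boundary shifted
    lambda age: age >= 19,   # constant changed
    lambda age: True,        # condition removed
]


def weak_suite(impl: Impl) -> bool:
    """Executes the code for coverage but asserts nothing."""
    impl(18)
    impl(5)
    return True


def strong_suite(impl: Impl) -> bool:
    """Checks behavior at and around the boundary."""
    return impl(18) is True and impl(17) is False and impl(30) is True


def mutation_score(suite: Callable[[Impl], bool], mutants: List[Impl]) -> float:
    """Fraction of mutants the suite detects, meaning the suite fails on them."""
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


weak_score = mutation_score(weak_suite, MUTANTS)      # 0.0
strong_score = mutation_score(strong_suite, MUTANTS)  # 1.0
```

Both suites pass against the original `is_adult` and cover every line of it. Only the mutation score tells them apart.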
Real-World Cobra Effects Beyond Software
Understanding the pattern in other domains helps you spot it in engineering.
Hanoi rat bounty (1902):
French colonial government paid for rat tails.
People cut off tails and released rats to breed more.
Rat population increased.
Lesson: if the proof of "solving" the problem is cheap to
manufacture, the incentive is broken.
Wells Fargo accounts (disclosed in 2016):
Employees were given aggressive sales targets for new accounts opened.
Employees opened millions of accounts without customer consent.
Lesson: if the metric can be satisfied without serving
the customer, it will be.
Standardized testing in schools:
Schools rewarded for high test scores.
Teachers taught to the test instead of teaching comprehension.
Students scored higher without necessarily learning more.
Lesson: the metric improved but the outcome it was supposed
to represent did not.
Spotting Cobra Effects Before They Happen
Ask these questions before implementing any metric or incentive:
1. If someone wanted to game this metric, how would they do it?
→ The easier it is to answer, the worse the metric
2. Can someone improve this metric without improving the
underlying outcome?
→ If yes, the metric will diverge from reality
3. What behavior does this incentive punish?
→ Rewarding bug fixes punishes prevention
→ Rewarding velocity punishes careful work
→ Rewarding individual output punishes collaboration
4. What will people stop doing because of this incentive?
→ People respond to incentives by reallocating effort
→ Time spent optimizing the metric comes from somewhere
5. What happens if this metric reaches its target?
→ If 100% coverage is reached, does quality actually improve?
→ If deployment frequency doubles, do users actually benefit?
Common Pitfalls
- Assuming people will not game the metric: They will. Not out of malice, but because incentives shape behavior. People optimize for what is measured and rewarded, consciously or not.
- Blaming individuals instead of the system: When someone games a metric, the problem is usually the metric, not the person. A well-designed incentive makes the easiest path the right path.
- Adding more metrics to fix gaming: Each new metric creates new gaming opportunities. You end up with a dashboard of thirty metrics, none of which tells you how the team is actually doing. Fewer, better metrics beat more metrics.
- Measuring what is easy instead of what matters: Lines of code, story points, and deploy counts are easy to track. Customer impact, code maintainability, and system reliability are hard to track. The easy metrics are almost always the wrong ones.
- Keeping bad metrics because "we've always tracked this": If a metric is being gamed and no longer reflects reality, remove it. A useless metric is worse than no metric because it creates false confidence.
Key Takeaways
- The cobra effect occurs when incentives produce the opposite of the intended outcome. In engineering, this happens with metrics like lines of code, bug fix counts, code coverage targets, and deployment frequency.
- Goodhart's Law states that when a measure becomes a target, it ceases to be a good measure. Every metric you set as a goal will be optimized for, often at the expense of the actual outcome you care about.
- Design incentives by measuring outcomes (customer problems solved) rather than activities (deploys per week), pairing quantity metrics with quality metrics, and explicitly asking "how would someone game this?"
- Watch for behavioral changes after introducing metrics. Gaming typically appears within two to three months. Review your metrics regularly and retire the ones that have diverged from the outcomes they were meant to represent.
- When you see gaming, fix the incentive, not the person. The system created the behavior. A well-designed incentive makes doing the right thing the easiest thing.