Avoiding Stupidity

Overview

It is easier to avoid being stupid than to be brilliant. Warren Buffett keeps a "too hard" pile on his desk. When a problem is excessively complex, he does not try harder. He puts it in the pile and moves on to problems where he has an advantage.

In engineering, this translates to a simple principle: don't use distributed transactions if you can avoid them. Don't build real-time if batch works. Don't optimize if the code is only called once. Don't introduce complexity when simplicity solves the problem.

Avoiding mistakes compounds into success. A team that consistently avoids stupid decisions will outperform a team that occasionally makes brilliant ones but regularly makes avoidable errors.

The Asymmetry of Mistakes and Brilliance

Brilliant move:
  You design an elegant distributed caching system that handles
  invalidation perfectly. It saves 200ms on every request.
  Impact: moderate positive. Users are slightly happier.

Stupid move:
  You deploy the caching system without a kill switch. A bug
  causes it to serve stale data for 4 hours. Customers see
  wrong account balances. Trust is damaged.
  Impact: severe negative. Front-page news.

The math:
  The upside of brilliance is bounded.
  The downside of stupidity is unbounded.
  Avoiding the stupid move matters more than making the brilliant one.

This asymmetry exists because users barely notice when things are fast and correct, but they absolutely notice when things are wrong. A 200ms improvement is easy to overlook. A wrong account balance is unforgettable.

The "Too Hard" Pile

Not every problem is worth solving. Some problems are better avoided entirely.

Problem: We need exactly-once delivery in our message queue
Too hard pile: Can we design the consumer to be idempotent instead?
  → Idempotent consumers make "at-least-once" delivery equivalent
    to exactly-once in practice: a redelivered message is detected
    and skipped, so its effect is applied only once.
  → The idempotent approach is simple. Exactly-once delivery
    is a distributed systems research problem.
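A minimal sketch of an idempotent consumer, assuming each message carries a unique `message_id` and that the consumer records processed IDs in the same database as its side effects, so both commit in one local transaction. The table names and message fields are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # processed_messages remembers every message ID that has already been applied.
    db.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")
    db.execute("INSERT INTO balances VALUES ('alice', 0)")
    db.commit()
    return db


def handle(db: sqlite3.Connection, message: dict) -> bool:
    """Apply a deposit message at most once. Returns True if it was applied."""
    try:
        with db:  # one transaction: the dedup record and the side effect commit together
            db.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message["message_id"],),
            )
            db.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (message["amount"], message["account"]),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: this message was already processed, so skip it.
        return False
```

The queue may deliver the same message twice; the second `handle` call returns `False` and the balance does not change. No coordination with the broker is needed.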

Problem: We need real-time analytics across all microservices
Too hard pile: Do we actually need real-time?
  → Check with stakeholders: "real-time" often means "within
    an hour" when you ask what decisions depend on the data.
  → Batch processing with hourly updates is 10x simpler.
  → Build batch first. If someone proves they need real-time
    for a specific use case, build it for that case only.

Problem: We need to support offline mode for the mobile app
Too hard pile: Do our users actually go offline?
  → Check the data: 99.3% of sessions have continuous connectivity.
  → The 0.7% that don't are brief sessions in elevators and
    tunnels, each under 30 seconds.
  → A "retry when connected" approach is 100x simpler than
    full offline mode with conflict resolution.
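A sketch of "retry when connected", assuming the app knows whether it is online and that each request is safe to resend. `send` is a hypothetical stand-in for the app's networking call and is assumed to raise `ConnectionError` when the network is unavailable.

```python
from collections import deque
from typing import Callable


class RetryQueue:
    """Holds outgoing requests while offline and sends them in order once online."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send  # hypothetical network call; raises ConnectionError on failure
        self._pending: deque = deque()

    def submit(self, request: dict, online: bool) -> None:
        self._pending.append(request)
        if online:
            self.flush()

    def flush(self) -> int:
        """Send queued requests in order, stopping at the first failure. Returns count sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # still offline: keep the rest queued for the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Call `flush()` from whatever reconnect callback the platform provides. There is no local database and no merge logic: requests simply wait in memory until the connection returns.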

Problem: We need to migrate to a new database with zero downtime
Too hard pile: Can we afford 5 minutes of downtime at 3am?
  → Zero-downtime migration requires dual-writes, shadow reads,
    reconciliation, gradual cutover, and rollback capability.
  → A 5-minute maintenance window requires a tested migration
    script and a rollback script.
  → The maintenance window approach takes days to implement.
    The zero-downtime approach takes months.
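What "a tested migration script and a rollback script" might look like, sketched with two SQLite databases standing in for the old and new systems. The `accounts` table and the row-count check are assumptions for illustration; a real migration would verify more than counts.

```python
import sqlite3

SCHEMA = "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"


def migrate(old: sqlite3.Connection, new: sqlite3.Connection) -> int:
    """Copy every account from the old database to the new one and verify the copy."""
    rows = old.execute("SELECT id, owner, balance FROM accounts ORDER BY id").fetchall()
    new.execute(SCHEMA)  # schema first; the rollback script removes it again
    with new:  # the data copy is one transaction: all rows land, or none do
        new.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
        (copied,) = new.execute("SELECT COUNT(*) FROM accounts").fetchone()
        if copied != len(rows):
            raise RuntimeError(f"expected {len(rows)} rows, found {copied}")
    return copied


def rollback(new: sqlite3.Connection) -> None:
    """Undo the migration. The old database is never written to, so rolling back
    means dropping the new table and pointing the application at the old database."""
    with new:
        new.execute("DROP TABLE IF EXISTS accounts")
```

Running both scripts against a staging copy before the window is what makes the 3am window safe to schedule.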

Engineering Decisions Where Simplicity Wins

Don't Use Distributed Transactions If You Can Avoid Them

The complex approach:
  Two-phase commit across three services to ensure order,
  payment, and inventory are all updated atomically.
  Requires: distributed transaction coordinator, timeout handling,
  rollback logic, failure recovery, and extensive testing.

The simple approach:
  Use the Saga pattern with compensating transactions.
  Order service creates order → Payment service charges card →
  If payment fails, order service cancels order.
  Each step is a local transaction. Each service is independent.
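A minimal in-process sketch of that saga, covering the order and payment steps described above. The service classes are hypothetical stand-ins: in a real system each call would cross a network boundary, and the saga's progress would be persisted so it survives a crash.

```python
class PaymentDeclined(Exception):
    """Raised by the payment service when a charge cannot be made."""


class OrderService:
    def __init__(self) -> None:
        self.orders: dict[int, dict] = {}
        self._next_id = 1

    def create_order(self, item: str) -> int:
        # Saga step 1: a local transaction inside the order service.
        order_id = self._next_id
        self._next_id += 1
        self.orders[order_id] = {"item": item, "status": "PENDING"}
        return order_id

    def confirm_order(self, order_id: int) -> None:
        self.orders[order_id]["status"] = "CONFIRMED"

    def cancel_order(self, order_id: int) -> None:
        # Compensating transaction: semantically undoes step 1.
        self.orders[order_id]["status"] = "CANCELLED"


class PaymentService:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def charge(self, amount: int) -> None:
        # Saga step 2: a local transaction inside the payment service.
        if amount > self.balance:
            raise PaymentDeclined(f"cannot charge {amount}")
        self.balance -= amount


def place_order(orders: OrderService, payments: PaymentService, item: str, price: int) -> str:
    """Run the saga and return the final order status."""
    order_id = orders.create_order(item)
    try:
        payments.charge(price)
    except PaymentDeclined:
        orders.cancel_order(order_id)  # compensate instead of a global rollback
    else:
        orders.confirm_order(order_id)
    return orders.orders[order_id]["status"]
```

No coordinator holds locks across services; each service commits on its own, and failure is handled by running the compensating step.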

Even simpler:
  Put order and payment in the same service and database.
  Use a regular database transaction.
  "But that's not microservices!" — correct, and it works.

The simplest solution that handles your actual requirements
is the best solution. Not the most architecturally pure one.
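For the "even simpler" option, atomicity comes from an ordinary database transaction. A sketch using SQLite, with a hypothetical schema in which a `CHECK` constraint stands in for the payment failing:

```python
import sqlite3


def setup() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, price INTEGER)")
    db.execute(
        "CREATE TABLE wallets (customer TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    db.execute("INSERT INTO wallets VALUES ('alice', 100)")
    db.commit()
    return db


def place_order_atomically(db: sqlite3.Connection, customer: str, item: str, price: int) -> bool:
    """Create the order and charge the customer: both happen or neither does."""
    try:
        with db:  # commits on success, rolls back on any exception
            db.execute("INSERT INTO orders (item, price) VALUES (?, ?)", (item, price))
            db.execute(
                "UPDATE wallets SET balance = balance - ? WHERE customer = ?",
                (price, customer),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the order insert was rolled back too.
        return False
```

There is no compensating step to write and no partial state to clean up: the database does the work.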

Don't Build Real-Time If Batch Works

Scenario: Product wants a dashboard showing order volume trends

The complex approach:
  Stream processing with Kafka, Flink, and a real-time data
  pipeline feeding a time-series database.
  Cost: 3 engineers for 2 months, plus ongoing maintenance.

The simple approach:
  A cron job that runs a SQL query every 15 minutes and
  writes results to a dashboard table.
  Cost: 1 engineer for 2 days.
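Roughly what that cron job could look like. It assumes an `orders` table with a `created_at` timestamp and a `dashboard_order_volume` table that the dashboard reads; both names, and the crontab line and path in the comment, are illustrative. Rebuilding the summary inside one transaction means the dashboard never reads a half-written table.

```python
import sqlite3

# Example crontab entry (hypothetical path):
#   */15 * * * * python3 /opt/jobs/refresh_dashboard.py

REFRESH_SQL = """
    INSERT INTO dashboard_order_volume (day, order_count, revenue)
    SELECT date(created_at), COUNT(*), SUM(amount)
    FROM orders
    GROUP BY date(created_at)
"""


def refresh_dashboard(db: sqlite3.Connection) -> None:
    """Recompute daily order volume from scratch and swap it in atomically."""
    with db:  # DELETE and INSERT commit together
        db.execute("DELETE FROM dashboard_order_volume")
        db.execute(REFRESH_SQL)
```

Because the job rebuilds the table rather than appending to it, a missed or doubled run is harmless.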

The simple approach is correct when:
  → The dashboard is checked a few times a day
  → Decisions based on the data don't need minute-level precision
  → The data volume fits in a single database query
  → The team is small and can't afford to maintain a streaming pipeline

Build the cron job. If it stops meeting a specific need, add streaming for that case only.
Don't build the streaming pipeline because it feels more professional.

Don't Optimize Code That Runs Once

Scenario: A data migration script that runs one time

The complex approach:
  Optimize the script to run in 10 minutes instead of 2 hours.
  Add parallelism, batch processing, and connection pooling.
  Time to write: 3 days.

The simple approach:
  Let it run for 2 hours. Go get lunch.
  Time to write: 2 hours.

The optimization saves 110 minutes of unattended computer time
and costs about 22 extra hours of engineer time (three 8-hour
days instead of 2 hours). The math doesn't work.

Exceptions where optimization matters for one-time scripts:
  → The script must complete within a maintenance window
  → The script locks tables that block production traffic
  → The script will be run multiple times during testing
  → The approach teaches you something valuable for other work

Don't Build a Framework When a Script Will Do

Scenario: You need to generate test data for development

The complex approach:
  Build a configurable test data generation framework with
  pluggable generators, relationship resolution, and a CLI
  with 15 flags.
  Time to build: 2 weeks.

The simple approach:
  Write a Python script with hardcoded values that creates
  the 5 types of test data you actually need.
  Time to build: 2 hours.
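A sketch of the "script with hardcoded values" approach. The two record types, their fields, and the JSON output are hypothetical; the point is that adding another type means adding a few literal rows, not a plugin.

```python
import json

# Hardcoded fixtures: one list per type of test data needed today (hypothetical types).
USERS = [
    {"id": 1, "email": "alice@example.com", "role": "admin"},
    {"id": 2, "email": "bob@example.com", "role": "member"},
]

ORDERS = [
    {"id": 100, "user_id": 1, "total_cents": 2500, "status": "paid"},
    {"id": 101, "user_id": 2, "total_cents": 990, "status": "refunded"},
]


def build_fixtures() -> dict:
    """Return all fixtures, checking the one relationship we care about by hand."""
    user_ids = {user["id"] for user in USERS}
    for order in ORDERS:
        if order["user_id"] not in user_ids:
            raise ValueError(f"order {order['id']} references unknown user {order['user_id']}")
    return {"users": USERS, "orders": ORDERS}


def write_fixtures(path: str) -> None:
    """Write the fixtures as JSON for whatever seeds the development database."""
    with open(path, "w") as f:
        json.dump(build_fixtures(), f, indent=2)
```

"Relationship resolution" here is a five-line loop. If it ever needs to be more, that is the moment to evolve the script.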

The framework handles future needs that may never materialize.
The script handles today's actual need. If the need grows,
evolve the script. You'll know more about the requirements then.

The Checklist of Avoidable Stupidity

Before making a technical decision, check:

  □ Am I solving the actual problem or an imagined future problem?
    → "We might need to scale to 10M users" — do you have 10K today?

  □ Am I adding complexity to handle a case that may never happen?
    → "What if two users edit the same record at the exact same time?"
    → Check: how often does this actually happen in production?

  □ Is there a simpler technology that solves this?
    → PostgreSQL before Cassandra
    → Cron before Airflow
    → Monolith before microservices
    → File storage before object storage
    → SQL before NoSQL

  □ Am I choosing this technology because it's interesting or
    because it's the best fit?
    → "Let's use Rust for this CRUD API" — is there a performance
      requirement that Go or Python can't meet?

  □ Can I avoid the problem entirely instead of solving it?
    → "How do we handle cache invalidation?" — can you avoid caching?
    → "How do we sync data between services?" — can it live in one service?

  □ Am I optimizing something that doesn't need to be fast?
    → Profile before optimizing. Measure before assuming.
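Measuring first can take a few lines with Python's built-in `cProfile` and `pstats` modules. The `slow_report` and `parse_rows` functions are made-up workloads; the point is to see where time actually goes before changing anything.

```python
import cProfile
import io
import pstats


def parse_rows(n: int) -> list[int]:
    return [int(str(i)) for i in range(n)]


def slow_report(n: int) -> int:
    # Hypothetical workload: is parsing or summing the expensive part? Measure, don't guess.
    rows = parse_rows(n)
    return sum(r * r for r in rows)


def profile(func, *args) -> str:
    """Run func under cProfile and return the top 5 entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()
```

Printing `profile(slow_report, 100_000)` shows the call tree ranked by time spent, which is usually enough to decide whether the code needs to be faster at all.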

Compounding Returns of Avoiding Mistakes

Year 1:
  Team A makes 1 brilliant decision and 3 avoidable mistakes
  Team B makes 0 brilliant decisions and 0 avoidable mistakes

  Team A: brilliant decision gains 20% velocity
          3 mistakes cost 15% velocity each
          Net: 20% - 45% = -25% velocity

  Team B: no gains from brilliance
          no losses from mistakes
          Net: 0% change, steady velocity

Year 2:
  Team A is slower because they're maintaining the complexity
  from their brilliant decision AND fixing the consequences
  of their mistakes. Both compound.

  Team B is shipping features at a steady pace. Their simple
  architecture is easy to modify. New hires are productive
  in weeks, not months.

By Year 3, Team B is significantly ahead. Not because they
did anything exceptional, but because they consistently
avoided the mistakes that slow teams down.
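The compounding can be made concrete with a toy model. It assumes, purely for illustration, that Team A repeats its Year 1 pattern every year and that each year's net change multiplies the velocity carried over from the year before.

```python
def velocity_by_year(net_change_per_year: float, years: int) -> list[float]:
    """Velocity at the end of each year, starting from 1.0 and compounding multiplicatively."""
    velocity = 1.0
    history = []
    for _ in range(years):
        velocity *= 1.0 + net_change_per_year
        history.append(velocity)
    return history


team_a = velocity_by_year(0.20 - 3 * 0.15, years=3)  # +20% brilliance, three -15% mistakes
team_b = velocity_by_year(0.0, years=3)              # no gains, no losses

print("Team A:", [round(v, 2) for v in team_a])
print("Team B:", [round(v, 2) for v in team_b])
```

Under these assumptions Team A ends Year 3 at about 42% of its starting velocity (0.75 cubed is roughly 0.42), while Team B is still at 100%.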

When to Be Brilliant Instead

Avoiding stupidity is the default strategy. But there are times when the simple path is not enough.

Choose complexity when:

  → The simple approach demonstrably cannot meet requirements
    (you've measured, not guessed)
  → The cost of getting it wrong is low (you can revert easily)
  → The team has deep expertise in the complex approach
  → The complexity is in a contained area, not spread across
    the entire system

Choose simplicity when:

  → You're unsure whether the requirements justify complexity
  → The team is learning the domain (prefer fewer unknowns)
  → The system is early-stage and requirements will change
  → You can always add complexity later but can't easily
    remove it

Default to simplicity. Escalate to complexity when forced by
evidence, not by speculation or enthusiasm.

Real-World Examples

Stupid: Using Kubernetes for a single application with 3 instances
Simple: A load balancer and 3 VMs with a deploy script
Why: Kubernetes solves problems you don't have yet. The operational
overhead of running a cluster exceeds the value of orchestration
when you have 3 instances.
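For scale, the "deploy script" side of that comparison can be a few dozen lines. This sketch assumes a load balancer that can drain individual hosts, SSH access to each VM, and a health endpoint. The host names, the `lb-ctl` CLI, the remote deploy script path, and the `/healthz` URL are all hypothetical placeholders, and by default commands are only printed (a dry run).

```python
from typing import Callable

HOSTS = ["app-1.internal", "app-2.internal", "app-3.internal"]  # hypothetical VM names


def deploy_steps(host: str, version: str) -> list[str]:
    """Commands for one host: take it out of rotation, deploy, health-check, re-enable."""
    return [
        f"lb-ctl drain {host}",                           # hypothetical load balancer CLI
        f"ssh {host} sudo /opt/app/deploy.sh {version}",  # hypothetical script on each VM
        f"curl --fail http://{host}:8080/healthz",        # hypothetical health endpoint
        f"lb-ctl enable {host}",
    ]


def dry_run(cmd: str) -> int:
    print(cmd)
    return 0


def rolling_deploy(version: str, run: Callable[[str], int] = dry_run) -> bool:
    """Deploy one host at a time and stop at the first failing command,
    so the remaining hosts keep serving the previous version."""
    for host in HOSTS:
        for cmd in deploy_steps(host, version):
            if run(cmd) != 0:
                print(f"deploy halted on {host}: {cmd}")
                return False
    return True
```

To execute for real, pass something like `run=lambda cmd: subprocess.run(cmd, shell=True).returncode`. Everything the script does is visible in one file, which is the operational advantage over running a cluster.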

Stupid: Building a custom authentication system
Simple: Use an existing auth provider (Auth0, Cognito, Firebase Auth)
Why: Authentication has edge cases (password reset, account lockout,
MFA, session management, CSRF, token rotation) that take months to
handle correctly. A provider handles them from day one.

Stupid: Designing for 10x current scale from the start
Simple: Design for 2-3x current scale with a plan for how
you'd reach 10x
Why: The bottlenecks at 10x are almost never where you predict
them. Design for the next step, not the step after that.

Stupid: Writing a custom ORM
Simple: Use an existing ORM or write raw SQL
Why: ORMs are deceptively complex. The first 80% is easy to build.
The last 20% (migrations, connection pooling, lazy loading, query
optimization, transaction management) is where years of engineering
hide.

Common Pitfalls

Confusing simplicity with laziness: Choosing the simple approach is not taking shortcuts. It is deliberately selecting the solution with the fewest failure modes. Simplicity requires discipline because the complex approach is often more intellectually appealing.
Using "too hard" as an excuse to avoid learning: The too hard pile is for problems that are unnecessarily complex for your situation, not for problems you don't feel like understanding. If the simple approach can't meet your requirements, you need to learn the complex one.
Premature simplification: Just as premature optimization is wasteful, prematurely simplifying can paint you into a corner. Make sure the simple approach actually meets current requirements. A cron job that needs to run every second is no longer simple.
Not recognizing when you've crossed the threshold: The simple approach works until it doesn't. Know the signals: increasing operational burden, scaling limits being hit, user-facing quality degradation. When you see these, it's time to invest in complexity for that specific area.
Assuming others are being stupid: When you see a complex solution, apply Chesterton's Fence (don't remove something until you know why it was put there) before concluding it's unnecessary. Maybe the team tried the simple approach and hit a wall you haven't encountered yet.

Key Takeaways

Avoiding mistakes compounds into success more reliably than pursuing brilliance. The upside of a clever solution is bounded; the downside of a stupid one is unbounded.
Keep a "too hard" pile. Before solving a hard problem, ask whether you can avoid it entirely. Idempotent consumers instead of exactly-once delivery. Batch instead of real-time. Maintenance window instead of zero-downtime migration.
Default to the simplest technology and architecture that meets your actual requirements. PostgreSQL before Cassandra, monolith before microservices, cron before streaming. Escalate to complexity only when evidence (not speculation) demands it.
Don't optimize code that runs once, don't build frameworks when scripts work, and don't design for 10x scale when you're at 1x. Solve today's problem today. When tomorrow's problem arrives, you'll know more about it than you do now.
The discipline of simplicity is harder than the indulgence of complexity. Complex solutions feel impressive. Simple solutions feel obvious in retrospect. Choose obvious.