When to Change Your Stack

Every engineer eventually looks at their stack and thinks "we should rewrite this." Sometimes they are right. Most of the time, they are bored.

The difference between a necessary stack change and a costly distraction is evidence. Not gut feeling, not blog posts, not what the hot startup across town is using. Evidence that your current stack is measurably holding you back in ways that matter to the business.

Changing your stack is one of the most expensive things a startup can do. It has killed companies. It has also saved companies. Knowing which situation you are in is the skill.

Signs Your Stack Is Actually Holding You Back

These are real problems that sometimes require a stack change:

Hiring Is Impossible

You are using a language or framework with a tiny talent pool, and you have been trying to hire for three months with zero qualified candidates. This is a real business constraint, not a theoretical one.

If your startup runs on a niche language and you cannot hire, the language is costing you growth. This happened to companies that bet early on Haskell or Erlang for general web development — the technology was excellent but the hiring pool was minuscule.

Performance Is Losing You Customers

Not "we could be faster." Actual customers leaving or refusing to sign up because of performance. Measurable revenue impact.

Twitter famously hit this wall with Ruby on Rails. Their "fail whale" became a cultural phenomenon. They were literally losing users because the platform could not handle the load. They rewrote critical paths in Scala and JVM-based services. This was a justified stack change.

Development Speed Has Collapsed

Features that should take days take weeks. Every change causes unexpected breakage in unrelated parts of the system. New engineers take months to become productive.

This is often a codebase problem rather than a stack problem — but sometimes the stack contributes. A dynamically typed codebase at 500,000 lines with no type annotations becomes genuinely harder to work in than a typed equivalent.

Security or Compliance Requirements

Your stack cannot meet regulatory requirements. Your framework has known vulnerabilities that are not being patched. Your language lacks the security libraries you need for your industry.

This is rare but real. Financial and healthcare startups sometimes discover that their initial stack cannot meet the compliance requirements they need to close enterprise deals.

Real signals to change your stack:
- Cannot hire for 3+ months (talent pool problem)
- Measurable customer churn due to performance
- Feature velocity has dropped 50%+ over 6 months
- Cannot meet regulatory requirements
- Framework/language is abandoned (no security patches)

False signals:
- "Language X is faster in benchmarks"
- "Framework Y has more GitHub stars"
- "Company Z switched and it worked for them"
- "I'm bored with our current stack"
- "The new version of that framework looks cool"
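
One way to keep the two lists honest is to write the real signals down as thresholds and check measurements against them. The following is a minimal Python sketch, not a prescribed tool: the `StackEvidence` fields and the `real_signals` function are hypothetical names, and the thresholds are simply the ones from the real-signals list above. Benchmarks, GitHub stars, and boredom have no field on purpose.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StackEvidence:
    """Measurements a team might collect; every field name here is hypothetical."""
    months_without_qualified_hire: float = 0.0
    churn_traced_to_performance: bool = False
    velocity_drop_over_six_months: float = 0.0  # 0.5 means velocity fell by 50%
    blocks_regulatory_requirements: bool = False
    no_security_patches: bool = False


def real_signals(evidence: StackEvidence) -> List[str]:
    """Return the real signals from the list above that the evidence supports."""
    checks = [
        (evidence.months_without_qualified_hire >= 3, "cannot hire for 3+ months"),
        (evidence.churn_traced_to_performance, "measurable churn due to performance"),
        (evidence.velocity_drop_over_six_months >= 0.5, "velocity dropped 50%+ over 6 months"),
        (evidence.blocks_regulatory_requirements, "cannot meet regulatory requirements"),
        (evidence.no_security_patches, "framework/language is abandoned"),
    ]
    return [label for hit, label in checks if hit]
```

An empty result does not prove the stack is fine, but it does mean the case for switching has not been made yet.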

Signs You Are Just Bored

Engineer boredom is real and it feels exactly like a legitimate technical concern. The tell is that the argument for switching focuses on the new technology rather than the problem it solves.

"We should switch to Rust because Rust is memory-safe and fast" is a boredom signal. "We should rewrite our image processing pipeline in Rust because it is our biggest operational cost and Rust would reduce it by 60%" is a legitimate argument.

"We should switch to microservices because microservices are the industry standard" is a boredom signal. "We should extract the billing service because the billing team and the product team are blocking each other 3 times per week" is a legitimate argument.

The pattern: bored engineers talk about the solution. Engineers with real problems talk about the problem.

The Rewrite Trap

Joel Spolsky wrote about this in 2000 and it remains true: the Big Rewrite is the single worst strategic mistake a software company can make.

The trap works like this:

Month 1:  "Our codebase is terrible. Let's rewrite it."
Month 2:  New codebase is clean and fast to work in.
Month 3:  New codebase catches up to 50% of old functionality.
Month 4:  "Why doesn't feature X work?" "We haven't rebuilt that yet."
Month 5:  Edge cases start appearing. The old code handled them. The new code doesn't.
Month 6:  New codebase is at 70% functionality. Team is tired. Old users are frustrated.
Month 9:  New codebase is at 85%. That last 15% is the hardest part.
Month 12: New codebase is at 90%. The old codebase had a year of bug fixes that need to be ported.
Month 15: You've spent over a year and the new thing is about as good as the old thing was.

Netscape did this and it contributed to their death. They rewrote their browser from scratch, took years, and by the time they finished, Internet Explorer had eaten their market share.

The rewrite trap exists because old code looks worse than it is. Every strange conditional, every weird hack, every "why is this here" comment — those exist because they solved a real problem that you will encounter again. You are not replacing bad code with good code. You are replacing code that handles known edge cases with code that does not.

Incremental Migration

The alternative to a rewrite is incremental migration. Change your stack piece by piece while keeping the system running. This is harder to plan but far less risky.

The Strangler Fig Pattern

The pattern is named after the strangler fig, a tree that grows around its host and eventually replaces it. The idea: build new functionality in the new stack, redirect traffic from old to new piece by piece, and eventually the old system has nothing left to do.

Step 1: All requests go through a router
Step 2: Router sends every request to old system at first
Step 3: Build new user auth in new stack
Step 4: Router sends auth requests to new system, everything else to old
Step 5: Build new billing in new stack
Step 6: Router sends auth + billing to new, everything else to old
...
Step N: Old system handles nothing. Shut it down.

This is how Shopify migrated critical systems and how Amazon moved off its original monolith. Most successful large-scale migrations work this way.

The strangler fig pattern has a crucial advantage: at every step, the system works. If you run out of time or money or motivation at step 4, you have a system that is partly migrated and fully functional. If you abandon a rewrite partway through, you have an unfinished new system and an old one that went months without investment.
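
The routing steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not how any of the companies mentioned built theirs: `StranglerRouter`, `old_system`, and `new_system` are hypothetical names, and the two backends are stand-in functions rather than real services.

```python
from typing import Callable, Set

Handler = Callable[[str], str]


def old_system(path: str) -> str:
    """Stand-in for the legacy backend (hypothetical)."""
    return f"old:{path}"


def new_system(path: str) -> str:
    """Stand-in for the new backend (hypothetical)."""
    return f"new:{path}"


class StranglerRouter:
    """Sends migrated path prefixes to the new system, everything else to the old one."""

    def __init__(self, old: Handler, new: Handler) -> None:
        self._old = old
        self._new = new
        self._migrated: Set[str] = set()

    def migrate(self, prefix: str) -> None:
        """Start routing a prefix such as "/auth" to the new system."""
        self._migrated.add(prefix)

    def rollback(self, prefix: str) -> None:
        """Send a prefix back to the old system if the new code misbehaves."""
        self._migrated.discard(prefix)

    def handle(self, path: str) -> str:
        # Match whole path segments so "/auth" does not capture "/authors".
        for prefix in self._migrated:
            if path == prefix or path.startswith(prefix + "/"):
                return self._new(path)
        return self._old(path)

    def old_system_is_idle(self, known_prefixes: Set[str]) -> bool:
        """True once every known prefix has been migrated (Step N)."""
        return known_prefixes <= self._migrated
```

After `router.migrate("/auth")`, a request for `/auth/login` goes to the new system while `/billing/pay` still goes to the old one. The `rollback` method is the point of the design: moving a prefix back is a one-line change, not a deployment.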

Database Migration

The hardest part of any stack change is usually the data. Moving from MySQL to PostgreSQL, or from PostgreSQL to DynamoDB, or from a relational database to something else — this is where migrations fail.

The safe approach:

1. Dual-write: Write to both old and new database
2. Backfill: Copy historical data to new database
3. Verify: Compare reads from both databases
4. Switch reads: Start reading from new database
5. Remove dual-write: Stop writing to old database

This is more work than "dump and restore" but it gives you a rollback path at every step.
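
The five steps above map onto a small facade that sits in front of both databases. The sketch below is a toy under stated assumptions: both stores are plain Python dicts, and `DualWriteStore` and its method names are hypothetical. A real migration also has to deal with partial write failures, ordering, and schema differences, which this deliberately ignores.

```python
from typing import Dict, List, Optional


class DualWriteStore:
    """Key-value facade over an old and a new store (plain dicts in this sketch)."""

    def __init__(self, old: Dict[str, str], new: Dict[str, str]) -> None:
        self.old = old
        self.new = new
        self.dual_write = True      # step 1 is active from the start
        self.read_from_new = False  # flipped in step 4

    def write(self, key: str, value: str) -> None:
        # Step 1: every write lands in the new store, and in the old one
        # for as long as dual-write is on.
        self.new[key] = value
        if self.dual_write:
            self.old[key] = value

    def backfill(self) -> int:
        # Step 2: copy historical rows, never overwriting fresher dual-written ones.
        copied = 0
        for key, value in self.old.items():
            if key not in self.new:
                self.new[key] = value
                copied += 1
        return copied

    def mismatches(self) -> List[str]:
        # Step 3: keys whose values differ between the two stores.
        keys = set(self.old) | set(self.new)
        return sorted(k for k in keys if self.old.get(k) != self.new.get(k))

    def switch_reads(self) -> None:
        # Step 4: refuse to switch while the stores disagree.
        if self.mismatches():
            raise RuntimeError("stores disagree; verify before switching reads")
        self.read_from_new = True

    def stop_dual_write(self) -> None:
        # Step 5: only safe once reads come from the new store.
        if not self.read_from_new:
            raise RuntimeError("switch reads before removing dual-write")
        self.dual_write = False

    def read(self, key: str) -> Optional[str]:
        source = self.new if self.read_from_new else self.old
        return source.get(key)
```

Until `stop_dual_write` is called, the old store stays complete and current, so rolling back is just setting `read_from_new` to `False`.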

API Versioning

If you are changing your backend stack and clients depend on your API, version the API and run old and new in parallel.

/api/v1/* -> old backend (Python/Django)
/api/v2/* -> new backend (Go)

Clients migrate from v1 to v2 at their own pace.
Old backend is shut down when v1 traffic reaches zero.
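
"When v1 traffic reaches zero" is only useful if you are counting. Here is a minimal Python sketch of the mapping above with a per-version request counter. `VersionRouter` and its methods are hypothetical names, and the backends are represented by labels rather than real services; in practice the counts would more likely come from your proxy or access logs.

```python
from collections import Counter

# Prefix-to-backend table copied from the mapping above.
BACKENDS = {
    "/api/v1/": "old backend (Python/Django)",
    "/api/v2/": "new backend (Go)",
}


class VersionRouter:
    """Picks a backend by version prefix and counts requests per version."""

    def __init__(self) -> None:
        self.hits = Counter()

    def route(self, path: str) -> str:
        for prefix, backend in BACKENDS.items():
            if path.startswith(prefix):
                self.hits[prefix] += 1
                return backend
        raise LookupError(f"no backend registered for {path}")

    def reset_window(self) -> None:
        """Start a new observation window, for example once per day."""
        self.hits.clear()

    def v1_is_idle(self) -> bool:
        """True when no v1 request arrived in the current window."""
        return self.hits["/api/v1/"] == 0
```

A single idle window is weak evidence. Monthly batch clients exist, so the window should be at least as long as the slowest client's cycle before anything is shut down.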

How to Measure the Cost Honestly

Before committing to a stack change, quantify both sides of the equation.

Cost of Staying

- Hours per week lost to stack limitations
- Revenue lost due to performance issues
- Cost of workarounds and hacks
- Hiring failure rate attributable to stack
- Engineer attrition attributable to stack

Be honest. "Our Rails app is a little slow on one page" is not a compelling cost. "We are losing $50,000/month in churn because our dashboard takes 8 seconds to load" is.

Cost of Switching

- Engineer-months to complete the migration
- Feature development paused during migration
- Risk of bugs in the new system
- Learning curve for the team on new technology
- Opportunity cost (what else could you build?)

Be honest here too. Engineers routinely underestimate migration timelines by a factor of 2-3. Whatever you think it will take, at least double it.

The switch is justified when:
  Annualized cost of staying > Cost of switching + 50% buffer

The 50% buffer accounts for the fact that you are underestimating the cost of switching. You always are.
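
The rule above is simple enough to write down and run against your own numbers. A minimal Python sketch follows; `switch_is_justified` is a hypothetical name. In the usage example, the $50,000/month churn figure comes from the text, while the switching costs are invented purely for illustration.

```python
def switch_is_justified(
    annual_cost_of_staying: float,
    cost_of_switching: float,
    buffer: float = 0.5,
) -> bool:
    """Apply the rule: annualized cost of staying > cost of switching + 50% buffer."""
    if annual_cost_of_staying < 0 or cost_of_switching < 0 or buffer < 0:
        raise ValueError("costs and buffer must be non-negative")
    return annual_cost_of_staying > cost_of_switching * (1 + buffer)


# $50,000/month in churn, annualized.
staying = 50_000 * 12  # 600,000

# Hypothetical switching estimates, before the buffer is applied.
print(switch_is_justified(staying, 360_000))  # 360,000 * 1.5 = 540,000 -> True
print(switch_is_justified(staying, 450_000))  # 450,000 * 1.5 = 675,000 -> False
```

The second case is the instructive one: an estimate that looks comfortably cheaper than the cost of staying stops being justified once the buffer is applied.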

Real-World Migration Stories

Twitter: Ruby to JVM. Justified. They were losing users due to downtime. The fail whale was a brand problem. They migrated critical services to Scala incrementally over years, not all at once.

Facebook: PHP to Hack. Incremental. Rather than abandoning PHP, they created a gradually typed dialect of it (Hack) and migrated file by file. The existing code kept working. They avoided the rewrite trap entirely.

Airbnb: Rails monolith to services. Gradual, over years. They extracted services one at a time as specific pain points emerged. The monolith still exists — it just does less.

Segment: Microservices back to monolith. Yes, the reverse happens too. Segment famously moved from microservices back to a monolith because the operational overhead was crushing their small team. Sometimes the "old" architecture is the right one.

Common Pitfalls

Deciding to switch based on excitement. A conference talk about a cool new framework is not evidence that you need to switch. Ask what problem it solves for your specific business.

The full rewrite. Almost always wrong. Incremental migration is almost always right. If someone says "we need to rewrite everything from scratch," push back hard.

Underestimating migration cost. Double your estimate, then add 50%. Migrations take longer than expected because you discover edge cases, data inconsistencies, and undocumented behavior in the old system.

Migrating during a critical growth period. If you are actively growing and closing deals, this is the worst time to destabilize your stack. Migrate when you have breathing room, not when you are sprinting.

Not setting a deadline. Incremental migrations can drag on forever. Set a deadline and a definition of done. "All traffic on the new system by Q3" is a goal you can track. "Gradually migrate" is a goal that never finishes.

Changing stack to fix a people problem. If your code is messy because your team lacks discipline, a new stack will become messy too. Fix the habits before changing the tools.

Key Takeaways

Change your stack based on evidence, not excitement. Measurable hiring failure, measurable customer churn, or measurable development slowdown.
Most stack frustration is boredom, not a real problem. The tell: you are talking about the new technology instead of the problem it solves.
Avoid the Big Rewrite. Incremental migration (strangler fig pattern) is slower but dramatically less risky.
Honestly measure both the cost of staying and the cost of switching. Then add a 50% buffer to your switching cost estimate.
Data migration is the hardest part. Plan for it explicitly with dual-write and verification.
Set a deadline for your migration. Without one, it will never finish.