Microservices vs Monolith
Two Architectural Models
A monolith is a single deployable unit containing all of an application's functionality. A microservices architecture splits that functionality into many small, independently deployable services, each owning a specific business domain.
Neither is inherently better. The right choice depends on your team size, organizational structure, operational maturity, and scaling requirements. Most failed architecture decisions come from choosing microservices too early or sticking with a monolith too long.
The Monolith
All code lives in one repository, compiles into one artifact, and deploys as one unit.
┌─────────────────────────────────────────┐
│ Monolith │
│ │
│ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│ │ Users │ │ Orders │ │ Payment │ │
│ │ Module │ │ Module │ │ Module │ │
│ └────┬─────┘ └────┬─────┘ └────┬────┘ │
│ │ │ │ │
│ └─────────────┼────────────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ Shared DB │ │
│ └─────────────┘ │
└─────────────────────────────────────────┘
Advantages:
- Simple to develop, test, and deploy — one build, one artifact, one deployment
- In-process function calls between modules (nanoseconds, not milliseconds)
- ACID transactions across all data — no distributed transaction headaches
- Easy to debug — a single stack trace tells the whole story
- One set of infrastructure to manage (one CI pipeline, one monitoring setup, one logging destination)
Disadvantages:
- Deployment coupling — changing one line in the payment module requires redeploying the entire application
- Scaling is all-or-nothing — if the search module needs 10x more CPU, you scale the entire monolith 10x
- Team coupling — 50 engineers committing to the same codebase creates merge conflicts, broken builds, and coordination overhead
- Technology lock-in — every module must use the same language, framework, and database
- Reliability risk — a memory leak in one module can crash the entire application
Microservices
Each service is independently developed, deployed, and scaled. Services communicate over the network via APIs or events.
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Users │ │ Orders │ │ Payment │
│ Service │────→│ Service │────→│ Service │
│ │ │ │ │ │
│ [Own DB] │ │ [Own DB] │ │ [Own DB] │
└──────────┘ └──────────┘ └──────────┘
↑ ↑ ↑
└────────────────┼────────────────┘
│
[API Gateway]
│
[Client]
Advantages:
- Independent deployment — the payment team deploys without coordinating with the orders team
- Independent scaling — scale the search service to 20 instances while the user profile service runs on 2
- Technology flexibility — the recommendation service can use Python with ML libraries while the real-time service uses Rust
- Fault isolation — the payment service crashing does not take down the product catalog
- Team autonomy — each team owns their service end-to-end (code, data, deployment, monitoring)
Disadvantages:
- Distributed system complexity — network failures, latency, partial failures, eventual consistency
- Operational overhead — each service needs its own CI/CD pipeline, monitoring, alerting, logging
- Data consistency is hard — no cross-service ACID transactions. You use sagas, eventual consistency, or compensating transactions
- Debugging is hard — a single user request may span 10 services. You need distributed tracing (Jaeger, Zipkin)
- Latency overhead — in-process calls (nanoseconds) become network calls (milliseconds)
- Integration testing is painful — testing the interaction between 20 services requires a complex test environment
Detailed Comparison
| Factor | Monolith | Microservices | |--------|----------|---------------| | Deployment | All-or-nothing | Independent per service | | Scaling | Entire application | Individual services | | Development speed (small team) | Faster | Slower (overhead) | | Development speed (large org) | Slower (coordination) | Faster (autonomy) | | Data consistency | ACID transactions | Eventual consistency, sagas | | Debugging | Stack trace | Distributed tracing | | Latency | In-process (ns) | Network calls (ms) | | Operational cost | Low | High | | Technology diversity | Single stack | Polyglot | | Team independence | Low | High | | Failure blast radius | Entire application | Single service (if designed well) |
Conway's Law
"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure." — Melvin Conway, 1967
This is not just an observation — it is a force of nature in software. If you have three teams, you will get three services (or three major modules). If all engineers sit together and communicate freely, you will naturally build a monolith. If teams are distributed and autonomous, they will naturally build separate services.
The practical implication: Do not choose microservices and then try to run them with a monolith team structure. And do not try to force a monolith on an organization of 20 autonomous teams. Align your architecture with your organization.
The inverse Conway maneuver: Some organizations deliberately restructure teams to produce the architecture they want. Want microservices? Create small, autonomous teams each owning a bounded context. The architecture follows.
When to Migrate from Monolith to Microservices
Migration is justified when:
- Teams are stepping on each other. Merge conflicts are constant, deploys require coordination across 5+ teams, and a bug in one team's code blocks everyone's deployment.
- Components have different scaling needs. The search feature needs 50 servers while the admin panel needs 1, but you are scaling everything together.
- Deployment frequency is suffering. You want to deploy daily but the monolith requires a 2-hour regression test, so you deploy weekly.
- Fault isolation is critical. A bug in the recommendation engine should not crash the checkout flow.
- You have the operational maturity. You have CI/CD, automated testing, monitoring, distributed tracing, and engineers who understand distributed systems.
Do NOT migrate when:
- You have fewer than 20-30 engineers. The operational overhead of microservices will slow you down.
- You do not have automated deployment pipelines. Manual deployments times N services equals pain.
- You are trying to solve a code quality problem. Microservices do not fix bad code; they just distribute it.
- You are in the early stages of product development. You do not yet know where the service boundaries should be.
The Migration Path
The Strangler Fig Pattern
The safest migration strategy. Named after strangler fig trees that grow around a host tree and eventually replace it.
Phase 1: Monolith handles everything
┌────────────────────────┐
│ Monolith │
│ [Users] [Orders] [Pay]│
└────────────────────────┘
Phase 2: New service handles one domain, monolith proxies to it
┌────────────────────────┐ ┌──────────┐
│ Monolith │────→│ Orders │
│ [Users] [-----] [Pay] │ │ Service │
└────────────────────────┘ └──────────┘
Phase 3: More domains extracted
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Users │ │ Orders │ │ Payment │
│ Service │ │ Service │ │ Service │
└──────────┘ └──────────┘ └──────────┘
Steps:
- Identify a bounded context with clear boundaries (e.g., the orders domain)
- Build the new service alongside the monolith
- Route traffic for that domain to the new service (via API gateway or proxy)
- Once stable, remove the dead code from the monolith
- Repeat for the next domain
Data Extraction
The hardest part of migration is extracting data. In a monolith, modules often share database tables through JOINs. Extracting a service means:
- Identify which tables belong to the new service
- Create a new database for the service
- Migrate data to the new database
- Replace direct JOINs with API calls between services
- Set up change data capture or events for data that was previously joined
This is where most migrations stall. Shared database tables are the strongest coupling in a monolith.
API Gateways
An API gateway is a single entry point for all client requests to a microservices backend. Without one, clients must know the address of every service and handle cross-cutting concerns themselves.
Mobile App ──→ ┌──────────────┐ ──→ User Service
Web App ──→ │ API Gateway │ ──→ Order Service
Partner ──→ └──────────────┘ ──→ Payment Service
Responsibilities:
- Routing: Direct
/api/users/*to the user service,/api/orders/*to the order service - Authentication: Verify JWTs or API keys once at the gateway instead of in every service
- Rate limiting: Protect backend services from abuse (token bucket, sliding window)
- Request transformation: Translate between external API format and internal service formats
- Response aggregation: Combine responses from multiple services into one client response
- SSL termination: Handle HTTPS at the edge so internal traffic can be plain HTTP
- Observability: Centralized request logging, metrics, and tracing header injection
Popular API gateways:
- Kong: Open source, plugin-based, built on NGINX. Widely adopted.
- AWS API Gateway: Fully managed, integrates with Lambda and other AWS services
- Envoy: High-performance proxy often used as both API gateway and service mesh data plane
- NGINX: Can serve as a simple API gateway with reverse proxy configuration
BFF Pattern (Backend for Frontend)
Instead of one generic API gateway, create a dedicated backend for each client type (mobile BFF, web BFF, partner BFF). Each BFF tailors responses for its client — the mobile BFF returns smaller payloads, the web BFF returns richer data. This avoids the "one-size-fits-none" problem of a generic API.
Service Mesh
A service mesh is a dedicated infrastructure layer for service-to-service communication. It handles networking concerns (load balancing, retries, circuit breaking, mutual TLS, observability) without changing application code.
┌────────────────────┐ ┌────────────────────┐
│ Service A │ │ Service B │
│ ┌──────────────┐ │ │ ┌──────────────┐ │
│ │ App Code │ │ │ │ App Code │ │
│ └──────┬───────┘ │ │ └──────┬───────┘ │
│ │ │ │ │ │
│ ┌──────┴───────┐ │ network │ ┌──────┴───────┐ │
│ │ Sidecar Proxy│◄─┼──────────►┼─►│ Sidecar Proxy│ │
│ └──────────────┘ │ │ └──────────────┘ │
└────────────────────┘ └────────────────────┘
▲ ▲
└───────── Control Plane ───────┘
How It Works
Every service instance gets a sidecar proxy (a small network proxy running alongside the application). All inbound and outbound traffic passes through the sidecar. The sidecar handles:
- Mutual TLS (mTLS): Encrypts all service-to-service traffic and verifies identities. Zero-trust networking without application changes.
- Load balancing: Intelligent, client-side load balancing with health checking
- Retries and timeouts: Automatic retries with exponential backoff for transient failures
- Circuit breaking: If Service B is failing, the sidecar stops sending traffic to it (preventing cascade failures)
- Observability: Every request is traced, metriced, and logged by the sidecar
A control plane manages all sidecars: distributing configuration, collecting telemetry, enforcing policies.
Istio vs Linkerd
| Feature | Istio | Linkerd | |---------|-------|---------| | Proxy | Envoy (C++, feature-rich) | linkerd2-proxy (Rust, lightweight) | | Complexity | High (many configuration options) | Low (opinionated, fewer knobs) | | Resource overhead | Higher (Envoy is heavier) | Lower (Rust proxy is minimal) | | Features | Comprehensive (traffic management, security, observability) | Focused (reliability, observability, mTLS) | | Learning curve | Steep | Gentle | | Best for | Complex environments needing fine-grained control | Teams wanting a service mesh without the complexity |
When to use a service mesh:
- You have 20+ services and managing retries, timeouts, and mTLS in application code is unsustainable
- You need zero-trust networking (mTLS everywhere) without changing application code
- You want consistent observability across services written in different languages
When NOT to use a service mesh:
- You have fewer than 10 services — the overhead is not justified
- Your team is not comfortable with Kubernetes (service meshes assume Kubernetes)
- You can handle cross-cutting concerns with a shared library
Real-World Migration Stories
Netflix: The 7-Year Migration
Netflix began migrating from a monolithic Java application to microservices in 2008. The migration took approximately 7 years to complete (finishing around 2015).
Why they migrated: A database corruption incident in 2008 took down their entire service for 3 days. The monolith had a single Oracle database as a single point of failure. They could not scale individual components, and a failure anywhere brought down everything.
How they did it:
- Used the strangler fig pattern — new features were built as services, old features were gradually extracted
- Built an entire platform of tools: Zuul (API gateway), Eureka (service discovery), Hystrix (circuit breaker), Ribbon (client-side load balancing)
- Moved from Oracle to a mix of Cassandra, DynamoDB, and other purpose-built databases
- Developed the "Simian Army" — Chaos Monkey and related tools that randomly kill services in production to ensure resilience
The cost: Years of engineering effort, building an entire microservices platform from scratch, and operational complexity that requires a dedicated platform engineering team.
The payoff: Netflix now deploys thousands of times per day, scales individual services independently, and has achieved the resilience to survive entire AWS region failures. They serve 250+ million subscribers with 99.99% uptime.
Lesson: Netflix had the engineering talent, the organizational scale (thousands of engineers), and the business need (global streaming at massive scale) to justify this migration. Most companies do not.
Amazon's API Mandate
In 2002, Jeff Bezos issued a now-famous mandate (paraphrased):
- All teams will expose their data and functionality through service interfaces
- Teams must communicate with each other through these interfaces
- All service interfaces must be designed to be externalizable (usable by the outside world)
- Anyone who does not do this will be fired
This mandate forced Amazon from a monolithic architecture to service-oriented architecture years before "microservices" was a term. The result was AWS itself — the internal services Amazon built (compute, storage, databases) were so well-designed for external use that they became products.
Key insight: Amazon's migration was driven by organizational structure as much as technical need. The mandate aligned the architecture with autonomous "two-pizza teams" (teams small enough to be fed by two pizzas). Conway's Law in action.
Shopify: The Modular Monolith
Not every migration story ends in microservices. Shopify took a different path: instead of splitting into microservices, they restructured their monolithic Ruby on Rails application into a modular monolith.
What they did:
- Defined clear module boundaries within the monolith (using a system they call "components")
- Enforced that modules communicate through defined interfaces, not by reaching into each other's internals
- Each module owns its database tables — no cross-module JOINs
- Modules can be extracted into services later if needed, but many never will be
Why this worked for Shopify:
- They get most of the organizational benefits of microservices (team ownership, clear boundaries) without the operational cost
- In-process communication means no network latency between modules
- ACID transactions across modules remain possible
- Deployment is still one artifact, but modules are independently testable
Lesson: The modular monolith is an underappreciated middle ground. It gives you clean boundaries and team ownership without the distributed system tax.
Common Pitfalls
- Distributed monolith. You split into microservices but they all share a database, deploy together, and cannot function independently. You have the worst of both worlds: the complexity of microservices with the coupling of a monolith.
- Too many services too early. Starting a greenfield project with 15 microservices when you have 5 engineers. You will spend more time on infrastructure than product.
- Ignoring data ownership. If two services share database tables, they are not independent services. Data ownership is the foundation of microservice independence.
- Synchronous chains. Service A calls B, which calls C, which calls D. Latency adds up, and any failure in the chain fails the entire request. Use asynchronous events where possible.
- No API versioning. Changing a service's API without versioning breaks all consumers. Version your APIs from day one.
- Skipping the platform. Microservices require platform capabilities: service discovery, centralized logging, distributed tracing, CI/CD per service. Without these, operations become unmanageable.
Key Takeaways
- Start with a monolith. Extract microservices when you have concrete evidence that you need them (team scaling, independent scaling, deployment frequency).
- Conway's Law is real. Align your architecture with your organizational structure, or change your organization to match the architecture you want.
- The modular monolith is a viable middle ground that is often overlooked.
- The strangler fig pattern is the safest migration strategy. Never attempt a big-bang rewrite.
- Data extraction is the hardest part of any migration. Shared databases are the strongest coupling.
- API gateways and service meshes are essential infrastructure for microservices at scale, but add them when the pain justifies the complexity.
- Operational maturity (CI/CD, monitoring, tracing) is a prerequisite for microservices, not something you figure out after.