Cloud-Native Architecture
The Twelve-Factor App
The twelve-factor methodology defines best practices for building cloud-native applications that are portable, scalable, and maintainable.
| Factor | Principle | Cloud Application |
|--------|-----------|-------------------|
| I. Codebase | One codebase, many deploys | Git repo with CI/CD pipelines |
| II. Dependencies | Explicitly declare and isolate | Package managers, container images |
| III. Config | Store config in environment | Env vars, Parameter Store, Secrets Manager |
| IV. Backing Services | Treat as attached resources | RDS, ElastiCache, SQS as swappable URLs |
| V. Build, Release, Run | Strictly separate stages | CI builds image, CD deploys to env |
| VI. Processes | Execute as stateless processes | No sticky sessions, external session store |
| VII. Port Binding | Export services via port binding | Container exposes port, LB routes traffic |
| VIII. Concurrency | Scale out via the process model | Horizontal scaling, not vertical |
| IX. Disposability | Fast startup and graceful shutdown | SIGTERM handling, health checks |
| X. Dev/Prod Parity | Keep environments as similar as possible | IaC keeps infrastructure consistent across environments |
| XI. Logs | Treat logs as event streams | Write to stdout, aggregate externally |
| XII. Admin Processes | Run admin tasks as one-off processes | Kubernetes Jobs, Lambda invocations |
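As an illustration of factor III, the sketch below reads configuration from environment variables and fails fast when a required value is missing. The variable names (`DATABASE_URL`, `CACHE_URL`, `LOG_LEVEL`) and the `Settings` class are hypothetical, chosen only for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    cache_url: str
    log_level: str = "INFO"


def load_settings(env=None) -> Settings:
    """Build settings from environment variables (factor III).

    DATABASE_URL, CACHE_URL, and LOG_LEVEL are hypothetical names.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("DATABASE_URL", "CACHE_URL") if name not in env]
    if missing:
        # Fail fast at startup rather than on the first request.
        raise RuntimeError(f"missing required config: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        cache_url=env["CACHE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict as `env` makes the loader easy to test; in a deployed process it would be called with no arguments so that it reads `os.environ`.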
Microservices in the Cloud
Decomposition Strategies
     Monolith                     Microservices
┌─────────────────┐         ┌──────┐ ┌──────┐ ┌──────┐
│ User Module     │         │ User │ │Order │ │ Pay  │
│ Order Module    │ ──────► │ Svc  │ │ Svc  │ │ Svc  │
│ Payment Module  │         └──┬───┘ └──┬───┘ └──┬───┘
│ Inventory Mod   │            │        │        │
└─────────────────┘         ┌──┴───┐ ┌──┴───┐ ┌──┴───┐
     Single DB              │  DB  │ │  DB  │ │  DB  │
                            └──────┘ └──────┘ └──────┘
                               Database per service
Communication Patterns
Synchronous (request-response):
- REST over HTTP/HTTPS
- gRPC for high-performance internal communication
- GraphQL for flexible client queries
Asynchronous (event-driven):
- Message queues (SQS, Cloud Tasks) for point-to-point
- Pub/sub topics (SNS, Pub/Sub) for fan-out
- Event streaming (Kinesis, Kafka) for replayable events, ordered within a shard or partition
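The difference between point-to-point and fan-out delivery can be sketched with two small in-memory classes. This is an illustration only: `Queue` and `Topic` are hypothetical names and do not model any particular managed service.

```python
from collections import deque


class Queue:
    """Point-to-point: each message is consumed by exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Returns None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Fan-out: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)
```

With a queue, adding consumers spreads the work; with a topic, adding subscribers multiplies the deliveries.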
Service Discovery
| Approach | Implementation | Pros; Cons |
|----------|---------------|-----------|
| DNS-based | Route 53, Cloud DNS | Simple; TTL caching delays |
| Service registry | Consul, Eureka | Rich metadata; added complexity |
| Platform-native | K8s Services, Cloud Map | Integrated; platform-specific |
| Service mesh | Envoy sidecar | Transparent; resource overhead |
Service Mesh
A service mesh is a dedicated infrastructure layer that manages service-to-service communication on behalf of the application.
Architecture
┌─────────────────────────────────────────────┐
│                Control Plane                │
│  (Istio/Linkerd: config, certs, policies)   │
└──────┬──────────────┬─────────────┬─────────┘
       │              │             │
┌──────▼──────┐ ┌─────▼──────┐ ┌────▼───────┐
│ ┌─────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │ Service │ │ │ │Service │ │ │ │Service │ │
│ │    A    │ │ │ │   B    │ │ │ │   C    │ │
│ └────┬────┘ │ │ └───┬────┘ │ │ └───┬────┘ │
│ ┌────▼────┐ │ │ ┌───▼────┐ │ │ ┌───▼────┐ │
│ │  Envoy  │◄├─┤►│ Envoy  │◄├─┤►│ Envoy  │ │
│ │ Sidecar │ │ │ │Sidecar │ │ │ │Sidecar │ │
│ └─────────┘ │ │ └────────┘ │ │ └────────┘ │
└─────────────┘ └────────────┘ └────────────┘
   Data Plane (proxies handle all traffic)
Service Mesh Capabilities
- mTLS: Automatic mutual TLS between all services
- Traffic management: Canary deployments, traffic splitting, retries
- Observability: Distributed tracing, metrics, access logs without code changes
- Resilience: Circuit breaking, rate limiting, timeouts, fault injection
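Traffic splitting for a canary release comes down to a weighted choice between backends. The sketch below shows that idea in isolation, assuming a hypothetical `pick_backend` helper and version names `v1` and `v2`; a real mesh applies the weights inside the proxy rather than in application code.

```python
import random


def pick_backend(weights, rng=random):
    """Choose a backend name with probability proportional to its weight.

    `weights` maps a backend name to a non-negative weight, e.g.
    {"v1": 90, "v2": 10} for a 10% canary.
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Shifting the weights step by step (for example 99/1, then 90/10, then 50/50) moves traffic to the new version gradually while error rates are watched.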
Mesh Options
| Mesh | Proxy | Complexity | Ecosystem / Notes |
|------|-------|-----------|-------------------|
| Istio | Envoy | High | Feature-rich; managed Istio-based offerings exist (e.g., on GKE) |
| Linkerd | linkerd2-proxy | Medium | Lightweight, Rust-based proxy |
| Consul Connect | Envoy | Medium | HashiCorp ecosystem |
| AWS App Mesh | Envoy | Low | Deep AWS service integration; a separate product from Istio, and AWS has announced end of support in 2026 |
Event-Driven Architecture
Amazon EventBridge
Event Sources             EventBridge               Targets
┌─────────┐               ┌──────────┐            ┌─────────┐
│ AWS Svc │──────────────►│          │───────────►│ Lambda  │
│ (S3,EC2)│               │  Event   │            └─────────┘
└─────────┘               │   Bus    │            ┌─────────┐
┌─────────┐               │          │───────────►│   SQS   │
│  SaaS   │──────────────►│  Rules   │            └─────────┘
│(Stripe) │               │  match   │            ┌─────────┐
└─────────┘               │   and    │───────────►│Step Fn  │
┌─────────┐               │  route   │            └─────────┘
│ Custom  │──────────────►│          │            ┌─────────┐
│   App   │               │          │───────────►│   API   │
└─────────┘               └──────────┘            └─────────┘
- Event buses: Default, custom, and SaaS partner buses
- Rules: Match events by pattern (source, detail-type, fields)
- Schema registry: Discover and validate event schemas
- Archive and replay: Store events and replay for debugging
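To make rule matching concrete, here is a deliberately simplified matcher. It assumes a pattern is a nested object whose leaves are lists of allowed exact values, and it ignores the richer operators (prefix, numeric, and so on) that the real service supports. The `matches` function and the example source and detail-type values are hypothetical.

```python
def matches(pattern, event):
    """Return True if `event` satisfies `pattern`.

    Simplified: supports nested objects and lists of allowed exact values only.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding nested field.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical rule: paid orders from a custom application.
pattern = {
    "source": ["com.example.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": {"status": ["paid"]},
}
```

Fields that the pattern does not mention are ignored, so an event may carry extra data and still match.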
Google Cloud Pub/Sub
- Fully managed messaging with at-least-once delivery
- Push and pull subscription modes
- Dead-letter topics for failed message handling
- Ordering keys for in-order delivery of messages that share the same key
- BigQuery subscriptions for direct analytics ingestion
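Because delivery is at-least-once, a subscriber may see the same message more than once, so handlers are usually made idempotent. The sketch below deduplicates by message ID; the `IdempotentConsumer` class is hypothetical, and the in-memory set stands in for what would normally be a shared, durable store.

```python
class IdempotentConsumer:
    """Skips messages whose ID has already been processed."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # Assumption: a real system would use a durable store.

    def receive(self, message_id, payload):
        if message_id in self._seen:
            return False  # Duplicate delivery: acknowledge without reprocessing.
        self._handler(payload)
        self._seen.add(message_id)  # Record only after the handler succeeds.
        return True
```

Recording the ID only after the handler returns means a crash mid-handler leads to a retry rather than a silently dropped message.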
Event Patterns
| Pattern | Description | Example or Benefit |
|---------|-------------|--------------------|
| Event notification | Inform consumers of a state change | Order placed, user signed up |
| Event-carried state transfer | Include full state in the event | Consumers avoid calling back to the source |
| Event sourcing | Store state as a sequence of events | Audit trail, temporal queries |
| CQRS | Separate read and write models | Scale reads independently |
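Event sourcing is easiest to see in a small example: state is never stored directly but rebuilt by folding over the event log. The account-balance domain and the `Event`, `apply`, and `replay` names below are hypothetical, used only to illustrate the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn" (hypothetical event types)
    amount: int


def apply(balance, event):
    """Pure function: current state plus one event gives the next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, initial=0):
    """Rebuild state by folding over the event log in order."""
    balance = initial
    for event in events:
        balance = apply(balance, event)
    return balance
```

Replaying only a prefix of the log answers a temporal query ("what was the balance after the second event?"), which is where the audit-trail benefit comes from.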
Stateless Design
Principles
All application instances must be interchangeable. State belongs in external stores.
Stateful (avoid)                 Stateless (prefer)
┌─────────┐                      ┌─────────┐
│ Server  │                      │ Server  │──► Redis (sessions)
│ sessions│                      │ (no     │──► S3 (uploads)
│ uploads │                      │ local   │──► RDS (data)
│ cache   │                      │ state)  │──► ElastiCache
└─────────┘                      └─────────┘
✗ Can't scale horizontally       ✓ Any instance handles any request
✗ Sticky sessions required       ✓ Load balancer distributes freely
✗ Instance failure = data loss   ✓ Instance failure = no data loss
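The stateless side of this comparison can be sketched as an application instance that keeps no session data of its own and delegates to a store passed in at construction. All names here (`SessionStore`, `InMemoryStore`, `AppInstance`) are hypothetical, and the in-memory store is only a stand-in for an external service such as Redis.

```python
from typing import Optional, Protocol


class SessionStore(Protocol):
    def get(self, session_id: str) -> Optional[dict]: ...
    def put(self, session_id: str, data: dict) -> None: ...


class InMemoryStore:
    """Stand-in for an external store such as Redis (test use only)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def put(self, session_id, data):
        self._data[session_id] = dict(data)


class AppInstance:
    """Holds no session state of its own; everything goes through the store."""

    def __init__(self, store: SessionStore):
        self._store = store

    def login(self, session_id, user):
        self._store.put(session_id, {"user": user})

    def whoami(self, session_id):
        session = self._store.get(session_id)
        return session["user"] if session else None
```

Two `AppInstance` objects sharing one store behave like two servers behind a load balancer: a session created through one is visible through the other.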
Externalized State Stores
| State Type | Service | Access Pattern |
|-----------|---------|----------------|
| Session data | Redis, Memcached | Low-latency key-value |
| File uploads | S3, GCS | Presigned (S3) or signed (GCS) URLs |
| Configuration | Parameter Store, Consul | Key-value with versioning |
| Feature flags | LaunchDarkly, ConfigCat | Real-time evaluation |
| Distributed locks | Redis (Redlock), DynamoDB | Conditional writes |
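The "conditional writes" access pattern for locks can be illustrated with a write that succeeds only if the key is free or its lease has expired. The `ConditionalStore` class below is a hypothetical in-memory stand-in: a real store performs the check and the write atomically on the server, which this single-process sketch does not attempt.

```python
import time


class ConditionalStore:
    """In-memory stand-in for a store with put-if-free semantics."""

    def __init__(self):
        self._items = {}  # key -> (owner, lease expiry time)

    def put_if_free(self, key, owner, ttl, now=None):
        now = time.monotonic() if now is None else now
        current = self._items.get(key)
        if current is not None and current[1] > now and current[0] != owner:
            return False  # Held by someone else and not yet expired.
        self._items[key] = (owner, now + ttl)
        return True

    def delete_if_owner(self, key, owner):
        current = self._items.get(key)
        if current is not None and current[0] == owner:
            del self._items[key]
            return True
        return False  # Never release a lock that another owner holds.
```

The lease (`ttl`) is what keeps a crashed holder from blocking everyone else indefinitely; the owner check on release prevents one worker from freeing another worker's lock.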
Health Checks and Readiness
Health Check Types
# Kubernetes health probes
livenessProbe:              # Is the process alive?
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3       # Restart after 3 failures
readinessProbe:             # Can it serve traffic?
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
startupProbe:               # Has it finished starting?
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10         # Up to 300s to start
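On the application side, the probes above need endpoints to call. The following is a minimal sketch using only the Python standard library, assuming the same `/healthz` and `/ready` paths and port 8080; the `ready` flag, `probe_status`, and `serve` names are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ready = threading.Event()  # Set once startup work (e.g. cache warm-up) is done.


def probe_status(path, is_ready):
    """Map a probe path to an HTTP status code."""
    if path == "/healthz":
        return 200  # Liveness: the process can respond at all.
    if path == "/ready":
        return 200 if is_ready else 503  # Readiness: safe to receive traffic.
    return 404


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, ready.is_set()))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # Keep probe traffic out of the application logs.


def serve(port=8080):
    """Blocks forever; typically run in a background thread."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Keeping liveness separate from readiness matters: a failing readiness check only takes the instance out of rotation, whereas a failing liveness check restarts it.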
Load Balancer Health Checks
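- The ALB routes requests only to targets that pass its health checks
- Unhealthy targets are removed from rotation until they pass again
- A health check grace period (configured on the Auto Scaling group or ECS service rather than on the ALB itself) gives new instances time to initialize before failed checks count against them
- Deregistration delay lets in-flight requests finish before a target is removed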
Cloud-Native Patterns
Sidecar Pattern
A helper container deployed alongside the main application container.
              Pod
┌──────────────────────────────┐
│  ┌──────────┐  ┌──────────┐  │
│  │   App    │  │ Sidecar  │  │
│  │Container │  │(log agent│  │
│  │          │  │ proxy,   │  │
│  │          │  │ auth)    │  │
│  └──────────┘  └──────────┘  │
│    shared volumes/network    │
└──────────────────────────────┘
Use cases: Log collection (Fluentd), service mesh proxy (Envoy), secrets injection (Vault agent).
Ambassador Pattern
A proxy that acts as an intermediary for outbound connections.
- Handles connection pooling, retries, and circuit breaking
- Centralizes client-side logic outside the application
- Implemented as a sidecar proxy (e.g., Envoy with custom config)
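The retry behaviour an ambassador takes off the application's hands can be sketched as a small wrapper with exponential backoff. This is an in-process illustration only, not how a sidecar proxy is configured; `call_with_retries` and its parameters are hypothetical.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** attempt)  # Delays of 0.1s, 0.2s, 0.4s, ...
```

Moving this logic into an ambassador means every service gets the same retry policy without each codebase reimplementing it.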
Additional Cloud-Native Patterns
| Pattern | Purpose | Implementation |
|---------|---------|----------------|
| Strangler Fig | Incremental migration from monolith | API Gateway routes to old/new |
| Bulkhead | Isolate failures between components | Separate thread pools, services |
| Circuit Breaker | Prevent cascading failures | Envoy, Resilience4j, Hystrix (now in maintenance mode) |
| Saga | Distributed transactions | Step Functions, Temporal |
| Outbox | Reliable event publishing | DB + CDC (Debezium) |
| Backend for Frontend | Tailored APIs per client type | Separate BFF services |
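Of the patterns in this table, the circuit breaker is the most compact to show in code. The sketch below is a minimal, single-threaded version with closed, open, and half-open behaviour; the `CircuitBreaker` class, its thresholds, and `CircuitOpenError` are hypothetical and far simpler than what Envoy or Resilience4j provide.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed.

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: the cooldown has passed, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()  # Open (or re-open) the circuit.
            raise
        self._failures = 0
        self._opened_at = None  # A success closes the circuit.
        return result
```

While the circuit is open, callers fail immediately instead of queuing behind a struggling dependency, which is what stops one slow service from dragging down its callers.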
Key Takeaways
- The twelve-factor methodology provides a blueprint for cloud-native application design
- Microservices enable independent deployment and scaling but add distributed systems complexity
- Service meshes handle cross-cutting concerns (mTLS, retries, observability) transparently
- Event-driven architecture decouples producers and consumers for better scalability
- Stateless design is foundational; all persistent state must live in external stores
- Cloud-native patterns (sidecar, circuit breaker, saga) solve recurring distributed problems