Cloud-Native Architecture

The Twelve-Factor App

The twelve-factor methodology defines best practices for building cloud-native applications that are portable, scalable, and maintainable.

Factor                 | Principle                            | Cloud Application
I. Codebase            | One codebase, many deploys           | Git repo with CI/CD pipelines
II. Dependencies       | Explicitly declare and isolate       | Package managers, container images
III. Config            | Store config in environment          | Env vars, Parameter Store, Secrets Manager
IV. Backing Services   | Treat as attached resources          | RDS, ElastiCache, SQS as swappable URLs
V. Build, Release, Run | Strictly separate stages             | CI builds image, CD deploys to env
VI. Processes          | Execute as stateless processes       | No sticky sessions, external session store
VII. Port Binding      | Export services via port binding     | Container exposes port, LB routes traffic
VIII. Concurrency      | Scale out via the process model      | Horizontal scaling, not vertical
IX. Disposability      | Fast startup and graceful shutdown   | SIGTERM handling, health checks
X. Dev/Prod Parity     | Keep environments similar            | IaC ensures identical infrastructure
XI. Logs               | Treat logs as event streams          | Write to stdout, aggregate externally
XII. Admin Processes   | Run admin tasks as one-off processes | Kubernetes Jobs, Lambda invocations
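
As an illustration of Factor III, the following is a minimal sketch of reading configuration from environment variables and failing fast when a required value is missing. The variable names (DATABASE_URL, CACHE_URL, LOG_LEVEL) and the Settings class are hypothetical, chosen only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; field names are illustrative only."""
    database_url: str
    cache_url: str
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables; fail fast if one is missing."""
    required = ("DATABASE_URL", "CACHE_URL")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError(f"missing required config: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        cache_url=env["CACHE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),  # optional, with a default
    )
```

Because the same build reads different values in each environment, the backing services from Factor IV can be swapped by changing a URL rather than the code.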

Microservices in the Cloud

Decomposition Strategies

Monolith                          Microservices
┌─────────────────┐               ┌──────┐ ┌──────┐ ┌──────┐
│  User Module    │               │ User │ │Order │ │ Pay  │
│  Order Module   │    ──────►    │ Svc  │ │ Svc  │ │ Svc  │
│  Payment Module │               └──┬───┘ └──┬───┘ └──┬───┘
│  Inventory Mod  │                  │        │        │
└─────────────────┘               ┌──┴───┐ ┌──┴───┐ ┌──┴───┐
 Single DB                        │ DB   │ │ DB   │ │ DB   │
                                  └──────┘ └──────┘ └──────┘
                                  Database per service

Communication Patterns

Synchronous (request-response):

  • REST over HTTP/HTTPS
  • gRPC for high-performance internal communication
  • GraphQL for flexible client queries

Asynchronous (event-driven):

  • Message queues (SQS, Cloud Tasks) for point-to-point
  • Pub/sub topics (SNS, Pub/Sub) for fan-out
  • Event streaming (Kinesis, Kafka) for ordered, replayable events
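
The difference between point-to-point and fan-out delivery can be sketched with two small in-memory classes. These are hypothetical stand-ins written for this example, not the APIs of SQS, SNS, or Pub/Sub.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class PointToPointQueue:
    """Each message is consumed by exactly one receiver."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Whichever worker calls receive() first takes the message.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Each published message is delivered to every subscriber (fan-out)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self._subscribers:
            handler(message)
```

With the queue, two workers split the messages between them; with the topic, every subscriber sees every message.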

Service Discovery

Approach         | Implementation          | Pros/Cons
DNS-based        | Route 53, Cloud DNS     | Simple; TTL caching delays
Service registry | Consul, Eureka          | Rich metadata; added complexity
Platform-native  | K8s Services, Cloud Map | Integrated; platform-specific
Service mesh     | Envoy sidecar           | Transparent; resource overhead
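
To make the registry approach concrete, here is a minimal in-memory sketch in which instances send heartbeats and lookups return only instances seen within a TTL. The ServiceRegistry class and its methods are hypothetical and do not mirror the Consul or Eureka APIs; the current time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict, List


class ServiceRegistry:
    """Toy registry: an instance is live if it sent a heartbeat within the TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        # service name -> {instance address -> time of last heartbeat}
        self._last_seen: Dict[str, Dict[str, float]] = {}

    def heartbeat(self, service: str, address: str, now: float) -> None:
        self._last_seen.setdefault(service, {})[address] = now

    def lookup(self, service: str, now: float) -> List[str]:
        instances = self._last_seen.get(service, {})
        return sorted(
            address
            for address, seen in instances.items()
            if now - seen <= self._ttl
        )
```

An instance that stops sending heartbeats drops out of lookup results once its TTL passes, which is the same staleness trade-off the table notes for DNS caching.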

Service Mesh

A service mesh is a dedicated infrastructure layer that manages service-to-service communication on behalf of the application.

Architecture

┌─────────────────────────────────────────────┐
│                 Control Plane                │
│  (Istio/Linkerd: config, certs, policies)   │
└──────┬──────────────┬──────────────┬────────┘
       │              │              │
┌──────▼──────┐ ┌─────▼──────┐ ┌────▼───────┐
│ ┌─────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │ Service │ │ │ │Service │ │ │ │Service │ │
│ │    A    │ │ │ │   B    │ │ │ │   C    │ │
│ └────┬────┘ │ │ └───┬────┘ │ │ └───┬────┘ │
│ ┌────▼────┐ │ │ ┌───▼────┐ │ │ ┌───▼────┐ │
│ │ Envoy   │◄├─┤►│ Envoy  │◄├─┤►│ Envoy  │ │
│ │ Sidecar │ │ │ │Sidecar │ │ │ │Sidecar │ │
│ └─────────┘ │ │ └────────┘ │ │ └────────┘ │
└─────────────┘ └────────────┘ └────────────┘
      Data Plane (proxies handle all traffic)

Service Mesh Capabilities

  • mTLS: Automatic mutual TLS between all services
  • Traffic management: Canary deployments, traffic splitting, retries
  • Observability: Distributed tracing, metrics, access logs without code changes
  • Resilience: Circuit breaking, rate limiting, timeouts, fault injection
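
Traffic splitting for a canary can be sketched as a routing decision made per request. The version below is an assumption-laden illustration: it hashes a request ID so that the same request always lands on the same version, whereas a real mesh proxy is usually configured with weights and makes this choice itself. The function name and the "stable"/"canary" labels are hypothetical.

```python
import hashlib


def choose_version(request_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of requests to the canary, the rest to stable."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Raising canary_percent step by step (for example 5, 25, 50, 100) shifts traffic gradually while keeping each request ID on a consistent version.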

Mesh Options

Mesh           | Proxy          | Complexity | Notes / Cloud Integration
Istio          | Envoy          | High       | Managed Istio-based options on GKE
Linkerd        | linkerd2-proxy | Medium     | Lightweight, Rust-based proxy
Consul Connect | Envoy          | Medium     | HashiCorp ecosystem
AWS App Mesh   | Envoy          | Low        | Deep AWS service integration

Event-Driven Architecture

Amazon EventBridge

Event Sources              EventBridge              Targets
┌─────────┐               ┌──────────┐            ┌─────────┐
│ AWS Svc │──────────────►│          │───────────►│ Lambda  │
│ (S3,EC2)│               │  Event   │            └─────────┘
└─────────┘               │   Bus    │            ┌─────────┐
┌─────────┐               │          │───────────►│  SQS    │
│ SaaS    │──────────────►│  Rules   │            └─────────┘
│(Stripe) │               │  match   │            ┌─────────┐
└─────────┘               │  and     │───────────►│Step Fn  │
┌─────────┐               │  route   │            └─────────┘
│ Custom  │──────────────►│          │            ┌─────────┐
│  App    │               │          │───────────►│  API    │
└─────────┘               └──────────┘            └─────────┘

  • Event buses: Default, custom, and SaaS partner buses
  • Rules: Match events by pattern (source, detail-type, fields)
  • Schema registry: Discover and validate event schemas
  • Archive and replay: Store events and replay for debugging
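
Rule matching can be sketched as follows. This is a simplified illustration of the idea that a pattern lists the acceptable values for each field and nests for structured fields; it covers exact-value matching only, and is not EventBridge's implementation. The event source and field names are hypothetical.

```python
from typing import Any, Mapping


def matches(pattern: Mapping[str, Any], event: Mapping[str, Any]) -> bool:
    """Return True if every field named in the pattern matches the event.

    Leaf pattern values are lists of allowed values; nested mappings
    are matched recursively. Fields absent from the pattern are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, Mapping):
            if not isinstance(actual, Mapping) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical rule: route order events priced in USD or EUR.
ORDER_RULE = {
    "source": ["myapp.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": {"currency": ["USD", "EUR"]},
}
```

An event with extra fields still matches, since the pattern only constrains the fields it names.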

Google Cloud Pub/Sub

  • Fully managed messaging with at-least-once delivery
  • Push and pull subscription modes
  • Dead-letter topics for failed message handling
  • Ordering keys for in-order delivery of messages that share the same key
  • BigQuery subscriptions for direct analytics ingestion
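
At-least-once delivery means a subscriber may see the same message more than once, so handlers are usually written to be idempotent. The sketch below deduplicates by message ID using an in-memory set, which is an assumption made for brevity: a real deployment would need a shared, durable store. The IdempotentConsumer class is hypothetical and is not part of the Pub/Sub client library.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Wraps a handler so that redelivered messages are processed only once."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen_ids: Set[str] = set()

    def receive(self, message_id: str, payload: str) -> bool:
        """Process the message; return False if it was a duplicate."""
        if message_id in self._seen_ids:
            return False  # duplicate: safe to acknowledge without reprocessing
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed
        # attempt is retried on redelivery.
        self._seen_ids.add(message_id)
        return True
```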

Event Patterns

Pattern                      | Description                       | Example / Benefit
Event notification           | Inform consumers of state change  | Order placed, user signed up
Event-carried state transfer | Include full state in event       | Avoid callback to source
Event sourcing               | Store state as sequence of events | Audit trail, temporal queries
CQRS                         | Separate read and write models    | Scale reads independently
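
Event sourcing can be illustrated with a tiny example in which current state is never stored directly but is derived by replaying the event log. The account domain, the Event class, and the event kinds are hypothetical and chosen only to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


def replay(events: Iterable[Event]) -> int:
    """Derive the current balance by folding over the event log."""
    balance = 0
    for event in events:
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return balance


log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
current_balance = replay(log)       # state after all events
earlier_balance = replay(log[:2])   # temporal query: state after two events
```

Replaying a prefix of the log is what makes the temporal queries and audit trail in the table possible.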

Stateless Design

Principles

All application instances must be interchangeable. State belongs in external stores.

Stateful (avoid)                  Stateless (prefer)
┌─────────┐                       ┌─────────┐
│ Server  │                       │ Server  │──► Redis (sessions)
│ sessions│                       │ (no     │──► S3 (uploads)
│ uploads │                       │  local  │──► RDS (data)
│ cache   │                       │  state) │──► ElastiCache
└─────────┘                       └─────────┘
  ✗ Can't scale horizontally       ✓ Any instance handles any request
  ✗ Sticky sessions required       ✓ Load balancer distributes freely
  ✗ Instance failure = data loss   ✓ Instance failure = no data loss
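
The stateless side of the diagram can be sketched with two application instances that share one external session store. InMemoryStore is a hypothetical stand-in for a service such as Redis, and the class and method names are assumptions made for this example.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    def get(self, session_id: str) -> Optional[dict]: ...
    def put(self, session_id: str, data: dict) -> None: ...


class InMemoryStore:
    """Stand-in for an external store; in practice this lives outside the app."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)

    def put(self, session_id: str, data: dict) -> None:
        self._data[session_id] = data


class AppInstance:
    """Keeps no session state of its own; everything goes through the store."""

    def __init__(self, store: SessionStore) -> None:
        self._store = store

    def login(self, session_id: str, user: str) -> None:
        self._store.put(session_id, {"user": user})

    def whoami(self, session_id: str) -> Optional[str]:
        session = self._store.get(session_id)
        return session["user"] if session else None
```

A login handled by one instance is visible to any other instance, so the load balancer needs no sticky sessions and losing an instance loses no session data.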

Externalized State Stores

State Type        | Service                   | Access Pattern
Session data      | Redis, Memcached          | Low-latency key-value
File uploads      | S3, GCS                   | Presigned (S3) or signed (GCS) URLs
Configuration     | Parameter Store, Consul   | Key-value with versioning
Feature flags     | LaunchDarkly, ConfigCat   | Real-time evaluation
Distributed locks | Redis (Redlock), DynamoDB | Set-if-absent with expiry; conditional writes
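
The lock row can be sketched as a write that succeeds only if no unexpired lock exists. LockTable below is a hypothetical single-process stand-in: in a real system the store itself must make the check-and-write atomic, which this in-memory version does not attempt to demonstrate. Time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Tuple


class LockTable:
    """Toy lock store: lock name -> (owner, expiry time)."""

    def __init__(self) -> None:
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, name: str, owner: str, now: float, ttl: float) -> bool:
        """Conditional write: succeed only if the lock is free or expired."""
        current = self._locks.get(name)
        if current is not None and current[1] > now:
            return False  # held by someone and not yet expired
        self._locks[name] = (owner, now + ttl)
        return True

    def release(self, name: str, owner: str) -> bool:
        """Delete the lock only if the caller still owns it."""
        current = self._locks.get(name)
        if current is None or current[0] != owner:
            return False
        del self._locks[name]
        return True
```

The expiry guards against a holder that crashes without releasing, and the ownership check on release stops a slow holder from deleting a lock that has since passed to someone else.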

Health Checks and Readiness

Health Check Types

# Kubernetes health probes
livenessProbe:          # Is the process alive?
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3   # Restart after 3 consecutive failures

readinessProbe:         # Can it serve traffic?
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

startupProbe:           # Has it finished starting?
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10     # Up to 300s to start
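
On the application side, the probes above need endpoints to call. The following is a minimal sketch using only the Python standard library; it assumes the /healthz and /ready paths and port 8080 from the probe config, while the `ready` flag, `probe_status`, and `serve` names are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once startup work (connections, cache warm-up) has finished.
ready = threading.Event()


def probe_status(path: str, is_ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/healthz":
        return 200                       # process is alive
    if path == "/ready":
        return 200 if is_ready else 503  # only accept traffic when ready
    return 404


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = probe_status(self.path, ready.is_set())
        body = b"ok" if status == 200 else b"unavailable"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe traffic out of the request log


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Run the health server until interrupted (not called on import)."""
    HTTPServer((host, port), HealthHandler).serve_forever()
```

Keeping liveness independent of downstream dependencies avoids restart loops when a database is briefly unavailable; readiness is the place to reflect that kind of condition.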

Load Balancer Health Checks

  • ALB checks target health before routing traffic
  • Unhealthy targets are removed from rotation
  • Grace period allows time for initialization before checking
  • Deregistration delay drains connections before removal
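
How a load balancer decides that a target is healthy or unhealthy can be sketched with consecutive-result thresholds. The class below is an illustrative assumption, not the ALB implementation: the threshold defaults and the choice to start targets as unhealthy are made up for the example.

```python
from typing import Optional


class TargetHealth:
    """Tracks one target; state changes only after a run of identical results."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 3) -> None:
        self.healthy = False  # new targets receive no traffic until proven healthy
        self._healthy_threshold = healthy_threshold
        self._unhealthy_threshold = unhealthy_threshold
        self._streak = 0  # length of the current run of identical results
        self._last_result: Optional[bool] = None

    def record(self, passed: bool) -> bool:
        """Record one health check result and return the current state."""
        self._streak = self._streak + 1 if passed == self._last_result else 1
        self._last_result = passed
        if passed and self._streak >= self._healthy_threshold:
            self.healthy = True
        elif not passed and self._streak >= self._unhealthy_threshold:
            self.healthy = False
        return self.healthy
```

Requiring several consecutive results in either direction keeps a single slow response from removing a target, and a single lucky response from restoring one.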

Cloud-Native Patterns

Sidecar Pattern

A helper container deployed alongside the main application container.

Pod
┌──────────────────────────────┐
│  ┌──────────┐  ┌──────────┐ │
│  │   App    │  │ Sidecar  │ │
│  │Container │  │(log agent│ │
│  │          │  │ proxy,   │ │
│  │          │  │ auth)    │ │
│  └──────────┘  └──────────┘ │
│     shared volumes/network   │
└──────────────────────────────┘

Use cases: Log collection (Fluentd), service mesh proxy (Envoy), secrets injection (Vault agent).

Ambassador Pattern

A proxy that acts as an intermediary for outbound connections.

  • Handles connection pooling, retries, and circuit breaking
  • Centralizes client-side logic outside the application
  • Implemented as a sidecar proxy (e.g., Envoy with custom config)
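
The retry behavior an ambassador takes off the application's hands can be sketched as a wrapper with exponential backoff. This is a minimal illustration, assuming that only ConnectionError is worth retrying; the function name and parameters are hypothetical, and the `sleep` argument exists so the delay can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait base, 2x base, 4x base, ...
    raise AssertionError("unreachable")
```

Retries are only safe for idempotent operations, and in practice some random jitter is usually added to the delay so that many clients do not retry in lockstep.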

Additional Cloud-Native Patterns

Pattern              | Purpose                             | Implementation
Strangler Fig        | Incremental migration from monolith | API Gateway routes to old/new
Bulkhead             | Isolate failures between components | Separate thread pools, services
Circuit Breaker      | Prevent cascading failures          | Envoy, resilience4j, Hystrix (maintenance mode)
Saga                 | Distributed transactions            | Step Functions, Temporal
Outbox               | Reliable event publishing           | DB + CDC (Debezium)
Backend for Frontend | Tailored APIs per client type       | Separate BFF services
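
As one worked example from the table, a circuit breaker can be sketched in a few lines. This is a simplified, single-threaded illustration and not how Envoy or resilience4j implement it; the class names, thresholds, and injectable `clock` are assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a dependency that is already struggling, which is what stops the failure from cascading.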

Key Takeaways

  • The twelve-factor methodology provides a blueprint for cloud-native application design
  • Microservices enable independent deployment and scaling but add distributed systems complexity
  • Service meshes handle cross-cutting concerns (mTLS, retries, observability) transparently
  • Event-driven architecture decouples producers and consumers for better scalability
  • Stateless design is foundational; all persistent state must live in external stores
  • Cloud-native patterns (sidecar, circuit breaker, saga) solve recurring distributed problems