Cloud Compute Services

Virtual Machines

VMs provide the most flexible compute model, offering full OS-level control over the environment.

AWS EC2

Instance Naming: m5.xlarge
                 │ │  └── Size (nano → metal)
                 │ └── Generation
                 └── Family (m=general, c=compute, r=memory, p=GPU)

Key instance families:

| Family | Optimized For | Use Case |
|--------|--------------|----------|
| M (General) | Balanced CPU/memory | Web servers, app servers |
| C (Compute) | High CPU-to-memory ratio | Batch processing, ML inference |
| R (Memory) | High memory-to-CPU ratio | In-memory caches, databases |
| P/G (Accelerated) | GPU compute | ML training, rendering |
| I (Storage) | High sequential I/O | Data warehousing, HDFS |
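The naming scheme above can be decoded programmatically. A minimal sketch (the family map covers only the families listed here and ignores suffixes such as processor or network variants):

```python
import re

# Hypothetical lookup table covering only the families in the table above
FAMILIES = {"m": "general", "c": "compute", "r": "memory",
            "p": "GPU", "g": "GPU", "i": "storage"}

def parse_instance_type(name: str) -> dict:
    """Split an EC2 instance type like 'm5.xlarge' into its parts."""
    family_gen, size = name.split(".")
    match = re.match(r"([a-z]+)(\d+)", family_gen)
    family, generation = match.group(1), int(match.group(2))
    return {
        "family": family,
        "optimized_for": FAMILIES.get(family[0], "unknown"),
        "generation": generation,
        "size": size,
    }

print(parse_instance_type("m5.xlarge"))
# → {'family': 'm', 'optimized_for': 'general', 'generation': 5, 'size': 'xlarge'}
```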

Google Compute Engine

  • Predefined machine types: e2, n2, c2, m2 families
  • Custom machine types: Specify exact vCPUs and memory (no equivalent in AWS)
  • Per-second billing with 1-minute minimum
  • Live migration: VMs migrate transparently during host maintenance
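Per-second billing with a 1-minute minimum means a short-lived VM is charged as if it ran at least 60 seconds. A sketch (the hourly rate is illustrative, not a real GCE price):

```python
def gce_vm_cost(runtime_seconds: float, hourly_rate: float) -> float:
    """Per-second billing with a 60-second minimum charge."""
    billed_seconds = max(60, runtime_seconds)
    return billed_seconds * hourly_rate / 3600

# A 45-second VM is billed for 60 seconds; a 90-second VM for its full 90.
print(round(gce_vm_cost(45, 0.10), 6))  # → 0.001667
print(round(gce_vm_cost(90, 0.10), 6))  # → 0.0025
```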

Azure Virtual Machines

  • Hybrid Benefit: Use existing Windows/SQL Server licenses for discounts
  • Virtual Machine Scale Sets: Managed auto-scaling groups
  • Confidential VMs: Hardware-based trusted execution environments

Containers

Container Orchestration Services

                    ┌─────────────────────────┐
                    │ Container Orchestration │
                    ├────────────┬────────────┤
                    │ Managed K8s│  Managed   │
                    │            │ Containers │
                    ├────────────┼────────────┤
           AWS      │ EKS        │ ECS        │
           GCP      │ GKE        │ Cloud Run  │
           Azure    │ AKS        │ Container  │
                    │            │ Apps       │
                    └────────────┴────────────┘

Amazon ECS and EKS

ECS (Elastic Container Service):

  • AWS-native orchestrator with task definitions
  • Tight integration with ALB, IAM, CloudWatch
  • Simpler than Kubernetes for AWS-only workloads

EKS (Elastic Kubernetes Service):

  • Managed Kubernetes control plane
  • Compatible with standard Kubernetes tooling (kubectl, Helm)
  • Supports Fargate profiles for serverless pod execution

Google GKE

  • Autopilot mode: Fully managed node provisioning and scaling
  • Release channels: Rapid, Regular, Stable for version management
  • Advanced networking: GKE Dataplane V2 (eBPF-based)
  • Strongest Kubernetes offering among cloud providers (Google created K8s)

Serverless Containers

AWS Fargate:

  • Run containers without managing servers
  • Specify CPU and memory per task; AWS handles placement
  • Works with both ECS and EKS

Google Cloud Run:

  • Fully managed container platform with scale-to-zero
  • Any container that listens on a port; no Dockerfile restrictions
  • Request-based billing with concurrency control
# Cloud Run service definition
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    spec:
      containers:
        - image: gcr.io/project/my-app:latest
          resources:
            limits:
              memory: 512Mi
              cpu: "1"
      containerConcurrency: 80
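With containerConcurrency set to 80, each instance serves up to 80 simultaneous requests, so the instance count Cloud Run scales to can be estimated with Little's law. A rough sketch (real autoscaling also factors in CPU utilization and startup time):

```python
import math

def estimate_instances(rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Little's law: in-flight requests = arrival rate x average latency."""
    in_flight = rps * avg_latency_s
    return math.ceil(in_flight / concurrency)

# 1000 req/s at 200 ms average latency with concurrency 80:
print(estimate_instances(1000, 0.2, 80))  # → 3
```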

Serverless Compute

AWS Lambda

  • Runtime: Node.js, Python, Java, Go, .NET, Ruby, custom runtimes
  • Limits: 15-minute timeout, 10 GB memory, 10 GB ephemeral storage
  • Triggers: API Gateway, S3, SQS, DynamoDB Streams, EventBridge
  • Pricing: $0.20 per 1M requests + compute charge (duration × memory allocated, billed in GB-seconds)
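The two pricing components can be combined into a worked estimate. A sketch (the per-GB-second rate below is illustrative and approximates the published x86 rate; check current pricing before relying on it):

```python
def lambda_monthly_cost(invocations: int, avg_duration_s: float, memory_gb: float,
                        per_million_req: float = 0.20,
                        per_gb_second: float = 0.0000166667) -> float:
    """Request charge + compute charge (GB-seconds); rates are illustrative."""
    request_cost = invocations / 1_000_000 * per_million_req
    gb_seconds = invocations * avg_duration_s * memory_gb
    return request_cost + gb_seconds * per_gb_second

# 5M invocations/month, 100 ms average duration, 128 MB (0.125 GB) memory:
print(round(lambda_monthly_cost(5_000_000, 0.1, 0.125), 2))  # → 2.04
```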

Google Cloud Functions

  • 1st gen: HTTP- and event-triggered, one request per instance (concurrency of 1)
  • 2nd gen: Built on Cloud Run, supports concurrency, longer timeout (60 min)
  • Event sources: Cloud Storage, Pub/Sub, Firestore, Firebase

Azure Functions

  • Durable Functions: Stateful workflows with orchestrator patterns
  • Consumption plan: True serverless with scale-to-zero
  • Premium plan: Pre-warmed instances to avoid cold starts

Auto-Scaling

Scaling Strategies

Reactive Scaling                    Predictive Scaling
─────────────────                   ──────────────────
Monitor metrics ──► Scale           Analyze historical ──► Pre-scale
(CPU, memory,       when            patterns               before
 request count)     threshold                              demand
                    breached
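The reactive path above boils down to comparing a metric against a threshold while honoring a cooldown between actions. A minimal sketch (class and method names are hypothetical):

```python
import time

class ReactiveScaler:
    """Scale out when the metric breaches a threshold, with a cooldown."""

    def __init__(self, threshold, cooldown_s=300):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._last_action = float("-inf")

    def decide(self, metric, now=None):
        now = time.monotonic() if now is None else now
        if now - self._last_action < self.cooldown_s:
            return "wait"            # still in cooldown from the last action
        if metric > self.threshold:
            self._last_action = now  # record when we acted
            return "scale_out"
        return "hold"

scaler = ReactiveScaler(threshold=70.0, cooldown_s=300)
print(scaler.decide(85.0, now=0))    # → scale_out
print(scaler.decide(90.0, now=60))   # → wait
print(scaler.decide(50.0, now=400))  # → hold
```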

Auto-Scaling Configuration

| Parameter | Description | Typical Value |
|-----------|-------------|---------------|
| Min instances | Floor for capacity | 2 (for HA) |
| Max instances | Ceiling to control cost | Workload-dependent |
| Target metric | What drives scaling | CPU 60-70% |
| Cooldown period | Delay between actions | 300 seconds |
| Scale-in protection | Prevent premature scale-down | Enabled for stateful |

Horizontal Pod Autoscaler (Kubernetes)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
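The HPA uses Kubernetes' standard target-tracking formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A sketch of that calculation:

```python
import math

def desired_replicas(current, current_util, target_util,
                     min_replicas=2, max_replicas=20):
    """HPA target-tracking: scale proportionally to metric / target."""
    desired = math.ceil(current * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

# 4 replicas running at 90% CPU against a 65% target:
print(desired_replicas(4, 90, 65))  # → 6
# Load drops to 20%: scale in, but never below minReplicas:
print(desired_replicas(6, 20, 65))  # → 2
```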

Spot and Preemptible Instances

Pricing Comparison

| Type | Discount | Interruption Notice | Best For |
|------|----------|-------------------|----------|
| On-demand | 0% | N/A | Baseline, unpredictable |
| Reserved (1yr) | ~30-40% | N/A | Steady-state workloads |
| Reserved (3yr) | ~50-60% | N/A | Long-term commitments |
| Spot / Preemptible | 60-90% | 2 min (AWS) / 30s (GCP) | Batch, fault-tolerant |

Spot Instance Strategies

  1. Diversify instance types: Use multiple families and sizes
  2. Spread across AZs: Different spot pools reduce interruption risk
  3. Use Spot Fleet / Managed Instance Groups: Automatic replacement
  4. Checkpointing: Save progress periodically for resumable workloads
  5. Graceful shutdown handlers: Catch interruption signals
# AWS Spot interruption handler (Lambda, triggered by the
# "EC2 Spot Instance Interruption Warning" event via EventBridge)
def handler(event, context):
    instance_id = event['detail']['instance-id']
    action = event['detail']['instance-action']  # e.g. "terminate"
    # Application-specific cleanup (helpers defined elsewhere):
    # drain connections, save state, deregister from the load balancer
    drain_instance(instance_id)
    checkpoint_work(instance_id)

Cold Start Optimization

Cold starts occur when a new execution environment must be initialized.

Cold Start Components

Total Cold Start Time
├── Infrastructure provisioning (cloud provider)
├── Runtime initialization (language VM startup)
├── Dependency loading (libraries, frameworks)
└── Application initialization (DB connections, config)

Optimization Techniques

| Technique | Impact | Trade-off |
|-----------|--------|-----------|
| Provisioned concurrency | Eliminates cold start | Higher cost |
| Smaller deployment packages | Faster loading | May limit functionality |
| Lazy initialization | Faster init | First-request latency |
| Language choice (Go, Rust) | Faster startup than JVM | Developer preference |
| SnapStart (Java on Lambda) | Checkpoint/restore | Java-specific |
| Keep-alive pinging | Keeps instances warm | Added invocation cost |
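Lazy initialization defers expensive setup (DB connections, SDK clients) from the init phase to the first request, trading cold-start time for first-request latency. A sketch using a hypothetical ExpensiveClient as the slow dependency:

```python
_client = None  # module-level cache survives across warm invocations

class ExpensiveClient:
    """Stand-in for a slow-to-construct dependency (e.g. a DB connection)."""
    instances = 0

    def __init__(self):
        ExpensiveClient.instances += 1

def get_client():
    """Create the client on first use; reuse it on warm invocations."""
    global _client
    if _client is None:
        _client = ExpensiveClient()
    return _client

def handler(event, context):
    client = get_client()  # cold path constructs; warm path hits the cache
    return {"instances": ExpensiveClient.instances}

print(handler({}, None))  # → {'instances': 1}
print(handler({}, None))  # → {'instances': 1}
```

Only one client is ever constructed, so repeated invocations of the same warm instance skip the expensive setup entirely.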

Cold Start Latency by Runtime

Rust/Go        ████  ~50-100ms
Python         ████████  ~200-400ms
Node.js        ████████  ~200-400ms
.NET           ████████████  ~400-800ms
Java           ████████████████  ~800-3000ms
Java+SnapStart ██████  ~100-200ms

Choosing a Compute Model

Need full OS control?
  ├─ Yes → Virtual Machines
  └─ No → Need container portability?
            ├─ Yes → Need orchestration?
            │         ├─ Yes → Managed Kubernetes (EKS/GKE/AKS)
            │         └─ No → Serverless containers (Cloud Run/Fargate)
            └─ No → Event-driven, short-lived?
                      ├─ Yes → FaaS (Lambda/Cloud Functions)
                      └─ No → PaaS (App Engine/Elastic Beanstalk)

Key Takeaways

  • VMs offer maximum control; containers balance portability with efficiency
  • Managed Kubernetes (EKS, GKE, AKS) handles control plane operations
  • Serverless containers (Cloud Run, Fargate) remove node management entirely
  • Spot instances can reduce costs by 60-90% for fault-tolerant workloads
  • Cold start optimization is critical for latency-sensitive serverless functions
  • Auto-scaling should combine reactive metrics with predictive patterns