Cloud Compute Services

Virtual Machines

VMs provide the most flexible compute model, offering full OS-level control over the environment.

AWS EC2

Instance Naming: m5.xlarge
                 │ │  └── Size (nano → metal)
                 │ └── Generation
                 └── Family (m=general, c=compute, r=memory, p=GPU)

Key instance families:

| Family | Optimized For | Use Case |
|--------|--------------|----------|
| M (General) | Balanced CPU/memory | Web servers, app servers |
| C (Compute) | High CPU-to-memory ratio | Batch processing, ML inference |
| R (Memory) | High memory-to-CPU ratio | In-memory caches, databases |
| P/G (Accelerated) | GPU compute | ML training, rendering |
| I (Storage) | High sequential I/O | Data warehousing, HDFS |
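The naming scheme above can be decoded programmatically. A minimal sketch (the family map covers only the families listed here and ignores suffixes such as processor or network variants):

```python
import re

# Hypothetical lookup table covering only the families in the table above
FAMILIES = {"m": "general", "c": "compute", "r": "memory",
            "p": "GPU", "g": "GPU", "i": "storage"}

def parse_instance_type(name: str) -> dict:
    """Split an EC2 instance type like 'm5.xlarge' into its parts."""
    family_gen, size = name.split(".")
    match = re.match(r"([a-z]+)(\d+)", family_gen)
    family, generation = match.group(1), int(match.group(2))
    return {
        "family": family,
        "optimized_for": FAMILIES.get(family[0], "unknown"),
        "generation": generation,
        "size": size,
    }

print(parse_instance_type("m5.xlarge"))
# → {'family': 'm', 'optimized_for': 'general', 'generation': 5, 'size': 'xlarge'}
```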

Google Compute Engine

  • Predefined machine types: e2, n2, c2, m2 families
  • Custom machine types: Specify exact vCPUs and memory (no equivalent in AWS)
  • Per-second billing with 1-minute minimum
  • Live migration: VMs migrate transparently during host maintenance
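Per-second billing with a 1-minute minimum means a short-lived VM is charged as if it ran at least 60 seconds. A sketch (the hourly rate is illustrative, not a real GCE price):

```python
def gce_vm_cost(runtime_seconds: float, hourly_rate: float) -> float:
    """Per-second billing with a 60-second minimum charge."""
    billed_seconds = max(60, runtime_seconds)
    return billed_seconds * hourly_rate / 3600

# A 45-second VM is billed for 60 seconds; a 90-second VM for its full 90.
print(round(gce_vm_cost(45, 0.10), 6))  # → 0.001667
print(round(gce_vm_cost(90, 0.10), 6))  # → 0.0025
```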

Azure Virtual Machines

  • Hybrid Benefit: Use existing Windows/SQL Server licenses for discounts
  • Virtual Machine Scale Sets: Managed auto-scaling groups
  • Confidential VMs: Hardware-based trusted execution environments

Containers

Container Orchestration Services

                    ┌─────────────────────────┐
                    │ Container Orchestration │
                    ├────────────┬────────────┤
                    │ Managed K8s│  Managed   │
                    │            │ Containers │
                    ├────────────┼────────────┤
           AWS      │ EKS        │ ECS        │
           GCP      │ GKE        │ Cloud Run  │
           Azure    │ AKS        │ Container  │
                    │            │ Apps       │
                    └────────────┴────────────┘

Amazon ECS and EKS

ECS (Elastic Container Service):

  • AWS-native orchestrator with task definitions
  • Tight integration with ALB, IAM, CloudWatch
  • Simpler than Kubernetes for AWS-only workloads

EKS (Elastic Kubernetes Service):

  • Managed Kubernetes control plane
  • Compatible with standard Kubernetes tooling (kubectl, Helm)
  • Supports Fargate profiles for serverless pod execution

Google GKE

  • Autopilot mode: Fully managed node provisioning and scaling
  • Release channels: Rapid, Regular, Stable for version management
  • Advanced networking: GKE Dataplane V2 (eBPF-based)
  • Strongest Kubernetes offering among cloud providers (Google created K8s)

Serverless Containers

AWS Fargate:

  • Run containers without managing servers
  • Specify CPU and memory per task; AWS handles placement
  • Works with both ECS and EKS

Google Cloud Run:

  • Fully managed container platform with scale-to-zero
  • Any container that listens on a port; no Dockerfile restrictions
  • Request-based billing with concurrency control
# Cloud Run service definition
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    spec:
      containers:
        - image: gcr.io/project/my-app:latest
          resources:
            limits:
              memory: 512Mi
              cpu: "1"
      containerConcurrency: 80
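With containerConcurrency set to 80, each instance serves up to 80 simultaneous requests, so the instance count Cloud Run scales to can be estimated with Little's law. A rough sketch (real autoscaling also factors in CPU utilization and startup time):

```python
import math

def estimate_instances(rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Little's law: in-flight requests = arrival rate x average latency."""
    in_flight = rps * avg_latency_s
    return math.ceil(in_flight / concurrency)

# 1000 req/s at 200 ms average latency with concurrency 80:
print(estimate_instances(1000, 0.2, 80))  # → 3
```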

Serverless Compute

AWS Lambda

  • Runtime: Node.js, Python, Java, Go, .NET, Ruby, custom runtimes
  • Limits: 15-minute timeout, 10 GB memory, 10 GB ephemeral storage
  • Triggers: API Gateway, S3, SQS, DynamoDB Streams, EventBridge
  • Pricing: $0.20 per 1M requests + compute charge (duration × memory allocated, billed in GB-seconds)
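The two pricing components can be combined into a worked estimate. A sketch (the per-GB-second rate below is illustrative and approximates the published x86 rate; check current pricing before relying on it):

```python
def lambda_monthly_cost(invocations: int, avg_duration_s: float, memory_gb: float,
                        per_million_req: float = 0.20,
                        per_gb_second: float = 0.0000166667) -> float:
    """Request charge + compute charge (GB-seconds); rates are illustrative."""
    request_cost = invocations / 1_000_000 * per_million_req
    gb_seconds = invocations * avg_duration_s * memory_gb
    return request_cost + gb_seconds * per_gb_second

# 5M invocations/month, 100 ms average duration, 128 MB (0.125 GB) memory:
print(round(lambda_monthly_cost(5_000_000, 0.1, 0.125), 2))  # → 2.04
```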

Google Cloud Functions

  • 1st gen: HTTP- and event-triggered, one request per instance (concurrency of 1)
  • 2nd gen: Built on Cloud Run, supports concurrency, longer timeout (60 min)
  • Event sources: Cloud Storage, Pub/Sub, Firestore, Firebase

Azure Functions

  • Durable Functions: Stateful workflows with orchestrator patterns
  • Consumption plan: True serverless with scale-to-zero
  • Premium plan: Pre-warmed instances to avoid cold starts

Auto-Scaling

Scaling Strategies

Reactive Scaling                    Predictive Scaling
─────────────────                   ──────────────────
Monitor metrics ──► Scale           Analyze historical ──► Pre-scale
(CPU, memory,       when            patterns               before
 request count)     threshold                              demand
                    breached
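The reactive path above boils down to comparing a metric against a threshold while honoring a cooldown between actions. A minimal sketch (class and method names are hypothetical):

```python
import time

class ReactiveScaler:
    """Scale out when the metric breaches a threshold, with a cooldown."""

    def __init__(self, threshold, cooldown_s=300):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._last_action = float("-inf")

    def decide(self, metric, now=None):
        now = time.monotonic() if now is None else now
        if now - self._last_action < self.cooldown_s:
            return "wait"            # still in cooldown from the last action
        if metric > self.threshold:
            self._last_action = now  # record when we acted
            return "scale_out"
        return "hold"

scaler = ReactiveScaler(threshold=70.0, cooldown_s=300)
print(scaler.decide(85.0, now=0))    # → scale_out
print(scaler.decide(90.0, now=60))   # → wait
print(scaler.decide(50.0, now=400))  # → hold
```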

Auto-Scaling Configuration

| Parameter | Description | Typical Value |
|-----------|-------------|---------------|
| Min instances | Floor for capacity | 2 (for HA) |
| Max instances | Ceiling to control cost | Workload-dependent |
| Target metric | What drives scaling | CPU 60-70% |
| Cooldown period | Delay between actions | 300 seconds |
| Scale-in protection | Prevent premature scale-down | Enabled for stateful |

Horizontal Pod Autoscaler (Kubernetes)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
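The HPA uses Kubernetes' standard target-tracking formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A sketch of that calculation:

```python
import math

def desired_replicas(current, current_util, target_util,
                     min_replicas=2, max_replicas=20):
    """HPA target-tracking: scale proportionally to metric / target."""
    desired = math.ceil(current * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

# 4 replicas running at 90% CPU against a 65% target:
print(desired_replicas(4, 90, 65))  # → 6
# Load drops to 20%: scale in, but never below minReplicas:
print(desired_replicas(6, 20, 65))  # → 2
```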

Spot and Preemptible Instances

Pricing Comparison

| Type | Discount | Interruption Notice | Best For |
|------|----------|-------------------|----------|
| On-demand | 0% | N/A | Baseline, unpredictable |
| Reserved (1yr) | ~30-40% | N/A | Steady-state workloads |
| Reserved (3yr) | ~50-60% | N/A | Long-term commitments |
| Spot / Preemptible | 60-90% | 2 min (AWS) / 30s (GCP) | Batch, fault-tolerant |

Spot Instance Strategies

  1. Diversify instance types: Use multiple families and sizes
  2. Spread across AZs: Different spot pools reduce interruption risk
  3. Use Spot Fleet / Managed Instance Groups: Automatic replacement
  4. Checkpointing: Save progress periodically for resumable workloads
  5. Graceful shutdown handlers: Catch interruption signals
# AWS Spot interruption handler (Lambda, triggered by the
# "EC2 Spot Instance Interruption Warning" event via EventBridge)
def handler(event, context):
    instance_id = event['detail']['instance-id']
    action = event['detail']['instance-action']  # e.g. "terminate"
    # Application-specific cleanup (helpers defined elsewhere):
    # drain connections, save state, deregister from the load balancer
    drain_instance(instance_id)
    checkpoint_work(instance_id)

Cold Start Optimization

Cold starts occur when a new execution environment must be initialized.

Cold Start Components

Total Cold Start Time
├── Infrastructure provisioning (cloud provider)
├── Runtime initialization (language VM startup)
├── Dependency loading (libraries, frameworks)
└── Application initialization (DB connections, config)

Optimization Techniques

| Technique | Impact | Trade-off |
|-----------|--------|-----------|
| Provisioned concurrency | Eliminates cold start | Higher cost |
| Smaller deployment packages | Faster loading | May limit functionality |
| Lazy initialization | Faster init | First-request latency |
| Language choice (Go, Rust) | Faster startup than JVM | Developer preference |
| SnapStart (Java on Lambda) | Checkpoint/restore | Java-specific |
| Keep-alive pinging | Keeps instances warm | Added invocation cost |
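Lazy initialization defers expensive setup (DB connections, SDK clients) from the init phase to the first request, trading cold-start time for first-request latency. A sketch using a hypothetical ExpensiveClient as the slow dependency:

```python
_client = None  # module-level cache survives across warm invocations

class ExpensiveClient:
    """Stand-in for a slow-to-construct dependency (e.g. a DB connection)."""
    instances = 0

    def __init__(self):
        ExpensiveClient.instances += 1

def get_client():
    """Create the client on first use; reuse it on warm invocations."""
    global _client
    if _client is None:
        _client = ExpensiveClient()
    return _client

def handler(event, context):
    client = get_client()  # cold path constructs; warm path hits the cache
    return {"instances": ExpensiveClient.instances}

print(handler({}, None))  # → {'instances': 1}
print(handler({}, None))  # → {'instances': 1}
```

Only one client is ever constructed, so repeated invocations of the same warm instance skip the expensive setup entirely.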

Cold Start Latency by Runtime

Rust/Go        ████  ~50-100ms
Python         ████████  ~200-400ms
Node.js        ████████  ~200-400ms
.NET           ████████████  ~400-800ms
Java           ████████████████  ~800-3000ms
Java+SnapStart ██████  ~100-200ms

Choosing a Compute Model

Need full OS control?
  ├─ Yes → Virtual Machines
  └─ No → Need container portability?
            ├─ Yes → Need orchestration?
            │         ├─ Yes → Managed Kubernetes (EKS/GKE/AKS)
            │         └─ No → Serverless containers (Cloud Run/Fargate)
            └─ No → Event-driven, short-lived?
                      ├─ Yes → FaaS (Lambda/Cloud Functions)
                      └─ No → PaaS (App Engine/Elastic Beanstalk)

Key Takeaways

  • VMs offer maximum control; containers balance portability with efficiency
  • Managed Kubernetes (EKS, GKE, AKS) handles control plane operations
  • Serverless containers (Cloud Run, Fargate) remove node management entirely
  • Spot instances can reduce costs by 60-90% for fault-tolerant workloads
  • Cold start optimization is critical for latency-sensitive serverless functions
  • Auto-scaling should combine reactive metrics with predictive patterns