Cloud Compute Services
Virtual Machines
VMs provide the most flexible compute model, offering full OS-level control over the environment.
AWS EC2
Instance Naming: m5.xlarge
                 ││ └── Size (nano → metal)
                 │└── Generation
                 └── Family (m=general, c=compute, r=memory, p=GPU)
Key instance families:
| Family | Optimized For | Use Case |
|--------|---------------|----------|
| M (General) | Balanced CPU/memory | Web servers, app servers |
| C (Compute) | High CPU-to-memory ratio | Batch processing, ML inference |
| R (Memory) | High memory-to-CPU ratio | In-memory caches, databases |
| P/G (Accelerated) | GPU compute | ML training, rendering |
| I (Storage) | High sequential I/O | Data warehousing, HDFS |
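The naming scheme above can be decoded mechanically. A minimal sketch (the family map covers only the families in this table, and it ignores the attribute suffixes real names can carry, such as the "d" in m5d):

```python
def parse_instance_type(name: str) -> dict:
    """Split an EC2 instance type like 'm5.xlarge' into its parts."""
    families = {"m": "general", "c": "compute", "r": "memory",
                "p": "GPU", "i": "storage"}
    prefix, size = name.split(".")           # "m5", "xlarge"
    family, generation = prefix[0], prefix[1:]
    return {"family": families.get(family, "unknown"),
            "generation": generation,
            "size": size}

print(parse_instance_type("m5.xlarge"))
# {'family': 'general', 'generation': '5', 'size': 'xlarge'}
```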
Google Compute Engine
- Predefined machine types: e2, n2, c2, m2 families
- Custom machine types: Specify exact vCPUs and memory (no equivalent in AWS)
- Per-second billing with 1-minute minimum
- Live migration: VMs migrate transparently during host maintenance
Azure Virtual Machines
- Hybrid Benefit: Use existing Windows/SQL Server licenses for discounts
- Virtual Machine Scale Sets: Managed auto-scaling groups
- Confidential VMs: Hardware-based trusted execution environments
Containers
Container Orchestration Services
        Container Orchestration
┌───────┬─────────────┬────────────────┐
│       │ Managed K8s │ Managed        │
│       │             │ Containers     │
├───────┼─────────────┼────────────────┤
│ AWS   │ EKS         │ ECS            │
│ GCP   │ GKE         │ Cloud Run      │
│ Azure │ AKS         │ Container Apps │
└───────┴─────────────┴────────────────┘
Amazon ECS and EKS
ECS (Elastic Container Service):
- AWS-native orchestrator with task definitions
- Tight integration with ALB, IAM, CloudWatch
- Simpler than Kubernetes for AWS-only workloads
EKS (Elastic Kubernetes Service):
- Managed Kubernetes control plane
- Compatible with standard Kubernetes tooling (kubectl, Helm)
- Supports Fargate profiles for serverless pod execution
Google GKE
- Autopilot mode: Fully managed node provisioning and scaling
- Release channels: Rapid, Regular, Stable for version management
- Advanced networking: GKE Dataplane V2 (eBPF-based)
- Widely regarded as the most mature managed Kubernetes offering among the major clouds (Kubernetes originated at Google)
Serverless Containers
AWS Fargate:
- Run containers without managing servers
- Specify CPU and memory per task; AWS handles placement
- Works with both ECS and EKS
Google Cloud Run:
- Fully managed container platform with scale-to-zero
- Runs any container image that listens for requests on a port; no language or base-image restrictions
- Request-based billing with concurrency control
# Cloud Run service definition
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    spec:
      containers:
        - image: gcr.io/project/my-app:latest
          resources:
            limits:
              memory: 512Mi
              cpu: "1"
      containerConcurrency: 80
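With request-based billing, an instance is only billed while at least one request is in flight, so concurrent requests share billed time. A sketch of that accounting, which merges overlapping request intervals and sums their lengths (a simplification that ignores billing granularity and idle minimum-instance time):

```python
def billable_seconds(requests):
    """Estimate billed instance time from (start, end) request intervals.
    Overlapping requests on one instance share billed time, so we merge
    overlapping intervals before summing."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(requests):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start   # close previous run
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)        # extend current run
    if current_end is not None:
        total += current_end - current_start
    return total

# Three overlapping 1-second requests bill ~2s of instance time, not 3s:
print(billable_seconds([(0, 1), (0.5, 1.5), (1, 2)]))  # 2.0
```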
Serverless Compute
AWS Lambda
- Runtime: Node.js, Python, Java, Go, .NET, Ruby, custom runtimes
- Limits: 15-minute timeout, 10 GB memory, 10 GB ephemeral storage
- Triggers: API Gateway, S3, SQS, DynamoDB Streams, EventBridge
- Pricing: $0.20 per 1M requests + compute time billed in GB-seconds (duration × memory allocated)
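The pricing model above is easy to estimate in code. A rough monthly-cost sketch; the per-GB-second rate shown is illustrative (roughly the x86 rate in many regions), rates vary by region and architecture, and the free tier is ignored:

```python
def lambda_monthly_cost(invocations, avg_ms, memory_mb,
                        price_per_million=0.20,
                        price_per_gb_s=0.0000166667):
    """Estimate monthly Lambda cost: request charge plus GB-seconds.
    Rates are illustrative; check current regional pricing."""
    request_cost = invocations / 1_000_000 * price_per_million
    gb_seconds = invocations * (avg_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_s

# 10M invocations/month, 120 ms average duration, 512 MB memory:
print(round(lambda_monthly_cost(10_000_000, 120, 512), 2))  # 12.0
```

Doubling the memory allocation doubles the duration charge, which is why right-sizing memory is the first Lambda cost lever.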
Google Cloud Functions
- 1st gen: HTTP- and event-triggered, one request per instance at a time
- 2nd gen: Built on Cloud Run, supports concurrency, longer timeout (60 min)
- Event sources: Cloud Storage, Pub/Sub, Firestore, Firebase
Azure Functions
- Durable Functions: Stateful workflows with orchestrator patterns
- Consumption plan: True serverless with scale-to-zero
- Premium plan: Pre-warmed instances to avoid cold starts
Auto-Scaling
Scaling Strategies
Reactive Scaling                      Predictive Scaling
────────────────                      ──────────────────
Monitor metrics ──► Scale when        Analyze historical ──► Pre-scale
(CPU, memory,       threshold         patterns               before
request count)      breached                                 demand
Auto-Scaling Configuration
| Parameter | Description | Typical Value |
|-----------|-------------|---------------|
| Min instances | Floor for capacity | 2 (for HA) |
| Max instances | Ceiling to control cost | Workload-dependent |
| Target metric | What drives scaling | CPU 60-70% |
| Cooldown period | Delay between actions | 300 seconds |
| Scale-in protection | Prevent premature scale-down | Enabled for stateful |
Horizontal Pod Autoscaler (Kubernetes)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
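Under the hood, the HPA scales the replica count by the ratio of observed to target utilization, clamped to the configured bounds. A simplified sketch (the real controller also applies a tolerance band and stabilization windows to avoid flapping):

```python
import math

def desired_replicas(current_replicas, current_util, target_util,
                     min_replicas=2, max_replicas=20):
    """Core HPA formula: desired = ceil(current * observed/target),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

# 4 pods averaging 90% CPU against a 65% target → scale out to 6:
print(desired_replicas(4, 90, 65))  # 6
```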
Spot and Preemptible Instances
Pricing Comparison
| Type | Discount | Interruption Notice | Best For |
|------|----------|---------------------|----------|
| On-demand | 0% | N/A | Baseline, unpredictable |
| Reserved (1yr) | ~30-40% | N/A | Steady-state workloads |
| Reserved (3yr) | ~50-60% | N/A | Long-term commitments |
| Spot / Preemptible | 60-90% | 2 min (AWS) / 30s (GCP) | Batch, fault-tolerant |
Spot Instance Strategies
- Diversify instance types: Use multiple families and sizes
- Spread across AZs: Different spot pools reduce interruption risk
- Use Spot Fleet / Managed Instance Groups: Automatic replacement
- Checkpointing: Save progress periodically for resumable workloads
- Graceful shutdown handlers: Catch interruption signals
# AWS Spot interruption handler (Lambda, triggered by the
# "EC2 Spot Instance Interruption Warning" EventBridge event)
def handler(event, context):
    instance_id = event['detail']['instance-id']
    action = event['detail']['instance-action']  # "terminate" or "stop"
    # Drain connections, save state, deregister from load balancer
    drain_instance(instance_id)
    checkpoint_work(instance_id)
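Instead of (or alongside) an EventBridge-triggered Lambda, a process on the instance itself can poll the instance metadata service, where `spot/instance-action` returns 404 until an interruption is scheduled. A sketch with the HTTP fetch injected as a callable so it can be faked off-instance (on a real instance you would fetch it with an IMDSv2 session token):

```python
import json

SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_pending(fetch):
    """Check the EC2 instance metadata service for a spot interruption
    notice. `fetch(url)` returns the response body, or None on 404
    (no interruption scheduled). Injected so tests can fake it."""
    body = fetch(SPOT_ACTION_URL)
    if body is None:
        return None
    notice = json.loads(body)   # e.g. {"action": "terminate", "time": "..."}
    return notice.get("action"), notice.get("time")

# Fake fetcher simulating a scheduled termination:
fake = lambda url: '{"action": "terminate", "time": "2024-01-01T00:00:00Z"}'
print(interruption_pending(fake))  # ('terminate', '2024-01-01T00:00:00Z')
```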
Cold Start Optimization
Cold starts occur when a new execution environment must be initialized.
Cold Start Components
Total Cold Start Time
├── Infrastructure provisioning (cloud provider)
├── Runtime initialization (language VM startup)
├── Dependency loading (libraries, frameworks)
└── Application initialization (DB connections, config)
Optimization Techniques
| Technique | Impact | Trade-off |
|-----------|--------|-----------|
| Provisioned concurrency | Eliminates cold start | Higher cost |
| Smaller deployment packages | Faster loading | May limit functionality |
| Lazy initialization | Faster init | First-request latency |
| Language choice (Go, Rust) | Faster startup than JVM | Developer preference |
| SnapStart (Java on Lambda) | Checkpoint/restore | Java-specific |
| Keep-alive pinging | Keeps instances warm | Added invocation cost |
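Lazy initialization is the cheapest technique on the list to adopt. A sketch of the pattern, using a hypothetical `connect_to_database()` as a stand-in for any expensive client setup:

```python
import functools

# Eager (avoid): a connection built at import time is paid on every cold start.
# db = connect_to_database()

@functools.lru_cache(maxsize=1)
def get_db():
    """Lazy initialization: the expensive setup runs on first use and
    the result is cached for subsequent warm invocations."""
    return connect_to_database()

def connect_to_database():
    # Stand-in for a real client; imagine ~200 ms of setup here.
    return {"connected": True}

def handler(event, context):
    db = get_db()   # first request pays the init cost; later ones don't
    return {"statusCode": 200, "db": db["connected"]}

print(handler({}, None))  # {'statusCode': 200, 'db': True}
```

This is the trade-off the table names: import-time latency moves onto the first request that actually needs the resource.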
Cold Start Latency by Runtime
Rust/Go         ████               ~50-100ms
Python          ████████           ~200-400ms
Node.js         ████████           ~200-400ms
.NET            ████████████       ~400-800ms
Java            ████████████████   ~800-3000ms
Java+SnapStart  ██████             ~100-200ms
Choosing a Compute Model
Need full OS control?
├─ Yes → Virtual Machines
└─ No  → Need container portability?
         ├─ Yes → Need orchestration?
         │        ├─ Yes → Managed Kubernetes (EKS/GKE/AKS)
         │        └─ No  → Serverless containers (Cloud Run/Fargate)
         └─ No  → Event-driven, short-lived?
                  ├─ Yes → FaaS (Lambda/Cloud Functions)
                  └─ No  → PaaS (App Engine/Elastic Beanstalk)
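The decision tree can also be written as a straight-line function, which is handy in architecture checklists or tooling (names and flags here are illustrative):

```python
def choose_compute(full_os_control, container_portability=False,
                   orchestration=False, event_driven=False):
    """Encode the compute-model decision tree; inputs are booleans
    answering each question in order."""
    if full_os_control:
        return "Virtual Machines"
    if container_portability:
        if orchestration:
            return "Managed Kubernetes (EKS/GKE/AKS)"
        return "Serverless containers (Cloud Run/Fargate)"
    if event_driven:
        return "FaaS (Lambda/Cloud Functions)"
    return "PaaS (App Engine/Elastic Beanstalk)"

print(choose_compute(False, container_portability=True))
# Serverless containers (Cloud Run/Fargate)
```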
Key Takeaways
- VMs offer maximum control; containers balance portability with efficiency
- Managed Kubernetes (EKS, GKE, AKS) handles control plane operations
- Serverless containers (Cloud Run, Fargate) remove node management entirely
- Spot instances can reduce costs by 60-90% for fault-tolerant workloads
- Cold start optimization is critical for latency-sensitive serverless functions
- Auto-scaling should combine reactive metrics with predictive patterns