Cloud Storage Services
Object Storage
Object storage provides flat-namespace, key-value access to unstructured data at massive scale.
Amazon S3
```
s3://bucket-name/prefix/object-key
     │           │      │
     │           │      └── Unique identifier (acts like a path)
     │           └───────── Logical grouping (not a real directory)
     └───────────────────── Globally unique container
```
- Durability: 99.999999999% (11 nines) across multiple AZs
- Consistency: Strong read-after-write consistency (since Dec 2020)
- Max object size: 5 TB (multipart upload required above 5 GB; recommended above 100 MB)
- Features: Versioning, replication, event notifications, object lock
Google Cloud Storage
- Unified API across all storage classes
- Turbo replication for cross-region copies under 15 minutes
- Autoclass automatically transitions objects between storage classes
- Strongly consistent for all operations
Azure Blob Storage
- Tiers: Hot, Cool, Cold, Archive
- Access tiers can be set at object level within a container
- Immutable storage with WORM (Write Once Read Many) policies
Common Object Storage Patterns
```python
# S3 operations with boto3
import boto3

s3 = boto3.client('s3')

# Upload with KMS server-side encryption and Intelligent-Tiering
s3.put_object(
    Bucket='my-bucket',
    Key='data/report.csv',
    Body=data,  # bytes or a file-like object
    ServerSideEncryption='aws:kms',
    StorageClass='INTELLIGENT_TIERING',
)

# Generate a presigned URL for temporary access (expires in 1 hour)
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'data/report.csv'},
    ExpiresIn=3600,
)
```
Block Storage
Block storage provides raw volumes that attach to compute instances, functioning like physical hard drives.
AWS EBS (Elastic Block Store)
| Volume Type | IOPS | Throughput | Use Case |
|-------------|------|------------|----------|
| gp3 (General) | 3,000-16,000 | 125-1,000 MB/s | Most workloads |
| io2 (Provisioned) | Up to 256,000 | 4,000 MB/s | Databases |
| st1 (Throughput) | 500 baseline | 500 MB/s | Big data, logs |
| sc1 (Cold) | 250 baseline | 250 MB/s | Infrequent access |
- Snapshots stored in S3 (incremental, point-in-time)
- Multi-attach for io2 volumes (shared across instances)
- Encrypted by default with AWS-managed or customer-managed keys
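Volume selection follows directly from the table above. As a sketch, a hypothetical helper (not part of any AWS SDK) could map required IOPS and throughput to the smallest volume type that satisfies both:

```python
# Hypothetical helper matching workload requirements to an EBS volume
# type, using the IOPS/throughput ceilings from the table above.
def pick_ebs_volume(iops: int, throughput_mbs: int) -> str:
    if iops <= 16_000 and throughput_mbs <= 1_000:
        return "gp3"  # covers most workloads
    if iops <= 256_000 and throughput_mbs <= 4_000:
        return "io2"  # provisioned IOPS for databases
    raise ValueError("beyond a single EBS volume; consider striping")

print(pick_ebs_volume(10_000, 500))     # gp3
print(pick_ebs_volume(100_000, 2_000))  # io2
```

st1/sc1 are throughput-oriented rather than IOPS-oriented, so they are omitted from this simple IOPS-first mapping.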
GCP Persistent Disks
- pd-standard: HDD-backed, cost-effective
- pd-balanced: SSD-backed, balanced performance
- pd-ssd: SSD-backed, high IOPS
- pd-extreme: Highest performance for demanding databases
- Can attach to multiple instances in read-only mode
File Storage
Network file systems providing shared access with POSIX semantics.
AWS EFS (Elastic File System)
- NFS v4.1 protocol, automatically scales to petabytes
- Standard and One Zone storage classes
- Throughput modes: Bursting, Provisioned, Elastic
- Can be mounted by Lambda functions (via EFS access points) for serverless access
Other File Services
- Google Filestore: Managed NFS for GKE and Compute Engine
- Azure Files: SMB and NFS protocol support
- AWS FSx: Managed Lustre, NetApp ONTAP, Windows File Server
Storage Classes and Lifecycle
S3 Storage Classes
Access Frequency vs Storage Cost:
```
High ┤ S3 Standard
     │ S3 Intelligent-Tiering
     │ S3 Standard-IA
     │ S3 One Zone-IA
     │ S3 Glacier Instant Retrieval
     │ S3 Glacier Flexible Retrieval
Low  ┤ S3 Glacier Deep Archive
     └──────────────────────────────────
      High ◄── Storage cost per GB ──► Low
```
| Class | Min Duration | Retrieval Time | Use Case |
|-------|-------------|----------------|----------|
| Standard | None | Instant | Active data |
| Intelligent-Tiering | None | Instant | Unknown patterns |
| Standard-IA | 30 days | Instant | Infrequent but fast |
| One Zone-IA | 30 days | Instant | Reproducible data |
| Glacier Instant | 90 days | Milliseconds | Archive, fast access |
| Glacier Flexible | 90 days | 1-12 hours | Compliance archives |
| Deep Archive | 180 days | 12-48 hours | Long-term retention |
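The cost spread across classes is large enough to dominate storage budgets. A back-of-envelope comparison, using illustrative per-GB prices (region-dependent assumptions, not official AWS pricing; retrieval and request fees ignored):

```python
# Illustrative monthly storage cost by S3 class. Prices are rough
# assumptions for comparison only, not quoted AWS rates.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_cost(gib: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD for `gib` gibibytes."""
    return round(gib * PRICE_PER_GB_MONTH[storage_class], 2)

# 10 TiB of compliance archives: Standard vs Deep Archive
print(monthly_cost(10_240, "STANDARD"))      # 235.52
print(monthly_cost(10_240, "DEEP_ARCHIVE"))  # 10.14
```

Roughly a 96% reduction for data that tolerates 12-48 hour retrieval, which is where the "50-90% savings" figure in the takeaways comes from.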
Lifecycle Policies
```json
{
  "Rules": [{
    "ID": "ArchiveOldData",
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 },
    "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
  }]
}
```
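A lifecycle rule only makes sense if its transitions occur in increasing order of age and expiration happens after the last transition. A small hypothetical checker (not part of any AWS SDK) can validate that invariant before you apply a rule:

```python
# Sanity-check a lifecycle rule: transition days must strictly increase,
# and expiration (if set) must come after the last transition.
# A hypothetical helper, not part of boto3 or any AWS API.
def validate_rule(rule: dict) -> bool:
    days = [t["Days"] for t in rule.get("Transitions", [])]
    if days != sorted(days) or len(days) != len(set(days)):
        return False
    expiration = rule.get("Expiration", {}).get("Days")
    return expiration is None or not days or expiration > days[-1]

rule = {
    "ID": "ArchiveOldData",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
    ],
    "Expiration": {"Days": 2555},
}
print(validate_rule(rule))  # True
```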
Cloud Databases
Key-Value and Document Stores
Amazon DynamoDB:
- Single-digit millisecond latency at any scale
- Partition key + optional sort key for data modeling
- On-demand or provisioned capacity modes
- DynamoDB Streams for change data capture
- Global tables for multi-region active-active replication
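The partition key + sort key model is what enables single-table designs: one partition groups related items, and the sort key orders and filters within it. A real table would use boto3; this sketch simulates the query semantics locally with plain dicts to show the idea:

```python
# Sketch of DynamoDB-style composite-key modeling, simulated locally.
# All item names (USER#42, ORDER#..., PROFILE) are illustrative.
from collections import defaultdict

table = defaultdict(list)  # partition key -> items ordered by sort key

def put_item(pk: str, sk: str, attrs: dict) -> None:
    table[pk].append({"pk": pk, "sk": sk, **attrs})
    table[pk].sort(key=lambda item: item["sk"])

def query(pk: str, sk_prefix: str = "") -> list:
    """All items in one partition whose sort key begins with sk_prefix,
    mimicking a Query with a begins_with key condition."""
    return [i for i in table[pk] if i["sk"].startswith(sk_prefix)]

put_item("USER#42", "ORDER#2024-01-15", {"total": 99})
put_item("USER#42", "ORDER#2024-03-02", {"total": 20})
put_item("USER#42", "PROFILE", {"name": "Ada"})

print(len(query("USER#42", "ORDER#")))  # 2: only the orders
```

One partition read serves both "all of a user's orders" and "the user's profile" depending on the sort-key prefix, which is why careful key design replaces joins.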
Azure Cosmos DB:
- Multi-model: document, key-value, graph, column-family
- Five consistency levels (strong → eventual)
- Turnkey global distribution with multi-region writes
- Request Units (RU/s) as abstracted throughput currency
Relational Databases
Amazon Aurora:
- MySQL/PostgreSQL compatible, with up to 5x MySQL and 3x PostgreSQL throughput
- Storage auto-scales in 10 GB increments up to 128 TB
- Up to 15 read replicas with sub-10ms replication lag
- Aurora Serverless v2: Scales capacity in fine-grained increments
Google Cloud Spanner:
- Globally distributed, strongly consistent relational database
- Horizontal scaling with no manual sharding
- 99.999% availability SLA (five nines)
- Uses TrueTime API for global transaction ordering
Database Selection Guide
```
Need global strong consistency?
├─ Yes → Spanner or Cosmos DB (strong mode)
└─ No  → Need SQL?
         ├─ Yes → Need multi-region writes?
         │        ├─ Yes → Spanner, CockroachDB
         │        └─ No  → Aurora, Cloud SQL, Azure SQL
         └─ No  → Need <10 ms latency at scale?
                  ├─ Yes → DynamoDB, Cosmos DB
                  └─ No  → Firestore, MongoDB Atlas
```
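The same decision tree can be encoded directly as a hypothetical helper function, which makes the branch order explicit (global consistency is checked first, SQL second, latency last):

```python
# The database decision tree above as code. Purely illustrative.
def pick_database(global_strong: bool, needs_sql: bool,
                  multi_region_writes: bool, low_latency: bool) -> str:
    if global_strong:
        return "Spanner / Cosmos DB (strong)"
    if needs_sql:
        if multi_region_writes:
            return "Spanner / CockroachDB"
        return "Aurora / Cloud SQL / Azure SQL"
    if low_latency:
        return "DynamoDB / Cosmos DB"
    return "Firestore / MongoDB Atlas"

# Typical single-region relational workload:
print(pick_database(False, True, False, False))
# → Aurora / Cloud SQL / Azure SQL
```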
Caching Services
Amazon ElastiCache
Redis mode:
- In-memory data structures (strings, hashes, lists, sets, sorted sets)
- Persistence options: RDB snapshots, AOF logging
- Cluster mode for sharding across multiple nodes
- Use cases: Session store, leaderboards, real-time analytics
Memcached mode:
- Simple key-value caching with multi-threaded architecture
- No persistence, no replication
- Use cases: Simple caching, session management
Google Memorystore
- Managed Redis and Memcached
- Automatic failover and patching
- VPC-native networking for low-latency access
Caching Patterns
```
┌─────────┐                  ┌─────────┐
│         ├── Cache Hit ────►│  Cache  │
│         │◄─────────────────┤         │
│   App   │                  └─────────┘
│         │                  ┌─────────┐
│         ├── Cache Miss ───►│   DB    │
│         │◄─────────────────┤         │
│         ├─ Update Cache ──► Cache
└─────────┘
```
| Pattern | Description | Consistency |
|---------|-------------|-------------|
| Cache-aside | App manages cache reads/writes | Eventually consistent |
| Read-through | Cache loads from DB on miss | Eventually consistent |
| Write-through | Write to cache and DB synchronously | Strong |
| Write-behind | Write to cache, async write to DB | Eventually consistent |
| TTL-based | Expire entries after time period | Time-bounded staleness |
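Cache-aside, the most common of these patterns, fits in a few lines. This minimal sketch uses a dict as the cache and a dict (`slow_db`, a stand-in name) as the database, with hit/miss counters to make the flow visible:

```python
# Minimal cache-aside sketch: check the cache first, fall back to the
# (simulated) database on a miss, then populate the cache.
cache: dict = {}
slow_db = {"user:1": {"name": "Ada"}}  # stand-in for a real database
stats = {"hits": 0, "misses": 0}

def get(key: str):
    if key in cache:            # cache hit: serve from memory
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1        # cache miss: read DB, populate cache
    value = slow_db.get(key)
    if value is not None:
        cache[key] = value
    return value

get("user:1")   # miss: loads from DB and caches
get("user:1")   # hit: served from cache
print(stats)    # {'hits': 1, 'misses': 1}
```

Note the eventual-consistency caveat from the table: if the DB row changes after it is cached, readers see stale data until the entry is invalidated or expires.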
Data Transfer Costs
Cloud egress fees are a significant and often underestimated cost factor.
| Transfer Type | Approximate Cost |
|--------------|------------------|
| Ingress (upload) | Free |
| Same AZ | Free |
| Cross-AZ | $0.01-0.02/GB |
| Cross-region | $0.02-0.09/GB |
| Internet egress | $0.05-0.12/GB |
Minimizing Transfer Costs
- Keep compute and storage in the same region/AZ
- Use VPC endpoints for AWS service access without NAT
- Compress data before transfer
- Cache aggressively with CDN for repeated reads
- Use dedicated interconnects for high-volume hybrid transfers
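Compression in particular pays for itself quickly on repetitive payloads such as JSON or logs. A rough sketch of the savings, assuming an illustrative $0.09/GB internet-egress rate (not a quoted price):

```python
# Rough egress-savings estimate from compressing before transfer.
# The $0.09/GB rate is an assumed internet-egress price for illustration.
import gzip
import json

payload = json.dumps(
    [{"id": i, "status": "ok"} for i in range(10_000)]
).encode()
compressed = gzip.compress(payload)

egress_rate = 0.09  # assumed $/GB
ratio = len(compressed) / len(payload)
print(f"compression ratio: {ratio:.2f}")
print(f"savings per TB transferred: ${(1 - ratio) * 1024 * egress_rate:.2f}")
```

Highly repetitive data like this compresses dramatically; already-compressed formats (JPEG, Parquet with compression) see little benefit, so compress selectively.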
Key Takeaways
- Object storage provides virtually unlimited, highly durable storage for unstructured data
- Storage class tiering can reduce costs by 50-90% for infrequent access patterns
- Block storage performance varies dramatically by type; match IOPS needs to volume type
- DynamoDB and Cosmos DB offer single-digit ms latency; Spanner provides global consistency
- Caching reduces database load and latency; choose the pattern based on consistency needs
- Data egress costs require deliberate architecture to minimize cross-boundary transfers