Cloud Storage Services
Object Storage
Object storage provides flat-namespace, key-value access to unstructured data at massive scale.
Amazon S3
```
s3://bucket-name/prefix/object-key
     │           │      │
     │           │      └── Unique identifier (acts like a path)
     │           └───────── Logical grouping (not a real directory)
     └───────────────────── Globally unique container
```
- Durability: 99.999999999% (11 nines) across multiple AZs
- Consistency: Strong read-after-write consistency (since Dec 2020)
- Max object size: 5 TB (multipart upload required above 5 GB; recommended above 100 MB)
- Features: Versioning, replication, event notifications, object lock
Google Cloud Storage
- Unified API across all storage classes
- Turbo replication for cross-region copies under 15 minutes
- Autoclass automatically transitions objects between storage classes
- Strongly consistent for all operations
Azure Blob Storage
- Tiers: Hot, Cool, Cold, Archive
- Access tiers can be set at object level within a container
- Immutable storage with WORM (Write Once Read Many) policies
Common Object Storage Patterns
```python
# S3 operations with boto3
import boto3

s3 = boto3.client('s3')

# Upload with KMS server-side encryption and Intelligent-Tiering
s3.put_object(
    Bucket='my-bucket',
    Key='data/report.csv',
    Body=data,  # bytes or a file-like object
    ServerSideEncryption='aws:kms',
    StorageClass='INTELLIGENT_TIERING',
)

# Generate a presigned URL for temporary access (expires in 1 hour)
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'data/report.csv'},
    ExpiresIn=3600,
)
```
Block Storage
Block storage provides raw volumes that attach to compute instances, functioning like physical hard drives.
AWS EBS (Elastic Block Store)
| Volume Type | IOPS | Throughput | Use Case |
|-------------|------|------------|----------|
| gp3 (General) | 3,000-16,000 | 125-1,000 MB/s | Most workloads |
| io2 (Provisioned) | Up to 256,000 | 4,000 MB/s | Databases |
| st1 (Throughput) | 500 baseline | 500 MB/s | Big data, logs |
| sc1 (Cold) | 250 baseline | 250 MB/s | Infrequent access |
- Snapshots stored in S3 (incremental, point-in-time)
- Multi-attach for io2 volumes (shared across instances)
- Encrypted by default with AWS-managed or customer-managed keys
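Volume selection follows directly from the table above. As a sketch, a hypothetical helper (not part of any AWS SDK) could map required IOPS and throughput to the smallest volume type that satisfies both:

```python
# Hypothetical helper matching workload requirements to an EBS volume
# type, using the IOPS/throughput ceilings from the table above.
def pick_ebs_volume(iops: int, throughput_mbs: int) -> str:
    if iops <= 16_000 and throughput_mbs <= 1_000:
        return "gp3"  # covers most workloads
    if iops <= 256_000 and throughput_mbs <= 4_000:
        return "io2"  # provisioned IOPS for databases
    raise ValueError("beyond a single EBS volume; consider striping")

print(pick_ebs_volume(10_000, 500))     # gp3
print(pick_ebs_volume(100_000, 2_000))  # io2
```

st1/sc1 are throughput-oriented rather than IOPS-oriented, so they are omitted from this simple IOPS-first mapping.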
GCP Persistent Disks
- pd-standard: HDD-backed, cost-effective
- pd-balanced: SSD-backed, balanced performance
- pd-ssd: SSD-backed, high IOPS
- pd-extreme: Highest performance for demanding databases
- Can attach to multiple instances in read-only mode
File Storage
Network file systems providing shared access with POSIX semantics.
AWS EFS (Elastic File System)
- NFS v4.1 protocol, automatically scales to petabytes
- Standard and One Zone storage classes
- Throughput modes: Bursting, Provisioned, Elastic
- Can be mounted by Lambda functions (via EFS access points) for serverless access
Other File Services
- Google Filestore: Managed NFS for GKE and Compute Engine
- Azure Files: SMB and NFS protocol support
- AWS FSx: Managed Lustre, NetApp ONTAP, Windows File Server
Storage Classes and Lifecycle
S3 Storage Classes
Access Frequency vs Storage Cost:
```
High ┤ S3 Standard
     │ S3 Intelligent-Tiering
     │ S3 Standard-IA
     │ S3 One Zone-IA
     │ S3 Glacier Instant Retrieval
     │ S3 Glacier Flexible Retrieval
Low  ┤ S3 Glacier Deep Archive
     └──────────────────────────────────
      High ◄── Storage cost per GB ──► Low
```
| Class | Min Duration | Retrieval Time | Use Case |
|-------|-------------|----------------|----------|
| Standard | None | Instant | Active data |
| Intelligent-Tiering | None | Instant | Unknown patterns |
| Standard-IA | 30 days | Instant | Infrequent but fast |
| One Zone-IA | 30 days | Instant | Reproducible data |
| Glacier Instant | 90 days | Milliseconds | Archive, fast access |
| Glacier Flexible | 90 days | 1-12 hours | Compliance archives |
| Deep Archive | 180 days | 12-48 hours | Long-term retention |
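The cost spread across classes is large enough to dominate storage budgets. A back-of-envelope comparison, using illustrative per-GB prices (region-dependent assumptions, not official AWS pricing; retrieval and request fees ignored):

```python
# Illustrative monthly storage cost by S3 class. Prices are rough
# assumptions for comparison only, not quoted AWS rates.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_cost(gib: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD for `gib` gibibytes."""
    return round(gib * PRICE_PER_GB_MONTH[storage_class], 2)

# 10 TiB of compliance archives: Standard vs Deep Archive
print(monthly_cost(10_240, "STANDARD"))      # 235.52
print(monthly_cost(10_240, "DEEP_ARCHIVE"))  # 10.14
```

Roughly a 96% reduction for data that tolerates 12-48 hour retrieval, which is where the "50-90% savings" figure in the takeaways comes from.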
Lifecycle Policies
```json
{
  "Rules": [{
    "ID": "ArchiveOldData",
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 },
    "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
  }]
}
```
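A lifecycle rule only makes sense if its transitions occur in increasing order of age and expiration happens after the last transition. A small hypothetical checker (not part of any AWS SDK) can validate that invariant before you apply a rule:

```python
# Sanity-check a lifecycle rule: transition days must strictly increase,
# and expiration (if set) must come after the last transition.
# A hypothetical helper, not part of boto3 or any AWS API.
def validate_rule(rule: dict) -> bool:
    days = [t["Days"] for t in rule.get("Transitions", [])]
    if days != sorted(days) or len(days) != len(set(days)):
        return False
    expiration = rule.get("Expiration", {}).get("Days")
    return expiration is None or not days or expiration > days[-1]

rule = {
    "ID": "ArchiveOldData",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
    ],
    "Expiration": {"Days": 2555},
}
print(validate_rule(rule))  # True
```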
Cloud Databases
Key-Value and Document Stores
Amazon DynamoDB:
- Single-digit millisecond latency at any scale
- Partition key + optional sort key for data modeling
- On-demand or provisioned capacity modes
- DynamoDB Streams for change data capture
- Global tables for multi-region active-active replication
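The partition key + sort key model is what enables single-table designs: one partition groups related items, and the sort key orders and filters within it. A real table would use boto3; this sketch simulates the query semantics locally with plain dicts to show the idea:

```python
# Sketch of DynamoDB-style composite-key modeling, simulated locally.
# All item names (USER#42, ORDER#..., PROFILE) are illustrative.
from collections import defaultdict

table = defaultdict(list)  # partition key -> items ordered by sort key

def put_item(pk: str, sk: str, attrs: dict) -> None:
    table[pk].append({"pk": pk, "sk": sk, **attrs})
    table[pk].sort(key=lambda item: item["sk"])

def query(pk: str, sk_prefix: str = "") -> list:
    """All items in one partition whose sort key begins with sk_prefix,
    mimicking a Query with a begins_with key condition."""
    return [i for i in table[pk] if i["sk"].startswith(sk_prefix)]

put_item("USER#42", "ORDER#2024-01-15", {"total": 99})
put_item("USER#42", "ORDER#2024-03-02", {"total": 20})
put_item("USER#42", "PROFILE", {"name": "Ada"})

print(len(query("USER#42", "ORDER#")))  # 2: only the orders
```

One partition read serves both "all of a user's orders" and "the user's profile" depending on the sort-key prefix, which is why careful key design replaces joins.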
Azure Cosmos DB:
- Multi-model: document, key-value, graph, column-family
- Five consistency levels (strong → eventual)
- Turnkey global distribution with multi-region writes
- Request Units (RU/s) as abstracted throughput currency
Relational Databases
Amazon Aurora:
- MySQL/PostgreSQL compatible, with up to 5x MySQL and 3x PostgreSQL throughput
- Storage auto-scales in 10 GB increments up to 128 TB
- Up to 15 read replicas with sub-10ms replication lag
- Aurora Serverless v2: Scales capacity in fine-grained increments
Google Cloud Spanner:
- Globally distributed, strongly consistent relational database
- Horizontal scaling with no manual sharding
- 99.999% availability SLA (five nines)
- Uses TrueTime API for global transaction ordering
Database Selection Guide
```
Need global strong consistency?
├─ Yes → Spanner or Cosmos DB (strong mode)
└─ No  → Need SQL?
         ├─ Yes → Need multi-region writes?
         │        ├─ Yes → Spanner, CockroachDB
         │        └─ No  → Aurora, Cloud SQL, Azure SQL
         └─ No  → Need <10 ms latency at scale?
                  ├─ Yes → DynamoDB, Cosmos DB
                  └─ No  → Firestore, MongoDB Atlas
```
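The same decision tree can be encoded directly as a hypothetical helper function, which makes the branch order explicit (global consistency is checked first, SQL second, latency last):

```python
# The database decision tree above as code. Purely illustrative.
def pick_database(global_strong: bool, needs_sql: bool,
                  multi_region_writes: bool, low_latency: bool) -> str:
    if global_strong:
        return "Spanner / Cosmos DB (strong)"
    if needs_sql:
        if multi_region_writes:
            return "Spanner / CockroachDB"
        return "Aurora / Cloud SQL / Azure SQL"
    if low_latency:
        return "DynamoDB / Cosmos DB"
    return "Firestore / MongoDB Atlas"

# Typical single-region relational workload:
print(pick_database(False, True, False, False))
# → Aurora / Cloud SQL / Azure SQL
```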
Caching Services
Amazon ElastiCache
Redis mode:
- In-memory data structures (strings, hashes, lists, sets, sorted sets)
- Persistence options: RDB snapshots, AOF logging
- Cluster mode for sharding across multiple nodes
- Use cases: Session store, leaderboards, real-time analytics
Memcached mode:
- Simple key-value caching with multi-threaded architecture
- No persistence, no replication
- Use cases: Simple caching, session management
Google Memorystore
- Managed Redis and Memcached
- Automatic failover and patching
- VPC-native networking for low-latency access
Caching Patterns
```
┌─────────┐                  ┌─────────┐
│         ├── Cache Hit ────►│  Cache  │
│         │◄─────────────────┤         │
│   App   │                  └─────────┘
│         │                  ┌─────────┐
│         ├── Cache Miss ───►│   DB    │
│         │◄─────────────────┤         │
│         ├─ Update Cache ──► Cache
└─────────┘
```
| Pattern | Description | Consistency |
|---------|-------------|-------------|
| Cache-aside | App manages cache reads/writes | Eventually consistent |
| Read-through | Cache loads from DB on miss | Eventually consistent |
| Write-through | Write to cache and DB synchronously | Strong |
| Write-behind | Write to cache, async write to DB | Eventually consistent |
| TTL-based | Expire entries after time period | Time-bounded staleness |
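Cache-aside, the most common of these patterns, fits in a few lines. This minimal sketch uses a dict as the cache and a dict (`slow_db`, a stand-in name) as the database, with hit/miss counters to make the flow visible:

```python
# Minimal cache-aside sketch: check the cache first, fall back to the
# (simulated) database on a miss, then populate the cache.
cache: dict = {}
slow_db = {"user:1": {"name": "Ada"}}  # stand-in for a real database
stats = {"hits": 0, "misses": 0}

def get(key: str):
    if key in cache:            # cache hit: serve from memory
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1        # cache miss: read DB, populate cache
    value = slow_db.get(key)
    if value is not None:
        cache[key] = value
    return value

get("user:1")   # miss: loads from DB and caches
get("user:1")   # hit: served from cache
print(stats)    # {'hits': 1, 'misses': 1}
```

Note the eventual-consistency caveat from the table: if the DB row changes after it is cached, readers see stale data until the entry is invalidated or expires.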
Data Transfer Costs
Cloud egress fees are a significant and often underestimated cost factor.
| Transfer Type | Approximate Cost |
|--------------|------------------|
| Ingress (upload) | Free |
| Same AZ | Free |
| Cross-AZ | $0.01-0.02/GB |
| Cross-region | $0.02-0.09/GB |
| Internet egress | $0.05-0.12/GB |
Minimizing Transfer Costs
- Keep compute and storage in the same region/AZ
- Use VPC endpoints for AWS service access without NAT
- Compress data before transfer
- Cache aggressively with CDN for repeated reads
- Use dedicated interconnects for high-volume hybrid transfers
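Compression in particular pays for itself quickly on repetitive payloads such as JSON or logs. A rough sketch of the savings, assuming an illustrative $0.09/GB internet-egress rate (not a quoted price):

```python
# Rough egress-savings estimate from compressing before transfer.
# The $0.09/GB rate is an assumed internet-egress price for illustration.
import gzip
import json

payload = json.dumps(
    [{"id": i, "status": "ok"} for i in range(10_000)]
).encode()
compressed = gzip.compress(payload)

egress_rate = 0.09  # assumed $/GB
ratio = len(compressed) / len(payload)
print(f"compression ratio: {ratio:.2f}")
print(f"savings per TB transferred: ${(1 - ratio) * 1024 * egress_rate:.2f}")
```

Highly repetitive data like this compresses dramatically; already-compressed formats (JPEG, Parquet with compression) see little benefit, so compress selectively.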
Key Takeaways
- Object storage provides virtually unlimited, highly durable storage for unstructured data
- Storage class tiering can reduce costs by 50-90% for infrequent access patterns
- Block storage performance varies dramatically by type; match IOPS needs to volume type
- DynamoDB and Cosmos DB offer single-digit ms latency; Spanner provides global consistency
- Caching reduces database load and latency; choose the pattern based on consistency needs
- Data egress costs require deliberate architecture to minimize cross-boundary transfers