Cloud Storage Services

Object Storage

Object storage provides flat-namespace, key-value access to unstructured data at massive scale.

Amazon S3

s3://bucket-name/prefix/object-key
       │            │        │
       │            │        └── Unique identifier (acts like a path)
       │            └── Logical grouping (not a real directory)
       └── Globally unique container
  • Durability: 99.999999999% (11 nines) across multiple AZs
  • Consistency: Strong read-after-write consistency (since Dec 2020)
  • Max object size: 5 TB (multipart upload recommended above 100 MB and required above 5 GB)
  • Features: Versioning, replication, event notifications, object lock
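
The URI anatomy above can be made concrete with a small helper. This is a minimal sketch, not part of any AWS SDK: `parse_s3_uri` and `S3Location` are hypothetical names introduced here for illustration. It shows that the prefix is simply the leading portion of the object key, not a separate directory.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    bucket: str   # globally unique container name
    prefix: str   # leading part of the key up to the last "/", or "" if none
    key: str      # full object key, prefix included


def parse_s3_uri(uri: str) -> S3Location:
    """Split an s3:// URI into bucket, prefix, and full object key."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    # rfind returns -1 when the key has no "/", which yields an empty prefix.
    prefix = key[: key.rfind("/") + 1]
    return S3Location(bucket, prefix, key)


location = parse_s3_uri("s3://my-bucket/data/2024/report.csv")
print(location)
```

Because the namespace is flat, listing "everything under data/2024/" is a key-prefix match on the service side, not a directory traversal.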

Google Cloud Storage

  • Unified API across all storage classes
  • Turbo replication for dual-region buckets, designed to replicate new objects within 15 minutes
  • Autoclass automatically transitions objects between storage classes
  • Strongly consistent for all operations

Azure Blob Storage

  • Tiers: Hot, Cool, Cold, Archive
  • Access tiers can be set at object level within a container
  • Immutable storage with WORM (Write Once Read Many) policies

Common Object Storage Patterns

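# S3 operations with boto3
import boto3

s3 = boto3.client('s3')

# Example payload; in practice this would be file contents or a stream
data = b'id,total\n1,42\n'

# Upload with server-side encryption (SSE-KMS) into the Intelligent-Tiering class
s3.put_object(
    Bucket='my-bucket',
    Key='data/report.csv',
    Body=data,
    ServerSideEncryption='aws:kms',
    StorageClass='INTELLIGENT_TIERING'
)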

# Generate presigned URL for temporary access
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'data/report.csv'},
    ExpiresIn=3600
)
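
For objects large enough to need multipart upload, the part size has to be chosen so that the upload fits within S3's limits. The sketch below assumes the documented limits of a 5 MiB minimum part size, a 5 GiB maximum part size, and 10,000 parts per upload. `plan_parts` and the 100 MiB preferred part size are illustrative assumptions, not boto3 API.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB     # minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB     # maximum part size
MAX_PARTS = 10_000     # maximum number of parts per upload


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def plan_parts(object_size: int, preferred_part: int = 100 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) for uploading object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if the preferred one would need more than 10,000 parts.
    part_size = max(preferred_part, MIN_PART, ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    return part_size, ceil_div(object_size, part_size)


part_size, part_count = plan_parts(1 * GIB)
print(part_size // MIB, "MiB parts,", part_count, "parts")
```

In practice, boto3's higher-level transfer helpers choose part sizes for you; the arithmetic here only illustrates why very large objects force larger parts.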

Block Storage

Block storage provides raw volumes that attach to compute instances, functioning like physical hard drives.

AWS EBS (Elastic Block Store)

Volume Type        IOPS            Throughput       Use Case
gp3 (General)      3,000-16,000    125-1,000 MB/s   Most workloads
io2 (Provisioned)  Up to 256,000   4,000 MB/s       Databases
st1 (Throughput)   Up to 500       500 MB/s         Big data, logs
sc1 (Cold)         Up to 250       250 MB/s         Infrequent access
  • Snapshots stored in S3 (incremental, point-in-time)
  • Multi-attach for io2 volumes (shared across instances)
  • Encryption at rest with AWS-managed or customer-managed KMS keys (encryption by default is an opt-in, per-Region account setting)
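
Matching a workload to a volume type can be sketched as a lookup against the limits in the table above. The following is illustrative only: `pick_volume_type` is a hypothetical helper, and the cheapest-first ordering is an assumption that ignores other constraints such as boot-volume support or latency.

```python
from typing import Optional

# (volume type, max IOPS, max throughput in MB/s), taken from the table above.
# The order (assumed cheapest first) is an assumption made for this example.
EBS_LIMITS = [
    ("sc1", 250, 250),
    ("st1", 500, 500),
    ("gp3", 16_000, 1_000),
    ("io2", 256_000, 4_000),
]


def pick_volume_type(iops: int, throughput_mbps: int) -> Optional[str]:
    """Return the first volume type whose limits cover the requirement."""
    for name, max_iops, max_throughput in EBS_LIMITS:
        if iops <= max_iops and throughput_mbps <= max_throughput:
            return name
    return None  # no single volume in the table satisfies the requirement


print(pick_volume_type(3_000, 125))
print(pick_volume_type(50_000, 1_000))
```

A `None` result signals that the requirement exceeds any single volume listed here, at which point striping across volumes or a different storage design would need to be considered.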

GCP Persistent Disks

  • pd-standard: HDD-backed, cost-effective
  • pd-balanced: SSD-backed, balanced performance
  • pd-ssd: SSD-backed, high IOPS
  • pd-extreme: Highest performance for demanding databases
  • Can attach to multiple instances in read-only mode

File Storage

File storage provides network file systems that multiple clients can mount at once, typically with POSIX semantics.

AWS EFS (Elastic File System)

  • NFS v4.1 protocol, automatically scales to petabytes
  • Standard and One Zone storage classes
  • Throughput modes: Bursting, Provisioned, Elastic
  • Mountable from Lambda functions (through EFS access points) for serverless access

Other File Services

  • Google Filestore: Managed NFS for GKE and Compute Engine
  • Azure Files: SMB and NFS protocol support
  • AWS FSx: Managed Lustre, NetApp ONTAP, Windows File Server

Storage Classes and Lifecycle

S3 Storage Classes

Access Frequency vs Cost:

High   ┤ S3 Standard
       │   S3 Intelligent-Tiering
       │     S3 Standard-IA
       │       S3 One Zone-IA
       │         S3 Glacier Instant Retrieval
       │           S3 Glacier Flexible Retrieval
Low    ┤             S3 Glacier Deep Archive
       └──────────────────────────────────────
         Low ◄── Storage savings ──► High

Class                Min Storage Duration  Retrieval Time       Use Case
Standard             None                  Instant              Active data
Intelligent-Tiering  None                  Instant              Unknown patterns
Standard-IA          30 days               Instant              Infrequent but fast
One Zone-IA          30 days               Instant              Reproducible data
Glacier Instant      90 days               Milliseconds         Archive, fast access
Glacier Flexible     90 days               Minutes to 12 hours  Compliance archives
Deep Archive         180 days              12-48 hours          Long-term retention
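
The minimum durations in the table above matter for cost: an object deleted or transitioned early is still billed for the remainder of the minimum period. A minimal sketch of that rule follows; `billable_days` is a hypothetical helper, and the dictionary keys use the S3 API storage class identifiers.

```python
# Minimum storage duration in days per class, from the table above.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "INTELLIGENT_TIERING": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,      # Glacier Instant Retrieval
    "GLACIER": 90,         # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage charged for an object kept for days_stored days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


# An object removed from Standard-IA after 10 days is still billed for 30.
print(billable_days("STANDARD_IA", 10))
```

This is why short-lived data usually belongs in Standard or Intelligent-Tiering even when it is rarely read.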

Lifecycle Policies

{
  "Rules": [{
    "ID": "ArchiveOldData",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 },
    "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
  }]
}
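
To see what this policy does to a single object over time, the transitions can be simulated locally. This is a sketch under simplifying assumptions: `storage_class_at` is a hypothetical helper, the object starts in STANDARD, and transitions are treated as happening exactly at the configured age, whereas S3 applies lifecycle rules asynchronously.

```python
import json
from typing import Optional

# The transitions and expiration from the policy above.
RULE = json.loads("""
{
  "Transitions": [
    { "Days": 30, "StorageClass": "STANDARD_IA" },
    { "Days": 90, "StorageClass": "GLACIER" },
    { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
  ],
  "Expiration": { "Days": 2555 }
}
""")


def storage_class_at(age_days: int, rule: dict, initial: str = "STANDARD") -> Optional[str]:
    """Return the storage class at a given object age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = initial
    # Apply every transition whose age threshold has been reached, in order.
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (10, 45, 120, 400, 3000):
    print(age, storage_class_at(age, RULE))
```

The 2555-day expiration corresponds to roughly seven years of retention.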

Cloud Databases

Key-Value and Document Stores

Amazon DynamoDB:

  • Single-digit millisecond latency at any scale
  • Partition key + optional sort key for data modeling
  • On-demand or provisioned capacity modes
  • DynamoDB Streams for change data capture
  • Global tables for multi-region active-active replication
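
The partition key plus sort key model can be illustrated with a toy in-memory table. This is not the DynamoDB API: `ToyTable` and the key formats are made up for the example, and it models only the access pattern (fetch one partition, ordered by sort key, optionally filtered by a sort-key prefix), not hashing, capacity, or distribution.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self) -> None:
        # partition key -> {sort key -> item attributes}
        self._partitions = defaultdict(dict)

    def put(self, pk: str, sk: str, attrs: dict) -> None:
        """Insert or overwrite the item identified by (pk, sk)."""
        self._partitions[pk][sk] = attrs

    def query(self, pk: str, sk_prefix: str = "") -> list:
        """Return items in one partition, ordered by sort key."""
        rows = self._partitions.get(pk, {})
        return [rows[sk] for sk in sorted(rows) if sk.startswith(sk_prefix)]


table = ToyTable()
table.put("CUSTOMER#42", "ORDER#2024-03-01", {"total": 30})
table.put("CUSTOMER#42", "ORDER#2024-01-15", {"total": 10})
table.put("CUSTOMER#42", "PROFILE", {"name": "Ada"})
table.put("CUSTOMER#7", "ORDER#2024-02-02", {"total": 99})

# All orders for one customer, oldest first, without touching other partitions.
orders = table.query("CUSTOMER#42", "ORDER#")
print(orders)
```

Designing sort keys so that related items share a prefix is what lets a single query retrieve them together.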

Azure Cosmos DB:

  • Multi-model: document, key-value, graph, column-family
  • Five consistency levels (strong → eventual)
  • Turnkey global distribution with multi-region writes
  • Request Units (RU/s) as abstracted throughput currency

Relational Databases

Amazon Aurora:

  • MySQL/PostgreSQL compatible; AWS cites up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL
  • Storage auto-scales in 10 GB increments up to 128 TB
  • Up to 15 read replicas, with replica lag typically well under 100 ms
  • Aurora Serverless v2: Scales capacity in fine-grained increments

Google Cloud Spanner:

  • Globally distributed, strongly consistent relational database
  • Horizontal scaling with no manual sharding
  • Up to 99.999% availability SLA (five nines) for multi-region configurations
  • Uses TrueTime API for global transaction ordering
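
To put an availability figure such as five nines in perspective, it helps to convert the percentage into allowed downtime. A small sketch of the arithmetic follows; `max_downtime_minutes` is a name made up for this example, and real SLAs are usually measured per billing month and define downtime in their own terms.

```python
def max_downtime_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Downtime budget in minutes for a given availability over a period."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * days * 24 * 60


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {max_downtime_minutes(pct):.2f} minutes per year")
```

Each additional nine cuts the yearly budget by a factor of ten, from roughly 8.8 hours at three nines to a little over five minutes at five nines.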

Database Selection Guide

Need global strong consistency?
  ├─ Yes → Spanner or Cosmos DB (strong mode)
  └─ No → Need SQL?
            ├─ Yes → Need >1 region writes?
            │         ├─ Yes → Spanner, CockroachDB
            │         └─ No → Aurora, Cloud SQL, Azure SQL
            └─ No → Need <10ms latency at scale?
                      ├─ Yes → DynamoDB, Cosmos DB
                      └─ No → Firestore, MongoDB Atlas
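
The decision tree above translates directly into a function, which can be handy for documenting a choice in code review or an architecture record. `suggest_databases` and its parameter names are hypothetical; the branches and product names are taken from the tree as written.

```python
def suggest_databases(
    global_strong_consistency: bool,
    needs_sql: bool = False,
    multi_region_writes: bool = False,
    low_latency_at_scale: bool = False,
) -> list:
    """Walk the selection guide and return the candidate services."""
    if global_strong_consistency:
        return ["Spanner", "Cosmos DB (strong mode)"]
    if needs_sql:
        if multi_region_writes:
            return ["Spanner", "CockroachDB"]
        return ["Aurora", "Cloud SQL", "Azure SQL"]
    if low_latency_at_scale:
        return ["DynamoDB", "Cosmos DB"]
    return ["Firestore", "MongoDB Atlas"]


print(suggest_databases(False, needs_sql=True))
print(suggest_databases(False, low_latency_at_scale=True))
```

The guide is a starting point only; cost, existing team skills, and ecosystem fit usually narrow the shortlist further.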

Caching Services

Amazon ElastiCache

Redis mode:

  • In-memory data structures (strings, hashes, lists, sets, sorted sets)
  • Persistence options: RDB snapshots, AOF logging
  • Cluster mode for sharding across multiple nodes
  • Use cases: Session store, leaderboards, real-time analytics

Memcached mode:

  • Simple key-value caching with multi-threaded architecture
  • No persistence, no replication
  • Use cases: Simple caching, session management

Google Memorystore

  • Managed Redis and Memcached
  • Automatic failover and patching
  • VPC-native networking for low-latency access

Caching Patterns

┌─────────┐     Cache      ┌─────────┐
│  App    ├──── Hit ───────►│  Cache  │
│         │◄───────────────┤         │
│         │                └─────────┘
│         │     Cache
│         ├──── Miss ──────►┌─────────┐
│         │◄───────────────┤   DB    │
│         ├─── Update ────►│         │
│         │    Cache       └─────────┘
└─────────┘

Pattern        Description                          Consistency
Cache-aside    App manages cache reads/writes       Eventually consistent
Read-through   Cache loads from DB on miss          Eventually consistent
Write-through  Write to cache and DB synchronously  Strong
Write-behind   Write to cache, async write to DB    Eventually consistent
TTL-based      Expire entries after time period     Time-bounded staleness
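
The cache-aside and TTL-based rows can be combined in a few lines. The sketch below keeps the cache in a local dictionary so it runs without Redis or Memcached; `CacheAside`, `load_from_db`, and the injectable `clock` are names chosen for this illustration, and a production version would also need size limits and concurrency control.

```python
import time
from typing import Any, Callable


class CacheAside:
    """Cache-aside reads with TTL expiry; the app owns both cache and DB access."""

    def __init__(self, load_from_db: Callable[[str], Any],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                      # cache hit
        value = self._load(key)                  # miss or expired: read the DB
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the DB so the next read reloads the value."""
        self._entries.pop(key, None)


db_reads = []

def fake_db_lookup(key: str) -> str:
    db_reads.append(key)          # record each trip to the "database"
    return key.upper()

cache = CacheAside(fake_db_lookup, ttl_seconds=30)
cache.get("user:1")
cache.get("user:1")               # served from cache, no second DB read
print(len(db_reads))
```

Staleness is bounded by the TTL unless the writer calls `invalidate`, which is the usual trade-off with this pattern.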

Data Transfer Costs

Cloud egress fees are a significant and often underestimated cost factor.

Transfer Type      Approximate Cost
Ingress (upload)   Free
Same AZ            Free
Cross-AZ           $0.01/GB
Cross-region       $0.02-0.09/GB
Internet egress    $0.05-0.12/GB
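
A rough monthly estimate can be derived from the approximate rates above. The sketch below returns a low and a high bound because the table gives ranges; `estimate_transfer_cost` and the traffic figures are hypothetical, and actual pricing depends on provider, region, and volume tiers.

```python
# (low, high) USD per GB, taken from the approximate figures in the table above.
RATES_PER_GB = {
    "ingress": (0.0, 0.0),
    "same_az": (0.0, 0.0),
    "cross_az": (0.01, 0.01),
    "cross_region": (0.02, 0.09),
    "internet_egress": (0.05, 0.12),
}


def estimate_transfer_cost(gb_by_type: dict) -> tuple:
    """Return (low, high) cost bounds in USD for the given traffic in GB."""
    low = sum(gb * RATES_PER_GB[kind][0] for kind, gb in gb_by_type.items())
    high = sum(gb * RATES_PER_GB[kind][1] for kind, gb in gb_by_type.items())
    return low, high


# Hypothetical month: 1 TB between AZs and 500 GB out to the internet.
low, high = estimate_transfer_cost({"cross_az": 1000, "internet_egress": 500})
print(f"${low:.2f} to ${high:.2f}")
```

Even at these modest volumes, internet egress accounts for most of the bill.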

Minimizing Transfer Costs

  1. Keep compute and storage in the same region/AZ
  2. Use VPC endpoints for AWS service access without NAT
  3. Compress data before transfer
  4. Cache aggressively with CDN for repeated reads
  5. Use dedicated interconnects for high-volume hybrid transfers

Key Takeaways

  • Object storage provides virtually unlimited, highly durable storage for unstructured data
  • Storage class tiering can reduce costs by 50-90% for infrequent access patterns
  • Block storage performance varies dramatically by type; match IOPS needs to volume type
  • DynamoDB and Cosmos DB offer single-digit ms latency; Spanner provides global consistency
  • Caching reduces database load and latency; choose the pattern based on consistency needs
  • Data egress costs require deliberate architecture to minimize cross-boundary transfers