Cloud Storage Services

Object Storage

Object storage provides flat-namespace, key-value access to unstructured data at massive scale.
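
The flat namespace can be made concrete with a small in-memory sketch. It is not a client for any real service: the `list_keys` helper and the sample keys are hypothetical, and the code only imitates how a prefix-plus-delimiter listing groups plain string keys into folder-like results.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Imitate a prefix/delimiter listing over a flat set of object keys.

    Returns (objects, common_prefixes): keys directly under the prefix, and
    the folder-like groupings inferred from the delimiter.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the first delimiter is rolled up into one entry.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys: there are no directories, only strings that contain "/".
keys = {
    "data/report.csv",
    "data/2024/jan.csv",
    "data/2024/feb.csv",
    "logs/app.log",
    "readme.txt",
}

top_objects, top_prefixes = list_keys(keys)
data_objects, data_prefixes = list_keys(keys, prefix="data/")
```

The "folders" exist only in the listing result; renaming one would mean rewriting every key that shares the prefix.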

Amazon S3

s3://bucket-name/prefix/object-key
       │            │        │
       │            │        └── Unique identifier (acts like a path)
       │            └── Logical grouping (not a real directory)
       └── Globally unique container
  • Durability: 99.999999999% (11 nines) across multiple AZs
  • Consistency: Strong read-after-write consistency (since Dec 2020)
  • Max object size: 5 TB (multipart upload recommended for objects > 100 MB, required above 5 GB)
  • Features: Versioning, replication, event notifications, object lock
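
As a rough illustration of the multipart guidance, the sketch below plans part sizes for an upload. The `plan_upload` helper and the 64 MiB preferred part size are assumptions made for this example. The 5 MiB minimum part size and 10,000-part cap are the documented S3 multipart limits, and the 100 MB threshold is treated here as 100 MiB.

```python
import math

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB           # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload
MULTIPART_THRESHOLD = 100 * MIB   # threshold quoted above, treated as 100 MiB


def plan_upload(object_size, preferred_part_size=64 * MIB):
    """Return (use_multipart, part_size, part_count) for object_size bytes."""
    if object_size <= MULTIPART_THRESHOLD:
        return False, object_size, 1
    # Grow the part size if the preferred size would need more than MAX_PARTS.
    part_size = max(
        preferred_part_size,
        MIN_PART_SIZE,
        math.ceil(object_size / MAX_PARTS),
    )
    part_count = math.ceil(object_size / part_size)
    return True, part_size, part_count


small = plan_upload(20 * MIB)        # below the threshold: single PUT
large = plan_upload(1024 * MIB)      # 1 GiB: 16 parts of 64 MiB
huge = plan_upload(5 * 1024 ** 4)    # 5 TiB: part size grows to fit the cap
```

In practice a higher-level transfer helper usually makes this decision, so a planner like this is mainly useful for reasoning about part counts.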

Google Cloud Storage

  • Unified API across all storage classes
  • Turbo replication targets a 15-minute recovery point objective for dual-region buckets
  • Autoclass automatically transitions objects between storage classes
  • Strongly consistent for all operations

Azure Blob Storage

  • Tiers: Hot, Cool, Cold, Archive
  • Access tiers can be set at object level within a container
  • Immutable storage with WORM (Write Once Read Many) policies

Common Object Storage Patterns

# S3 operations with boto3
import boto3

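s3 = boto3.client('s3')

# Example payload (placeholder); Body accepts bytes or a file-like object
data = b'id,total\n1,42\n'

# Upload with server-side encryption (SSE-KMS) into Intelligent-Tiering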
s3.put_object(
    Bucket='my-bucket',
    Key='data/report.csv',
    Body=data,
    ServerSideEncryption='aws:kms',
    StorageClass='INTELLIGENT_TIERING'
)

# Generate presigned URL for temporary access
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'data/report.csv'},
    ExpiresIn=3600
)

Block Storage

Block storage provides raw volumes that attach to compute instances, functioning like physical hard drives.

AWS EBS (Elastic Block Store)

| Volume Type | IOPS | Throughput | Use Case |
|-------------|------|------------|----------|
| gp3 (General) | 3,000-16,000 | 125-1,000 MB/s | Most workloads |
| io2 (Provisioned) | Up to 256,000 | Up to 4,000 MB/s | Databases |
| st1 (Throughput) | Up to 500 | Up to 500 MB/s | Big data, logs |
| sc1 (Cold) | Up to 250 | Up to 250 MB/s | Infrequent access |

  • Snapshots stored in S3 (incremental, point-in-time)
  • Multi-attach for io2 volumes (shared across instances)
  • Encryption at rest with AWS-managed or customer-managed KMS keys (encryption by default is an opt-in, per-Region account setting)
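
Matching a workload to a volume type can be sketched as a lookup against the limits in the table above. The `pick_volume_type` helper is hypothetical, and ordering the types from cheapest to most capable is a simplifying assumption; a real choice would also weigh price, volume size, and whether the workload is random or sequential.

```python
# Upper limits taken from the table above: (name, max IOPS, max MB/s).
# The cheapest-first ordering is an assumption made for this sketch.
VOLUME_LIMITS = [
    ("sc1", 250, 250),
    ("st1", 500, 500),
    ("gp3", 16_000, 1_000),
    ("io2", 256_000, 4_000),
]


def pick_volume_type(required_iops, required_throughput_mbps):
    """Return the first type whose limits cover both requirements, else None."""
    for name, max_iops, max_throughput in VOLUME_LIMITS:
        if required_iops <= max_iops and required_throughput_mbps <= max_throughput:
            return name
    return None


log_archive = pick_volume_type(200, 100)
web_app = pick_volume_type(10_000, 500)
oltp_database = pick_volume_type(100_000, 2_000)
too_demanding = pick_volume_type(500_000, 100)
```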

GCP Persistent Disks

  • pd-standard: HDD-backed, cost-effective
  • pd-balanced: SSD-backed, balanced performance
  • pd-ssd: SSD-backed, high IOPS
  • pd-extreme: Highest performance for demanding databases
  • Can attach to multiple instances in read-only mode

File Storage

File storage provides managed network file systems that multiple clients can mount at once, typically with POSIX semantics.

AWS EFS (Elastic File System)

  • NFS v4.1 protocol, automatically scales to petabytes
  • Standard and One Zone storage classes
  • Throughput modes: Bursting, Provisioned, Elastic
  • Can be mounted by Lambda functions through EFS access points for serverless access

Other File Services

  • Google Filestore: Managed NFS for GKE and Compute Engine
  • Azure Files: SMB and NFS protocol support
  • AWS FSx: Managed Lustre, NetApp ONTAP, Windows File Server

Storage Classes and Lifecycle

S3 Storage Classes

Access Frequency vs Cost:

High   ┤ S3 Standard
       │   S3 Intelligent-Tiering
       │     S3 Standard-IA
       │       S3 One Zone-IA
       │         S3 Glacier Instant Retrieval
       │           S3 Glacier Flexible Retrieval
Low    ┤             S3 Glacier Deep Archive
       └──────────────────────────────────────
         Higher storage cost ◄────► Lower storage cost

| Class | Min Duration | Retrieval Time | Use Case |
|-------|--------------|----------------|----------|
| Standard | None | Instant | Active data |
| Intelligent-Tiering | None | Instant | Unknown patterns |
| Standard-IA | 30 days | Instant | Infrequent but fast |
| One Zone-IA | 30 days | Instant | Reproducible data |
| Glacier Instant | 90 days | Milliseconds | Archive, fast access |
| Glacier Flexible | 90 days | 1-12 hours | Compliance archives |
| Deep Archive | 180 days | 12-48 hours | Long-term retention |
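
The minimum durations matter for billing: an object deleted or transitioned early is still charged for the full minimum. A minimal sketch of that arithmetic follows, using the durations from the table; the `billable_days` helper and the dictionary layout are illustrative assumptions.

```python
# Minimum storage durations in days, taken from the table above.
MIN_DURATION_DAYS = {
    "STANDARD": 0,
    "INTELLIGENT_TIERING": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,
    "GLACIER": 90,
    "DEEP_ARCHIVE": 180,
}


def billable_days(storage_class, days_stored):
    """Days of storage charged, including any early-deletion minimum."""
    return max(days_stored, MIN_DURATION_DAYS[storage_class])


# An object kept for 10 days is billed very differently per class.
standard = billable_days("STANDARD", 10)
standard_ia = billable_days("STANDARD_IA", 10)
deep_archive = billable_days("DEEP_ARCHIVE", 10)
```

Short-lived objects are therefore usually cheaper to leave in Standard than to move into a colder class.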

Lifecycle Policies

{
  "Rules": [{
    "ID": "ArchiveOldData",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 },
    "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
  }]
}
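
To see what the rule above does to a single object over time, the sketch below evaluates the transitions and expiration against an object's age. It is a local simulation only: `storage_class_at` is a hypothetical helper, and real lifecycle processing runs asynchronously, so actual transition times can lag the configured day counts.

```python
# Transitions and expiration copied from the lifecycle rule above.
RULE = {
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
    ],
    "Expiration": {"Days": 2555},
}


def storage_class_at(age_days, rule=RULE, initial_class="STANDARD"):
    """Return the storage class at a given age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = initial_class
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


timeline = {age: storage_class_at(age) for age in (10, 30, 100, 400, 2555)}
```

The 2,555-day expiration corresponds to roughly seven years of retention.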

Cloud Databases

Key-Value and Document Stores

Amazon DynamoDB:

  • Single-digit millisecond latency at any scale
  • Partition key + optional sort key for data modeling
  • On-demand or provisioned capacity modes
  • DynamoDB Streams for change data capture
  • Global tables for multi-region active-active replication
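
The partition key plus sort key model can be illustrated without any AWS calls. The `TinyTable` class below is a hypothetical in-memory stand-in, not the DynamoDB API, and the `USER#`/`ORDER#` key naming is one common convention assumed for the example. It shows why related items share a partition key and why a sort-key prefix supports range-style queries.

```python
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # partition key -> {sort key: item}

    def put(self, pk, sk, item):
        self._partitions[pk][sk] = item

    def query(self, pk, sk_prefix=""):
        """Items in one partition, ordered by sort key, filtered by prefix."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = TinyTable()
table.put("USER#42", "PROFILE", {"name": "Ada"})
table.put("USER#42", "ORDER#2024-02-10", {"total": 30})
table.put("USER#42", "ORDER#2024-01-05", {"total": 12})
table.put("USER#7", "ORDER#2024-01-09", {"total": 99})

# All orders for one user, returned in date order because of the sort key.
orders = table.query("USER#42", sk_prefix="ORDER#")
```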

Azure Cosmos DB:

  • Multi-model: document, key-value, graph, column-family
  • Five consistency levels: strong, bounded staleness, session, consistent prefix, eventual
  • Turnkey global distribution with multi-region writes
  • Request Units (RU/s) as abstracted throughput currency

Relational Databases

Amazon Aurora:

  • MySQL/PostgreSQL compatible, with up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL
  • Storage auto-scales in 10 GB increments up to 128 TB
  • Up to 15 read replicas, with replica lag typically well under 100 ms
  • Aurora Serverless v2: Scales capacity in fine-grained increments

Google Cloud Spanner:

  • Globally distributed, strongly consistent relational database
  • Horizontal scaling with no manual sharding
  • 99.999% availability SLA (five nines) for multi-region configurations
  • Uses TrueTime API for global transaction ordering

Database Selection Guide

Need global strong consistency?
  └─ Yes → Spanner or Cosmos DB (strong mode)
  └─ No → Need SQL?
            └─ Yes → Need >1 region writes?
                      └─ Yes → Spanner, CockroachDB
                      └─ No → Aurora, Cloud SQL, Azure SQL
            └─ No → Need <10ms latency at scale?
                      └─ Yes → DynamoDB, Cosmos DB
                      └─ No → Firestore, MongoDB Atlas
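
The same decision tree can be written as a small function, which makes the branch order explicit. The function name and boolean parameters are illustrative assumptions; the returned options are exactly those in the tree above.

```python
def suggest_databases(global_strong_consistency, needs_sql=False,
                      multi_region_writes=False, low_latency_at_scale=False):
    """Walk the selection guide top to bottom and return the candidate services."""
    if global_strong_consistency:
        return ["Spanner", "Cosmos DB (strong mode)"]
    if needs_sql:
        if multi_region_writes:
            return ["Spanner", "CockroachDB"]
        return ["Aurora", "Cloud SQL", "Azure SQL"]
    if low_latency_at_scale:
        return ["DynamoDB", "Cosmos DB"]
    return ["Firestore", "MongoDB Atlas"]


single_region_sql = suggest_databases(False, needs_sql=True)
fast_key_value = suggest_databases(False, low_latency_at_scale=True)
```

Global strong consistency is checked first, so it overrides every later answer.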

Caching Services

Amazon ElastiCache

Redis mode:

  • In-memory data structures (strings, hashes, lists, sets, sorted sets)
  • Persistence options: RDB snapshots, AOF logging
  • Cluster mode for sharding across multiple nodes
  • Use cases: Session store, leaderboards, real-time analytics

Memcached mode:

  • Simple key-value caching with multi-threaded architecture
  • No persistence, no replication
  • Use cases: Simple caching, session management

Google Memorystore

  • Managed Redis and Memcached
  • Automatic failover and patching
  • VPC-native networking for low-latency access

Caching Patterns

┌─────────┐     Cache      ┌─────────┐
│  App    ├──── Hit ──────►│  Cache  │
│         │◄───────────────┤         │
│         │                └─────────┘
│         │     Cache
│         ├──── Miss ─────►┌─────────┐
│         │◄───────────────┤   DB    │
│         ├─── Update ────►│         │
│         │    Cache       └─────────┘
└─────────┘

| Pattern | Description | Consistency |
|---------|-------------|-------------|
| Cache-aside | App manages cache reads/writes | Eventually consistent |
| Read-through | Cache loads from DB on miss | Eventually consistent |
| Write-through | Write to cache and DB synchronously | Strong |
| Write-behind | Write to cache, async write to DB | Eventually consistent |
| TTL-based | Expire entries after time period | Time-bounded staleness |
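
A minimal sketch of cache-aside combined with TTL expiry shows where the staleness in the table comes from. The `CacheAside` class, the dictionary standing in for the database, and the injected fake clock are all assumptions for illustration; a production setup would use a shared cache such as Redis or Memcached instead of a local dict.

```python
import time


class CacheAside:
    """Cache-aside reads with TTL expiry; the app owns all cache logic."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read the database, then populate the cache.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Drop an entry, for example right after writing to the database."""
        self._entries.pop(key, None)


db = {"user:1": "Ada"}   # stand-in for the database
now = [0.0]              # fake clock so the example is deterministic
cache = CacheAside(db.get, ttl_seconds=60.0, clock=lambda: now[0])

first = cache.get("user:1")    # miss: loaded from db
second = cache.get("user:1")   # hit
db["user:1"] = "Grace"         # database changes behind the cache
stale = cache.get("user:1")    # hit: still the old value
now[0] = 61.0                  # TTL has elapsed
fresh = cache.get("user:1")    # miss: reloaded from db
```

Calling `invalidate` after each write narrows the stale window, and the TTL bounds it when an invalidation is missed.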

Data Transfer Costs

Cloud egress fees are a significant and often underestimated cost factor.

| Transfer Type | Approximate Cost (USD) |
|---------------|------------------------|
| Ingress (upload) | Free |
| Same AZ | Free |
| Cross-AZ | $0.01/GB |
| Cross-region | $0.02-0.09/GB |
| Internet egress | $0.05-0.12/GB |
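
A quick estimate using the approximate rates from the table shows how these charges add up. The `estimate_cost` helper and the 10,000 GB monthly volume are assumptions for the example; actual pricing varies by provider, region, and volume tier.

```python
# Approximate per-GB rates in USD from the table above, as (low, high) ranges.
RATES_PER_GB = {
    "ingress": (0.0, 0.0),
    "same_az": (0.0, 0.0),
    "cross_az": (0.01, 0.01),
    "cross_region": (0.02, 0.09),
    "internet_egress": (0.05, 0.12),
}


def estimate_cost(transfer_type, gigabytes):
    """Return the (low, high) cost range in USD for a transfer volume."""
    low, high = RATES_PER_GB[transfer_type]
    return round(low * gigabytes, 2), round(high * gigabytes, 2)


# Hypothetical workload: 10,000 GB per month leaving the cloud.
internet = estimate_cost("internet_egress", 10_000)
cross_az = estimate_cost("cross_az", 10_000)
same_az = estimate_cost("same_az", 10_000)
```

At that volume, moving traffic from internet egress to same-AZ paths removes roughly $500 to $1,200 per month.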

Minimizing Transfer Costs

  1. Keep compute and storage in the same region/AZ
  2. Use VPC endpoints for AWS service access without NAT
  3. Compress data before transfer
  4. Cache aggressively with CDN for repeated reads
  5. Use dedicated interconnects for high-volume hybrid transfers

Key Takeaways

  • Object storage provides virtually unlimited, highly durable storage for unstructured data
  • Storage class tiering can reduce costs by 50-90% for infrequent access patterns
  • Block storage performance varies dramatically by type; match IOPS needs to volume type
  • DynamoDB and Cosmos DB offer single-digit ms latency; Spanner provides global consistency
  • Caching reduces database load and latency; choose the pattern based on consistency needs
  • Data egress costs require deliberate architecture to minimize cross-boundary transfers