NoSQL Databases
Overview
NoSQL databases relax relational constraints (schema, joins, ACID) to achieve horizontal scalability, flexible schemas, and optimized access patterns. The four main categories are key-value, document, wide-column, and graph stores, each suited to different data models and query patterns.
Key-Value Stores
Redis
Redis is an in-memory key-value store supporting rich data structures beyond simple strings.
Core Data Structures
# Strings - counters, caching
SET user:1001:name "Alice"
INCR page:views:home # atomic increment
# Hashes - objects
HSET user:1001 name "Alice" age 30 city "NYC"
HGET user:1001 name # "Alice"
# Lists - queues, timelines
LPUSH queue:tasks "task-42"
RPOP queue:tasks # dequeue from right
# Sets - unique collections, intersections
SADD user:1001:tags "python" "databases"
SINTER user:1001:tags user:1002:tags # common tags
# Sorted Sets - leaderboards, range queries
ZADD leaderboard 9500 "player:A" 8700 "player:B"
ZREVRANGE leaderboard 0 9 WITHSCORES # top 10
# Streams - event log (append-only)
XADD events * type "click" page "/home"
XREAD COUNT 10 STREAMS events 0
Persistence
- RDB (snapshotting): periodic point-in-time snapshots via fork + copy-on-write
- AOF (append-only file): logs every write command; rewritten periodically to compact
- Hybrid: RDB for fast recovery + AOF for durability between snapshots
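The AOF mechanism above can be sketched in a few lines: append every write command to a log, replay the log on restart, and periodically rewrite it so it contains only the commands needed to reconstruct current state. This is an illustrative Python sketch of the idea, not Redis's implementation:

```python
# Sketch of append-only-file persistence: log writes, replay on
# restart, and "rewrite" to one SET per live key (illustrative only).
class AOFStore:
    def __init__(self):
        self.data = {}
        self.log = []  # in a real system this is an fsync'd file

    def set(self, key, value):
        self.log.append(("SET", key, value))
        self.data[key] = value

    def delete(self, key):
        self.log.append(("DEL", key))
        self.data.pop(key, None)

    @classmethod
    def recover(cls, log):
        # Replay the command log to rebuild in-memory state.
        store = cls()
        for entry in log:
            if entry[0] == "SET":
                store.data[entry[1]] = entry[2]
            elif entry[0] == "DEL":
                store.data.pop(entry[1], None)
        store.log = list(log)
        return store

    def rewrite(self):
        # Compaction: the rewritten log reaches the same state
        # with one command per live key.
        self.log = [("SET", k, v) for k, v in self.data.items()]
```

The rewrite step is why AOF files do not grow without bound even under constant updates to the same keys.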
Redis Cluster
Cluster uses 16384 hash slots:
slot = CRC16(key) % 16384
Node A: slots 0-5460
Node B: slots 5461-10922
Node C: slots 10923-16383
Each node has replicas for failover.
Hash tags: only the substring between the first { and } is hashed, so {user}.profile and {user}.settings map to the same slot (useful for multi-key operations)
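The slot computation can be sketched in Python. Redis uses the CRC16-CCITT (XModem) variant, and the hash-tag rule replaces the key with the substring between the first { and the following } when that substring is non-empty:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XModem): poly 0x1021, init 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {...} section,
    # only that section is hashed.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

With this rule, hash_slot("{user}.profile") == hash_slot("{user}.settings"), so both keys land on the same node and can participate in one MULTI/EXEC transaction.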
RocksDB
An embedded LSM-tree key-value engine (a Facebook-developed fork of Google's LevelDB). Used as the storage layer for many databases (CockroachDB, TiKV, YugabyteDB).
// RocksDB API (C++)
#include <rocksdb/db.h>
#include <rocksdb/options.h>

rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;
options.compression = rocksdb::kLZ4Compression;
options.write_buffer_size = 64 << 20; // 64 MB memtable
rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/mydb", &db);
// check status.ok() before using db
db->Put(rocksdb::WriteOptions(), "key1", "value1");
std::string value;
db->Get(rocksdb::ReadOptions(), "key1", &value);
// Column families for logical separation
rocksdb::ColumnFamilyHandle* cf_handle;
db->CreateColumnFamily(rocksdb::ColumnFamilyOptions(), "metadata", &cf_handle);
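The LSM-tree idea behind RocksDB can be sketched independently of the C++ API: writes go to an in-memory memtable; when it fills, it is frozen and flushed as an immutable sorted run (an SSTable); reads check the memtable first, then SSTables newest-first. A minimal Python sketch of that read/write path (compaction omitted):

```python
# Minimal LSM-tree sketch: memtable + immutable sorted runs (SSTables).
class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.sstables = []  # flushed runs, oldest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: freeze the memtable as a sorted immutable run.
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: memtable first, then SSTables newest-first.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None
```

A real engine adds a write-ahead log for durability, bloom filters to skip SSTables, and background compaction to bound the number of runs a read must consult.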
Document Stores
MongoDB
MongoDB stores data as BSON (Binary JSON) documents with flexible schemas.
Data Model
// Document with nested structure
db.orders.insertOne({
_id: ObjectId("..."),
customer: { name: "Alice", email: "alice@example.com" },
items: [
{ sku: "WIDGET-1", qty: 3, price: 9.99 },
{ sku: "GADGET-2", qty: 1, price: 24.99 }
],
total: 54.96,
status: "shipped",
created_at: ISODate("2025-11-15T10:30:00Z")
});
Aggregation Pipeline
// Revenue by product category, last 30 days
db.orders.aggregate([
{ $match: { created_at: { $gte: ISODate("2025-10-15") } } },
{ $unwind: "$items" },
{ $lookup: {
from: "products",
localField: "items.sku",
foreignField: "sku",
as: "product"
}},
{ $unwind: "$product" },
{ $group: {
_id: "$product.category",
revenue: { $sum: { $multiply: ["$items.qty", "$items.price"] } },
order_count: { $sum: 1 }
}},
{ $sort: { revenue: -1 } }
]);
Sharding
Shard Key Selection:
- High cardinality (many distinct values)
- Even distribution (avoid hotspots)
- Query isolation (queries target single shard)
sh.shardCollection("mydb.orders", { customer_id: "hashed" })
Chunk: contiguous range of shard key values (128MB default since MongoDB 6.0; 64MB in earlier versions)
Balancer: moves chunks between shards to equalize load
Config servers (replica set): store metadata, chunk mappings
mongos: query router, merges results from shards
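Hashed-shard-key routing can be sketched as: hash the shard key value, then binary-search the chunk ranges (mongos caches these from the config servers) to find the owning shard. The hash and chunk layout below are illustrative stand-ins, not MongoDB's actual ones:

```python
import bisect
import hashlib

HASH_SPACE = 2**32
# Chunk table: (upper_bound_exclusive, shard), sorted by bound.
# Illustrative even split across three shards.
chunks = [(HASH_SPACE // 3, "shardA"),
          (2 * HASH_SPACE // 3, "shardB"),
          (HASH_SPACE, "shardC")]
bounds = [b for b, _ in chunks]

def hashed_key(customer_id: str) -> int:
    # Illustrative hash; MongoDB uses a 64-bit hash of the key value.
    digest = hashlib.md5(customer_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % HASH_SPACE

def route(customer_id: str) -> str:
    # Find the first chunk whose upper bound exceeds the hashed key.
    i = bisect.bisect_right(bounds, hashed_key(customer_id))
    return chunks[i][1]
```

Because the hash scatters adjacent customer_id values across chunks, writes spread evenly, but a range query over customer_id must hit every shard; that is the trade-off behind hashed vs. ranged shard keys.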
Wide-Column Stores
Apache Cassandra
Cassandra provides a partitioned wide-column model optimized for high write throughput across multiple data centers.
Data Model
CREATE KEYSPACE analytics WITH replication = {
'class': 'NetworkTopologyStrategy',
'dc1': 3, 'dc2': 3
};
CREATE TABLE analytics.events (
sensor_id UUID,
event_date DATE,
event_time TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY ((sensor_id, event_date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
AND compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_size': 1,
'compaction_window_unit': 'DAYS'
};
-- Partition key: (sensor_id, event_date) -> data locality
-- Clustering key: event_time -> sorted within partition
Partitioning and Replication
Token Ring (Murmur3 hash):
token = hash(partition_key) -> determines owning node
Node A: (-inf, -3074457345618258602]
Node B: (-3074457345618258602, 3074457345618258602]
Node C: (3074457345618258602, inf)
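The token ring lookup above can be sketched with a sorted token list and a binary search: each node owns the range from the previous token (exclusive) up to its own token (inclusive), and keys hashing past the largest token wrap around to the first node. Illustrative Python, with the tokens taken from the example above:

```python
import bisect

# Sorted (token, node) pairs; each node owns (previous_token, token].
ring = [(-3074457345618258602, "A"),
        (3074457345618258602, "B"),
        (9223372036854775807, "C")]
tokens = [t for t, _ in ring]

def owner(token: int) -> str:
    # First node whose token is >= the key's token; wrap to the
    # start of the ring when the token exceeds every entry.
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]
```

Replication follows the same walk: with RF=3, the next two distinct nodes clockwise from the owner also hold replicas.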
Consistency Levels:
ONE - fastest, eventual consistency
QUORUM - majority of replicas; strong consistency when both reads and writes use it (R + W > RF)
ALL - all replicas must respond
LOCAL_QUORUM - quorum within local datacenter
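The quorum rule can be demonstrated directly: if a write reaches W replicas and a read consults R replicas with R + W > RF, the read set must overlap the write set, so picking the highest-timestamped value among the R responses always returns the latest write. A simulation sketch (not Cassandra code):

```python
import random

# Quorum read/write sketch: each replica stores (timestamp, value);
# a read returns the value with the highest timestamp among R replicas.
class ReplicaSet:
    def __init__(self, rf=3):
        self.replicas = [(0, None) for _ in range(rf)]

    def write(self, value, timestamp, w):
        # Write lands on W randomly chosen replicas (the rest lag).
        for i in random.sample(range(len(self.replicas)), w):
            self.replicas[i] = (timestamp, value)

    def read(self, r):
        # Read R replicas; the latest timestamp wins.
        sampled = random.sample(self.replicas, r)
        return max(sampled)[1]
```

With RF=3, W=2, R=2, at most one replica misses any write, and every 2-replica read includes at least one replica that saw it; CL=ONE (R=1) gives no such guarantee.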
Compaction Strategies
| Strategy | Best For | Mechanism |
|----------|----------|-----------|
| STCS | Write-heavy | Merges similarly-sized SSTables |
| LCS | Read-heavy | Leveled, bounded space amplification |
| TWCS | Time-series | Groups SSTables by time window, drops expired |
Graph Databases
Neo4j
Neo4j implements the labeled property graph model with native graph storage and the Cypher query language.
Data Model
Nodes: (p:Person {name: "Alice", age: 30})
Relationships: -[:FRIENDS_WITH {since: 2020}]->
Labels: Person, Company, Project
Properties: key-value pairs on nodes and relationships
Cypher Query Language
// Create graph data
CREATE (alice:Person {name: "Alice"})
CREATE (bob:Person {name: "Bob"})
CREATE (acme:Company {name: "Acme Corp"})
CREATE (alice)-[:WORKS_AT {since: 2020}]->(acme)
CREATE (bob)-[:WORKS_AT {since: 2019}]->(acme)
CREATE (alice)-[:FRIENDS_WITH]->(bob);
// Find friends of friends (2-hop traversal)
MATCH (p:Person {name: "Alice"})-[:FRIENDS_WITH*2]->(fof:Person)
WHERE fof <> p
RETURN DISTINCT fof.name;
// Shortest path
MATCH path = shortestPath(
(a:Person {name: "Alice"})-[*..10]-(b:Person {name: "Dave"})
)
RETURN path, length(path);
// PageRank-style influence (GDS library)
CALL gds.pageRank.stream('social-graph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10;
Storage Internals
Neo4j uses index-free adjacency: each node physically stores a pointer to its relationships, so each traversal hop costs O(1) regardless of total graph size (no index lookup per hop).
Node Record (fixed 15 bytes):
[in-use flag | first_rel_id | first_prop_id | labels | flags]
Relationship Record (fixed 34 bytes):
[start_node | end_node | rel_type |
next_rel_start | next_rel_end | first_prop_id]
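Index-free adjacency can be sketched as each node holding a direct reference into a per-node linked chain of relationship records, mirroring the next_rel_start / next_rel_end pointers above: expanding a node's neighbours follows pointers and never consults a global index. Illustrative Python, not Neo4j's storage code:

```python
# Index-free adjacency sketch: each node points at its first
# relationship record; records chain to the next one per endpoint.
class Rel:
    def __init__(self, start, end, rel_type):
        self.start, self.end, self.type = start, end, rel_type
        self.next_for = {}  # node -> next Rel in that node's chain

class Node:
    def __init__(self, name):
        self.name = name
        self.first_rel = None  # head of this node's relationship chain

def connect(a, b, rel_type):
    rel = Rel(a, b, rel_type)
    # Prepend to both endpoints' chains: O(1), no global index touched.
    rel.next_for[a] = a.first_rel
    rel.next_for[b] = b.first_rel
    a.first_rel = rel
    b.first_rel = rel
    return rel

def neighbours(node):
    out, rel = [], node.first_rel
    while rel is not None:
        out.append(rel.end if rel.start is node else rel.start)
        rel = rel.next_for[node]
    return out
```

Walking a node's chain costs time proportional to that node's degree only, which is why traversal-heavy queries do not slow down as the rest of the graph grows.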
Choosing a NoSQL Database
| Requirement | Recommended Type | Example |
|-------------|------------------|---------|
| Session/cache store | Key-Value | Redis |
| Flexible document queries | Document | MongoDB |
| High write throughput, time-series | Wide-Column | Cassandra |
| Relationship traversal | Graph | Neo4j |
| Embedded storage engine | Key-Value | RocksDB |
CAP Theorem Positioning
        Consistency
           /\
          /  \
      CA /    \ CP
        /      \
       /________\
Availability -- Partition Tolerance
CP: MongoDB (primary reads), HBase
AP: Cassandra, DynamoDB, Riak
CA: Single-node RDBMS (no partitions)
In practice, all distributed systems must tolerate partitions, so the real trade-off is between consistency and availability during a partition event.