NoSQL Databases
Overview
NoSQL databases relax relational constraints (schema, joins, ACID) to achieve horizontal scalability, flexible schemas, and optimized access patterns. The four main categories are key-value, document, wide-column, and graph stores, each suited to different data models and query patterns.
Key-Value Stores
Redis
Redis is an in-memory key-value store supporting rich data structures beyond simple strings.
Core Data Structures
# Strings - counters, caching
SET user:1001:name "Alice"
INCR page:views:home # atomic increment
# Hashes - objects
HSET user:1001 name "Alice" age 30 city "NYC"
HGET user:1001 name # "Alice"
# Lists - queues, timelines
LPUSH queue:tasks "task-42"
RPOP queue:tasks # dequeue from right
# Sets - unique collections, intersections
SADD user:1001:tags "python" "databases"
SINTER user:1001:tags user:1002:tags # common tags
# Sorted Sets - leaderboards, range queries
ZADD leaderboard 9500 "player:A" 8700 "player:B"
ZREVRANGE leaderboard 0 9 WITHSCORES # top 10
# Streams - event log (append-only)
XADD events * type "click" page "/home"
XREAD COUNT 10 STREAMS events 0
Persistence
- RDB (snapshotting): periodic point-in-time snapshots via fork + copy-on-write
- AOF (append-only file): logs every write command; rewritten periodically to compact
- Hybrid: RDB for fast recovery + AOF for durability between snapshots
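The AOF mechanism above can be sketched in a few lines: append every write command to a log, replay the log on restart, and periodically rewrite it so it contains only the commands needed to reconstruct current state. This is an illustrative Python sketch of the idea, not Redis's implementation:

```python
# Sketch of append-only-file persistence: log writes, replay on
# restart, and "rewrite" to one SET per live key (illustrative only).
class AOFStore:
    def __init__(self):
        self.data = {}
        self.log = []  # in a real system this is an fsync'd file

    def set(self, key, value):
        self.log.append(("SET", key, value))
        self.data[key] = value

    def delete(self, key):
        self.log.append(("DEL", key))
        self.data.pop(key, None)

    @classmethod
    def recover(cls, log):
        # Replay the command log to rebuild in-memory state.
        store = cls()
        for entry in log:
            if entry[0] == "SET":
                store.data[entry[1]] = entry[2]
            elif entry[0] == "DEL":
                store.data.pop(entry[1], None)
        store.log = list(log)
        return store

    def rewrite(self):
        # Compaction: the rewritten log reaches the same state
        # with one command per live key.
        self.log = [("SET", k, v) for k, v in self.data.items()]
```

The rewrite step is why AOF files do not grow without bound even under constant updates to the same keys.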
Redis Cluster
Cluster uses 16384 hash slots:
slot = CRC16(key) % 16384
Node A: slots 0-5460
Node B: slots 5461-10922
Node C: slots 10923-16383
Each node has replicas for failover.
Hash tags: only the substring between the first { and } is hashed, so {user}.profile and {user}.settings map to the same slot (useful for multi-key operations)
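The slot computation can be sketched in Python. Redis uses the CRC16-CCITT (XModem) variant, and the hash-tag rule replaces the key with the substring between the first { and the following } when that substring is non-empty:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XModem): poly 0x1021, init 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {...} section,
    # only that section is hashed.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

With this rule, hash_slot("{user}.profile") == hash_slot("{user}.settings"), so both keys land on the same node and can participate in one MULTI/EXEC transaction.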
RocksDB
An embedded LSM-tree key-value engine (a Facebook-developed fork of Google's LevelDB). Used as the storage layer for many databases (CockroachDB, TiKV, YugabyteDB).
// RocksDB API (C++)
#include <rocksdb/db.h>
#include <rocksdb/options.h>

rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;
options.compression = rocksdb::kLZ4Compression;
options.write_buffer_size = 64 << 20; // 64 MB memtable
rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/mydb", &db);
// check status.ok() before using db
db->Put(rocksdb::WriteOptions(), "key1", "value1");
std::string value;
db->Get(rocksdb::ReadOptions(), "key1", &value);
// Column families for logical separation
rocksdb::ColumnFamilyHandle* cf_handle;
db->CreateColumnFamily(rocksdb::ColumnFamilyOptions(), "metadata", &cf_handle);
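The LSM-tree idea behind RocksDB can be sketched independently of the C++ API: writes go to an in-memory memtable; when it fills, it is frozen and flushed as an immutable sorted run (an SSTable); reads check the memtable first, then SSTables newest-first. A minimal Python sketch of that read/write path (compaction omitted):

```python
# Minimal LSM-tree sketch: memtable + immutable sorted runs (SSTables).
class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.sstables = []  # flushed runs, oldest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: freeze the memtable as a sorted immutable run.
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: memtable first, then SSTables newest-first.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None
```

A real engine adds a write-ahead log for durability, bloom filters to skip SSTables, and background compaction to bound the number of runs a read must consult.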
Document Stores
MongoDB
MongoDB stores data as BSON (Binary JSON) documents with flexible schemas.
Data Model
// Document with nested structure
db.orders.insertOne({
_id: ObjectId("..."),
customer: { name: "Alice", email: "alice@example.com" },
items: [
{ sku: "WIDGET-1", qty: 3, price: 9.99 },
{ sku: "GADGET-2", qty: 1, price: 24.99 }
],
total: 54.96,
status: "shipped",
created_at: ISODate("2025-11-15T10:30:00Z")
});
Aggregation Pipeline
// Revenue by product category, last 30 days
db.orders.aggregate([
{ $match: { created_at: { $gte: ISODate("2025-10-15") } } },
{ $unwind: "$items" },
{ $lookup: {
from: "products",
localField: "items.sku",
foreignField: "sku",
as: "product"
}},
{ $unwind: "$product" },
{ $group: {
_id: "$product.category",
revenue: { $sum: { $multiply: ["$items.qty", "$items.price"] } },
order_count: { $sum: 1 }
}},
{ $sort: { revenue: -1 } }
]);
Sharding
Shard Key Selection:
- High cardinality (many distinct values)
- Even distribution (avoid hotspots)
- Query isolation (queries target single shard)
sh.shardCollection("mydb.orders", { customer_id: "hashed" })
Chunk: contiguous range of shard key values (128MB default since MongoDB 6.0; 64MB in earlier versions)
Balancer: moves chunks between shards to equalize load
Config servers (replica set): store metadata, chunk mappings
mongos: query router, merges results from shards
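Hashed-shard-key routing can be sketched as: hash the shard key value, then binary-search the chunk ranges (mongos caches these from the config servers) to find the owning shard. The hash and chunk layout below are illustrative stand-ins, not MongoDB's actual ones:

```python
import bisect
import hashlib

HASH_SPACE = 2**32
# Chunk table: (upper_bound_exclusive, shard), sorted by bound.
# Illustrative even split across three shards.
chunks = [(HASH_SPACE // 3, "shardA"),
          (2 * HASH_SPACE // 3, "shardB"),
          (HASH_SPACE, "shardC")]
bounds = [b for b, _ in chunks]

def hashed_key(customer_id: str) -> int:
    # Illustrative hash; MongoDB uses a 64-bit hash of the key value.
    digest = hashlib.md5(customer_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % HASH_SPACE

def route(customer_id: str) -> str:
    # Find the first chunk whose upper bound exceeds the hashed key.
    i = bisect.bisect_right(bounds, hashed_key(customer_id))
    return chunks[i][1]
```

Because the hash scatters adjacent customer_id values across chunks, writes spread evenly, but a range query over customer_id must hit every shard; that is the trade-off behind hashed vs. ranged shard keys.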
Wide-Column Stores
Apache Cassandra
Cassandra provides a partitioned wide-column model optimized for high write throughput across multiple data centers.
Data Model
CREATE KEYSPACE analytics WITH replication = {
'class': 'NetworkTopologyStrategy',
'dc1': 3, 'dc2': 3
};
CREATE TABLE analytics.events (
sensor_id UUID,
event_date DATE,
event_time TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY ((sensor_id, event_date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
AND compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_size': 1,
'compaction_window_unit': 'DAYS'
};
-- Partition key: (sensor_id, event_date) -> data locality
-- Clustering key: event_time -> sorted within partition
Partitioning and Replication
Token Ring (Murmur3 hash):
token = hash(partition_key) -> determines owning node
Node A: (-inf, -3074457345618258602]
Node B: (-3074457345618258602, 3074457345618258602]
Node C: (3074457345618258602, inf)
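The token ring lookup above can be sketched with a sorted token list and a binary search: each node owns the range from the previous token (exclusive) up to its own token (inclusive), and keys hashing past the largest token wrap around to the first node. Illustrative Python, with the tokens taken from the example above:

```python
import bisect

# Sorted (token, node) pairs; each node owns (previous_token, token].
ring = [(-3074457345618258602, "A"),
        (3074457345618258602, "B"),
        (9223372036854775807, "C")]
tokens = [t for t, _ in ring]

def owner(token: int) -> str:
    # First node whose token is >= the key's token; wrap to the
    # start of the ring when the token exceeds every entry.
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]
```

Replication follows the same walk: with RF=3, the next two distinct nodes clockwise from the owner also hold replicas.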
Consistency Levels:
ONE - fastest, eventual consistency
QUORUM - majority of replicas; strong consistency when both reads and writes use it (R + W > RF)
ALL - all replicas must respond
LOCAL_QUORUM - quorum within local datacenter
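The quorum rule can be demonstrated directly: if a write reaches W replicas and a read consults R replicas with R + W > RF, the read set must overlap the write set, so picking the highest-timestamped value among the R responses always returns the latest write. A simulation sketch (not Cassandra code):

```python
import random

# Quorum read/write sketch: each replica stores (timestamp, value);
# a read returns the value with the highest timestamp among R replicas.
class ReplicaSet:
    def __init__(self, rf=3):
        self.replicas = [(0, None) for _ in range(rf)]

    def write(self, value, timestamp, w):
        # Write lands on W randomly chosen replicas (the rest lag).
        for i in random.sample(range(len(self.replicas)), w):
            self.replicas[i] = (timestamp, value)

    def read(self, r):
        # Read R replicas; the latest timestamp wins.
        sampled = random.sample(self.replicas, r)
        return max(sampled)[1]
```

With RF=3, W=2, R=2, at most one replica misses any write, and every 2-replica read includes at least one replica that saw it; CL=ONE (R=1) gives no such guarantee.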
Compaction Strategies
| Strategy | Best For | Mechanism |
|----------|----------|-----------|
| STCS | Write-heavy | Merges similarly-sized SSTables |
| LCS | Read-heavy | Leveled, bounded space amplification |
| TWCS | Time-series | Groups SSTables by time window, drops expired |
Graph Databases
Neo4j
Neo4j implements the labeled property graph model with native graph storage and the Cypher query language.
Data Model
Nodes: (p:Person {name: "Alice", age: 30})
Relationships: -[:FRIENDS_WITH {since: 2020}]->
Labels: Person, Company, Project
Properties: key-value pairs on nodes and relationships
Cypher Query Language
// Create graph data
CREATE (alice:Person {name: "Alice"})
CREATE (bob:Person {name: "Bob"})
CREATE (acme:Company {name: "Acme Corp"})
CREATE (alice)-[:WORKS_AT {since: 2020}]->(acme)
CREATE (bob)-[:WORKS_AT {since: 2019}]->(acme)
CREATE (alice)-[:FRIENDS_WITH]->(bob);
// Find friends of friends (2-hop traversal)
MATCH (p:Person {name: "Alice"})-[:FRIENDS_WITH*2]->(fof:Person)
WHERE fof <> p
RETURN DISTINCT fof.name;
// Shortest path
MATCH path = shortestPath(
(a:Person {name: "Alice"})-[*..10]-(b:Person {name: "Dave"})
)
RETURN path, length(path);
// PageRank-style influence (GDS library)
CALL gds.pageRank.stream('social-graph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10;
Storage Internals
Neo4j uses index-free adjacency: each node physically stores a pointer to its relationships, so each traversal hop costs O(1) regardless of total graph size (no index lookup per hop).
Node Record (fixed 15 bytes):
[in-use flag | first_rel_id | first_prop_id | labels | flags]
Relationship Record (fixed 34 bytes):
[start_node | end_node | rel_type |
next_rel_start | next_rel_end | first_prop_id]
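Index-free adjacency can be sketched as each node holding a direct reference into a per-node linked chain of relationship records, mirroring the next_rel_start / next_rel_end pointers above: expanding a node's neighbours follows pointers and never consults a global index. Illustrative Python, not Neo4j's storage code:

```python
# Index-free adjacency sketch: each node points at its first
# relationship record; records chain to the next one per endpoint.
class Rel:
    def __init__(self, start, end, rel_type):
        self.start, self.end, self.type = start, end, rel_type
        self.next_for = {}  # node -> next Rel in that node's chain

class Node:
    def __init__(self, name):
        self.name = name
        self.first_rel = None  # head of this node's relationship chain

def connect(a, b, rel_type):
    rel = Rel(a, b, rel_type)
    # Prepend to both endpoints' chains: O(1), no global index touched.
    rel.next_for[a] = a.first_rel
    rel.next_for[b] = b.first_rel
    a.first_rel = rel
    b.first_rel = rel
    return rel

def neighbours(node):
    out, rel = [], node.first_rel
    while rel is not None:
        out.append(rel.end if rel.start is node else rel.start)
        rel = rel.next_for[node]
    return out
```

Walking a node's chain costs time proportional to that node's degree only, which is why traversal-heavy queries do not slow down as the rest of the graph grows.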
Choosing a NoSQL Database
| Requirement | Recommended Type | Example |
|-------------|------------------|---------|
| Session/cache store | Key-Value | Redis |
| Flexible document queries | Document | MongoDB |
| High write throughput, time-series | Wide-Column | Cassandra |
| Relationship traversal | Graph | Neo4j |
| Embedded storage engine | Key-Value | RocksDB |
CAP Theorem Positioning
        Consistency
           /\
          /  \
      CA /    \ CP
        /      \
       /________\
Availability -- Partition Tolerance
CP: MongoDB (primary reads), HBase
AP: Cassandra, DynamoDB, Riak
CA: Single-node RDBMS (no partitions)
In practice, all distributed systems must tolerate partitions, so the real trade-off is between consistency and availability during a partition event.