
NoSQL Databases

Overview

NoSQL databases relax relational constraints (schema, joins, ACID) to achieve horizontal scalability, flexible schemas, and optimized access patterns. The four main categories are key-value, document, wide-column, and graph stores, each suited to different data models and query patterns.


Key-Value Stores

Redis

Redis is an in-memory key-value store supporting rich data structures beyond simple strings.

Core Data Structures

# Strings - counters, caching
SET user:1001:name "Alice"
INCR page:views:home              # atomic increment

# Hashes - objects
HSET user:1001 name "Alice" age 30 city "NYC"
HGET user:1001 name               # "Alice"

# Lists - queues, timelines
LPUSH queue:tasks "task-42"
RPOP queue:tasks                   # dequeue from right

# Sets - unique collections, intersections
SADD user:1001:tags "python" "databases"
SINTER user:1001:tags user:1002:tags  # common tags

# Sorted Sets - leaderboards, range queries
ZADD leaderboard 9500 "player:A" 8700 "player:B"
ZREVRANGE leaderboard 0 9 WITHSCORES  # top 10

# Streams - event log (append-only)
XADD events * type "click" page "/home"
XREAD COUNT 10 STREAMS events 0

Persistence

  • RDB (snapshotting): periodic point-in-time snapshots via fork + copy-on-write
  • AOF (append-only file): logs every write command; rewritten periodically to compact
  • Hybrid: RDB for fast recovery + AOF for durability between snapshots
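The hybrid setup can be sketched as a redis.conf fragment (the threshold values here are illustrative, not recommendations):

```conf
# RDB: snapshot if at least 1 key changed in the last 900 seconds
save 900 1

# AOF: log every write command, fsync once per second
appendonly yes
appendfsync everysec

# Hybrid: AOF rewrites begin with an RDB preamble for fast recovery (Redis >= 4.0)
aof-use-rdb-preamble yes
```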

Redis Cluster

Cluster uses 16384 hash slots:
  slot = CRC16(key) % 16384

Node A: slots 0-5460
Node B: slots 5461-10922
Node C: slots 10923-16383

Each node has replicas for failover.
Hash tags: only the substring inside {} is hashed, so {user}.profile and {user}.settings -> same slot
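The slot mapping above can be sketched in Python. Redis uses the CRC16-CCITT (XMODEM) variant, and a non-empty hash tag means only the braced substring is hashed (a minimal sketch, not a client library API):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0x0000."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = (crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:          # non-empty hash tag: hash only its contents
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Keys sharing a hash tag land in the same slot
assert key_slot("{user}.profile") == key_slot("{user}.settings")
```

Keeping related keys in one slot is what makes multi-key operations (MGET, transactions) possible in cluster mode, since those require all keys to live on the same node.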

RocksDB

An embedded LSM-tree key-value engine (Facebook fork of LevelDB). Used as the storage layer for many databases (CockroachDB, TiKV, YugabyteDB).

// RocksDB API (C++)
#include <rocksdb/db.h>

rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;
options.compression = rocksdb::kLZ4Compression;
options.write_buffer_size = 64 << 20;  // 64 MB memtable

rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/mydb", &db);
db->Put(rocksdb::WriteOptions(), "key1", "value1");

std::string value;
db->Get(rocksdb::ReadOptions(), "key1", &value);

// Column families for logical separation
rocksdb::ColumnFamilyHandle* cf_handle;
db->CreateColumnFamily(rocksdb::ColumnFamilyOptions(), "metadata", &cf_handle);

Document Stores

MongoDB

MongoDB stores data as BSON (Binary JSON) documents with flexible schemas.

Data Model

// Document with nested structure
db.orders.insertOne({
  _id: ObjectId("..."),
  customer: { name: "Alice", email: "alice@example.com" },
  items: [
    { sku: "WIDGET-1", qty: 3, price: 9.99 },
    { sku: "GADGET-2", qty: 1, price: 24.99 }
  ],
  total: 54.96,
  status: "shipped",
  created_at: ISODate("2025-11-15T10:30:00Z")
});

Aggregation Pipeline

// Revenue by product category, last 30 days
db.orders.aggregate([
  { $match: { created_at: { $gte: ISODate("2025-10-15") } } },
  { $unwind: "$items" },
  { $lookup: {
      from: "products",
      localField: "items.sku",
      foreignField: "sku",
      as: "product"
  }},
  { $unwind: "$product" },
  { $group: {
      _id: "$product.category",
      revenue: { $sum: { $multiply: ["$items.qty", "$items.price"] } },
      order_count: { $sum: 1 }
  }},
  { $sort: { revenue: -1 } }
]);

Sharding

Shard Key Selection:
  - High cardinality (many distinct values)
  - Even distribution (avoid hotspots)
  - Query isolation (queries target single shard)

sh.shardCollection("mydb.orders", { customer_id: "hashed" })

Chunk: contiguous range of shard key values (~128MB default)
Balancer: moves chunks between shards to equalize load

Config servers (replica set): store metadata, chunk mappings
mongos: query router, merges results from shards
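The shard-key criteria above can be illustrated with a toy Python sketch contrasting a hashed key against a ranged key on a monotonically increasing value. The hash and chunk boundaries here are illustrative stand-ins; MongoDB's hashed index uses its own 64-bit hash function:

```python
import hashlib

SHARDS = ["shardA", "shardB", "shardC"]

def hashed_shard(customer_id: int) -> str:
    # Toy stand-in for a hashed shard key: hash, then range-partition the hash space
    h = int.from_bytes(hashlib.md5(str(customer_id).encode()).digest()[:8], "big")
    return SHARDS[h * len(SHARDS) // 2**64]

def ranged_shard(customer_id: int) -> str:
    # Ranged key on a monotonically increasing value: new inserts pile onto one shard
    return SHARDS[min(customer_id // 1000, len(SHARDS) - 1)]

recent = range(2000, 2100)  # the 100 newest customer ids
print({s: sum(hashed_shard(c) == s for c in recent) for s in SHARDS})
print({s: sum(ranged_shard(c) == s for c in recent) for s in SHARDS})
```

The ranged key sends all 100 recent inserts to the last shard (a hotspot), while the hashed key spreads them roughly evenly, which is why the snippet above shards on `{ customer_id: "hashed" }`.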

Wide-Column Stores

Apache Cassandra

Cassandra provides a partitioned wide-column model optimized for high write throughput across multiple data centers.

Data Model

CREATE KEYSPACE analytics WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3, 'dc2': 3
};

CREATE TABLE analytics.events (
  sensor_id    UUID,
  event_date   DATE,
  event_time   TIMESTAMP,
  temperature  DOUBLE,
  humidity     DOUBLE,
  PRIMARY KEY ((sensor_id, event_date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': 1,
    'compaction_window_unit': 'DAYS'
  };

-- Partition key: (sensor_id, event_date) -> data locality
-- Clustering key: event_time -> sorted within partition

Partitioning and Replication

Token Ring (Murmur3 hash):
  token = hash(partition_key) -> determines owning node

  Node A: (-inf, -3074457345618258602]
  Node B: (-3074457345618258602, 3074457345618258602]
  Node C: (3074457345618258602, inf)

Consistency Levels:
  ONE    - fastest, eventual consistency
  QUORUM - majority of replicas; quorum reads + quorum writes overlap (R + W > RF), giving strong consistency
  ALL    - all replicas must respond
  LOCAL_QUORUM - quorum within local datacenter
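The token-ownership and quorum rules above can be sketched in Python, using the example tokens from the ring diagram (a sketch of the mechanics, not a driver API):

```python
import bisect

# Inclusive upper bound of each node's token range, as in the ring above;
# tokens beyond node B's bound belong to node C.
RING = [(-3074457345618258602, "A"),
        (3074457345618258602, "B")]

def owner(token: int) -> str:
    """Find the node whose range contains the given Murmur3 token."""
    bounds = [b for b, _ in RING]
    i = bisect.bisect_left(bounds, token)
    return RING[i][1] if i < len(RING) else "C"

def quorum(rf: int) -> int:
    """Replicas needed for a QUORUM read or write."""
    return rf // 2 + 1

def strongly_consistent(r: int, w: int, rf: int) -> bool:
    # Read and write replica sets must overlap in at least one replica
    return r + w > rf

print(owner(0))                           # token 0 falls in node B's range
print(quorum(3))                          # 2 of 3 replicas
print(strongly_consistent(2, 2, 3))       # QUORUM + QUORUM at RF=3
```

This overlap argument is why ONE + ONE at RF=3 (1 + 1 = 2, not > 3) can return stale data, while QUORUM + QUORUM cannot.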

Compaction Strategies

| Strategy | Best For | Mechanism |
|----------|----------|-----------|
| STCS | Write-heavy | Merges similarly-sized SSTables |
| LCS | Read-heavy | Leveled, bounded space amplification |
| TWCS | Time-series | Groups SSTables by time window, drops expired |


Graph Databases

Neo4j

Neo4j implements the labeled property graph model with native graph storage and the Cypher query language.

Data Model

Nodes:       (p:Person {name: "Alice", age: 30})
Relationships: -[:FRIENDS_WITH {since: 2020}]->
Labels:      Person, Company, Project
Properties:  key-value pairs on nodes and relationships

Cypher Query Language

// Create graph data
CREATE (alice:Person {name: "Alice"})
CREATE (bob:Person {name: "Bob"})
CREATE (acme:Company {name: "Acme Corp"})
CREATE (alice)-[:WORKS_AT {since: 2020}]->(acme)
CREATE (bob)-[:WORKS_AT {since: 2019}]->(acme)
CREATE (alice)-[:FRIENDS_WITH]->(bob);

// Find friends of friends (2-hop traversal)
MATCH (p:Person {name: "Alice"})-[:FRIENDS_WITH*2]->(fof:Person)
WHERE fof <> p
RETURN DISTINCT fof.name;

// Shortest path
MATCH path = shortestPath(
  (a:Person {name: "Alice"})-[*..10]-(b:Person {name: "Dave"})
)
RETURN path, length(path);

// PageRank-style influence (GDS library)
CALL gds.pageRank.stream('social-graph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10;

Storage Internals

Neo4j uses index-free adjacency: each node record stores a direct pointer to its first relationship, and relationship records chain onward, so expanding a node costs O(1) per hop with no index lookup, independent of total graph size.

Node Record (fixed 15 bytes):
  [in-use flag | first_rel_id | first_prop_id | labels | flags]

Relationship Record (fixed 34 bytes):
  [start_node | end_node | rel_type |
   next_rel_start | next_rel_end | first_prop_id]
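A toy Python model of these records: each node keeps a pointer to the head of its relationship chain, and each relationship record holds next-pointers for both of its endpoints, so enumerating a node's neighbors walks only that node's own chain (a sketch of the idea, not Neo4j's on-disk layout):

```python
from dataclasses import dataclass

@dataclass
class Rel:
    start: str
    end: str
    rel_type: str
    next_start: "Rel | None" = None   # next relationship in the start node's chain
    next_end: "Rel | None" = None     # next relationship in the end node's chain

first_rel: dict[str, "Rel | None"] = {}   # node -> head of its relationship chain

def connect(a: str, b: str, rel_type: str) -> None:
    # New relationship is prepended to both endpoints' chains
    r = Rel(a, b, rel_type, next_start=first_rel.get(a), next_end=first_rel.get(b))
    first_rel[a] = first_rel[b] = r

def neighbors(node: str) -> list[str]:
    out, r = [], first_rel.get(node)
    while r is not None:                  # walk only this node's chain
        out.append(r.end if r.start == node else r.start)
        r = r.next_start if r.start == node else r.next_end
    return out

connect("Alice", "Bob", "FRIENDS_WITH")
connect("Alice", "Acme", "WORKS_AT")
print(neighbors("Alice"))   # ['Acme', 'Bob'] - newest relationship first
```

The cost of `neighbors` is proportional to the node's own degree, never to the total number of nodes or relationships, which is the property the fixed-size records above are designed to provide.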

Choosing a NoSQL Database

| Requirement | Recommended Type | Example |
|-------------|-----------------|---------|
| Session/cache store | Key-Value | Redis |
| Flexible document queries | Document | MongoDB |
| High write throughput, time-series | Wide-Column | Cassandra |
| Relationship traversal | Graph | Neo4j |
| Embedded storage engine | Key-Value | RocksDB |


CAP Theorem Positioning

         Consistency
            /\
           /  \
     CA   /    \  CP
         /      \
        /________\
  Availability -- Partition Tolerance
            AP

CP: MongoDB (primary reads), HBase
AP: Cassandra, DynamoDB, Riak
CA: Single-node RDBMS (no partitions)

In practice, all distributed systems must tolerate partitions, so the real trade-off is between consistency and availability during a partition event.