Time-Series Databases
Overview
Time-series databases (TSDBs) are optimized for data indexed by time: metrics, IoT sensor readings, financial ticks, observability telemetry, and event streams. The defining workload characteristics are high-volume append-only writes, time-range queries, and aggregations over windows. TSDBs exploit temporal locality and monotonic timestamps for specialized storage, compression, and query optimization.
Time-Series Data Model
Core Concepts
A time series is identified by:
metric name + set of label/tag key-value pairs
Example:
cpu_usage{host="web-01", region="us-east", core="0"}
Data point: (timestamp, value)
(2025-11-15T10:00:00Z, 72.5)
(2025-11-15T10:00:10Z, 68.3)
(2025-11-15T10:00:20Z, 71.1)
Series cardinality = unique combinations of labels
High cardinality is the primary scaling challenge
100 hosts x 8 cores x 5 metrics = 4,000 series
With user_id label: millions of series (problematic)
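The cardinality arithmetic above can be checked with a small helper (a hypothetical function; real TSDBs derive series counts from their index):

```python
def series_cardinality(label_values: dict) -> int:
    """Upper bound on series count: product of distinct values per label."""
    count = 1
    for values in label_values.values():
        count *= len(set(values))
    return count

labels = {
    "host": [f"web-{i:02d}" for i in range(100)],     # 100 hosts
    "core": [str(c) for c in range(8)],               # 8 cores
    "metric": ["cpu", "mem", "disk", "net", "load"],  # 5 metrics
}
assert series_cardinality(labels) == 4000  # matches 100 x 8 x 5 above
```

Adding a label such as user_id multiplies this product by the number of distinct users, which is why unbounded labels explode cardinality.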
Schema Approaches
-- Wide table (one column per metric)
CREATE TABLE metrics_wide (
timestamp TIMESTAMPTZ,
host TEXT,
cpu_usage DOUBLE PRECISION,
mem_usage DOUBLE PRECISION,
disk_io DOUBLE PRECISION
);
-- Narrow/long table (EAV-style, flexible)
CREATE TABLE metrics_narrow (
timestamp TIMESTAMPTZ,
metric_name TEXT,
tags JSONB,
value DOUBLE PRECISION
);
-- TimescaleDB hypertable (auto-partitioned wide table)
SELECT create_hypertable('metrics_wide', 'timestamp',
chunk_time_interval => INTERVAL '1 day');
Write Optimization
Time-series writes are append-only and arrive in roughly timestamp order, enabling significant optimizations.
Write Path
Incoming data points
|
v
Write buffer / WAL (durability)
|
v
In-memory buffer (sorted by series + time)
|
v (flush periodically or when full)
Compressed on-disk blocks (immutable, time-partitioned)
Key optimizations:
- Batch writes: buffer thousands of points before flush
- Columnar layout: group all timestamps together, all values together
- Time partitioning: one file/chunk per time window
- Out-of-order handling: small reorder buffer or separate WAL
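The write path above can be sketched as a toy buffer (hypothetical class and thresholds, not any specific engine's implementation):

```python
from collections import defaultdict

class WriteBuffer:
    """Toy TSDB write path: append to a WAL for durability, buffer points
    in memory per series, and flush a sorted, immutable block when full."""

    def __init__(self, flush_threshold=4):
        self.wal = []                    # durability log (append-only)
        self.buffer = defaultdict(list)  # series_key -> [(ts, value), ...]
        self.flushed_blocks = []         # stands in for on-disk blocks
        self.flush_threshold = flush_threshold

    def write(self, series_key, ts, value):
        self.wal.append((series_key, ts, value))     # 1. WAL first
        self.buffer[series_key].append((ts, value))  # 2. in-memory buffer
        if sum(len(v) for v in self.buffer.values()) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by series then time, emit an immutable block, clear buffers.
        block = sorted(
            (k, ts, v) for k, pts in self.buffer.items() for ts, v in pts
        )
        self.flushed_blocks.append(block)
        self.buffer.clear()
        self.wal.clear()  # WAL can be truncated once data is persisted
```

Real engines flush on both size and time triggers, and keep a small reorder window for late-arriving points before sorting.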
Time-Based Partitioning
Partition by time window:
2025-11-15_00:00 to 2025-11-15_06:00 -> chunk_1
2025-11-15_06:00 to 2025-11-15_12:00 -> chunk_2
2025-11-15_12:00 to 2025-11-15_18:00 -> chunk_3
Benefits:
- Efficient time-range queries (skip irrelevant chunks)
- Easy retention (drop old chunks entirely)
- Each chunk independently compressed
- Recent chunks in memory, old chunks on cheaper storage
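Chunk routing under this scheme is just integer division of the epoch timestamp by the chunk width; a minimal sketch assuming the 6-hour windows shown above:

```python
from datetime import datetime, timezone

CHUNK_SECONDS = 6 * 3600  # 6-hour chunks, as in the example above

def chunk_id(ts: datetime) -> int:
    """Map a timestamp to its chunk by integer-dividing epoch seconds."""
    return int(ts.timestamp()) // CHUNK_SECONDS

a = datetime(2025, 11, 15, 3, 0, tzinfo=timezone.utc)
b = datetime(2025, 11, 15, 5, 59, tzinfo=timezone.utc)
c = datetime(2025, 11, 15, 6, 0, tzinfo=timezone.utc)
assert chunk_id(a) == chunk_id(b)       # both land in the 00:00-06:00 chunk
assert chunk_id(c) == chunk_id(a) + 1   # 06:00 starts the next chunk
```

A time-range query then touches only chunks whose IDs fall between `chunk_id(start)` and `chunk_id(end)`, and retention is a matter of deleting chunks below a cutoff ID.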
Compression Techniques
Gorilla Compression (Facebook, 2015)
Designed for floating-point time-series values. Exploits the fact that consecutive values in a series are often similar (XOR produces mostly zeros).
Timestamp compression (delta-of-delta):
Timestamps: [1000, 1010, 1020, 1030, 1021]
Deltas: [1000, 10, 10, 10, -9 ]
Delta-of-delta: [1000, 10, 0, 0, -19 ]
Encoding:
DoD = 0: store '0' (1 bit)
DoD in [-63,64]: store '10' + 7 bits
DoD in [-255,256]: store '110' + 9 bits
Larger: store '1110' + 12 bits or '1111' + 32 bits
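A sketch of the bucket selection in the encoding table above (`dod_buckets` is a hypothetical name; it tracks the chosen prefix and total bit cost rather than emitting a real bitstream):

```python
def dod_buckets(timestamps):
    """Classify each delta-of-delta into its variable-length bucket.
    The block header (first timestamp + first delta) is stored separately."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    buckets = []
    for prev, cur in zip(deltas, deltas[1:]):
        dod = cur - prev
        if dod == 0:
            buckets.append(("0", 1))
        elif -63 <= dod <= 64:
            buckets.append(("10", 9))    # 2-bit prefix + 7-bit value
        elif -255 <= dod <= 256:
            buckets.append(("110", 12))  # 3-bit prefix + 9-bit value
        elif -2047 <= dod <= 2048:
            buckets.append(("1110", 16)) # 4-bit prefix + 12-bit value
        else:
            buckets.append(("1111", 36)) # 4-bit prefix + 32-bit value
    return buckets

# Timestamps from the example: DoDs are [0, 0, -19]
assert [b[0] for b in dod_buckets([1000, 1010, 1020, 1030, 1021])] == ["0", "0", "10"]
```

For perfectly regular intervals every DoD is zero, so each point after the second costs a single bit.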
Value compression (XOR-based):
Values: [72.5, 72.3, 72.4, 72.5]
XOR with previous:
72.5 XOR 72.3 -> few bits differ
Encoding:
XOR = 0: store '0' (1 bit, same value)
XOR != 0: store leading zeros count + meaningful bits
Result: ~1.37 bytes per data point (vs 16 bytes uncompressed)
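The XOR step can be illustrated by reinterpreting float64 bit patterns and counting leading/trailing zeros (a sketch of the idea, not Gorilla's exact bit emission):

```python
import struct

def f2b(x: float) -> int:
    """Reinterpret a float64 as its 64-bit integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_stream(values):
    """XOR each value's bit pattern with its predecessor's. Similar
    consecutive values leave mostly-zero XOR words, which are then
    encoded as (leading-zero count, meaningful bits)."""
    prev = f2b(values[0])
    out = []
    for v in values[1:]:
        cur = f2b(v)
        x = prev ^ cur
        lz = 64 - x.bit_length() if x else 64           # leading zeros
        tz = (x & -x).bit_length() - 1 if x else 64     # trailing zeros
        meaningful = 64 - lz - tz if x else 0
        out.append((x == 0, lz, meaningful))
        prev = cur
    return out

# identical consecutive values compress to a single '0' bit
same, _, mb = xor_stream([72.5, 72.5])[0]
assert same and mb == 0
```

Metrics that change slowly (temperatures, gauge values) hit the single-bit case often, which is where the ~1.37 bytes/point average comes from.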
Delta-of-Delta Encoding
Optimal for regularly-spaced timestamps where most delta-of-delta values are zero.
Regular 10s intervals:
Timestamps: [t, t+10, t+20, t+30, t+40]
Deltas: [-, 10, 10, 10, 10 ]
Delta-of-delta: [-, -, 0, 0, 0 ]
Storage: base + delta + N zero-bits
Integer Compression
For integer metrics (counters, gauges), additional techniques apply:
Simple-8b: pack multiple small integers into a 64-bit word
A 4-bit selector indicates the packing scheme, e.g.:
8 values x 7 bits each (56 data bits + 4-bit selector = 60 bits used)
60 values x 1 bit each (exact selector assignments vary by implementation)
ZigZag encoding: map signed integers to unsigned
0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
Followed by varint encoding
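ZigZag followed by a LEB128-style varint can be sketched as:

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned: 0->0, -1->1, 1->2, -2->3, 2->4, ..."""
    return 2 * n if n >= 0 else -2 * n - 1

def varint(u: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit = continue."""
    out = bytearray()
    while True:
        b = u & 0x7F
        u >>= 7
        if u:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

assert [zigzag(n) for n in (0, -1, 1, -2, 2)] == [0, 1, 2, 3, 4]
assert varint(zigzag(-3)) == b"\x05"  # small negative delta -> one byte
```

The point of ZigZag is that small-magnitude deltas of either sign map to small unsigned numbers, so the varint stays short; without it, negative deltas in two's complement have high bits set and always take the maximum width.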
Downsampling and Retention
Downsampling
Reduce resolution of historical data to save storage while preserving trends.
Raw data (10s interval): kept for 7 days
5-minute aggregates: kept for 30 days
1-hour aggregates: kept for 1 year
1-day aggregates: kept indefinitely
Aggregate functions preserved:
min, max, sum, count, avg (derived from sum/count)
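The roll-up above can be sketched as a per-bucket aggregation (a hypothetical function; production systems run this as continuous queries or recording rules). Note avg is kept as sum/count so buckets can be re-aggregated into coarser tiers without error:

```python
from collections import defaultdict

def downsample(points, bucket_seconds):
    """Roll raw (ts, value) points up into per-bucket aggregates."""
    buckets = defaultdict(lambda: {"min": float("inf"), "max": float("-inf"),
                                   "sum": 0.0, "count": 0})
    for ts, v in points:
        b = buckets[ts - ts % bucket_seconds]   # align ts down to bucket start
        b["min"] = min(b["min"], v)
        b["max"] = max(b["max"], v)
        b["sum"] += v
        b["count"] += 1
    return {k: {**b, "avg": b["sum"] / b["count"]} for k, b in buckets.items()}

agg = downsample([(0, 1.0), (10, 3.0), (300, 5.0)], 300)
assert agg[0]["avg"] == 2.0      # first 5-minute bucket: (1.0 + 3.0) / 2
assert agg[300]["count"] == 1    # second bucket holds one point
```

Percentiles are the notable casualty: they cannot be derived from these aggregates, which is why systems that need them store histograms or sketches instead.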
Example (Prometheus recording rules):
groups:
  - name: downsampling
    interval: 5m
    rules:
      - record: http_requests:rate5m
        expr: rate(http_requests_total[5m])
Retention Policies
-- InfluxDB retention policy
CREATE RETENTION POLICY "one_week" ON "metrics"
DURATION 7d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY "one_year" ON "metrics"
DURATION 365d REPLICATION 1;
-- Continuous query for downsampling
CREATE CONTINUOUS QUERY "downsample_5m" ON "metrics"
BEGIN
SELECT mean(value) AS value
INTO "one_year"."cpu_usage_5m"
FROM "one_week"."cpu_usage"
GROUP BY time(5m), *
END;
-- TimescaleDB: compression + retention
SELECT add_retention_policy('metrics', INTERVAL '30 days');
SELECT add_compression_policy('metrics', INTERVAL '7 days');
Production Systems
InfluxDB
- Storage engine (v3/Edge): Apache Arrow + Parquet-based columnar storage
- Query language: InfluxQL (SQL-like), Flux (functional, v2), and SQL (v3)
- Data model: Measurement, tag set, field set, timestamp
- Compaction: TSM (Time-Structured Merge Tree) in v1/v2
-- InfluxQL
SELECT MEAN("cpu_usage")
FROM "system_metrics"
WHERE "host" = 'web-01' AND time > now() - 1h
GROUP BY time(5m), "host"
FILL(previous)
TimescaleDB
PostgreSQL extension that automatically partitions tables into time-based chunks (hypertables). Full SQL compatibility with time-series optimizations.
-- Create hypertable
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time');
-- Time-series functions
SELECT time_bucket('15 minutes', time) AS bucket,
sensor_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp,
first(temperature, time) AS first_temp,
last(temperature, time) AS last_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY bucket, sensor_id
ORDER BY bucket DESC;
-- Continuous aggregates (materialized, incrementally updated)
CREATE MATERIALIZED VIEW hourly_temps
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
sensor_id,
AVG(temperature), MIN(temperature), MAX(temperature)
FROM sensor_data
GROUP BY bucket, sensor_id;
Prometheus
Pull-based monitoring system with a custom TSDB optimized for metrics and alerting.
Storage Architecture:
Head Block (in-memory, last 2 hours):
- Write-Ahead Log for durability
- Compressed chunks (Gorilla encoding)
- Inverted index: label -> series IDs
Persistent Blocks (on disk, 2-hour aligned):
chunks/ - compressed time-series data
index - label-to-series inverted index
meta.json - block metadata
tombstones - deleted series markers
Compaction: merge small blocks into larger ones (up to 31 days)
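The label-to-series inverted index can be sketched as posting sets keyed by (label, value) pairs, with selection by set intersection (a simplification of Prometheus's actual on-disk index format):

```python
from collections import defaultdict

class LabelIndex:
    """Minimal inverted index: each (label, value) pair maps to a set of
    series IDs; a selector intersects the postings of all its matchers."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.next_id = 0

    def add_series(self, labels: dict) -> int:
        sid = self.next_id
        self.next_id += 1
        for k, v in labels.items():
            self.postings[(k, v)].add(sid)
        return sid

    def select(self, matchers: dict) -> set:
        """Return series IDs matching every (label, value) pair exactly."""
        sets = [self.postings.get(kv, set()) for kv in matchers.items()]
        return set.intersection(*sets) if sets else set()

idx = LabelIndex()
web = idx.add_series({"__name__": "http_requests_total", "job": "webserver"})
idx.add_series({"__name__": "http_requests_total", "job": "db"})
assert idx.select({"job": "webserver"}) == {web}
```

This structure also shows why high cardinality hurts: every distinct label value adds a posting list, and every series ID appears in one list per label.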
# PromQL queries
# Rate of HTTP requests over 5 minutes
rate(http_requests_total{job="webserver"}[5m])
# 99th percentile latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# Alert: high error rate
- alert: HighErrorRate
  expr: rate(http_errors_total[5m]) / rate(http_requests_total[5m]) > 0.05
  for: 10m
QuestDB
High-performance TSDB written in Java/C++ with zero-GC execution, SIMD-accelerated queries, and SQL support. Designed for sub-millisecond ingestion latency.
-- QuestDB: 1-hour moving average per symbol (window function)
SELECT timestamp, symbol,
avg(price) OVER (PARTITION BY symbol ORDER BY timestamp
RANGE BETWEEN 1 HOUR PRECEDING AND CURRENT ROW) AS moving_avg
FROM trades
WHERE timestamp IN '2025-11-15';
-- SAMPLE BY: 1-minute aggregates aligned to wall-clock minutes
SELECT timestamp, symbol, avg(price) AS avg_price
FROM trades
WHERE timestamp IN '2025-11-15'
SAMPLE BY 1m ALIGN TO CALENDAR;
-- LATEST ON: efficiently find most recent value per series
SELECT * FROM sensors
LATEST ON timestamp PARTITION BY sensor_id;
Comparison
| Feature | InfluxDB | TimescaleDB | Prometheus | QuestDB |
|---------|----------|-------------|------------|---------|
| Query language | InfluxQL/Flux/SQL | SQL | PromQL | SQL |
| Storage | Columnar (Arrow) | PostgreSQL chunks | Custom TSDB | Column-based |
| Best for | IoT, metrics | Relational + TS | Monitoring/alerting | Financial, high-freq |
| High cardinality | Moderate | Good | Poor | Good |
| SQL support | v3 only | Full | No | Full |
| Ecosystem | Telegraf, Grafana | PostgreSQL extensions | Alertmanager, Grafana | Grafana |
Design Considerations
- Cardinality management: Avoid unbounded label values (user IDs, UUIDs) in tag/label sets
- Chunk sizing: Balance query granularity against per-chunk overhead; TimescaleDB recommends a chunk interval small enough that the most recent chunk, including its indexes, fits in roughly 25% of memory
- Compression trade-off: Higher compression (ZSTD) saves storage but costs CPU during reads
- Out-of-order writes: Most TSDBs handle these via buffering, but excessive out-of-order data degrades performance
- Tiered storage: Hot (memory/SSD) for recent data, warm (HDD) for weeks-months, cold (object storage) for archival