Time-Series Databases
Overview
Time-series databases (TSDBs) are optimized for data indexed by time: metrics, IoT sensor readings, financial ticks, observability telemetry, and event streams. The defining workload characteristics are high-volume append-only writes, time-range queries, and aggregations over windows. TSDBs exploit temporal locality and monotonic timestamps for specialized storage, compression, and query optimization.
Time-Series Data Model
Core Concepts
A time series is identified by:
metric name + set of label/tag key-value pairs
Example:
cpu_usage{host="web-01", region="us-east", core="0"}
Data point: (timestamp, value)
(2025-11-15T10:00:00Z, 72.5)
(2025-11-15T10:00:10Z, 68.3)
(2025-11-15T10:00:20Z, 71.1)
Series cardinality = unique combinations of labels
High cardinality is the primary scaling challenge
100 hosts x 8 cores x 5 metrics = 4,000 series
With user_id label: millions of series (problematic)
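The cardinality arithmetic above can be checked with a small helper (a hypothetical function; real TSDBs derive series counts from their index):

```python
def series_cardinality(label_values: dict) -> int:
    """Upper bound on series count: product of distinct values per label."""
    count = 1
    for values in label_values.values():
        count *= len(set(values))
    return count

labels = {
    "host": [f"web-{i:02d}" for i in range(100)],     # 100 hosts
    "core": [str(c) for c in range(8)],               # 8 cores
    "metric": ["cpu", "mem", "disk", "net", "load"],  # 5 metrics
}
assert series_cardinality(labels) == 4000  # matches 100 x 8 x 5 above
```

Adding a label such as user_id multiplies this product by the number of distinct users, which is why unbounded labels explode cardinality.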
Schema Approaches
-- Wide table (one column per metric)
CREATE TABLE metrics_wide (
timestamp TIMESTAMPTZ,
host TEXT,
cpu_usage DOUBLE PRECISION,
mem_usage DOUBLE PRECISION,
disk_io DOUBLE PRECISION
);
-- Narrow/long table (EAV-style, flexible)
CREATE TABLE metrics_narrow (
timestamp TIMESTAMPTZ,
metric_name TEXT,
tags JSONB,
value DOUBLE PRECISION
);
-- TimescaleDB hypertable (auto-partitioned wide table)
SELECT create_hypertable('metrics_wide', 'timestamp',
chunk_time_interval => INTERVAL '1 day');
Write Optimization
Time-series writes are append-only and arrive in roughly timestamp order, enabling significant optimizations.
Write Path
Incoming data points
|
v
Write buffer / WAL (durability)
|
v
In-memory buffer (sorted by series + time)
|
v (flush periodically or when full)
Compressed on-disk blocks (immutable, time-partitioned)
Key optimizations:
- Batch writes: buffer thousands of points before flush
- Columnar layout: group all timestamps together, all values together
- Time partitioning: one file/chunk per time window
- Out-of-order handling: small reorder buffer or separate WAL
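The write path above can be sketched as a toy buffer (hypothetical class and thresholds, not any specific engine's implementation):

```python
from collections import defaultdict

class WriteBuffer:
    """Toy TSDB write path: append to a WAL for durability, buffer points
    in memory per series, and flush a sorted, immutable block when full."""

    def __init__(self, flush_threshold=4):
        self.wal = []                    # durability log (append-only)
        self.buffer = defaultdict(list)  # series_key -> [(ts, value), ...]
        self.flushed_blocks = []         # stands in for on-disk blocks
        self.flush_threshold = flush_threshold

    def write(self, series_key, ts, value):
        self.wal.append((series_key, ts, value))     # 1. WAL first
        self.buffer[series_key].append((ts, value))  # 2. in-memory buffer
        if sum(len(v) for v in self.buffer.values()) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by series then time, emit an immutable block, clear buffers.
        block = sorted(
            (k, ts, v) for k, pts in self.buffer.items() for ts, v in pts
        )
        self.flushed_blocks.append(block)
        self.buffer.clear()
        self.wal.clear()  # WAL can be truncated once data is persisted
```

Real engines flush on both size and time triggers, and keep a small reorder window for late-arriving points before sorting.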
Time-Based Partitioning
Partition by time window:
2025-11-15_00:00 to 2025-11-15_06:00 -> chunk_1
2025-11-15_06:00 to 2025-11-15_12:00 -> chunk_2
2025-11-15_12:00 to 2025-11-15_18:00 -> chunk_3
Benefits:
- Efficient time-range queries (skip irrelevant chunks)
- Easy retention (drop old chunks entirely)
- Each chunk independently compressed
- Recent chunks in memory, old chunks on cheaper storage
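Chunk routing under this scheme is just integer division of the epoch timestamp by the chunk width; a minimal sketch assuming the 6-hour windows shown above:

```python
from datetime import datetime, timezone

CHUNK_SECONDS = 6 * 3600  # 6-hour chunks, as in the example above

def chunk_id(ts: datetime) -> int:
    """Map a timestamp to its chunk by integer-dividing epoch seconds."""
    return int(ts.timestamp()) // CHUNK_SECONDS

a = datetime(2025, 11, 15, 3, 0, tzinfo=timezone.utc)
b = datetime(2025, 11, 15, 5, 59, tzinfo=timezone.utc)
c = datetime(2025, 11, 15, 6, 0, tzinfo=timezone.utc)
assert chunk_id(a) == chunk_id(b)       # both land in the 00:00-06:00 chunk
assert chunk_id(c) == chunk_id(a) + 1   # 06:00 starts the next chunk
```

A time-range query then touches only chunks whose IDs fall between `chunk_id(start)` and `chunk_id(end)`, and retention is a matter of deleting chunks below a cutoff ID.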
Compression Techniques
Gorilla Compression (Facebook, 2015)
Designed for floating-point time-series values. Exploits the fact that consecutive values in a series are often similar (XOR produces mostly zeros).
Timestamp compression (delta-of-delta):
Timestamps: [1000, 1010, 1020, 1030, 1021]
Deltas: [1000, 10, 10, 10, -9 ]
Delta-of-delta: [1000, 10, 0, 0, -19 ]
Encoding:
DoD = 0: store '0' (1 bit)
DoD in [-63,64]: store '10' + 7 bits
DoD in [-255,256]: store '110' + 9 bits
Larger: store '1110' + 12 bits or '1111' + 32 bits
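A sketch of the bucket selection in the encoding table above (`dod_buckets` is a hypothetical name; it tracks the chosen prefix and total bit cost rather than emitting a real bitstream):

```python
def dod_buckets(timestamps):
    """Classify each delta-of-delta into its variable-length bucket.
    The block header (first timestamp + first delta) is stored separately."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    buckets = []
    for prev, cur in zip(deltas, deltas[1:]):
        dod = cur - prev
        if dod == 0:
            buckets.append(("0", 1))
        elif -63 <= dod <= 64:
            buckets.append(("10", 9))    # 2-bit prefix + 7-bit value
        elif -255 <= dod <= 256:
            buckets.append(("110", 12))  # 3-bit prefix + 9-bit value
        elif -2047 <= dod <= 2048:
            buckets.append(("1110", 16)) # 4-bit prefix + 12-bit value
        else:
            buckets.append(("1111", 36)) # 4-bit prefix + 32-bit value
    return buckets

# Timestamps from the example: DoDs are [0, 0, -19]
assert [b[0] for b in dod_buckets([1000, 1010, 1020, 1030, 1021])] == ["0", "0", "10"]
```

For perfectly regular intervals every DoD is zero, so each point after the second costs a single bit.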
Value compression (XOR-based):
Values: [72.5, 72.3, 72.4, 72.5]
XOR with previous:
72.5 XOR 72.3 -> few bits differ
Encoding:
XOR = 0: store '0' (1 bit, same value)
XOR != 0: store leading zeros count + meaningful bits
Result: ~1.37 bytes per data point (vs 16 bytes uncompressed)
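The XOR step can be illustrated by reinterpreting float64 bit patterns and counting leading/trailing zeros (a sketch of the idea, not Gorilla's exact bit emission):

```python
import struct

def f2b(x: float) -> int:
    """Reinterpret a float64 as its 64-bit integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_stream(values):
    """XOR each value's bit pattern with its predecessor's. Similar
    consecutive values leave mostly-zero XOR words, which are then
    encoded as (leading-zero count, meaningful bits)."""
    prev = f2b(values[0])
    out = []
    for v in values[1:]:
        cur = f2b(v)
        x = prev ^ cur
        lz = 64 - x.bit_length() if x else 64           # leading zeros
        tz = (x & -x).bit_length() - 1 if x else 64     # trailing zeros
        meaningful = 64 - lz - tz if x else 0
        out.append((x == 0, lz, meaningful))
        prev = cur
    return out

# identical consecutive values compress to a single '0' bit
same, _, mb = xor_stream([72.5, 72.5])[0]
assert same and mb == 0
```

Metrics that change slowly (temperatures, gauge values) hit the single-bit case often, which is where the ~1.37 bytes/point average comes from.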
Delta-of-Delta Encoding
Optimal for regularly-spaced timestamps where most delta-of-delta values are zero.
Regular 10s intervals:
Timestamps: [t, t+10, t+20, t+30, t+40]
Deltas: [-, 10, 10, 10, 10 ]
Delta-of-delta: [-, -, 0, 0, 0 ]
Storage: base + delta + N zero-bits
Integer Compression
For integer metrics (counters, gauges), additional techniques apply:
Simple-8b: pack multiple small integers into a 64-bit word
A 4-bit selector indicates the packing scheme, e.g.:
8 values x 7 bits each (56 data bits + 4-bit selector = 60 bits used)
60 values x 1 bit each (exact selector assignments vary by implementation)
ZigZag encoding: map signed integers to unsigned
0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
Followed by varint encoding
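ZigZag followed by a LEB128-style varint can be sketched as:

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned: 0->0, -1->1, 1->2, -2->3, 2->4, ..."""
    return 2 * n if n >= 0 else -2 * n - 1

def varint(u: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit = continue."""
    out = bytearray()
    while True:
        b = u & 0x7F
        u >>= 7
        if u:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

assert [zigzag(n) for n in (0, -1, 1, -2, 2)] == [0, 1, 2, 3, 4]
assert varint(zigzag(-3)) == b"\x05"  # small negative delta -> one byte
```

The point of ZigZag is that small-magnitude deltas of either sign map to small unsigned numbers, so the varint stays short; without it, negative deltas in two's complement have high bits set and always take the maximum width.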
Downsampling and Retention
Downsampling
Reduce resolution of historical data to save storage while preserving trends.
Raw data (10s interval): kept for 7 days
5-minute aggregates: kept for 30 days
1-hour aggregates: kept for 1 year
1-day aggregates: kept indefinitely
Aggregate functions preserved:
min, max, sum, count, avg (derived from sum/count)
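The roll-up above can be sketched as a per-bucket aggregation (a hypothetical function; production systems run this as continuous queries or recording rules). Note avg is kept as sum/count so buckets can be re-aggregated into coarser tiers without error:

```python
from collections import defaultdict

def downsample(points, bucket_seconds):
    """Roll raw (ts, value) points up into per-bucket aggregates."""
    buckets = defaultdict(lambda: {"min": float("inf"), "max": float("-inf"),
                                   "sum": 0.0, "count": 0})
    for ts, v in points:
        b = buckets[ts - ts % bucket_seconds]   # align ts down to bucket start
        b["min"] = min(b["min"], v)
        b["max"] = max(b["max"], v)
        b["sum"] += v
        b["count"] += 1
    return {k: {**b, "avg": b["sum"] / b["count"]} for k, b in buckets.items()}

agg = downsample([(0, 1.0), (10, 3.0), (300, 5.0)], 300)
assert agg[0]["avg"] == 2.0      # first 5-minute bucket: (1.0 + 3.0) / 2
assert agg[300]["count"] == 1    # second bucket holds one point
```

Percentiles are the notable casualty: they cannot be derived from these aggregates, which is why systems that need them store histograms or sketches instead.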
Example (Prometheus recording rules):
groups:
  - name: downsampling
    interval: 5m
    rules:
      - record: http_requests:rate5m
        expr: rate(http_requests_total[5m])
Retention Policies
-- InfluxDB retention policy
CREATE RETENTION POLICY "one_week" ON "metrics"
DURATION 7d REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY "one_year" ON "metrics"
DURATION 365d REPLICATION 1;
-- Continuous query for downsampling
CREATE CONTINUOUS QUERY "downsample_5m" ON "metrics"
BEGIN
SELECT mean(value) AS value
INTO "one_year"."cpu_usage_5m"
FROM "one_week"."cpu_usage"
GROUP BY time(5m), *
END;
-- TimescaleDB: compression + retention
SELECT add_retention_policy('metrics', INTERVAL '30 days');
SELECT add_compression_policy('metrics', INTERVAL '7 days');
Production Systems
InfluxDB
- Storage engine (v3/Edge): Apache Arrow + Parquet-based columnar storage
- Query language: InfluxQL (SQL-like), Flux (functional, v2), and SQL (v3)
- Data model: Measurement, tag set, field set, timestamp
- Compaction: TSM (Time-Structured Merge Tree) in v1/v2
-- InfluxQL
SELECT MEAN("cpu_usage")
FROM "system_metrics"
WHERE "host" = 'web-01' AND time > now() - 1h
GROUP BY time(5m), "host"
FILL(previous)
TimescaleDB
PostgreSQL extension that automatically partitions tables into time-based chunks (hypertables). Full SQL compatibility with time-series optimizations.
-- Create hypertable
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time');
-- Time-series functions
SELECT time_bucket('15 minutes', time) AS bucket,
sensor_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp,
first(temperature, time) AS first_temp,
last(temperature, time) AS last_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY bucket, sensor_id
ORDER BY bucket DESC;
-- Continuous aggregates (materialized, incrementally updated)
CREATE MATERIALIZED VIEW hourly_temps
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
sensor_id,
AVG(temperature), MIN(temperature), MAX(temperature)
FROM sensor_data
GROUP BY bucket, sensor_id;
Prometheus
Pull-based monitoring system with a custom TSDB optimized for metrics and alerting.
Storage Architecture:
Head Block (in-memory, last 2 hours):
- Write-Ahead Log for durability
- Compressed chunks (Gorilla encoding)
- Inverted index: label -> series IDs
Persistent Blocks (on disk, 2-hour aligned):
chunks/ - compressed time-series data
index - label-to-series inverted index
meta.json - block metadata
tombstones - deleted series markers
Compaction: merge small blocks into larger ones (up to 31 days)
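The label-to-series inverted index can be sketched as posting sets keyed by (label, value) pairs, with selection by set intersection (a simplification of Prometheus's actual on-disk index format):

```python
from collections import defaultdict

class LabelIndex:
    """Minimal inverted index: each (label, value) pair maps to a set of
    series IDs; a selector intersects the postings of all its matchers."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.next_id = 0

    def add_series(self, labels: dict) -> int:
        sid = self.next_id
        self.next_id += 1
        for k, v in labels.items():
            self.postings[(k, v)].add(sid)
        return sid

    def select(self, matchers: dict) -> set:
        """Return series IDs matching every (label, value) pair exactly."""
        sets = [self.postings.get(kv, set()) for kv in matchers.items()]
        return set.intersection(*sets) if sets else set()

idx = LabelIndex()
web = idx.add_series({"__name__": "http_requests_total", "job": "webserver"})
idx.add_series({"__name__": "http_requests_total", "job": "db"})
assert idx.select({"job": "webserver"}) == {web}
```

This structure also shows why high cardinality hurts: every distinct label value adds a posting list, and every series ID appears in one list per label.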
# PromQL queries
# Rate of HTTP requests over 5 minutes
rate(http_requests_total{job="webserver"}[5m])
# 99th percentile latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# Alert: high error rate
- alert: HighErrorRate
  expr: rate(http_errors_total[5m]) / rate(http_requests_total[5m]) > 0.05
  for: 10m
QuestDB
High-performance TSDB written in Java/C++ with zero-GC execution, SIMD-accelerated queries, and SQL support. Designed for sub-millisecond ingestion latency.
-- QuestDB: 1-hour moving average per symbol (window function)
SELECT timestamp, symbol,
avg(price) OVER (PARTITION BY symbol ORDER BY timestamp
RANGE BETWEEN 1 HOUR PRECEDING AND CURRENT ROW) AS moving_avg
FROM trades
WHERE timestamp IN '2025-11-15';
-- SAMPLE BY: 1-minute aggregates aligned to wall-clock minutes
SELECT timestamp, symbol, avg(price) AS avg_price
FROM trades
WHERE timestamp IN '2025-11-15'
SAMPLE BY 1m ALIGN TO CALENDAR;
-- LATEST ON: efficiently find most recent value per series
SELECT * FROM sensors
LATEST ON timestamp PARTITION BY sensor_id;
Comparison
| Feature | InfluxDB | TimescaleDB | Prometheus | QuestDB |
|---------|----------|-------------|------------|---------|
| Query language | InfluxQL/Flux/SQL | SQL | PromQL | SQL |
| Storage | Columnar (Arrow) | PostgreSQL chunks | Custom TSDB | Column-based |
| Best for | IoT, metrics | Relational + TS | Monitoring/alerting | Financial, high-freq |
| High cardinality | Moderate | Good | Poor | Good |
| SQL support | v3 only | Full | No | Full |
| Ecosystem | Telegraf, Grafana | PostgreSQL extensions | Alertmanager, Grafana | Grafana |
Design Considerations
- Cardinality management: Avoid unbounded label values (user IDs, UUIDs) in tag/label sets
- Chunk sizing: Balance query granularity against per-chunk overhead; TimescaleDB recommends a chunk interval small enough that the most recent chunk, including its indexes, fits in roughly 25% of memory
- Compression trade-off: Higher compression (ZSTD) saves storage but costs CPU during reads
- Out-of-order writes: Most TSDBs handle these via buffering, but excessive out-of-order data degrades performance
- Tiered storage: Hot (memory/SSD) for recent data, warm (HDD) for weeks-months, cold (object storage) for archival