
Analytical Databases

Overview

Analytical databases are optimized for complex queries over large datasets, prioritizing read throughput and aggregation performance over single-row transactional latency. The ecosystem spans traditional data warehouses, modern columnar engines, MPP cloud services, and the emerging lakehouse architecture.


OLAP vs OLTP

OLAP vs OLTP Architecture Comparison

| Characteristic | OLTP | OLAP |
|----------------|------|------|
| Workload | Short transactions, CRUD | Complex aggregations, scans |
| Query pattern | Point lookups, small ranges | Full table scans, joins |
| Data volume per query | Rows to thousands | Millions to billions |
| Users | Application servers | Analysts, dashboards |
| Freshness | Real-time | Minutes to hours (or real-time) |
| Schema | Normalized (3NF) | Denormalized (star/snowflake) |
| Optimization | Index lookup latency | Scan throughput, compression |


Dimensional Modeling

Star Schema

A central fact table references dimension tables via foreign keys. Queries stay simple and fast: every dimension is a single join away from the fact table.

-- Fact table: one row per transaction event
CREATE TABLE fact_sales (
    sale_id       BIGINT,
    date_key      INT REFERENCES dim_date(date_key),
    product_key   INT REFERENCES dim_product(product_key),
    store_key     INT REFERENCES dim_store(store_key),
    customer_key  INT REFERENCES dim_customer(customer_key),
    quantity      INT,
    unit_price    DECIMAL(10,2),
    total_amount  DECIMAL(12,2)
);

-- Dimension table
CREATE TABLE dim_product (
    product_key   INT PRIMARY KEY,
    product_name  VARCHAR(200),
    category      VARCHAR(100),
    subcategory   VARCHAR(100),
    brand         VARCHAR(100)
);
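
The star schema lends itself to a characteristic query shape: join the fact table to one or more dimensions, then aggregate. A minimal sketch, using sqlite3 as a stand-in for a warehouse; the table names follow the DDL above, but the sample rows are invented for illustration:

```python
import sqlite3

# In-memory stand-in for a warehouse (sqlite3 is for illustration only)
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INT PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE fact_sales (product_key INT, quantity INT, total_amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Toys');
INSERT INTO fact_sales VALUES (1, 2, 19.98), (1, 1, 9.99), (2, 5, 49.95);
""")

# Typical star-schema query: one join per dimension, then aggregate
rows = con.execute("""
    SELECT p.category, SUM(f.quantity) AS units
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category
    ORDER BY units DESC
""").fetchall()
print(rows)  # [('Toys', 5), ('Tools', 3)]
```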

Snowflake Schema

Dimensions are further normalized into sub-dimensions (e.g., dim_product -> dim_category -> dim_department). Saves storage but adds joins.

Star:       fact_sales -> dim_product (category column inside)
Snowflake:  fact_sales -> dim_product -> dim_category -> dim_department

ETL / ELT

ETL (Extract, Transform, Load)

Data is extracted from sources, transformed in a staging area (cleaning, joining, aggregating), and loaded into the warehouse.

Sources (OLTP DBs, APIs, files)
    |  Extract
    v
Staging Area (transform: clean, deduplicate, conform)
    |  Load
    v
Data Warehouse (star schema)
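
The pipeline above can be sketched end to end in a few lines; the source data, column names, and sqlite3 target here are all hypothetical stand-ins for real OLTP sources and a real warehouse:

```python
import csv
import io
import sqlite3

# Extract: a hypothetical CSV source (would normally be an OLTP DB or API)
raw = io.StringIO("order_id,amount\n1,10.50\n1,10.50\n2,7.25\n")
rows = list(csv.DictReader(raw))

# Transform in "staging": deduplicate on the key and conform types
seen, staged = set(), []
for r in rows:
    if r["order_id"] in seen:
        continue  # drop duplicate extract rows
    seen.add(r["order_id"])
    staged.append((int(r["order_id"]), float(r["amount"])))

# Load into the warehouse fact table
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE fact_orders (order_id INT, amount REAL)")
wh.executemany("INSERT INTO fact_orders VALUES (?, ?)", staged)
count = wh.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
print(count)  # 2
```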

ELT (Extract, Load, Transform)

Raw data is loaded into the warehouse (or data lake) first, then transformed using the warehouse's compute engine. Preferred in cloud-native architectures.

Sources -> Load (raw) -> Data Lake/Warehouse -> Transform (dbt, SQL)
                                                    |
                                                    v
                                              Curated Tables

dbt (data build tool) has become the standard for ELT transformations, managing SQL-based transformations as version-controlled code.


Materialized Views

Pre-computed query results that accelerate repeated analytical queries at the cost of storage and maintenance.

-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
    d.year, d.month, p.category,
    SUM(f.total_amount) AS revenue,
    COUNT(*) AS num_sales
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year, d.month, p.category;

-- Refresh strategies
REFRESH MATERIALIZED VIEW monthly_revenue;                -- full refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;   -- non-blocking; requires a unique index on the view

-- ClickHouse: incrementally maintained with aggregating merge tree
CREATE MATERIALIZED VIEW mv_revenue
ENGINE = AggregatingMergeTree()
ORDER BY (year, month, category)
AS SELECT
    toYear(event_date) AS year,
    toMonth(event_date) AS month,
    category,
    sumState(amount) AS revenue  -- partial aggregate state
FROM sales
GROUP BY year, month, category;
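
The idea behind incrementally maintained views like the ClickHouse example above is that each insert updates a running pre-aggregate instead of triggering a re-scan. A minimal sketch of that maintenance loop; all names and data here are illustrative:

```python
from collections import defaultdict

# (year, month, category) -> running revenue, analogous to a partial
# aggregate state kept by an AggregatingMergeTree
mv = defaultdict(float)

def on_insert(event_date: str, category: str, amount: float) -> None:
    """Update the pre-aggregate on every insert instead of re-scanning."""
    year, month = event_date[:4], event_date[5:7]
    mv[(year, month, category)] += amount

for date, cat, amt in [("2025-01-03", "Tools", 10.0),
                       ("2025-01-09", "Tools", 5.0),
                       ("2025-02-01", "Toys", 7.0)]:
    on_insert(date, cat, amt)

print(mv[("2025", "01", "Tools")])  # 15.0
```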

Columnar Analytical Engines

ClickHouse

Open-source columnar OLAP database with exceptional single-node scan performance.

-- MergeTree: primary engine family
CREATE TABLE events (
    event_date  Date,
    user_id     UInt64,
    event_type  LowCardinality(String),  -- dictionary encoded
    duration_ms UInt32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, user_id, event_date)
SETTINGS index_granularity = 8192;

-- Sparse primary index: one entry per 8192 rows
-- Enables skipping entire granules during scans
-- Compression: LZ4 by default, ZSTD for cold data
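
The sparse-index mechanism described in the comments above can be sketched directly: keep min/max of the sort key per granule, and scan only granules whose range can contain the predicate. Granule size and data are scaled down for illustration:

```python
# Sketch of sparse-index granule skipping (ClickHouse uses 8192 rows per
# granule by default; 4 here so the example stays small)
GRANULE = 4
data = sorted(range(100))  # rows ordered by the primary key

# One sparse index entry per granule: (min_key, max_key, row_offset)
index = [(data[i], data[min(i + GRANULE - 1, len(data) - 1)], i)
         for i in range(0, len(data), GRANULE)]

def scan_eq(key):
    """Read only granules whose [min, max] range could contain `key`."""
    hits, granules_read = [], 0
    for lo, hi, off in index:
        if lo <= key <= hi:
            granules_read += 1
            hits.extend(v for v in data[off:off + GRANULE] if v == key)
    return hits, granules_read

hits, read = scan_eq(42)
print(hits, read)  # [42] 1  -- 24 of 25 granules skipped
```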

DuckDB

In-process analytical database (the "SQLite for analytics"). Vectorized execution, columnar storage, zero-copy integration with Pandas/Arrow.

import duckdb

# Query Parquet files directly without loading
result = duckdb.sql("""
    SELECT category, SUM(amount) AS total
    FROM 'sales/*.parquet'
    WHERE sale_date >= '2025-01-01'
    GROUP BY category
    ORDER BY total DESC
""").fetchdf()  # returns Pandas DataFrame

# Zero-copy from Pandas
import pandas as pd
df = pd.read_csv("large_file.csv")
duckdb.sql("SELECT AVG(price) FROM df WHERE qty > 10").show()  # scans df in place

Apache Druid

Real-time analytical database designed for sub-second OLAP queries on event data. Uses a combination of columnar storage, inverted indexes, and pre-aggregation.
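
The inverted-index piece of that design maps each dimension value to the set of rows containing it, so a filter becomes a lookup rather than a scan. A toy sketch of the idea (Druid actually stores these as compressed bitmaps per segment; the data here is invented):

```python
from collections import defaultdict

# Build an inverted index over a dimension column: value -> row ids
rows = ["click", "view", "click", "purchase", "view", "click"]
inverted = defaultdict(set)
for row_id, value in enumerate(rows):
    inverted[value].add(row_id)

# Filtering event_type = 'click' is now a set lookup, not a column scan
print(sorted(inverted["click"]))  # [0, 2, 5]

# Bitmap-style indexes compose: OR two filters by unioning the row sets
both = inverted["click"] | inverted["purchase"]
print(sorted(both))  # [0, 2, 3, 5]
```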


MPP (Massively Parallel Processing) Systems

Amazon Redshift

  • Columnar storage on shared-nothing cluster of compute nodes
  • Distribution styles: KEY, EVEN, ALL (broadcast small tables)
  • Zone maps: min/max metadata per block for scan skipping
  • Redshift Spectrum: query S3 data without loading
  • RA3 nodes: managed storage decoupled from compute
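
KEY distribution can be sketched as hashing the distribution key to a node, so fact and dimension rows that share a key are co-located and join without a network shuffle. The hash function below is illustrative, not Redshift's actual one:

```python
import hashlib

NUM_NODES = 4  # illustrative cluster size

def node_for(dist_key: str) -> int:
    """Map a distribution key to a node (illustrative hash, not Redshift's)."""
    digest = hashlib.md5(dist_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES

# Rows sharing customer_id always land on the same node, so a join on
# customer_id between two KEY-distributed tables needs no data movement
assert node_for("customer-42") == node_for("customer-42")
print(node_for("customer-42"))
```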

Google BigQuery

  • Serverless: no cluster management, auto-scaling
  • Dremel execution engine: tree-structured dispatch
  • Capacitor columnar format on Colossus distributed filesystem
  • Pricing: on-demand (cost proportional to bytes scanned) or slot-based capacity reservations
  • BI Engine: in-memory acceleration for dashboards

-- BigQuery: query petabytes with standard SQL
SELECT
    EXTRACT(MONTH FROM trip_start) AS month,
    AVG(trip_distance) AS avg_distance,
    COUNT(*) AS num_trips
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE EXTRACT(YEAR FROM trip_start) = 2024
GROUP BY month
ORDER BY month;
-- Scans only relevant columns and partitions

Snowflake

  • Separation of storage (S3/GCS/Azure Blob), compute (virtual warehouses), and cloud services (metadata, optimization)
  • Virtual warehouses: isolated compute clusters that scale independently
  • Automatic clustering: maintains sort order for efficient pruning
  • Time travel: access historical data up to 90 days
  • Zero-copy cloning: instant database/table copies via metadata

Architecture:
  Cloud Services Layer: query parsing, optimization, metadata
       |
  Compute Layer: Virtual Warehouses (T-shirt sizing: XS to 6XL)
       |
  Storage Layer: Columnar micro-partitions on object storage
       (immutable, 50-500MB of uncompressed data each, automatically clustered)

Lakehouse Architecture

The lakehouse combines the raw storage economics of data lakes with the management features (ACID, schema enforcement, indexing) of data warehouses.

Delta Lake

Open-source storage layer on top of Parquet files, adding ACID transactions via a transaction log.

Delta Table Structure:
  /delta_table/
    /_delta_log/
      00000000000000000000.json  # initial commit
      00000000000000000001.json  # add/remove file actions
      00000000000000000010.checkpoint.parquet  # periodic checkpoint
    /part-00000.parquet
    /part-00001.parquet

Transaction Log Entry:
  {"add": {"path": "part-00002.parquet", "size": 1048576,
           "partitionValues": {"date": "2025-11-15"},
           "stats": {"numRecords": 50000, "minValues": {...}}}}

Key features: ACID transactions, schema evolution, time travel, Z-ordering for multi-dimensional clustering, MERGE (upsert) support.
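
Replaying the transaction log is how a reader derives the current table state: the live file set is the adds minus the removes, applied in commit order. A sketch with a simplified log format (real Delta entries carry more fields, as shown above):

```python
# Simplified Delta-style log: a list of add/remove actions in commit order
log = [
    {"add": {"path": "part-00000.parquet"}},
    {"add": {"path": "part-00001.parquet"}},
    {"remove": {"path": "part-00000.parquet"}},  # e.g. compaction rewrote it
    {"add": {"path": "part-00002.parquet"}},
]

# Replay: the live file set is adds minus removes, in order
live = set()
for entry in log:
    if "add" in entry:
        live.add(entry["add"]["path"])
    elif "remove" in entry:
        live.discard(entry["remove"]["path"])

print(sorted(live))  # ['part-00001.parquet', 'part-00002.parquet']
```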

Apache Iceberg

Table format with snapshot isolation, hidden partitioning, and schema evolution. Avoids file listing overhead via manifest files.

Iceberg Metadata Hierarchy:
  Metadata File (current snapshot pointer)
    -> Snapshot (list of manifest lists)
      -> Manifest List (list of manifests)
        -> Manifest File (list of data files + column stats)
          -> Data Files (Parquet/ORC/Avro)

Hidden Partitioning:
  -- No need for partition columns in queries
  ALTER TABLE events ADD PARTITION FIELD hours(event_time);
  -- Iceberg automatically maps event_time to partition
  -- Query: WHERE event_time > X  (Iceberg prunes partitions)
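
The mechanism behind hidden partitioning is a declared transform from a data column to a partition value; because the engine knows the transform, a predicate on the raw column prunes partitions without the query mentioning them. A sketch using the hours() transform (the file layout here is invented):

```python
from datetime import datetime, timezone

def hours_transform(ts: datetime) -> int:
    """Iceberg's hours() transform: whole hours since the Unix epoch."""
    return int(ts.timestamp() // 3600)

# partition value -> data files (illustrative metadata)
files = {
    hours_transform(datetime(2025, 1, 1, 10, tzinfo=timezone.utc)): ["a.parquet"],
    hours_transform(datetime(2025, 1, 1, 11, tzinfo=timezone.utc)): ["b.parquet"],
}

# WHERE event_time >= '2025-01-01 11:00' maps onto the partition values,
# so only matching partitions are read
cutoff = hours_transform(datetime(2025, 1, 1, 11, tzinfo=timezone.utc))
pruned = [f for part, fs in files.items() if part >= cutoff for f in fs]
print(pruned)  # ['b.parquet']
```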

Apache Hudi

Optimized for incremental processing with copy-on-write and merge-on-read table types. Supports record-level upserts and deletes on data lakes.
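
The two table types trade write cost against read cost, which a toy sketch makes concrete (the dict-based "files" are purely illustrative): copy-on-write rewrites the base file on every upsert, while merge-on-read appends to a delta log and pays a merge at query time.

```python
# Copy-on-write: an upsert produces a new base file version immediately
cow_base = {1: "a", 2: "b"}
cow_base = {**cow_base, 2: "b2"}           # whole "file" rewritten on write

# Merge-on-read: an upsert is a cheap log append; reads merge base + log
mor_base, mor_log = {1: "a", 2: "b"}, []
mor_log.append((2, "b2"))                  # fast write path

def mor_read(base, log):
    """Merge delta-log records over the base file at query time."""
    merged = dict(base)
    for key, value in log:
        merged[key] = value
    return merged

# Both table types converge to the same logical contents
print(cow_base == mor_read(mor_base, mor_log))  # True
```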


Comparison of Lakehouse Formats

| Feature | Delta Lake | Iceberg | Hudi |
|---------|-----------|---------|------|
| ACID transactions | Yes | Yes | Yes |
| Schema evolution | Yes | Yes (full) | Yes |
| Time travel | Yes | Yes | Yes |
| Hidden partitioning | No | Yes | No |
| Record-level updates | MERGE | MERGE | Native upsert |
| Partition evolution | Limited | Full | Limited |
| Primary ecosystem | Databricks | Multi-engine | AWS/Uber |


Choosing an Analytical System

| Scenario | Recommended |
|----------|-------------|
| Embedded analytics, local files | DuckDB |
| Real-time event analytics | ClickHouse, Druid |
| Cloud data warehouse (managed) | Snowflake, BigQuery, Redshift |
| Data lake with warehouse features | Delta Lake + Spark, Iceberg + Trino |
| Sub-second dashboards | Druid, ClickHouse, BigQuery BI Engine |