
Analytical Databases

Overview

Analytical databases are optimized for complex queries over large datasets, prioritizing read throughput and aggregation performance over single-row transactional latency. The ecosystem spans traditional data warehouses, modern columnar engines, MPP cloud services, and the emerging lakehouse architecture.


OLAP vs OLTP

OLAP vs OLTP Architecture Comparison

| Characteristic | OLTP | OLAP |
|----------------|------|------|
| Workload | Short transactions, CRUD | Complex aggregations, scans |
| Query pattern | Point lookups, small ranges | Full table scans, joins |
| Data volume per query | Rows to thousands | Millions to billions |
| Users | Application servers | Analysts, dashboards |
| Freshness | Real-time | Minutes to hours (or real-time) |
| Schema | Normalized (3NF) | Denormalized (star/snowflake) |
| Optimization | Index lookup latency | Scan throughput, compression |


Dimensional Modeling

Star Schema

A central fact table references dimension tables via foreign keys. Queries stay simple and fast: every dimension is a single join away from the fact table.

-- Fact table: one row per transaction event
CREATE TABLE fact_sales (
    sale_id       BIGINT,
    date_key      INT REFERENCES dim_date(date_key),
    product_key   INT REFERENCES dim_product(product_key),
    store_key     INT REFERENCES dim_store(store_key),
    customer_key  INT REFERENCES dim_customer(customer_key),
    quantity      INT,
    unit_price    DECIMAL(10,2),
    total_amount  DECIMAL(12,2)
);

-- Dimension table
CREATE TABLE dim_product (
    product_key   INT PRIMARY KEY,
    product_name  VARCHAR(200),
    category      VARCHAR(100),
    subcategory   VARCHAR(100),
    brand         VARCHAR(100)
);
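
The star schema lends itself to a characteristic query shape: join the fact table to one or more dimensions, then aggregate. A minimal sketch, using sqlite3 as a stand-in for a warehouse; the table names follow the DDL above, but the sample rows are invented for illustration:

```python
import sqlite3

# In-memory stand-in for a warehouse (sqlite3 is for illustration only)
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INT PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE fact_sales (product_key INT, quantity INT, total_amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Toys');
INSERT INTO fact_sales VALUES (1, 2, 19.98), (1, 1, 9.99), (2, 5, 49.95);
""")

# Typical star-schema query: one join per dimension, then aggregate
rows = con.execute("""
    SELECT p.category, SUM(f.quantity) AS units
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category
    ORDER BY units DESC
""").fetchall()
print(rows)  # [('Toys', 5), ('Tools', 3)]
```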

Snowflake Schema

Dimensions are further normalized into sub-dimensions (e.g., dim_product -> dim_category -> dim_department). Saves storage but adds joins.

Star:       fact_sales -> dim_product (category column inside)
Snowflake:  fact_sales -> dim_product -> dim_category -> dim_department

ETL / ELT

ETL (Extract, Transform, Load)

Data is extracted from sources, transformed in a staging area (cleaning, joining, aggregating), and loaded into the warehouse.

Sources (OLTP DBs, APIs, files)
    |  Extract
    v
Staging Area (transform: clean, deduplicate, conform)
    |  Load
    v
Data Warehouse (star schema)
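
The pipeline above can be sketched end to end in a few lines; the source data, column names, and sqlite3 target here are all hypothetical stand-ins for real OLTP sources and a real warehouse:

```python
import csv
import io
import sqlite3

# Extract: a hypothetical CSV source (would normally be an OLTP DB or API)
raw = io.StringIO("order_id,amount\n1,10.50\n1,10.50\n2,7.25\n")
rows = list(csv.DictReader(raw))

# Transform in "staging": deduplicate on the key and conform types
seen, staged = set(), []
for r in rows:
    if r["order_id"] in seen:
        continue  # drop duplicate extract rows
    seen.add(r["order_id"])
    staged.append((int(r["order_id"]), float(r["amount"])))

# Load into the warehouse fact table
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE fact_orders (order_id INT, amount REAL)")
wh.executemany("INSERT INTO fact_orders VALUES (?, ?)", staged)
count = wh.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
print(count)  # 2
```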

ELT (Extract, Load, Transform)

Raw data is loaded into the warehouse (or data lake) first, then transformed using the warehouse's compute engine. Preferred in cloud-native architectures.

Sources -> Load (raw) -> Data Lake/Warehouse -> Transform (dbt, SQL)
                                                    |
                                                    v
                                              Curated Tables

dbt (data build tool) has become the standard for ELT transformations, managing SQL-based transformations as version-controlled code.


Materialized Views

Pre-computed query results that accelerate repeated analytical queries at the cost of storage and maintenance.

-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
    d.year, d.month, p.category,
    SUM(f.total_amount) AS revenue,
    COUNT(*) AS num_sales
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year, d.month, p.category;

-- Refresh strategies
REFRESH MATERIALIZED VIEW monthly_revenue;                -- full refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;   -- non-blocking; requires a unique index on the view

-- ClickHouse: incrementally maintained with aggregating merge tree
CREATE MATERIALIZED VIEW mv_revenue
ENGINE = AggregatingMergeTree()
ORDER BY (year, month, category)
AS SELECT
    toYear(event_date) AS year,
    toMonth(event_date) AS month,
    category,
    sumState(amount) AS revenue  -- partial aggregate state
FROM sales
GROUP BY year, month, category;
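
The idea behind incrementally maintained views like the ClickHouse example above is that each insert updates a running pre-aggregate instead of triggering a re-scan. A minimal sketch of that maintenance loop; all names and data here are illustrative:

```python
from collections import defaultdict

# (year, month, category) -> running revenue, analogous to a partial
# aggregate state kept by an AggregatingMergeTree
mv = defaultdict(float)

def on_insert(event_date: str, category: str, amount: float) -> None:
    """Update the pre-aggregate on every insert instead of re-scanning."""
    year, month = event_date[:4], event_date[5:7]
    mv[(year, month, category)] += amount

for date, cat, amt in [("2025-01-03", "Tools", 10.0),
                       ("2025-01-09", "Tools", 5.0),
                       ("2025-02-01", "Toys", 7.0)]:
    on_insert(date, cat, amt)

print(mv[("2025", "01", "Tools")])  # 15.0
```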

Columnar Analytical Engines

ClickHouse

Open-source columnar OLAP database with exceptional single-node scan performance.

-- MergeTree: primary engine family
CREATE TABLE events (
    event_date  Date,
    user_id     UInt64,
    event_type  LowCardinality(String),  -- dictionary encoded
    duration_ms UInt32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, user_id, event_date)
SETTINGS index_granularity = 8192;

-- Sparse primary index: one entry per 8192 rows
-- Enables skipping entire granules during scans
-- Compression: LZ4 by default, ZSTD for cold data
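
The sparse-index mechanism described in the comments above can be sketched directly: keep min/max of the sort key per granule, and scan only granules whose range can contain the predicate. Granule size and data are scaled down for illustration:

```python
# Sketch of sparse-index granule skipping (ClickHouse uses 8192 rows per
# granule by default; 4 here so the example stays small)
GRANULE = 4
data = sorted(range(100))  # rows ordered by the primary key

# One sparse index entry per granule: (min_key, max_key, row_offset)
index = [(data[i], data[min(i + GRANULE - 1, len(data) - 1)], i)
         for i in range(0, len(data), GRANULE)]

def scan_eq(key):
    """Read only granules whose [min, max] range could contain `key`."""
    hits, granules_read = [], 0
    for lo, hi, off in index:
        if lo <= key <= hi:
            granules_read += 1
            hits.extend(v for v in data[off:off + GRANULE] if v == key)
    return hits, granules_read

hits, read = scan_eq(42)
print(hits, read)  # [42] 1  -- 24 of 25 granules skipped
```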

DuckDB

In-process analytical database (the "SQLite for analytics"). Vectorized execution, columnar storage, zero-copy integration with Pandas/Arrow.

import duckdb

# Query Parquet files directly without loading
result = duckdb.sql("""
    SELECT category, SUM(amount) AS total
    FROM 'sales/*.parquet'
    WHERE sale_date >= '2025-01-01'
    GROUP BY category
    ORDER BY total DESC
""").fetchdf()  # returns Pandas DataFrame

# Zero-copy from Pandas
import pandas as pd
df = pd.read_csv("large_file.csv")
duckdb.sql("SELECT AVG(price) FROM df WHERE qty > 10").show()  # scans df in place

Apache Druid

Real-time analytical database designed for sub-second OLAP queries on event data. Uses a combination of columnar storage, inverted indexes, and pre-aggregation.
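
The inverted-index piece of that design maps each dimension value to the set of rows containing it, so a filter becomes a lookup rather than a scan. A toy sketch of the idea (Druid actually stores these as compressed bitmaps per segment; the data here is invented):

```python
from collections import defaultdict

# Build an inverted index over a dimension column: value -> row ids
rows = ["click", "view", "click", "purchase", "view", "click"]
inverted = defaultdict(set)
for row_id, value in enumerate(rows):
    inverted[value].add(row_id)

# Filtering event_type = 'click' is now a set lookup, not a column scan
print(sorted(inverted["click"]))  # [0, 2, 5]

# Bitmap-style indexes compose: OR two filters by unioning the row sets
both = inverted["click"] | inverted["purchase"]
print(sorted(both))  # [0, 2, 3, 5]
```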


MPP (Massively Parallel Processing) Systems

Amazon Redshift

  • Columnar storage on shared-nothing cluster of compute nodes
  • Distribution styles: KEY, EVEN, ALL (broadcast small tables)
  • Zone maps: min/max metadata per block for scan skipping
  • Redshift Spectrum: query S3 data without loading
  • RA3 nodes: managed storage decoupled from compute
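
KEY distribution can be sketched as hashing the distribution key to a node, so fact and dimension rows that share a key are co-located and join without a network shuffle. The hash function below is illustrative, not Redshift's actual one:

```python
import hashlib

NUM_NODES = 4  # illustrative cluster size

def node_for(dist_key: str) -> int:
    """Map a distribution key to a node (illustrative hash, not Redshift's)."""
    digest = hashlib.md5(dist_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES

# Rows sharing customer_id always land on the same node, so a join on
# customer_id between two KEY-distributed tables needs no data movement
assert node_for("customer-42") == node_for("customer-42")
print(node_for("customer-42"))
```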

Google BigQuery

  • Serverless: no cluster management, auto-scaling
  • Dremel execution engine: tree-structured dispatch
  • Capacitor columnar format on Colossus distributed filesystem
  • Pricing: on-demand (cost proportional to bytes scanned) or slot-based capacity reservations
  • BI Engine: in-memory acceleration for dashboards

-- BigQuery: query petabytes with standard SQL
SELECT
    EXTRACT(MONTH FROM trip_start) AS month,
    AVG(trip_distance) AS avg_distance,
    COUNT(*) AS num_trips
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE EXTRACT(YEAR FROM trip_start) = 2024
GROUP BY month
ORDER BY month;
-- Scans only relevant columns and partitions

Snowflake

  • Separation of storage (S3/GCS/Azure Blob), compute (virtual warehouses), and cloud services (metadata, optimization)
  • Virtual warehouses: isolated compute clusters that scale independently
  • Automatic clustering: maintains sort order for efficient pruning
  • Time travel: access historical data up to 90 days
  • Zero-copy cloning: instant database/table copies via metadata

Architecture:
  Cloud Services Layer: query parsing, optimization, metadata
       |
  Compute Layer: Virtual Warehouses (T-shirt sizing: XS to 6XL)
       |
  Storage Layer: Columnar micro-partitions on object storage
       (immutable, 50-500MB of uncompressed data each, automatically clustered)

Lakehouse Architecture

The lakehouse combines the raw storage economics of data lakes with the management features (ACID, schema enforcement, indexing) of data warehouses.

Delta Lake

Open-source storage layer on top of Parquet files, adding ACID transactions via a transaction log.

Delta Table Structure:
  /delta_table/
    /_delta_log/
      00000000000000000000.json  # initial commit
      00000000000000000001.json  # add/remove file actions
      00000000000000000010.checkpoint.parquet  # periodic checkpoint
    /part-00000.parquet
    /part-00001.parquet

Transaction Log Entry:
  {"add": {"path": "part-00002.parquet", "size": 1048576,
           "partitionValues": {"date": "2025-11-15"},
           "stats": {"numRecords": 50000, "minValues": {...}}}}

Key features: ACID transactions, schema evolution, time travel, Z-ordering for multi-dimensional clustering, MERGE (upsert) support.
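
Replaying the transaction log is how a reader derives the current table state: the live file set is the adds minus the removes, applied in commit order. A sketch with a simplified log format (real Delta entries carry more fields, as shown above):

```python
# Simplified Delta-style log: a list of add/remove actions in commit order
log = [
    {"add": {"path": "part-00000.parquet"}},
    {"add": {"path": "part-00001.parquet"}},
    {"remove": {"path": "part-00000.parquet"}},  # e.g. compaction rewrote it
    {"add": {"path": "part-00002.parquet"}},
]

# Replay: the live file set is adds minus removes, in order
live = set()
for entry in log:
    if "add" in entry:
        live.add(entry["add"]["path"])
    elif "remove" in entry:
        live.discard(entry["remove"]["path"])

print(sorted(live))  # ['part-00001.parquet', 'part-00002.parquet']
```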

Apache Iceberg

Table format with snapshot isolation, hidden partitioning, and schema evolution. Avoids file listing overhead via manifest files.

Iceberg Metadata Hierarchy:
  Metadata File (current snapshot pointer)
    -> Snapshot (list of manifest lists)
      -> Manifest List (list of manifests)
        -> Manifest File (list of data files + column stats)
          -> Data Files (Parquet/ORC/Avro)

Hidden Partitioning:
  -- No need for partition columns in queries
  ALTER TABLE events ADD PARTITION FIELD hours(event_time);
  -- Iceberg automatically maps event_time to partition
  -- Query: WHERE event_time > X  (Iceberg prunes partitions)
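
The mechanism behind hidden partitioning is a declared transform from a data column to a partition value; because the engine knows the transform, a predicate on the raw column prunes partitions without the query mentioning them. A sketch using the hours() transform (the file layout here is invented):

```python
from datetime import datetime, timezone

def hours_transform(ts: datetime) -> int:
    """Iceberg's hours() transform: whole hours since the Unix epoch."""
    return int(ts.timestamp() // 3600)

# partition value -> data files (illustrative metadata)
files = {
    hours_transform(datetime(2025, 1, 1, 10, tzinfo=timezone.utc)): ["a.parquet"],
    hours_transform(datetime(2025, 1, 1, 11, tzinfo=timezone.utc)): ["b.parquet"],
}

# WHERE event_time >= '2025-01-01 11:00' maps onto the partition values,
# so only matching partitions are read
cutoff = hours_transform(datetime(2025, 1, 1, 11, tzinfo=timezone.utc))
pruned = [f for part, fs in files.items() if part >= cutoff for f in fs]
print(pruned)  # ['b.parquet']
```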

Apache Hudi

Optimized for incremental processing with copy-on-write and merge-on-read table types. Supports record-level upserts and deletes on data lakes.
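
The two table types trade write cost against read cost, which a toy sketch makes concrete (the dict-based "files" are purely illustrative): copy-on-write rewrites the base file on every upsert, while merge-on-read appends to a delta log and pays a merge at query time.

```python
# Copy-on-write: an upsert produces a new base file version immediately
cow_base = {1: "a", 2: "b"}
cow_base = {**cow_base, 2: "b2"}           # whole "file" rewritten on write

# Merge-on-read: an upsert is a cheap log append; reads merge base + log
mor_base, mor_log = {1: "a", 2: "b"}, []
mor_log.append((2, "b2"))                  # fast write path

def mor_read(base, log):
    """Merge delta-log records over the base file at query time."""
    merged = dict(base)
    for key, value in log:
        merged[key] = value
    return merged

# Both table types converge to the same logical contents
print(cow_base == mor_read(mor_base, mor_log))  # True
```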


Comparison of Lakehouse Formats

| Feature | Delta Lake | Iceberg | Hudi |
|---------|-----------|---------|------|
| ACID transactions | Yes | Yes | Yes |
| Schema evolution | Yes | Yes (full) | Yes |
| Time travel | Yes | Yes | Yes |
| Hidden partitioning | No | Yes | No |
| Record-level updates | MERGE | MERGE | Native upsert |
| Partition evolution | Limited | Full | Limited |
| Primary ecosystem | Databricks | Multi-engine | AWS/Uber |


Choosing an Analytical System

| Scenario | Recommended |
|----------|-------------|
| Embedded analytics, local files | DuckDB |
| Real-time event analytics | ClickHouse, Druid |
| Cloud data warehouse (managed) | Snowflake, BigQuery, Redshift |
| Data lake with warehouse features | Delta Lake + Spark, Iceberg + Trino |
| Sub-second dashboards | Druid, ClickHouse, BigQuery BI Engine |