
Query Processing

Query processing transforms a SQL query into an efficient execution plan. Understanding this pipeline helps write better queries and diagnose performance issues.

Query Processing Pipeline

SQL Query
    ↓
[Parser] → Parse tree (syntax check)
    ↓
[Validator] → Validated parse tree (semantic check: tables/columns exist, permissions)
    ↓
[Optimizer] → Execution plan (choose the best strategy)
    ↓
[Executor] → Result set (execute the plan, return data)

Query Execution Plans

An execution plan is a tree of operators that process data.

EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS posts
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.is_active = TRUE
GROUP BY u.name
HAVING COUNT(p.id) > 5
ORDER BY posts DESC;

Sort (cost=... rows=... time=...)
  └── Filter: count > 5
      └── HashAggregate (GROUP BY u.name)
          └── Hash Join (u.id = p.user_id)
              ├── Seq Scan on users (filter: is_active = TRUE)
              └── Seq Scan on posts

Reading EXPLAIN Output

Key metrics:

  • Cost: Estimated I/O + CPU cost (arbitrary units). First number = startup cost. Second = total cost.
  • Rows: Estimated number of rows produced.
  • Width: Average row size in bytes.
  • Actual time: Real execution time (with ANALYZE).
  • Loops: Number of times the operator was executed.

Scan types:

  • Seq Scan: Full table scan. Reads every row. O(n).
  • Index Scan: Use a B-tree index. Read index → fetch rows. O(log n + k).
  • Index Only Scan: Answer from the index alone (covering index). No table access.
  • Bitmap Index Scan: Build a bitmap of matching pages → Bitmap Heap Scan fetches those pages.
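The gap between a sequential scan and an index scan can be sketched in Python, with a sorted list standing in for a B-tree index (a simplification; the table and column names are illustrative):

```python
import bisect

rows = [(i, f"user{i}") for i in range(1000)]   # heap, stored in id order
index = [r[0] for r in rows]                    # sorted "index" on id

def seq_scan(rows, pred):
    # Read every row, keep the ones matching the predicate: O(n).
    return [r for r in rows if pred(r)]

def index_scan(rows, index, key):
    # Binary-search the index, then fetch the matching row: O(log n + k).
    i = bisect.bisect_left(index, key)
    return [rows[i]] if i < len(index) and index[i] == key else []
```

Both return the same rows; the index scan just touches far fewer of them.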

Join Operators

Nested Loop Join

For each row in the outer table, scan the inner table for matches.

for each row r in R:
    for each row s in S:
        if r.key == s.key:
            output (r, s)

Cost: O(|R| × |S|) without index. O(|R| × log|S|) with index on S.

Best for: Small outer table. Inner table has an index. Inequality joins.
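A runnable version of the pseudocode above, with tables as lists of dicts (toy data; in a real plan the inner side would usually be the indexed one):

```python
def nested_loop_join(R, S, key):
    # For every outer row, scan the entire inner table: O(|R| * |S|).
    out = []
    for r in R:
        for s in S:
            if r[key] == s[key]:
                out.append({**r, **s})
    return out

users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
posts = [{"id": 1, "title": "Hi"}, {"id": 1, "title": "Yo"}]
# Ann matches both posts; Bob matches none.
```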

Sort-Merge Join

  1. Sort both tables on the join key.
  2. Merge the sorted results (like the merge step of merge sort).

Sort R on key.  Sort S on key.
Merge: advance pointers in both sorted lists, output matches.

Cost: O(|R| log|R| + |S| log|S| + |R| + |S|). If already sorted (index, ORDER BY): just the merge step.

Best for: Large tables that are already sorted (or need sorted output). Equi-joins.
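A sketch of the merge step in Python; the sort calls stand in for step 1 and cost nothing when the inputs arrive pre-sorted:

```python
def merge_join(R, S, key):
    R = sorted(R, key=lambda r: r[key])  # step 1 (free if already sorted)
    S = sorted(S, key=lambda s: s[key])
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][key] < S[j][key]:
            i += 1
        elif R[i][key] > S[j][key]:
            j += 1
        else:
            # Equal keys: emit every S row in the run for this R row,
            # leaving j in place so a duplicate R key rescans the run.
            k = j
            while k < len(S) and S[k][key] == R[i][key]:
                out.append({**R[i], **S[k]})
                k += 1
            i += 1
    return out
```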

Hash Join

  1. Build phase: Read the smaller table (build input). Hash each row into a hash table on the join key.
  2. Probe phase: Read the larger table (probe input). For each row, look up matching rows in the hash table.

Build:
  hash_table = {}
  for each row r in R_small:
    hash_table[hash(r.key)].append(r)

Probe:
  for each row s in S_large:
    for each r in hash_table[hash(s.key)]:
      if r.key == s.key:
        output (r, s)

Cost: O(|R| + |S|) if hash table fits in memory. If not: grace hash join (partition both tables by hash, then join each partition).

Best for: Large equi-joins where neither table is sorted. The most common join method in practice.
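The two phases as runnable Python, with a dict standing in for the in-memory hash table (the grace/spill-to-disk path is omitted):

```python
from collections import defaultdict

def hash_join(build_input, probe_input, key):
    # Build phase: hash the smaller table on the join key.
    table = defaultdict(list)
    for r in build_input:
        table[r[key]].append(r)
    # Probe phase: stream the larger table through the hash table: O(|R| + |S|).
    return [{**r, **s} for s in probe_input for r in table[s[key]]]
```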

Index Nested Loop Join

Nested loop where the inner table lookup uses an index.

for each row r in R:
    use index on S to find matching rows  // O(log|S|) per lookup
    output matches

Cost: O(|R| × log|S|).

Best for: Outer table is small. Inner table has an index on the join key.
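The same loop with the inner scan replaced by a binary search over a sorted (key, position) list standing in for the index:

```python
import bisect

def index_nested_loop_join(R, S, key):
    # One-time "index build": S's keys in sorted order, with row positions.
    entries = sorted((s[key], pos) for pos, s in enumerate(S))
    keys = [k for k, _ in entries]
    out = []
    for r in R:
        # O(log |S|) lookup per outer row instead of a full inner scan.
        i = bisect.bisect_left(keys, r[key])
        while i < len(keys) and keys[i] == r[key]:
            out.append({**r, **S[entries[i][1]]})
            i += 1
    return out
```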

Cost Estimation

Selectivity

Selectivity = fraction of rows satisfying a predicate.

σ_{age > 25}(Students):  selectivity ≈ (max_age - 25) / (max_age - min_age)
σ_{name = 'Alice'}(Students):  selectivity ≈ 1 / n_distinct(name)

Cardinality estimate = selectivity × total rows.
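Both formulas assume a uniform distribution, which is exactly the simplification histograms exist to improve on. A sketch with illustrative numbers:

```python
def range_selectivity(value, col_min, col_max):
    # Selectivity of col > value, assuming uniformly distributed values.
    return (col_max - value) / (col_max - col_min)

def equality_selectivity(n_distinct):
    # Selectivity of col = constant, assuming each value is equally common.
    return 1 / n_distinct

total_rows = 10_000
# cardinality estimate = selectivity * total rows
estimate = range_selectivity(25, col_min=18, col_max=60) * total_rows
```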

Statistics

The optimizer relies on statistics about the data:

  • Row count: Number of rows in each table.
  • Column statistics: Number of distinct values, min/max, null fraction.
  • Histograms: Distribution of values. Equi-width or equi-depth; PostgreSQL tracks the most-common values separately and keeps an equi-depth histogram for the rest.
  • Correlation: How closely physical order matches index order.

-- PostgreSQL: update statistics
ANALYZE users;

-- View statistics
SELECT * FROM pg_stats WHERE tablename = 'users';

Stale statistics → bad estimates → bad plans. Run ANALYZE after bulk operations.

Sampling

For large tables, statistics are computed from a random sample (not a full scan). PostgreSQL samples ~30,000 rows by default.
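Why a fixed-size sample is enough: the sampling error of a fraction shrinks with sample size, not table size. A quick simulation on synthetic data (the 30,000 figure mirrors the text):

```python
import random

random.seed(0)
population = [random.randint(18, 60) for _ in range(1_000_000)]  # a big "table"

sample = random.sample(population, 30_000)
true_frac = sum(a > 25 for a in population) / len(population)
est_frac = sum(a > 25 for a in sample) / len(sample)
# The 30k-row estimate tracks the million-row truth to well under 1%.
```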

Materialized Views

Pre-computed query results stored as a table.

CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT date_trunc('month', order_date) AS month, SUM(amount) AS revenue
FROM orders GROUP BY month;

-- Refresh when data changes
REFRESH MATERIALIZED VIEW monthly_revenue;
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue; -- without locking reads (needs a unique index on the view)

Trade-off: Fast reads (pre-computed) but stale data (until refreshed) and storage cost.

Query Plan Caching

Most databases cache prepared-statement plans, avoiding re-parsing and re-optimizing for repeated queries.

PREPARE get_user(INT) AS SELECT * FROM users WHERE id = $1;
EXECUTE get_user(42);

Plan invalidation: Cached plans are invalidated when schema changes, statistics are updated, or the plan is deemed suboptimal for a new parameter value.

Parallel Query Execution

Modern databases parallelize query execution across multiple CPU cores.

                Gather
               /      \
    Partial Agg        Partial Agg
         |                  |
  Seq Scan (part 1)   Seq Scan (part 2)

PostgreSQL: Parallel seq scan, parallel hash join, parallel sort, parallel aggregate. Controlled by max_parallel_workers_per_gather.

When parallel helps: Large scans, aggregations, hash joins. When it doesn't: small tables, index lookups, heavy locking.
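The Gather / Partial Aggregate shape can be sketched with a thread pool (illustrative only; PostgreSQL uses separate background worker processes, not threads):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_count(chunk):
    # Each worker scans its own chunk and produces a partial aggregate.
    return sum(1 for row in chunk if row % 2 == 0)

def parallel_count(rows, workers=2):
    size = -(-len(rows) // workers)          # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_count, chunks))
    # Gather: combine the partial aggregates into the final result.
    return sum(partials)
```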

Applications in CS

  • Performance tuning: EXPLAIN ANALYZE is the #1 tool. Understanding operator costs helps choose between index scan vs seq scan, hash join vs merge join.
  • Query writing: Knowing how the database executes queries helps write more efficient SQL (avoid correlated subqueries when possible, use EXISTS vs IN appropriately).
  • Schema design: Index selection and table structure should be guided by query patterns.
  • Database administration: Monitor slow queries, update statistics, tune parallel execution settings.