Query Processing
Query processing transforms a SQL query into an efficient execution plan. Understanding this pipeline helps write better queries and diagnose performance issues.
Query Processing Pipeline
SQL Query
↓
[Parser] → Parse tree (syntax check)
↓
[Validator] → Validated parse tree (semantic check: tables/columns exist, permissions)
↓
[Optimizer] → Execution plan (choose the best strategy)
↓
[Executor] → Result set (execute the plan, return data)
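The four stages above can be sketched as a chain of functions. This is a toy illustration only: every name and data structure below is invented for this sketch, and no real engine exposes such an API.

```python
# Toy pipeline: parse → validate → optimize → execute.
# All names and representations are illustrative, not a real engine's internals.

def parse(sql):
    # Syntax check: produce a (drastically simplified) parse tree.
    verb, _, body = sql.strip().partition(" ")
    return {"verb": verb.upper(), "body": body}

def validate(tree, catalog):
    # Semantic check: the referenced table must exist in the catalog.
    table = tree["body"].split("FROM")[1].split()[0]
    if table not in catalog:
        raise ValueError(f"unknown table: {table}")
    return {**tree, "table": table}

def optimize(tree):
    # Choose a strategy; a real optimizer compares the costs of many plans.
    return {"op": "seq_scan", "table": tree["table"]}

def execute(plan, storage):
    # Run the chosen plan against stored rows and return the result set.
    return list(storage[plan["table"]])

catalog = {"users"}
storage = {"users": [{"id": 1, "name": "Alice"}]}
rows = execute(optimize(validate(parse("SELECT * FROM users"), catalog)), storage)
```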
Query Execution Plans
An execution plan is a tree of operators that process data.
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS posts
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.is_active = TRUE
GROUP BY u.name
HAVING COUNT(p.id) > 5
ORDER BY posts DESC;
Sort (cost=... rows=... time=...)
└── Filter: count > 5
└── HashAggregate (GROUP BY u.name)
└── Hash Join (u.id = p.user_id)
├── Seq Scan on users (filter: is_active = TRUE)
└── Seq Scan on posts
Reading EXPLAIN Output
Key metrics:
- Cost: Estimated I/O + CPU cost (arbitrary units). First number = startup cost. Second = total cost.
- Rows: Estimated number of rows produced.
- Width: Average row size in bytes.
- Actual time: Real execution time (with ANALYZE).
- Loops: Number of times the operator was executed.
Scan types:
- Seq Scan: Full table scan. Reads every row. O(n).
- Index Scan: Use a B-tree index. Read index → fetch rows. O(log n + k).
- Index Only Scan: Answer from the index alone (covering index). No table access.
- Bitmap Index Scan: Build a bitmap of matching pages → Bitmap Heap Scan fetches those pages.
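To see why an index scan wins for selective predicates, here is a toy comparison in pure Python: a sequential scan touches every row, while a sorted key list (standing in for a B-tree) lets binary search jump straight to the match.

```python
import bisect

rows = [{"id": i, "age": i % 80} for i in range(10_000)]

# Seq Scan: examine every row — O(n).
seq_result = [r for r in rows if r["id"] == 4242]

# "Index Scan": a sorted (key, position) list stands in for a B-tree.
index = sorted((r["id"], pos) for pos, r in enumerate(rows))
keys = [k for k, _ in index]
i = bisect.bisect_left(keys, 4242)  # O(log n) to locate the key
idx_result = [rows[index[i][1]]] if i < len(keys) and keys[i] == 4242 else []
```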
Join Operators
Nested Loop Join
For each row in the outer table, scan the inner table for matches.
for each row r in R:
    for each row s in S:
        if r.key == s.key:
            output (r, s)
Cost: O(|R| × |S|) without index. O(|R| × log|S|) with index on S.
Best for: Small outer table. Inner table has an index. Inequality joins.
Sort-Merge Join
- Sort both tables on the join key.
- Merge the sorted results (like merge step of merge sort).
Sort R on key. Sort S on key.
Merge: advance pointers in both sorted lists, output matches.
Cost: O(|R| log|R| + |S| log|S| + |R| + |S|). If already sorted (index, ORDER BY): just the merge step.
Best for: Large tables that are already sorted (or need sorted output). Equi-joins.
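The merge step can be sketched in Python. This is a minimal equi-join sketch assuming both inputs are already sorted on the join key; duplicate keys are handled by re-scanning the matching run on one side.

```python
def merge_join(R, S, key):
    """Equi-join two lists of dicts, both pre-sorted on `key`."""
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][key] < S[j][key]:
            i += 1
        elif R[i][key] > S[j][key]:
            j += 1
        else:
            # Matching key: emit R[i] against S's whole run of equal keys.
            k = j
            while k < len(S) and S[k][key] == R[i][key]:
                out.append((R[i], S[k]))
                k += 1
            i += 1  # next R row may repeat the key, so j stays at the run start
    return out

R = [{"id": 1}, {"id": 2}, {"id": 2}]
S = [{"id": 2, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
pairs = merge_join(R, S, "id")  # 2 R-rows × 2 S-rows with id = 2 → 4 pairs
```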
Hash Join
- Build phase: Read the smaller table (build input). Hash each row into a hash table on the join key.
- Probe phase: Read the larger table (probe input). For each row, look up matching rows in the hash table.
Build:
    hash_table = {}  // multimap: key → list of rows
    for each row r in R_small:
        hash_table[hash(r.key)].append(r)
Probe:
    for each row s in S_large:
        for each r in hash_table[hash(s.key)]:
            if r.key == s.key:
                output (r, s)
Cost: O(|R| + |S|) if hash table fits in memory. If not: grace hash join (partition both tables by hash, then join each partition).
Best for: Large equi-joins where neither table is sorted. The most common join method in practice.
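The two phases translate directly into Python. This in-memory sketch uses a `defaultdict` for the multimap the pseudocode glosses over; the table and column names are invented for the example.

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key):
    """In-memory hash join: build on the smaller input, probe with the larger."""
    table = defaultdict(list)
    for r in build:                         # build phase: O(|R|)
        table[r[build_key]].append(r)
    out = []
    for s in probe:                         # probe phase: O(|S|)
        for r in table.get(s[probe_key], []):
            out.append((r, s))
    return out

users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
posts = [{"user_id": 1}, {"user_id": 1}, {"user_id": 3}]
joined = hash_join(users, posts, "id", "user_id")  # both matches are Ann's
```

If the build side doesn't fit in memory, a grace hash join would first partition both inputs by hash and run this same procedure per partition.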
Index Nested Loop Join
Nested loop where the inner table lookup uses an index.
for each row r in R:
    use index on S to find matching rows  // O(log|S|) per lookup
    output matches
Cost: O(|R| × log|S|).
Best for: Outer table is small. Inner table has an index on the join key.
Cost Estimation
Selectivity
Selectivity = fraction of rows satisfying a predicate.
σ_{age > 25}(Students): selectivity ≈ (max_age - 25) / (max_age - min_age)
σ_{name = 'Alice'}(Students): selectivity ≈ 1 / n_distinct(name)
Cardinality estimate = selectivity × total rows.
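These estimates are easy to compute directly. The sketch below assumes a uniform value distribution, which is exactly the fallback assumption a planner makes when it has no histogram; the numbers are invented for illustration.

```python
def range_selectivity(lo_bound, col_min, col_max):
    # Fraction of rows with value > lo_bound, assuming a uniform distribution.
    return (col_max - lo_bound) / (col_max - col_min)

def equality_selectivity(n_distinct):
    # Each distinct value assumed equally likely.
    return 1.0 / n_distinct

total_rows = 10_000
# age uniform in [18, 30], predicate age > 25: (30 - 25) / (30 - 18) ≈ 0.417
est = range_selectivity(25, col_min=18, col_max=30) * total_rows  # ≈ 4167 rows
```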
Statistics
The optimizer relies on statistics about the data:
- Row count: Number of rows in each table.
- Column statistics: Number of distinct values, min/max, null fraction.
- Histograms: Distribution of values, equi-width or equi-depth. PostgreSQL stores a most-common-values list plus an equi-depth histogram for the remaining values.
- Correlation: How closely physical order matches index order.
-- PostgreSQL: update statistics
ANALYZE users;
-- View statistics
SELECT * FROM pg_stats WHERE tablename = 'users';
Stale statistics → bad estimates → bad plans. Run ANALYZE after bulk operations.
Sampling
For large tables, statistics are computed from a random sample (not a full scan). PostgreSQL samples ~30,000 rows by default.
Materialized Views
Pre-computed query results stored as a table.
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT date_trunc('month', order_date) AS month, SUM(amount) AS revenue
FROM orders GROUP BY month;
-- Refresh when data changes
REFRESH MATERIALIZED VIEW monthly_revenue;
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue; -- without locking reads (requires a unique index on the view)
Trade-off: Fast reads (pre-computed) but stale data (until refreshed) and storage cost.
Query Plan Caching
Most databases cache plans for prepared statements, avoiding the cost of re-parsing and re-optimizing repeated queries.
PREPARE get_user(INT) AS SELECT * FROM users WHERE id = $1;
EXECUTE get_user(42);
Plan invalidation: Cached plans are invalidated when schema changes, statistics are updated, or the plan is deemed suboptimal for a new parameter value.
Parallel Query Execution
Modern databases parallelize query execution across multiple CPU cores.
Gather
├── Partial Agg
│   └── Seq Scan (part 1)
└── Partial Agg
    └── Seq Scan (part 2)
PostgreSQL: Parallel seq scan, parallel hash join, parallel sort, parallel aggregate. Controlled by max_parallel_workers_per_gather.
When parallel helps: Large scans, aggregations, hash joins. When it doesn't: small tables, index lookups, heavy locking.
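The Gather-over-partial-aggregates shape can be mimicked in Python. This illustrates the plan shape only: threads stand in for the separate worker processes a real engine launches, and Python's GIL means the sketch gains no true CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

rows = list(range(1, 1_000_001))  # the "table"

def partial_sum(chunk):
    # Partial Aggregate: each worker aggregates its own slice of the scan.
    return sum(chunk)

chunks = [rows[i::4] for i in range(4)]  # split the scan across 4 "workers"
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, chunks))

total = sum(partials)  # Gather + finalize: combine the partial aggregates
```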
Applications in CS
- Performance tuning: EXPLAIN ANALYZE is the #1 tool. Understanding operator costs helps choose between index scan vs seq scan, hash join vs merge join.
- Query writing: Knowing how the database executes queries helps write more efficient SQL (avoid correlated subqueries when possible, use EXISTS vs IN appropriately).
- Schema design: Index selection and table structure should be guided by query patterns.
- Database administration: Monitor slow queries, update statistics, tune parallel execution settings.