B-Tree Indexes
The Default Index Type
When you run CREATE INDEX without specifying a type, Postgres creates a B-tree index. It is the right choice for the vast majority of queries: equality, range, sorting, and prefix matching. Understanding how B-trees work in Postgres explains when they help, when they do not, and how to make them work harder.
-- These all create B-tree indexes
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_orders_created ON orders (created_at);
CREATE UNIQUE INDEX idx_users_email_uniq ON users (email);
How B-Trees Work in Postgres
A B-tree is a balanced tree structure where every leaf page is the same depth from the root. Postgres B-trees are technically B+trees: all data entries live in leaf pages, and internal pages contain only separator keys and child pointers.
The structure:
- Root page: A single page at the top. Contains pointers to child pages.
- Internal pages: Intermediate levels. Each entry is a key and a pointer to a child page.
- Leaf pages: The bottom level. Each entry is a key and a pointer (TID) to a heap tuple. Leaf pages are doubly linked for range scans.
For a table with 10 million rows, a B-tree index typically has 3-4 levels. Finding a row requires 3-4 page reads. With shared buffers, the upper levels stay cached, so most lookups need only 1-2 disk reads.
What B-Trees Support
-- Equality
SELECT * FROM users WHERE email = 'alice@example.com';
-- Range
SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
-- Sorting
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20;
-- Prefix matching (text)
SELECT * FROM users WHERE name LIKE 'Ali%';
-- IS NULL / IS NOT NULL
SELECT * FROM users WHERE deleted_at IS NULL;
-- MIN/MAX (index-only)
SELECT min(created_at) FROM orders;
What B-Trees Do Not Support
-- Suffix or infix matching (use pg_trgm GIN index instead)
SELECT * FROM users WHERE name LIKE '%smith%';
-- Array containment (use GIN)
SELECT * FROM products WHERE tags @> ARRAY['postgres'];
-- JSONB containment (use GIN)
SELECT * FROM events WHERE payload @> '{"type": "click"}';
Single-Column Indexes
The simplest and most common index:
CREATE INDEX idx_orders_user_id ON orders (user_id);
Use this when you frequently filter or join on a single column. Check if the index is being used:
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
Index Scan using idx_orders_user_id on orders (cost=0.43..8.45 rows=5 width=64)
Index Cond: (user_id = 42)
Composite Indexes
A composite (multi-column) index covers queries that filter on multiple columns:
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
Column order matters. A composite index on (user_id, status) efficiently supports:
-- Uses the index (prefix match)
WHERE user_id = 42
-- Uses the index (both columns)
WHERE user_id = 42 AND status = 'pending'
-- Can use the index for the equality, then scan for the range
WHERE user_id = 42 AND status IN ('pending', 'processing')
But it does NOT efficiently support:
-- Cannot use the index (no leading column)
WHERE status = 'pending'
Think of a composite index like a phone book sorted by last name, then first name. You can look up all Smiths, or all John Smiths, but you cannot efficiently find all Johns regardless of last name.
Choosing Column Order
Put the most selective equality column first:
-- If queries always filter on user_id with equality and status varies
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
-- If queries filter on status with equality and range on created_at
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
For range queries, put the range column last. Equality filters narrow the B-tree search precisely, while range filters scan a portion of the leaf level.
Covering Indexes (INCLUDE)
A covering index includes additional columns that are not part of the search key but are needed by the query. This enables index-only scans, where Postgres reads the answer entirely from the index without touching the table.
CREATE INDEX idx_orders_user_covering ON orders (user_id)
INCLUDE (status, created_at);
-- Index-only scan: all needed columns are in the index
EXPLAIN SELECT user_id, status, created_at
FROM orders
WHERE user_id = 42;
Index Only Scan using idx_orders_user_covering on orders (cost=0.43..4.45 rows=5 width=24)
Index Cond: (user_id = 42)
INCLUDE columns are stored in leaf pages but not in internal pages, so they do not affect the tree structure or search performance. They do increase index size.
Index-Only Scans
An index-only scan avoids heap (table) access entirely. It requires:
- All columns the query needs are in the index (search key + INCLUDE).
- The visibility map confirms all tuples on the heap page are visible (so the index does not need to check the heap for MVCC visibility).
-- Check if index-only scans are working
SELECT relname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'idx_orders_user_covering';
A high idx_tup_read with low idx_tup_fetch indicates index-only scans are hitting. If idx_tup_fetch is close to idx_tup_read, Postgres is falling back to heap fetches (run VACUUM to update the visibility map).
Unique Indexes
CREATE UNIQUE INDEX idx_users_email ON users (email);
A unique index enforces uniqueness and serves as a regular B-tree index for queries. Primary keys automatically create a unique index.
Partial unique indexes enforce uniqueness over a subset of rows:
-- Email must be unique among non-deleted users
CREATE UNIQUE INDEX idx_users_email_active ON users (email)
WHERE deleted_at IS NULL;
Partial Indexes
A partial index covers only rows matching a predicate. This is one of the most powerful and underused features in Postgres.
-- Only index pending orders (a small fraction of all orders)
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
This index is tiny compared to indexing all orders. Queries that match the predicate use it:
-- Uses the partial index
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;
-- Does NOT use the partial index
SELECT * FROM orders WHERE status = 'shipped' ORDER BY created_at;
Common use cases for partial indexes:
-- Active users only
CREATE INDEX idx_users_active ON users (email) WHERE is_active;
-- Non-null values only (skip rows where the column is NULL)
CREATE INDEX idx_orders_shipped ON orders (shipped_at) WHERE shipped_at IS NOT NULL;
-- Unprocessed queue items
CREATE INDEX idx_jobs_pending ON jobs (created_at) WHERE processed_at IS NULL;
The Index That Does Nothing
Not every column benefits from an index. An index hurts when:
Low Selectivity
-- Boolean column with 50/50 distribution: index is useless
CREATE INDEX idx_users_active ON users (is_active);
-- Seq scan is faster when 50% of rows match
A B-tree index helps when the query selects a small fraction of rows (typically under 10-15%). For boolean columns, use a partial index instead:
CREATE INDEX idx_users_inactive ON users (id) WHERE NOT is_active;
Small Tables
Tables under a few thousand rows do not benefit from indexes for most queries. A sequential scan of 10 pages is faster than an index lookup plus heap fetch.
Write-Heavy Tables with Rare Reads
Every index adds overhead to INSERT, UPDATE, and DELETE. If a table receives thousands of writes per second but is rarely queried, indexes slow the writes without providing benefit.
-- Check if an index is actually used
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan;
An index with idx_scan = 0 after weeks of production use is a candidate for removal.
Index Sort Direction
B-tree indexes have a sort direction. By default, they sort ascending with nulls last:
-- Default: ASC NULLS LAST
CREATE INDEX idx_orders_created ON orders (created_at);
-- Explicit descending for "most recent first" queries
CREATE INDEX idx_orders_created_desc ON orders (created_at DESC);
For single-column indexes, sort direction rarely matters because Postgres can scan a B-tree in either direction. For composite indexes, it matters when the query sorts columns in different directions:
-- Query needs: ORDER BY user_id ASC, created_at DESC
CREATE INDEX idx_orders_user_created ON orders (user_id ASC, created_at DESC);
Common Pitfalls
- Indexing every column. Each index slows writes and consumes disk space. Index only columns that appear in WHERE, JOIN, and ORDER BY clauses of frequent queries.
- Wrong column order in composite indexes. The leading column must match the query's most selective equality filter. A composite index on (a, b) does not help queries that only filter on b.
- Ignoring partial indexes. If you only query a subset of rows (active users, pending orders, recent data), a partial index is smaller and faster.
- Not using INCLUDE for covering indexes. Adding frequently selected columns to INCLUDE avoids heap access and can dramatically speed up queries.
- Creating indexes on foreign key columns... and then also not creating them. Always index foreign key columns. But do not create a separate single-column index if the column is already the leading column of a composite index.
- Not checking if indexes are used. pg_stat_user_indexes tells you how often each index is scanned. Unused indexes are pure overhead.
Key Takeaways
- B-tree is the default and correct choice for equality, range, sorting, and prefix matching.
- Column order in composite indexes matters. Put equality columns first, range columns last.
- Covering indexes (INCLUDE) enable index-only scans that skip heap access entirely.
- Partial indexes cover only the rows you actually query, reducing size and improving performance.
- Every index costs write performance. Monitor pg_stat_user_indexes and remove unused indexes.
- Unique indexes enforce constraints and serve double duty as query indexes.