Index Maintenance
Why Indexes Need Maintenance
Indexes are not static structures. Every INSERT adds entries. Every DELETE leaves dead entries. Every UPDATE is a delete-then-insert at the index level. Over time, indexes accumulate bloat: wasted space from dead entries that have not been reclaimed. Bloated indexes are larger than necessary, slower to scan, and consume more shared buffer cache.
A disciplined approach to index maintenance keeps query performance consistent and storage costs reasonable.
Index Bloat
When Postgres deletes an index entry (because the underlying row was deleted or updated), the space is not immediately reclaimed. The entry is marked as dead, and VACUUM eventually marks the space as reusable. But reusable space is not the same as compacted space. The index pages remain allocated even when half-empty.
Measuring Index Bloat
The pgstattuple extension provides precise bloat measurements:
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstatindex('idx_orders_user_id');
version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+-------------------
4 | 2 | 214302720 | 412 | 89 | 26124 | 0 | 0 | 68.42 | 12
Key metrics:
- avg_leaf_density: Percentage of leaf page space that is used. Below 70% indicates significant bloat.
- leaf_fragmentation: Percentage of leaf pages out of logical order. High fragmentation means range scans read pages non-sequentially.
A quick estimate without the extension:
SELECT
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS scans,
idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
Compare the index size to the table size and row count. An index that is larger than the table itself is almost certainly bloated.
What Causes Bloat
- High UPDATE rate on indexed columns. Each update creates a new index entry and leaves the old one dead.
- High DELETE rate. Deleted rows leave dead index entries until VACUUM runs.
- Autovacuum falling behind. If dead entries accumulate faster than VACUUM reclaims them, bloat grows.
- Long-running transactions. They prevent VACUUM from cleaning up dead entries visible to the transaction's snapshot.
REINDEX
REINDEX rebuilds an index from scratch, producing a compact, defragmented index:
-- Reindex a single index
REINDEX INDEX idx_orders_user_id;
-- Reindex all indexes on a table
REINDEX TABLE orders;
-- Reindex everything in a database
REINDEX DATABASE mydb;
The problem: REINDEX takes an exclusive lock on the table. No reads or writes can proceed while the index is being rebuilt. For a large index, this can take minutes to hours.
REINDEX CONCURRENTLY
Postgres 12+ supports concurrent reindexing:
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
This builds a new index alongside the old one, then swaps them atomically. The table remains fully accessible during the rebuild. The trade-off is that it takes longer (roughly 2x the work) and requires extra disk space for the temporary duplicate index.
-- Reindex all indexes on a table concurrently
REINDEX TABLE CONCURRENTLY orders;
If a concurrent reindex fails partway through, it leaves behind an invalid index with a name like idx_orders_user_id_ccnew. Clean it up:
DROP INDEX idx_orders_user_id_ccnew;
Identifying Unused Indexes
Every index slows writes. An unused index is pure overhead. The pg_stat_user_indexes view tracks how often each index is scanned:
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
An index with idx_scan = 0 has never been used since the last statistics reset. Before dropping, verify:
- Check when stats were last reset.
SELECT stats_reset FROM pg_stat_database WHERE datname = current_database();
If stats were reset yesterday, an index that runs in a weekly report would show zero scans.
- Check for unique constraint enforcement. Unique indexes enforce constraints even if they show zero scans in
pg_stat_user_indexes. They are used implicitly during INSERT and UPDATE.
SELECT indexrelid::regclass, indisunique
FROM pg_index
WHERE indrelid = 'users'::regclass;
- Check if the index supports a foreign key. Dropping an index that backs a foreign key constraint causes cascading performance problems.
Safely Dropping an Index
If you are confident the index is unused:
-- First, make it invalid (prevents new queries from using it, but keeps it for existing plans)
-- Actually, just drop it:
DROP INDEX idx_orders_old_status;
If something breaks, recreate it. Keep the CREATE INDEX statement ready.
CREATE INDEX CONCURRENTLY
Building new indexes on production tables should always use CONCURRENTLY:
-- Standard: locks the table for writes during the entire build
CREATE INDEX idx_orders_email ON orders (email);
-- Concurrent: no lock, table remains writable
CREATE INDEX CONCURRENTLY idx_orders_email ON orders (email);
CREATE INDEX CONCURRENTLY scans the table twice and does not block INSERT, UPDATE, or DELETE operations. It takes longer than a standard index build but does not cause downtime.
Limitations:
- Cannot be run inside a transaction block.
- If it fails, it leaves an invalid index that you must drop manually.
- Uses more CPU and I/O than a standard build.
-- Check for invalid indexes (failed concurrent builds)
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE NOT indisvalid;
Index Size Monitoring
Track index sizes over time to detect bloat before it becomes a problem:
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
round(100.0 * pg_relation_size(indexrelid) / nullif(pg_relation_size(relid), 0), 1)
AS index_to_table_pct
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
Rules of thumb:
- A single-column B-tree index is typically 10-30% of the table size.
- If an index is larger than the table, it is bloated or the table has many indexes.
- GIN indexes are larger than B-tree indexes on the same data.
- BRIN indexes are tiny (often under 1MB even for very large tables).
Total Index Overhead
SELECT
relname AS table_name,
pg_size_pretty(pg_relation_size(oid)) AS table_size,
pg_size_pretty(pg_indexes_size(oid)) AS total_index_size,
(SELECT count(*) FROM pg_index WHERE indrelid = c.oid) AS index_count
FROM pg_class c
WHERE relkind = 'r' AND relnamespace = 'public'::regnamespace
ORDER BY pg_indexes_size(oid) DESC
LIMIT 10;
A table with 10+ indexes deserves scrutiny. Each index adds write overhead and consumes buffer cache.
The Quarterly Index Review
Schedule a regular review of indexes across your database. The process:
1. Find Unused Indexes
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan < 10 -- fewer than 10 scans since stats reset
AND NOT EXISTS (
SELECT 1 FROM pg_index
WHERE pg_index.indexrelid = pg_stat_user_indexes.indexrelid
AND indisunique
)
ORDER BY pg_relation_size(indexrelid) DESC;
2. Find Duplicate Indexes
Indexes that cover the same columns in the same order are redundant:
SELECT
a.indexrelid::regclass AS index_1,
b.indexrelid::regclass AS index_2,
pg_size_pretty(pg_relation_size(a.indexrelid)) AS size_1,
pg_size_pretty(pg_relation_size(b.indexrelid)) AS size_2
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
AND a.indexrelid < b.indexrelid
AND a.indkey = b.indkey
WHERE a.indrelid::regclass::text NOT LIKE 'pg_%';
Also look for indexes that are strict prefixes of other indexes. An index on (user_id) is redundant if an index on (user_id, created_at) exists, since the composite index handles both use cases.
3. Find Bloated Indexes
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS current_size,
idx_scan,
CASE WHEN idx_scan > 0
THEN round(idx_tup_read::numeric / idx_scan, 0)
ELSE 0
END AS avg_tuples_per_scan
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 100 * 1024 * 1024 -- over 100MB
ORDER BY pg_relation_size(indexrelid) DESC;
For indexes over 100MB, run pgstattuple to check leaf density. Below 60% warrants a REINDEX CONCURRENTLY.
4. Act
- Drop unused indexes (after verifying they are not enforcing constraints).
- Drop duplicate indexes.
- REINDEX CONCURRENTLY bloated indexes.
- Document what you changed and why.
pg_repack: Online Table & Index Compaction
For severe table bloat (VACUUM FULL requires an exclusive lock), pg_repack rebuilds tables and indexes online:
pg_repack -d mydb -t orders
pg_repack -d mydb -i idx_orders_user_id
pg_repack creates a new copy of the table/index, replays any changes that occurred during the copy, and swaps atomically. No exclusive lock on the table. This is the production-safe alternative to VACUUM FULL and REINDEX.
Common Pitfalls
- Never reviewing indexes after initial creation. Access patterns change. Indexes created at launch may be useless six months later.
- Using REINDEX instead of REINDEX CONCURRENTLY. Standard REINDEX locks the table. Always use CONCURRENTLY in production unless you have a maintenance window.
- Dropping unique indexes without understanding their constraint role. A unique index enforces data integrity. Dropping it allows duplicate values.
- Ignoring failed concurrent operations. Failed
CREATE INDEX CONCURRENTLYandREINDEX CONCURRENTLYleave invalid indexes behind. Monitor for them. - Not monitoring index sizes. Bloat grows silently. By the time queries slow down, indexes may be 3-5x their optimal size.
- Having too many indexes on write-heavy tables. Every INSERT updates every index. Five indexes means five index insertions per row. Evaluate whether each index justifies its write cost.
Key Takeaways
- Index bloat is inevitable on tables with updates and deletes. Monitor leaf density and index sizes.
- Use REINDEX CONCURRENTLY to rebuild bloated indexes without downtime.
- pg_stat_user_indexes reveals which indexes are used and which are wasted overhead. Check it regularly.
- CREATE INDEX CONCURRENTLY is the only safe way to add indexes to production tables.
- Schedule a quarterly index review: find unused, duplicate, and bloated indexes. Drop or rebuild as needed.
- pg_repack provides online compaction for both tables and indexes when VACUUM FULL is not an option.