6 min read
On this page

Specialized Indexes

Beyond B-Tree

B-tree indexes handle most workloads, but Postgres offers specialized index types optimized for specific access patterns. Choosing the right index type can mean the difference between a query that takes seconds and one that takes milliseconds.

GIN: Generalized Inverted Index

GIN indexes are designed for values that contain multiple elements: arrays, JSONB documents, full-text search vectors, and other composite types. A GIN index creates an entry for each element within a value, pointing back to the rows that contain that element.

GIN for JSONB

CREATE TABLE products (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL,
    attributes jsonb NOT NULL DEFAULT '{}'
);

-- Default GIN operator class
CREATE INDEX idx_products_attrs ON products USING gin(attributes);

-- Supports containment queries
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';
SELECT * FROM products WHERE attributes ? 'wireless';
SELECT * FROM products WHERE attributes ?| array['wireless', 'bluetooth'];

The jsonb_path_ops operator class is smaller and faster but only supports the @> operator:

CREATE INDEX idx_products_attrs_pathops ON products
    USING gin(attributes jsonb_path_ops);

-- Supported
SELECT * FROM products WHERE attributes @> '{"color": "blue", "size": "L"}';

-- NOT supported by jsonb_path_ops (use default operator class)
SELECT * FROM products WHERE attributes ? 'color';

GIN for Arrays

CREATE TABLE articles (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title text NOT NULL,
    tags text[] NOT NULL DEFAULT '{}'
);

CREATE INDEX idx_articles_tags ON articles USING gin(tags);

-- Array containment
SELECT * FROM articles WHERE tags @> ARRAY['postgres'];

-- Array overlap (any element matches)
SELECT * FROM articles WHERE tags && ARRAY['postgres', 'mysql'];
ALTER TABLE articles ADD COLUMN search_vector tsvector;

UPDATE articles SET search_vector =
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''));

CREATE INDEX idx_articles_fts ON articles USING gin(search_vector);

SELECT title, ts_rank(search_vector, q) AS rank
FROM articles, to_tsquery('english', 'replication & failover') AS q
WHERE search_vector @@ q
ORDER BY rank DESC;

GIN for Trigram Similarity

The pg_trgm extension enables LIKE and ILIKE queries to use GIN indexes:

CREATE EXTENSION pg_trgm;

CREATE INDEX idx_users_name_trgm ON users USING gin(name gin_trgm_ops);

-- Now these use the index
SELECT * FROM users WHERE name LIKE '%smith%';
SELECT * FROM users WHERE name ILIKE '%smith%';
SELECT * FROM users WHERE name % 'Jon Smith';  -- similarity match

Without pg_trgm, LIKE '%smith%' always triggers a sequential scan. With a GIN trigram index, it becomes an index scan.

GIN Trade-Offs

GIN indexes are:

  • Excellent for read-heavy workloads. Lookup speed is fast.
  • Slower to build and update. Inserting into a GIN index is more expensive than B-tree because each indexed value may generate many index entries.
  • Larger than B-trees for the same data because of the inverted structure.

The fastupdate option (enabled by default) defers GIN index updates to a pending list, which is merged into the main index periodically. This speeds up writes but makes the first read after many writes slower.

-- Disable fastupdate if read latency consistency matters more than write speed
CREATE INDEX idx_articles_tags ON articles USING gin(tags) WITH (fastupdate = off);

GiST: Generalized Search Tree

GiST is a framework for building balanced tree indexes over data types that do not fit the sorted-order model of B-trees. It supports geometric data, range types, nearest-neighbor queries, and more.

GiST for Range Types

CREATE TABLE reservations (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    room_id integer NOT NULL,
    during tstzrange NOT NULL,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

CREATE INDEX idx_reservations_during ON reservations USING gist(during);

-- Find overlapping reservations
SELECT * FROM reservations
WHERE during && tstzrange('2024-03-15 14:00', '2024-03-15 16:00');

The EXCLUDE constraint with a GiST index prevents double-booking: no two rows can have the same room_id with overlapping during ranges. This is something B-tree indexes cannot enforce.

GiST for PostGIS Geometry

CREATE EXTENSION postgis;

CREATE TABLE locations (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL,
    geom geometry(Point, 4326) NOT NULL
);

CREATE INDEX idx_locations_geom ON locations USING gist(geom);

-- Find locations within 1km of a point
SELECT name, ST_Distance(
    geom::geography,
    ST_MakePoint(-73.9857, 40.7484)::geography
) AS distance_m
FROM locations
WHERE ST_DWithin(
    geom::geography,
    ST_MakePoint(-73.9857, 40.7484)::geography,
    1000
)
ORDER BY distance_m;

GiST for Nearest-Neighbor Queries

GiST supports the <-> distance operator for K-nearest-neighbor (KNN) searches:

-- Find the 5 closest locations (KNN uses the GiST index directly)
SELECT name, geom <-> ST_MakePoint(-73.9857, 40.7484)::geometry AS distance
FROM locations
ORDER BY geom <-> ST_MakePoint(-73.9857, 40.7484)::geometry
LIMIT 5;

This is an index-driven sort. Postgres does not compute distances for all rows and then sort. It walks the GiST index to find the nearest entries directly.

GiST for inet/cidr Types

CREATE TABLE ip_blocks (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    network cidr NOT NULL,
    owner text NOT NULL
);

CREATE INDEX idx_ip_blocks_network ON ip_blocks USING gist(network inet_ops);

-- Find which block contains an IP
SELECT owner FROM ip_blocks WHERE network >>= '192.168.1.100'::inet;

GiST Trade-Offs

GiST indexes are:

  • Lossy for some data types. The index stores bounding boxes or summaries, and Postgres must recheck the heap for exact matches.
  • Slower than B-trees for simple equality/range. Use B-tree when it suffices. GiST is for access patterns B-tree cannot handle.
  • Required for exclusion constraints. EXCLUDE USING gist is the only way to enforce non-overlapping ranges.

BRIN: Block Range Index

BRIN indexes store summary information (min/max values) for each range of physical table blocks. They are tiny compared to B-tree indexes and are effective when column values are naturally correlated with physical storage order.

BRIN for Time-Series Data

When rows are inserted chronologically, created_at values are naturally ordered on disk. BRIN is ideal for this:

CREATE TABLE sensor_readings (
    id bigint GENERATED ALWAYS AS IDENTITY,
    sensor_id integer NOT NULL,
    reading double precision NOT NULL,
    recorded_at timestamptz NOT NULL
) PARTITION BY RANGE (recorded_at);

CREATE INDEX idx_readings_time ON sensor_readings USING brin(recorded_at);

How BRIN Works

BRIN divides the table into block ranges (default: 128 pages per range). For each range, it stores the minimum and maximum value of the indexed column. When a query filters on the column, BRIN checks which block ranges could possibly contain matching rows and skips the rest.

-- BRIN index is tiny
SELECT pg_size_pretty(pg_relation_size('idx_readings_time'));
 pg_size_pretty
----------------
 48 kB

A B-tree index on the same column might be 500MB. BRIN achieves this compression because it stores one summary per 128 pages instead of one entry per row.

When BRIN Works Poorly

BRIN is useless when column values are not correlated with physical order. If you insert rows with random timestamps (not chronological), every block range contains the full timestamp range, and BRIN cannot skip anything.

-- Check physical correlation
SELECT correlation FROM pg_stats
WHERE tablename = 'sensor_readings' AND attname = 'recorded_at';
 correlation
-------------
       0.998

A correlation near 1.0 or -1.0 means BRIN will be effective. A correlation near 0 means BRIN is useless; use B-tree instead.

Tuning BRIN

-- Smaller ranges = more precise but larger index
CREATE INDEX idx_readings_time ON sensor_readings
    USING brin(recorded_at) WITH (pages_per_range = 32);

Smaller pages_per_range values make the index more selective (fewer false positives) but increase index size. The default of 128 is a good starting point.

Hash Indexes

Hash indexes support only equality comparisons. They are smaller than B-trees for pure equality lookups but do not support range queries, sorting, or partial matching.

CREATE INDEX idx_sessions_token ON sessions USING hash(session_token);

-- Uses the hash index
SELECT * FROM sessions WHERE session_token = 'abc123xyz';

-- Cannot use the hash index
SELECT * FROM sessions WHERE session_token LIKE 'abc%';
SELECT * FROM sessions ORDER BY session_token;

Since Postgres 10, hash indexes are WAL-logged and crash-safe. Before that, they were not durable. In practice, hash indexes are rarely worth it. B-tree indexes handle equality efficiently, and their additional capabilities (range, sort) usually justify the small size premium.

Hash indexes are worth considering when:

  • The column is large (long text values) and you only need equality
  • Index size is a critical constraint
  • You never need range or sort on that column

Bloom Indexes

Bloom indexes use a probabilistic data structure (Bloom filter) to support equality queries across multiple columns simultaneously:

CREATE EXTENSION bloom;

CREATE INDEX idx_logs_bloom ON logs USING bloom (source, level, host)
    WITH (length = 80, col1 = 2, col2 = 2, col3 = 2);

-- Can use the index for any combination of these columns
SELECT * FROM logs WHERE source = 'web' AND level = 'error';
SELECT * FROM logs WHERE host = 'prod-01';
SELECT * FROM logs WHERE source = 'api' AND host = 'prod-02';

A Bloom index is much smaller than creating individual B-tree indexes on each column combination. It is useful for ad-hoc filtering across many columns. However, it has false positives: the index may suggest rows match when they do not, requiring a heap recheck.

Choosing the Right Index Type

Access Pattern Index Type
Equality, range, sort B-tree
JSONB containment, key existence GIN
Array containment, overlap GIN
Full-text search GIN
LIKE '%pattern%' GIN (pg_trgm)
Geometric/spatial queries GiST
Range type overlap, exclusion GiST
Nearest-neighbor (KNN) GiST
Time-series on ordered data BRIN
Large table, monotonic column BRIN
Pure equality on large values Hash
Ad-hoc multi-column equality Bloom

Common Pitfalls

  • Using B-tree for JSONB containment queries. B-tree indexes on extracted JSONB fields work for specific field equality, but containment operators (@>, ?, ?|) require GIN.
  • Using BRIN on unordered columns. BRIN depends on physical correlation. Check pg_stats.correlation before creating a BRIN index.
  • Creating too many GIN indexes. Each GIN index adds significant write overhead. Be selective about which columns get GIN indexes.
  • Forgetting that GiST indexes can be lossy. Some GiST operator classes return false positives. Postgres rechecks automatically, but this adds overhead.
  • Using hash indexes out of habit. In almost all cases, B-tree is a better choice. Hash indexes save a small amount of space at the cost of losing range and sort support.
  • Not using pg_trgm for LIKE queries. Without it, LIKE '%pattern%' is always a sequential scan. The pg_trgm extension with a GIN index fixes this.

Key Takeaways

  • GIN is for multi-element values: JSONB, arrays, full-text search, and trigram matching. It is read-optimized and write-expensive.
  • GiST is for geometric data, range types, nearest-neighbor queries, and exclusion constraints. It handles access patterns that B-tree cannot.
  • BRIN is for large tables with naturally ordered data. It produces tiny indexes but only works when physical and logical order correlate.
  • Hash indexes are niche: pure equality on large values where B-tree size is a concern.
  • Bloom indexes support ad-hoc multi-column equality queries with a single compact index.
  • Always verify your access pattern matches the index type. The wrong index type is worse than no index.