4 min read
On this page

Essential Extensions

The Extensions That Turn Postgres into a Platform

PostgreSQL was designed to be extended. This is not a marketing claim. The extension system allows third parties to add data types, operators, index types, functions, and even entire query languages without forking the core database. The result is an ecosystem where Postgres absorbs capabilities that would normally require separate, specialized systems.

Installing an extension is a one-line operation:

CREATE EXTENSION pgvector;

That single command can turn Postgres from a relational database into a vector similarity search engine. The extension ecosystem is why the "just use Postgres" philosophy is not laziness. It is a pragmatic recognition that one database with the right extensions replaces five specialized tools.

pgvector adds vector data types and similarity search operators to Postgres. It is the reason you do not need Pinecone or Weaviate for most AI/ML embedding workloads.

CREATE EXTENSION vector;

CREATE TABLE documents (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title text NOT NULL,
    content text NOT NULL,
    embedding vector(1536)  -- OpenAI ada-002 produces 1536 dimensions
);

-- Insert a document with its embedding
INSERT INTO documents (title, content, embedding)
VALUES ('Postgres Guide', 'How to use PostgreSQL...', '[0.1, 0.2, ...]');

-- Find the 5 most similar documents to a query embedding
SELECT title, embedding <=> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;

pgvector supports three distance operators:

-- Cosine distance (most common for text embeddings)
SELECT * FROM documents ORDER BY embedding <=> query_vector LIMIT 5;

-- L2 (Euclidean) distance
SELECT * FROM documents ORDER BY embedding <-> query_vector LIMIT 5;

-- Inner product (for normalized vectors)
SELECT * FROM documents ORDER BY embedding <#> query_vector LIMIT 5;

Index types for fast approximate search:

-- IVFFlat index (faster to build, good for medium datasets)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- HNSW index (better recall, slower to build, best for most use cases)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

PostGIS: Geospatial Queries

PostGIS transforms Postgres into a full geospatial database. It adds geometry and geography types, spatial indexes, and hundreds of functions for spatial analysis.

CREATE EXTENSION postgis;

CREATE TABLE stores (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL,
    location geography(POINT, 4326) NOT NULL
);

INSERT INTO stores (name, location)
VALUES ('Downtown Coffee', ST_MakePoint(-73.9857, 40.7484));

-- Find stores within 2 km of a location
SELECT name,
       ST_Distance(location, ST_MakePoint(-73.9800, 40.7500)::geography) AS distance_m
FROM stores
WHERE ST_DWithin(location, ST_MakePoint(-73.9800, 40.7500)::geography, 2000)
ORDER BY distance_m;
-- Spatial index for fast proximity queries
CREATE INDEX idx_stores_location ON stores USING gist(location);

PostGIS handles points, lines, polygons, rasters, and topology. It supports coordinate transformations, spatial joins, and routing. If your application has any location component, PostGIS is the standard.

pg_trgm (trigram) provides similarity matching for text. It is the answer to "find me names that look like what the user typed, even if they misspelled it."

CREATE EXTENSION pg_trgm;

-- Similarity between two strings (0 to 1)
SELECT similarity('PostgreSQL', 'Postgress');
-- Returns ~0.53

-- Find similar product names
SELECT name, similarity(name, 'Postgress') AS sim
FROM products
WHERE similarity(name, 'Postgress') > 0.3
ORDER BY sim DESC;

-- GIN index for fast trigram lookups
CREATE INDEX idx_products_name_trgm ON products USING gin(name gin_trgm_ops);

-- Use the LIKE or ILIKE operator with index support
SELECT * FROM products WHERE name ILIKE '%postgres%';

Without pg_trgm, ILIKE '%search%' requires a sequential scan. With a trigram GIN index, it uses the index. This is the simplest way to make search boxes fast.

pg_stat_statements: Query Performance Tracking

pg_stat_statements tracks execution statistics for every unique query that runs against your database. It is the single most important extension for performance work.

CREATE EXTENSION pg_stat_statements;
# postgresql.conf (requires restart)
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
-- Top 10 queries by total execution time
SELECT query,
       calls,
       total_exec_time::numeric(12,2) AS total_ms,
       mean_exec_time::numeric(12,2) AS mean_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
                    query                     | calls  | total_ms  | mean_ms | rows
----------------------------------------------+--------+-----------+---------+--------
 SELECT * FROM orders WHERE customer_id = $1  | 892341 | 234561.23 |    0.26 | 892341
 SELECT * FROM products WHERE category = $1   |  45123 | 189234.56 |    4.19 | 451230

This is how you find your slow queries, your most-called queries, and the queries that dominate database CPU time. Without pg_stat_statements, you are guessing.

pgcrypto: Encryption Functions

pgcrypto provides cryptographic functions inside the database. It handles hashing, encryption, and random data generation.

CREATE EXTENSION pgcrypto;

-- Hash a password with bcrypt
INSERT INTO users (email, password_hash)
VALUES ('user@example.com', crypt('their_password', gen_salt('bf', 10)));

-- Verify a password
SELECT id FROM users
WHERE email = 'user@example.com'
  AND password_hash = crypt('their_password', password_hash);

-- Encrypt sensitive data
UPDATE patients SET
    ssn_encrypted = pgp_sym_encrypt(ssn, 'encryption_key_here');

-- Decrypt when needed
SELECT pgp_sym_decrypt(ssn_encrypted, 'encryption_key_here') AS ssn
FROM patients
WHERE id = 42;

For UUIDs, modern Postgres (13+) has gen_random_uuid() built in. For older versions, use the uuid-ossp extension:

-- Built-in (Postgres 13+)
SELECT gen_random_uuid();

-- uuid-ossp extension (older versions or v1/v4 UUIDs)
CREATE EXTENSION "uuid-ossp";
SELECT uuid_generate_v4();

TimescaleDB: Time Series

TimescaleDB extends Postgres with hypertables that automatically partition time-series data. It adds continuous aggregates, compression, and time-series-specific query functions.

-- After installing TimescaleDB
CREATE TABLE sensor_data (
    time timestamptz NOT NULL,
    sensor_id int NOT NULL,
    temperature double precision,
    humidity double precision
);

-- Convert to a hypertable (automatic time-based partitioning)
SELECT create_hypertable('sensor_data', 'time');

-- Query with time-series functions
SELECT time_bucket('1 hour', time) AS hour,
       avg(temperature) AS avg_temp,
       max(temperature) AS max_temp
FROM sensor_data
WHERE time > now() - interval '7 days'
GROUP BY hour
ORDER BY hour;

TimescaleDB handles the partitioning, chunk management, and retention policies that you would otherwise build manually. For serious time-series workloads (IoT, metrics, financial data), it is the standard Postgres extension.

Installing & Managing Extensions

-- List available extensions
SELECT * FROM pg_available_extensions WHERE name LIKE 'pg%';

-- List installed extensions
SELECT * FROM pg_extension;

-- Install an extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Upgrade an extension after a Postgres upgrade
ALTER EXTENSION pg_trgm UPDATE;

-- Remove an extension
DROP EXTENSION pg_trgm;

On managed Postgres services (RDS, Cloud SQL, Supabase), available extensions are curated by the provider. Check your provider's documentation for supported extensions and versions.

Common Pitfalls

  • Not installing pg_stat_statements from day one. This is the first extension every Postgres deployment should enable. Without it, you have no visibility into query performance.
  • Using pgcrypto for password hashing when your application layer should handle it. Database-level hashing means the plaintext password crosses the network to the database. Prefer hashing in the application with bcrypt or argon2.
  • Creating pgvector indexes too early. Build and load your data first, then create the HNSW or IVFFlat index. Building the index on an empty table produces poor results.
  • Assuming all extensions are available on managed services. RDS, Cloud SQL, and other managed offerings support a subset of extensions. Verify availability before depending on an extension.
  • Ignoring extension version compatibility. Extensions must be compatible with your Postgres major version. Plan extension updates as part of Postgres upgrades.
  • Overloading Postgres with too many extensions. Each extension adds complexity. Only install what you need. Five well-chosen extensions are better than fifteen you do not understand.

Key Takeaways

  • pgvector makes Postgres a capable vector similarity search engine for AI/ML embedding workloads. For most use cases, you do not need a dedicated vector database.
  • PostGIS is the gold standard for geospatial queries in any database, not just Postgres.
  • pg_trgm provides fuzzy text matching and makes ILIKE queries index-backed. Essential for any user-facing search.
  • pg_stat_statements is non-negotiable for production. It provides the query performance visibility that every other optimization depends on.
  • pgcrypto handles encryption and hashing inside the database. Use it for data-at-rest encryption, but prefer application-level password hashing.
  • TimescaleDB turns Postgres into a competitive time-series database with automatic partitioning and compression.
  • The extension ecosystem is what makes Postgres a platform rather than just a database. Choose your extensions deliberately and manage them as part of your infrastructure.