5 min read
On this page

Full-Text Search

When Postgres Search Is Enough

Most applications do not need Elasticsearch. They need a search box that returns relevant results quickly. PostgreSQL has built-in full-text search that handles stemming, ranking, phrase matching, and custom dictionaries. It runs inside your existing database, requires no additional infrastructure, and scales to millions of documents with proper indexing.

The question is not "should I use Elasticsearch?" The question is "have I tried Postgres full-text search first?" For the vast majority of applications, the answer is that Postgres FTS is more than sufficient.

tsvector & tsquery: The Core Types

Full-text search in Postgres revolves around two data types:

  • tsvector: a sorted list of normalized words (lexemes) extracted from a document
  • tsquery: a search predicate with boolean operators
-- Convert text to a tsvector
SELECT to_tsvector('english', 'PostgreSQL is a powerful relational database system');
 'databas':6 'power':4 'postgresql':1 'relat':5 'system':7

Notice what happened: "PostgreSQL" became "postgresql" (lowercased), "powerful" became "power" (stemmed), "relational" became "relat" (stemmed), and "is" and "a" were removed (stop words). Each lexeme has a position number.

-- Convert a search to a tsquery
SELECT to_tsquery('english', 'powerful & database');
 'power' & 'databas'

The same stemming applies to the query. Searching for "powerful" matches documents containing "power", "powerful", "powers", and "empowered."

Building a Search Column

The standard pattern is to add a tsvector column and keep it updated:

-- Add a search vector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;

-- Populate it from title and body (title weighted higher)
UPDATE articles SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B');

-- Create a GIN index for fast search
CREATE INDEX idx_articles_search ON articles USING gin(search_vector);

-- Keep it updated with a trigger
CREATE OR REPLACE FUNCTION articles_search_update() RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_articles_search
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION articles_search_update();

The weight categories (A, B, C, D) let you boost matches in titles over body text. Weight A has the highest relevance by default.

Querying with tsquery

-- Basic search: find articles matching "postgres" AND "replication"
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & replication') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- OR search: "postgres" OR "mysql"
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgres | mysql');

-- NOT search: "database" but NOT "nosql"
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & !nosql');

-- Phrase search: "full text search" as a phrase
SELECT title
FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'full text search');

-- Prefix search: words starting with "post"
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'post:*');

The @@ operator is the match operator. It returns true if the tsvector matches the tsquery.

Ranking Results

Postgres provides two ranking functions:

-- ts_rank: standard relevance ranking
SELECT title,
       ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- ts_rank_cd: cover density ranking (considers proximity of matched terms)
SELECT title,
       ts_rank_cd(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Ranking respects weights. A match in a weight-A field (like title) ranks higher than the same match in a weight-B field (like body).

-- Custom weight array: {D, C, B, A} weights
-- Make title matches (A) 10x more important than body (B)
SELECT title,
       ts_rank('{0.1, 0.2, 1.0, 10.0}', search_vector, query) AS rank
FROM articles, to_tsquery('english', 'replication') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Highlighting Matches

ts_headline highlights matching terms in the original text, which is essential for displaying search results:

SELECT title,
       ts_headline('english', body, to_tsquery('english', 'replication'),
           'StartSel=<b>, StopSel=</b>, MaxWords=50, MinWords=20') AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'replication')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'replication')) DESC
LIMIT 10;
 title          |                          snippet
----------------+-------------------------------------------------------------
 Replication    | ...streaming <b>replication</b> sends WAL records to
                | standby servers. Physical <b>replication</b> copies...

ts_headline is expensive because it processes the original text, not the tsvector. Do not call it on thousands of rows. Apply it only to the final result set after filtering and limiting.

Language Configurations

Postgres ships with configurations for many languages. Each configuration defines how text is parsed, which stop words are removed, and how stemming works.

-- List available configurations
SELECT cfgname FROM pg_ts_config;

-- Use a specific language
SELECT to_tsvector('french', 'Les bases de donnees relationnelles');
SELECT to_tsvector('german', 'Die relationale Datenbank');
SELECT to_tsvector('spanish', 'La base de datos relacional');

-- simple configuration: no stemming, no stop words
SELECT to_tsvector('simple', 'PostgreSQL is a powerful database');
 -- 'simple' keeps everything, just lowercases
 'a':3 'database':6 'is':2 'postgresql':1 'powerful':5

Use the 'simple' configuration when you do not want stemming (for example, searching product SKUs or codes).

GIN Indexes for Search Performance

Without a GIN index, every full-text search query requires a sequential scan. With a GIN index, Postgres uses the index to find matching documents directly.

-- Standard GIN index on a tsvector column
CREATE INDEX idx_search ON articles USING gin(search_vector);

-- GIN index with fastupdate disabled (faster reads, slower writes)
CREATE INDEX idx_search ON articles USING gin(search_vector)
    WITH (fastupdate = off);

GIN indexes can be large. For a table with 10 million rows, the GIN index on a tsvector column might be 2-5 GB. This is the trade-off for fast search.

-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_search'));

When Postgres FTS Is Not Enough

Postgres full-text search has limits. Consider Elasticsearch, Typesense, or Meilisearch when you need:

  • Typo tolerance across all queries (pg_trgm handles this partially, but dedicated search engines do it better)
  • Faceted search with complex aggregations (count by category, price range, brand simultaneously)
  • Real-time indexing at extreme write throughput (millions of documents per second)
  • Distributed search across a cluster (Postgres FTS runs on a single server)
  • Sophisticated relevance tuning (Elasticsearch BM25 tuning, function scores, decay functions)

The hybrid approach works well: store data in Postgres as the source of truth, index it in Elasticsearch for search, and keep them synchronized with a change data capture pipeline or periodic reindex.

-- The hybrid pattern: Postgres for storage, search results return IDs
-- 1. Search engine returns matching document IDs
-- 2. Fetch full documents from Postgres
SELECT * FROM articles WHERE id = ANY(ARRAY[42, 17, 89, 3, 56]);

A Complete Search Implementation

-- Full working example
CREATE TABLE products (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL,
    description text,
    category text NOT NULL,
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'C')
    ) STORED
);

CREATE INDEX idx_products_search ON products USING gin(search_vector);

-- Search with ranking and highlighting
SELECT id, name, category,
       ts_rank(search_vector, query) AS rank,
       ts_headline('english', description, query,
           'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15') AS snippet
FROM products, websearch_to_tsquery('english', 'wireless bluetooth headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

The GENERATED ALWAYS AS ... STORED column (Postgres 12+) eliminates the need for a trigger. The websearch_to_tsquery function accepts natural-language-style queries that users actually type.

Common Pitfalls

  • Calling ts_headline on unfiltered result sets. ts_headline is expensive. Always filter and limit first, then highlight only the results you display.
  • Using the wrong language configuration. English stemming on French text produces garbage results. Match the configuration to your content language.
  • Forgetting to index the tsvector column. Without a GIN index, every search query is a sequential scan. This is the most common performance mistake.
  • Not using weights. Matching a search term in the title should rank higher than matching it in the body. Use setweight to express this.
  • Building a search engine when a search box is enough. Most applications need a simple search box with relevant results, not a full-featured search engine. Postgres FTS covers this.
  • Ignoring websearch_to_tsquery. This function (Postgres 11+) parses natural language queries the way users expect. Use it instead of forcing users to learn tsquery syntax.

Key Takeaways

  • Postgres full-text search uses tsvector (documents) and tsquery (queries) with stemming, stop word removal, and position tracking built in.
  • GIN indexes are essential for search performance. Without them, every search is a sequential scan.
  • Weights (A, B, C, D) let you boost matches in important fields like titles over less important fields like descriptions.
  • ts_rank and ts_rank_cd provide relevance ranking. ts_headline provides result highlighting. Use ts_headline sparingly.
  • Postgres FTS is sufficient for the majority of application search needs. Reach for Elasticsearch only when you need typo tolerance, faceted search, distributed search, or extreme write throughput.
  • The hybrid approach (Postgres for storage, search engine for indexing) gives you the best of both worlds when Postgres FTS is not enough.