Vector Databases

You have millions of embedding vectors. A user sends a query, you embed it, and you need to find the 10 most similar vectors out of those millions — in under 100 milliseconds. A brute-force comparison against every vector is O(n), which is too slow at scale. Vector databases solve this with specialized indexing algorithms that trade a small amount of accuracy for orders-of-magnitude speed improvement.
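
For reference, the brute-force baseline looks roughly like the sketch below. It is plain Python with illustrative assumptions throughout: the function names and the four-vector dataset are hypothetical, and a real system would use NumPy or a database rather than Python lists.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k=10):
    """Score every stored vector against the query: O(n * d) work per query."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    # nlargest keeps only the k best (similarity, index) pairs
    return heapq.nlargest(k, scored)


# Tiny illustrative dataset (hypothetical values)
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
top = brute_force_top_k([1.0, 0.0], vectors, k=2)
```

Every query touches every vector, so cost grows linearly with both collection size and dimensionality. That linear scan is the work an ANN index avoids.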

The Landscape

The vector database market exploded in 2023-2024. Some options are purpose-built for vectors, others are extensions of existing databases.

Purpose-Built Vector Databases

Database     Deployment              Notes
Pinecone     Managed (cloud only)    Simplest to operate. No self-hosted option.
Weaviate     Managed or self-hosted  Hybrid search (vector + keyword). GraphQL API.
Qdrant       Managed or self-hosted  Rust-based, fast. Good filtering.
Milvus       Self-hosted or Zilliz   Apache 2.0. Handles billions of vectors.
Chroma       Self-hosted (or cloud)  Developer-friendly. Popular with LLM apps.

Vector Extensions for Existing Databases

Extension         Database       Notes
pgvector          PostgreSQL     SQL interface. No new infrastructure.
pgvectorscale     PostgreSQL     Timescale's companion extension to pgvector (adds a DiskANN-based index).
Atlas Vector      MongoDB        Native vector search in MongoDB.
OpenSearch k-NN   OpenSearch     If you already run OpenSearch.
Redis VSS         Redis          In-memory, very fast, limited persistence story.

pgvector: When It Is Enough

If you are already running PostgreSQL, pgvector eliminates an entire category of infrastructure. You store vectors alongside your relational data in the same database, query them with SQL, and get transactional consistency for free.

# Install pgvector extension (run once)
# CREATE EXTENSION vector;

import psycopg2
import numpy as np

conn = psycopg2.connect("postgresql://localhost/mydb")
cur = conn.cursor()

# Create a table with a vector column
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        title TEXT,
        content TEXT,
        category TEXT,
        created_at TIMESTAMPTZ DEFAULT NOW(),  -- needed by the date filter used later
        embedding vector(1536)
    )
""")

# Insert a document with its embedding
embedding = np.random.rand(1536).tolist()  # replace with real embedding
cur.execute(
    "INSERT INTO documents (title, content, category, embedding) VALUES (%s, %s, %s, %s)",
    ("Query optimization guide", "Full text here...", "engineering", str(embedding))
)

# Find the 10 most similar documents using cosine distance
query_embedding = np.random.rand(1536).tolist()
cur.execute("""
    SELECT id, title, 1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> %s::vector
    LIMIT 10
""", (str(query_embedding), str(query_embedding)))

results = cur.fetchall()
conn.commit()

The three most commonly used pgvector distance operators (newer releases add others, such as <+> for L1 distance):

<=>   Cosine distance
<->   L2 (Euclidean) distance
<#>   Negative inner product (for dot product similarity)
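
To make the operators listed above concrete, here is a small sketch of what each one computes. The Python function names are illustrative, not part of pgvector; only the formulas mirror the operators.

```python
import math


def cosine_distance(a, b):
    """What <=> computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """What <-> computes: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """What <#> computes: the inner product, negated so smaller means closer."""
    return -sum(x * y for x, y in zip(a, b))


a, b = [1.0, 0.0], [0.0, 2.0]
distances = (cosine_distance(a, b), l2_distance(a, b), negative_inner_product(a, b))
```

All three are distances, so ORDER BY ... ASC returns the closest rows first. The inner product is negated for exactly that reason.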

When pgvector Is Enough

pgvector handles millions of vectors comfortably on modern hardware. Benchmarks consistently show sub-100ms query times for 1-5 million vectors with proper indexing. For many production applications — RAG systems, semantic search within a product, recommendation engines for mid-size catalogs — this is more than sufficient.

The real advantages:

One database to operate. No new infrastructure, no new backup strategy, no new monitoring.
Joins with relational data. Filter by user, tenant, category, date — all in one SQL query.
ACID transactions. Your vectors and metadata are always consistent.
Existing tooling. pg_dump, replication, connection pooling — it all works.

When You Need a Dedicated Vector Database

Hundreds of millions to billions of vectors. pgvector starts to struggle. Purpose-built databases like Milvus are designed for this scale.
Sub-10ms latency requirements. In-memory vector databases (Qdrant, Redis VSS) can hit low single-digit millisecond latencies.
Distributed vector search. Sharding vectors across a cluster with automatic rebalancing.
Advanced features. Multi-tenancy, role-based access, built-in reranking, hybrid search scoring.

Indexing Algorithms

The magic of vector databases is approximate nearest neighbor (ANN) search. Instead of comparing your query against every vector, ANN algorithms build an index structure that narrows the search space.

HNSW (Hierarchical Navigable Small World)

The most popular algorithm. Builds a multi-layer graph where each node is a vector and edges connect nearby vectors. Search starts at the top layer (sparse, long-range connections) and descends to lower layers (dense, short-range connections).

Layer 3:  A ---- D                    (few nodes, long jumps)
Layer 2:  A -- C -- D -- F            (more nodes, medium jumps)
Layer 1:  A - B - C - D - E - F - G   (many nodes, short jumps)
Layer 0:  [all vectors, fine-grained]

HNSW gives excellent recall (95-99% of true nearest neighbors) with fast query times. The tradeoff is memory: for good performance, the index should fit in RAM.
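
The core move HNSW repeats on each layer is a greedy walk over a proximity graph. The sketch below shows that walk on a single layer only. It is a deliberate simplification, not pgvector's or any library's implementation: real HNSW tracks a list of candidates rather than one current node, and builds its graph incrementally. All names and the random 2-D points are illustrative assumptions.

```python
import math
import random


def build_knn_graph(points, k=4):
    """Link each point to its k nearest neighbors (brute force; fine for a demo)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph


def greedy_search(points, graph, query, entry):
    """Move to whichever neighbor is closest to the query until none is closer."""
    current = entry
    current_dist = math.dist(points[current], query)
    hops = 0
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(points[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:  # local minimum: no neighbor is closer
            return current, current_dist, hops
        current, current_dist = best, best_dist
        hops += 1


random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
graph = build_knn_graph(points)
query = (0.5, 0.5)
found, found_dist, hops = greedy_search(points, graph, query, entry=0)
```

The walk can stop at a local minimum that is not the true nearest neighbor, which is where the "approximate" in ANN comes from. The upper layers and wider candidate lists in real HNSW exist to make that outcome rare.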

# pgvector HNSW index
# m = max connections per node (higher = better recall, more memory)
# ef_construction = search width during build (higher = better recall, slower build)
cur.execute("""
    CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
""")

IVF (Inverted File Index)

Partitions vectors into clusters (using k-means), then only searches the clusters nearest to the query vector. Faster to build than HNSW, uses less memory, but lower recall at the same speed.

# pgvector IVF index
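# lists = number of clusters (pgvector's docs suggest rows / 1000 for up to 1M rows, sqrt(rows) beyond that)
# Build the index after loading data so that k-means sees representative vectors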
cur.execute("""
    CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100)
""")

# At query time, set the number of clusters to search (probes)
cur.execute("SET ivfflat.probes = 10")  # search 10 of 100 clusters
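
To show what lists and probes mean mechanically, here is a toy IVF index in plain Python. It is a sketch under simplifying assumptions (a few rounds of naive k-means, 2-D random points, hypothetical function names) and is not how pgvector implements ivfflat internally.

```python
import math
import random


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda c: math.dist(vector, centroids[c]))


def build_ivf(vectors, n_lists, iterations=10, seed=0):
    """Run a few rounds of k-means, then bucket every vector under its centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_lists)
    for _ in range(iterations):
        buckets = [[] for _ in range(n_lists)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        # Recompute each centroid as the mean of its bucket; keep empty ones unchanged
        centroids = [
            tuple(sum(dim) / len(b) for dim in zip(*b)) if b else centroids[c]
            for c, b in enumerate(buckets)
        ]
    lists = [[] for _ in range(n_lists)]
    for i, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(i)
    return centroids, lists


def ivf_search(query, vectors, centroids, lists, k, probes):
    """Scan only the `probes` lists whose centroids are nearest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in lists[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]


rng = random.Random(42)
vectors = [(rng.random(), rng.random()) for _ in range(500)]
centroids, lists = build_ivf(vectors, n_lists=10)
query = (0.3, 0.7)

approx = ivf_search(query, vectors, centroids, lists, k=5, probes=2)
exact = ivf_search(query, vectors, centroids, lists, k=5, probes=10)  # all lists scanned
```

Raising probes scans more lists: recall improves and queries slow down. Once probes equals the number of lists, the search is exhaustive and the index no longer saves any work.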

Algorithm   Build Speed   Query Speed   Memory    Recall   Best For
HNSW        Slow          Fast          High      High     Most use cases
IVF         Fast          Medium        Medium    Medium   Large datasets, memory-constrained

For most applications, HNSW is the right default. IVF is useful when you have very large datasets and cannot afford the memory overhead of HNSW.

Filtering with Metadata

Real queries are rarely "find the 10 most similar vectors." They are "find the 10 most similar vectors that belong to this tenant, were created in the last 30 days, and are in the engineering category."

Pre-filtering vs Post-filtering

Pre-filtering:  Filter first, then search vectors within the filtered set.
                Accurate, but slow when the filter matches many rows, because the
                filtered set is searched without help from the ANN index.

Post-filtering: Search vectors first, then filter results.
                Fast, but may return fewer than K results if many get filtered out.
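
The difference is easy to see on a toy dataset. The sketch below uses exact search and six hypothetical records so the effect is deterministic; the record layout and function names are assumptions for illustration.

```python
import math

# Hypothetical records: (id, category, 2-D embedding)
records = [
    (1, "engineering", (0.90, 0.10)),
    (2, "marketing",   (0.95, 0.05)),
    (3, "marketing",   (0.85, 0.15)),
    (4, "engineering", (0.10, 0.90)),
    (5, "marketing",   (0.80, 0.20)),
    (6, "engineering", (0.20, 0.80)),
]


def top_k(query, rows, k):
    """Exact nearest neighbors by Euclidean distance."""
    return sorted(rows, key=lambda r: math.dist(query, r[2]))[:k]


def pre_filter(query, rows, category, k):
    """Filter first, then search only the matching rows."""
    matching = [r for r in rows if r[1] == category]
    return top_k(query, matching, k)


def post_filter(query, rows, category, k):
    """Search first, then drop results that fail the filter."""
    return [r for r in top_k(query, rows, k) if r[1] == category]


query = (1.0, 0.0)
pre = pre_filter(query, records, "engineering", k=3)    # three engineering rows
post = post_filter(query, records, "engineering", k=3)  # only one survives the filter
```

Post-filtering returns a single row here because two of the three nearest vectors belong to another category. Over-fetching (asking for more than K before filtering) narrows the gap but cannot guarantee K results.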

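Most vector databases use a hybrid approach. pgvector lets you combine SQL WHERE clauses with vector search; when the planner uses an ANN index, the filter is applied to the candidates the index returns, so a very selective filter can yield fewer than LIMIT rows: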

cur.execute("""
    SELECT id, title, 1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    WHERE category = 'engineering'
      AND created_at > NOW() - INTERVAL '30 days'
    ORDER BY embedding <=> %s::vector
    LIMIT 10
""", (str(query_embedding), str(query_embedding)))

Dedicated vector databases handle this differently. Qdrant, for example, uses payload filtering:

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, DatetimeRange
from datetime import datetime, timedelta, timezone

client = QdrantClient("localhost", port=6333)

# Note: recent qdrant-client releases deprecate search() in favor of query_points()
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="engineering")),
            FieldCondition(
                key="created_at",
                # Range only accepts numbers; datetime bounds need DatetimeRange
                range=DatetimeRange(gte=datetime.now(timezone.utc) - timedelta(days=30))
            ),
        ]
    ),
    limit=10
)

Production Considerations

Sizing & Performance

Rule of thumb for pgvector with HNSW:
  - 1M vectors at 1536 dims: ~6 GB vectors + ~2 GB index = ~8 GB RAM
  - 5M vectors at 1536 dims: ~30 GB vectors + ~10 GB index = ~40 GB RAM
  - Query latency: 5-50ms depending on index params and hardware

Rule of thumb for Qdrant / Milvus:
  - Same vectors, ~30-50% less memory with quantization
  - Query latency: 1-10ms
  - Can shard across multiple nodes
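
The vector portion of these estimates is simple arithmetic: vectors times dimensions times bytes per dimension. A small helper (hypothetical name) makes it easy to rerun for other sizes. It covers raw vector storage only; index size and row overhead come on top and vary with index parameters.

```python
def raw_vector_bytes(n_vectors, dims, bytes_per_dim=4):
    """Storage for the raw vectors only; index and row overhead are extra."""
    return n_vectors * dims * bytes_per_dim


for n in (1_000_000, 5_000_000):
    gb = raw_vector_bytes(n, 1536) / 1e9
    print(f"{n:>9,} vectors x 1536 dims x float32 = {gb:.1f} GB")
# prints:
# 1,000,000 vectors x 1536 dims x float32 = 6.1 GB
# 5,000,000 vectors x 1536 dims x float32 = 30.7 GB
```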

Quantization

Storing full float32 vectors is expensive. Quantization reduces each float to fewer bits:

float32:  4 bytes per dimension  (full precision)
float16:  2 bytes per dimension  (minimal quality loss)
int8:     1 byte per dimension   (noticeable on edge cases)
binary:   1 bit per dimension    (fast but lossy; good as a first pass, followed by re-ranking with full-precision vectors)

pgvector supports halfvec (float16) natively. Qdrant and Milvus support scalar and product quantization.
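
As an illustration of the idea behind scalar quantization, the sketch below maps each float onto one of 256 int8 levels and measures the round-trip error. It assumes per-vector min/max scaling for simplicity; production systems typically calibrate ranges across a collection or segment, and the function names here are hypothetical.

```python
def quantize_int8(vector):
    """Map floats onto 256 evenly spaced levels between the vector's min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) - 128 for x in vector]  # range -128..127
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Recover approximate floats from the int8 codes."""
    return [(c + 128) * scale + lo for c in codes]


original = [0.12, -0.53, 0.98, 0.0, -1.0, 0.47]
codes, lo, scale = quantize_int8(original)
restored = dequantize_int8(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

Each value now takes 1 byte instead of 4, and the reconstruction error is bounded by half a quantization step (scale / 2). Whether that error matters depends on how tightly packed your nearest neighbors are.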

Reindexing

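When the data changes significantly (say, 20%+ of the collection is new), index quality can degrade. IVFFlat is affected most, because its cluster centroids were computed from the old data. HNSW absorbs inserts incrementally, but heavy updates and deletes can still bloat the index and hurt recall. Plan to rebuild the index periodically. With pgvector, this means REINDEX INDEX CONCURRENTLY. Managed vector databases typically handle this automatically.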

Common Pitfalls

Choosing a dedicated vector database before you need one. pgvector in PostgreSQL handles millions of vectors. Start there. Migrate when you hit a real scaling wall, not a theoretical one.
No index on the vector column. Without an ANN index, every query is a full table scan. When pgvector users complain about slow queries, a missing index is the first thing to check.
Wrong distance metric. If you build an HNSW index with cosine distance but query with L2 distance, the index is useless and the database falls back to sequential scan.
Ignoring index build time. HNSW indexes on millions of vectors take minutes to hours to build. Plan for this during initial load and migrations.
Filtering after vector search. If you search for top 10 and then filter, you might end up with 2 results. Over-fetch (top 50) and filter, or use a database that supports pre-filtering.
Not monitoring recall. ANN search is approximate. Periodically spot-check results against brute-force search to verify your index parameters give acceptable recall.
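
A recall spot-check needs very little code. The sketch below assumes you already have, for a sample of queries, the IDs returned by the ANN index and the IDs from an exact search (with pgvector, exact results can be obtained by disabling index scans for the session). The function names and sample IDs are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the ANN result also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


def mean_recall(pairs):
    """Average recall over (approx_ids, exact_ids) pairs from sampled queries."""
    scores = [recall_at_k(a, e) for a, e in pairs]
    return sum(scores) / len(scores)


# Hypothetical result IDs from two sampled queries
samples = [
    ([1, 2, 3, 4], [1, 2, 3, 5]),  # 3 of 4 true neighbors found
    ([7, 8, 9, 6], [6, 7, 8, 9]),  # all 4 found; order does not matter
]
average = mean_recall(samples)
```

If the average drops below your target, raise the query-time search parameters (ef_search for HNSW, probes for IVF) or rebuild the index with more generous build parameters.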

Key Takeaways

pgvector in PostgreSQL is the right starting point for most applications. It handles millions of vectors, offers SQL joins with relational data, and requires no new infrastructure.
Purpose-built vector databases (Pinecone, Qdrant, Milvus) earn their place at hundreds of millions of vectors, sub-10ms latency requirements, or when you need distributed search.
HNSW is the default indexing algorithm. It gives the best recall-speed tradeoff for most workloads.
Always create an ANN index. Without one, every search is a full scan.
Metadata filtering is a first-class concern. Most real queries combine vector similarity with structured filters.
Quantization cuts memory usage substantially: float16 halves it with minimal quality loss, and int8 quarters it at some cost in accuracy on edge cases.