Migrations

Schema Changes in Production

Your application evolves. Tables need new columns. Columns change types. Indexes get added. Constraints get modified. Each of these changes must be applied to the production database without losing data, without downtime, and without breaking the application code that is currently running.

Database migrations are the discipline of managing these schema changes: writing them as versioned SQL scripts, applying them in order, and making them reversible. Done well, migrations are a non-event. Done poorly, they lock tables for 20 minutes and take your application offline.

The Basics: Up & Down SQL Files

A migration is a pair of SQL files: one that applies the change (up) and one that reverses it (down).

migrations/
  001_create_users.up.sql
  001_create_users.down.sql
  002_add_email_index.up.sql
  002_add_email_index.down.sql
  003_add_orders_table.up.sql
  003_add_orders_table.down.sql

-- 001_create_users.up.sql
CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL UNIQUE,
    name text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- 001_create_users.down.sql
DROP TABLE users;

-- 002_add_email_index.up.sql
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

-- 002_add_email_index.down.sql
DROP INDEX idx_users_email;

The migration tool tracks which migrations have been applied (usually in a schema_migrations table) and applies new ones in order.

Migration Tools

dbmate

# Install
brew install dbmate

# Create a new migration
dbmate new add_orders_table

# Apply pending migrations
dbmate up

# Rollback the last migration
dbmate down

# Check migration status
dbmate status

sqlx-cli (Rust ecosystem)

# Install
cargo install sqlx-cli

# Create a migration
sqlx migrate add create_orders

# Run migrations
sqlx migrate run

# Revert the last migration
sqlx migrate revert

Flyway

# Apply migrations
flyway migrate

# Check status
flyway info

# Validate applied migrations match local files
flyway validate

sqitch

# Initialize
sqitch init myapp --engine pg

# Add a change
sqitch add create_orders --requires create_users -n "Add orders table"

# Deploy
sqitch deploy

# Revert
sqitch revert --to create_users

Each tool has trade-offs. dbmate and sqlx-cli are simple and file-based. Flyway is Java-based and enterprise-oriented. sqitch uses a dependency graph rather than linear ordering. Choose the one that fits your stack.

Migration Best Practices

Always Make Migrations Reversible

Every up migration should have a corresponding down migration. If you cannot reverse a migration, you cannot recover from a bad deploy.

-- up: add a column
ALTER TABLE users ADD COLUMN phone text;

-- down: remove the column
ALTER TABLE users DROP COLUMN phone;

Some migrations are inherently destructive and cannot be fully reversed:

-- up: drop a column (data is gone)
ALTER TABLE users DROP COLUMN legacy_field;

-- down: re-add the column (but the data is lost)
ALTER TABLE users ADD COLUMN legacy_field text;

For destructive migrations, consider keeping the data accessible for a rollback period before dropping it.

Never Be Destructive in Production Without a Plan

Dropping columns, tables, or changing column types should follow the expand-contract pattern (covered below). Never run a destructive migration that could break running application instances.

One Change Per Migration

Keep migrations small and focused. A migration that creates a table, adds three indexes, and modifies another table is hard to debug if something fails halfway through.

-- Good: one migration, one change
-- 004_add_status_to_orders.up.sql
ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'pending';

Use Transactions for Safety

Most migration tools wrap each migration in a transaction. If any statement fails, the entire migration is rolled back. Verify your tool does this, and be aware of exceptions:

-- CREATE INDEX CONCURRENTLY cannot run inside a transaction
-- This migration must be run outside a transaction
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);

Mark these migrations as non-transactional in your migration tool's configuration.

Zero-Downtime Migrations: The Expand-Contract Pattern

The expand-contract pattern ensures schema changes never break running application code. The idea: expand the schema to support both old and new code, then contract it after old code is gone.

Adding a Column

This is the simplest case. Adding a nullable column with no default does not lock the table (in Postgres 11+, adding a column with a DEFAULT also avoids a table rewrite).

-- Step 1: Add the column (instant, no lock)
ALTER TABLE users ADD COLUMN phone text;

-- Step 2: Deploy code that writes to the new column
-- Step 3: Backfill existing rows
UPDATE users SET phone = 'unknown' WHERE phone IS NULL;

-- Step 4: Add NOT NULL constraint if needed (after backfill)
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;

Renaming a Column

You cannot rename a column while old code is reading it. Use the expand-contract pattern:

-- Step 1: Add the new column
ALTER TABLE users ADD COLUMN full_name text;

-- Step 2: Backfill from old column
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Step 3: Deploy code that reads/writes both columns
-- Step 4: Deploy code that only uses new column
-- Step 5: Drop the old column (after all old code is gone)
ALTER TABLE users DROP COLUMN name;

Changing a Column Type

-- Step 1: Add a new column with the target type
ALTER TABLE orders ADD COLUMN amount_numeric numeric(12,2);

-- Step 2: Backfill (batched to avoid long transactions)
UPDATE orders SET amount_numeric = amount::numeric(12,2)
WHERE id BETWEEN 1 AND 100000;
-- Repeat in batches...

-- Step 3: Deploy code that writes to both columns
-- Step 4: Deploy code that reads from new column
-- Step 5: Drop old column
ALTER TABLE orders DROP COLUMN amount;

-- Step 6: Rename new column
ALTER TABLE orders RENAME COLUMN amount_numeric TO amount;

Dropping a Column

-- Step 1: Deploy code that no longer reads the column
-- Step 2: Deploy code that no longer writes the column
-- Step 3: Drop the column
ALTER TABLE users DROP COLUMN legacy_field;

Never drop a column while any running code references it.

The Migration That Locks Your Table for 20 Minutes

Some DDL operations take locks that block all reads and writes. The worst offenders:

-- These can lock the table for a long time on large tables:
ALTER TABLE orders ALTER COLUMN amount TYPE numeric(12,2);  -- Rewrites the table
ALTER TABLE orders ADD COLUMN total numeric NOT NULL;       -- Before Postgres 11
CREATE INDEX idx_orders_date ON orders (created_at);        -- Without CONCURRENTLY
VACUUM FULL orders;                                         -- Exclusive lock

Safe alternatives:

-- Use CONCURRENTLY for index creation (no lock, but takes longer)
CREATE INDEX CONCURRENTLY idx_orders_date ON orders (created_at);

-- Add a column with a default (no rewrite in Postgres 11+)
ALTER TABLE orders ADD COLUMN total numeric NOT NULL DEFAULT 0;

-- Set a short lock timeout to fail fast instead of blocking
SET lock_timeout = '5s';
ALTER TABLE orders ADD CONSTRAINT chk_positive CHECK (amount > 0);

Always set a lock_timeout before running DDL in production:

-- If the lock cannot be acquired in 5 seconds, abort rather than block
SET lock_timeout = '5s';

This prevents a migration from blocking all queries while it waits for a lock held by a long-running transaction.

Backfilling Large Tables

When adding a column that needs to be populated, do not update all rows in a single transaction:

-- Bad: locks the entire table, generates massive WAL, might run for hours
UPDATE users SET new_column = compute_value(old_column);

-- Good: batch updates
DO $$
DECLARE
    batch_size int := 10000;
    min_id bigint;
    max_id bigint;
BEGIN
    SELECT min(id), max(id) INTO min_id, max_id FROM users;
    FOR i IN 0..((max_id - min_id) / batch_size) LOOP
        UPDATE users SET new_column = compute_value(old_column)
        WHERE id BETWEEN min_id + (i * batch_size)
          AND min_id + ((i + 1) * batch_size) - 1
          AND new_column IS NULL;
        COMMIT;
    END LOOP;
END $$;

Batching keeps transactions short, generates manageable WAL, and allows autovacuum to clean up between batches.

Common Pitfalls

Running CREATE INDEX without CONCURRENTLY in production. A regular CREATE INDEX locks the table for writes. On a large table, this can block all writes for minutes or hours.
Not setting lock_timeout. Without it, a DDL statement waits indefinitely for a lock. Set lock_timeout to fail fast and retry rather than block all traffic.
Deploying code and migration simultaneously. If the migration fails but the code deploys, the application breaks. Run migrations before deploying new code.
Dropping columns before all code stops using them. If any running instance references the column, it gets errors. Use expand-contract: remove code references first, then drop the column.
Backfilling in a single transaction. Updating millions of rows in one transaction generates massive WAL, holds locks for the duration, and prevents autovacuum from running. Batch updates.
Not testing migrations on a production-sized dataset. A migration that takes 1 second on 1000 rows might take 20 minutes on 10 million rows. Test against a realistic data volume.
Missing down migrations. If you cannot roll back, a bad migration becomes a crisis. Always write the down migration, even if you think you will never need it.

Key Takeaways

Migrations are versioned SQL files (up/down pairs) tracked by a migration tool. Keep them small, focused, and reversible.
Use the expand-contract pattern for zero-downtime migrations: add new, migrate data, switch code, remove old.
CREATE INDEX CONCURRENTLY is mandatory in production. Regular CREATE INDEX locks the table for writes.
Set lock_timeout before any DDL in production. Fail fast rather than block all queries.
Backfill large tables in batches, not in a single transaction. Batch updates keep transactions short and manageable.
Test migrations on production-sized data before applying them. Performance on a small test database tells you nothing about production.
Deploy migrations before code. Roll back code before rolling back migrations. The order matters.