Point-in-Time Recovery
The Accidental DELETE at 3 PM
Someone ran DELETE FROM orders WHERE status = 'pending' without a WHERE clause limiting it to test data. They meant to delete 50 rows. They deleted 2 million. The transaction committed. The data is gone from the live database. This scenario plays out at companies every week.
Point-in-Time Recovery (PITR) lets you restore your database to any moment before that DELETE happened. Not just to the last backup, but to 2:59:47 PM, one second before the mistake. PITR is the safety net that turns a catastrophe into an inconvenience.
How PITR Works
PITR combines two components:
- A base backup (a complete copy of the database at some point in time)
- WAL archives (a continuous record of every change since that base backup)
The recovery process replays WAL from the base backup forward, applying every change in order, until it reaches your target point in time. Then it stops and opens the database in a consistent state as of that moment.
Base Backup WAL Segments Target
(Monday 2 AM) (continuous) (Wednesday 2:59 PM)
| |
v v
[Base] --> [WAL 1] --> [WAL 2] --> ... [WAL N] --> STOP
Every committed transaction between the base backup and the target time is present. Every transaction after the target time is discarded. The database opens as if time stopped at your chosen moment.
Setting Up WAL Archiving
PITR requires WAL archiving to be configured before disaster strikes. You cannot retroactively archive WAL that was already recycled.
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
The archive_command copies each completed WAL segment to a durable location. The test ! -f check prevents overwriting an existing archive, which protects against edge cases during recovery.
For production systems, use a proper backup tool:
# Using pgBackRest
archive_command = 'pgbackrest --stanza=myapp archive-push %p'
# Using wal-g with S3
archive_command = 'wal-g wal-push %p'
After changing these settings, restart Postgres:
sudo systemctl restart postgresql
Verify archiving works:
-- Force a WAL segment switch to test archiving
SELECT pg_switch_wal();
# Check that WAL files appear in the archive
ls -la /archive/
Taking a Base Backup for PITR
The base backup is your starting point. Take one immediately after enabling WAL archiving, and then regularly (daily or weekly depending on your RPO and RTO requirements).
# Standard base backup for PITR
pg_basebackup -h localhost -U replication_user \
-D /backups/base/2026-04-18 \
--wal-method=stream \
--checkpoint=fast \
--label="pitr_base_20260418" \
--progress
The --wal-method=stream flag ensures the backup includes the WAL needed for consistency. The --label helps you identify backups during recovery.
# With pgBackRest (handles base backup and WAL archiving together)
pgbackrest --stanza=myapp --type=full backup
The Recovery Process Step by Step
When disaster strikes, here is the exact sequence:
Step 1: Stop PostgreSQL
sudo systemctl stop postgresql
Step 2: Preserve the Current Data Directory
Do not delete the current data directory yet. You might need it for forensics.
mv /var/lib/postgresql/16/main /var/lib/postgresql/16/main_broken
Step 3: Restore the Base Backup
# Copy the base backup into position
cp -r /backups/base/2026-04-18 /var/lib/postgresql/16/main
# Set correct ownership
chown -R postgres:postgres /var/lib/postgresql/16/main
Step 4: Configure Recovery
Create a recovery signal file and configure the recovery target:
# Create the recovery signal file
touch /var/lib/postgresql/16/main/recovery.signal
# postgresql.conf (or a separate conf file)
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2026-04-18 14:59:47+00'
recovery_target_action = 'pause'
The restore_command tells Postgres how to fetch archived WAL segments. The recovery_target_time is where replay stops. The recovery_target_action = 'pause' keeps the database in recovery mode so you can verify the state before promoting.
Step 5: Start PostgreSQL
sudo systemctl start postgresql
Postgres starts in recovery mode. It reads WAL from the archive, replaying transactions in order. You can monitor progress in the logs:
tail -f /var/log/postgresql/postgresql-16-main.log
LOG: starting point-in-time recovery to 2026-04-18 14:59:47+00
LOG: restored log file "000000010000000000000042" from archive
LOG: restored log file "000000010000000000000043" from archive
...
LOG: recovery stopping before commit of transaction 58234, time 2026-04-18 14:59:48+00
LOG: pausing at the end of recovery
Step 6: Verify & Promote
Connect and verify the data looks correct:
-- Check that the accidentally deleted rows are present
SELECT count(*) FROM orders WHERE status = 'pending';
If everything looks good, promote the server to accept writes:
SELECT pg_wal_replay_resume();
Or if using recovery_target_action = 'promote', it promotes automatically.
Recovery Targets
PITR supports several target types, depending on what information you have about the incident.
By Timestamp
recovery_target_time = '2026-04-18 14:59:47+00'
The most common choice. "Restore to one minute before the incident." Requires knowing approximately when the problem occurred.
By Transaction ID
recovery_target_xid = '58233'
If you know the exact transaction that caused the problem, you can stop just before it. Finding the transaction ID requires examining WAL or logs.
By Named Restore Point
-- Create a named restore point before a risky operation
SELECT pg_create_restore_point('before_migration_20260418');
recovery_target_name = 'before_migration_20260418'
Named restore points are excellent before migrations, bulk operations, or any planned risky change. Create one as a habit before running anything destructive.
By LSN (Log Sequence Number)
recovery_target_lsn = '0/1A2B3C4D'
The most precise target. Used when you have exact WAL position information from logs or monitoring.
Inclusive vs Exclusive
# Stop AFTER the target transaction (inclusive)
recovery_target_inclusive = true
# Stop BEFORE the target transaction (exclusive, the default)
recovery_target_inclusive = false
Testing Your Backups
If you do not test your backups, you do not have backups. You have files on disk that might contain your data, but you have no evidence they work.
# Monthly PITR test script
#!/bin/bash
set -e
TEST_DIR="/tmp/pitr_test"
mkdir -p "$TEST_DIR"
# Restore base backup to a temp location
cp -r /backups/base/latest "$TEST_DIR/data"
# Configure recovery to a specific time
cat > "$TEST_DIR/data/recovery.signal" <<'SIGNAL'
SIGNAL
cat >> "$TEST_DIR/data/postgresql.conf" <<CONF
restore_command = 'cp /archive/%f %p'
recovery_target_time = '$(date -u +"%Y-%m-%d %H:%M:%S+00" -d "1 hour ago")'
recovery_target_action = 'promote'
port = 5433
CONF
chown -R postgres:postgres "$TEST_DIR"
# Start on a different port
pg_ctl -D "$TEST_DIR/data" start -w
# Run validation queries
psql -p 5433 -c "SELECT count(*) FROM orders;"
psql -p 5433 -c "SELECT max(created_at) FROM orders;"
# Cleanup
pg_ctl -D "$TEST_DIR/data" stop
rm -rf "$TEST_DIR"
echo "PITR test passed"
Automate this test. Run it monthly at minimum. Alert if it fails. Include it in your disaster recovery runbook.
Common Pitfalls
- Not enabling WAL archiving before the disaster. PITR requires WAL archives that were collected before the incident. If archiving was not enabled, PITR is impossible.
- Running out of archive storage. WAL archives grow continuously. Without retention policies, you fill up disk and archiving silently stops. Monitor archive storage and set retention.
- Using the wrong timezone in recovery_target_time. Always use timestamptz format with explicit timezone. An off-by-hours recovery target can mean recovering past the incident.
- Forgetting recovery.signal. Without this file, Postgres starts normally and does not enter recovery mode. Your restore_command configuration is ignored.
- Not testing recovery regularly. A backup that has never been restored is an untested hypothesis. Validate your PITR process at least monthly.
- Losing WAL between base backup and archive. If there is a gap in your WAL archive, recovery cannot proceed past the gap. Continuous monitoring of archive status is essential.
Key Takeaways
- PITR lets you restore to any point in time, not just the last backup. It combines a base backup with continuous WAL archives.
- WAL archiving must be configured and working before disaster strikes. There is no retroactive PITR.
- The recovery process is mechanical: restore base backup, configure recovery target, start Postgres, verify, promote.
- Named restore points are a best practice before any risky operation. They cost nothing and provide a precise recovery target.
- Test PITR regularly. An untested backup strategy is no strategy at all. Automate monthly restore tests and alert on failures.
- Recovery targets can be timestamps, transaction IDs, named restore points, or LSNs. Use whichever gives you the most precision for your situation.