Backup Strategies
Why Backups Are Non-Negotiable
Every database will eventually face a disaster. Hardware fails. Humans make mistakes. Software has bugs. The question is not whether you will need a backup, but whether you will have one when you do. A backup strategy is not a checkbox on a compliance form. It is the difference between a minor incident and a company-ending event.
The fundamental question behind any backup strategy: how much data can you afford to lose? The answer determines your approach, your tools, and your budget.
Logical Backups with pg_dump
pg_dump produces a logical backup: a set of SQL commands (or a custom-format archive) that can recreate your database from scratch. It reads the database through the normal SQL interface and outputs the schema and data.
# Plain SQL output (human readable, largest file size)
pg_dump -h localhost -U postgres -d myapp > myapp_backup.sql
# Custom format (compressed, supports parallel restore)
pg_dump -Fc -h localhost -U postgres -d myapp -f myapp_backup.dump
# Directory format (parallel dump, one file per table)
pg_dump -Fd -j 4 -h localhost -U postgres -d myapp -f myapp_backup_dir/
Restoring from a logical backup:
-- Plain SQL format
psql -h localhost -U postgres -d myapp_restored < myapp_backup.sql
# Custom format (supports parallel restore)
pg_restore -Fc -j 4 -h localhost -U postgres -d myapp_restored myapp_backup.dump
Strengths of pg_dump
Logical backups are portable. You can restore a pg_dump from Postgres 14 into Postgres 16. You can restore it on a different OS or architecture. You can selectively restore individual tables. The output is inspectable. You can grep the SQL file to see what is in your backup.
# Dump only specific tables
pg_dump -Fc -t orders -t order_items -d myapp -f orders_backup.dump
# Dump only schema (no data)
pg_dump --schema-only -d myapp > schema.sql
# Dump only data (no schema)
pg_dump --data-only -d myapp > data.sql
Weaknesses of pg_dump
pg_dump is slow for large databases. It reads every row through SQL. A 500 GB database might take hours to dump. During that time, the dump holds a snapshot, which means the backup represents a point in time when the dump started, not when it finished. The resulting file can be enormous.
pg_dump also cannot capture WAL state. You get a snapshot of the data, but you lose the ability to recover to any point between backups. If your last pg_dump was 12 hours ago, you lose up to 12 hours of data.
Physical Backups with pg_basebackup
pg_basebackup copies the entire data directory at the filesystem level. It produces a binary-identical copy of the database cluster, including all tablespaces and WAL files needed for consistency.
# Basic physical backup
pg_basebackup -h localhost -U replication_user -D /backups/base_backup \
--wal-method=stream --checkpoint=fast --progress
# Compressed backup as a tar archive
pg_basebackup -h localhost -U replication_user -D /backups/base_backup \
--format=tar --gzip --wal-method=stream --checkpoint=fast
Physical backups are fast because they copy raw files rather than reading through SQL. A 500 GB database backs up at disk I/O speed, not query speed. The backup includes everything: configuration, WAL position, all databases in the cluster.
The downside: physical backups are not portable across major Postgres versions or architectures. You cannot restore a pg_basebackup from Postgres 15 into Postgres 16. You get all-or-nothing; you cannot restore a single table from a physical backup without restoring the entire cluster.
Continuous Archiving with WAL
Write-Ahead Logging (WAL) records every change to the database before it is applied to data files. By archiving WAL segments, you create a continuous record of every modification. Combined with a base backup, WAL archiving enables Point-in-Time Recovery (PITR).
# postgresql.conf - enable WAL archiving
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/wal/%f'
The archive_command runs for every completed WAL segment (16 MB by default). You can replace the cp with a script that ships WAL to S3, GCS, or another durable storage system.
# Archive WAL to S3 using wal-g (a popular backup tool)
archive_command = 'wal-g wal-push %p'
# Or using pgBackRest
archive_command = 'pgbackrest --stanza=myapp archive-push %p'
With WAL archiving, your RPO (Recovery Point Objective) drops to the size of one WAL segment. In practice, with streaming replication, you can achieve near-zero data loss.
The 3-2-1 Backup Rule
The 3-2-1 rule is the minimum standard for backup resilience:
- 3 copies of your data (the original plus two backups)
- 2 different media types (local disk plus cloud storage, or disk plus tape)
- 1 offsite copy (a different data center, region, or cloud provider)
# Example 3-2-1 implementation
# Copy 1: Live database (primary server)
# Copy 2: Local pg_basebackup on the same network (fast restore)
pg_basebackup -D /local-backup/base/ --wal-method=stream
# Copy 3: WAL + base backup shipped to S3 in another region
wal-g backup-push /var/lib/postgresql/data
A backup that only exists on the same server as the database is not a backup. A backup that only exists in the same data center is one power failure away from gone. A backup that exists in S3 but has never been tested is a prayer, not a strategy.
Backup Frequency & RPO
Your backup frequency is directly tied to your RPO: the maximum amount of data you can afford to lose.
| Strategy | RPO | Restore Time | Complexity |
|---|---|---|---|
| Weekly pg_dump | Up to 7 days | Hours | Low |
| Daily pg_dump | Up to 24 hours | Hours | Low |
| Daily pg_basebackup | Up to 24 hours | Minutes to hours | Medium |
| pg_basebackup + WAL archiving | Seconds to minutes | Minutes to hours | Medium |
| Streaming replication + WAL | Near zero | Seconds (failover) | High |
# Cron job for daily base backups at 2 AM
0 2 * * * /usr/bin/pg_basebackup -h localhost -U backup_user \
-D /backups/daily/$(date +\%Y\%m\%d) \
--wal-method=stream --checkpoint=fast 2>> /var/log/backup.log
For most production systems, the answer is daily base backups with continuous WAL archiving. This gives you an RPO of seconds (the last archived WAL) with reasonable operational overhead.
Backup Tools Worth Knowing
# pgBackRest - full-featured backup solution
pgbackrest --stanza=myapp --type=full backup
pgbackrest --stanza=myapp --type=diff backup
pgbackrest --stanza=myapp --type=incr backup
# wal-g - cloud-native backup tool (S3, GCS, Azure)
wal-g backup-push /var/lib/postgresql/data
wal-g backup-list
# Barman - backup and recovery manager
barman backup main-server
barman list-backup main-server
pgBackRest supports full, differential, and incremental backups. wal-g is lightweight and cloud-native. Barman is the established choice for complex multi-server setups. All three handle WAL archiving, backup rotation, and verification.
Common Pitfalls
- Never testing restores. A backup you have never restored is not a backup. It is a file on disk that might contain your data. Schedule monthly restore tests.
- Storing backups only on the same server. Disk failure takes your database and your backup simultaneously.
- Ignoring backup monitoring. If your archive_command fails silently, you have no WAL archive and no PITR capability. Monitor archive status and alert on failures.
- Using pg_dump for large databases without considering alternatives. A 1 TB database can take hours to dump. pg_basebackup or incremental tools are better choices.
- Forgetting to back up configuration. pg_basebackup includes postgresql.conf and pg_hba.conf. pg_dump does not. Back up your configuration separately if using logical backups.
- No retention policy. Backups accumulate. Without a retention policy, you run out of disk space. Define how many days or copies you keep and automate cleanup.
Key Takeaways
- pg_dump produces portable, human-readable logical backups but is slow for large databases and cannot support PITR on its own.
- pg_basebackup creates fast physical backups that include everything, but they are version-specific and all-or-nothing.
- WAL archiving enables continuous backup and PITR when combined with a base backup. This is the foundation of serious backup strategies.
- The 3-2-1 rule (3 copies, 2 media types, 1 offsite) is the minimum standard. Most production systems should exceed it.
- Your backup frequency is determined by your RPO. If you cannot afford to lose more than 5 minutes of data, daily pg_dump is not enough.
- A backup that has never been restored is not a backup. Test regularly.