Schema Design Patterns
Normalization: When to Do It
Normalization eliminates data redundancy by decomposing tables so each fact is stored once. Third Normal Form (3NF) is the standard target for transactional systems.
-- Denormalized: order stores customer name and email
CREATE TABLE orders_denorm (
id bigint PRIMARY KEY,
customer_name text,
customer_email text,
product_name text,
product_price numeric(10,2),
quantity integer
);
-- Normalized: each fact stored once
CREATE TABLE customers (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
email text NOT NULL UNIQUE
);
CREATE TABLE products (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
price numeric(10,2) NOT NULL
);
CREATE TABLE orders (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL REFERENCES customers(id),
product_id bigint NOT NULL REFERENCES products(id),
quantity integer NOT NULL,
unit_price numeric(10,2) NOT NULL, -- snapshot price at time of order
created_at timestamptz NOT NULL DEFAULT now()
);
Notice unit_price in the orders table. This is intentional denormalization: you snapshot the price at order time because the product price may change later. This is a common and correct pattern.
When to Denormalize
Denormalization is appropriate when:
- Read performance matters more than write consistency. Materialized views and summary tables are a form of denormalization.
- The data is a historical snapshot. Order line items should capture the price at the time of purchase, not a live reference.
- Join overhead is measurable. Profile first. Postgres is extremely efficient at joins. Most denormalization is premature.
-- Materialized view: denormalized for reporting
CREATE MATERIALIZED VIEW order_summary AS
SELECT
o.id AS order_id,
c.name AS customer_name,
p.name AS product_name,
o.quantity,
o.unit_price,
o.quantity * o.unit_price AS total
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products p ON p.id = o.product_id;
-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY order_summary;
Foreign Keys: Use Them
Foreign keys enforce referential integrity at the database level. Some teams skip them for "performance." Do not do this.
CREATE TABLE orders (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id bigint NOT NULL REFERENCES users(id),
status text NOT NULL DEFAULT 'pending'
);
Postgres enforces foreign keys efficiently. The referenced column must have a unique index (primary key or unique constraint), so the lookup is an index scan. The cost is marginal compared to the data integrity you gain.
ON DELETE Behavior
-- Prevent deletion of referenced rows (default)
REFERENCES users(id) ON DELETE RESTRICT
-- Cascade deletes to child rows
REFERENCES users(id) ON DELETE CASCADE
-- Set child column to NULL on parent deletion
REFERENCES users(id) ON DELETE SET NULL
-- Set child column to default value
REFERENCES users(id) ON DELETE SET DEFAULT
Use CASCADE carefully. Deleting a user should probably not silently delete all their orders. RESTRICT (the default) forces you to handle dependencies explicitly in application code.
Index the Foreign Key Column
Postgres does not automatically create an index on the foreign key column. You must do it yourself:
CREATE INDEX idx_orders_user_id ON orders (user_id);
Without this index, any DELETE or UPDATE on the parent table triggers a sequential scan on the child table to check for references. On large tables, this causes severe lock contention.
Naming Conventions
Consistent naming makes schemas readable and tooling predictable.
snake_case Everything
-- Good
CREATE TABLE user_accounts (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
first_name text NOT NULL,
last_name text NOT NULL,
email_address text NOT NULL
);
-- Bad: mixed case requires quoting everywhere
CREATE TABLE "UserAccounts" (
"Id" bigint PRIMARY KEY,
"FirstName" text NOT NULL
);
Postgres folds unquoted identifiers to lowercase. Mixed-case names require double quotes in every query, which is error-prone and tedious.
Singular Table Names
-- Preferred: singular
CREATE TABLE customer (...);
CREATE TABLE order_item (...);
-- Also common: plural (pick one and be consistent)
CREATE TABLE customers (...);
CREATE TABLE order_items (...);
The key is consistency within a project. Mixing singular and plural is worse than either convention alone.
_id Suffix for Foreign Keys
CREATE TABLE orders (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id bigint NOT NULL REFERENCES users(id),
product_id bigint NOT NULL REFERENCES products(id)
);
The _id suffix makes join conditions self-documenting:
SELECT * FROM orders o JOIN users u ON u.id = o.user_id;
Index Naming
-- Pattern: idx_{table}_{columns}
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_created_at ON orders (created_at);
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
Timestamps on Every Table
Every table should have created_at. Most should also have updated_at:
CREATE TABLE users (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email text NOT NULL UNIQUE,
name text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
Update updated_at with a trigger:
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS trigger AS $$
BEGIN
NEW.updated_at = now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_users_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_updated_at();
This trigger is reusable across all tables. now() inside a transaction returns the transaction start time, so all updates within a single transaction get the same timestamp.
Soft Deletes vs Hard Deletes
Soft Deletes
Mark rows as deleted instead of removing them:
ALTER TABLE users ADD COLUMN deleted_at timestamptz;
-- "Delete" a user
UPDATE users SET deleted_at = now() WHERE id = 1;
-- Query only active users
SELECT * FROM users WHERE deleted_at IS NULL;
Advantages:
- Data is recoverable
- Audit trail is preserved
- Foreign key references remain valid
Disadvantages:
- Every query must filter on
deleted_at IS NULL - Table size grows without bound
- Unique constraints need partial indexes
-- Unique email only among non-deleted users
CREATE UNIQUE INDEX idx_users_email_active
ON users (email)
WHERE deleted_at IS NULL;
Hard Deletes
DELETE FROM users WHERE id = 1;
Advantages:
- Simple queries (no filter needed)
- Table stays compact
- Unique constraints work naturally
Disadvantages:
- Data is gone (unless you have an audit table)
- Foreign key cascades need careful handling
The Practical Choice
Use hard deletes with an audit table for most systems. Soft deletes are appropriate when:
- Regulatory requirements mandate data retention
- Users expect an "undo" feature
- The table is small enough that growth is not a concern
The Audit Table Pattern
Track who changed what and when:
CREATE TABLE audit_log (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
table_name text NOT NULL,
record_id bigint NOT NULL,
action text NOT NULL CHECK (action IN ('INSERT', 'UPDATE', 'DELETE')),
old_data jsonb,
new_data jsonb,
changed_by text,
changed_at timestamptz NOT NULL DEFAULT now()
);
Automate with a trigger:
CREATE OR REPLACE FUNCTION audit_trigger()
RETURNS trigger AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', to_jsonb(NEW), current_user);
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', to_jsonb(OLD), to_jsonb(NEW), current_user);
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', to_jsonb(OLD), current_user);
RETURN OLD;
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_users_audit
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger();
The audit table grows large. Partition it by time and archive old partitions:
CREATE TABLE audit_log (
id bigint GENERATED ALWAYS AS IDENTITY,
table_name text NOT NULL,
record_id bigint NOT NULL,
action text NOT NULL,
old_data jsonb,
new_data jsonb,
changed_by text,
changed_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (changed_at);
CREATE TABLE audit_log_2024_q1 PARTITION OF audit_log
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE audit_log_2024_q2 PARTITION OF audit_log
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
Other Useful Patterns
Status Columns with Check Constraints
CREATE TABLE orders (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
status text NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
created_at timestamptz NOT NULL DEFAULT now()
);
Immutable Insert-Only Tables
For event logs and time-series data, make tables insert-only with no updates or deletes:
CREATE TABLE events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_type text NOT NULL,
payload jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
-- Prevent updates and deletes at the trigger level
CREATE OR REPLACE FUNCTION prevent_modification()
RETURNS trigger AS $$
BEGIN
RAISE EXCEPTION 'Modifications to % are not allowed', TG_TABLE_NAME;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_events_immutable
BEFORE UPDATE OR DELETE ON events
FOR EACH ROW EXECUTE FUNCTION prevent_modification();
Lookup Tables
CREATE TABLE countries (
code char(2) PRIMARY KEY, -- one of the rare valid uses of char
name text NOT NULL
);
CREATE TABLE addresses (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
country_code char(2) NOT NULL REFERENCES countries(code),
city text NOT NULL,
postal_code text
);
Common Pitfalls
- Skipping foreign keys for performance. The cost is minimal. The data integrity is invaluable. Orphaned rows cause application bugs that are far more expensive than the index lookup.
- Not indexing foreign key columns. Postgres does not auto-create these indexes. Missing them causes full table scans during parent row modifications.
- Mixing naming conventions. snake_case in some tables, camelCase in others. Pick one convention and enforce it.
- Forgetting updated_at timestamps. Without them, debugging "when did this change?" requires an audit log or guesswork.
- Over-normalizing. Normalizing a table that is always queried with a specific join just to satisfy 3NF adds complexity without benefit. Pragmatism beats theory.
- Soft deletes without partial unique indexes. If you soft-delete a user with email X, then a new user signs up with email X, a plain unique index rejects the insert. You need a partial unique index filtered on
deleted_at IS NULL.
Key Takeaways
- Normalize by default. Denormalize with intention and measurement, not assumption.
- Use foreign keys. Index the foreign key columns. Use ON DELETE behavior that matches your business logic.
- Adopt consistent naming: snake_case, singular or plural (pick one), _id suffix for foreign keys.
- Add created_at and updated_at to every table. Automate updated_at with a reusable trigger.
- Choose between soft deletes and hard deletes based on your requirements. Hard deletes with an audit table is the most common production pattern.
- The audit table pattern (trigger + JSONB old/new data + partitioning) provides a complete change history without complicating your main tables.