Choosing the Right Types
Why Types Matter
Choosing the right data type is not a minor detail. It affects storage size, query performance, index efficiency, and application correctness. Postgres has a rich type system, and the defaults most developers reach for are often not the best choice.
Getting types right at the start saves painful migrations later. Changing a column type on a 500-million-row table is an operation you want to avoid.
Integer Types
Postgres provides three integer types:
| Type | Storage | Range |
|---|---|---|
| smallint (int2) | 2 bytes | -32,768 to 32,767 |
| integer (int4) | 4 bytes | -2,147,483,648 to 2,147,483,647 |
| bigint (int8) | 8 bytes | -9.2 quintillion to 9.2 quintillion |
When to Use Each
smallint: Status codes, enum-like values, small counters. Saves 2 bytes per row compared to integer. On a billion-row table, that is 2GB of savings.
integer: The default choice for most columns. Counters, foreign keys to tables under 2 billion rows, quantities.
bigint: Primary keys on tables that may grow large. Any column that could exceed 2.1 billion. Use bigint for IDs in any system that might scale.
CREATE TABLE orders (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id bigint NOT NULL REFERENCES users(id),
quantity integer NOT NULL,
status smallint NOT NULL DEFAULT 0
);
The most common mistake: using integer for primary keys on tables that eventually exceed 2.1 billion rows. The migration to bigint on a huge table is expensive. Start with bigint for any table that might grow indefinitely.
Primary Key Generation: Serial vs Identity vs UUID
SERIAL (Legacy)
CREATE TABLE users (
id serial PRIMARY KEY -- creates a sequence, type is integer
);
serial is syntactic sugar that creates a sequence and sets the column default. It has quirks: the sequence is not truly owned by the column in all cases, and serial does not prevent manual inserts that break the sequence.
IDENTITY (Preferred)
CREATE TABLE users (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);
GENERATED ALWAYS AS IDENTITY is the SQL standard approach. It prevents manual ID inserts by default (you must explicitly override with OVERRIDING SYSTEM VALUE). Use this over serial for new tables.
UUID
CREATE TABLE events (
id uuid PRIMARY KEY DEFAULT gen_random_uuid()
);
UUIDs are 16 bytes (vs 8 bytes for bigint). They are useful when:
- IDs are generated by clients (distributed systems)
- You need globally unique identifiers across databases
- You want to prevent ID enumeration
UUIDs are worse for index performance. B-tree indexes on random UUIDs cause heavy random I/O and page splits. If you use UUIDs, consider UUIDv7 (time-ordered) which maintains insert order.
-- UUIDv7 via pg_uuidv7 extension or manual generation
-- Keeps chronological order, much better for B-tree indexes
For most applications, bigint GENERATED ALWAYS AS IDENTITY is the right choice.
Text Types: text vs varchar vs char
| Type | Storage | Behavior |
|---|---|---|
| text | Variable, unlimited | No length restriction |
| varchar(n) | Variable, max n chars | Error if input exceeds n |
| char(n) | Fixed, padded to n chars | Blank-padded, wastes space |
Just Use text
In Postgres, text and varchar have identical performance. There is no storage or speed advantage to varchar. The only difference is that varchar(n) enforces a length limit.
-- These perform identically
CREATE TABLE a (name text);
CREATE TABLE b (name varchar(255));
char(n) pads values with spaces to the specified length. It wastes storage and causes subtle comparison bugs. Never use it.
If you need a length constraint, use a CHECK constraint. It is more explicit and easier to modify:
CREATE TABLE users (
email text NOT NULL CHECK (length(email) <= 320),
name text NOT NULL CHECK (length(name) <= 200)
);
Changing a CHECK constraint does not rewrite the table. Changing varchar(100) to varchar(200) does not rewrite either (since Postgres 9.2), but changing varchar(200) to varchar(100) requires a table scan.
Timestamp Types
| Type | Storage | Contains |
|---|---|---|
| timestamp | 8 bytes | Date and time, no timezone |
| timestamptz | 8 bytes | Date and time, with timezone |
Always Use timestamptz
timestamp (without timezone) stores the literal date and time you give it. If you insert 2024-03-15 10:00:00 in New York and query it in London, you still get 2024-03-15 10:00:00. There is no way to know what timezone that refers to.
timestamptz stores the value internally as UTC. On input, it converts from the session's timezone to UTC. On output, it converts from UTC to the session's timezone.
SET timezone = 'America/New_York';
INSERT INTO events (created_at) VALUES ('2024-03-15 10:00:00');
SET timezone = 'Europe/London';
SELECT created_at FROM events;
-- Returns: 2024-03-15 14:00:00+00 (correctly adjusted)
Using timestamp instead of timestamptz is one of the most common schema mistakes. It leads to bugs when users are in different timezones, when daylight saving time changes, and when servers move between regions.
CREATE TABLE events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_type text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
Numeric Types for Money & Precision
| Type | Storage | Precision |
|---|---|---|
| numeric(p,s) | Variable | Exact, user-defined |
| real (float4) | 4 bytes | 6 decimal digits |
| double precision | 8 bytes | 15 decimal digits |
Use numeric for Money
Floating-point types (real, double precision) cannot represent decimal fractions exactly:
SELECT 0.1::double precision + 0.2::double precision;
0.30000000000000004
For financial calculations, this is unacceptable. Use numeric:
SELECT 0.1::numeric + 0.2::numeric;
0.3
CREATE TABLE line_items (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
product_id bigint NOT NULL,
unit_price numeric(12, 2) NOT NULL,
quantity integer NOT NULL,
total numeric(14, 2) GENERATED ALWAYS AS (unit_price * quantity) STORED
);
numeric is slower than floating-point for calculations. For scientific data, sensor readings, or statistical computations where exact decimal representation is not needed, double precision is fine.
The Money Type
Postgres has a money type. Do not use it. It is locale-dependent, hard to work with in application code, and numeric does everything better.
Boolean
ALTER TABLE users ADD COLUMN is_active boolean NOT NULL DEFAULT true;
SELECT * FROM users WHERE is_active;
SELECT * FROM users WHERE NOT is_active;
Postgres booleans accept true, false, t, f, yes, no, on, off, 1, 0 on input. Always use true/false for clarity.
Partial indexes pair well with booleans:
-- Only index the rare case
CREATE INDEX idx_users_inactive ON users (id) WHERE NOT is_active;
Date & Interval
-- Date: calendar date without time
SELECT current_date; -- 2024-03-15
-- Interval: duration of time
SELECT now() - interval '30 days';
SELECT age(created_at) FROM users;
age
------------------------
2 years 3 mons 15 days
Intervals are useful for expiration logic, age calculations, and time-window queries:
SELECT * FROM sessions
WHERE last_active_at < now() - interval '30 minutes';
Arrays
Postgres supports arrays of any type:
CREATE TABLE articles (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
title text NOT NULL,
tags text[] NOT NULL DEFAULT '{}'
);
INSERT INTO articles (title, tags)
VALUES ('Postgres Internals', ARRAY['postgres', 'database', 'internals']);
SELECT * FROM articles WHERE 'postgres' = ANY(tags);
SELECT * FROM articles WHERE tags @> ARRAY['postgres', 'database'];
GIN indexes make array queries fast:
CREATE INDEX idx_articles_tags ON articles USING gin(tags);
Arrays are appropriate for small, fixed-meaning lists. If you find yourself querying array elements by position or joining on array contents frequently, normalize into a separate table instead.
Enums vs Check Constraints
-- Enum type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered');
-- Check constraint (simpler)
CREATE TABLE orders (
status text NOT NULL CHECK (status IN ('pending', 'processing', 'shipped', 'delivered'))
);
Enums are stored efficiently (4 bytes) and enforce values at the type level. But adding a value requires ALTER TYPE, and removing or reordering values is impossible without recreating the type. Check constraints are more flexible and easier to modify.
For most cases, use a text column with a check constraint. Use enums when you need the storage efficiency on very large tables.
Common Pitfalls
- Using integer for primary keys on growing tables. The 2.1 billion limit is reached faster than you expect. Default to bigint.
- Using timestamp instead of timestamptz. You will have timezone bugs. Always use timestamptz.
- Using float for money. Floating-point arithmetic produces rounding errors. Use numeric for any financial data.
- Using char(n). It pads with spaces, wastes storage, and causes comparison surprises. Use text.
- Using varchar(255) by habit. This is a MySQL/SQL Server convention. In Postgres, it offers no performance benefit over text.
- Using the money type. It is locale-dependent and inflexible. Use numeric(12,2) instead.
- Choosing UUID without understanding the index cost. Random UUIDs cause heavy B-tree fragmentation. Use bigint or time-ordered UUIDs.
Key Takeaways
- Default to bigint for primary keys, text for strings, timestamptz for times, numeric for money, and boolean for flags.
- Use
GENERATED ALWAYS AS IDENTITYover serial for primary key generation. - text and varchar perform identically in Postgres. Use text with CHECK constraints for length limits.
- timestamptz stores UTC internally and converts on display. timestamp stores a bare value with no timezone awareness.
- Floating-point types are inexact. Use numeric when precision matters.
- Arrays and JSONB give flexibility within a relational schema, but normalize when the data has clear relational structure.