JSON & JSONB
json vs jsonb
Postgres has two JSON types. Use JSONB.
| Feature | json | jsonb |
|---|---|---|
| Storage | Raw text | Binary, decomposed |
| Duplicate keys | Preserved | Last value wins |
| Key order | Preserved | Not preserved |
| Indexing | Not supported | GIN indexes supported |
| Operators | Limited | Full set (@>, ?, etc.) |
| Write speed | Slightly faster | Slightly slower (parsing) |
| Read/query speed | Slower (re-parsed) | Faster (pre-parsed) |
The json type stores the raw JSON text. Every time you query it, Postgres re-parses the text. It exists for edge cases where you need to preserve exact formatting or duplicate keys. For everything else, use jsonb.
-- JSONB: the type you want
CREATE TABLE events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_type text NOT NULL,
payload jsonb NOT NULL DEFAULT '{}',
created_at timestamptz NOT NULL DEFAULT now()
);
When to Use JSONB
JSONB is the right choice when:
Flexible Attributes
Products in an e-commerce system have different attributes per category. Shirts have size and color. Laptops have RAM and screen size. A rigid relational schema requires either a wide table with mostly-null columns or an EAV pattern (entity-attribute-value, which is worse).
CREATE TABLE products (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
category text NOT NULL,
price numeric(10,2) NOT NULL,
attributes jsonb NOT NULL DEFAULT '{}'
);
INSERT INTO products (name, category, price, attributes) VALUES
('Cotton T-Shirt', 'clothing', 29.99,
'{"size": "L", "color": "blue", "material": "cotton"}'),
('ThinkPad X1', 'electronics', 1499.99,
'{"ram_gb": 16, "screen_inches": 14, "cpu": "i7-1365U"}');
The relational columns (name, category, price) are queried, filtered, and joined on. The JSONB column holds the flexible part.
API Responses & External Data
When you store webhook payloads, API responses, or third-party data, the schema is outside your control and may change:
CREATE TABLE webhook_events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
provider text NOT NULL,
event_type text NOT NULL,
payload jsonb NOT NULL,
received_at timestamptz NOT NULL DEFAULT now()
);
Event Data & Audit Logs
Events and audit records often have type-specific details:
INSERT INTO audit_log (action, details) VALUES
('user.login', '{"ip": "192.168.1.100", "user_agent": "Mozilla/5.0"}'),
('order.placed', '{"order_id": 42, "items": 3, "total": 159.99}'),
('payment.failed', '{"order_id": 42, "reason": "insufficient_funds", "gateway": "stripe"}');
JSONB Operators
Extraction
-- -> returns jsonb (preserves type)
SELECT payload->'size' FROM products; -- "L" (jsonb string)
-- ->> returns text
SELECT payload->>'size' FROM products; -- L (text)
-- Nested access
SELECT payload->'address'->>'city' FROM users;
-- #> path extraction (jsonb)
SELECT payload #> '{address,city}' FROM users;
-- #>> path extraction (text)
SELECT payload #>> '{address,city}' FROM users;
Containment
-- @> : does the left value contain the right value?
SELECT * FROM products
WHERE attributes @> '{"color": "blue"}';
-- <@ : is the left value contained in the right value?
SELECT * FROM products
WHERE '{"color": "blue"}'::jsonb <@ attributes;
Containment is the most index-friendly operator. GIN indexes support @> natively.
Existence
-- ? : does the key exist?
SELECT * FROM products WHERE attributes ? 'color';
-- ?| : does any of these keys exist?
SELECT * FROM products WHERE attributes ?| array['color', 'size'];
-- ?& : do all of these keys exist?
SELECT * FROM products WHERE attributes ?& array['color', 'size'];
Concatenation & Removal
-- || : merge two JSONB values
UPDATE products
SET attributes = attributes || '{"on_sale": true}'
WHERE id = 1;
-- - : remove a key
UPDATE products
SET attributes = attributes - 'on_sale'
WHERE id = 1;
-- #- : remove a nested key by path
UPDATE products
SET attributes = attributes #- '{address,zip}'
WHERE id = 1;
JSONB Functions
-- Extract keys
SELECT jsonb_object_keys(attributes) FROM products WHERE id = 1;
-- Expand to rows
SELECT * FROM jsonb_each(
'{"name": "Alice", "age": 30}'::jsonb
);
key | value
------+--------
age | 30
name | "Alice"
-- Expand array to rows
SELECT * FROM jsonb_array_elements(
'[1, 2, 3]'::jsonb
);
-- Build JSONB from columns
SELECT jsonb_build_object(
'id', id,
'name', name,
'email', email
) FROM users;
-- Aggregate rows into a JSON array
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name))
FROM users
WHERE created_at > '2024-01-01';
jsonb_set for Nested Updates
-- Update a nested field
UPDATE products
SET attributes = jsonb_set(attributes, '{color}', '"red"')
WHERE id = 1;
-- Set a nested path (creates intermediate objects if create_missing is true)
UPDATE users
SET profile = jsonb_set(profile, '{preferences,theme}', '"dark"', true)
WHERE id = 1;
GIN Indexes on JSONB
GIN (Generalized Inverted Index) indexes make JSONB queries fast. Without an index, every JSONB query is a sequential scan.
Default GIN Index
CREATE INDEX idx_products_attributes ON products USING gin(attributes);
This supports @>, ?, ?|, and ?& operators. It does not support ->>-based comparisons.
-- Uses the GIN index
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';
-- Does NOT use the GIN index
SELECT * FROM products WHERE attributes->>'color' = 'blue';
jsonb_path_ops Index
CREATE INDEX idx_products_attributes ON products
USING gin(attributes jsonb_path_ops);
jsonb_path_ops is smaller and faster than the default operator class but only supports the @> containment operator. If containment queries are your primary access pattern, use this.
Expression Indexes for Specific Fields
If you frequently query a specific JSONB field with equality or range comparisons, create a B-tree index on the extracted value:
-- Index a specific extracted field
CREATE INDEX idx_products_color ON products ((attributes->>'color'));
-- Now this uses a B-tree index scan
SELECT * FROM products WHERE attributes->>'color' = 'blue';
-- For numeric comparisons, cast the extracted value
CREATE INDEX idx_products_ram ON products (((attributes->>'ram_gb')::int));
SELECT * FROM products
WHERE (attributes->>'ram_gb')::int >= 16;
When NOT to Use JSONB
When You Know the Schema
If every row has the same structure and you know it at design time, use relational columns:
-- Bad: known schema stuffed into JSONB
CREATE TABLE users (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
data jsonb NOT NULL -- contains name, email, created_at for every row
);
-- Good: proper relational columns
CREATE TABLE users (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
email text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
Relational columns give you type safety, NOT NULL constraints, default values, foreign keys, and better query optimization. JSONB gives you none of these.
When You Need Referential Integrity
JSONB fields cannot participate in foreign key constraints. If a JSONB object contains a user_id, nothing stops you from inserting an ID that does not exist in the users table.
-- This cannot be enforced with JSONB
-- payload->>'user_id' REFERENCES users(id) -- not possible
If the relationship matters for data integrity, model it as a relational column.
When You Need Complex Aggregations
Aggregating over JSONB fields is possible but slower and more awkward than aggregating over typed columns:
-- Clunky and slow
SELECT (attributes->>'color') AS color, count(*)
FROM products
GROUP BY attributes->>'color';
-- Clean and fast
SELECT color, count(*)
FROM products
GROUP BY color;
The Hybrid Approach
The most effective pattern combines relational columns for structured, queryable data with a JSONB column for flexible attributes:
CREATE TABLE products (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
category text NOT NULL,
price numeric(10,2) NOT NULL,
is_active boolean NOT NULL DEFAULT true,
attributes jsonb NOT NULL DEFAULT '{}',
created_at timestamptz NOT NULL DEFAULT now()
);
-- Relational indexes for common queries
CREATE INDEX idx_products_category ON products (category) WHERE is_active;
-- GIN index for flexible attribute queries
CREATE INDEX idx_products_attrs ON products USING gin(attributes jsonb_path_ops);
The relational columns handle filtering, sorting, joins, and constraints. The JSONB column handles everything else. This gives you the structure of a relational database with the flexibility of a document store.
JSONB Storage Considerations
JSONB is stored in a decomposed binary format. Each value is tagged with its type and stored inline. This means:
- Small JSONB values (a few keys) are efficient.
- Large JSONB values (hundreds of keys, nested arrays with thousands of elements) increase page size and reduce cache efficiency.
- Repeated keys across rows are not deduplicated. If every row has
{"type": "page_view"}, the string "type" is stored in every row.
For very large JSONB values, consider whether the data belongs in a separate table or a different storage system.
-- Check average JSONB column size
SELECT avg(pg_column_size(attributes)) AS avg_bytes,
max(pg_column_size(attributes)) AS max_bytes
FROM products;
Common Pitfalls
- Putting everything in JSONB. "Schemaless" is not a feature, it is a missing constraint. Use relational columns for data you understand and JSONB for data you do not.
- Forgetting to index JSONB. Without a GIN index, every JSONB query scans the entire table. Create indexes for your query patterns.
- Using -> when you need ->>. The
->operator returns jsonb (quoted strings). The->>operator returns text. Comparisons with=usually need->> - Deeply nested JSONB structures. Extracting
payload->'a'->'b'->'c'->'d'->>'e'in WHERE clauses is slow and unreadable. Flatten your JSONB or extract frequently accessed paths into relational columns. - Not validating JSONB at the application layer. Postgres ensures valid JSON syntax but does not validate structure. A CHECK constraint with
jsonb_typeofor a JSON Schema validator can catch malformed data. - Using json instead of jsonb. The
jsontype re-parses on every access and cannot be indexed. Usejsonbunless you have a specific reason not to.
Key Takeaways
- Use JSONB, not json. JSONB is binary, indexable, and faster for reads.
- JSONB is ideal for flexible attributes, API payloads, event data, and any schema that varies across rows.
- The
@>containment operator is the most index-friendly way to query JSONB. GIN indexes support it natively. - Do not use JSONB when you know the schema, need foreign keys, or perform heavy aggregations on the same fields.
- The hybrid approach (relational columns + JSONB attributes column) gives you the best of both worlds.
- Always create GIN indexes on JSONB columns you query. Without them, every query is a sequential scan.