Slowly Changing Dimensions

Customers move. Products get repriced. Employees switch departments. The data that describes your business entities changes over time, and how you handle those changes determines whether your analytics reflect reality or fiction.

Slowly Changing Dimensions (SCDs) are techniques for tracking changes in dimension tables. The name comes from the fact that dimension attributes change slowly relative to fact table transactions — a customer's address changes a few times over years, while they might place orders daily.

The Problem

Consider a simple scenario. A customer lives in New York and places an order on January 15. On February 1, they move to California. They place another order on February 20.

If you overwrite the address, both orders show California. Your Q1 revenue-by-state report is wrong — New York's revenue is understated, California's is overstated.

If you do not track the change at all, both orders show New York. Also wrong.

The right answer depends on what question you are trying to answer, and that depends on the SCD type you choose.

SCD Type 0: Retain Original

The attribute never changes in the warehouse, even if it changes in the source system. The value captured at the time the dimension record was first created is preserved forever.

When to use it: For attributes that should reflect the state at the time of first capture. A customer's original signup channel, a product's launch date, an employee's original hire date.

-- Type 0: the original_signup_channel never changes
-- even if the source system somehow updates it
CREATE TABLE dim_customer (
    customer_key         INT PRIMARY KEY,
    customer_id          VARCHAR(50),
    name                 VARCHAR(200),
    original_signup_channel VARCHAR(50),  -- Type 0: never updated
    current_state        VARCHAR(50)      -- Type 1: overwritten
);

Type 0 is simple and appropriate for genuinely immutable attributes. Most dimension tables have a few Type 0 columns mixed in with other SCD types.

SCD Type 1: Overwrite

When an attribute changes, you overwrite the old value with the new one. No history is preserved. The dimension table always reflects the current state.

-- Before: customer lives in New York
SELECT * FROM dim_customer WHERE customer_id = 'C-1001';

customer_key | customer_id | name       | city     | state
1001         | C-1001      | Jane Smith | New York | NY

-- After Type 1 update: city is overwritten
UPDATE dim_customer
SET city = 'Los Angeles', state = 'CA'
WHERE customer_id = 'C-1001';

customer_key | customer_id | name       | city        | state
1001         | C-1001      | Jane Smith | Los Angeles | CA

When to use it:

Correcting data entry errors (the name was misspelled, fix it)
Attributes where history does not matter (a customer updates their display name)
When simplicity outweighs the need for historical analysis

The tradeoff: All historical facts are now associated with the current attribute value. If Jane placed 100 orders while in New York, those orders now look like they came from California. This is acceptable when the attribute is irrelevant to historical analysis, but dangerous when it is not.

SCD Type 2: Add New Row

When an attribute changes, you keep the old row and insert a new one. The old row is marked as inactive (or given an end date), and the new row becomes the current record. This preserves full history.

CREATE TABLE dim_customer (
    customer_key     INT PRIMARY KEY,       -- surrogate key
    customer_id      VARCHAR(50),           -- natural key (not unique)
    name             VARCHAR(200),
    city             VARCHAR(100),
    state            VARCHAR(50),
    effective_date   DATE,
    expiration_date  DATE,                  -- '9999-12-31' for current
    is_current       BOOLEAN
);

Before the move:

customer_key | customer_id | name       | city     | state | effective  | expiration | is_current
1001         | C-1001      | Jane Smith | New York | NY    | 2024-01-01 | 9999-12-31 | true

After the move:

customer_key | customer_id | name       | city        | state | effective  | expiration | is_current
1001         | C-1001      | Jane Smith | New York    | NY    | 2024-01-01 | 2025-01-31 | false
1002         | C-1001      | Jane Smith | Los Angeles | CA    | 2025-02-01 | 9999-12-31 | true

Now fact_orders rows that reference customer_key = 1001 correctly show New York, and rows that reference customer_key = 1002 correctly show California. The surrogate key (customer_key) is what the fact table joins on, and each version of the customer gets a different surrogate key.

Querying Type 2 Dimensions

To get the current state of a customer:

SELECT *
FROM dim_customer
WHERE customer_id = 'C-1001'
  AND is_current = TRUE;

To get the state of a customer at a specific point in time:

SELECT *
FROM dim_customer
WHERE customer_id = 'C-1001'
  AND effective_date <= '2025-01-15'
  AND expiration_date > '2025-01-15';

To join a fact table with the correct dimension version:

SELECT
    f.order_id,
    f.order_date,
    f.total_amount,
    c.city,
    c.state
FROM fact_orders f
JOIN dim_customer c ON f.customer_key = c.customer_key;
-- No date filtering needed -- the fact already points to the right version

The Type 2 Implementation Pattern

When loading new data, you need to detect changes and handle them:

-- Step 1: Expire existing current records where attributes changed
UPDATE dim_customer AS target
SET
    expiration_date = CURRENT_DATE - INTERVAL '1 day',
    is_current = FALSE
FROM staging.customers AS source
WHERE target.customer_id = source.customer_id
  AND target.is_current = TRUE
  AND (
      target.city != source.city
      OR target.state != source.state
  );

-- Step 2: Insert new current records for changed customers
INSERT INTO dim_customer (
    customer_key, customer_id, name, city, state,
    effective_date, expiration_date, is_current
)
SELECT
    NEXTVAL('customer_key_seq'),
    source.customer_id,
    source.name,
    source.city,
    source.state,
    CURRENT_DATE,
    '9999-12-31',
    TRUE
FROM staging.customers AS source
JOIN dim_customer AS target
    ON source.customer_id = target.customer_id
WHERE target.is_current = FALSE
  AND target.expiration_date = CURRENT_DATE - INTERVAL '1 day';

Type 2 Is the Default

If you are unsure which SCD type to use, use Type 2. The reasons:

You cannot add history retroactively. Once you overwrite a value, it is gone.
Storage is cheap. A dimension table that grows 10% per year because of Type 2 rows is negligible.
Analysts will eventually ask "what was the state of X when Y happened?" If you do not have Type 2, the answer is "we do not know."
You can always collapse a Type 2 dimension to its current state (filter is_current = TRUE). You cannot expand a Type 1 dimension to show history.

SCD Type 3: Add Column for Previous Value

Instead of adding rows, you add columns. The dimension table has both the current value and the previous value.

CREATE TABLE dim_customer (
    customer_key     INT PRIMARY KEY,
    customer_id      VARCHAR(50),
    name             VARCHAR(200),
    current_city     VARCHAR(100),
    current_state    VARCHAR(50),
    previous_city    VARCHAR(100),
    previous_state   VARCHAR(50),
    change_date      DATE
);

customer_key | customer_id | current_city | current_state | previous_city | previous_state | change_date
1001         | C-1001      | Los Angeles  | CA            | New York      | NY             | 2025-02-01

When to use it:

When you only need to track the most recent change, not the full history
For "before and after" comparisons (customers who changed regions, products that changed categories)
When Type 2 would create too many rows for a frequently-changing attribute

The tradeoff: You lose all history beyond the previous value. If Jane moved from New York to Chicago to Los Angeles, you only see Chicago and Los Angeles. New York is gone.

Type 3 is rarely the right choice as a standalone approach. It is occasionally useful for specific attributes where "current vs previous" is the only question people ask.

SCD Type 6: Hybrid (1 + 2 + 3)

Type 6 combines Types 1, 2, and 3. You add new rows for history (Type 2), keep a current-value column that is overwritten across all rows (Type 1), and maintain a previous-value column (Type 3).

CREATE TABLE dim_customer (
    customer_key      INT PRIMARY KEY,
    customer_id       VARCHAR(50),
    name              VARCHAR(200),
    historical_city   VARCHAR(100),   -- Type 2: changes per row
    historical_state  VARCHAR(50),    -- Type 2: changes per row
    current_city      VARCHAR(100),   -- Type 1: overwritten on all rows
    current_state     VARCHAR(50),    -- Type 1: overwritten on all rows
    effective_date    DATE,
    expiration_date   DATE,
    is_current        BOOLEAN
);

customer_key | customer_id | historical_city | historical_state | current_city | current_state | effective  | expiration
1001         | C-1001      | New York        | NY               | Los Angeles  | CA            | 2024-01-01 | 2025-01-31
1002         | C-1001      | Los Angeles     | CA               | Los Angeles  | CA            | 2025-02-01 | 9999-12-31

This lets you answer both "what was the state when this order was placed?" (use historical columns) and "what is the customer's current state?" (use current columns) without needing an extra join or filter.

When to use it: When analysts frequently need both historical and current views in the same query. It is more complex to maintain but eliminates a common source of query errors.

Choosing the Right Type

Attribute Characteristic           Recommended Type
---------------------------------------------------------
Immutable (signup date, SSN)       Type 0
Corrections only (fixing typos)    Type 1
History matters for analysis       Type 2
Only care about previous value     Type 3
Need both historical and current   Type 6

In practice, most dimension tables use a mix. A customer dimension might track address with Type 2, email with Type 1 (corrections only), and original signup source with Type 0. You do not apply one SCD type to an entire table — you apply it per attribute.

Practical Considerations

Detecting Changes

You need to know when an attribute has changed. Common approaches:

Full comparison: Compare every column in the staging table against the dimension table. Reliable but slow for large tables.
Hash comparison: Compute a hash of the tracked columns. Compare hashes instead of individual columns. Faster for wide tables.
Source timestamps: If the source system provides a last_modified timestamp, use it to identify changed records. Efficient but requires trusting the source.

-- Hash comparison for change detection
SELECT
    s.customer_id,
    MD5(CONCAT(s.city, '|', s.state, '|', s.name)) AS source_hash,
    MD5(CONCAT(d.city, '|', d.state, '|', d.name)) AS dim_hash
FROM staging.customers s
JOIN dim_customer d
    ON s.customer_id = d.customer_id
    AND d.is_current = TRUE
WHERE MD5(CONCAT(s.city, '|', s.state, '|', s.name))
   != MD5(CONCAT(d.city, '|', d.state, '|', d.name));

Surrogate Keys in Type 2

Type 2 dimensions require surrogate keys. The natural key (customer_id) is no longer unique — one customer can have multiple rows. The surrogate key (customer_key) is what fact tables reference, ensuring each fact row points to the correct version of the dimension.

Mini-Dimensions for Rapidly Changing Attributes

Some attributes change frequently — a customer's loyalty tier, a product's inventory status. Type 2 tracking on these attributes would create a massive number of rows. The solution is a mini-dimension: a separate small dimension table for the rapidly changing attributes, joined to the fact table independently.

Common Pitfalls

Defaulting to Type 1 out of convenience. Overwriting is easy to implement, but once history is lost, it is gone. Default to Type 2 unless you have a clear reason not to.
Applying one SCD type to the entire table. Different attributes have different requirements. Track address changes with Type 2 and email corrections with Type 1 in the same table.
Forgetting to update surrogate keys in fact tables. When a new Type 2 row is created, new fact records must reference the new surrogate key, not the old one. The fact-to-dimension key assignment happens at load time.
Not indexing effective and expiration dates. Point-in-time queries against Type 2 dimensions need efficient date range lookups. Index these columns.
Using NULL for current row expiration. Use a sentinel value like 9999-12-31 instead of NULL. It simplifies range queries (you can use BETWEEN instead of handling NULL with COALESCE) and avoids NULL comparison pitfalls.
Tracking too many attributes as Type 2. If a customer has 50 attributes and you track all 50 as Type 2, a change to any one of them creates a new row. Be selective about which attributes warrant historical tracking.

Key Takeaways

Slowly Changing Dimensions handle the reality that dimension attributes change over time.
Type 1 (overwrite) is simple but destroys history. Use it for corrections and attributes where history is irrelevant.
Type 2 (new row with date range) preserves full history and should be your default choice for any attribute where historical context matters.
Type 3 (previous value column) tracks only the last change. Rarely sufficient on its own.
Most dimension tables use a mix of SCD types across different attributes.
Use surrogate keys in Type 2 dimensions so fact tables reference the correct historical version.
Default to Type 2 when in doubt. You can always ignore history, but you cannot reconstruct it after overwriting.