Delta, Iceberg & Hudi
Data lakes solved the storage problem. But they introduced a new one: you cannot update or delete individual rows in a bunch of Parquet files sitting in S3. You cannot run a query and guarantee you are reading a consistent snapshot. You cannot roll back a bad write. The lakehouse table formats (Delta Lake, Apache Iceberg, Apache Hudi) fix all of this by adding database-like capabilities on top of object storage.
The Problem with Raw Parquet Files
Parquet on object storage gives you cheap, scalable, columnar storage. But it lacks the features that databases have provided for decades:
Missing from raw Parquet on S3:
- ACID transactions (concurrent writes can corrupt data)
- Row-level updates and deletes (Parquet files are immutable)
- Schema evolution (no table-level schema is tracked; renaming or dropping a column safely means rewriting files)
- Time travel (no way to query data as it existed yesterday)
- Consistent reads (a query might read partially-written data)
- Efficient upserts (no merge operation)
These are not theoretical concerns. Every team that runs a data lake at scale hits them within months. A pipeline fails halfway through writing 100 files; downstream queries read the 50 that landed and produce incorrect results. GDPR requires deleting a user's data; you cannot delete one row from a Parquet file without rewriting the entire file.
How Table Formats Work
All three formats follow the same core principle: they add a metadata layer on top of immutable Parquet (or ORC) files in object storage.
Traditional data lake:
Query Engine -> List files in S3 prefix -> Read Parquet files
Lakehouse table format:
Query Engine -> Read metadata (manifest/log) -> Read only relevant Parquet files
The metadata layer tracks:
- Which files belong to the current version of the table
- Statistics about each file (row counts, column min/max)
- The history of changes (what was added or removed in each transaction)
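To make the role of per-file statistics concrete, here is a minimal sketch of file pruning. The `FileEntry` type, its field names, and the file paths are hypothetical, chosen for illustration; real formats store richer statistics per column.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileEntry:
    # Hypothetical per-file metadata record (not any format's real schema).
    path: str
    row_count: int
    min_order_date: str  # ISO dates compare correctly as plain strings
    max_order_date: str

def files_to_scan(entries, wanted_date):
    """Keep only files whose [min, max] range could contain wanted_date."""
    return [
        e.path
        for e in entries
        if e.min_order_date <= wanted_date <= e.max_order_date
    ]

table_metadata = [
    FileEntry("part-00000.parquet", 1000, "2025-01-01", "2025-01-10"),
    FileEntry("part-00001.parquet", 1200, "2025-01-11", "2025-01-20"),
    FileEntry("part-00002.parquet", 900, "2025-01-21", "2025-01-31"),
]

selected = files_to_scan(table_metadata, "2025-01-15")
print(selected)
```

The engine never lists the S3 prefix here: the metadata alone tells it that two of the three files cannot contain matching rows.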
Immutable Files, Mutable Tables
The key insight: individual data files are never modified in place. Instead, operations create new files and update the metadata to point to the new set of files.
Update operation (conceptual):
1. Read the file containing the row to update
2. Write a NEW file with the updated row
3. Update metadata to point to the new file instead of the old one
4. The old file is kept (for time travel) or garbage collected later
Delete operation (conceptual):
1. Read the file containing the row to delete
2. Write a NEW file without that row
3. Update metadata to stop referencing the old file
This copy-on-write approach gives you the semantics of mutable tables on top of immutable storage.
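The update and delete steps above can be sketched as a toy in-memory table. This is an illustration under simplifying assumptions, not any format's implementation: the `CowTable` class, its method names, and the `(order_id, status)` row shape are all hypothetical, and a dict stands in for object storage.

```python
class CowTable:
    """Toy copy-on-write table: data files are immutable, snapshots are sets of paths."""

    def __init__(self):
        self.files = {}                 # path -> tuple of (order_id, status) rows
        self.snapshots = [frozenset()]  # snapshots[v] = live file paths at version v

    def _write_file(self, rows):
        path = f"part-{len(self.files):05d}.parquet"
        self.files[path] = tuple(rows)  # written once, never modified afterwards
        return path

    def _commit(self, added, removed):
        current = self.snapshots[-1]
        self.snapshots.append((current - frozenset(removed)) | frozenset(added))

    def append(self, rows):
        self._commit([self._write_file(rows)], [])

    def _rewrite(self, order_id, replacement):
        # Find the file holding the row, write a new file, swap the metadata pointer.
        for path in self.snapshots[-1]:
            rows = self.files[path]
            if any(r[0] == order_id for r in rows):
                kept = [r for r in rows if r[0] != order_id]
                new_rows = kept + ([replacement] if replacement else [])
                added = [self._write_file(new_rows)] if new_rows else []
                self._commit(added, [path])
                return

    def update(self, order_id, status):
        self._rewrite(order_id, (order_id, status))

    def delete(self, order_id):
        self._rewrite(order_id, None)

    def read(self, version=-1):
        return sorted(r for p in self.snapshots[version] for r in self.files[p])

table = CowTable()
table.append([(1, "new"), (2, "new")])  # version 1
table.update(1, "shipped")              # version 2
table.delete(2)                         # version 3
print(table.read())                     # current state
print(table.read(version=1))            # state before the update and delete
```

Every old file is still present in `table.files` after the update and delete, which is what makes reading an earlier version possible and why a separate cleanup process is needed later.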
Delta Lake
Delta Lake was created by Databricks, open-sourced in 2019, and is now hosted by the Linux Foundation. It uses a transaction log stored alongside the data files.
Structure
s3://warehouse/orders/
_delta_log/
00000000000000000000.json (initial table creation)
00000000000000000001.json (first batch of inserts)
00000000000000000002.json (second batch, plus an update)
00000000000000000010.checkpoint.parquet (checkpoint every 10 commits)
part-00000-abc123.parquet
part-00001-def456.parquet
part-00002-ghi789.parquet
The _delta_log directory contains a JSON file for each transaction. Each file describes which data files were added or removed. Checkpoints (in Parquet format) are written periodically to speed up metadata reads.
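As a rough sketch of how a reader could reconstruct the table state from such a log, the function below replays add and remove actions in order. It is heavily simplified: real commit files carry additional actions and many more fields per action, and the `metaData` payload and commit contents shown here are made up for illustration. Only the `path` of each add or remove action is used.

```python
import json

def live_files(commits, as_of_version=None):
    """Replay commits 0..as_of_version and return the set of live data files."""
    if as_of_version is None:
        as_of_version = len(commits) - 1
    files = set()
    for commit in commits[: as_of_version + 1]:
        for line in commit.splitlines():  # one JSON action per line
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
            # Other action types (e.g. metadata) do not change the file set.
    return files

# Hypothetical contents of commits 0, 1 and 2.
commits = [
    '{"metaData": {"name": "orders"}}',
    "\n".join([
        '{"add": {"path": "part-00000-abc123.parquet"}}',
        '{"add": {"path": "part-00001-def456.parquet"}}',
    ]),
    "\n".join([
        '{"remove": {"path": "part-00000-abc123.parquet"}}',
        '{"add": {"path": "part-00002-ghi789.parquet"}}',
    ]),
]

print(sorted(live_files(commits)))                   # latest version
print(sorted(live_files(commits, as_of_version=1)))  # table as of version 1
```

Stopping the replay early is all that reading an older version requires in this model. A checkpoint would simply be a precomputed result of the replay so that readers do not start from commit 0 every time.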
Key Features
-- Time travel: query previous versions
SELECT * FROM orders VERSION AS OF 5;
SELECT * FROM orders TIMESTAMP AS OF '2025-01-15 10:00:00';
-- Upserts via MERGE
MERGE INTO orders AS target
USING new_orders AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
-- Schema evolution
ALTER TABLE orders ADD COLUMN shipping_method STRING;
Delta Lake Ecosystem
Delta Lake is tightly integrated with Databricks but also works with Spark, Trino, and other engines through open-source connectors. The Delta Sharing protocol enables sharing tables across organizations without copying data.
Apache Iceberg
Apache Iceberg was created at Netflix and donated to the Apache Software Foundation. It was designed from the ground up as an open table format, not tied to any particular engine or vendor.
Structure
s3://warehouse/orders/
metadata/
v1.metadata.json (schema, partition spec, snapshot pointers)
v2.metadata.json (updated after next transaction)
snap-001.avro (manifest list for snapshot 1)
snap-002.avro (manifest list for snapshot 2)
manifest-abc.avro (manifest file: lists data files and stats)
manifest-def.avro
data/
part-00000-abc123.parquet
part-00001-def456.parquet
Iceberg uses a three-level metadata hierarchy:
Metadata file -> Manifest list -> Manifest files -> Data files
Metadata file: current schema, partition spec, pointer to current snapshot
Manifest list: which manifest files belong to this snapshot
Manifest files: list of data files with per-file statistics
Data files: actual Parquet files containing the data
This hierarchy enables efficient planning. The query engine reads metadata to determine exactly which data files to read, without listing the object storage directory.
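The walk from metadata file to data files might look like the sketch below. Plain dicts stand in for the JSON and Avro files, and the key names, date ranges, and the `plan_scan` helper are assumptions made for illustration; real manifests hold far more detail.

```python
# Hypothetical stand-ins for the metadata file, manifest lists and manifests.
metadata = {
    "current_snapshot_id": 2,
    "snapshots": {1: "snap-001.avro", 2: "snap-002.avro"},
}
manifest_lists = {
    "snap-001.avro": ["manifest-abc.avro"],
    "snap-002.avro": ["manifest-abc.avro", "manifest-def.avro"],
}
# Each manifest entry: (data file path, min order_date, max order_date).
manifests = {
    "manifest-abc.avro": [("data/part-00000-abc123.parquet", "2025-01-01", "2025-01-15")],
    "manifest-def.avro": [("data/part-00001-def456.parquet", "2025-01-16", "2025-01-31")],
}

def plan_scan(lo, hi, snapshot_id=None):
    """Return data files whose date range overlaps [lo, hi] in the given snapshot."""
    if snapshot_id is None:
        snapshot_id = metadata["current_snapshot_id"]
    manifest_list = metadata["snapshots"][snapshot_id]
    files = []
    for manifest in manifest_lists[manifest_list]:
        for path, file_min, file_max in manifests[manifest]:
            if file_min <= hi and lo <= file_max:  # the two ranges overlap
                files.append(path)
    return files

print(plan_scan("2025-01-20", "2025-01-25"))
print(plan_scan("2025-01-20", "2025-01-25", snapshot_id=1))
```

Note that snapshot 2 reuses `manifest-abc.avro` from snapshot 1. Sharing unchanged manifests between snapshots is what keeps each commit's metadata write small.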
Key Features
-- Time travel (the version is an Iceberg snapshot ID; 42 is a placeholder)
SELECT * FROM orders FOR SYSTEM_TIME AS OF TIMESTAMP '2025-01-15 10:00:00';
SELECT * FROM orders FOR VERSION AS OF 42;
-- Schema evolution (add, drop, rename, reorder columns)
ALTER TABLE orders ADD COLUMN shipping_method STRING;
ALTER TABLE orders DROP COLUMN legacy_field;
ALTER TABLE orders RENAME COLUMN old_name TO new_name;
-- Partition evolution (change partitioning without rewriting data)
ALTER TABLE orders ADD PARTITION FIELD month(order_date);
-- Hidden partitioning (users do not need to know the partition scheme)
-- Query just uses: WHERE order_date = '2025-01-15'
-- Iceberg automatically prunes based on the partition spec
Partition Evolution
Iceberg's partition evolution is a standout feature. In Delta Lake and Hudi, changing the partition scheme typically requires rewriting the entire table. In Iceberg, you can add or change partition fields, and old data keeps its original partitioning while new data uses the new scheme. The query planner handles both transparently.
Original partition: daily (order_date)
New partition: monthly (order_date)
After evolution:
Old files: still partitioned by day
New files: partitioned by month
Queries: Iceberg prunes correctly across both schemes
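One way to picture pruning across both schemes: each file remembers which partition spec it was written under, and the planner applies that spec's transform to the query predicate. The sketch below assumes a hypothetical file list and transform table; it is a conceptual model, not Iceberg's planner.

```python
from datetime import date

# Hypothetical transforms: how each partition spec derives a value from order_date.
TRANSFORMS = {
    "day": lambda d: d.isoformat(),          # e.g. "2025-01-15"
    "month": lambda d: d.strftime("%Y-%m"),  # e.g. "2025-01"
}

# Each file records the spec it was written with and its partition value.
files = [
    {"path": "old-a.parquet", "spec": "day", "partition": "2024-12-31"},
    {"path": "old-b.parquet", "spec": "day", "partition": "2025-01-15"},
    {"path": "new-a.parquet", "spec": "month", "partition": "2025-01"},
    {"path": "new-b.parquet", "spec": "month", "partition": "2025-02"},
]

def prune(files, order_date):
    """Keep files whose partition value matches the predicate under the file's own spec."""
    return [
        f["path"]
        for f in files
        if TRANSFORMS[f["spec"]](order_date) == f["partition"]
    ]

print(prune(files, date(2025, 1, 15)))
print(prune(files, date(2025, 2, 3)))
```

The caller only supplies an `order_date` and never names a partition column, which is the same idea behind hidden partitioning.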
Apache Hudi
Apache Hudi (Hadoop Upserts Deletes and Incrementals) was created at Uber to handle the specific challenge of upserting ride data at massive scale.
Structure
Hudi supports two table types:
Copy-on-Write (CoW):
- Data stored in Parquet files
- Updates rewrite entire files
- Read-optimized: queries are fast
- Write-amplified: updates are expensive
Merge-on-Read (MoR):
- Base files in Parquet + change logs in Avro
- Updates append to log files
- Write-optimized: updates are fast
- Reads may need to merge base files with logs (slower)
- Compaction merges logs into base files periodically
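The merge-on-read trade-off can be sketched in a few lines. Here a dict stands in for a Parquet base file and a list for the Avro log; the operation names and the `read_merged` and `compact` helpers are hypothetical and only illustrate the idea.

```python
def read_merged(base, log):
    """Merge-on-read: apply log entries over the base file at query time."""
    rows = dict(base)  # never mutate the base file
    for op, key, value in log:
        if op == "upsert":
            rows[key] = value
        elif op == "delete":
            rows.pop(key, None)
    return rows

def compact(base, log):
    """Compaction: fold the log into a new base file and start with an empty log."""
    return read_merged(base, log), []

base = {1: "new", 2: "new"}  # order_id -> status
log = [
    ("upsert", 1, "shipped"),  # writes only append to the log (cheap)
    ("upsert", 3, "new"),
    ("delete", 2, None),
]

before = read_merged(base, log)  # every read pays the merge cost
base, log = compact(base, log)
after = read_merged(base, log)   # same result, nothing left to merge
print(before, after, log)
```

In a copy-on-write table, the work done by `compact` would happen on every update instead, which is why writes are slower and reads need no merging.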
Key Features
Hudi strengths:
- Fast upserts (designed for high-throughput ingestion)
- Incremental queries (read only what changed since last query)
- Built-in compaction and cleaning
- Record-level indexing for fast lookups
- Strong CDC (change data capture) support
Hudi's incremental query capability is particularly useful for pipelines that need to process only new or changed records rather than scanning entire tables.
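Conceptually, an incremental consumer keeps a checkpoint and asks only for commits after it. The sketch below assumes a simplified timeline of `(commit_time, records)` pairs with timestamp-style commit times; the data and the `incremental_read` helper are hypothetical and do not use Hudi's API.

```python
# Hypothetical commit timeline: (commit_time, records written in that commit).
timeline = [
    ("20250115090000", [{"order_id": 1, "status": "new"}]),
    ("20250115100000", [{"order_id": 2, "status": "new"}]),
    ("20250115110000", [{"order_id": 1, "status": "shipped"}]),
]

def incremental_read(timeline, since):
    """Return records from commits strictly after `since`, plus the new checkpoint."""
    commits = [(ts, recs) for ts, recs in timeline if ts > since]
    records = [rec for _, recs in commits for rec in recs]
    checkpoint = commits[-1][0] if commits else since
    return records, checkpoint

# The consumer last processed the 09:00 commit.
records, checkpoint = incremental_read(timeline, "20250115090000")
print(records, checkpoint)

# Running again with the new checkpoint yields nothing until more commits land.
later_records, later_checkpoint = incremental_read(timeline, checkpoint)
print(later_records, later_checkpoint)
```

The downstream job's cost scales with the amount of change rather than the size of the table, which is the point of the feature.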
Comparing the Three Formats
Feature               Delta Lake                          Apache Iceberg                  Apache Hudi
Origin                Databricks                          Netflix/Apache                  Uber/Apache
License               Apache 2.0                          Apache 2.0                      Apache 2.0
Metadata format       JSON log                            Avro manifests                  Timeline + metadata
Default file format   Parquet                             Parquet (or ORC)                Parquet (or ORC)
Schema evolution      Add; rename/drop w/ column mapping  Full (add/drop/rename/reorder)  Add columns
Partition evolution   No (rewrite)                        Yes (no rewrite)                Limited
Time travel           Yes                                 Yes                             Yes
Hidden partitioning   No                                  Yes                             No
Merge-on-Read         Delta 3.0+                          Yes (v2 deletes)                Yes (native)
Ecosystem breadth     Databricks-centric                  Broadest                        Narrower
Governance            Linux Foundation (Databricks-led)   Apache (ASF)                    Apache (ASF)
How to Choose
The practical advice, stripped of vendor marketing:
Go with Iceberg if you are not locked into Databricks. Iceberg has the broadest engine support (Spark, Trino, Flink, Dremio, Snowflake, BigQuery, Amazon Athena, StarRocks), the most flexible schema and partition evolution, and the strongest open governance. It is increasingly treated as the de facto industry standard.
Go with Delta Lake if you are already on Databricks or plan to be. Delta Lake is deeply integrated with the Databricks runtime, Unity Catalog, and the broader Databricks ecosystem. You will get the best performance and features within that platform.
Consider Hudi if your primary use case is high-throughput CDC ingestion with frequent upserts. Hudi was designed for this workload and has optimizations (merge-on-read, record-level indexing) that the others are still catching up on.
The convergence trend: all three formats are adding each other's features. Delta Lake added liquid clustering and deletion vectors. Iceberg added row-level deletes. The differences are narrowing. What matters more than the format is the ecosystem you are building around.
Migration Between Formats
If you chose wrong (or if the landscape shifts), migration is not catastrophic. Tools like Apache XTable (formerly OneTable) can convert metadata between formats without rewriting the underlying Parquet files.
Apache XTable:
- Converts Delta -> Iceberg, Iceberg -> Delta, Hudi -> Iceberg, etc.
- Only converts metadata; data files stay in place
- Enables querying the same data with different engines
Snowflake and BigQuery both support reading Iceberg tables directly from object storage, which provides an exit path from warehouse lock-in.
Common Pitfalls
Choosing based on benchmarks instead of ecosystem fit. Micro-benchmarks comparing Delta vs Iceberg vs Hudi are misleading. Real-world performance depends on your data, query patterns, and engine. Choose based on ecosystem compatibility and operational simplicity.
Not running compaction. Merge-on-read tables accumulate small log and delete files. Without regular compaction, read performance degrades steadily. Set up automated compaction jobs from day one.
Ignoring garbage collection. Time travel keeps old file versions. Without a retention policy and vacuum/expire process, storage costs grow indefinitely.
-- Delta Lake: remove files older than 7 days
VACUUM orders RETAIN 168 HOURS;
-- Iceberg: expire snapshots older than 7 days
CALL catalog.system.expire_snapshots('orders', TIMESTAMP '2025-01-08 00:00:00');
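What such a cleanup has to work out is which files are no longer reachable from any retained snapshot. A minimal sketch of that logic follows, with a hypothetical snapshot list and `expire` helper; it always keeps the latest snapshot regardless of the cutoff.

```python
def expire(snapshots, cutoff):
    """snapshots: list of (timestamp, set_of_files), oldest first.

    Returns (retained snapshots, files that are safe to delete).
    """
    retained = [s for s in snapshots[:-1] if s[0] >= cutoff] + [snapshots[-1]]
    still_referenced = set().union(*(files for _, files in retained))
    all_files = set().union(*(files for _, files in snapshots))
    return retained, all_files - still_referenced

# Hypothetical history: each snapshot lists the files it references.
snapshots = [
    ("2025-01-01", {"a.parquet", "b.parquet"}),
    ("2025-01-05", {"b.parquet", "c.parquet"}),
    ("2025-01-10", {"c.parquet", "d.parquet"}),
]

retained, deletable = expire(snapshots, cutoff="2025-01-08")
print(len(retained), sorted(deletable))
```

A file is deletable only when no retained snapshot references it. That is why `c.parquet` survives even though it was first written by a snapshot that has now expired.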
Over-indexing on "open standard" vs "vendor." Delta Lake is open-source (Apache 2.0). Iceberg is an Apache project. Both are open. The real question is which engines and tools you use, not which format has a purer open-source pedigree.
Assuming table formats solve all data lake problems. Table formats give you ACID and time travel. They do not give you data quality, governance, access control, or discovery. You still need tools for those.
Key Takeaways
- Lakehouse table formats (Delta, Iceberg, Hudi) add ACID transactions, time travel, schema evolution, and upserts to data lakes
- All three work by adding a metadata layer on top of immutable Parquet files in object storage
- Apache Iceberg is the strongest choice for most new projects due to broad engine support and flexible partition evolution
- Delta Lake is the best choice within the Databricks ecosystem
- Apache Hudi excels at high-throughput upsert and CDC workloads
- Run compaction and garbage collection from day one; without them, performance and storage costs degrade over time
- The formats are converging in features; ecosystem fit matters more than feature checklists