
File System Implementations

This page surveys major file system implementations, the design decisions behind them, and when to use each.

ext2/ext3/ext4 (Linux)

ext2 (1993)

Traditional UNIX-style file system. Block groups divide the disk into manageable segments, each with its own inode table and block bitmap.

Structure:

[Boot Block][Block Group 0][Block Group 1]...[Block Group N]

Block Group:
[Superblock copy][Group Descriptor][Block Bitmap][Inode Bitmap][Inode Table][Data Blocks]

No journaling — fsck on unclean shutdown.
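
The block-group layout makes inode lookup simple arithmetic. A minimal sketch, assuming illustrative values for the per-filesystem parameters (in a real ext2 volume, inodes-per-group and inode size come from the superblock):

```python
# Sketch: locating an inode on an ext2-style layout.
# The parameters below are illustrative; real values come from the
# superblock (e.g. shown by dumpe2fs).
INODES_PER_GROUP = 8192
INODE_SIZE = 128  # bytes (the ext2 default)

def locate_inode(inode_num: int) -> tuple[int, int]:
    """Return (block_group, byte_offset_in_inode_table) for an inode.

    Inode numbers start at 1, so subtract 1 before dividing.
    """
    index = inode_num - 1
    group = index // INODES_PER_GROUP
    offset = (index % INODES_PER_GROUP) * INODE_SIZE
    return group, offset

print(locate_inode(1))      # (0, 0): the first inode, in block group 0
print(locate_inode(8193))   # (1, 0): the first inode of block group 1
```

Because each group carries its own inode table and bitmaps, this lookup needs no global index, and related metadata and data tend to land in the same region of the disk.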

ext3 (2001)

ext2 + journaling. Three journal modes: journal, ordered (default), writeback. Backward compatible with ext2.

ext4 (2008)

The Linux default file system. Major improvements over ext3:

Extents: Replace block lists with (start_block, length) pairs. Reduces metadata for large contiguous files. Up to 128 MB per extent (with 4K blocks).
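Some illustrative arithmetic shows why extents matter. Assuming 4 KiB blocks, a 4-byte block pointer, and the 12-byte on-disk extent record, a fully contiguous 1 GiB file needs kilobytes of pointers under a block list but only a handful of extents:

```python
import math

# Illustrative comparison: block list vs. extents for a 1 GiB file
# stored contiguously, assuming 4 KiB blocks.
BLOCK_SIZE = 4096
file_size = 1 * 1024**3
blocks = file_size // BLOCK_SIZE            # 262,144 blocks

# ext2/ext3 style: one 4-byte pointer per block
# (ignoring indirect-block overhead)
block_list_bytes = blocks * 4               # 1 MiB of pointers

# ext4: one extent covers up to 32,768 blocks (128 MiB), 12 bytes each
extents = math.ceil(blocks / 32768)         # 8 extents
extent_bytes = extents * 12                 # 96 bytes

print(block_list_bytes, extent_bytes)       # 1048576 96
```

The win evaporates for badly fragmented files, which is one reason ext4 also works hard to allocate contiguously (multiblock and delayed allocation, below).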

Multiblock allocation: Allocate multiple blocks at once (reduces fragmentation).

Delayed allocation: Don't allocate blocks until data is flushed to disk. Better allocation decisions.

Maximum sizes: File: 16 TB. File system: 1 EB. Directory: unlimited (htree).

Checksums: Metadata checksums (superblock, group descriptors, inodes). Detects corruption.

Online resize: Grow the file system without unmounting.

XFS

High-performance file system originally from SGI (1994). Designed for large files and high throughput.

Key features:

  • B+ tree for everything: inodes, free space, directory entries, extent maps
  • Allocation groups: Parallel allocation across independent groups. Good SMP scalability
  • Delayed allocation: Like ext4. Reduces fragmentation
  • Real-time subvolume: Guaranteed-rate I/O for media streaming
  • Excellent scalability: Handles very large files (8 EB) and file systems (8 EB)

Best for: Large files, media servers, HPC storage, data warehouses.

Btrfs (B-tree File System)

Linux's "next-generation" file system. Copy-on-write (COW) design.

Key features:

  • COW B-trees: All metadata and data in B-trees. Never overwrites in place
  • Snapshots: Instant, space-efficient (share unchanged blocks). Writable snapshots (subvolume clones)
  • Checksums: Data and metadata checksummed (CRC32C, xxHash, SHA-256). Detects and optionally repairs corruption
  • RAID support: Built-in RAID 0/1/5/6/10 (RAID 5/6 still has known issues)
  • Compression: Transparent per-file compression (zlib, LZO, Zstandard)
  • Deduplication: Offline and online deduplication
  • Subvolumes: Lightweight, independent file system trees within one file system. Can be mounted separately
  • Send/receive: Incremental snapshot transfer (efficient backups)
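The snapshot mechanics above can be sketched with a toy model (greatly simplified, and only an analogy for the real B-tree machinery): a snapshot copies pointers, not data, and a later write allocates a new block instead of overwriting the shared one.

```python
# Toy sketch of copy-on-write snapshots (greatly simplified).
# A "volume" maps file names to block IDs; a snapshot copies only the
# mapping, so both trees share the same data blocks until one writes.
blocks = {}          # block_id -> data
next_id = 0

def write_block(data: str) -> int:
    global next_id
    blocks[next_id] = data
    next_id += 1
    return next_id - 1

volume = {"a.txt": write_block("v1")}
snapshot = dict(volume)              # instant: copies pointers, not data

# COW: updating a.txt in the volume writes a NEW block; the snapshot
# still points at the old one.
volume["a.txt"] = write_block("v2")

print(blocks[volume["a.txt"]])       # v2
print(blocks[snapshot["a.txt"]])     # v1
```

This is why snapshots are instant and initially free: cost accrues only as the volume and snapshot diverge.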

Use cases: Desktop Linux, NAS (Synology uses Btrfs), snapshot-based backups, development environments.

ZFS

Arguably the most feature-rich general-purpose file system. Originally from Sun Microsystems (2005). Available natively on FreeBSD, and on Linux and macOS via OpenZFS.

Key features:

  • Pooled storage: Combine multiple disks into a zpool. File systems draw from the pool
  • Copy-on-write: Never overwrites live data. Atomic transactions
  • End-to-end checksums: Every block checksummed. Detects silent data corruption (bit rot)
  • RAID-Z: ZFS's RAID implementation (Z1/Z2/Z3 = 1/2/3 parity disks). No write hole
  • Snapshots and clones: Instant, zero-cost snapshots. Writable clones
  • Deduplication: Block-level dedup (memory-intensive — needs ~5 GB RAM per TB of data)
  • Compression: LZ4 (default), Zstandard, gzip. Transparent
  • Adaptive Replacement Cache (ARC): Advanced caching algorithm (better than LRU)
  • Self-healing: With redundancy, automatically repairs corrupted blocks using good copies
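
Two of these numbers are worth estimating before building a pool. A back-of-envelope sketch, using a hypothetical 6-disk RAID-Z2 vdev and the ~5 GB-per-TB dedup rule of thumb above (real usable space is slightly lower due to metadata and slop space):

```python
# Quick capacity/RAM estimates for a hypothetical ZFS pool.
# RAID-Z1/Z2/Z3 reserve 1/2/3 disks' worth of parity per vdev; the
# ~5 GB RAM per TB figure is the dedup rule of thumb cited above.
def raidz_usable_tb(disks: int, disk_tb: float, parity: int) -> float:
    """Approximate usable space of one RAID-Z vdev (ignores metadata/slop)."""
    return (disks - parity) * disk_tb

usable = raidz_usable_tb(disks=6, disk_tb=8.0, parity=2)   # RAID-Z2
dedup_ram_gb = usable * 5
print(usable, dedup_ram_gb)     # 32.0 160.0
```

The dedup estimate is the reason most deployments leave dedup off: 160 GB of RAM for the dedup table alone dwarfs the savings on typical data.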

Use cases: NAS, enterprise storage, backup servers, databases (PostgreSQL on ZFS is popular).

Limitation on Linux: ZFS's CDDL license is considered incompatible with the GPL, so it ships as an out-of-tree kernel module rather than being merged into the mainline kernel.

NTFS (Windows)

Microsoft's default file system since Windows NT.

Key features:

  • Master File Table (MFT): Central structure. Each file has an MFT entry containing metadata and small file data
  • Resident files: Small files stored directly in the MFT entry (< ~900 bytes)
  • Journaling: Metadata journaling for crash recovery
  • Access Control Lists (ACLs): Fine-grained permissions (richer than UNIX rwx)
  • Alternate Data Streams (ADS): Multiple data streams per file (used for metadata, security concerns)
  • Compression and encryption: Per-file transparent compression (LZ77) and encryption (EFS)

Maximum sizes: File: 16 EB (theoretical), 256 TB (practical). Volume: 256 TB.

APFS (Apple File System)

Apple's file system since macOS 10.13 / iOS 10.3 (2017). Replaced HFS+.

Key features:

  • Copy-on-write: Atomic operations, snapshots
  • Space sharing: Multiple volumes share a container (pool of space)
  • Encryption: Native per-file and per-volume encryption (hardware-accelerated)
  • Snapshots: Instant snapshots (Time Machine uses these)
  • Clones: Instant file copies (no data duplication until modified)
  • Crash protection: COW metadata ensures consistency
  • Optimized for SSD: TRIM support, no journaling needed (COW provides consistency)

F2FS (Flash-Friendly File System)

Samsung-designed file system optimized for NAND flash (SSDs, eMMC, SD cards).

Key features:

  • Log-structured: Appends data sequentially (matches flash write patterns)
  • Multi-head logging: Multiple log areas for concurrent writes
  • Adaptive logging: Switches between append and in-place update based on utilization
  • Node Address Table (NAT): Indirection for efficient garbage collection
  • Designed for flash: Aligns I/O to erase block boundaries. Minimizes write amplification
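
The log-structured idea behind these features can be sketched with a toy model (heavily simplified; the real NAT maps node blocks, not whole files): writes always append, an indirection table tracks the live copy, and superseded copies become garbage for the cleaner.

```python
# Toy sketch of log-structured writes (heavily simplified).
# Updates never overwrite in place: each write appends a new copy and
# remaps an indirection table (loosely analogous to F2FS's NAT).
log = []                 # append-only list of (file_id, data)
nat = {}                 # file_id -> index of its live copy in the log

def write(file_id: str, data: str) -> None:
    log.append((file_id, data))
    nat[file_id] = len(log) - 1      # remap to the newest copy

write("f1", "old")
write("f2", "x")
write("f1", "new")                   # f1's old copy is now garbage

live = {nat[f] for f in nat}
garbage = [i for i in range(len(log)) if i not in live]
print(log[nat["f1"]][1])             # new
print(garbage)                       # [0]  (reclaimable by the cleaner)
```

Sequential appends match how NAND flash wants to be written; the price is garbage collection, which F2FS's adaptive logging tries to keep cheap.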

Used in: Android phones (default on many Samsung and Google Pixel devices), SD cards.

Distributed File Systems

NFS (Network File System)

Client-server protocol. Remote file access transparent to applications.

NFSv4: Stateful. Compound operations. Delegation (client caching). Security (Kerberos). Performance improvements over v3.

CIFS/SMB (Windows File Sharing)

Microsoft's network file protocol. Used natively by Windows and macOS; available on Linux via Samba.

SMB3: Encryption, multichannel, directory leasing. Used by Azure Files.

HDFS (Hadoop Distributed File System)

Designed for large-scale data processing. Files split into large blocks (128 MB default), replicated across nodes (3× default).

Architecture: Single NameNode (metadata) + many DataNodes (data). Optimized for sequential reads of large files. Write-once, read-many model.
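
The defaults above make capacity planning simple arithmetic. A quick sketch (file size is an example value):

```python
import math

# How HDFS would split and replicate a file, using the defaults above
# (128 MB blocks, 3x replication).
BLOCK_MB = 128
REPLICATION = 3

def hdfs_layout(file_mb: float) -> tuple[int, float]:
    """Return (number of blocks, total raw storage in MB)."""
    blocks = math.ceil(file_mb / BLOCK_MB)
    # The last block may be partial; raw usage is file size x replication.
    return blocks, file_mb * REPLICATION

print(hdfs_layout(1000))   # (8, 3000): a 1000 MB file -> 8 blocks, 3000 MB raw
```

The large block size keeps the NameNode's in-memory metadata small and favors long sequential reads, which is exactly the MapReduce-style access pattern HDFS targets.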

Ceph

Distributed storage providing file, block, and object storage. No single point of failure.

CRUSH algorithm: Algorithmic data placement (no central metadata server for data placement). Self-healing, self-managing.
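
The key idea, placement as a pure function of the object name and cluster map, can be illustrated with rendezvous (highest-random-weight) hashing. This is NOT the real CRUSH algorithm (which walks a weighted hierarchy of failure domains), and the OSD names are hypothetical, but it shows why no central lookup is needed:

```python
# Simplified stand-in for algorithmic data placement, using rendezvous
# hashing: every client computes an object's replica set from the object
# name and the OSD list alone, with no metadata-server lookup.
import hashlib

OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]  # hypothetical cluster

def place(obj: str, replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct OSDs deterministically from the object name."""
    ranked = sorted(OSDS, key=lambda osd: hashlib.sha256(
        (obj + "/" + osd).encode()).hexdigest())
    return ranked[:replicas]

# Any node computes the same placement independently:
assert place("photo.jpg") == place("photo.jpg")
print(place("photo.jpg"))
```

Because placement is deterministic, clients talk straight to the right storage daemons, and adding or removing an OSD only remaps the objects that ranked it highly.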

GlusterFS

Distributed file system aggregating storage from multiple servers. POSIX-compatible. Used for cloud storage and media streaming.

FUSE (Filesystem in Userspace)

Framework for implementing file systems in user space (no kernel code needed).

Application → VFS → FUSE kernel module → FUSE library → User-space FS daemon

Advantages: Easy development (any language). No kernel crashes from bugs. Portable. Disadvantages: Performance overhead (user-kernel context switches per operation).

Examples: sshfs (mount remote directories via SSH), s3fs (mount S3 buckets), ntfs-3g (NTFS on Linux), rclone mount.

Comparison

| FS | COW | Snapshots | Checksums | Compression | Max File | Best For |
|---|---|---|---|---|---|---|
| ext4 | No | No | Metadata | No | 16 TB | General Linux |
| XFS | No | No | Metadata | No | 8 EB | Large files, HPC |
| Btrfs | Yes | Yes | Data+Meta | Yes | 16 EB | Desktop, NAS |
| ZFS | Yes | Yes | Data+Meta | Yes | 16 EB | Enterprise, NAS |
| NTFS | No | VSS | Metadata | Yes | 256 TB | Windows |
| APFS | Yes | Yes | Metadata | No | 8 EB | macOS/iOS |
| F2FS | Hybrid | No | Metadata | Yes | 16 TB | Flash/Mobile |

Applications in CS

  • System administration: Choosing the right file system for workload (ext4 for general, XFS for databases, ZFS for integrity).
  • Backup and recovery: ZFS/Btrfs snapshots for instant backups. Send/receive for incremental replication.
  • Cloud storage: Distributed file systems (Ceph, HDFS) underpin cloud infrastructure.
  • Containers: OverlayFS provides Docker's layered image model. Each layer is a read-only file system snapshot.
  • Database storage: Direct I/O bypasses file system caching. COW file systems interact with database WAL.
  • Embedded: FAT for compatibility (SD cards, USB). LittleFS for microcontrollers (wear leveling, power-loss resilient).