File Systems
A file system organizes, stores, and retrieves data on storage devices. It provides the abstraction of files and directories over raw block devices.
File Concepts
File
A named collection of related data stored on secondary storage. From the OS perspective: a sequence of bytes (UNIX) or records (mainframes).
File attributes (metadata): Name, type, size, location, timestamps (created, modified, accessed), owner, permissions.
Access Methods
Sequential: Read/write data in order. Most common. tape-like.
Direct (random): Jump to any position. read(position, buffer). Essential for databases.
Indexed: Access via an index (key → position). Used in databases (B-tree indexes).
Directory Structure
Single-Level Directory
All files in one directory. Name collisions between users. No organization.
Two-Level Directory
Separate directory per user. Prevents name collisions. No subdirectories.
Tree-Structured Directory
Hierarchical. Most modern file systems. Subdirectories enable organization.
/
├── home/
│ ├── alice/
│ │ ├── documents/
│ │ └── code/
│ └── bob/
├── etc/
├── var/
│ └── log/
└── usr/
├── bin/
└── lib/
Path types:
- Absolute: /home/alice/code/main.rs (from root)
- Relative: ../bob/file.txt (from current directory)
Acyclic Graph Directory
Allow shared files/directories (hard links, symbolic links). Must handle dangling references and cycles.
Hard link: Multiple directory entries point to the same inode. File is deleted only when the last link is removed (reference counting).
Symbolic link (symlink): A file containing the path to another file. Can cross file system boundaries. Can dangle (target deleted).
File System Implementation
Allocation Methods
Contiguous Allocation
Each file occupies a contiguous set of blocks.
Advantages: Fast sequential and random access (just offset from start). Minimal metadata. Disadvantages: External fragmentation. Must know file size at creation. Hard to grow.
Used in: CD-ROMs, some embedded systems, ext4 extents (modified contiguous).
Linked Allocation
Each block contains a pointer to the next block. File = linked list of blocks.
Advantages: No fragmentation. Easy to grow. Disadvantages: Slow random access (must traverse list). Pointer overhead in each block. One bad pointer loses rest of file.
FAT (File Allocation Table): Move the pointers into a separate table in memory. Random access becomes table lookup. Used in FAT12/16/32, USB drives, SD cards.
Indexed Allocation
An index block stores pointers to all data blocks.
Index block: [ptr1][ptr2][ptr3][ptr4]...
↓ ↓ ↓ ↓
[data] [data] [data] [data]
Advantages: Fast random access. No fragmentation. Disadvantages: Index block overhead. For large files: multi-level indexing needed.
Inode Structure (UNIX)
Each file has an inode (index node) containing metadata and block pointers:
Inode:
├── Metadata (owner, permissions, timestamps, size)
├── Direct pointers (12): Point to data blocks directly
├── Single indirect: Points to a block of pointers
├── Double indirect: Points to a block of single-indirect blocks
└── Triple indirect: Points to a block of double-indirect blocks
Maximum file size (with 4KB blocks, 4-byte pointers, 1024 pointers per block):
- 12 direct: 48 KB
- Single indirect: +4 MB
- Double indirect: +4 GB
- Triple indirect: +4 TB
Free Space Management
Bit vector (bitmap): One bit per block. 1 = free, 0 = allocated. Efficient for finding contiguous free blocks. For 1 TB disk with 4 KB blocks: 32 MB bitmap.
Linked list: Free blocks form a linked list. No extra space (use free blocks themselves). Slow for finding contiguous blocks.
Grouping: Store addresses of n free blocks in the first free block. The last address points to another block of n addresses.
Counting: Store (start_block, count) pairs for runs of free blocks. Compact for contiguous free regions.
VFS (Virtual File System)
Linux VFS provides a unified interface for different file system implementations.
Application
│ (read, write, open, ...)
▼
VFS Layer (common interface)
│
├── ext4 driver
├── XFS driver
├── Btrfs driver
├── NFS driver
├── procfs driver
└── tmpfs driver
Key VFS objects:
- Superblock: File system metadata (size, block size, free blocks)
- Inode: File metadata (permissions, size, block locations)
- Dentry: Directory entry (name → inode mapping, cached for fast lookup)
- File: Open file instance (position, access mode)
Journaling
Protects file system integrity against crashes (power failure, kernel panic).
Problem
File system operations (create file, extend, delete) require multiple writes (data blocks, inode, directory, free map). A crash mid-operation leaves the file system inconsistent.
Without journaling: fsck must scan the entire disk on reboot to find and fix inconsistencies. Slow (hours for large disks).
Write-Ahead Logging (WAL)
Before modifying the file system, write the planned changes to a journal (log):
- Journal write: Write the transaction (all planned changes) to the journal.
- Journal commit: Mark the transaction as committed.
- Checkpoint: Apply the changes to the actual file system.
- Journal free: Remove the committed transaction from the journal.
On crash recovery: Replay committed transactions from the journal. Ignore uncommitted ones.
Modes:
- Data journaling: Log both metadata and data. Safest but slowest (data written twice).
- Ordered mode (ext4 default): Only metadata is journaled, but data is written before metadata. Ensures no stale data in new files.
- Writeback mode: Only metadata journaled, data written independently. Fastest but may expose stale data after crash.
Log-Structured File Systems (LFS)
Write all changes (data + metadata) sequentially to a log. The log IS the file system.
Advantages: Excellent write performance (all writes are sequential). Natural journaling. Disadvantages: Read performance depends on garbage collection (compaction). Background cleaning required.
Influenced: LevelDB/RocksDB (LSM-tree), F2FS (flash-friendly), SSD firmware (FTL).
Copy-on-Write File Systems
Instead of overwriting data in place, write new data to a new location and update pointers.
Never overwrites existing data — old data remains intact until explicitly freed.
Advantages: Natural snapshots (old pointers preserved). No journaling needed (atomic pointer update). Data integrity (old data never corrupted).
ZFS and Btrfs use this approach.
Applications in CS
- Databases: File systems provide the storage foundation. Databases often bypass the file system (direct I/O) for control over caching and write ordering.
- Cloud storage: Object stores (S3) are built on top of distributed file systems.
- Containers: Overlay file systems (OverlayFS) enable Docker's layered image model.
- Backup and recovery: Snapshots (ZFS, Btrfs, LVM) enable point-in-time recovery.
- Version control: Git stores objects in a content-addressable file system.
- Embedded systems: Simple file systems (FAT, LittleFS) for constrained devices.
- High-performance computing: Parallel file systems (Lustre, GPFS) for cluster storage.