
Virtual Memory

Virtual memory provides each process with an illusion of having its own large, private, contiguous address space — while the physical memory is shared, fragmented, and limited.

Address Translation

Each process uses virtual addresses. The hardware (MMU) translates them to physical addresses.

Virtual Address → MMU → Physical Address → Cache/Memory

Why Virtual Memory?

  1. Isolation: Processes can't access each other's memory (security, stability).
  2. Abstraction: Programs don't need to know physical memory layout.
  3. Overcommitment: Total virtual memory can exceed physical memory (paging to disk).
  4. Sharing: Multiple processes can map the same physical page (shared libraries, shared memory IPC).
  5. Simplification: Each process sees a uniform address space starting at 0.

Page Tables

Memory is divided into fixed-size pages (virtual) and frames (physical). Typical page size: 4 KB.

Virtual address  = [Virtual Page Number (VPN) | Page Offset]
Physical address = [Physical Page Number (PPN) | Page Offset]

The page table maps VPN → PPN.

Page Table Entry (PTE)

Each entry contains:

| Field | Purpose |
|---|---|
| PPN | Physical page number |
| Valid (V) | Is this page in physical memory? |
| Dirty (D) | Has the page been written? |
| Referenced (R) | Has the page been accessed recently? |
| Read/Write/Execute | Permission bits |
| User/Supervisor | Privilege level access |
| Present | Is the page resident in memory? |

Single-Level Page Table

For a 32-bit address space with 4 KB pages:

  • VPN: 20 bits → 2²⁰ = 1M entries
  • Each entry: 4 bytes → page table size = 4 MB per process

This is too large (especially since most of the address space is unused).

Multi-Level Page Tables

Hierarchical structure. Only allocate page table pages for regions actually in use.

Two-level (x86-32):

Virtual Address: [Dir (10 bits) | Table (10 bits) | Offset (12 bits)]

CR3 → Page Directory (1024 entries)
         ↓
      Page Table (1024 entries)
         ↓
      Physical Page
  • 1024 × 1024 = 1M pages addressable
  • Only allocate second-level tables for used regions
  • Sparse address spaces use very little memory

Four-level (x86-64):

Virtual Address (48 bits used, 16 sign-extended):
[PML4 (9) | PDPT (9) | PD (9) | PT (9) | Offset (12)]

Each level has 512 entries (9 bits). Supports 256 TB virtual address space.

Five-level (x86-64, recent): Adds PML5 level. Supports 128 PB virtual address space. Needed for very large memory servers.

Inverted Page Table

Instead of per-process page tables, one global table indexed by physical frame number.

  • Size proportional to physical memory (not virtual)
  • Lookup: hash VPN to find the entry
  • Used in PowerPC, IA-64

Saves memory but slower lookup (hash collision handling).

TLB (Translation Lookaside Buffer)

Page table walks are expensive (one memory access per level: two for x86-32, four or five for x86-64). The TLB caches recent translations.

Virtual Address → TLB lookup → PPN (on hit, fast)
                            → Page table walk (on miss, slow)

TLB Structure

| Property | Typical Value |
|---|---|
| Entries | 32-128 (L1 TLB), 512-2048 (L2 TLB) |
| Associativity | Fully associative (L1), 4-8 way (L2) |
| Hit time | 0.5-1 cycle |
| Miss penalty | 10-100 cycles (page table walk) |
| Miss rate | <1% (for most workloads) |
| Separate I/D? | Often separate I-TLB and D-TLB |

TLB Miss Handling

Hardware-managed (x86, ARM): Hardware walks the page table automatically. Faster but complex hardware.

Software-managed (MIPS, RISC-V, some ARM modes): TLB miss triggers an exception. OS software walks the page table and loads the TLB entry. More flexible but higher overhead.

TLB Shootdown

When a page table mapping changes, all TLB entries for that mapping must be invalidated — including in other cores' TLBs. This is a TLB shootdown.

Cost: Expensive! Requires inter-processor interrupt (IPI). One of the scaling bottlenecks for large multiprocessor systems.

Page Replacement Algorithms

When physical memory is full and a new page is needed, the OS must evict a page.

FIFO (First In, First Out)

Evict the oldest page. Simple but can evict frequently used pages. Suffers from Bélády's anomaly (more frames can cause more faults).

Optimal (OPT / Bélády's)

Evict the page that won't be used for the longest time. Impossible to implement (requires future knowledge). Used as a theoretical benchmark.

LRU (Least Recently Used)

Evict the page used least recently. Good approximation of optimal.

Problem: True LRU requires tracking access order — expensive for many pages.

Approximations:

  • Clock algorithm (Second Chance): Circular list with reference bit. On eviction scan, if R=1, clear R and skip. If R=0, evict. Approximates LRU cheaply.
  • Enhanced clock: Consider both R and D bits. Prefer evicting clean, unreferenced pages.

Working Set Model

The working set W(t, Δ) is the set of pages accessed in the last Δ time units.

Keep the working set in memory. Evict pages outside the working set. If the total working set of all processes exceeds physical memory → thrashing.

Page Fault Frequency (PFF)

Monitor the page fault rate per process. If too high, give it more frames. If too low, take some away.

Demand Paging

Pages are loaded into memory only when accessed (on first page fault). No page is loaded preemptively.

Page fault handling:

  1. Access to unmapped page triggers exception
  2. OS finds the page on disk (swap space or file)
  3. OS finds a free frame (or evicts a page)
  4. OS loads the page from disk to the frame
  5. OS updates the page table entry
  6. Restart the faulting instruction

Cost: Page fault ≈ 5-10 ms (disk access). Regular memory access ≈ 100 ns. Page faults are ~50,000-100,000× more expensive.

If page fault rate is p: Effective access time = (1-p) × memory_time + p × fault_time.

Even p = 0.001 (1 in 1000) gives ≈ 0.999 × 100 ns + 0.001 × 5 ms ≈ 5.1 µs per access, a ~50× slowdown.

Copy-on-Write (COW)

On fork(), don't copy the parent's pages. Instead, both processes share the same physical pages, marked read-only.

When either process writes to a shared page:

  1. Page fault (write to read-only page)
  2. OS copies the page to a new frame
  3. Updates the writing process's page table to point to the new copy (now read-write)
  4. The other process keeps the original

Benefit: fork() is nearly instantaneous regardless of address space size. Most pages are never written (especially if followed by exec()).

Memory-Mapped Files

Map a file directly into the virtual address space:

addr = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
// Now addr[i] accesses byte i of the file

The OS uses demand paging to load file pages as accessed. Writes (for MAP_SHARED) are written back to the file.

Advantages: No explicit read()/write() system calls. Efficient for random access. OS manages caching.

Used by: Database buffer pools, shared libraries (.so/.dylib), JIT compilers, memory-mapped I/O.

Segmentation

An older memory management scheme where the address space is divided into segments of variable size (code, data, stack, heap).

Virtual Address: [Segment Number | Offset]

Each segment has a base address and length limit. Segment table maps segment number → (base, limit).

Advantages: Supports logical divisions of memory. Different protection per segment. Disadvantages: External fragmentation. Variable-size allocation is complex.

Modern use: x86 still has segment registers but they're largely vestigial (flat memory model in 64-bit mode). Segmentation is combined with paging in x86-32 (segmented paging).

Huge Pages

Standard 4 KB pages → large page tables, many TLB misses for large datasets.

Huge pages: 2 MB or 1 GB pages. Fewer TLB entries needed, fewer page table levels walked.

Linux: hugetlbfs, Transparent Huge Pages (THP — OS automatically promotes to huge pages).

When to use: Large contiguous allocations (databases, VMs, scientific computing, ML). Not for general use (internal fragmentation, allocation complexity).

NUMA-Aware Allocation

In Non-Uniform Memory Access architectures, memory access time depends on which processor's local memory is being accessed.

Local access: Fast (~80 ns). Remote access: Slow (~130 ns, 1.5-2× local).

NUMA-aware allocation: Place pages close to the processor that accesses them. Linux: numactl, mbind(), first-touch policy (page allocated on the node that first accesses it).

Applications in CS

  • Process isolation: Virtual memory is the foundation of OS process isolation and security.
  • Memory overcommitment: VMs and containers can commit more virtual memory than physical RAM. Swap and ballooning manage physical memory.
  • Garbage collection: Uses page protection tricks (write barriers via mprotect), large heap management.
  • Database buffer management: Database systems manage their own "virtual memory" (buffer pool) for performance control.
  • Memory-mapped I/O: Device registers mapped into address space for fast access.
  • JIT compilation: Allocate pages, write machine code, then change permissions to executable (W^X).
  • Security: ASLR (Address Space Layout Randomization) randomizes virtual memory layout. NX bit prevents executing data pages. Guard pages detect stack overflow.