
Virtualization

Virtualization creates virtual instances of hardware, allowing multiple operating systems to run on a single physical machine.

Virtualization Types

Full Virtualization

The virtual machine (VM) runs an unmodified guest OS. The hypervisor emulates the complete hardware.

Binary translation (early VMware): The hypervisor scans guest kernel code and replaces privileged instructions with safe equivalents at runtime.

Hardware-assisted (modern): The CPU supports virtualization natively (Intel VT-x, AMD-V). The guest runs directly on the CPU but traps to the hypervisor on privileged operations.

Paravirtualization

The guest OS is modified to call the hypervisor directly via hypercalls instead of executing privileged instructions.

Advantages: Lower overhead than full virtualization (no binary translation needed). Disadvantages: Requires guest OS modification.

Xen pioneered paravirtualization. Modern Xen uses hardware-assisted virtualization (HVM) with paravirtualized drivers for best performance.

Hardware-Assisted Virtualization

CPU extensions that support virtualization natively:

Intel VT-x / AMD-V: Root mode (hypervisor) and non-root mode (guest). VMExit: guest traps to hypervisor on privileged operations. VMEntry: resume guest execution.

Intel VT-d / AMD-Vi: IOMMU — DMA remapping for direct device assignment to VMs.
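The VMExit/VMEntry cycle can be modeled as a simple dispatch loop. This is a conceptual sketch only, not a real hypervisor: the instruction names and the set of "privileged" operations are illustrative.

```python
# Conceptual model of the VMExit/VMEntry loop under VT-x/AMD-V.
# Instruction names and exit reasons here are made up for illustration.

PRIVILEGED = {"cpuid", "io_out", "hlt"}  # operations that force a VMExit

def handle_vmexit(reason):
    """Hypervisor emulates the privileged operation, then VMEntry resumes the guest."""
    return f"emulated {reason}"

def run_guest(instructions):
    """Run a guest instruction stream; only privileged ops involve the hypervisor."""
    exits = []
    for ins in instructions:
        if ins in PRIVILEGED:
            exits.append(handle_vmexit(ins))  # VMExit -> hypervisor -> VMEntry
        # non-privileged instructions execute natively, with no hypervisor cost
    return exits

print(run_guest(["add", "cpuid", "mov", "hlt"]))
# ['emulated cpuid', 'emulated hlt']
```

The key performance point the sketch captures: only the privileged instructions pay the exit/entry round trip; everything else runs at native speed.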

Hypervisors

Type 1 (Bare-Metal)

Runs directly on hardware. No host OS.

[VM1: Linux] [VM2: Windows] [VM3: Linux]
┌────────────────────────────────────────┐
│               Hypervisor               │
└────────────────────────────────────────┘
                Hardware

Examples: Xen, VMware ESXi, Microsoft Hyper-V, and KVM (arguably Type 1, since the Linux kernel itself acts as the hypervisor).

Type 2 (Hosted)

Runs as an application on a host OS.

[VM1] [VM2]
┌─────────────────┐
│   Hypervisor    │ (application)
├─────────────────┤
│   Host OS       │
└─────────────────┘
    Hardware

Examples: VirtualBox, VMware Workstation/Fusion, Parallels, QEMU (without KVM).

KVM (Kernel-based Virtual Machine)

Linux kernel module that turns Linux itself into a Type 1 hypervisor.

Architecture: KVM module + QEMU (for device emulation) + virtio (paravirtualized devices).

QEMU process (user space) ← device emulation
    │
KVM module (kernel space) ← CPU/memory virtualization
    │
Hardware (VT-x/AMD-V)

The most widely used hypervisor in cloud computing: AWS (Nitro), GCP, and DigitalOcean all build on KVM.
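Putting the pieces together, a KVM guest is typically launched as a QEMU process with hardware acceleration and virtio devices enabled. A minimal invocation might look like the following; the disk image name is an assumption, and real deployments usually go through libvirt rather than raw QEMU flags.

```shell
# Launch a VM using the KVM module for CPU/memory virtualization and
# virtio for paravirtualized disk and network I/O.
# Assumes QEMU is installed and disk.img is an existing guest image.
qemu-system-x86_64 \
    -enable-kvm \                         # use KVM instead of pure emulation
    -m 2G -smp 2 \                        # 2 GiB RAM, 2 vCPUs
    -drive file=disk.img,if=virtio \      # virtio-blk storage
    -netdev user,id=n0 \
    -device virtio-net-pci,netdev=n0      # virtio-net networking
```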

Memory Virtualization

Shadow Page Tables

Hypervisor maintains a shadow page table mapping guest virtual → host physical directly. Keeps it in sync with guest's page table changes.

Cost: Every guest page table modification traps to the hypervisor → overhead.

Extended/Nested Page Tables (EPT/NPT)

Hardware support (Intel EPT, AMD NPT): Two-level translation.

Guest Virtual → Guest Physical → Host Physical
    (guest PT)        (EPT/NPT)

The CPU walks both page tables in hardware. No hypervisor intervention for page table changes.

TLB: Caches the full guest virtual → host physical mapping. TLB miss walks both tables.

Cost: Extra memory for the second page table. Slightly longer TLB miss handling. But much better than shadow page tables overall.
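The "slightly longer TLB miss" cost can be quantified. With g guest page-table levels and n nested levels, every guest-physical address touched during the walk (each guest PT entry plus the final data address) must itself be translated through the nested tables, giving (g + 1) * (n + 1) - 1 memory references in the worst case:

```python
# Back-of-envelope cost of a two-dimensional (nested) page walk on a TLB miss.

def nested_walk_refs(guest_levels, nested_levels):
    # Each of the g guest PT entry reads costs (n + 1) references: n nested
    # levels to translate its guest-physical address, plus the read itself.
    # The final guest-physical data address costs another n nested references.
    # Total: g * (n + 1) + n  ==  (g + 1) * (n + 1) - 1.
    return (guest_levels + 1) * (nested_levels + 1) - 1

print(nested_walk_refs(4, 4))   # 24: 4-level guest tables + 4-level EPT/NPT
print(nested_walk_refs(4, 0))   # 4:  native walk, no nesting
```

This is why large TLBs and page-walk caches matter so much more under virtualization: a miss is 24 memory references instead of 4.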

Memory Overcommitment

Allocate more RAM to VMs than is physically available.

Techniques:

  • Ballooning: Inflate a balloon driver in guest → guest frees pages → hypervisor reclaims.
  • Page sharing (KSM): Identify identical pages across VMs and share them (COW). Effective for many similar VMs.
  • Swap: Hypervisor swaps VM pages to disk (last resort — very slow).
  • Memory compression: Compress seldom-used pages instead of swapping.
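The page-sharing idea can be sketched in a few lines: scan page contents, and back identical pages with a single physical copy. This is a toy model only; real KSM merges 4 KiB pages inside the kernel and marks them copy-on-write.

```python
import hashlib

# Toy model of KSM-style page sharing: pages with identical content
# collapse to one physical copy. Page "contents" are just byte strings.

def shared_footprint(vm_pages):
    """vm_pages: page contents across all VMs. Returns (allocated, physical)."""
    unique = {hashlib.sha256(p).hexdigest() for p in vm_pages}
    return len(vm_pages), len(unique)

# Three VMs booted from the same image share their kernel and library pages,
# while each keeps private heap pages:
pages = [b"kernel"] * 3 + [b"libc"] * 3 + [b"vm1-heap", b"vm2-heap", b"vm3-heap"]
before, after = shared_footprint(pages)
print(before, after)  # 9 5
```

The 9-to-5 reduction illustrates why KSM is most effective when many VMs run the same OS image: read-only kernel and library pages dominate the shareable set.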

I/O Virtualization

Emulated Devices

Hypervisor emulates a standard hardware device (e.g., Intel e1000 network card). Guest uses its existing driver.

Slow: Every I/O operation traps to the hypervisor.

Paravirtualized Devices (virtio)

Standardized interface for paravirtualized I/O. Guest uses a virtio driver that communicates efficiently with the hypervisor via shared memory ring buffers.

Virtio devices: virtio-net (network), virtio-blk (block storage), virtio-scsi, virtio-gpu, virtio-fs.

Performance: Near-native for network and storage. Standard across hypervisors (KVM, Xen, Hyper-V).
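The efficiency comes from batching: instead of one trap per I/O operation, the guest places many buffers in a shared ring and notifies the host once. The sketch below is a toy model of that idea, not the real virtqueue descriptor/avail/used layout.

```python
from collections import deque

# Toy model of the virtio batching idea: one guest->host notification
# ("kick") covers a whole batch of buffers placed in a shared ring.

class Ring:
    def __init__(self):
        self.avail = deque()   # guest -> host: buffers awaiting processing
        self.used = deque()    # host -> guest: completed buffers
        self.kicks = 0         # models expensive guest->host notifications

    def guest_submit(self, bufs):
        self.avail.extend(bufs)
        self.kicks += 1        # a single kick for the entire batch

    def host_drain(self):
        while self.avail:
            self.used.append(("done", self.avail.popleft()))

ring = Ring()
ring.guest_submit(["pkt1", "pkt2", "pkt3"])  # 3 packets, 1 notification
ring.host_drain()
print(ring.kicks, len(ring.used))  # 1 3
```

Compare with an emulated e1000, where each packet would cost at least one trap: amortizing the notification over a batch is the core of virtio's near-native performance.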

Device Passthrough (SR-IOV)

Assign a physical device directly to a VM. The VM accesses the device without hypervisor intervention.

SR-IOV (Single Root I/O Virtualization): Hardware partitions a physical device into multiple Virtual Functions (VFs). Each VF assigned to a different VM.

Performance: Native speed. Used for high-performance networking and GPU computing.

Disadvantage: Device is dedicated to one VM (can't be shared). Live migration is harder.
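On Linux, VFs are created through the standard sysfs interface. The commands below assume an SR-IOV-capable NIC named eth0 with kernel and firmware support; the sysfs paths are standard, but the device name is an assumption.

```shell
# How many Virtual Functions does the NIC support?
cat /sys/class/net/eth0/device/sriov_totalvfs

# Create 4 VFs; each appears as its own PCI device that can be
# passed through to a VM (via the IOMMU, VT-d/AMD-Vi).
echo 4 > /sys/class/net/eth0/device/sriov_numvfs

# The new VFs show up in the PCI device list:
lspci | grep -i "virtual function"
```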

Live Migration

Move a running VM from one physical host to another without downtime.

Pre-Copy Migration

  1. Copy all memory pages to destination (while VM runs).
  2. Copy dirty pages (modified since last copy).
  3. Repeat dirty page copying (each round has fewer dirty pages).
  4. When dirty pages are few enough: pause VM, copy remaining dirty pages, resume on destination.

Downtime: Typically 50-200 ms. Depends on dirty page rate and network bandwidth.
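The convergence behavior of the steps above can be simulated in a few lines. The numbers are illustrative: each round re-copies only the pages dirtied during the previous round, so the transfer shrinks geometrically as long as the guest dirties pages slower than the network can copy them.

```python
# Toy simulation of pre-copy convergence. dirty_fraction models the share
# of copied pages the guest re-dirties while a round is in flight.

def precopy_rounds(total_pages, dirty_fraction, stop_threshold):
    """Return pages copied per round, ending with the stop-and-copy round."""
    copied = [total_pages]            # round 1: copy everything while VM runs
    dirty = int(total_pages * dirty_fraction)
    while dirty > stop_threshold:     # keep iterating while too many are dirty
        copied.append(dirty)
        dirty = int(dirty * dirty_fraction)
    copied.append(dirty)              # final round: pause VM, copy, resume
    return copied

# 1M pages, 10% re-dirtied per round, pause once under 1000 pages remain:
print(precopy_rounds(1_000_000, 0.10, 1000))
# [1000000, 100000, 10000, 1000]
```

If dirty_fraction approaches 1 (a write-heavy guest), the loop would never shrink below the threshold; real hypervisors cap the number of rounds and force the stop-and-copy, which is why downtime depends on the dirty-page rate.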

Post-Copy Migration

  1. Pause VM, copy CPU state to destination, resume on destination.
  2. Pages are loaded on demand from the source (page faults trigger remote fetches).
  3. Background push of remaining pages.

Advantage: Shorter total migration time for memory-intensive VMs. Disadvantages: The VM runs slowly until all pages arrive, and if the source host fails mid-migration the un-transferred pages are lost and the VM cannot be recovered.

Applications in CS

  • Cloud computing: VMs are the foundation of IaaS (EC2, GCE, Azure VMs). Multi-tenancy.
  • Server consolidation: Run multiple workloads on fewer physical servers. Reduce cost and power.
  • Development/testing: Instant creation of test environments. Reproducible builds.
  • Disaster recovery: VM snapshots and replication. Failover to standby site.
  • Legacy support: Run old OS on modern hardware (Windows XP in a VM).
  • Security: Isolate untrusted software in VMs. Malware analysis.
  • Desktop virtualization: VDI (Virtual Desktop Infrastructure). Remote desktops.