# I/O Systems
I/O (Input/Output) systems connect the processor to the outside world — storage devices, networks, displays, keyboards, sensors, and other computers.
## I/O Device Characteristics

### Device Classification
| Type | Data Rate | Examples |
|---|---|---|
| Human interface | Bytes/sec - KB/sec | Keyboard, mouse, touchscreen |
| Storage | MB/sec - GB/sec | SSD, HDD, NVMe |
| Network | Mb/sec - Tb/sec | Ethernet, Wi-Fi, InfiniBand |
| Display | GB/sec | GPU framebuffer, HDMI |
| Sensor/Actuator | Bytes/sec - MB/sec | Temperature, motor, camera |
### Device Interface
Each device presents a set of registers to the processor:
- Data register: Read/write data
- Status register: Device state (ready, busy, error)
- Command register: Tell the device what to do
- Control register: Configure device parameters
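For MMIO devices, a register block like this is commonly described as a C struct overlay. A minimal sketch, assuming a hypothetical device whose four registers sit contiguously in the address space (the names, bit positions, and base address are illustrative, not a real device):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block for a simple device. `volatile` tells the
 * compiler that every access matters and must not be cached or reordered. */
typedef struct {
    volatile uint32_t data;     /* read/write data                  */
    volatile uint32_t status;   /* device state: ready, busy, error */
    volatile uint32_t command;  /* tell the device what to do       */
    volatile uint32_t control;  /* configure device parameters      */
} device_regs_t;

/* Assumed status bits for this illustrative device */
#define STATUS_BUSY  (1u << 0)
#define STATUS_ERROR (1u << 1)

/* On real hardware the struct is overlaid on the device's MMIO base:
 *   device_regs_t *dev = (device_regs_t *)0x40001000;
 */
```

The struct layout matters: the compiler must place each 32-bit register at the offset the hardware expects, which is why drivers keep such structs free of padding surprises.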
## I/O Techniques

### Programmed I/O (Polling)
The processor repeatedly checks the device status register in a loop:
```
// Send a byte to a serial port
PROCEDURE SEND_BYTE(data)
    // Busy-wait until the device's BUSY bit clears
    WHILE (READ_STATUS_REGISTER() AND BUSY) ≠ 0 DO
        // spin
    WRITE_DATA_REGISTER(data)
    WRITE_COMMAND_REGISTER(SEND)
```
Advantages: Simple; no special hardware needed. Disadvantages: Wastes CPU time spinning; the CPU is 100% busy waiting even when the device is slow. Generally unacceptable in modern systems, except for very fast devices where polling is cheaper than taking an interrupt.
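The same busy-wait loop in C, against a simulated device so it can run without hardware (the register accessors, the BUSY bit, and the ready-after-three-polls behavior are all stand-ins):

```c
#include <stdint.h>

#define STATUS_BUSY (1u << 0)    /* assumed busy bit */

/* Simulated device: becomes ready after a few status reads. */
uint32_t sim_data;               /* stands in for the data register */
int      sim_polls_until_ready = 3;

uint32_t read_status_register(void) {
    if (sim_polls_until_ready > 0) {
        sim_polls_until_ready--;
        return STATUS_BUSY;
    }
    return 0;                    /* BUSY clear: device is ready */
}

void write_data_register(uint32_t v) { sim_data = v; }

/* Programmed I/O: the CPU does nothing useful while it spins here. */
void send_byte(uint8_t data) {
    while (read_status_register() & STATUS_BUSY)
        ;                        /* spin */
    write_data_register(data);
}
```

The masking `& STATUS_BUSY` is the essential detail: the status register holds several bits, and the loop must test only the busy bit.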
### Interrupt-Driven I/O
The device interrupts the processor when it needs attention:
1. CPU initiates the I/O operation
2. CPU continues executing other code
3. Device signals completion via a hardware interrupt
4. CPU stops current work, runs the interrupt handler (ISR)
5. ISR processes the I/O, acknowledges the interrupt
6. CPU resumes previous work
Timeline:
```
CPU:    [work] [initiate I/O] [other work...] [ISR] [resume work]
Device: [idle] [processing.................] [done/IRQ]
```
Advantages: CPU isn't wasted while device operates. Good for moderate-speed devices. Disadvantages: Interrupt overhead per transfer (context save/restore, ISR execution). High-frequency interrupts can overwhelm the CPU (interrupt storms).
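A common shape for interrupt-driven receive is an ISR that only moves bytes into a ring buffer, deferring processing to mainline code. A sketch with the IRQ simulated as a direct call (the buffer size and names are illustrative):

```c
#include <stdint.h>

#define RING_SIZE 16u            /* power of two so indices wrap cheaply */

volatile uint8_t  rx_ring[RING_SIZE];
volatile unsigned rx_head;       /* written only by the ISR    */
volatile unsigned rx_tail;       /* written only by the reader */

/* In a real driver this would be registered as the device's interrupt
 * handler; here it is called directly to simulate one IRQ per byte. */
void rx_isr(uint8_t byte_from_device) {
    unsigned next = (rx_head + 1) & (RING_SIZE - 1);
    if (next != rx_tail) {       /* drop the byte if the ring is full */
        rx_ring[rx_head] = byte_from_device;
        rx_head = next;
    }
}

/* Called from normal (non-interrupt) context; returns -1 if empty. */
int rx_read(void) {
    if (rx_tail == rx_head)
        return -1;
    uint8_t b = rx_ring[rx_tail];
    rx_tail = (rx_tail + 1) & (RING_SIZE - 1);
    return b;
}
```

Keeping the ISR this short is the standard way to bound interrupt latency: the expensive work happens later, outside interrupt context.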
### DMA (Direct Memory Access)
A DMA controller transfers data directly between device and memory without CPU involvement:
1. CPU programs the DMA controller: source, destination, count, direction
2. DMA controller transfers data word-by-word (or in bursts)
3. When done, the DMA controller interrupts the CPU
```
CPU → [Program DMA] → [Other work...] → [Handle DMA-complete interrupt]
DMA → [Transfer data between device and memory]
```
Advantages: CPU is free during the entire transfer. Efficient for large transfers (disk blocks, network packets). Disadvantages: Requires a DMA controller. Bus contention (DMA and CPU compete for the memory bus). Cache-coherence issues (DMA writes to memory while the CPU cache may hold a stale copy).
Cache coherence with DMA:
- Before the device reads from memory (memory → device): flush (write back) the relevant cache lines so DRAM holds the latest data.
- Before the device writes to memory (device → memory): invalidate the relevant cache lines so the CPU re-reads from DRAM afterward.
- Alternatively, place DMA buffers in non-cacheable memory regions, or use hardware cache-coherent DMA (available on some systems).
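The program-transfer-complete sequence can be sketched with a simulated controller; the descriptor fields mirror what the CPU writes into real DMA registers (the struct and function names are hypothetical, and the background copy is modeled as an immediate memcpy):

```c
#include <stdint.h>
#include <string.h>

/* What the CPU programs before starting a transfer: source, destination,
 * and count (direction is implied here by which pointer is the device). */
typedef struct {
    const void  *src;
    void        *dst;
    size_t       count;
    volatile int done;   /* set by the controller on completion */
} dma_desc_t;

/* Simulated controller. In hardware the copy proceeds in the background,
 * word-by-word or in bursts, and completion raises an interrupt. */
void dma_start(dma_desc_t *d) {
    memcpy(d->dst, d->src, d->count);
    d->done = 1;         /* would raise the DMA-complete interrupt */
}
```

On a real system the CPU would flush cache lines covering `src` and invalidate those covering `dst` before calling `dma_start`, per the coherence rules above.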
## Bus Architecture
A bus is a shared communication channel connecting processors, memory, and I/O devices.
### Bus Types
System bus (front-side bus, historically): Connects CPU to memory controller and I/O.
I/O bus: Connects I/O devices. Slower, more devices. Examples: PCI, USB.
Memory bus: Connects memory controller to DRAM modules. High bandwidth, dedicated.
### Synchronous vs Asynchronous
Synchronous bus: Clock-driven. All transfers aligned to clock edges. Simple but limited by clock distribution (skew limits length/speed).
Asynchronous bus: Handshake-driven (request/acknowledge). No clock. Can span longer distances. More complex but flexible.
### Split Transactions
Non-split: Bus is held for entire operation (request + response). Wastes bandwidth if memory is slow.
Split: Release the bus after the request. Response comes later on a separate transaction. Allows other transactions between request and response.
## I/O Interfaces

### PCI Express (PCIe)
The dominant high-performance I/O interconnect.
Architecture: Point-to-point serial links (not a shared bus). Each link has multiple lanes (×1, ×2, ×4, ×8, ×16).
| Generation | Per-lane bandwidth | ×16 bandwidth |
|---|---|---|
| PCIe 3.0 | ~1 GB/s | ~16 GB/s |
| PCIe 4.0 | ~2 GB/s | ~32 GB/s |
| PCIe 5.0 | ~4 GB/s | ~64 GB/s |
| PCIe 6.0 | ~8 GB/s | ~128 GB/s |
Uses: GPUs (×16), NVMe SSDs (×4), network cards (×8/×16), accelerators.
Topology: Root complex (CPU) → switches → endpoints (devices). Tree structure.
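The per-lane figures in the table follow from the signaling rate and the line-code efficiency. A small calculation, assuming 8/16/32 GT/s with 128b/130b encoding for PCIe 3.0 through 5.0 (PCIe 6.0 moves to PAM4 signaling with FLIT encoding, so this simple formula does not apply to it):

```c
/* Effective per-lane bandwidth in GB/s: each transfer moves one bit per
 * lane; the encoding overhead shaves a little off; divide by 8 for bytes. */
double pcie_lane_gb_per_s(double gtransfers_per_s, double encoding_efficiency) {
    return gtransfers_per_s * encoding_efficiency / 8.0;
}
```

For PCIe 3.0, `pcie_lane_gb_per_s(8.0, 128.0/130.0)` gives about 0.985 GB/s per lane, so roughly 15.8 GB/s for ×16, matching the ~1 GB/s and ~16 GB/s entries above.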
### USB (Universal Serial Bus)
Versions:
| Version | Speed | Name |
|---|---|---|
| USB 1.1 | 12 Mbps | Full Speed |
| USB 2.0 | 480 Mbps | High Speed |
| USB 3.0 | 5 Gbps | SuperSpeed |
| USB 3.1 | 10 Gbps | SuperSpeed+ |
| USB 3.2 | 20 Gbps | SuperSpeed+ (×2) |
| USB4 | 40-80 Gbps | Based on Thunderbolt |
Topology: Tiered star. Host controller → hubs → devices.
Features: Hot-plug, power delivery (USB-PD up to 240W), multiple device classes (storage, HID, audio, video, networking).
### SATA and NVMe
SATA: Serial ATA. For HDDs and SSDs. Max ~600 MB/s (SATA III). AHCI command interface. Limited queue depth (a single queue of 32 commands).
NVMe: Non-Volatile Memory Express. Designed specifically for SSDs over PCIe. Up to 64K queues, each with 64K entries. Much lower latency than SATA. Bypasses legacy storage stack.
Performance comparison:
| Interface | Sequential Read | Latency |
|---|---|---|
| SATA SSD | ~550 MB/s | ~100 μs |
| NVMe SSD (PCIe 4.0 ×4) | ~7 GB/s | ~10 μs |
| NVMe SSD (PCIe 5.0 ×4) | ~14 GB/s | ~5 μs |
## Interrupt Controllers

### Legacy PIC (8259A)
Programmable Interrupt Controller. Handles up to 8 IRQ lines (cascaded to 15).
### APIC (Advanced Programmable Interrupt Controller)
Used in modern x86 multiprocessor systems.
Local APIC: Per-CPU. Handles local timer, IPI (inter-processor interrupts), performance counters.
I/O APIC: Distributes device interrupts to CPUs. Supports routing (which CPU handles which interrupt), priority, and message-signaled interrupts (MSI).
### MSI / MSI-X (Message Signaled Interrupts)
Instead of a dedicated interrupt line, the device writes a message to a special memory address to signal an interrupt.
Advantages: No dedicated interrupt lines needed. Each device can have multiple interrupt vectors (MSI-X: up to 2048). Better for virtualization.
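Functionally, an MSI is just a posted memory write: the device writes a data value to an address owned by the interrupt controller, and the written value selects the vector. A toy model (the pending-bit table stands in for the interrupt controller; the names and the vector-in-low-bits layout are illustrative):

```c
#include <stdint.h>

#define NUM_VECTORS 256

/* Stands in for the interrupt controller's pending state. */
uint8_t msi_pending[NUM_VECTORS];

/* What the device does to raise an interrupt: a plain memory write of
 * its configured message data to its configured message address. */
void device_signal_msi(uint32_t message_data) {
    uint32_t vector = message_data & 0xFFu;  /* vector in the low bits */
    msi_pending[vector] = 1;
}
```

Because each distinct message value can select a different vector, one MSI-X device can direct different events (e.g. per-queue completions) to different CPUs.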
## Polling vs Interrupts
| Aspect | Polling | Interrupts |
|---|---|---|
| Latency | Can be very low (busy-wait) | Context switch overhead |
| CPU usage | 100% while polling | Only during ISR |
| Throughput | Good for high-rate devices | Good for low-rate devices |
| Complexity | Simple | Complex (ISR, priority, nesting) |
Modern approach: Hybrid — poll during high-traffic, interrupt during low-traffic. Linux NAPI (network) uses this: switch to polling when packet rate is high, back to interrupts when rate drops.
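The NAPI-style switch can be sketched as a poll function with a budget: the interrupt handler masks further interrupts and schedules the poll; polling continues as long as it exhausts its budget, and interrupts are re-enabled once the device runs dry (the device model and names below are illustrative, not the Linux API):

```c
#include <stdbool.h>

int  sim_packets_queued;         /* packets waiting in the simulated NIC */
bool irq_enabled = true;

/* Process one packet if available; returns 1 on success, 0 if idle. */
int poll_one_packet(void) {
    if (sim_packets_queued > 0) { sim_packets_queued--; return 1; }
    return 0;
}

/* Invoked after the NIC's interrupt handler masks further interrupts. */
int napi_style_poll(int budget) {
    int done = 0;
    irq_enabled = false;                 /* stay in polling mode */
    while (done < budget && poll_one_packet())
        done++;
    if (done < budget)                   /* device went idle */
        irq_enabled = true;              /* fall back to interrupts */
    return done;                         /* else: poll again next cycle */
}
```

Under load the budget is always exhausted, so the driver keeps polling with interrupts off; when traffic drops, the poll comes up short and interrupts take over again.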
DPDK: Kernel bypass for networking. Pure polling on dedicated cores. Achieves maximum throughput for high-speed networking.
## Memory-Mapped I/O vs Port-Mapped I/O

### Memory-Mapped I/O (MMIO)
Device registers are mapped into the physical address space. Access them with regular load/store instructions.
```c
volatile uint32_t *uart_data = (volatile uint32_t *)0x40001000;
*uart_data = 'A';  // Write to UART
```
Advantages: No special instructions. Full addressing mode support. Works with any architecture. Can use cache (though usually marked non-cacheable).
Used by: ARM, RISC-V, most modern architectures.
### Port-Mapped I/O (PMIO)
Separate I/O address space accessed with special instructions (IN, OUT on x86).
```c
outb(0x60, data);  // Write to I/O port 0x60
data = inb(0x60);  // Read from I/O port 0x60
```
Used by: x86 (legacy devices). Modern x86 devices use MMIO via PCIe.
## Applications in CS
- Device drivers: Implement the software interface to hardware devices. Must handle interrupts, DMA, and MMIO correctly.
- Storage systems: Understanding I/O performance (IOPS, bandwidth, latency) is critical for database and file system design.
- Networking: NIC design, interrupt coalescing, NAPI polling, kernel bypass (DPDK, io_uring).
- Virtualization: Device emulation, paravirtualization (virtio), SR-IOV for direct device assignment.
- Embedded systems: GPIO, UART, SPI, I2C — all I/O interfaces to peripherals.
- Real-time systems: Interrupt latency bounds, DMA completion timing.
- Operating systems: I/O scheduler, buffer management, interrupt handling, device driver framework.