
I/O Systems

I/O (Input/Output) systems connect the processor to the outside world — storage devices, networks, displays, keyboards, sensors, and other computers.

I/O Device Characteristics

Device Classification

| Type | Data Rate | Examples |
|---|---|---|
| Human interface | Bytes/sec - KB/sec | Keyboard, mouse, touchscreen |
| Storage | MB/sec - GB/sec | SSD, HDD, NVMe |
| Network | Mb/sec - Tb/sec | Ethernet, Wi-Fi, InfiniBand |
| Display | GB/sec | GPU framebuffer, HDMI |
| Sensor/Actuator | Bytes/sec - MB/sec | Temperature, motor, camera |

Device Interface

Each device presents a set of registers to the processor:

  • Data register: Read/write data
  • Status register: Device state (ready, busy, error)
  • Command register: Tell the device what to do
  • Control register: Configure device parameters

I/O Techniques

Programmed I/O (Polling)

The processor repeatedly checks the device status register in a loop:

// Send a byte to a serial port
PROCEDURE SEND_BYTE(data)
    // Busy-wait until device is ready
    WHILE (READ_STATUS_REGISTER() AND BUSY) ≠ 0 DO
        // spin
    WRITE_DATA_REGISTER(data)
    WRITE_COMMAND_REGISTER(SEND)

Advantages: Simple; no special hardware. Disadvantages: Wastes CPU time spinning: the CPU is 100% busy waiting even when the device is slow. A poor fit for most modern workloads, though polling can beat interrupts for very fast devices, where the wait is shorter than the interrupt overhead.

Interrupt-Driven I/O

The device interrupts the processor when it needs attention:

  1. CPU initiates I/O operation
  2. CPU continues executing other code
  3. Device signals completion via hardware interrupt
  4. CPU stops current work, runs interrupt handler (ISR)
  5. ISR processes the I/O, acknowledges the interrupt
  6. CPU resumes previous work
Timeline:
CPU:    [work] [initiate I/O] [other work...] [ISR] [resume work]
Device: [idle] [processing.................] [done/IRQ]

Advantages: CPU isn't wasted while device operates. Good for moderate-speed devices. Disadvantages: Interrupt overhead per transfer (context save/restore, ISR execution). High-frequency interrupts can overwhelm the CPU (interrupt storms).

DMA (Direct Memory Access)

A DMA controller transfers data directly between device and memory without CPU involvement:

  1. CPU programs the DMA controller: source, destination, count, direction
  2. DMA controller transfers data word-by-word (or burst)
  3. When done, DMA controller interrupts the CPU
CPU → [Program DMA] → [Other work...] → [Handle DMA-complete interrupt]
DMA →                  [Transfer data between device and memory]

Advantages: CPU is free during the entire transfer. Efficient for large transfers (disk blocks, network packets). Disadvantages: Requires extra DMA controller hardware. Bus contention (DMA and CPU compete for the memory bus). Cache coherence issues (DMA writes memory, but the CPU cache may hold a stale copy).

Cache coherence with DMA — the options:

  • Before a DMA read from memory (memory → device): flush (write back) dirty cache lines so the device sees the latest data.
  • Before a DMA write to memory (device → memory): invalidate the affected cache lines so the CPU does not read stale data.
  • Or use non-cacheable memory regions for DMA buffers.
  • Or use hardware cache-coherent DMA (available on some systems).

Bus Architecture

A bus is a shared communication channel connecting processors, memory, and I/O devices.

Bus Types

System bus (front-side bus, historically): Connects CPU to memory controller and I/O.

I/O bus: Connects I/O devices. Slower, more devices. Examples: PCI, USB.

Memory bus: Connects memory controller to DRAM modules. High bandwidth, dedicated.

Synchronous vs Asynchronous

Synchronous bus: Clock-driven. All transfers aligned to clock edges. Simple but limited by clock distribution (skew limits length/speed).

Asynchronous bus: Handshake-driven (request/acknowledge). No clock. Can span longer distances. More complex but flexible.

Split Transactions

Non-split: Bus is held for entire operation (request + response). Wastes bandwidth if memory is slow.

Split: Release the bus after the request. Response comes later on a separate transaction. Allows other transactions between request and response.

I/O Interfaces

PCI Express (PCIe)

The dominant high-performance I/O interconnect.

Architecture: Point-to-point serial links (not a shared bus). Each link has multiple lanes (×1, ×2, ×4, ×8, ×16).

| Generation | Per-lane bandwidth | ×16 bandwidth |
|---|---|---|
| PCIe 3.0 | ~1 GB/s | ~16 GB/s |
| PCIe 4.0 | ~2 GB/s | ~32 GB/s |
| PCIe 5.0 | ~4 GB/s | ~64 GB/s |
| PCIe 6.0 | ~8 GB/s | ~128 GB/s |

Uses: GPUs (×16), NVMe SSDs (×4), network cards (×8/×16), accelerators.

Topology: Root complex (CPU) → switches → endpoints (devices). Tree structure.

USB (Universal Serial Bus)

Versions:

| Version | Speed | Name |
|---|---|---|
| USB 1.1 | 12 Mbps | Full Speed |
| USB 2.0 | 480 Mbps | High Speed |
| USB 3.0 | 5 Gbps | SuperSpeed |
| USB 3.1 | 10 Gbps | SuperSpeed+ |
| USB 3.2 | 20 Gbps | SuperSpeed+ (×2) |
| USB4 | 40-80 Gbps | Based on Thunderbolt |

Topology: Tiered star. Host controller → hubs → devices.

Features: Hot-plug, power delivery (USB-PD up to 240W), multiple device classes (storage, HID, audio, video, networking).

SATA and NVMe

SATA: Serial ATA. For HDDs and SSDs. Max ~600 MB/s (SATA III). AHCI command interface. Limited queue depth (32 commands).

NVMe: Non-Volatile Memory Express. Designed specifically for SSDs over PCIe. Up to 64K queues, each with 64K entries. Much lower latency than SATA. Bypasses legacy storage stack.

Performance comparison:

| Interface | Sequential Read | Latency |
|---|---|---|
| SATA SSD | ~550 MB/s | ~100 μs |
| NVMe SSD (PCIe 4.0 ×4) | ~7 GB/s | ~10 μs |
| NVMe SSD (PCIe 5.0 ×4) | ~14 GB/s | ~5 μs |

Interrupt Controllers

Legacy PIC (8259A)

Programmable Interrupt Controller. Handles up to 8 IRQ lines (cascaded to 15).

APIC (Advanced Programmable Interrupt Controller)

Used in modern x86 multiprocessor systems.

Local APIC: Per-CPU. Handles local timer, IPI (inter-processor interrupts), performance counters.

I/O APIC: Distributes device interrupts to CPUs. Supports routing (which CPU handles which interrupt), priority, and message-signaled interrupts (MSI).

MSI / MSI-X (Message Signaled Interrupts)

Instead of a dedicated interrupt line, the device writes a message to a special memory address to signal an interrupt.

Advantages: No dedicated interrupt lines needed. Each device can have multiple interrupt vectors (MSI-X: up to 2048). Better for virtualization.

Polling vs Interrupts

| Aspect | Polling | Interrupts |
|---|---|---|
| Latency | Can be very low (busy-wait) | Context switch overhead |
| CPU usage | 100% while polling | Only during ISR |
| Throughput | Good for high-rate devices | Good for low-rate devices |
| Complexity | Simple | Complex (ISR, priority, nesting) |

Modern approach: Hybrid — poll during high-traffic, interrupt during low-traffic. Linux NAPI (network) uses this: switch to polling when packet rate is high, back to interrupts when rate drops.

DPDK: Kernel bypass for networking. Pure polling on dedicated cores. Achieves maximum throughput for high-speed networking.

Memory-Mapped I/O vs Port-Mapped I/O

Memory-Mapped I/O (MMIO)

Device registers are mapped into the physical address space. Access them with regular load/store instructions.

volatile uint32_t *uart_data = (uint32_t *)0x40001000;
*uart_data = 'A';  // Write to UART

Advantages: No special instructions. Full addressing-mode support. Works on any architecture. Caveat: device regions are usually mapped non-cacheable, and accesses go through `volatile` pointers so the compiler cannot elide or reorder them.

Used by: ARM, RISC-V, most modern architectures.

Port-Mapped I/O (PMIO)

Separate I/O address space accessed with special instructions (IN, OUT on x86).

outb(0x60, data);   // Write to I/O port 0x60
data = inb(0x60);   // Read from I/O port 0x60

Used by: x86 (legacy devices). Modern x86 devices use MMIO via PCIe.

Applications in CS

  • Device drivers: Implement the software interface to hardware devices. Must handle interrupts, DMA, and MMIO correctly.
  • Storage systems: Understanding I/O performance (IOPS, bandwidth, latency) is critical for database and file system design.
  • Networking: NIC design, interrupt coalescing, NAPI polling, kernel bypass (DPDK, io_uring).
  • Virtualization: Device emulation, paravirtualization (virtio), SR-IOV for direct device assignment.
  • Embedded systems: GPIO, UART, SPI, I2C — all I/O interfaces to peripherals.
  • Real-time systems: Interrupt latency bounds, DMA completion timing.
  • Operating systems: I/O scheduler, buffer management, interrupt handling, device driver framework.