Datapath and Control

The datapath performs data operations (arithmetic, memory access, register read/write). The control unit orchestrates the datapath by generating the right signals at the right time.

Single-Cycle Datapath

Each instruction completes in one clock cycle. The clock period must accommodate the slowest instruction.

Components

┌──────┐    ┌──────────┐    ┌─────┐    ┌────────┐    ┌──────────┐
│Instr │    │ Register │    │     │    │  Data  │    │ Register │
│Memory│───→│  File    │───→│ ALU │───→│ Memory │───→│  File    │
│(IM)  │    │  (RF)    │    │     │    │  (DM)  │    │  (WB)    │
└──────┘    └──────────┘    └─────┘    └────────┘    └──────────┘
    ↑                           ↑
    │                           │
   PC                       Control

Instruction Execution Steps

R-type (ADD, SUB, AND, OR):

  1. Fetch instruction from IM at address PC
  2. Read two source registers from RF
  3. ALU performs the operation
  4. Write result to destination register in RF
  5. PC ← PC + 4

I-type Load (LW):

  1. Fetch instruction
  2. Read base register from RF
  3. ALU computes address: base + sign-extended offset
  4. Read data from DM at computed address
  5. Write loaded data to destination register
  6. PC ← PC + 4

S-type Store (SW):

  1. Fetch instruction
  2. Read base register and data register from RF
  3. ALU computes address: base + sign-extended offset
  4. Write data to DM at computed address
  5. PC ← PC + 4

B-type Branch (BEQ):

  1. Fetch instruction
  2. Read two registers from RF
  3. ALU compares (subtract, check zero flag)
  4. If condition met: PC ← PC + (sign-extended offset × 2), because the encoded offset counts 2-byte units
  5. If not: PC ← PC + 4
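The four step lists above can be sketched as a single function that does all the work of one instruction per call, which is the single-cycle idea. This is a minimal illustration, not a hardware model: the tuple format `(op, a, b, c)`, the `step` function, and the `state` dictionary are hypothetical names chosen for the example. Instructions are already decoded rather than stored in real RISC-V encodings, the branch offset is given directly in bytes, and register width is ignored.

```python
# Hypothetical decoded-instruction format: (op, a, b, c). Not a real RISC-V encoding.
def step(state, imem):
    """Execute one whole instruction (one 'cycle') and update state in place."""
    pc, regs, dmem = state["pc"], state["regs"], state["dmem"]
    op, a, b, c = imem[pc // 4]            # fetch from IM at address PC
    next_pc = pc + 4                       # default: sequential execution

    if op in ("add", "sub"):               # R-type: rd=a, rs1=b, rs2=c
        x, y = regs[b], regs[c]            # read two source registers
        regs[a] = x + y if op == "add" else x - y   # ALU op, then write back
    elif op == "lw":                       # load: rd=a, base=b, offset=c
        addr = regs[b] + c                 # ALU computes the address
        regs[a] = dmem.get(addr, 0)        # read DM, write the destination register
    elif op == "sw":                       # store: data register=a, base=b, offset=c
        addr = regs[b] + c
        dmem[addr] = regs[a]               # write DM; no register write-back
    elif op == "beq":                      # branch: rs1=a, rs2=b, byte offset=c
        if regs[a] - regs[b] == 0:         # ALU subtracts; a zero result means equal
            next_pc = pc + c
    else:
        raise ValueError(f"unknown op {op!r}")

    regs[0] = 0                            # x0 always reads as zero
    state["pc"] = next_pc


program = [
    ("lw", 1, 0, 8),     # x1 = DM[8]
    ("add", 2, 1, 1),    # x2 = x1 + x1
    ("sw", 2, 0, 12),    # DM[12] = x2
    ("beq", 2, 2, 8),    # always taken: skips the next instruction
    ("add", 2, 0, 0),    # skipped
    ("sub", 3, 2, 1),    # x3 = x2 - x1
]
state = {"pc": 0, "regs": [0] * 32, "dmem": {8: 21}}
while state["pc"] // 4 < len(program):
    step(state, program)

print(state["regs"][1:4], state["dmem"][12])  # [21, 42, 21] 42
```

Every branch of the `if` chain touches a different subset of the hardware units, which is why the instructions have different natural delays even though the clock treats them the same.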

Datapath Control Signals

| Signal | Purpose |
|---|---|
| RegWrite | Enable writing to register file |
| ALUSrc | Select ALU input: register or immediate |
| ALUOp | Select ALU operation (add, sub, and, etc.) |
| MemRead | Enable reading from data memory |
| MemWrite | Enable writing to data memory |
| MemToReg | Select write-back source: ALU result or memory data |
| Branch | Enable branch (PC update from branch target) |
| Jump | Force PC to jump target |
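As a rough sketch of how these signals might be set per instruction class, the lookup below maps the 7-bit RISC-V opcode to one control word. The signal values follow the usual textbook single-cycle design. The `Control` tuple, the `decode` function, and the string values of `alu_op` are assumptions made for this example, don't-care values are written as 0, and the Jump signal is left out for brevity.

```python
from typing import NamedTuple


class Control(NamedTuple):
    reg_write: int
    alu_src: int     # 0 = second register, 1 = immediate
    alu_op: str      # hypothetical encoding: "add", "sub", or "funct" (decided by funct fields)
    mem_read: int
    mem_write: int
    mem_to_reg: int  # 0 = ALU result, 1 = memory data
    branch: int


# Keyed by the RISC-V base opcode (instruction bits 6..0).
CONTROL_TABLE = {
    0b0110011: Control(1, 0, "funct", 0, 0, 0, 0),  # R-type (ADD, SUB, AND, OR)
    0b0000011: Control(1, 1, "add", 1, 0, 1, 0),    # load (LW)
    0b0100011: Control(0, 1, "add", 0, 1, 0, 0),    # store (SW)
    0b1100011: Control(0, 0, "sub", 0, 0, 0, 1),    # branch (BEQ)
}


def decode(instruction: int) -> Control:
    """Single-cycle control: a pure function of the opcode, with no state."""
    opcode = instruction & 0x7F
    return CONTROL_TABLE[opcode]


# 0x00812083 encodes `lw x1, 8(x2)`.
print(decode(0x00812083))
```

Because the output depends only on the current instruction, this corresponds to purely combinational logic.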

Performance

Clock period = longest instruction delay = tIM + tRF_read + tALU + tDM + tRF_write.

Typically the load instruction is the critical path.

Problem: Simple instructions (ADD) take as long as the slowest one (LW), because the clock period is set by the worst case. Every faster instruction leaves part of its cycle idle.
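To make the waste concrete, the sketch below applies the clock-period formula above to invented unit delays. The picosecond values in `DELAY_PS` and the per-instruction unit lists in `PATHS` are assumptions for illustration only, not measurements of any real design.

```python
# Hypothetical unit delays in picoseconds (illustrative numbers only).
DELAY_PS = {"IM": 200, "RF_read": 100, "ALU": 200, "DM": 200, "RF_write": 100}

# Units each instruction class actually needs, in order.
PATHS = {
    "ADD": ["IM", "RF_read", "ALU", "RF_write"],
    "LW": ["IM", "RF_read", "ALU", "DM", "RF_write"],
    "SW": ["IM", "RF_read", "ALU", "DM"],
    "BEQ": ["IM", "RF_read", "ALU"],
}


def path_delay(instr):
    """Sum of the delays of the units this instruction passes through."""
    return sum(DELAY_PS[unit] for unit in PATHS[instr])


clock_period = max(path_delay(i) for i in PATHS)   # set by the slowest instruction

for instr in PATHS:
    idle = clock_period - path_delay(instr)
    print(f"{instr}: needs {path_delay(instr)} ps, idle {idle} ps of {clock_period} ps")
```

With these numbers the load sets an 800 ps period, and an ADD that needs only 600 ps spends a quarter of every cycle idle.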

Multi-Cycle Datapath

Break each instruction into multiple shorter steps, one per clock cycle. Different instructions take different numbers of cycles.

Steps (5 stages)

  1. Instruction Fetch (IF): IR ← IM[PC]; PC ← PC + 4
  2. Instruction Decode / Register Read (ID): Read registers; decode control signals; compute branch target
  3. Execute (EX): ALU operation (arithmetic, address calculation, comparison)
  4. Memory Access (MEM): Read/write data memory (only for load/store)
  5. Write Back (WB): Write result to register file

Cycle Counts

| Instruction | Cycles | Steps Used |
|---|---|---|
| R-type (ADD) | 4 | IF, ID, EX, WB |
| Load (LW) | 5 | IF, ID, EX, MEM, WB |
| Store (SW) | 4 | IF, ID, EX, MEM |
| Branch (BEQ) | 3 | IF, ID, EX |
| Jump (J) | 3 | IF, ID, EX |
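The average number of cycles per instruction depends on how often each class occurs in a program. The sketch below weights the cycle counts from the table by an instruction mix. The mix in `MIX` is a made-up example, not data from any benchmark, and `average_cpi` is a name chosen for this illustration.

```python
# Cycle counts from the table above; the instruction mix is a made-up example.
CYCLES = {"R-type": 4, "Load": 5, "Store": 4, "Branch": 3, "Jump": 3}
MIX = {"R-type": 0.45, "Load": 0.25, "Store": 0.10, "Branch": 0.15, "Jump": 0.05}


def average_cpi(cycles, mix):
    """Frequency-weighted average of per-class cycle counts."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "frequencies must sum to 1"
    return sum(cycles[name] * mix[name] for name in mix)


avg = average_cpi(CYCLES, MIX)
print(f"average CPI = {avg:.2f}")  # average CPI = 4.05
```

A mix with more loads pushes the average toward 5, while a branch-heavy mix pulls it toward 3.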

Advantages over Single-Cycle

  • Shorter clock period (each stage is shorter than the longest instruction)
  • Shared hardware (one ALU, one memory — used in different cycles)
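  • CPI (cycles per instruction) varies by instruction and averages above 1, but simple instructions finish in fewer cycles instead of paying for the slowest instruction's delay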

Disadvantages

  • More complex control (FSM or microcode)
  • Still sequential — next instruction waits until current one finishes

Control Unit

Hardwired Control

Control signals generated by combinational logic based on opcode and current state.

Single-cycle: Pure combinational. Opcode → control signals (one-level decode).

Multi-cycle: FSM. Current state + opcode → control signals + next state.

Opcode + State → Control Logic → Control Signals + Next State
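A minimal sketch of the next-state half of such an FSM is shown below, using the five steps and the instruction classes from the multi-cycle section. The state names, the `next_state` function, and the class labels are assumptions made for the example, and the control-signal outputs of each state are omitted to keep it short.

```python
# After EX, the path depends on the instruction class (hypothetical class labels).
NEXT_AFTER_EX = {"R": "WB", "LW": "MEM", "SW": "MEM", "BEQ": "IF", "J": "IF"}


def next_state(state, instr_class):
    """Current state + instruction class -> next state."""
    if state == "IF":
        return "ID"
    if state == "ID":
        return "EX"
    if state == "EX":
        return NEXT_AFTER_EX[instr_class]
    if state == "MEM":
        return "WB" if instr_class == "LW" else "IF"   # stores finish after MEM
    if state == "WB":
        return "IF"
    raise ValueError(f"unknown state {state!r}")


def cycles(instr_class):
    """Count the states visited before the FSM returns to IF."""
    state, count = "IF", 0
    while True:
        count += 1
        state = next_state(state, instr_class)
        if state == "IF":
            return count


for cls in ("R", "LW", "SW", "BEQ", "J"):
    print(cls, cycles(cls))
```

Walking the FSM this way reproduces the cycle counts in the table above: 4, 5, 4, 3, and 3.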

Advantages: Fast, efficient for simple ISAs.

Disadvantages: Hard to modify, complex for large ISAs.

Microprogrammed Control

Control signals stored in a control memory (microcode ROM). Each micro-instruction specifies control signals for one cycle.

Microprogram Counter → Control Memory → Control Signals
                              ↓
                         Micro-instruction
                         (control word)

Micro-instruction fields: ALUSrc, RegWrite, MemRead, next-μPC, branch-condition, etc.
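The sequencing idea can be sketched with a small table standing in for the control memory. Everything here is illustrative: the micro-instruction layout, the `IRWrite` and `PCWrite` signal names, the `DISPATCH` table, and the three next-μPC rules (`"next"`, `"dispatch"`, `"fetch"`) are assumptions for this example rather than the format of any real microcode.

```python
# Control memory: one (asserted signals, next-uPC rule) pair per micro-instruction.
MICROCODE = [
    (("IRWrite", "PCWrite"), "next"),       # 0: fetch instruction, PC <- PC + 4
    ((), "dispatch"),                       # 1: decode, then jump by instruction class
    (("ALUOp=funct",), "next"),             # 2: R-type execute
    (("RegWrite",), "fetch"),               # 3: R-type write-back
    (("ALUSrc",), "next"),                  # 4: load address calculation
    (("MemRead",), "next"),                 # 5: load memory access
    (("MemToReg", "RegWrite"), "fetch"),    # 6: load write-back
]

# Where the decode step sends each instruction class (hypothetical labels).
DISPATCH = {"R": 2, "LW": 4}


def run(instr_class):
    """Step the microprogram counter through one instruction; return signals per cycle."""
    upc, trace = 0, []
    while True:
        signals, rule = MICROCODE[upc]
        trace.append(signals)
        if rule == "next":
            upc += 1
        elif rule == "dispatch":
            upc = DISPATCH[instr_class]
        else:                               # "fetch": instruction done, back to uPC 0
            return trace


for cycle, signals in enumerate(run("LW"), start=1):
    print(cycle, signals)
```

Changing an instruction's behavior means editing rows of `MICROCODE`, not rewiring logic, which is the flexibility argument for microprogramming.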

Advantages: Flexible (change behavior by updating microcode). Natural for complex ISAs (x86).

Disadvantages: Slower than hardwired (ROM access time). More area.

Modern use: x86 processors use microcode for complex instructions, hardwired logic for simple/common ones. Microcode updates can fix CPU bugs post-manufacture (Intel/AMD regularly issue microcode patches).

ALU Design

The ALU performs arithmetic and logic operations.

Simple ALU

Inputs: A, B (operands), ALUOp (operation select)
Outputs: Result, Zero flag, Overflow flag, Carry flag

Operations:
  000: AND    (A & B)
  001: OR     (A | B)
  010: ADD    (A + B)
  011: SUB    (A - B)     [Add with B inverted + carry-in]
  100: SLT    (Set if A < B)
  101: XOR    (A ^ B)
  110: SLL    (A << B)
  111: SRL    (A >> B)
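The operation table above translates fairly directly into software. The sketch below assumes a 32-bit datapath, treats SLT as a signed comparison, and uses only the low 5 bits of B as the shift amount. Those three choices, and the `alu` and `to_signed` names, are assumptions for this example. Only the Zero flag is produced.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Return (result, zero_flag) for the 3-bit ALUOp codes in the table above."""
    a &= MASK
    b &= MASK
    if op == 0b000:
        r = a & b
    elif op == 0b001:
        r = a | b
    elif op == 0b010:
        r = a + b
    elif op == 0b011:
        r = a + (~b & MASK) + 1          # subtract = add inverted B with carry-in 1
    elif op == 0b100:
        r = int(to_signed(a) < to_signed(b))
    elif op == 0b101:
        r = a ^ b
    elif op == 0b110:
        r = a << (b & 31)
    elif op == 0b111:
        r = a >> (b & 31)                # logical shift: zeros enter from the left
    else:
        raise ValueError(f"bad ALUOp {op:#05b}")
    r &= MASK
    return r, int(r == 0)


print(alu(0b011, 5, 5))   # (0, 1): equal operands set the Zero flag, as BEQ relies on
```

The SUB case shows why one adder serves both operations: inverting B and setting the carry-in forms the two's complement.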

Flags / Condition Codes

| Flag | Meaning | Set When |
|---|---|---|
| Zero (Z) | Result is zero | Result == 0 |
| Negative (N) | Result is negative | MSB of result is 1 |
| Carry (C) | Unsigned overflow | Carry out of MSB |
| Overflow (V) | Signed overflow | Sign of result wrong |
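The difference between Carry and Overflow is easiest to see on a narrow adder. The sketch below computes all four flags for an addition, using an 8-bit width by default so the examples stay readable. The function name `add_with_flags` and the width parameter are assumptions for this illustration.

```python
def add_with_flags(a, b, width=8):
    """Add two width-bit values and derive the Z, N, C, V flags from the table above."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    full = a + b                     # keep the extra bit so the carry-out is visible
    result = full & mask
    flags = {
        "Z": int(result == 0),
        "N": int(bool(result & sign)),
        "C": int(full > mask),       # carry out of the MSB: unsigned overflow
        # Signed overflow: both operands share a sign that the result does not have.
        "V": int(bool((a ^ result) & (b ^ result) & sign)),
    }
    return result, flags


print(add_with_flags(0x7F, 0x01))   # 127 + 1: V set, C clear
print(add_with_flags(0xFF, 0x01))   # 255 + 1: C and Z set, V clear
```

The first case overflows only when read as signed (127 + 1 does not fit in 8 signed bits), and the second only when read as unsigned.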

RISC-V note: RISC-V does not use condition codes. Instead, it uses compare-and-branch instructions (BEQ, BLT) that combine comparison and branch.

Register File Design

A register file is a small, fast memory inside the processor.

Typical Structure

32 registers × 64 bits (for RV64)
2 read ports + 1 write port (for basic pipeline)

Read: Combinational. Provide a register number and the value appears after the read delay, with no clock edge required.

Write: Sequential. Data is written on the clock edge when RegWrite is asserted.

Register x0 (RISC-V): Hardwired to 0. Reads always return 0. Writes are discarded. Simplifies many operations (e.g., NOP = ADDI x0, x0, 0; MV rd, rs = ADDI rd, rs, 0).
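A software model of this structure is short. The sketch below assumes the 32 × 64-bit layout given above, with two read ports and one write port. The `RegisterFile` class and its method names are hypothetical, and the clock edge is modeled simply as the moment `write` is called.

```python
class RegisterFile:
    """Two read ports, one write port, with x0 hardwired to zero."""

    def __init__(self, count=32, width=64):
        self._regs = [0] * count
        self._mask = (1 << width) - 1

    def read(self, rs1, rs2):
        """Combinational read: both ports return their values with no clock involved."""
        return self._regs[rs1], self._regs[rs2]

    def write(self, rd, value, reg_write=True):
        """Models the clock edge: the write happens only if RegWrite is asserted."""
        if reg_write and rd != 0:        # writes to x0 are discarded
            self._regs[rd] = value & self._mask


rf = RegisterFile()
rf.write(0, 99)                  # ignored: x0 stays 0
rf.write(5, 7)
rf.write(6, 1, reg_write=False)  # ignored: RegWrite not asserted
print(rf.read(5, 0), rf.read(6, 6))  # (7, 0) (0, 0)
```

Discarding the write at `rd == 0` is one way to model x0; real hardware might instead force the read output to zero.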

Multi-Ported Register Files

Superscalar processors need multiple read/write ports:

  • 2-issue: 4 read + 2 write ports
  • 4-issue: 8 read + 4 write ports

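Area grows roughly as O(ports²), since each added port adds a wordline and a bitline to every cell. Beyond a few ports, designs turn to alternatives such as a register cache or a physical register file with renaming.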

Data Hazards Overview

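In a pipelined processor, where instructions overlap, an instruction may need a result that an earlier instruction has not yet written back (a non-pipelined multi-cycle design finishes each instruction before starting the next, so it avoids this):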

ADD x1, x2, x3    // Writes x1
SUB x4, x1, x5    // Reads x1 — but x1 not yet written!

This is a Read After Write (RAW) data hazard. Solutions:

  • Stalling: Insert bubbles (NOPs) until data is available
  • Forwarding/Bypassing: Route the result directly from where it's produced to where it's needed
  • Compiler scheduling: Reorder instructions to avoid hazards
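Detecting the hazard comes down to comparing each instruction's destination register with the source registers of the instructions that follow closely behind it. The sketch below does this on a small decoded program. The `(op, rd, rs1, rs2)` tuple format, the `raw_hazards` function, and the default `window` of 2 following instructions are assumptions for illustration. The right window depends on the pipeline depth and on whether forwarding exists.

```python
def raw_hazards(program, window=2):
    """Return (producer index, consumer index, register) for nearby RAW dependences.

    program: list of (op, rd, rs1, rs2) tuples with register names as strings.
    window:  how many following instructions can still see the stale value
             (an assumed value, not taken from a specific pipeline).
    """
    hazards = []
    for i, (_, rd, _, _) in enumerate(program):
        if rd is None or rd == "x0":         # no destination, or write is discarded
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, rd_j, rs1, rs2 = program[j]
            if rd in (rs1, rs2):
                hazards.append((i, j, rd))
            if rd_j == rd:                   # overwritten: later reads depend on j instead
                break
    return hazards


program = [
    ("add", "x1", "x2", "x3"),   # writes x1
    ("sub", "x4", "x1", "x5"),   # reads x1
    ("and", "x6", "x1", "x4"),   # reads x1 and x4
    ("or", "x7", "x1", "x2"),    # reads x1, but outside the window
]
print(raw_hazards(program))  # [(0, 1, 'x1'), (0, 2, 'x1'), (1, 2, 'x4')]
```

A stall unit would insert bubbles for each reported pair, a forwarding unit would select a bypass path, and a compiler could use the same check to decide which instructions to move apart.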

Detailed treatment in the pipelining file.

Performance Metrics

Execution Time

CPU Time = Instruction Count × CPI × Clock Period
         = IC × CPI / Clock Rate

CPI (Cycles Per Instruction)

CPI = Σ (CPIᵢ × Fᵢ)

where CPIᵢ is cycles for instruction class i and Fᵢ is its frequency.

Single-cycle: CPI = 1, but long clock period.

Multi-cycle: CPI > 1 on average, but shorter clock period.

Pipelined: CPI ≈ 1 ideally (throughput of 1 instruction/cycle).

Superscalar: CPI < 1 (IPC > 1, i.e. multiple instructions complete per cycle).
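Plugging numbers into the CPU time formula shows how CPI and clock period trade off. All values below are assumptions chosen for illustration: an 800 ps single-cycle clock, a 200 ps clock for the staged designs, an average multi-cycle CPI of 4.05, and an ideal pipeline with no stalls.

```python
def cpu_time_seconds(instruction_count, cpi, clock_period_s):
    """CPU Time = Instruction Count x CPI x Clock Period."""
    return instruction_count * cpi * clock_period_s


IC = 1_000_000  # hypothetical program length

designs = {
    "single-cycle": cpu_time_seconds(IC, 1.0, 800e-12),
    "multi-cycle": cpu_time_seconds(IC, 4.05, 200e-12),
    "pipelined (ideal)": cpu_time_seconds(IC, 1.0, 200e-12),
}

for name, seconds in designs.items():
    print(f"{name}: {seconds * 1e6:.0f} us")
```

With these particular numbers the multi-cycle design comes out slightly slower than the single-cycle one (810 µs against 800 µs), which suggests that its benefit depends on the instruction mix and on how evenly the stages are balanced. The ideal pipeline gets both the short clock and a CPI near 1.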

Amdahl's Law

Speedup from improving a fraction f of execution by a factor S:

Speedup = 1 / ((1 - f) + f/S)
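A quick numeric check of the formula, with arbitrarily chosen example values for f and S:

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)


# Speeding up 10% of the run by 10x barely helps; speeding up 90% helps far more.
print(f"{amdahl_speedup(0.10, 10):.3f}")   # 1.099
print(f"{amdahl_speedup(0.90, 10):.3f}")   # 5.263

# Even an enormous S cannot beat 1 / (1 - f), the limit set by the untouched part.
print(f"{amdahl_speedup(0.90, 1e12):.3f}")  # 10.000
```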

Consequence: Improving a small fraction of execution provides limited overall speedup. "Make the common case fast."

Applications in CS

  • Compiler optimization: Understanding the datapath helps compilers generate efficient code (instruction selection, register allocation, scheduling).
  • Performance tuning: Knowing CPI breakdown helps identify bottlenecks (compute-bound vs memory-bound).
  • Hardware design: Datapath design determines the tradeoffs of a processor (single-cycle simplicity vs multi-cycle efficiency vs pipelined throughput).
  • Emulation: Software emulators implement the ISA's datapath in software. Understanding the hardware helps write efficient emulators.
  • Security: Microcode updates deliver some mitigations for speculative-execution vulnerabilities such as Spectre. Timing differences in ALU operations can leak data through side channels.