Memory and Programmable Logic

Memory stores data, and programmable logic implements custom functions. Together they form the storage and computation fabric of digital systems.

Semiconductor Memory

SRAM (Static RAM)

Each bit is stored in a pair of cross-coupled inverters, accessed through two pass transistors (6 transistors per cell, hence 6T SRAM).

Properties:

  • Fast (sub-nanosecond access for on-chip)
  • Data retained as long as power is on
  • No refresh needed
  • Larger area per bit than DRAM
  • Used for: CPU caches (L1, L2, L3), register files, small embedded memories

6T SRAM Cell: Two cross-coupled inverters form a bistable element. Two access transistors connect to bitlines when the wordline is active.
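The cell's behavior can be sketched in a few lines of Python. This is a purely digital illustration that assumes ideal inverters. A real cell is an analog circuit whose transistor sizing lets a write overpower the feedback loop. The class name `SramCell` and its methods are hypothetical.

```python
class SramCell:
    """Behavioral model of a 6T SRAM cell (illustrative, not circuit-accurate)."""

    def __init__(self):
        # The two storage nodes always hold complementary values.
        self.q = 0
        self.q_bar = 1

    def write(self, wordline, bit, bit_bar):
        # The access transistors conduct only while the wordline is high,
        # and a valid write drives the bitline pair to opposite values.
        if wordline and bit != bit_bar:
            self.q, self.q_bar = bit, bit_bar

    def read(self, wordline):
        # With the wordline high the cell drives (BL, BL_bar).
        # With it low the cell is isolated and the bitlines are not driven.
        return (self.q, self.q_bar) if wordline else None


cell = SramCell()
cell.write(wordline=1, bit=1, bit_bar=0)   # store a 1
cell.write(wordline=0, bit=0, bit_bar=1)   # ignored: wordline is low
print(cell.read(wordline=1))
```

The second write has no effect because the wordline is low, which is how unselected cells in the same column keep their data.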

DRAM (Dynamic RAM)

Each bit stored as charge on a tiny capacitor (1 transistor + 1 capacitor per cell).

Properties:

  • High density (much smaller than SRAM per bit)
  • Slower than SRAM
  • Must be refreshed periodically because the charge leaks (each row typically within ~64ms)
  • Used for: Main memory (DDR4, DDR5, LPDDR)
  • Organized in rows/columns. Opening a row (row activation) is expensive.

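Access pattern: The row buffer acts as a cache for the currently open row. Accesses that hit the open row are fast because they pay only the column access (CAS latency). An access to a different row is slow: the open row must be precharged and the new row activated (RAS) before the column access can happen.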
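To make the cost difference concrete, here is a minimal single-bank timing sketch with an open-row policy. The class `DramBank`, the address mapping (row = address divided by columns per row), and the cycle counts for `t_cas`, `t_rcd`, and `t_rp` are all assumptions chosen for illustration, not values from any datasheet.

```python
class DramBank:
    """Toy single-bank DRAM timing model with an open-row policy."""

    def __init__(self, cols_per_row=1024, t_cas=15, t_rcd=15, t_rp=15):
        # Placeholder cycle counts: column access, row activate, precharge.
        self.cols_per_row = cols_per_row
        self.t_cas, self.t_rcd, self.t_rp = t_cas, t_rcd, t_rp
        self.open_row = None  # nothing in the row buffer yet

    def access(self, addr):
        """Return the latency in cycles of reading `addr`."""
        row = addr // self.cols_per_row
        if row == self.open_row:
            return self.t_cas                  # row hit: column access only
        latency = self.t_rcd + self.t_cas      # activate the row, then column access
        if self.open_row is not None:
            latency += self.t_rp               # close (precharge) the old row first
        self.open_row = row
        return latency


def total_latency(addresses):
    bank = DramBank()
    return sum(bank.access(a) for a in addresses)


sequential = list(range(8))                    # every access falls in row 0
strided = [i * 1024 for i in range(8)]         # every access opens a new row
print(total_latency(sequential), total_latency(strided))
```

With these placeholder timings the strided pattern costs more than twice as many cycles as the sequential one, even though both read eight locations.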

ROM (Read-Only Memory)

Mask ROM is programmed during manufacturing and cannot be changed afterward.

Progression of ROM-family types:

| Type | Programmability | Erasure |
|---|---|---|
| ROM | Factory only | Never |
| PROM | Once (fuse-based) | Never |
| EPROM | Multiple (UV erasure) | UV light (whole chip) |
| EEPROM | Multiple (electrical) | Byte-level, in-system |
| Flash | Multiple (electrical) | Block-level, in-system |

Flash Memory

The dominant non-volatile storage technology.

NOR Flash: Random access, byte-programmable. Used for code storage (firmware, boot ROM). Fast read, slow write.

NAND Flash: Sequential access, page-programmable. Used for bulk storage (SSD, USB drives, SD cards). Higher density, lower cost.

Key operations:

  • Read: Fast (NOR: ~100ns, NAND: ~25μs for page)
  • Program (write): Slow (~200-500μs per page)
  • Erase: Very slow (~1-2ms per block), must erase whole block
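The asymmetry between program and erase can be sketched as follows. The sketch relies on one piece of background not spelled out above: an erased flash cell reads as 1, and programming can only change bits from 1 to 0. The class `FlashBlock` and the tiny page and block sizes are hypothetical and chosen only to keep the example readable.

```python
PAGE_SIZE = 4          # bytes per page (unrealistically small, for illustration)
PAGES_PER_BLOCK = 2


class FlashBlock:
    """Toy model of one flash erase block."""

    def __init__(self):
        self.erase_count = 0
        self.pages = [bytearray([0xFF] * PAGE_SIZE) for _ in range(PAGES_PER_BLOCK)]

    def erase(self):
        # Erase always covers the whole block and sets every bit back to 1.
        for page in self.pages:
            page[:] = bytes([0xFF] * PAGE_SIZE)
        self.erase_count += 1

    def program(self, page_index, data):
        # Programming can only clear bits, so the result is (old AND new).
        # `data` is assumed to be at most PAGE_SIZE bytes long.
        page = self.pages[page_index]
        for i, byte in enumerate(data):
            page[i] &= byte


block = FlashBlock()
block.program(0, bytes([0xF0, 0x0F, 0xAA, 0x55]))
block.program(0, bytes([0x0F, 0x0F, 0xFF, 0xFF]))   # "overwrite" without erasing
print(block.pages[0].hex())
```

The second `program` call does not store the new bytes. It leaves the AND of old and new data, which is why updating data in place requires erasing the whole block first.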

Wear: Each cell has limited program/erase cycles (SLC: ~100K, MLC: ~10K, TLC: ~3K, QLC: ~1K). Wear leveling in SSD firmware distributes writes evenly.
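A rough sketch of why wear leveling matters, assuming a greedy policy that always writes to the least-erased block. Real SSD firmware is far more involved, and this is not any vendor's algorithm. The function names are hypothetical; the endurance figures are the approximate ones quoted above.

```python
# Approximate program/erase endurance per cell type, from the text above.
ENDURANCE = {"SLC": 100_000, "MLC": 10_000, "TLC": 3_000, "QLC": 1_000}


def pick_block(erase_counts):
    """Choose the block with the fewest erases (ties go to the lowest index)."""
    return min(range(len(erase_counts)), key=lambda i: erase_counts[i])


def simulate(num_blocks, num_rewrites, leveled=True):
    """Return per-block erase counts after `num_rewrites` block rewrites."""
    erase_counts = [0] * num_blocks
    for _ in range(num_rewrites):
        # Without leveling, a hot logical block keeps landing on physical block 0.
        target = pick_block(erase_counts) if leveled else 0
        erase_counts[target] += 1
    return erase_counts


def worn_out(erase_counts, cell_type):
    """True if any block has reached its endurance limit."""
    return max(erase_counts) >= ENDURANCE[cell_type]


hot = simulate(num_blocks=8, num_rewrites=4_000, leveled=False)
even = simulate(num_blocks=8, num_rewrites=4_000, leveled=True)
print(worn_out(hot, "TLC"), worn_out(even, "TLC"))
```

The same 4,000 rewrites wear out one TLC block when they all hit the same place, but stay well under the limit when spread across eight blocks.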

Bits per cell:

  • SLC (1 bit/cell): Fastest, most durable, most expensive
  • MLC (2 bits/cell): Moderate
  • TLC (3 bits/cell): Higher density, slower
  • QLC (4 bits/cell): Highest density, slowest, least durable

Memory Organization

Address Decoding

For a memory with 2ⁿ locations:

  • n address bits select a location
  • A decoder activates the corresponding wordline
  • Data appears on bitlines

Example: 1K × 8 memory (1024 locations, each 8 bits):

  • 10 address lines (A₀-A₉)
  • 8 data lines (D₀-D₇)
  • Control: CS (chip select), WE (write enable), OE (output enable)
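The 1K × 8 example can be modeled as below. This is a behavioral sketch only: the `Memory` class and `decode` helper are hypothetical names, and the control signals are treated as active-high booleans for readability, whereas many real parts use active-low pins.

```python
def decode(address, addr_bits):
    """One-hot decoder: exactly one of 2**addr_bits wordlines goes high."""
    return [1 if i == address else 0 for i in range(2 ** addr_bits)]


class Memory:
    """Toy model of a `words` x `width` memory with CS/WE/OE controls."""

    def __init__(self, words=1024, width=8):
        self.addr_bits = (words - 1).bit_length()   # 1024 words -> 10 address lines
        self.width = width
        self.cells = [0] * words

    def cycle(self, address, data_in=0, cs=True, we=False, oe=False):
        """One access. Returns the data-bus value, or None if outputs are not driven."""
        if not cs:
            return None                                   # chip not selected
        row = decode(address, self.addr_bits).index(1)    # the active wordline
        if we:
            # Keep only `width` bits, as an 8-bit data bus would.
            self.cells[row] = data_in & ((1 << self.width) - 1)
            return None
        return self.cells[row] if oe else None


mem = Memory()
mem.cycle(0x3FF, data_in=0x1A5, we=True)    # 0x1A5 is truncated to 8 bits
print(mem.addr_bits, hex(mem.cycle(0x3FF, oe=True)))
```

The highest address, 0x3FF, needs all ten address lines set, and the stored value is cut down to the eight data lines.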

Memory Hierarchy Connection

Registers (flip-flops) → L1 Cache (SRAM) → L2/L3 (SRAM) → Main Memory (DRAM) → Storage (Flash/HDD)

Each level: larger, slower, cheaper per bit.

Programmable Logic Devices

PLA (Programmable Logic Array)

Two programmable planes:

  1. AND array: Programmable connections to create product terms
  2. OR array: Programmable connections to OR product terms into outputs

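Implements any set of sum-of-products (SOP) functions that fits within the available product terms. Product terms can be shared between outputs. Very flexible, but relatively expensive because both planes are programmable.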
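As an illustration of the two planes, the sketch below programs a PLA as a full adder. The data layout (`AND_PLANE` as a list of literal maps, `OR_PLANE` as lists of product-term indices) is an assumption made for this example, not a real device programming format.

```python
from itertools import product

# AND plane: each product term lists the literals it connects to
# (True = uncomplemented input, False = complemented input).
AND_PLANE = [
    {"a": True, "b": True},                     # P0 = a.b
    {"a": True, "cin": True},                   # P1 = a.cin
    {"b": True, "cin": True},                   # P2 = b.cin
    {"a": True, "b": False, "cin": False},      # P3 = a.b'.cin'
    {"a": False, "b": True, "cin": False},      # P4 = a'.b.cin'
    {"a": False, "b": False, "cin": True},      # P5 = a'.b'.cin
    {"a": True, "b": True, "cin": True},        # P6 = a.b.cin
]

# OR plane: each output ORs together a chosen subset of product terms.
OR_PLANE = {"cout": [0, 1, 2], "sum": [3, 4, 5, 6]}


def pla(inputs):
    """Evaluate every output for one assignment of input values."""
    terms = [all(inputs[var] == want for var, want in term.items())
             for term in AND_PLANE]
    return {out: any(terms[i] for i in idxs) for out, idxs in OR_PLANE.items()}


for a, b, cin in product([False, True], repeat=3):
    out = pla({"a": a, "b": b, "cin": cin})
    print(int(a), int(b), int(cin), "->", int(out["cout"]), int(out["sum"]))
```

Changing the function means changing only the two tables, which mirrors how a PLA is reprogrammed without touching its fixed gate structure.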

PAL (Programmable Array Logic)

Programmable AND array + fixed OR array. Less flexible than PLA but faster and cheaper.

Each output OR gate has a fixed number of AND terms. If a function needs more terms, it must be decomposed.

GAL (Generic Array Logic)

Like PAL but electrically erasable and reprogrammable. Each output has a configurable macrocell (can be registered or combinational, active high or low).
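The macrocell options can be sketched as a small class. `Macrocell`, `clock`, and `output` are hypothetical names, and the model ignores details such as feedback paths and output enables that real devices provide.

```python
class Macrocell:
    """Toy output macrocell: registered or combinational, active high or low."""

    def __init__(self, registered=False, active_low=False):
        self.registered = registered
        self.active_low = active_low
        self.ff = 0  # D flip-flop state, used only in registered mode

    def clock(self, sop_value):
        """Rising clock edge: capture the sum-of-products value."""
        if self.registered:
            self.ff = sop_value

    def output(self, sop_value):
        """Pin value given the current sum-of-products value (0 or 1)."""
        value = self.ff if self.registered else sop_value
        return value ^ 1 if self.active_low else value


comb = Macrocell()                                   # combinational, active high
reg = Macrocell(registered=True, active_low=True)    # registered, active low
reg.clock(1)                                         # flip-flop captures a 1
print(comb.output(1), reg.output(0))
```

In registered mode the pin follows the value captured at the last clock edge, so the current SOP value passed to `output` is ignored.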

CPLD (Complex Programmable Logic Device)

Multiple PAL-like blocks interconnected by a global routing matrix.

Properties:

  • Deterministic timing (predictable delays)
  • Non-volatile configuration (instant-on)
  • Moderate complexity (hundreds to thousands of logic elements)
  • Used for: Glue logic, I/O interfacing, simple state machines

FPGA (Field-Programmable Gate Array)

The most flexible programmable logic device.

Architecture:

┌─────────────────────────┐
│  I/O Block  I/O Block   │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │
│ │CLB│ │CLB│ │CLB│ │CLB│ │
│ └───┘ └───┘ └───┘ └───┘ │
│    Routing channels     │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │
│ │CLB│ │CLB│ │CLB│ │CLB│ │
│ └───┘ └───┘ └───┘ └───┘ │
│  I/O Block  I/O Block   │
└─────────────────────────┘

Configurable Logic Block (CLB): Contains:

  • Lookup Tables (LUTs): Typically 4-6 input LUTs. Each implements any Boolean function of its inputs (a small ROM).
  • Flip-flops: For registered outputs
  • MUXes: For routing and additional logic
  • Carry chains: Fast arithmetic
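The claim that a LUT implements any Boolean function of its inputs follows from the LUT being a stored truth table. A minimal sketch, assuming the table is packed into an integer whose bit i holds the output for input combination i, with the first input as the least significant bit. The helper names `make_lut` and `lut_init` are hypothetical.

```python
def make_lut(init, num_inputs):
    """Return a function that looks up its inputs in the truth table `init`."""
    def lut(*inputs):
        assert len(inputs) == num_inputs
        # The inputs form the address into the stored table.
        index = sum(bit << pos for pos, bit in enumerate(inputs))
        return (init >> index) & 1
    return lut


def lut_init(fn, num_inputs):
    """Build the truth-table integer for any 0/1-valued function `fn`."""
    init = 0
    for index in range(2 ** num_inputs):
        bits = [(index >> pos) & 1 for pos in range(num_inputs)]
        init |= (fn(*bits) & 1) << index
    return init


xor4 = make_lut(0x6996, 4)   # 4-input XOR (odd parity)
maj3 = make_lut(0xE8, 3)     # 3-input majority vote
print(xor4(1, 0, 1, 1), maj3(1, 1, 0))
print(hex(lut_init(lambda a, b, c, d: a ^ b ^ c ^ d, 4)))
```

Configuring the LUT amounts to loading a different table, so a k-input LUT covers all 2^(2^k) possible functions of k inputs with the same hardware.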

Routing: Programmable interconnect (switch matrices, routing channels). Often the performance bottleneck.

Hard blocks: Modern FPGAs include dedicated blocks:

  • Block RAM (BRAM): On-chip SRAM blocks
  • DSP slices: Multiply-accumulate units
  • PLL/MMCM: Clock management
  • Transceivers: High-speed serial I/O
  • PCIe, Ethernet, DDR controllers

Configuration: Stored either in SRAM (volatile, so the bitstream is loaded from flash at power-up) or in on-chip flash (non-volatile).

Modern FPGAs: The largest devices offer millions of LUTs and run at clock rates in the hundreds of MHz. Used for:

  • Prototyping ASICs
  • Acceleration (ML inference, network processing, HFT)
  • Embedded systems
  • Signal processing
  • Cryptographic acceleration

FPGA vs ASIC

| Aspect | FPGA | ASIC |
|---|---|---|
| Development cost | Low | Very high (mask costs) |
| Unit cost | High | Low (at volume) |
| Performance | Moderate | Highest |
| Power efficiency | Moderate | Best |
| Time to market | Fast (weeks) | Slow (months) |
| Flexibility | Reprogrammable | Fixed |
| Best for | Low volume, prototyping | High volume production |
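The volume tradeoff in the table comes down to a break-even calculation between one-time development cost and per-unit cost. The sketch below shows the arithmetic. All dollar figures are made-up placeholders, not real prices, and the function names are hypothetical.

```python
def total_cost(nre, unit_cost, volume):
    """One-time (NRE) cost plus per-unit cost times the number of units."""
    return nre + unit_cost * volume


def break_even_volume(fpga_nre, fpga_unit, asic_nre, asic_unit):
    """Volume at which FPGA and ASIC total costs are equal."""
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Placeholder figures, chosen only to make the arithmetic visible.
FPGA = {"nre": 50_000, "unit": 100}
ASIC = {"nre": 2_000_000, "unit": 10}

volume = break_even_volume(FPGA["nre"], FPGA["unit"], ASIC["nre"], ASIC["unit"])
print(round(volume))
```

Below the break-even volume the FPGA's lower development cost wins. Above it, the ASIC's lower unit cost dominates.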

Applications in CS

  • Caching: SRAM for L1/L2/L3 caches. Design tradeoffs between size, speed, and associativity.
  • Main memory: DRAM for system RAM. DDR generations (DDR3/4/5) increase bandwidth.
  • Storage: Flash-based SSDs. FTL (Flash Translation Layer) manages wear leveling and garbage collection.
  • FPGA acceleration: AWS F1 instances, Microsoft Catapult (Bing search), network processing (SmartNICs).
  • FPGA for ML: Inference acceleration with quantized models. Xilinx Vitis AI, Intel OpenVINO.
  • Prototyping: Verify ASIC designs on FPGA before tape-out.
  • Embedded systems: CPLDs/FPGAs for custom I/O interfaces, protocol bridges.
  • Cryptocurrency mining: ASICs for SHA-256 (Bitcoin), FPGAs for other algorithms.