Memory and Programmable Logic

Memory stores data, and programmable logic implements custom functions. Together they form the storage and computation fabric of digital systems.

Semiconductor Memory

SRAM (Static RAM)

Each bit is stored in a cross-coupled inverter pair (4 transistors) plus two access transistors, giving 6 transistors per cell (6T SRAM).

Properties:

Fast (sub-nanosecond access for on-chip)
Data retained as long as power is on
No refresh needed
Larger area per bit than DRAM
Used for: CPU caches (L1, L2, L3), register files, small embedded memories

6T SRAM Cell: Two cross-coupled inverters form a bistable element. Two access transistors connect to bitlines when the wordline is active.
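The cell's behaviour can be sketched at the logic level. This is a minimal model that ignores all analog effects (sense amplifiers, bitline precharge, read disturb); the class and method names are hypothetical, chosen only for illustration.

```python
class SramCell6T:
    """Logic-level model of a 6T SRAM cell (no analog behaviour)."""

    def __init__(self, value=0):
        # q and q_bar are the two storage nodes of the cross-coupled inverters.
        self.q = value & 1
        self.q_bar = self.q ^ 1

    def write(self, wordline, bit):
        # Wordline high: the access transistors conduct and the bitline
        # drivers overpower the inverters, forcing a new state.
        if wordline:
            self.q = bit & 1
            self.q_bar = self.q ^ 1
        # Wordline low: the cell is isolated and simply holds its state.

    def read(self, wordline):
        # Returns (BL, BL_bar) as seen on the bitlines, or None if not selected.
        if not wordline:
            return None
        return (self.q, self.q_bar)


cell = SramCell6T()
cell.write(wordline=1, bit=1)   # selected: the cell flips to 1
cell.write(wordline=0, bit=0)   # not selected: the write is ignored
```

The second write has no effect because the wordline is low, which is the property that lets many cells share one pair of bitlines.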

DRAM (Dynamic RAM)

Each bit stored as charge on a tiny capacitor (1 transistor + 1 capacitor per cell).

Properties:

High density (much smaller than SRAM per bit)
Slower than SRAM
Every row must be refreshed within a ~64ms window because the capacitor charge leaks
Used for: Main memory (DDR4, DDR5, LPDDR)
Organized in rows/columns. Opening a row (row activation) is expensive.
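The cost of refresh can be estimated with a short calculation. The 64 ms window comes from the list above; the number of refresh commands per window (8192) and the time each command blocks the device (350 ns) are assumed values typical of DDR-class parts, not figures from this text.

```python
def refresh_overhead(retention_s=64e-3, refresh_commands=8192, t_rfc_s=350e-9):
    """Return (average gap between refresh commands, fraction of time refreshing).

    refresh_commands and t_rfc_s are illustrative assumptions.
    """
    t_refi = retention_s / refresh_commands   # average spacing of refresh commands
    return t_refi, t_rfc_s / t_refi


t_refi, overhead = refresh_overhead()
print(f"refresh every {t_refi * 1e6:.1f} us, overhead {overhead:.1%}")
```

Under these assumptions a refresh command is issued roughly every 7.8 µs and the device is unavailable a little under 5% of the time.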

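Access pattern: The row buffer acts as a cache for the currently open row. Accesses that hit the open row are fast (CAS latency only). An access to a different row is slow: the open row must first be closed (precharge), then the new row activated (RAS), and only then can the column be read (CAS).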
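The row-buffer effect can be sketched as a small latency model for a single bank. The timing values (14 cycles each for CAS latency, row activation, and precharge) and the 1024-column row size are illustrative assumptions, not parameters of any particular device. A row conflict is charged precharge plus activation plus CAS.

```python
def dram_latency(addresses, column_bits=10, t_cas=14, t_rcd=14, t_rp=14):
    """Total latency in cycles for a sequence of accesses to one bank.

    All timing defaults are assumed, illustrative values.
    """
    open_row = None
    total = 0
    for addr in addresses:
        row = addr >> column_bits          # upper bits select the row
        if row == open_row:
            total += t_cas                 # row hit: column access only
        elif open_row is None:
            total += t_rcd + t_cas         # bank idle: activate, then read
        else:
            total += t_rp + t_rcd + t_cas  # conflict: close row, activate, read
        open_row = row
    return total


sequential = dram_latency([0, 1, 2, 3])           # all in row 0
scattered = dram_latency([0, 1024, 2048, 3072])   # a new row every time
```

With these numbers the four sequential accesses cost 70 cycles and the four scattered ones cost 154, which is why access locality matters so much for DRAM throughput.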

ROM (Read-Only Memory)

Mask ROM is programmed during manufacturing and cannot be changed afterwards. Later variants relax this restriction step by step.

Progression of types:

Type	Programmability	Erasure
ROM	Factory only	Never
PROM	Once (fuse-based)	Never
EPROM	Multiple (UV erasure)	UV light (whole chip)
EEPROM	Multiple (electrical)	Byte-level, in-system
Flash	Multiple (electrical)	Block-level, in-system

Flash Memory

The dominant non-volatile storage technology.

NOR Flash: Random access, byte-programmable. Used for code storage (firmware, boot ROM). Fast read, slow write.

NAND Flash: Sequential access, page-programmable. Used for bulk storage (SSD, USB drives, SD cards). Higher density, lower cost.

Key operations:

Read: Fast (NOR: ~100ns, NAND: ~25μs for page)
Program (write): Slow (~200-500μs per NAND page)
Erase: Very slow (~1-2ms per NAND block); a whole block must be erased before any of its pages can be rewritten
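Why erase must precede rewriting can be shown with a toy model. Programming can only change bits from 1 to 0; only a block erase restores them to 1. The block and page sizes below are tiny, hypothetical values, and real NAND adds further restrictions (such as programming each page only once per erase) that this sketch leaves out.

```python
class FlashBlock:
    """Toy model of one flash block (sizes are illustrative)."""

    def __init__(self, pages=4, page_size=4):
        self.page_size = page_size
        # Erased flash reads as all ones.
        self.pages = [[0xFF] * page_size for _ in range(pages)]
        self.erase_count = 0

    def program(self, page, data):
        if len(data) != self.page_size:
            raise ValueError("data must fill exactly one page")
        # Programming can only clear bits (1 -> 0), so the result is old AND new.
        self.pages[page] = [old & new for old, new in zip(self.pages[page], data)]

    def erase(self):
        # Erase acts on the whole block and is the only way to get ones back.
        self.pages = [[0xFF] * self.page_size for _ in self.pages]
        self.erase_count += 1


block = FlashBlock()
block.program(0, [0xF0, 0xFF, 0xFF, 0xFF])
block.program(0, [0x0F, 0xFF, 0xFF, 0xFF])   # overwrite attempt without erase
corrupted = block.pages[0][0]                # 0xF0 & 0x0F: neither value survives
block.erase()
block.program(0, [0x0F, 0xFF, 0xFF, 0xFF])   # after erase the write succeeds
```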

Wear: Each cell has limited program/erase cycles (SLC: ~100K, MLC: ~10K, TLC: ~3K, QLC: ~1K). Wear leveling in SSD firmware distributes writes evenly.
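A minimal sketch of the wear-leveling idea: always direct the next write to the block with the fewest erases so far. Real flash translation layers are far more involved (they also track valid data, garbage collection, and static data); the function names here are hypothetical.

```python
def pick_block(erase_counts):
    """Index of the least-worn block (lowest index wins ties)."""
    return min(range(len(erase_counts)), key=erase_counts.__getitem__)


def simulate_writes(num_blocks, num_writes):
    """Erase counts after num_writes block rewrites under this policy."""
    counts = [0] * num_blocks
    for _ in range(num_writes):
        counts[pick_block(counts)] += 1   # each rewrite costs one erase
    return counts


counts = simulate_writes(num_blocks=4, num_writes=10)
spread = max(counts) - min(counts)   # never more than 1 under this policy
```

Without leveling, ten rewrites of one logical location would put all ten erase cycles on a single block; here no block is more than one cycle ahead of any other.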

Bits per cell:

SLC (1 bit/cell): Fastest, most durable, most expensive
MLC (2 bits/cell): Moderate speed, endurance, and cost
TLC (3 bits/cell): Higher density, slower
QLC (4 bits/cell): Highest density, slowest, least durable
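The trade-off follows from simple arithmetic: storing n bits in one cell requires distinguishing 2ⁿ charge levels, so each extra bit halves the margin between adjacent levels while reducing the number of cells needed. A short sketch (the 8192-bit capacity is an arbitrary example value):

```python
def voltage_levels(bits_per_cell):
    """Number of distinct charge levels a cell must hold."""
    return 2 ** bits_per_cell


def cells_needed(capacity_bits, bits_per_cell):
    """Cells required for a given capacity (rounded up)."""
    return -(-capacity_bits // bits_per_cell)


table = {
    name: (voltage_levels(bits), cells_needed(8192, bits))
    for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]
}
```

QLC needs a quarter of the cells SLC does for the same capacity, but must resolve 16 levels instead of 2, which is consistent with it being the slowest and least durable option.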

Memory Organization

Address Decoding

For a memory with 2ⁿ locations:

n address bits select a location
A decoder activates the corresponding wordline
Data appears on bitlines

Example: 1K × 8 memory (1024 locations, each 8 bits):

10 address lines (A₀-A₉)
8 data lines (D₀-D₇)
Control: CS (chip select), WE (write enable), OE (output enable)
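The 1K × 8 example can be modelled in a few lines. This is a behavioural sketch only: the control signals are treated as active-high for readability (many real parts use active-low pins), and the class and function names are hypothetical.

```python
def address_lines(locations):
    """Number of address bits for a power-of-two number of locations."""
    n = locations.bit_length() - 1
    if locations < 1 or (1 << n) != locations:
        raise ValueError("locations must be a power of two")
    return n


def decode(address, n):
    """n-to-2^n decoder: one-hot list with a single active wordline."""
    if not 0 <= address < (1 << n):
        raise ValueError("address out of range")
    return [int(i == address) for i in range(1 << n)]


class Memory:
    def __init__(self, locations=1024, width=8):
        self.n = address_lines(locations)
        self.mask = (1 << width) - 1
        self.cells = [0] * locations

    def access(self, address, cs, we, oe, data_in=0):
        if not cs:
            return None                       # chip not selected: bus floats
        row = decode(address, self.n).index(1)
        if we:
            self.cells[row] = data_in & self.mask
            return None
        return self.cells[row] if oe else None


mem = Memory()
mem.access(0x3FF, cs=1, we=1, oe=0, data_in=0x1A5)   # only the low 8 bits fit
value = mem.access(0x3FF, cs=1, we=0, oe=1)
```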

Memory Hierarchy Connection

Registers (flip-flops) → L1 Cache (SRAM) → L2/L3 (SRAM) → Main Memory (DRAM) → Storage (Flash/HDD)

Moving away from the CPU, each level is larger, slower, and cheaper per bit.

Programmable Logic Devices

PLA (Programmable Logic Array)

Two programmable planes:

AND array: Programmable connections to create product terms
OR array: Programmable connections to OR product terms into outputs

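Implements any sum-of-products (SOP) expression that fits within the available product terms, and product terms can be shared between outputs. Very flexible, but the two programmable planes make it slower and more expensive than simpler devices.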
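The two planes can be sketched in software. In this minimal model each product term in the AND plane is a mapping from input index to the required value (1 for a true literal, 0 for a complemented one), and each output in the OR plane lists the product terms it sums. The representation and names are assumptions made for illustration; the example programs a full adder.

```python
def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a PLA: AND plane forms product terms, OR plane sums them."""
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]


# Inputs: index 0 = a, 1 = b, 2 = cin
AND_PLANE = [
    {0: 0, 1: 0, 2: 1},   # a' b' cin
    {0: 0, 1: 1, 2: 0},   # a' b  cin'
    {0: 1, 1: 0, 2: 0},   # a  b' cin'
    {0: 1, 1: 1, 2: 1},   # a  b  cin
    {0: 1, 1: 1},         # a b
    {0: 1, 2: 1},         # a cin
    {1: 1, 2: 1},         # b cin
]
OR_PLANE = [
    [0, 1, 2, 3],         # sum
    [4, 5, 6],            # carry out
]

s, cout = pla_eval([1, 1, 0], AND_PLANE, OR_PLANE)   # 1 + 1 + 0
```

Reprogramming the device amounts to changing the contents of the two tables; the evaluation structure stays fixed.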

PAL (Programmable Array Logic)

Programmable AND array + fixed OR array. Less flexible than PLA but faster and cheaper.

Each output OR gate has a fixed number of AND terms. If a function needs more terms, it must be decomposed.

GAL (Generic Array Logic)

Like PAL but electrically erasable and reprogrammable. Each output has a configurable macrocell (can be registered or combinational, active high or low).

CPLD (Complex Programmable Logic Device)

Multiple PAL-like blocks interconnected by a global routing matrix.

Properties:

Deterministic timing (predictable delays)
Non-volatile configuration (instant-on)
Moderate complexity (hundreds to thousands of logic elements)
Used for: Glue logic, I/O interfacing, simple state machines

FPGA (Field-Programmable Gate Array)

The most flexible programmable logic device.

Architecture:

┌─────────────────────────┐
│  I/O Block  I/O Block   │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐│
│ │CLB│ │CLB│ │CLB│ │CLB││
│ └───┘ └───┘ └───┘ └───┘│
│    Routing channels     │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐│
│ │CLB│ │CLB│ │CLB│ │CLB││
│ └───┘ └───┘ └───┘ └───┘│
│  I/O Block  I/O Block   │
└─────────────────────────┘

Configurable Logic Block (CLB): The basic logic unit. It contains:

Lookup Tables (LUTs): Typically 4-6 inputs. A k-input LUT stores a 2ᵏ-bit truth table, so it can implement any Boolean function of its inputs (in effect a small memory).
Flip-flops: For registered outputs
MUXes: For routing and additional logic
Carry chains: Fast arithmetic
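A LUT is easy to model: the configuration is just the truth table, and the inputs form the address into it. The sketch below packs the table into an integer with input 0 as the least significant address bit; that bit ordering and the names are assumptions, since vendors differ in their conventions.

```python
class Lut:
    """k-input lookup table; init holds the 2**k-bit truth table."""

    def __init__(self, k, init):
        self.k = k
        self.init = init

    def eval(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        # The inputs form an address; input 0 is the least significant bit.
        index = sum(bit << i for i, bit in enumerate(inputs))
        return (self.init >> index) & 1


def lut_from_function(k, fn):
    """Build the truth table by evaluating fn on every input combination."""
    init = 0
    for index in range(1 << k):
        bits = [(index >> i) & 1 for i in range(k)]
        if fn(*bits):
            init |= 1 << index
    return Lut(k, init)


xor4 = lut_from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)
result = xor4.eval(1, 0, 1, 1)   # three ones: odd parity
```

The same 16-bit table could hold any of the 2¹⁶ possible 4-input functions, which is what makes the LUT a universal logic element.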

Routing: Programmable interconnect (switch matrices, routing channels). Often the performance bottleneck.

Hard blocks: Modern FPGAs include dedicated blocks:

Block RAM (BRAM): On-chip SRAM blocks
DSP slices: Multiply-accumulate units
PLL/MMCM: Clock management
Transceivers: High-speed serial I/O
PCIe, Ethernet, DDR controllers

Configuration: Held either in SRAM (volatile; the bitstream is loaded from flash at power-up) or in on-chip flash (non-volatile).

Modern FPGAs: The largest devices offer millions of LUTs and run designs at hundreds of MHz. Used for:

Prototyping ASICs
Acceleration (ML inference, network processing, high-frequency trading)
Embedded systems
Signal processing
Cryptographic acceleration

FPGA vs ASIC

Aspect	FPGA	ASIC
Development cost	Low	Very high (mask costs)
Unit cost	High	Low (at volume)
Performance	Moderate	Highest
Power efficiency	Moderate	Best
Time to market	Fast (weeks)	Slow (months to years)
Flexibility	Reprogrammable	Fixed
Best for	Low volume, prototyping	High volume production
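The volume trade-off in the table can be made concrete with a break-even calculation. All cost figures below are invented placeholders chosen only to show the arithmetic; real development and unit costs vary enormously by process and device.

```python
def total_cost(nre, unit_cost, volume):
    """One-time development cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(fpga_nre, fpga_unit, asic_nre, asic_unit):
    """Smallest volume at which the ASIC is cheaper overall (integer costs)."""
    if asic_unit >= fpga_unit:
        raise ValueError("ASIC never catches up unless its unit cost is lower")
    extra_nre = asic_nre - fpga_nre
    saving_per_unit = fpga_unit - asic_unit
    return max(0, extra_nre // saving_per_unit + 1)


# Hypothetical figures: FPGA 50k development, 100 per unit;
# ASIC 2M development (masks, verification), 5 per unit.
volume = break_even_volume(50_000, 100, 2_000_000, 5)
```

With these placeholder numbers the ASIC only pays off beyond roughly twenty thousand units, which matches the "low volume" versus "high volume" split in the last table row.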

Applications in CS

Caching: SRAM for L1/L2/L3 caches. Design tradeoffs between size, speed, and associativity.
Main memory: DRAM for system RAM. DDR generations (DDR3/4/5) increase bandwidth.
Storage: Flash-based SSDs. FTL (Flash Translation Layer) manages wear leveling and garbage collection.
FPGA acceleration: AWS F1 instances, Microsoft Catapult (Bing search), network processing (SmartNICs).
FPGA for ML: Inference acceleration with quantized models. Xilinx Vitis AI, Intel OpenVINO.
Prototyping: Verify ASIC designs on FPGA before tape-out.
Embedded systems: CPLDs/FPGAs for custom I/O interfaces, protocol bridges.
Cryptocurrency mining: ASICs for SHA-256 (Bitcoin), FPGAs for other algorithms.