Timing and Synchronization
Correct timing is the difference between a working digital system and a failing one. Most subtle hardware bugs are timing-related.
Clock Distribution
The clock signal must reach all flip-flops simultaneously and with clean edges.
Clock Tree
A balanced tree of buffers distributing the clock from a single source to all flip-flops.
         CLK source
              │
    ┌─────────┼─────────┐
    │         │         │
  ┌─┴─┐     ┌─┴─┐     ┌─┴─┐
  │buf│     │buf│     │buf│
  └─┬─┘     └─┬─┘     └─┬─┘
 ┌──┼──┐     ┌┴─┐      ┌┴─┐
 FF FF FF    FF FF     FF FF
H-tree: A fractal tree structure providing equal-length paths to all endpoints. Used for symmetric clock distribution.
Clock tree synthesis (CTS): EDA tools automatically build and balance the clock tree during physical design.
Clock Skew
Skew is the difference in clock arrival times at different flip-flops.
Skew = t_CLK(FF₂) - t_CLK(FF₁)
where FF₁ launches the data and FF₂ captures it.
Positive skew (receiving FF gets clock later): Helps setup time, hurts hold time.
Negative skew (receiving FF gets clock earlier): Hurts setup time, helps hold time.
Timing Constraints with Skew
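Setup: t_CQ + t_comb + t_su ≤ T_CLK + t_skew
Hold: t_CQ + t_comb_min ≥ t_hd + t_skew
Here t_comb is the longest combinational delay between the two flip-flops and t_comb_min is the shortest.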
Zero-skew design: The ideal. Clock tree synthesis tries to minimize skew.
Useful skew: Deliberately adding skew to critical paths can improve setup timing, at the cost of making hold timing harder to meet on those paths.
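As a rough illustration of the setup and hold constraints above, the following Python sketch computes both slacks for a single register-to-register path. The function name, parameter names, and all numbers are illustrative assumptions, not values from a real process or tool.

```python
def timing_slack(t_cq, t_comb_max, t_comb_min, t_su, t_hd, t_clk, t_skew=0.0):
    """Return (setup_slack, hold_slack) in the same time unit as the inputs.

    Positive slack means the constraint is met. t_skew is the capture clock
    arrival minus the launch clock arrival (positive = capture clock later).
    """
    # Setup: t_cq + t_comb_max + t_su <= t_clk + t_skew
    setup_slack = (t_clk + t_skew) - (t_cq + t_comb_max + t_su)
    # Hold: t_cq + t_comb_min >= t_hd + t_skew
    hold_slack = (t_cq + t_comb_min) - (t_hd + t_skew)
    return setup_slack, hold_slack


# Hypothetical path, all values in nanoseconds.
path = dict(t_cq=0.10, t_comb_max=0.80, t_comb_min=0.05,
            t_su=0.05, t_hd=0.03, t_clk=1.00)

for skew in (0.0, 0.10, -0.10):
    setup, hold = timing_slack(t_skew=skew, **path)
    print(f"skew={skew:+.2f} ns  setup slack={setup:+.2f} ns  hold slack={hold:+.2f} ns")
```

With these assumed numbers, positive skew increases setup slack and eats into hold slack, while negative skew does the opposite.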
Clock Jitter
Jitter is the variation in clock edge timing from cycle to cycle.
Actual period = T_nominal ± jitter
Sources: PLL noise, power supply noise, thermal noise, electromagnetic interference.
Jitter reduces the effective clock period for setup analysis:
t_effective = T - t_jitter
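To make the effect concrete, here is a small Python sketch that estimates the highest clock frequency a path can run at once jitter is subtracted from the period. It assumes zero skew, and the function name and delay values are made up for illustration.

```python
def max_clock_frequency(t_cq, t_comb_max, t_su, t_jitter=0.0):
    """Highest clock frequency in Hz that still meets setup (times in seconds)."""
    # Setup must fit in the shortened period: t_cq + t_comb_max + t_su <= T - t_jitter
    min_period = t_cq + t_comb_max + t_su + t_jitter
    return 1.0 / min_period


# Assumed delays: 100 ps clock-to-Q, 800 ps logic, 50 ps setup.
ideal = max_clock_frequency(100e-12, 800e-12, 50e-12)
with_jitter = max_clock_frequency(100e-12, 800e-12, 50e-12, t_jitter=50e-12)

print(f"no jitter:    {ideal / 1e6:.1f} MHz")
print(f"50 ps jitter: {with_jitter / 1e6:.1f} MHz")
```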
Metastability
When a flip-flop's setup or hold time is violated, it may enter a metastable state — balanced between 0 and 1.
Q: ────╱╲╱╲╱╲──── (oscillates before settling)
Resolution time: The flip-flop eventually settles to 0 or 1, but the time is unbounded (exponentially decaying probability of remaining metastable).
P(metastable after time t) = e^(-t/τ)
where τ is the metastability time constant (technology-dependent, ~20-100 ps for modern CMOS).
Mean Time Between Failures (MTBF):
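MTBF = e^(t_resolution / τ) / (f_CLK × f_data × T₀)
where t_resolution is the time allowed for the flip-flop to resolve, f_CLK is the sampling clock frequency, f_data is the rate of asynchronous data transitions, and T₀ is a technology-dependent metastability window.
Longer resolution time → exponentially higher MTBF.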
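The exponential dependence is easiest to see with numbers. The Python sketch below evaluates the MTBF formula for a few resolution times. The function name and every parameter value (τ, T₀, frequencies) are assumptions chosen for illustration, not datasheet figures.

```python
import math


def synchronizer_mtbf(t_resolution, tau, f_clk, f_data, t0):
    """MTBF in seconds: e^(t_resolution / tau) / (f_clk * f_data * t0)."""
    return math.exp(t_resolution / tau) / (f_clk * f_data * t0)


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumed, illustrative parameters.
tau = 50e-12    # metastability time constant: 50 ps
t0 = 100e-12    # metastability window: 100 ps
f_clk = 200e6   # 200 MHz sampling clock
f_data = 10e6   # 10 MHz asynchronous data transition rate

for t_res in (1e-9, 2e-9, 4e-9):
    years = synchronizer_mtbf(t_res, tau, f_clk, f_data, t0) / SECONDS_PER_YEAR
    print(f"t_resolution = {t_res * 1e9:.0f} ns -> MTBF ~ {years:.3g} years")
```

Each additional nanosecond of resolution time multiplies the MTBF by e^(1 ns / τ), which is why giving the first flip-flop a full clock period to settle is so effective.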
Synchronizers
The Synchronization Problem
When a signal crosses from one clock domain to another (or from an asynchronous source), setup/hold violations are inevitable — you can't guarantee timing alignment.
Never feed an asynchronous signal directly into synchronous logic. Pass it through a synchronizer first.
Two-Flop Synchronizer
The simplest and most common solution:
Async input → [FF₁] → [FF₂] → Synchronized output
                ↑        ↑
               CLK      CLK
FF₁ may go metastable, but it has a full clock period to resolve before FF₂ samples it.
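MTBF: With two stages at moderate clock frequencies, MTBF is typically astronomically long, far beyond any product lifetime. Very fast clocks leave less resolution time, so high-speed designs sometimes add a third stage.
Latency: About 2 destination clock cycles. The input must stay stable long enough to be sampled by at least one destination clock edge.
Limitation: Only works for single-bit signals. For multi-bit data, use other techniques.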
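The two-cycle latency can be seen in a minimal behavioral model. The Python sketch below is an illustration only: the class and method names are hypothetical, and it models the register pipeline, not the analog metastability that the first flip-flop absorbs.

```python
class TwoFlopSynchronizer:
    """Behavioral model: two back-to-back registers on the destination clock."""

    def __init__(self, reset_value=0):
        self.ff1 = reset_value
        self.ff2 = reset_value

    def clock_edge(self, async_in):
        """Apply one rising clock edge and return the synchronized output."""
        # Both flops update together: FF2 takes FF1's old value,
        # FF1 samples the asynchronous input.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
inputs = [0, 1, 1, 1, 0, 0, 0]
outputs = [sync.clock_edge(bit) for bit in inputs]
print("in: ", inputs)
print("out:", outputs)
```

In this model the output follows each input change on the second clock edge that sees it.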
Multi-Bit Synchronization
Problem: Two-flop synchronizers on individual bits of a multi-bit bus can produce inconsistent intermediate values (one bit resolves to the new value, another to the old).
Solutions:
- Gray code: Encode multi-bit values so only one bit changes per transition, then synchronize each bit independently. This only works for values that step by one, such as counters and FIFO pointers.
- Handshake: Use request/acknowledge signals (synchronized) to transfer data.
- Asynchronous FIFO: Dual-clock FIFO with Gray-coded pointers synchronized between domains.
- Pulse synchronizer: For single-pulse events crossing domains.
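For the Gray code option, the conversion is short enough to sketch in Python. The function names are illustrative. The loop checks the property that matters for CDC: consecutive counter values, including the wrap-around, differ in exactly one bit.

```python
def binary_to_gray(n: int) -> int:
    """Convert a non-negative binary value to reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


WIDTH = 4
codes = [binary_to_gray(i) for i in range(2 ** WIDTH)]

for i, code in enumerate(codes):
    following = codes[(i + 1) % len(codes)]
    # Exactly one bit changes between consecutive values, wrap-around included.
    assert bin(code ^ following).count("1") == 1
    assert gray_to_binary(code) == i

print([format(c, f"0{WIDTH}b") for c in codes])
```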
Clock Domain Crossing (CDC)
Asynchronous FIFO
The standard solution for transferring data between clock domains:
 Write domain                      Read domain
    (CLK_W)                          (CLK_R)
       │                                │
       ▼                                ▼
  [Write ptr] ────── sync ─────→ [Write ptr copy]
       │                                │
       ▼                                ▼
     [RAM] ◄────────────────────────► [RAM]
       │                                │
       ▼                                ▼
[Read ptr copy] ←────── sync ────── [Read ptr]
       │                                │
       ▼                                ▼
     Full?                            Empty?
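Key: Pointers are Gray-coded, so only one bit changes per increment and synchronizing the bits individually is safe.
Full condition: Write pointer + 1 = read pointer, modulo the FIFO depth (evaluated in the write domain using the synchronized read pointer). This scheme leaves one slot unused to distinguish full from empty.
Empty condition: Read pointer = write pointer (evaluated in the read domain using the synchronized write pointer).
Because the synchronized pointers lag the real ones, both flags err on the safe side: the FIFO may report full or empty slightly longer than necessary, but it never overflows or underflows.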
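The pointer logic can be sketched as a small single-threaded Python model. This is an assumption-laden illustration, not a hardware description: the class and method names are hypothetical, `sync_pointers()` stands in for the two-flop synchronizers, and the model uses the "write pointer + 1 = read pointer" full test, which keeps one slot unused.

```python
def binary_to_gray(n):
    return n ^ (n >> 1)


def gray_to_binary(g):
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class AsyncFifoModel:
    """Pointer logic of a dual-clock FIFO; one slot is always left unused."""

    def __init__(self, depth=8):
        assert depth & (depth - 1) == 0, "depth must be a power of two"
        self.depth = depth
        self.ram = [None] * depth
        self.wr_ptr = 0          # owned by the write domain
        self.rd_ptr = 0          # owned by the read domain
        self.rd_gray_in_wr = 0   # Gray-coded read pointer as seen by the writer
        self.wr_gray_in_rd = 0   # Gray-coded write pointer as seen by the reader

    def sync_pointers(self):
        """Stand-in for the synchronizers: copy Gray-coded pointers across."""
        self.rd_gray_in_wr = binary_to_gray(self.rd_ptr)
        self.wr_gray_in_rd = binary_to_gray(self.wr_ptr)

    def full(self):
        seen_rd = gray_to_binary(self.rd_gray_in_wr)
        return (self.wr_ptr + 1) % self.depth == seen_rd

    def empty(self):
        return self.rd_ptr == gray_to_binary(self.wr_gray_in_rd)

    def write(self, item):
        if self.full():
            return False
        self.ram[self.wr_ptr] = item
        self.wr_ptr = (self.wr_ptr + 1) % self.depth
        return True

    def read(self):
        if self.empty():
            return None
        item = self.ram[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        return item


fifo = AsyncFifoModel(depth=4)
accepted = [fifo.write(x) for x in "abcd"]  # fourth write is refused
print("accepted:", accepted)
print("reader sees empty before sync:", fifo.empty())
fifo.sync_pointers()
print("drained:", [fifo.read() for _ in range(4)])
```

Until `sync_pointers()` runs, the reader still sees the FIFO as empty even though data has been written. That stale view is pessimistic but safe.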
Reset Synchronization
An asynchronous reset must be synchronized to the clock domain:
Asynchronous assert, synchronous deassert: Reset asserts immediately (asynchronous) but is released through a synchronizer to avoid metastability on deassert.
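async_reset → [FF₁] → [FF₂] → sync_reset
                ↑        ↑
               CLK      CLK
(async_reset drives the asynchronous reset pins of FF₁ and FF₂. FF₁'s data input is tied to the inactive reset level, so the release propagates through both flip-flops on clock edges.)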
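The assert/deassert asymmetry can be illustrated with a short behavioral model in Python. The class is hypothetical and assumes an active-high reset. It shows that assertion takes effect immediately while release needs two clock edges.

```python
class ResetSynchronizer:
    """Active-high reset: asserts asynchronously, deasserts on clock edges."""

    def __init__(self):
        self.async_reset = True
        self.ff1 = 1  # both flops start in the reset state
        self.ff2 = 1

    def set_async_reset(self, asserted):
        self.async_reset = asserted
        if asserted:
            # Asynchronous assert: both flops are forced without waiting for a clock.
            self.ff1 = 1
            self.ff2 = 1

    def clock_edge(self):
        if not self.async_reset:
            # FF1's data input is tied to 0 (reset inactive); the 0 shifts through.
            self.ff2, self.ff1 = self.ff1, 0

    @property
    def sync_reset(self):
        return self.ff2


rs = ResetSynchronizer()
rs.set_async_reset(False)
trace = [rs.sync_reset]
for _ in range(3):
    rs.clock_edge()
    trace.append(rs.sync_reset)
print("sync_reset after release:", trace)
```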
Asynchronous Design
Motivation
Synchronous design requires distributing a global clock, which becomes increasingly difficult at high frequencies and large chip sizes. Asynchronous design eliminates the clock.
Handshaking Protocols
Four-phase (return-to-zero):
1. Sender asserts request + data
2. Receiver acknowledges
3. Sender deasserts request
4. Receiver deasserts acknowledge
Two-phase (transition signaling):
1. Sender toggles request (data valid)
2. Receiver toggles acknowledge (data consumed)
Two-phase is faster (no return-to-zero) but harder to implement.
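The two protocols can be compared by listing the signal states after each step. The Python sketch below is a simplified, sequential illustration with hypothetical generator names. Real implementations run the sender and receiver concurrently.

```python
def four_phase_transfer(data):
    """Yield (step, req, ack, bus) for one four-phase (return-to-zero) transfer."""
    req, ack = 0, 0
    bus = data
    req = 1
    yield ("sender asserts request + data", req, ack, bus)
    ack = 1
    yield ("receiver acknowledges", req, ack, bus)
    req = 0
    yield ("sender deasserts request", req, ack, bus)
    ack = 0
    yield ("receiver deasserts acknowledge", req, ack, bus)


def two_phase_transfer(data, req=0, ack=0):
    """Yield (step, req, ack, bus) for one two-phase (transition) transfer."""
    req ^= 1  # any transition on req means "data valid"
    yield ("sender toggles request", req, ack, data)
    ack ^= 1  # any transition on ack means "data consumed"
    yield ("receiver toggles acknowledge", req, ack, data)


for step in four_phase_transfer(0xA5):
    print("4-phase:", step)
for step in two_phase_transfer(0xA5):
    print("2-phase:", step)
```

The four-phase transfer needs four signal transitions per data item and always returns to req = ack = 0. The two-phase transfer needs two, but the idle state alternates between 0/0 and 1/1, so the logic must react to edges, not levels.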
Delay-Insensitive Design
Circuits that work correctly regardless of gate and wire delays. Very conservative but highly robust.
Dual-rail encoding: Each bit uses two wires (true, false). Valid data = exactly one wire high. Spacer = both low. Self-timed: data arrival is implicit in the encoding.
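A minimal Python sketch of dual-rail encoding follows. The helper names and the (true_rail, false_rail) tuple layout are assumptions for illustration. `word_is_valid` plays the role of completion detection: a word counts as arrived only when every bit has left the spacer state.

```python
SPACER = (0, 0)  # both rails low: no data present


def encode_bit(bit):
    """Return (true_rail, false_rail) for a data bit."""
    return (1, 0) if bit else (0, 1)


def decode_pair(pair):
    """Return 1 or 0 for valid data, None for a spacer; (1, 1) is illegal."""
    if pair == (1, 0):
        return 1
    if pair == (0, 1):
        return 0
    if pair == SPACER:
        return None
    raise ValueError("both rails high is not a legal dual-rail code")


def word_is_valid(pairs):
    """Completion detection: every bit position carries valid data."""
    return all(decode_pair(p) is not None for p in pairs)


complete = [encode_bit(b) for b in (1, 0, 1)]
partial = [encode_bit(1), SPACER, encode_bit(1)]  # middle bit not yet arrived

print(word_is_valid(complete), [decode_pair(p) for p in complete])
print(word_is_valid(partial))
```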
Advantages/Disadvantages
Advantages: No clock distribution, no clock skew, lower power (activity-based), no worst-case timing constraints, natural for mixed-timing systems.
Disadvantages: More complex design and verification, larger area (dual-rail), harder to achieve high performance, limited EDA tool support.
Practical Use
In practice, many large designs are globally asynchronous, locally synchronous (GALS): synchronous islands connected by asynchronous interfaces (FIFOs, handshakes).
PLL and Clock Management
Phase-Locked Loop (PLL)
Generates a clock signal locked to a reference:
Reference → Phase Detector → Loop Filter → VCO → Output Clock
                  ↑                            │
                  └───── Frequency Divider ←───┘
Functions:
- Frequency multiplication: Output = N × reference
- Frequency division: Output = reference / M
- Jitter filtering: Clean up a noisy clock
- Phase alignment: Lock output phase to reference
Clock Generation Example
External crystal: 25 MHz. PLL generates:
- CPU clock: 3.0 GHz (120× multiplication)
- Memory clock: 800 MHz
- Bus clock: 100 MHz
All phase-locked to the same reference (coherent).
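The ratios in this example can be checked with a few lines of Python. This sketch only verifies the arithmetic. How a real chip splits each ratio between feedback and output dividers, or across several PLLs, is not specified here, and the function name is hypothetical.

```python
from fractions import Fraction

REFERENCE_HZ = 25_000_000  # 25 MHz crystal


def pll_ratio(target_hz, reference_hz=REFERENCE_HZ):
    """Return the exact output/reference ratio N/M as a reduced fraction."""
    return Fraction(target_hz, reference_hz)


targets = [("CPU", 3_000_000_000), ("Memory", 800_000_000), ("Bus", 100_000_000)]
for name, hz in targets:
    ratio = pll_ratio(hz)
    print(f"{name}: {hz / 1e6:.0f} MHz = reference x {ratio.numerator}/{ratio.denominator}")
```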
Applications in CS
- Multi-core processors: Each core may have its own clock domain. CDC techniques synchronize shared caches and interconnects.
- I/O interfaces: External signals (USB, PCIe, Ethernet) are asynchronous to the internal clock. Synchronizers and elastic buffers handle CDC.
- DRAM controllers: DRAM has its own clock. The controller bridges the CPU and memory clock domains.
- Network-on-Chip: Multiple clock domains in SoCs. Asynchronous FIFOs at domain boundaries.
- FPGA design: Multiple clock domains are common. CDC verification tools (Questa CDC, SpyGlass) catch synchronization bugs.
- Low-power design: Clock gating stops the clock to inactive modules. Power domains running at different frequencies need CDC handling at their boundaries.
- High-reliability systems: Triple modular redundancy (TMR) with voting. Radiation-hardened designs with careful timing margins.