Timing and Synchronization

Correct timing is the difference between a working digital system and a failing one. Most subtle hardware bugs are timing-related.

Clock Distribution

The clock signal should reach all flip-flops at as nearly the same time as possible, and with clean edges.

Clock Tree

A balanced tree of buffers distributing the clock from a single source to all flip-flops.

        CLK source
            │
      ┌─────┼─────┐
      │     │     │
    ┌─┴─┐ ┌─┴─┐ ┌─┴─┐
    │buf│ │buf│ │buf│
    └─┬─┘ └─┬─┘ └─┬─┘
   ┌──┼──┐ ┌┼─┐  ┌┼─┐
   FF FF FF FF FF FF FF

H-tree: A fractal tree structure providing equal-length paths to all endpoints. Used for symmetric clock distribution.

Clock tree synthesis (CTS): EDA tools automatically build and balance the clock tree during physical design.

Clock Skew

Skew is the difference in clock arrival times at different flip-flops.

Skew = t_CLK(FF₂) - t_CLK(FF₁)

where FF₁ launches the data and FF₂ receives it.

Positive skew (receiving FF gets clock later): Helps setup time, hurts hold time.

Negative skew (receiving FF gets clock earlier): Hurts setup time, helps hold time.

Timing Constraints with Skew

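Setup: t_CQ + t_comb_max + t_su ≤ T_CLK + t_skew

Hold: t_CQ + t_comb_min ≥ t_hd + t_skew

Here t_CQ is the clock-to-Q delay of the launching flip-flop, t_comb_max and t_comb_min are the longest and shortest combinational delays between the two flip-flops, t_su and t_hd are the setup and hold times of the receiving flip-flop, and T_CLK is the clock period.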

Zero-skew design: The ideal. Clock tree synthesis tries to minimize skew.

Useful skew: Deliberately adding positive skew to a critical path relaxes its setup constraint, but it tightens the hold constraint on the same path.
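
The setup and hold constraints above can be checked numerically. The following is a minimal Python sketch, assuming all delays are integers in picoseconds; the function names and the example path are hypothetical and only illustrate the trade-off that useful skew makes.

```python
def setup_slack(t_cq, t_comb_max, t_su, t_clk, t_skew):
    """Setup slack: (t_clk + t_skew) - (t_cq + t_comb_max + t_su).

    A non-negative result means the setup constraint is met.
    """
    return (t_clk + t_skew) - (t_cq + t_comb_max + t_su)


def hold_slack(t_cq, t_comb_min, t_hd, t_skew):
    """Hold slack: (t_cq + t_comb_min) - (t_hd + t_skew).

    A non-negative result means the hold constraint is met.
    """
    return (t_cq + t_comb_min) - (t_hd + t_skew)


# Hypothetical path, all values in picoseconds (assumed, not measured).
T_CQ, T_SU, T_HD = 50, 40, 30
T_COMB_MAX, T_COMB_MIN = 950, 60
T_CLK = 1000

for skew in (0, 100):
    s = setup_slack(T_CQ, T_COMB_MAX, T_SU, T_CLK, skew)
    h = hold_slack(T_CQ, T_COMB_MIN, T_HD, skew)
    print(f"skew={skew} ps: setup slack={s} ps, hold slack={h} ps")
```

With these assumed numbers, 100 ps of positive skew turns a setup violation into positive setup slack while pushing the hold slack negative.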

Clock Jitter

Jitter is the variation in clock edge timing from cycle to cycle.

Actual period = T_nominal ± jitter

Sources: PLL noise, power supply noise, thermal noise, electromagnetic interference.

Jitter reduces the effective clock period for setup analysis:

t_effective = T - t_jitter
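
Combining the effective period with the setup constraint gives a lower bound on the clock period. The sketch below is one way to write that arithmetic in Python, assuming delays in picoseconds; `min_clock_period` and the example values are illustrative only.

```python
def min_clock_period(t_cq, t_comb_max, t_su, t_skew=0, t_jitter=0):
    """Smallest T that satisfies t_cq + t_comb_max + t_su <= (T - t_jitter) + t_skew."""
    return t_cq + t_comb_max + t_su - t_skew + t_jitter


# Hypothetical path delays in picoseconds.
ideal = min_clock_period(50, 850, 40)                      # 940 ps
with_jitter = min_clock_period(50, 850, 40, t_jitter=60)   # 1000 ps

# A period in picoseconds converts to a frequency in MHz as 1e6 / period.
print(f"f_max without jitter: {1e6 / ideal:.0f} MHz")
print(f"f_max with 60 ps jitter: {1e6 / with_jitter:.0f} MHz")
```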

Metastability

When a flip-flop's setup or hold time is violated, it may enter a metastable state — balanced between 0 and 1.

Q: ────╱╲╱╲╱╲──── (oscillates before settling)

Resolution time: The flip-flop eventually settles to 0 or 1, but the time is unbounded (exponentially decaying probability of remaining metastable).

P(metastable after time t) = e^(-t/τ)

where τ is the metastability time constant (technology-dependent, ~20-100 ps for modern CMOS).

Mean Time Between Failures (MTBF):

MTBF = e^(t_resolution / τ) / (f_CLK × f_data × T₀)

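Here t_resolution is the time available for the flip-flop to resolve, f_CLK is the sampling clock frequency, f_data is the rate of asynchronous data transitions, and T₀ is a technology-dependent constant describing the metastability window.

Longer resolution time → exponentially higher MTBF.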
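
To get a feel for how steeply MTBF grows with resolution time, the formula can be evaluated directly. This is a rough Python sketch; the parameter values are assumptions chosen for illustration, not datasheet figures, and the function works in log10 to avoid floating-point overflow when t_resolution / τ is large.

```python
import math


def log10_mtbf_seconds(t_resolution, tau, f_clk, f_data, t0):
    """Return log10 of MTBF in seconds.

    MTBF = e^(t_resolution / tau) / (f_clk * f_data * t0), evaluated in the
    log domain so that large exponents do not overflow a float.
    """
    ln_mtbf = t_resolution / tau - math.log(f_clk * f_data * t0)
    return ln_mtbf / math.log(10)


# Assumed, illustrative parameters.
TAU = 50e-12     # metastability time constant, seconds
T0 = 100e-12     # metastability window constant, seconds
F_CLK = 500e6    # sampling clock frequency, Hz
F_DATA = 10e6    # asynchronous data transition rate, Hz

for t_res in (0.5e-9, 1.0e-9, 2.0e-9):
    exponent = log10_mtbf_seconds(t_res, TAU, F_CLK, F_DATA, T0)
    print(f"t_resolution = {t_res * 1e9:.1f} ns -> MTBF is about 10^{exponent:.1f} s")
```

Each additional τ of resolution time multiplies MTBF by e, so every extra 0.5 ns in this example (10 τ) adds roughly 4.3 to the exponent.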

Synchronizers

The Synchronization Problem

When a signal crosses from one clock domain to another (or from an asynchronous source), setup/hold violations are inevitable — you can't guarantee timing alignment.

Never feed an asynchronous signal directly into synchronous logic. Pass it through a synchronizer first.

Two-Flop Synchronizer

The simplest and most common solution:

Async input → [FF₁] → [FF₂] → Synchronized output
                ↑         ↑
               CLK       CLK

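FF₁ may go metastable, but it has nearly a full clock period (the period minus FF₂'s setup time) to resolve before FF₂ samples it.

MTBF: Because MTBF grows exponentially with resolution time, two stages are usually enough at moderate clock frequencies to push MTBF far beyond any product lifetime. Very fast clocks may call for a third stage.

Latency: 1 to 2 destination clock cycles, depending on when the input changes relative to the clock edge.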

Limitation: Only works for single-bit signals. For multi-bit data, use other techniques.
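
The structure and latency of the two-flop synchronizer can be seen in a small cycle-level model. This Python sketch is purely behavioral: the class name is hypothetical, and the model cannot represent metastability itself, only the two-stage shift that gives FF₁ time to resolve.

```python
class TwoFlopSynchronizer:
    """Cycle-level model of two flip-flops in series on the destination clock."""

    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0

    def clock_edge(self, async_in):
        """Apply one rising clock edge and return the synchronized output."""
        # Both flops update on the same edge, so FF2 captures FF1's old value.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
outputs = [sync.clock_edge(x) for x in (0, 1, 1, 1, 0, 0, 0)]
print(outputs)  # [0, 0, 1, 1, 1, 0, 0]
```

The input change sampled on one edge appears at the output on the following edge, which is the two-edge delay described above.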

Multi-Bit Synchronization

Problem: Two-flop synchronizers on individual bits of a multi-bit bus can produce inconsistent intermediate values (one bit resolves to the new value, another to the old).

Solutions:

  • Gray code: Encode multi-bit values so only one bit changes per transition, then synchronize each bit independently. This works for values that step by one at a time, such as counters and FIFO pointers.
  • Handshake: Use request/acknowledge signals (synchronized) to transfer data.
  • Asynchronous FIFO: Dual-clock FIFO with Gray-coded pointers synchronized between domains.
  • Pulse synchronizer: For single-pulse events crossing domains.
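
The Gray code option relies on consecutive values differing in exactly one bit. A short Python sketch of the standard binary-reflected Gray code conversions follows; the function names are illustrative.

```python
def binary_to_gray(n):
    """Convert a non-negative integer to its binary-reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# 3-bit example: each row differs from the previous one in exactly one bit.
for value in range(8):
    print(f"{value}: binary {value:03b} -> gray {binary_to_gray(value):03b}")
```

Because only one bit changes per step, a synchronizer that catches the transition mid-flight yields either the old value or the new one, never an unrelated value.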

Clock Domain Crossing (CDC)

Asynchronous FIFO

The standard solution for transferring data between clock domains:

Write domain          Read domain
  (CLK_W)               (CLK_R)
    │                      │
    ▼                      ▼
[Write ptr] ─sync→ [Write ptr copy]
    │                      │
    ▼                      ▼
  [RAM]  ◄────────────►  [RAM]
    │                      │
    ▼                      ▼
[Read ptr copy] ←sync─ [Read ptr]
    │                      │
    ▼                      ▼
  Full?                  Empty?

Key: Pointers are Gray-coded so synchronization of individual bits is safe.

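Full condition: Write pointer + 1 = read pointer, with wraparound (in write domain, using synchronized read pointer). This simple form leaves one slot unused so that full and empty stay distinguishable.

Empty condition: Read pointer = write pointer (in read domain, using synchronized write pointer).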
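
The full and empty conditions can be sketched as two small predicates. This Python example assumes a hypothetical depth of 8 and uses plain integer pointers for readability; in hardware the pointers would cross domains in Gray code and be compared after synchronization. The function names are illustrative.

```python
DEPTH = 8  # hypothetical FIFO depth


def is_full(write_ptr, read_ptr_synced, depth=DEPTH):
    """Evaluated in the write domain against the synchronized read pointer."""
    return (write_ptr + 1) % depth == read_ptr_synced


def is_empty(read_ptr, write_ptr_synced):
    """Evaluated in the read domain against the synchronized write pointer."""
    return read_ptr == write_ptr_synced


# A synchronized copy lags the real pointer by a few cycles.
# Writer is at 7; the reader has already advanced to 3, but the writer still sees 0.
print(is_full(7, read_ptr_synced=0))   # True: writer stalls although space exists
print(is_full(7, read_ptr_synced=3))   # False once the copy catches up

# Reader is at 5; the writer has advanced to 6, but the reader still sees 5.
print(is_empty(5, write_ptr_synced=5))  # True: reader waits although data exists
print(is_empty(5, write_ptr_synced=6))  # False once the copy catches up
```

The stale copies make both flags pessimistic: the FIFO may briefly report full or empty when it is not, but it never overwrites unread data or reads an unwritten slot.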

Reset Synchronization

An asynchronous reset must be synchronized to the clock domain:

Asynchronous assert, synchronous deassert: Reset asserts immediately (asynchronous) but is released through a synchronizer to avoid metastability on deassert.

inactive (constant) → [FF₁] → [FF₂] → sync_reset
                        ↑       ↑
                       CLK     CLK
(async_reset drives the asynchronous reset inputs of FF₁ and FF₂; the data input of FF₁ is tied to the inactive reset level)
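
A behavioral model makes the assert and deassert behavior concrete. This Python sketch assumes an active-low reset output (0 means "in reset") and a hypothetical class name; it models only the sequencing, not the electrical behavior.

```python
class ResetSynchronizer:
    """Asynchronous assert, synchronous deassert, with an active-low output."""

    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0

    def assert_async_reset(self):
        # Asynchronous assert: both flops clear immediately, no clock edge needed.
        self.ff1 = 0
        self.ff2 = 0

    def clock_edge(self):
        # With the async reset released, a constant 1 shifts through the two flops.
        self.ff2, self.ff1 = self.ff1, 1
        return self.ff2

    @property
    def rst_n(self):
        """Synchronized reset output: 0 holds the domain in reset."""
        return self.ff2


rs = ResetSynchronizer()
rs.assert_async_reset()
print(rs.rst_n)         # 0: reset takes effect at once
print(rs.clock_edge())  # 0: still in reset after the first edge
print(rs.clock_edge())  # 1: released on the second edge, aligned to the clock
```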

Asynchronous Design

Motivation

Synchronous design requires a global clock, which is increasingly difficult to distribute at high frequencies and across large chips. Asynchronous design eliminates the clock.

Handshaking Protocols

Four-phase (return-to-zero):

1. Sender asserts request + data
2. Receiver acknowledges
3. Sender deasserts request
4. Receiver deasserts acknowledge

Two-phase (transition signaling):

1. Sender toggles request (data valid)
2. Receiver toggles acknowledge (data consumed)

Two-phase is faster (no return-to-zero) but harder to implement.
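
The difference in signaling cost shows up when the two protocols are written out as sequences of (request, acknowledge) states. The following Python sketch is a simplified trace generator, with hypothetical function names, that ignores the data wires and all delays.

```python
def four_phase(transfers):
    """Return the (req, ack) state after each of the four steps, per transfer."""
    states = []
    for _ in range(transfers):
        states += [
            (1, 0),  # 1. sender asserts request (data valid)
            (1, 1),  # 2. receiver acknowledges
            (0, 1),  # 3. sender deasserts request
            (0, 0),  # 4. receiver deasserts acknowledge
        ]
    return states


def two_phase(transfers):
    """Return the (req, ack) state after each toggle, per transfer."""
    req = ack = 0
    states = []
    for _ in range(transfers):
        req ^= 1  # sender toggles request: data valid
        states.append((req, ack))
        ack ^= 1  # receiver toggles acknowledge: data consumed
        states.append((req, ack))
    return states


print(len(four_phase(2)))  # 8 signal transitions for two transfers
print(len(two_phase(2)))   # 4 signal transitions for two transfers
```

Two-phase signaling needs half as many transitions per transfer, but the receiver must react to both rising and falling edges, which is part of what makes it harder to implement.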

Delay-Insensitive Design

Circuits that work correctly regardless of gate and wire delays. Very conservative but highly robust.

Dual-rail encoding: Each bit uses two wires (true, false). Valid data = exactly one wire high. Spacer = both low. Self-timed: data arrival is implicit in the encoding.
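
Dual-rail encoding can be illustrated with a few helper functions. This Python sketch assumes the pair order (true rail, false rail) from the description above; the function names are hypothetical.

```python
SPACER = (0, 0)  # both rails low: no data present


def encode_bit(bit):
    """Return (true_rail, false_rail) for one data bit."""
    return (1, 0) if bit else (0, 1)


def decode_pair(pair):
    """Return 0 or 1 for valid data, None for a spacer; reject the illegal state."""
    if pair == (1, 0):
        return 1
    if pair == (0, 1):
        return 0
    if pair == SPACER:
        return None
    raise ValueError("both rails high is not a legal dual-rail codeword")


def word_is_valid(pairs):
    """Completion detection: every bit position has exactly one rail high."""
    return all(t ^ f for t, f in pairs)


word = [encode_bit(b) for b in (1, 0, 1)]
print(word)                            # [(1, 0), (0, 1), (1, 0)]
print(word_is_valid(word))             # True: all bits have arrived
print(word_is_valid([(1, 0), SPACER])) # False: one bit is still a spacer
```

The receiver needs no clock or separate valid signal: it waits until `word_is_valid` holds, which is how data arrival is implicit in the encoding.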

Advantages/Disadvantages

Advantages: No clock distribution, no clock skew, lower power (activity-based), no worst-case timing constraints, natural for mixed-timing systems.

Disadvantages: More complex design and verification, larger area (dual-rail), harder to achieve high performance, limited EDA tool support.

Practical Use

Fully asynchronous chips are rare. Many large designs are instead globally asynchronous, locally synchronous (GALS): Synchronous islands connected by asynchronous interfaces (FIFOs, handshakes).

PLL and Clock Management

Phase-Locked Loop (PLL)

Generates a clock signal locked to a reference:

Reference → Phase Detector → Loop Filter → VCO → Output Clock
                ↑                                    │
                └────────── Frequency Divider ←──────┘

Functions:

  • Frequency multiplication: Output = N × reference
  • Frequency division: Output = reference / M
  • Jitter filtering: Clean up a noisy clock
  • Phase alignment: Lock output phase to reference

Clock Generation Example

External crystal: 25 MHz. PLL generates:

  • CPU clock: 3.0 GHz (120× multiplication)
  • Memory clock: 800 MHz (32× multiplication)
  • Bus clock: 100 MHz (4× multiplication)

All phase-locked to the same reference (coherent).
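
The multiplication and division factors for a given output can be worked out as a reduced fraction of the reference. The Python sketch below only does that arithmetic; `pll_ratio` is a hypothetical helper, and a real PLL adds constraints (VCO frequency range, allowed divider values) that are not modeled here.

```python
from fractions import Fraction


def pll_ratio(f_out_hz, f_ref_hz):
    """Return (N, M) in lowest terms such that f_out = f_ref * N / M."""
    ratio = Fraction(f_out_hz, f_ref_hz)
    return ratio.numerator, ratio.denominator


F_REF = 25_000_000  # 25 MHz crystal from the example above

targets = [
    ("CPU", 3_000_000_000),
    ("Memory", 800_000_000),
    ("Bus", 100_000_000),
]
for name, f_out in targets:
    n, m = pll_ratio(f_out, F_REF)
    print(f"{name}: multiply by {n}, divide by {m}")
```

For the three clocks in the example the divider is 1 in every case; a target such as 125 MHz from a 50 MHz reference would need N = 5 and M = 2.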

Applications in CS

  • Multi-core processors: Each core may have its own clock domain. CDC techniques synchronize shared caches and interconnects.
  • I/O interfaces: External signals (USB, PCIe, Ethernet) are asynchronous to the internal clock. Synchronizers and elastic buffers handle CDC.
  • DRAM controllers: DRAM has its own clock. The controller bridges the CPU and memory clock domains.
  • Network-on-Chip: Multiple clock domains in SoCs. Asynchronous FIFOs at domain boundaries.
  • FPGA design: Multiple clock domains are common. CDC verification tools (Questa CDC, SpyGlass) catch synchronization bugs.
  • Low-power design: Clock gating stops the clock to inactive modules. Blocks that run at different voltages and frequencies form separate clock domains and need CDC handling at their boundaries.
  • High-reliability systems: Triple modular redundancy (TMR) with voting. Radiation-hardened designs with careful timing margins.