Rasterization
Overview
Rasterization converts continuous geometric primitives (lines, triangles) into discrete fragments (pixels) for display. It is the core operation of real-time rendering pipelines and determines which pixels a primitive covers along with interpolated attribute values.
Line Drawing
Digital Differential Analyzer (DDA)
Simple approach: step along the major axis and compute the other coordinate incrementally.
dx = x1 - x0, dy = y1 - y0
steps = max(|dx|, |dy|)
x_inc = dx / steps, y_inc = dy / steps
x = x0, y = y0
for i in 0..steps:
    plot(round(x), round(y))
    x += x_inc
    y += y_inc
Drawback: floating-point arithmetic per pixel.
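The DDA steps above can be sketched in Python as follows. This is a minimal illustration that assumes integer endpoints; the function name `dda_line` and the choice to return a list rather than call `plot` are assumptions made for the example. It also guards the degenerate case where both endpoints coincide, which would otherwise divide by zero.

```python
import math


def dda_line(x0, y0, x1, y1):
    """Return the pixels DDA visits between two integer endpoints."""
    dx, dy = x1 - x0, y1 - y0
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        # Degenerate line: both endpoints coincide, so avoid dividing by zero.
        return [(x0, y0)]
    x_inc, y_inc = dx / steps, dy / steps
    x, y = float(x0), float(y0)
    pixels = []
    for _ in range(steps + 1):  # steps + 1 samples so both endpoints are plotted
        # floor(v + 0.5) rounds halves up; Python's round() rounds halves to even.
        pixels.append((math.floor(x + 0.5), math.floor(y + 0.5)))
        x += x_inc
        y += y_inc
    return pixels
```

For example, `dda_line(0, 0, 4, 2)` steps one pixel in x and half a pixel in y per iteration.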
Bresenham's Line Algorithm
Integer-only algorithm. For a line with slope 0 <= dy/dx <= 1 and x0 < x1:
dx = x1 - x0, dy = y1 - y0
D = 2*dy - dx // decision parameter
y = y0
for x = x0 to x1:
    plot(x, y)
    if D > 0:
        y += 1
        D -= 2*dx
    D += 2*dy
The decision parameter D tracks accumulated error. When D > 0, step in the minor axis. Generalizes to all octants by swapping axes and adjusting signs.
Midpoint formulation: Equivalent derivation using the implicit line equation F(x, y) = dy*x - dx*y + c. Evaluate F at the midpoint between the two candidate pixels.
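One way to generalize the first-octant loop to all octants, by swapping axes and adjusting signs, is sketched below. The function name `bresenham_line` is an assumption for the example, and other formulations (for instance a single error term updated in both axes) are equally common. Note that when the endpoints are swapped internally, the pixels come back in reverse order.

```python
def bresenham_line(x0, y0, x1, y1):
    """Integer-only line rasterization for any octant."""
    steep = abs(y1 - y0) > abs(x1 - x0)
    if steep:
        # Swap the roles of x and y so the loop always walks the major axis.
        x0, y0, x1, y1 = y0, x0, y1, x1
    if x0 > x1:
        # Always step in +x along the major axis.
        x0, y0, x1, y1 = x1, y1, x0, y0
    dx = x1 - x0
    dy = abs(y1 - y0)
    y_step = 1 if y1 >= y0 else -1
    d = 2 * dy - dx  # decision parameter
    y = y0
    pixels = []
    for x in range(x0, x1 + 1):
        pixels.append((y, x) if steep else (x, y))  # undo the axis swap
        if d > 0:
            y += y_step
            d -= 2 * dx
        d += 2 * dy
    return pixels
```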
Triangle Rasterization
Edge Function Method
For a triangle with vertices v0, v1, v2, the edge function for edge (va, vb) evaluated at point p:
E(p) = (p.x - va.x)(vb.y - va.y) - (p.y - va.y)(vb.x - va.x)
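A point is inside the triangle if all three edge functions have the same sign. Which sign marks the interior depends on the winding order and on the direction of the y-axis: with E as written, interior points are positive for a triangle that appears counter-clockwise in screen space with y pointing down, and negative for a counter-clockwise triangle in a y-up coordinate system. E(p) is the 2D cross product of the vector from va to p with the edge vector vb - va.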
The edge function is an affine function of (x, y), enabling incremental evaluation:
E(x+1, y) = E(x, y) + (vb.y - va.y) // step in x
E(x, y+1) = E(x, y) - (vb.x - va.x) // step in y
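The incremental form can be sketched as a small bounding-box rasterizer. This example assumes integer vertex coordinates and treats each integer (x, y) as a sample point, so all arithmetic stays exact; it accepts either winding by checking that the three values share a sign, and it counts points on an edge as covered. The names `edge` and `rasterize` are assumptions for the example.

```python
def edge(va, vb, p):
    """Edge function for edge (va, vb) evaluated at point p."""
    return (p[0] - va[0]) * (vb[1] - va[1]) - (p[1] - va[1]) * (vb[0] - va[0])


def rasterize(v0, v1, v2):
    """Return integer sample points covered by the triangle (edges included)."""
    edges = [(v0, v1), (v1, v2), (v2, v0)]
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    # Per-edge increments for one step in x and one step in y.
    step_x = [vb[1] - va[1] for va, vb in edges]
    step_y = [-(vb[0] - va[0]) for va, vb in edges]
    # Edge values at the first sample of the bounding box.
    row = [edge(va, vb, (min_x, min_y)) for va, vb in edges]
    covered = []
    for y in range(min_y, max_y + 1):
        e = list(row)
        for x in range(min_x, max_x + 1):
            if all(v >= 0 for v in e) or all(v <= 0 for v in e):
                covered.append((x, y))
            e = [v + s for v, s in zip(e, step_x)]  # step in x
        row = [v + s for v, s in zip(row, step_y)]  # step in y
    return covered
```

Only three additions per sample are needed in the inner loop; the full edge function is evaluated once per edge.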
Barycentric Coordinates
Any point inside a triangle can be expressed as:
p = u*v0 + v*v1 + w*v2, where u + v + w = 1
Computed from edge functions:
u = E12(p) / E12(v0)
v = E20(p) / E20(v1)
w = E01(p) / E01(v2)
Or equivalently: u = area(p,v1,v2) / area(v0,v1,v2) and so on.
Barycentric coordinates are used to interpolate all vertex attributes (color, texture coordinates, normals) across the triangle.
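A short sketch of computing barycentric coordinates from edge functions and using them to interpolate a scalar attribute. The helper names `barycentric` and `interpolate` are assumptions for the example, and the triangle is assumed to be non-degenerate (otherwise the denominators are zero).

```python
def edge(va, vb, p):
    """Edge function for edge (va, vb) evaluated at point p."""
    return (p[0] - va[0]) * (vb[1] - va[1]) - (p[1] - va[1]) * (vb[0] - va[0])


def barycentric(v0, v1, v2, p):
    """Return (u, v, w) such that p = u*v0 + v*v1 + w*v2."""
    u = edge(v1, v2, p) / edge(v1, v2, v0)
    v = edge(v2, v0, p) / edge(v2, v0, v1)
    w = edge(v0, v1, p) / edge(v0, v1, v2)
    return u, v, w


def interpolate(bary, attrs):
    """Linearly blend one per-vertex attribute with barycentric weights."""
    return sum(b * a for b, a in zip(bary, attrs))


v0, v1, v2 = (0, 0), (4, 0), (0, 4)
bary = barycentric(v0, v1, v2, (1, 1))
value = interpolate(bary, (0.0, 8.0, 4.0))
```

Each ratio divides the edge function at p by its value at the opposite vertex, so the result does not depend on the winding order.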
Perspective-Correct Interpolation
Linear interpolation in screen space produces incorrect results for perspective projection. The correct formula:
A(p) = (u*A0/w0 + v*A1/w1 + w_bary*A2/w2) / (u/w0 + v/w1 + w_bary/w2)
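Where w0, w1, w2 are the clip-space w values at each vertex, and w_bary is the third barycentric coordinate (written w above, renamed here to avoid a clash with clip-space w). The GPU performs this by interpolating A/w and 1/w linearly in screen space, then dividing.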
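The formula can be checked numerically with a small sketch. The function name `perspective_correct` is an assumption for the example; `bary` holds the screen-space barycentric coordinates and `ws` the clip-space w of each vertex.

```python
def perspective_correct(bary, attrs, ws):
    """Interpolate A/w and 1/w linearly, then divide."""
    numerator = sum(b * a / w for b, a, w in zip(bary, attrs, ws))
    denominator = sum(b / w for b, w in zip(bary, ws))
    return numerator / denominator


# Screen-space midpoint of an edge whose endpoints have w = 1 and w = 3.
midpoint = perspective_correct((0.5, 0.5, 0.0), (0.0, 1.0, 0.0), (1.0, 3.0, 1.0))
```

At the screen-space midpoint, naive linear interpolation would give 0.5, while the perspective-correct value is 0.25: the midpoint on screen lies closer to the near vertex in 3D. When all three w values are equal, the formula reduces to plain linear interpolation.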
Top-Left Fill Rule
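To avoid double-drawing pixels on edges shared by adjacent triangles, the top-left rule decides ownership of samples that lie exactly on an edge: such a sample is covered only if the edge is a "top" edge (exactly horizontal, with the rest of the triangle below it) or a "left" edge (a non-horizontal edge on the left side of the triangle). Samples strictly inside the triangle are always covered. With integer (fixed-point) edge functions this is typically implemented by biasing the edge function of every edge that is not top or left by one unit, so a value of exactly zero fails the test.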
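A minimal sketch of the fill rule with integer edge functions follows. It assumes screen space with y pointing down, integer coordinates, and triangles wound so that the edge function E(p) = (p.x - va.x)(vb.y - va.y) - (p.y - va.y)(vb.x - va.x) is positive inside. Under those assumptions an edge with dy > 0 is a left edge, and a horizontal edge with dx < 0 is a top edge. The helper names are hypothetical.

```python
def edge(va, vb, p):
    """Edge function for edge (va, vb) evaluated at point p."""
    return (p[0] - va[0]) * (vb[1] - va[1]) - (p[1] - va[1]) * (vb[0] - va[0])


def is_top_left(va, vb):
    """Classify an edge under the conventions stated above (y down, E > 0 inside)."""
    dx, dy = vb[0] - va[0], vb[1] - va[1]
    return dy > 0 or (dy == 0 and dx < 0)


def covered_points(v0, v1, v2):
    """Integer sample points owned by the triangle under the top-left rule."""
    edges = [(v0, v1), (v1, v2), (v2, v0)]
    # Bias of -1 on edges that are neither top nor left: E == 0 then fails.
    biases = [0 if is_top_left(va, vb) else -1 for va, vb in edges]
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    points = []
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            if all(edge(va, vb, (x, y)) + bias >= 0
                   for (va, vb), bias in zip(edges, biases)):
                points.append((x, y))
    return points


# Two triangles sharing the diagonal of a 4x4 square.
tri_a = covered_points((0, 0), (0, 4), (4, 4))
tri_b = covered_points((0, 0), (4, 4), (4, 0))
```

Samples on the shared diagonal go to exactly one of the two triangles, and together the triangles cover each sample of the square exactly once.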
Tiled/Hierarchical Rasterization
Modern GPUs use a two-level approach:
- Coarse rasterization: Test tile coverage (e.g., 8x8 pixel blocks) against the triangle
- Fine rasterization: Test individual pixels within covered tiles
This reduces the number of edge function evaluations dramatically.
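One common way to implement the coarse test, sketched here as an assumption rather than a description of any particular GPU, is to evaluate each edge function only at the tile corner where it is largest (to reject the tile) and smallest (to accept it whole). The example assumes integer sample points, E >= 0 inside, and hypothetical helper names.

```python
def edge_coeffs(va, vb):
    """Return (a, b, c) with E(x, y) = a*x + b*y + c for edge (va, vb)."""
    a = vb[1] - va[1]
    b = -(vb[0] - va[0])
    c = -(va[0] * a + va[1] * b)
    return a, b, c


def classify_tile(edges, x0, y0, size):
    """Coarse test: 'outside', 'inside', or 'partial' for one square tile."""
    x1, y1 = x0 + size - 1, y0 + size - 1
    fully_inside = True
    for a, b, c in edges:
        # Corners where this edge function is largest and smallest in the tile.
        max_e = a * (x1 if a > 0 else x0) + b * (y1 if b > 0 else y0) + c
        min_e = a * (x0 if a > 0 else x1) + b * (y0 if b > 0 else y1) + c
        if max_e < 0:
            return "outside"  # every sample in the tile fails this edge
        if min_e < 0:
            fully_inside = False
    return "inside" if fully_inside else "partial"


def rasterize_tiled(v0, v1, v2, width, height, tile=4):
    """Coarse pass per tile, fine per-sample pass only for partial tiles."""
    edges = [edge_coeffs(v0, v1), edge_coeffs(v1, v2), edge_coeffs(v2, v0)]
    covered = set()
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            state = classify_tile(edges, tx, ty, tile)
            if state == "outside":
                continue
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    if state == "inside" or all(
                            a * x + b * y + c >= 0 for a, b, c in edges):
                        covered.add((x, y))
    return covered
```

Tiles classified as "outside" or "inside" need no per-sample edge evaluations at all; only "partial" tiles fall through to the fine test.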
Anti-Aliasing
Aliasing occurs because pixels are discrete samples of a continuous signal. Jagged edges ("jaggies") and Moire patterns result from undersampling.
SSAA (Supersampling Anti-Aliasing)
Render at a higher resolution (e.g., 2x or 4x the sample count), then downsample with a box or tent filter. Highest quality but extremely expensive: 4x SSAA requires 4x the shading work, fill rate, and framebuffer memory.
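The downsampling step with a box filter can be sketched as below. The example assumes a grayscale image stored as a list of rows whose dimensions are multiples of the factor; a factor of 2 per axis corresponds to 4 samples per output pixel. The function name `downsample_box` is an assumption.

```python
def downsample_box(image, factor):
    """Average each factor x factor block of a 2-D list of grayscale values."""
    out_h, out_w = len(image) // factor, len(image[0]) // factor
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = 0.0
            for sy in range(factor):
                for sx in range(factor):
                    total += image[y * factor + sy][x * factor + sx]
            row.append(total / (factor * factor))
        out.append(row)
    return out


high_res = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
low_res = downsample_box(high_res, 2)
```

Partially covered blocks resolve to intermediate gray levels, which is what softens the edge.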
MSAA (Multisample Anti-Aliasing)
Sample coverage (and depth) at multiple subpixel locations per pixel, but run the fragment shader only once per pixel per triangle (at the pixel center or at the centroid of the covered samples). The coverage mask records which of the pixel's samples the triangle covers.
For each pixel with N samples:
    coverage_mask = 0
    for each sample point s_i:
        if triangle_covers(s_i):
            coverage_mask |= (1 << i)
    if coverage_mask != 0:
        color = run_fragment_shader(pixel_center)
        for each covered sample:
            framebuffer[sample] = color // if depth test passes
Cost: N color and depth/stencil samples stored per pixel, but only 1 shader invocation per pixel per triangle. Common counts: 2x, 4x, 8x.
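A single-pixel sketch of the coverage mask and the final resolve follows. The sample offsets are illustrative values for a 4-sample rotated-grid pattern, the depth test is omitted for brevity, and `covers` is a hypothetical callback standing in for the triangle coverage test.

```python
# Illustrative subpixel offsets for 4 samples (an assumption for this example).
SAMPLE_OFFSETS = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage_mask(px, py, covers):
    """Set bit i when sample i of pixel (px, py) is covered."""
    mask = 0
    for i, (ox, oy) in enumerate(SAMPLE_OFFSETS):
        if covers(px + ox, py + oy):
            mask |= 1 << i
    return mask


def write_samples(samples, mask, color):
    """Store one shaded color in every covered sample (depth test omitted)."""
    for i in range(len(samples)):
        if mask & (1 << i):
            samples[i] = color


def resolve(samples):
    """Average the per-sample colors into the final pixel color."""
    return sum(samples) / len(samples)


# A primitive covering the half-plane x < 0.5, shaded white over black.
mask = coverage_mask(0, 0, lambda x, y: x < 0.5)
samples = [0.0] * len(SAMPLE_OFFSETS)
if mask != 0:
    write_samples(samples, mask, 1.0)  # shader result computed once
pixel = resolve(samples)
```

Two of the four samples are covered, so the resolved pixel is a 50% blend even though the shader ran only once.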
FXAA (Fast Approximate Anti-Aliasing)
Post-processing filter applied to the final image. Detects edges using luminance contrast, then blends along the edge direction. Very cheap (single full-screen pass) but blurs fine detail.
TAA (Temporal Anti-Aliasing)
Jitters the camera subpixel position each frame and accumulates results over multiple frames using a history buffer. Uses motion vectors to reproject previous frame data.
color = alpha * current_sample + (1 - alpha) * reproject(history, motion_vector)
Challenges: ghosting on fast-moving objects, disocclusion. Mitigated by neighborhood clamping/clipping of the history sample to the current frame's color bounding box.
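The blend and the neighborhood clamp can be sketched for a single scalar channel as follows. The example assumes the history value has already been reprojected with the motion vector and that `neighborhood` holds the current frame's values around the pixel; the function name and the default alpha are assumptions.

```python
def taa_resolve(current, history, neighborhood, alpha=0.1):
    """Blend the current sample with clamped, already-reprojected history."""
    lo, hi = min(neighborhood), max(neighborhood)
    # Clamp stale history into the range of colors seen near this pixel now.
    clamped = min(max(history, lo), hi)
    return alpha * current + (1 - alpha) * clamped


# History (0.9) lies outside the local range [0.4, 0.6], so it is clamped to 0.6.
result = taa_resolve(0.5, 0.9, [0.4, 0.5, 0.6], alpha=0.5)
```

Clamping limits how far a stale history value can pull the result, which is what suppresses ghosting after a disocclusion.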
DLSS / FSR
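Techniques that render at a lower resolution and reconstruct a higher-resolution output. DLSS reconstructs the image with a neural network; FSR 1 is a purely spatial upscaler, while FSR 2 adds temporal accumulation driven by hand-tuned heuristics. The temporal variants combine accumulation across frames with learned or heuristic upsampling.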
Z-Buffer (Depth Buffer)
Per-pixel depth test for hidden surface removal.
for each fragment at (x, y) with depth z:
    if z < depth_buffer[x][y]: // closer to camera
        depth_buffer[x][y] = z
        color_buffer[x][y] = fragment_color
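A runnable sketch of the depth test follows. It assumes depth values in [0, 1] with smaller meaning closer, buffers cleared to the far value, and row-major indexing (`buffer[y][x]`); the function name and the fragment tuple layout are assumptions for the example.

```python
def draw_fragments(width, height, fragments, far=1.0):
    """Apply a less-than depth test to (x, y, z, color) fragments."""
    depth_buffer = [[far] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    for x, y, z, color in fragments:
        if z < depth_buffer[y][x]:  # closer to the camera than what is stored
            depth_buffer[y][x] = z
            color_buffer[y][x] = color
    return color_buffer, depth_buffer


fragments = [(0, 0, 0.8, "red"), (0, 0, 0.3, "blue"), (0, 0, 0.5, "green")]
colors, depths = draw_fragments(2, 1, fragments)
```

For opaque fragments the result is the same regardless of submission order, which is the main appeal of the z-buffer.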
Depth Buffer Formats
| Format | Bits | Range                | Notes                     |
|--------|------|----------------------|---------------------------|
| D16    | 16   | [0, 1]               | Low precision, mobile use |
| D24    | 24   | [0, 1]               | Common desktop format     |
| D32F   | 32   | floating-point       | Best precision, reverse-Z |
| D24S8  | 32   | 24 depth + 8 stencil | Combined format           |
Depth Precision
After perspective projection, depth is distributed nonlinearly:
z_buffer = (1/z - 1/near) / (1/far - 1/near)
Most precision is near the near plane. Solutions:
- Push the near plane as far from the camera as the scene allows
- Reverse-Z with floating-point buffer
- Logarithmic depth: write log(z/near) / log(far/near) in the vertex shader
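To get a feel for the nonlinearity, the two mappings can be evaluated numerically. This sketch assumes z is the positive view-space distance and uses near = 0.1 and far = 100 as example values; the function names are assumptions.

```python
import math


def perspective_depth(z, near, far):
    """Depth after perspective projection, mapped to [0, 1]."""
    return (1 / z - 1 / near) / (1 / far - 1 / near)


def log_depth(z, near, far):
    """Logarithmic depth, mapped to [0, 1]."""
    return math.log(z / near) / math.log(far / near)


near, far = 0.1, 100.0
d_persp = perspective_depth(1.0, near, far)  # roughly 0.901
d_log = log_depth(1.0, near, far)            # roughly 0.333
```

With these values, a point only 1 unit from the camera already maps to about 0.90 under the perspective mapping, leaving roughly 10% of the depth range for the remaining 99 units. The logarithmic mapping places the same point at about one third of the range.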
Alpha Blending
Compositing transparent surfaces over opaque ones:
C_out = alpha_src * C_src + (1 - alpha_src) * C_dst
A_out = alpha_src + (1 - alpha_src) * alpha_dst
Order Dependency
Alpha blending is order-dependent. Correct rendering requires back-to-front sorting. Alternatives:
- Depth peeling: Multiple passes to peel layers front-to-back
- Weighted blended OIT: Approximate order-independent transparency using weighted sums
- Per-pixel linked lists: Store all fragments per pixel in a GPU buffer, sort and composite
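The order dependency is easy to demonstrate with the blending equations above. The sketch below composites two half-transparent layers over an opaque white background in both orders; the function name `over` and the tuple layout are assumptions for the example.

```python
def over(src_rgb, src_a, dst_rgb, dst_a):
    """Blend a straight-alpha source over a destination."""
    out_rgb = tuple(src_a * s + (1 - src_a) * d for s, d in zip(src_rgb, dst_rgb))
    out_a = src_a + (1 - src_a) * dst_a
    return out_rgb, out_a


white = ((1.0, 1.0, 1.0), 1.0)
red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

# Red drawn first, blue on top.
step = over(red, 0.5, *white)
blue_on_top = over(blue, 0.5, *step)

# Blue drawn first, red on top.
step = over(blue, 0.5, *white)
red_on_top = over(red, 0.5, *step)
```

The two results differ, (0.5, 0.25, 0.75) versus (0.75, 0.25, 0.5), which is why transparent geometry must be sorted or handled with an order-independent technique.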
Pre-multiplied Alpha
Store color as (R*A, G*A, B*A, A). Blending simplifies to:
C_out = C_src + (1 - alpha_src) * C_dst
Avoids dark fringes at edges and composes correctly under filtering.
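A small sketch of the premultiplied form, with hypothetical helper names. Colors are 4-tuples (R*A, G*A, B*A, A), and the same expression is applied to all four channels, so the alpha channel follows A_out = alpha_src + (1 - alpha_src) * alpha_dst automatically.

```python
def premultiply(rgb, alpha):
    """Convert a straight-alpha color to premultiplied (R*A, G*A, B*A, A)."""
    return tuple(c * alpha for c in rgb) + (alpha,)


def over_premultiplied(src, dst):
    """C_out = C_src + (1 - alpha_src) * C_dst, applied to all four channels."""
    return tuple(s + (1 - src[3]) * d for s, d in zip(src, dst))


half_red = premultiply((1.0, 0.0, 0.0), 0.5)
opaque_white = premultiply((1.0, 1.0, 1.0), 1.0)
result = over_premultiplied(half_red, opaque_white)
```

The result matches what the straight-alpha equations give for the same inputs, with one fewer multiply per color channel in the blend.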
Stencil Buffer
An 8-bit per-pixel buffer for masking and counting operations. Common uses:
- Stencil shadows: Mark shadow volumes by incrementing/decrementing stencil on front/back faces (the depth-fail variant is known as Carmack's reverse)
- Portal rendering: Restrict rendering to a portal shape
- Decals: Mask decal rendering to specific surfaces
- Outline rendering: Render object to stencil, then draw a slightly larger version where stencil != written value
Operations
Stencil test compares a reference value against the buffer using a comparison function. On pass/fail, the buffer can be incremented, decremented, zeroed, replaced, inverted, or kept.
if stencil_func(stencil_buffer[x][y], ref, mask):
    apply stencil_op_pass
    proceed to depth test
else:
    apply stencil_op_fail
    discard fragment
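The test-then-update flow can be sketched in Python as below. This is a simplified model with hypothetical helper names: it assumes an 8-bit buffer indexed as `stencil[y][x]`, compares the masked reference against the masked stored value, and leaves out the separate operation that real APIs apply when the stencil test passes but the depth test fails.

```python
import operator


def keep(stored, ref):
    return stored


def replace(stored, ref):
    return ref


def increment(stored, ref):
    return min(stored + 1, 0xFF)  # saturating increment for an 8-bit buffer


def stencil_test(stencil, x, y, ref, mask, func, op_pass, op_fail):
    """Run the stencil test for one fragment and update the buffer."""
    stored = stencil[y][x]
    passed = func(ref & mask, stored & mask)
    op = op_pass if passed else op_fail
    stencil[y][x] = op(stored, ref) & 0xFF
    return passed  # the caller proceeds to the depth test only when True


stencil = [[0, 1]]
hit = stencil_test(stencil, 1, 0, ref=1, mask=0xFF, func=operator.eq,
                   op_pass=increment, op_fail=keep)
miss = stencil_test(stencil, 0, 0, ref=1, mask=0xFF, func=operator.eq,
                    op_pass=increment, op_fail=replace)
```

Here the first fragment passes the equality test and increments its stencil value, while the second fails and has the reference value written by the fail operation.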
Scanline vs. Fragment-Parallel Rasterization
- Scanline: Process one row of pixels at a time (historically used by CPU software renderers)
- Fragment-parallel: Modern GPUs process many fragments simultaneously using SIMD; the edge function method maps naturally to parallel evaluation across a tile of pixels