Rasterization

Overview

Rasterization converts continuous geometric primitives (lines, triangles) into discrete fragments (candidate pixels) for display. It is the core operation of real-time rendering pipelines: it determines which pixels each primitive covers and interpolates vertex attributes (depth, color, texture coordinates) at those pixels.

Line Drawing

Digital Differential Analyzer (DDA)

Simple approach: step along the major axis and compute the other coordinate incrementally.

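dx = x1 - x0,  dy = y1 - y0
steps = max(|dx|, |dy|)
if steps == 0: plot(x0, y0) and stop       // degenerate single-point line
x_inc = dx / steps,  y_inc = dy / steps    // one increment is exactly ±1

x = x0, y = y0
for i = 0 to steps:                         // inclusive: steps + 1 pixels
    plot(round(x), round(y))
    x += x_inc
    y += y_inc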

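Drawback: a floating-point add and a rounding per coordinate per pixel, and rounding error in the repeated additions can accumulate along long lines.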

Bresenham's Line Algorithm

Integer-only algorithm. For a line drawn left to right (x0 < x1) with slope 0 <= dy/dx <= 1:

dx = x1 - x0,  dy = y1 - y0
D = 2*dy - dx           // decision parameter
y = y0

for x = x0 to x1:
    plot(x, y)
    if D > 0:
        y += 1
        D -= 2*dx
    D += 2*dy

The decision parameter D tracks accumulated error. When D > 0, step in the minor axis. Generalizes to all octants by swapping axes and adjusting signs.

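Midpoint formulation: An equivalent derivation uses the implicit line equation F(x, y) = dy*x - dx*y + c, which is zero on the line. At each step F is evaluated at the midpoint (x + 1, y + 1/2) between the two candidate pixels. D is exactly 2*F at that midpoint (scaled by 2 to stay in integers): initially D = 2*F(x0 + 1, y0 + 1/2) = 2*dy - dx because F(x0, y0) = 0. D > 0 means the line's exact y at x + 1 exceeds y + 1/2, so the pixel at y + 1 is closer.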
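The all-octant generalization mentioned above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it swaps axes for steep lines, swaps endpoints so it always steps toward increasing x, and uses a sign variable (the hypothetical name `y_step`) for lines that go toward decreasing y. The returned list is in traversal order, which may run from (x1, y1) back to (x0, y0).

```python
def bresenham(x0, y0, x1, y1):
    """Return the pixels of the line (x0, y0)-(x1, y1), endpoints included, for any octant."""
    steep = abs(y1 - y0) > abs(x1 - x0)
    if steep:
        # Make x the major axis by swapping the roles of x and y.
        x0, y0, x1, y1 = y0, x0, y1, x1
    if x0 > x1:
        # Always step toward increasing x by swapping the endpoints.
        x0, y0, x1, y1 = x1, y1, x0, y0
    dx = x1 - x0
    dy = abs(y1 - y0)
    y_step = 1 if y1 >= y0 else -1  # minor-axis direction (hypothetical name)
    D = 2 * dy - dx                 # decision parameter, as in the text
    y = y0
    points = []
    for x in range(x0, x1 + 1):
        points.append((y, x) if steep else (x, y))  # undo the axis swap
        if D > 0:
            y += y_step
            D -= 2 * dx
        D += 2 * dy
    return points


print(bresenham(0, 0, 5, 2))  # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```

Only the axis swap and the sign of the minor-axis step change between octants; the integer decision logic is the same loop as in the pseudocode.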

Triangle Rasterization

Edge Function Method

For a triangle with vertices v0, v1, v2, the edge function for edge (va, vb) evaluated at point p:

E(p) = (p.x - va.x)(vb.y - va.y) - (p.y - va.y)(vb.x - va.x)

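A point is inside the triangle if all three edge functions have the same sign. Which sign depends on the winding order and on whether y points up or down: with this form of E, a triangle that is counter-clockwise on a y-down screen gives positive values inside. E(p) is the z-component of the 2D cross product (p - va) × (vb - va), so |E(p)| is twice the area of the triangle (va, vb, p).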

The edge function is an affine function of (x, y), enabling incremental evaluation:

E(x+1, y) = E(x, y) + (vb.y - va.y)       // step in x
E(x, y+1) = E(x, y) - (vb.x - va.x)       // step in y
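A minimal Python sketch of a bounding-box rasterizer built on E(p), using the two incremental formulas instead of re-evaluating E at every pixel. Assumptions not taken from the text: pixel (x, y) is sampled at its center (x + 0.5, y + 0.5), the vertex order makes all three edge functions non-negative inside, and ties (E = 0) count as inside with no fill rule. The helper names are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function E(p) for the directed edge (a, b), as defined above."""
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax)


def rasterize(v0, v1, v2, width, height):
    """Return pixels whose centers give E >= 0 for all three edges (no fill rule)."""
    edges = [(v0, v1), (v1, v2), (v2, v0)]
    # E at the first pixel center (0.5, 0.5), one value per edge.
    e_row = [edge(*a, *b, 0.5, 0.5) for a, b in edges]
    step_x = [b[1] - a[1] for a, b in edges]     # E(x+1, y) - E(x, y)
    step_y = [-(b[0] - a[0]) for a, b in edges]  # E(x, y+1) - E(x, y)
    covered = []
    for y in range(height):
        e = list(e_row)
        for x in range(width):
            if all(ei >= 0 for ei in e):
                covered.append((x, y))
            e = [ei + s for ei, s in zip(e, step_x)]      # step one pixel in x
        e_row = [ei + s for ei, s in zip(e_row, step_y)]  # step one row in y
    return covered
```

For the triangle (0, 0), (0, 4), (4, 0) on an 8x8 grid this covers the 10 pixels with x + y <= 3 (the centers on the hypotenuse give E = 0 and are included), the same result as evaluating E directly at every pixel.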

Barycentric Coordinates

Any point p in the plane of the triangle can be expressed as:

p = u*v0 + v*v1 + w*v2,    where u + v + w = 1

and p lies inside the triangle exactly when u, v, and w are all non-negative.

Computed from the edge functions, where Eab denotes E for the edge (va, vb):

u = E12(p) / E12(v0)
v = E20(p) / E20(v1)
w = E01(p) / E01(v2)

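Equivalently, each coordinate is a ratio of signed areas, e.g. u = area(p, v1, v2) / area(v0, v1, v2). The three denominators E12(v0), E20(v1), and E01(v2) all equal twice the signed area of the triangle, so that value needs to be computed only once.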

Barycentric coordinates are used to interpolate all vertex attributes (color, texture coordinates, normals) across the triangle.
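The sketch below computes u, v, w exactly as the three edge-function ratios above and uses them to interpolate a per-vertex RGB color. It assumes 2D vertex tuples and a non-degenerate (non-zero area) triangle; `edge`, `barycentric`, and `interpolate` are illustrative names.

```python
def edge(a, b, p):
    """Edge function E(p) for the directed edge (a, b)."""
    return (p[0] - a[0]) * (b[1] - a[1]) - (p[1] - a[1]) * (b[0] - a[0])


def barycentric(v0, v1, v2, p):
    """(u, v, w) = (E12(p)/E12(v0), E20(p)/E20(v1), E01(p)/E01(v2))."""
    u = edge(v1, v2, p) / edge(v1, v2, v0)
    v = edge(v2, v0, p) / edge(v2, v0, v1)
    w = edge(v0, v1, p) / edge(v0, v1, v2)
    return u, v, w


def interpolate(bary, a0, a1, a2):
    """Blend per-vertex attribute tuples with barycentric weights."""
    u, v, w = bary
    return tuple(u * x0 + v * x1 + w * x2 for x0, x1, x2 in zip(a0, a1, a2))
```

At p = (1, 1) in the triangle (0, 0), (0, 4), (4, 0) this gives (u, v, w) = (0.5, 0.25, 0.25), so red, green, and blue vertex colors interpolate to (0.5, 0.25, 0.25).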

Perspective-Correct Interpolation

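Linear interpolation of attributes in screen space is incorrect under perspective projection: the perspective divide is not affine, so an attribute that varies linearly across the triangle in 3D does not vary linearly in screen space. A/w and 1/w do. The correct formula: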

A(p) = (u*A0/w0 + v*A1/w1 + w_bary*A2/w2) / (u/w0 + v/w1 + w_bary/w2)

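Here u, v, and w_bary are the screen-space barycentric coordinates (the third is renamed to avoid a clash with clip-space w), and w0, w1, w2 are the clip-space w values at each vertex. GPUs perform this by interpolating A/w and 1/w linearly in screen space, then dividing per fragment.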
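A small numeric check of the formula, with illustrative names. The scenario is an assumption chosen for easy arithmetic: in view space, a segment runs from (x, z) = (0, 1) to (3, 3) with an attribute A going from 0 to 1, projection is x/z, and w equals the view-space depth z. The endpoints land at screen x = 0 and 1; the screen-space midpoint lies on the ray x = 0.5*z, which meets the segment a quarter of the way along, so the correct attribute there is 0.25, while plain screen-space interpolation gives 0.5.

```python
def perspective_correct(bary, attrs, ws):
    """Interpolate A/w and 1/w linearly with screen-space barycentrics, then divide."""
    num = sum(b * a / w for b, a, w in zip(bary, attrs, ws))  # interpolated A/w
    den = sum(b / w for b, w in zip(bary, ws))                # interpolated 1/w
    return num / den


def screen_linear(bary, attrs):
    """Naive screen-space interpolation (incorrect under perspective), for comparison."""
    return sum(b * a for b, a in zip(bary, attrs))


# The third vertex has zero weight, so this reduces to the two-vertex segment case.
bary = (0.5, 0.5, 0.0)
attrs = (0.0, 1.0, 0.0)
ws = (1.0, 3.0, 1.0)
print(perspective_correct(bary, attrs, ws), screen_linear(bary, attrs))
```

Passing the w values themselves as the attribute gives 1 / (interpolated 1/w) = 1.5, the true depth of that point on the segment.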

Top-Left Fill Rule

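To avoid double-drawing (or skipping) pixels on edges shared by adjacent triangles, a consistent tie-break decides which triangle owns a pixel center that lies exactly on an edge. Under the top-left rule, such a pixel is drawn only if the edge is a "top" edge (exactly horizontal and above the triangle's other edges) or a "left" edge (not horizontal, and on the left side of the triangle); pixels strictly inside are always drawn. With fixed-point edge functions this is typically implemented by subtracting a bias of 1 from the edge functions of edges that are neither top nor left, so that E = 0 fails the inside test for those edges.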
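The following Python sketch applies the rule to the tie case directly rather than through a fixed-point bias. Assumptions not taken from the text: y points down in raster space, pixel centers are at (x + 0.5, y + 0.5), and the vertices are ordered so that the edge function E(p) defined earlier is positive inside. Under those conventions a left edge runs down the screen (dy > 0) and a top edge is horizontal and runs toward -x (dy = 0, dx < 0); with the other winding or y direction these tests flip. Function names are hypothetical.

```python
def edge(a, b, p):
    """Edge function E(p) for the directed edge (a, b)."""
    return (p[0] - a[0]) * (b[1] - a[1]) - (p[1] - a[1]) * (b[0] - a[0])


def is_top_left(a, b):
    """Top/left classification; assumes y points down and E > 0 inside."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    is_top = dy == 0 and dx < 0  # horizontal edge above the interior
    is_left = dy > 0             # edge on the left side, running down the screen
    return is_top or is_left


def covers(tri, p):
    """Inside test with the top-left tie-break for points exactly on an edge."""
    v0, v1, v2 = tri
    for a, b in ((v0, v1), (v1, v2), (v2, v0)):
        e = edge(a, b, p)
        if e < 0 or (e == 0 and not is_top_left(a, b)):
            return False
    return True
```

Splitting the square (0, 0)-(4, 4) along its diagonal into (0, 0), (0, 4), (4, 4) and (0, 0), (4, 4), (4, 0), the pixel centers on the shared diagonal lie on a left edge of the second triangle only, so each of the 16 pixels is drawn exactly once.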

Tiled/Hierarchical Rasterization

Modern GPUs use a two-level approach:

Coarse rasterization: Test tile coverage (e.g., 8x8 pixel blocks) against the triangle
Fine rasterization: Test individual pixels within covered tiles

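Because the edge function is affine, its extremes over a rectangular tile occur at the tile's corners. A tile can therefore be trivially rejected (some edge is negative at all four corners) or trivially accepted (every edge is non-negative at all corners) without per-pixel tests; only tiles that straddle an edge need fine rasterization. This sharply reduces the number of edge function evaluations.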
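A two-level version of the edge-function inside test, sketched in Python. The tile size of 8, the helper names, and the use of E >= 0 with no fill rule are assumptions; the corner tests rely on E being affine, so its extremes over the box spanned by a tile's pixel centers occur at that box's corners.

```python
def edge(a, b, p):
    """Edge function E(p) for the directed edge (a, b)."""
    return (p[0] - a[0]) * (b[1] - a[1]) - (p[1] - a[1]) * (b[0] - a[0])


def edges_of(tri):
    v0, v1, v2 = tri
    return ((v0, v1), (v1, v2), (v2, v0))


def inside(tri, p):
    return all(edge(a, b, p) >= 0 for a, b in edges_of(tri))


def rasterize_tiled(tri, width, height, tile=8):
    """Coarse pass over tiles; per-pixel tests only where a tile straddles an edge."""
    covered = set()
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            x_end, y_end = min(tx + tile, width), min(ty + tile, height)
            # Corners of the box spanned by this tile's pixel centers.
            corners = [(x + 0.5, y + 0.5) for x in (tx, x_end - 1) for y in (ty, y_end - 1)]
            if any(all(edge(a, b, c) < 0 for c in corners) for a, b in edges_of(tri)):
                continue  # trivial reject: every center is outside one edge
            if all(edge(a, b, c) >= 0 for a, b in edges_of(tri) for c in corners):
                # Trivial accept: every center is inside all three edges.
                covered.update((x, y) for y in range(ty, y_end) for x in range(tx, x_end))
                continue
            for y in range(ty, y_end):  # fine rasterization of a partial tile
                for x in range(tx, x_end):
                    if inside(tri, (x + 0.5, y + 0.5)):
                        covered.add((x, y))
    return covered
```

The result should match a brute-force per-pixel test; the saving is that fully outside or fully inside tiles cost a handful of corner evaluations instead of one test per pixel.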

Anti-Aliasing

Aliasing occurs because pixels are discrete samples of a continuous signal. Jagged edges ("jaggies") and Moiré patterns result from undersampling: the image contains detail at higher frequencies than the pixel grid can represent.

SSAA (Supersampling Anti-Aliasing)

Render at a higher resolution (e.g., 2x2 = 4 samples per pixel), shading every sample, then downsample with a box or tent filter. Highest quality but very expensive: 4x SSAA needs roughly 4x the shading work, fill rate, and framebuffer memory.
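The downsampling step of SSAA with a box filter, as a minimal sketch: each output pixel is the average of a factor x factor block of the supersampled image. Grayscale values in nested lists and image dimensions that are multiples of `factor` are assumptions.

```python
def downsample_box(hi_res, factor):
    """Average each factor x factor block of a supersampled grayscale image."""
    out = []
    for y in range(0, len(hi_res), factor):
        row = []
        for x in range(0, len(hi_res[0]), factor):
            block = [hi_res[y + j][x + i] for j in range(factor) for i in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

A 2x2 block in which 3 of the 4 samples hit a white triangle resolves to 0.75, an intermediate value along the edge instead of a hard step.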

MSAA (Multisample Anti-Aliasing)

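Evaluate coverage (and depth) at multiple subpixel sample locations per pixel, but run the fragment shader only once per pixel per triangle, at the pixel center or, with centroid sampling, at a point inside the covered area. A per-pixel coverage mask records which samples the triangle covers; the shaded color is written only to those samples, and the samples are averaged (resolved) at the end.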

For each pixel with N samples:
    coverage_mask = 0
    for each sample point s_i:
        if triangle_covers(s_i):
            coverage_mask |= (1 << i)
    if coverage_mask != 0:
        color = run_fragment_shader(pixel_center)
        for each covered sample:
            framebuffer[sample] = color     // if depth test passes

Cost: N color and depth/stencil samples stored per pixel and N depth tests, but only one fragment shader invocation per pixel per covering triangle. Common sample counts: 2x, 4x, 8x.
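A Python sketch of the coverage mask and resolve from the pseudocode above, for a single pixel. The four sample offsets are an illustrative rotated-grid pattern (an assumption, not a specific API's positions), the per-sample depth test is omitted, and `shade` stands in for the fragment shader.

```python
# Illustrative 4x rotated-grid sample offsets, in pixels relative to the pixel center.
SAMPLE_OFFSETS = [(-0.125, -0.375), (0.375, -0.125), (-0.375, 0.125), (0.125, 0.375)]


def edge(a, b, p):
    """Edge function E(p) for the directed edge (a, b)."""
    return (p[0] - a[0]) * (b[1] - a[1]) - (p[1] - a[1]) * (b[0] - a[0])


def inside(tri, p):
    """True if p is on the non-negative side of all three edges."""
    v0, v1, v2 = tri
    return all(edge(a, b, p) >= 0 for a, b in ((v0, v1), (v1, v2), (v2, v0)))


def msaa_pixel(tri, x, y, shade, samples):
    """Rasterize one triangle into pixel (x, y)'s sample list; return the coverage mask."""
    cx, cy = x + 0.5, y + 0.5
    mask = 0
    for i, (ox, oy) in enumerate(SAMPLE_OFFSETS):
        if inside(tri, (cx + ox, cy + oy)):
            mask |= 1 << i
    if mask:
        color = shade(cx, cy)  # a single shader invocation, at the pixel center
        for i in range(len(SAMPLE_OFFSETS)):
            if mask & (1 << i):
                samples[i] = color  # per-sample depth test omitted in this sketch
    return mask


def resolve(samples):
    """Box-filter resolve: average the per-sample colors of one pixel."""
    return sum(samples) / len(samples)
```

With a triangle edge running vertically through the pixel center, the two samples left of the edge are covered (mask 0b0101), the shader runs once, and the resolved value is halfway between the shaded color and the background.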

FXAA (Fast Approximate Anti-Aliasing)

Post-processing filter applied to the final image. Detects edges using luminance contrast, then blends along the edge direction. Very cheap (single full-screen pass) but blurs fine detail.

TAA (Temporal Anti-Aliasing)

Jitters the camera subpixel position each frame and accumulates results over multiple frames using a history buffer. Uses motion vectors to reproject previous frame data.

color = alpha * current_sample + (1 - alpha) * reproject(history, motion_vector)

Challenges: ghosting on fast-moving objects and disocclusion (newly revealed pixels have no valid history). Mitigated by clamping or clipping the history sample to the bounding box of the current frame's colors in a small neighborhood (e.g. 3x3) around the pixel.
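The blend equation with neighborhood clamping, reduced to one scalar pixel value as a minimal sketch. Assumptions: `history` has already been reprojected with the motion vector, `neighborhood` holds the current frame's values around the pixel, and alpha = 0.1 is only an illustrative default.

```python
def taa_resolve(current, history, neighborhood, alpha=0.1):
    """Blend a jittered current sample with reprojected history (scalars for brevity)."""
    lo, hi = min(neighborhood), max(neighborhood)
    clamped = min(max(history, lo), hi)  # neighborhood clamp limits ghosting
    return alpha * current + (1 - alpha) * clamped
```

A stale history value of 1.0 in a neighborhood spanning [0.4, 0.6] is clamped to 0.6 before blending, so the result is 0.1*0.5 + 0.9*0.6 = 0.59 rather than a ghost near 1.0. Applied every frame, the output converges exponentially toward the jittered samples' average.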

DLSS / FSR

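Upscalers that render at a lower resolution and reconstruct a higher-resolution output. DLSS uses a neural network; FSR 1 is a purely spatial upscaler, while FSR 2 and later use hand-tuned temporal reconstruction. The temporal variants combine TAA-style accumulation with learned (DLSS) or heuristic (FSR) upsampling.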

Z-Buffer (Depth Buffer)

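Per-pixel depth test for hidden surface removal. The depth buffer is cleared to the far value (1.0) each frame; with the conventional less-than test, a smaller depth is closer to the camera (reverse-Z clears to 0.0 and uses greater-than instead).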

for each fragment at (x, y) with depth z:
    if z < depth_buffer[x][y]:       // closer to camera
        depth_buffer[x][y] = z
        color_buffer[x][y] = fragment_color

Depth Buffer Formats

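Format	Depth bits	Stencil bits	Depth representation	Notes
D16	16	0	Unsigned normalized, [0, 1]	Low precision, mobile use
D24	24	0	Unsigned normalized, [0, 1]	Common desktop format
D32F	32	0	32-bit float	Best precision, especially with reverse-Z
D24S8	24	8	Unsigned normalized, [0, 1]	Combined 32-bit depth/stencil format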

Depth Precision

After perspective projection, depth is distributed nonlinearly:

z_buffer = (1/z - 1/near) / (1/far - 1/near)

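Here z is the view-space distance. Because z_buffer is linear in 1/z, most representable depth values are concentrated near the near plane and precision falls off rapidly with distance. Remedies: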

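Push the near plane as far out as possible (when far >> near, precision depends mostly on the near distance)
Reverse-Z with a floating-point buffer: map near to 1 and far to 0, so the dense floating-point values near 0 offset the 1/z compression at distance
Logarithmic depth: write log(z/near) / log(far/near) as depth, either from the vertex shader (cheap, but interpolation error on large triangles) or per fragment (exact, but writing fragment depth disables early depth testing)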
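To make the nonlinearity concrete, the sketch below evaluates the z_buffer formula, its inverse, and an approximate count of distinct 24-bit depth codes per unit of view-space distance. The near/far values in the usage note and the function names are illustrative assumptions.

```python
def depth01(z, near, far):
    """Stored depth for view-space distance z (0 at near, 1 at far)."""
    return (1 / z - 1 / near) / (1 / far - 1 / near)


def view_z(d, near, far):
    """Inverse of depth01: view-space distance for a stored depth d in [0, 1]."""
    return 1 / (1 / near + d * (1 / far - 1 / near))


def d24_steps_per_unit(z, near, far):
    """Approximate number of distinct 24-bit depth codes between z and z + 1."""
    return (depth01(z + 1, near, far) - depth01(z, near, far)) * (2**24 - 1)
```

With near = 0.1 and far = 1000, view_z(0.5, ...) is about 0.2, so half of the depth range is spent on the first 0.1 units beyond the near plane. Around z = 100 there are roughly 166 D24 codes per unit of distance, and near the far plane fewer than 2, which is where z-fighting appears.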

Alpha Blending

Compositing transparent surfaces over opaque ones:

C_out = alpha_src * C_src + (1 - alpha_src) * C_dst
A_out = alpha_src + (1 - alpha_src) * alpha_dst
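The two blend equations as a small Python function, assuming straight (non-premultiplied) RGB tuples and an opaque destination, the usual case for a framebuffer. The name `over` is illustrative. Blending the same two layers in both orders also shows that the result depends on draw order.

```python
def over(src_rgb, src_a, dst_rgb, dst_a):
    """Straight-alpha 'over': C_out and A_out exactly as in the equations above."""
    rgb = tuple(src_a * s + (1 - src_a) * d for s, d in zip(src_rgb, dst_rgb))
    a = src_a + (1 - src_a) * dst_a
    return rgb, a


black = ((0.0, 0.0, 0.0), 1.0)
red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
red_then_blue = over(blue, 0.5, *over(red, 0.5, *black))
blue_then_red = over(red, 0.5, *over(blue, 0.5, *black))
print(red_then_blue, blue_then_red)
```

Drawing 50% red and then 50% blue over black gives (0.25, 0, 0.5), while the reverse order gives (0.5, 0, 0.25): whichever layer is drawn last dominates.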

Order Dependency

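Alpha blending is order-dependent. Correct results require drawing transparent surfaces sorted back to front after the opaque geometry, and per-object sorting still fails for intersecting surfaces. Alternatives: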

Depth peeling: Multiple passes to peel layers front-to-back
Weighted blended OIT: Approximate order-independent transparency using weighted sums
Per-pixel linked lists: Store all fragments per pixel in a GPU buffer, sort and composite

Pre-multiplied Alpha

Store color as (R*A, G*A, B*A, A). Blending simplifies to:

C_out = C_src + (1 - alpha_src) * C_dst

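Avoids dark fringes when textures are filtered or mipmapped (with straight alpha, filtering mixes in the color of fully transparent texels), and the over operator becomes associative, so layers can be pre-composited.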
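A minimal sketch of premultiplied storage and the simplified blend, with illustrative names. The same (1 - alpha_src) factor applies to the color channels and to alpha, which is what makes composition associative.

```python
def premultiply(rgb, a):
    """Convert straight RGB plus alpha to premultiplied (R*A, G*A, B*A), A."""
    return tuple(c * a for c in rgb), a


def over_premul(src, dst):
    """Premultiplied 'over': C_out = C_src + (1 - alpha_src) * C_dst, same for alpha."""
    (s_rgb, s_a), (d_rgb, d_a) = src, dst
    rgb = tuple(s + (1 - s_a) * d for s, d in zip(s_rgb, d_rgb))
    return rgb, s_a + (1 - s_a) * d_a


a = premultiply((1.0, 0.0, 0.0), 0.5)  # 50% red
b = premultiply((0.0, 0.0, 1.0), 0.5)  # 50% blue
c = ((0.2, 0.2, 0.2), 1.0)             # opaque gray background
print(over_premul(a, over_premul(b, c)), over_premul(over_premul(a, b), c))
```

Both groupings give the same result up to floating-point rounding, so (a over b) can be flattened into one layer before it is composited onto c.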

Stencil Buffer

An 8-bit per-pixel buffer for masking and counting operations. Common uses:

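Stencil shadows: Count shadow-volume faces per pixel. Depth-pass counting increments on front faces and decrements on back faces that pass the depth test; Carmack's reverse (depth-fail) instead increments on back faces and decrements on front faces that fail it, which stays correct when the camera is inside a volume. A non-zero count marks a shadowed pixel.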
Portal rendering: Restrict rendering to a portal shape
Decals: Mask decal rendering to specific surfaces
Outline rendering: Render object to stencil, then draw a slightly larger version where stencil != written value

Operations

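The stencil test compares a masked reference value against the masked stored value using a comparison function (never, less, equal, always, and so on). Separate operations are chosen for three outcomes: the stencil test fails, the stencil test passes but the depth test fails, and both pass. Each operation can keep, zero, replace (with the reference), increment or decrement (clamping or wrapping), or bitwise-invert the stored value.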

if stencil_func(ref & mask, stencil_buffer[x][y] & mask):
    if depth_test_passes:
        apply stencil_op_depth_pass
        write fragment
    else:
        apply stencil_op_depth_fail
        discard fragment
else:
    apply stencil_op_fail
    discard fragment
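A Python sketch of the comparison functions and operations for an 8-bit stencil buffer. The string names mirror OpenGL's glStencilFunc and glStencilOp tokens (which compare the masked reference against the masked stored value), but the functions themselves are hypothetical, not a real API.

```python
def stencil_test(func, ref, stored, mask=0xFF):
    """Compare (ref & mask) against (stored & mask) with the named function."""
    r, s = ref & mask, stored & mask
    return {
        "NEVER": False, "ALWAYS": True,
        "LESS": r < s, "LEQUAL": r <= s,
        "GREATER": r > s, "GEQUAL": r >= s,
        "EQUAL": r == s, "NOTEQUAL": r != s,
    }[func]


def stencil_op(op, value, ref):
    """Apply one stencil operation to an 8-bit stored value."""
    return {
        "KEEP": value,
        "ZERO": 0,
        "REPLACE": ref & 0xFF,
        "INCR": min(value + 1, 0xFF),     # saturate at 255
        "INCR_WRAP": (value + 1) & 0xFF,  # 255 wraps to 0
        "DECR": max(value - 1, 0),        # saturate at 0
        "DECR_WRAP": (value - 1) & 0xFF,  # 0 wraps to 255
        "INVERT": ~value & 0xFF,          # bitwise NOT within 8 bits
    }[op]
```

For the outline technique, a first pass could use ALWAYS with REPLACE and ref = 1 wherever the object is drawn; a second pass draws the enlarged object with NOTEQUAL and ref = 1, so only the ring outside the original silhouette passes.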

Scanline vs. Fragment-Parallel Rasterization

Scanline: Process one row of pixels at a time, walking the primitive's left and right edges (the classic approach of CPU software renderers and early hardware)
Fragment-parallel: Modern GPUs process many fragments simultaneously using SIMD; the edge function method maps naturally to parallel evaluation across a tile of pixels