Graphics Pipeline

Overview

The graphics pipeline transforms 3D scene data into a 2D image on screen. Modern GPUs implement this as a series of programmable and fixed-function stages operating on streams of vertices and fragments.

Pipeline Stages

High-Level Architecture

Application --> Geometry Processing --> Rasterization --> Pixel Processing --> Framebuffer

Application Stage (CPU)

  • Scene graph traversal and visibility determination
  • Input handling, physics, AI updates
  • Issues draw calls to the GPU via graphics API
  • Feeds vertex data, textures, and shader programs to the pipeline

Geometry Processing (GPU)

  1. Vertex Shader - Per-vertex transformations, skinning, displacement
  2. Tessellation (optional) - Subdivides patches into finer geometry
  3. Geometry Shader (optional) - Operates on whole primitives, can emit new geometry
  4. Clipping - Removes geometry outside the view frustum
  5. Screen Mapping - Maps NDC to window coordinates

Rasterization (Fixed-Function)

  • Determines which pixels (fragments) a primitive covers
  • Interpolates vertex attributes across the primitive using barycentric coordinates
  • Generates fragment data for the next stage

Pixel Processing (GPU)

  1. Fragment Shader - Computes per-pixel color, applies textures and lighting
  2. Depth/Stencil Test - Discards occluded or masked fragments
  3. Blending - Combines fragment color with framebuffer (for transparency)
  4. Output - Writes final color to framebuffer/render target

Coordinate Systems and Transformations

Transformation Chain

Model Space --(M_model)--> World Space --(M_view)--> View Space --(M_proj)--> Clip Space --(persp div)--> NDC --(viewport)--> Screen Space

Each transformation is represented by a 4x4 matrix applied to homogeneous coordinates.

Model (Object) Space

Local coordinate system of each mesh. The model matrix M transforms from object space to world space:

M_model = T * R * S

Where T = translation, R = rotation, S = scale. Order matters (right-to-left application).
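
The T * R * S composition can be sketched with plain row-major 4x4 matrices. This is an illustrative Python sketch; helper names such as `make_model` are not from any particular API:

```python
import math

def mat_mul(a, b):
    # Row-major 4x4 matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rotate_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def scale(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def make_model(t, r, s):
    # M_model = T * R * S: scale is applied first, then rotation, then translation.
    return mat_mul(t, mat_mul(r, s))

def transform(m, v):
    # Apply a 4x4 matrix to a homogeneous column vector.
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
```

For example, scaling (1, 0, 0) by 2, rotating 90 degrees about Z, then translating by (5, 0, 0) yields (5, 2, 0), confirming the right-to-left order.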

World Space

Common coordinate system for all objects. Positions lights and cameras in the scene.

View (Camera/Eye) Space

Camera is at the origin, looking along -Z (OpenGL convention) or +Z (DirectX).

The view matrix is the inverse of the camera's world transform:

M_view = [R|t]^(-1) = [R^T | -R^T * t]

Using lookAt(eye, center, up):

f = normalize(center - eye)        // forward
r = normalize(f x up)              // right
u = r x f                          // recalculated up

        | r_x   r_y   r_z   -r.eye |
M_view =| u_x   u_y   u_z   -u.eye |
        |-f_x  -f_y  -f_z    f.eye |
        |  0     0     0       1    |
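
The derivation above translates directly into a lookAt sketch (Python for illustration; not any specific library's API):

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def look_at(eye, center, up):
    # Rows are right, recomputed up, and -forward;
    # the last column is the -basis . eye translation from the derivation.
    f = normalize([c - e for c, e in zip(center, eye)])  # forward
    r = normalize(cross(f, up))                          # right
    u = cross(r, f)                                      # recalculated up
    return [[ r[0],  r[1],  r[2], -dot(r, eye)],
            [ u[0],  u[1],  u[2], -dot(u, eye)],
            [-f[0], -f[1], -f[2],  dot(f, eye)],
            [ 0,     0,     0,     1]]
```

A camera at (0, 0, 5) looking at the origin maps the origin to (0, 0, -5) in view space: five units along -Z, as the OpenGL convention requires.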

Clip Space and Projection

Projection transforms the view frustum into a canonical clip volume.

Homogeneous Coordinates

A 3D point (x, y, z) is represented as (x, y, z, w) where w != 0. The Cartesian point is recovered by dividing: (x/w, y/w, z/w).

Key properties:

  • Points at infinity: w = 0 (represents directions/vectors)
  • Enables translation via matrix multiplication
  • Perspective division (dividing by w) produces foreshortening

In matrix form, a 4x4 transform maps (x, y, z, 1) to (x', y', z', w'):

| x' |   | m00 m01 m02 m03 | | x |
| y' | = | m10 m11 m12 m13 | | y |
| z' |   | m20 m21 m22 m23 | | z |
| w' |   | m30 m31 m32 m33 | | 1 |
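
A quick illustration of the w = 1 vs w = 0 distinction, using a translation matrix (sketch only):

```python
def apply(m, v):
    # 4x4 matrix times homogeneous 4-vector.
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

# Translation by (3, 0, 0).
T = [[1, 0, 0, 3],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

point     = [1, 2, 0, 1]   # w = 1: positions are translated
direction = [1, 2, 0, 0]   # w = 0: directions ignore translation
```

Applying T moves the point to (4, 2, 0, 1) but leaves the direction at (1, 2, 0, 0): the translation column only contributes when w is nonzero.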

Projection Matrices

Perspective Projection

Maps a frustum to the clip cube. After perspective division by w, produces NDC in [-1,1]^3 (OpenGL) or [-1,1]^2 x [0,1] (DirectX).

Given field of view (fov), aspect ratio (a), near (n), and far (f) planes:

t = n * tan(fov/2)          // top
b = -t                      // bottom
r = t * a                   // right
l = -r                      // left

            | 2n/(r-l)    0       (r+l)/(r-l)      0        |
M_persp =   |    0     2n/(t-b)   (t+b)/(t-b)      0        |
            |    0        0      -(f+n)/(f-n)  -2fn/(f-n)    |
            |    0        0          -1              0        |

Symmetric frustum simplification (l = -r, b = -t):

            | 1/(a*tan(fov/2))       0              0             0       |
M_persp =   |        0          1/tan(fov/2)        0             0       |
            |        0               0         -(f+n)/(f-n)  -2fn/(f-n)   |
            |        0               0             -1             0       |
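
The symmetric matrix above can be built and sanity-checked in a few lines (Python for illustration):

```python
import math

def perspective(fov_y, aspect, n, f):
    # Symmetric OpenGL-style perspective matrix (row-major),
    # matching the simplified M_persp above.
    t = math.tan(fov_y / 2)
    return [[1 / (aspect * t), 0, 0, 0],
            [0, 1 / t, 0, 0],
            [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
            [0, 0, -1, 0]]

def project(m, v):
    # Transform to clip space, then perform the perspective divide.
    clip = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return [c / clip[3] for c in clip[:3]]
```

A point on the near plane (z_eye = -n) lands at z_ndc = -1 and a point on the far plane at z_ndc = +1, with w_clip = -z_eye as stated above.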

After multiplication, w_clip = -z_eye, and the perspective divide produces the depth nonlinearity (more precision near the near plane).

Reverse-Z

Maps the near plane to z = 1 and the far plane to z = 0. Because floating-point values are densest near 0, this counteracts the hyperbolic depth distribution and reduces z-fighting at large distances. Requires a floating-point depth buffer, a GL_GREATER (or GL_GEQUAL) depth test, a depth clear value of 0, and in OpenGL a [0, 1] clip-space depth range via glClipControl.
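
The precision claim can be checked numerically: far from the camera, conventional depth collapses nearby surfaces onto the same 32-bit float, while reverse-Z keeps them distinct. A rough sketch (float32 rounding simulated with `struct`; the [0, 1] depth formulas follow from the projection above):

```python
import struct

def to_f32(x):
    # Round a Python double to the nearest 32-bit float.
    return struct.unpack('f', struct.pack('f', x))[0]

def depth_standard(z, n, f):
    # Conventional [0, 1] window depth: near -> 0, far -> 1.
    return f * (z - n) / (z * (f - n))

def depth_reverse(z, n, f):
    # Reverse-Z mapping: near -> 1, far -> 0.
    return n * (f - z) / (z * (f - n))

n, f = 0.1, 10000.0
zs = [9000.0 + 0.1 * i for i in range(101)]  # 101 surfaces far from the camera

distinct_std = len({to_f32(depth_standard(z, n, f)) for z in zs})
distinct_rev = len({to_f32(depth_reverse(z, n, f)) for z in zs})
# Reverse-Z distinguishes far surfaces that standard depth merges together.
```

With these parameters the standard mapping squeezes all 101 samples into one or two representable floats near 1.0, while reverse-Z keeps every sample distinct.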

Orthographic Projection

No perspective foreshortening. Maps an axis-aligned box to the clip cube:

            | 2/(r-l)     0        0      -(r+l)/(r-l) |
M_ortho =   |    0     2/(t-b)     0      -(t+b)/(t-b) |
            |    0        0     -2/(f-n)   -(f+n)/(f-n) |
            |    0        0        0            1        |
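
As a sketch, the orthographic matrix maps the box corners directly to the clip-cube corners (Python for illustration):

```python
def orthographic(l, r, b, t, n, f):
    # OpenGL-style orthographic matrix (row-major), as given above.
    return [[2 / (r - l), 0, 0, -(r + l) / (r - l)],
            [0, 2 / (t - b), 0, -(t + b) / (t - b)],
            [0, 0, -2 / (f - n), -(f + n) / (f - n)],
            [0, 0, 0, 1]]

def apply(m, v):
    # 4x4 matrix times homogeneous 4-vector; w stays 1, so no divide is needed.
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
```

The far top-right corner (r, t, -f) maps to (1, 1, 1) and the near bottom-left corner (l, b, -n) to (-1, -1, -1), with w unchanged.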

Infinite Far Plane

Set f -> infinity. Useful for skyboxes and shadow maps:

M_persp_inf: replace row 3 with | 0  0  -1  -2n |

Viewport Transform

Maps NDC [-1,1]^2 to window coordinates [x, x+w] x [y, y+h]:

x_screen = (x_ndc + 1) / 2 * width + x_offset
y_screen = (y_ndc + 1) / 2 * height + y_offset
z_screen = (z_ndc + 1) / 2 * (far - near) + near     // depth range mapping (glDepthRange values, default [0,1])
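
These formulas are a direct translation (Python sketch, default [0, 1] depth range):

```python
def viewport(v_ndc, x_off, y_off, width, height, near=0.0, far=1.0):
    # Map NDC in [-1, 1] to window coordinates per the formulas above.
    x, y, z = v_ndc
    return [(x + 1) / 2 * width + x_off,
            (y + 1) / 2 * height + y_off,
            (z + 1) / 2 * (far - near) + near]
```

For a 1920x1080 viewport at the origin, the NDC corners (-1, -1) and (1, 1) map to (0, 0) and (1920, 1080), and the NDC center maps to the pixel center with depth 0.5.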

Clipping

Sutherland-Hodgman Algorithm

Clips polygons against each frustum plane sequentially. In clip space, the six frustum planes are:

-w <= x <= w
-w <= y <= w
-w <= z <= w      (OpenGL)
 0 <= z <= w      (DirectX/Vulkan)

For each plane, process each edge of the polygon:

  1. Both vertices inside: keep the second vertex
  2. First inside, second outside: output intersection point
  3. Both outside: output nothing
  4. First outside, second inside: output intersection and second vertex

Intersection parameter: t = d1 / (d1 - d2), where d1 and d2 are the signed distances of the edge's two endpoints to the plane.
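
The four edge cases translate directly into code. A sketch that clips a clip-space polygon one plane at a time, using the six OpenGL planes -w <= x, y, z <= w:

```python
def clip_against_plane(poly, dist):
    # One Sutherland-Hodgman pass: dist(v) is the signed distance of a
    # homogeneous vertex to the plane (>= 0 means inside).
    out = []
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        d1, d2 = dist(a), dist(b)
        if d1 >= 0 and d2 >= 0:          # both inside: keep second vertex
            out.append(b)
        elif d1 >= 0 and d2 < 0:         # leaving: emit intersection only
            t = d1 / (d1 - d2)
            out.append([a[k] + t * (b[k] - a[k]) for k in range(4)])
        elif d1 < 0 and d2 >= 0:         # entering: intersection, then b
            t = d1 / (d1 - d2)
            out.append([a[k] + t * (b[k] - a[k]) for k in range(4)])
            out.append(b)
        # both outside: emit nothing
    return out

def clip_to_frustum(poly):
    # OpenGL clip volume: -w <= x, y, z <= w.
    planes = [
        lambda v: v[3] - v[0],  # x <=  w
        lambda v: v[3] + v[0],  # x >= -w
        lambda v: v[3] - v[1],  # y <=  w
        lambda v: v[3] + v[1],  # y >= -w
        lambda v: v[3] - v[2],  # z <=  w
        lambda v: v[3] + v[2],  # z >= -w
    ]
    for dist in planes:
        poly = clip_against_plane(poly, dist)
        if not poly:
            break
    return poly
```

A triangle with one vertex past the x = w plane gains a vertex (it becomes a quad), a fully inside triangle passes through unchanged, and a fully outside one is discarded.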

Guard-Band Clipping

Extends the clip region beyond the viewport to reduce the number of triangles that need geometric clipping. Only triangles extending beyond the guard band are clipped; others are simply rasterized and scissored.

Putting It All Together

The full transformation for a vertex:

v_clip = M_proj * M_view * M_model * v_local
v_ndc  = v_clip.xyz / v_clip.w
v_screen = viewport(v_ndc)

The combined Model-View-Projection (MVP) matrix is often precomputed on the CPU and passed as a uniform to the vertex shader for efficiency.

Normal Transformation

Normals do not transform the same way as positions. The correct normal matrix is:

M_normal = (M_modelview^(-1))^T

This preserves perpendicularity under non-uniform scaling. For uniform scale and rotation only, the upper-left 3x3 of the model-view matrix suffices.
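
A sketch of the inverse-transpose via the cofactor matrix, which avoids a general matrix inverse since (M^-1)^T = cof(M) / det(M) (Python for illustration):

```python
def normal_matrix(m):
    # Upper-left 3x3 of the model-view matrix -> (M^-1)^T,
    # computed as the cofactor matrix divided by the determinant.
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    cof = [[e * i - f * h, f * g - d * i, d * h - e * g],
           [c * h - b * i, a * i - c * g, b * g - a * h],
           [b * f - c * e, c * d - a * f, a * e - b * d]]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    return [[x / det for x in row] for row in cof]
```

Under a non-uniform scale such as diag(2, 1, 1), a normal transformed by the normal matrix stays perpendicular to a transformed surface tangent, whereas the naively transformed normal does not.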

Practical Considerations

  • Depth precision: Logarithmic depth or reverse-Z mitigates z-fighting
  • Early-Z: GPUs can reject fragments before the fragment shader runs if depth is not modified in the shader
  • Instancing: Reuses the same geometry with different model matrices via a single draw call
  • Indirect rendering: GPU-driven draw calls reduce CPU overhead
  • Tile-based GPUs (mobile): Divide the screen into tiles, process each tile's geometry in on-chip memory to minimize bandwidth