Advanced Rendering
Overview
Advanced rendering techniques build on the basic pipeline to achieve higher visual quality, handle complex lighting scenarios, and optimize performance. These methods range from alternative pipeline architectures to screen-space effects and volumetric rendering.
Deferred Rendering
Motivation
Forward rendering evaluates lighting for every fragment, even those that are subsequently occluded. With many lights, this becomes O(fragments x lights), which is extremely expensive.
G-Buffer
Deferred rendering splits shading into two passes:
Geometry pass: Render scene geometry and store surface attributes into multiple render targets (the G-Buffer):
G-Buffer contents (typical):
RT0: Albedo.rgb + Metallic (RGBA8)
RT1: Normal.xyz + Roughness (RGB10A2 or RGBA16F)
RT2: Emissive.rgb + AO (RGBA8)
Depth: Hardware depth buffer (D32F)
Lighting pass: Render a full-screen quad (or light volumes). For each pixel, read G-Buffer data and compute lighting. Each light only shades the pixels it actually affects.
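The per-pixel work of the lighting pass can be sketched as reading stored attributes and looping over lights. This is a minimal Python sketch with Lambert-only shading; the dictionary layout (`albedo`, `normal`, `position`) and the light fields are illustrative assumptions, not a fixed G-Buffer format.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade_pixel(gbuffer, lights):
    """Deferred lighting for one pixel: read G-Buffer attributes, loop lights."""
    albedo = gbuffer["albedo"]
    normal = gbuffer["normal"]        # assumed already normalized
    position = gbuffer["position"]    # in practice reconstructed from depth
    color = [0.0, 0.0, 0.0]
    for light in lights:
        to_light = [l - p for l, p in zip(light["pos"], position)]
        dist = math.sqrt(dot(to_light, to_light))
        L = [c / dist for c in to_light]
        n_dot_l = max(dot(normal, L), 0.0)
        atten = light["intensity"] / (dist * dist)   # inverse-square falloff
        for i in range(3):
            color[i] += albedo[i] * light["color"][i] * n_dot_l * atten
    return color
```

Because the loop touches only pixels that survived the depth test, occluded surfaces contribute no lighting cost.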
Advantages and Disadvantages
Advantages:
- Lighting cost is O(pixels x lights_affecting_pixel)
- Decouples geometry complexity from lighting complexity
- Easy to add many lights
Disadvantages:
- High memory bandwidth (reading multiple textures per pixel)
- Difficult to handle transparency (no blending in G-Buffer)
- MSAA is expensive (must store G-Buffer per sample)
- Limited material variety (G-Buffer layout constrains material parameters)
Light Volumes
Instead of a full-screen pass per light, render the bounding geometry of each light:
- Point lights: sphere
- Spot lights: cone
- Directional lights: full-screen quad
Stencil optimization: mark pixels inside the light volume to skip shading where the light has no effect.
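For a point light with inverse-square falloff, the bounding-sphere radius can be chosen where intensity drops below a visibility cutoff. A hypothetical helper, with the cutoff value as a tunable assumption:

```python
import math

def point_light_radius(intensity, cutoff=0.01):
    """Radius at which an inverse-square light falls below `cutoff`.

    intensity / r^2 = cutoff  =>  r = sqrt(intensity / cutoff)
    """
    return math.sqrt(intensity / cutoff)
```

A smaller cutoff yields a larger (more conservative) sphere; engines typically clamp the radius to keep fill cost bounded.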
Forward+ (Tiled Forward)
Combines forward rendering's flexibility with deferred's light culling.
Algorithm
- Depth pre-pass: Render scene depth only
- Light culling (compute shader): Divide the screen into tiles (e.g., 16x16 pixels). For each tile, test all lights against the tile's frustum and min/max depth. Build a per-tile light list.
- Shading pass: Forward-render the scene. Each fragment reads its tile's light list and shades only relevant lights.
tile_index = (pixel.x / TILE_SIZE) + (pixel.y / TILE_SIZE) * num_tiles_x
light_list = tile_light_lists[tile_index]
for each light_index in light_list:
    color += shade(fragment, lights[light_index])
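The culling step can be sketched on the CPU, assuming each light already has a conservative screen-space rectangle and view-space depth range precomputed (a simplifying assumption; real implementations run this in a compute shader against per-tile frustum planes):

```python
def cull_lights_for_tile(tile_rect, tile_zmin, tile_zmax, lights):
    """Return indices of lights whose bounds overlap a screen tile.

    tile_rect = (x0, y0, x1, y1) in pixels; each light carries a projected
    screen-space box 'rect' and a view-space depth extent 'zmin'/'zmax'.
    """
    visible = []
    x0, y0, x1, y1 = tile_rect
    for i, light in enumerate(lights):
        lx0, ly0, lx1, ly1 = light["rect"]
        overlaps_xy = lx0 < x1 and lx1 > x0 and ly0 < y1 and ly1 > y0
        overlaps_z = light["zmin"] < tile_zmax and light["zmax"] > tile_zmin
        if overlaps_xy and overlaps_z:
            visible.append(i)
    return visible
```

The depth test against the tile's min/max depth is what makes the pre-pass worthwhile: without it, a distant light behind a wall would still land in the tile's list.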
Advantages over Deferred
- Supports MSAA natively
- Handles transparency
- No G-Buffer bandwidth cost
- More material flexibility
Clustered Shading
Extends tiling into 3D by subdividing the frustum along the depth axis (log-spaced slices). Each cluster is a small frustum volume. Reduces false positives from depth discontinuities within tiles.
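A common way to assign a fragment to a depth slice uses logarithmic spacing, so each slice covers an equal depth *ratio* rather than an equal distance. A sketch, with the slice count as a tunable assumption:

```python
import math

def cluster_z_slice(view_z, near, far, num_slices):
    """Logarithmic depth slice index for clustered shading."""
    return int(math.log(view_z / near) / math.log(far / near) * num_slices)
```

Combined with the 2D tile coordinates, this yields a 3D cluster index into the per-cluster light lists.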
Tile-Based Rendering (Mobile GPUs)
Mobile GPUs (Arm Mali, Apple, Qualcomm Adreno) divide the framebuffer into tiles processed in on-chip memory.
Geometry pass: Bin all triangles into tiles
Per-tile rendering: For each tile, rasterize and shade entirely in on-chip tile memory
Resolve: Write tile memory to main framebuffer
Implications for developers:
- Minimize render target loads/stores (e.g., Vulkan's LOAD_OP_CLEAR and STORE_OP_DONT_CARE)
- Use subpasses for deferred rendering to keep data on-chip
- Avoid reading from the framebuffer mid-pass
Screen-Space Effects
Screen-Space Ambient Occlusion (SSAO)
Approximates ambient occlusion using only depth buffer information.
Algorithm (normal-oriented hemisphere; Crytek's original SSAO sampled a full sphere around the point, which darkens even flat surfaces):
- For each pixel, sample random points in a hemisphere around the surface normal
- Project each sample to screen space, read its depth
- If the sample is closer than the stored depth, it contributes occlusion
occlusion = 0
for each sample_i in kernel:
    sample_pos = fragment_pos + TBN * sample_i * radius
    projected = project(sample_pos)            // to screen space
    sample_depth = depth_buffer[projected.xy]
    if sample_depth >= projected.z:            // stored geometry occludes the sample
        occlusion += range_check(sample_depth, fragment_depth)
ao = 1.0 - occlusion / num_samples
Noise from random kernel rotation is removed with a bilateral blur pass.
GTAO (Ground Truth AO): Integrates the visibility function along horizon angles in screen space. More accurate than random sampling, fewer artifacts.
HBAO+ (Horizon-Based AO): Ray-marches along the depth buffer in multiple directions per pixel. Physically motivated approach based on the horizon angle.
Screen-Space Reflections (SSR)
Ray-march along the reflection vector in screen space using the depth buffer.
Hi-Z tracing: Use a hierarchical depth buffer (min-max mip chain) for faster traversal. Start at a coarse mip level, step large intervals; refine at finer levels when a potential intersection is found.
Limitations: cannot reflect off-screen content, fails at edges, noisy. Fall back to reflection probes or IBL where SSR fails.
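The basic march (without Hi-Z acceleration) can be reduced to one dimension for illustration: a toy model where the depth buffer is a row of view-space depths and the "ray" advances in x and depth per step. The names and the hit criterion are illustrative assumptions:

```python
def ssr_march(depth_buffer, start_x, start_depth, dx, ddepth, max_steps=64):
    """Linear screen-space ray march; returns the hit column or None."""
    x, d = float(start_x), float(start_depth)
    for _ in range(max_steps):
        x += dx
        d += ddepth
        ix = int(x)
        if ix < 0 or ix >= len(depth_buffer):
            return None          # ray left the screen: SSR fails, fall back
        if d >= depth_buffer[ix]:
            return ix            # ray passed behind the stored surface
    return None
```

The two `None` paths correspond directly to the limitations above: off-screen content and step-count exhaustion both require a fallback to probes or IBL.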
Screen-Space Global Illumination (SSGI)
Trace short rays in screen space to approximate one bounce of indirect lighting. Combines SSR-like tracing with diffuse sampling.
Post-Processing
Bloom
Simulates light bleeding from bright areas.
1. Extract bright pixels: bright_pass = max(color - threshold, 0)
2. Downsample progressively (e.g., 1/2, 1/4, 1/8, 1/16 resolution)
3. Apply Gaussian blur at each level
4. Upsample and accumulate all levels
5. Add to the original image
The multi-resolution approach captures bloom at different scales efficiently.
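The chain above can be sketched in one dimension; real renderers run the same steps on 2D mip levels on the GPU. The threshold, kernel, and level count here are illustrative assumptions:

```python
def bright_pass(img, threshold):
    return [max(v - threshold, 0.0) for v in img]

def downsample(img):
    # Average adjacent pairs: half resolution.
    return [(img[i] + img[i + 1]) * 0.5 for i in range(0, len(img) - 1, 2)]

def blur3(img):
    n = len(img)
    # [1 2 1]/4 kernel with edge clamping.
    return [(img[max(i - 1, 0)] + 2.0 * img[i] + img[min(i + 1, n - 1)]) * 0.25
            for i in range(n)]

def upsample_to(img, n):
    out = list(img)
    while len(out) < n:
        out = [out[i // 2] for i in range(min(len(out) * 2, n))]
    return out

def bloom(img, threshold=1.0, levels=2):
    result = list(img)
    level = bright_pass(img, threshold)
    for _ in range(levels):
        level = blur3(downsample(level))
        result = [r + u for r, u in zip(result, upsample_to(level, len(img)))]
    return result
```

Each extra level blurs over a wider footprint at lower cost, which is why the multi-resolution chain is cheaper than one very wide blur at full resolution.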
Tone Mapping
Maps HDR radiance values to displayable LDR range [0,1].
Common operators:
Reinhard: L_mapped = L / (1 + L)
Reinhard extended: L_mapped = L * (1 + L/L_white^2) / (1 + L)
ACES filmic: Approximates film response curves (Academy Color Encoding System)
Uncharted 2: Custom curve with toe, shoulder, and linear section
AgX: Modern filmic mapping with improved color handling
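The two Reinhard operators listed above are simple enough to state directly, applied here to scalar luminance:

```python
def reinhard(L):
    """Basic Reinhard: asymptotically approaches 1, never reaches it."""
    return L / (1.0 + L)

def reinhard_extended(L, L_white):
    """Extended Reinhard: maps L == L_white exactly to 1.0."""
    return L * (1.0 + L / (L_white * L_white)) / (1.0 + L)
```

The extended form exists precisely so that a chosen white point burns out to full white instead of converging short of it.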
HDR Rendering Pipeline
Render to HDR buffer (RGBA16F) --> SSAO --> Lighting --> Bloom -->
Auto-exposure --> Tone mapping --> Color grading --> Gamma/sRGB output
Auto-exposure: Compute average luminance (log average or histogram), adjust exposure to match a target. Use temporal smoothing for gradual adaptation.
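A sketch of the log-average variant with exponential smoothing; the key value (`target`), smoothing factor, and epsilon are tunable assumptions:

```python
import math

def log_average_luminance(luminances, eps=1e-4):
    """Geometric mean of luminance; eps guards against log(0) on black pixels."""
    return math.exp(sum(math.log(l + eps) for l in luminances) / len(luminances))

def update_exposure(prev_exposure, luminances, target=0.18, smoothing=0.1):
    """Move exposure toward target / average luminance for gradual adaptation."""
    goal = target / log_average_luminance(luminances)
    return prev_exposure + (goal - prev_exposure) * smoothing
```

The log average is preferred over a plain mean because a few very bright pixels would otherwise dominate the exposure estimate.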
Depth of Field (DOF)
Simulates camera lens focus. Out-of-focus regions are blurred proportionally to the Circle of Confusion (CoC):
CoC = |aperture * focal_length * (focus_dist - z)| / (z * (focus_dist - focal_length))
Methods:
- Gather-based: For each pixel, sample neighbors weighted by CoC (expensive)
- Scatter-based: Splat each pixel as a bokeh-shaped sprite scaled by CoC
- Separable filter: Approximate with horizontal + vertical passes at half resolution
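The CoC formula above translates directly into a helper; all distances are assumed to be in the same units, with `aperture` as the lens diameter:

```python
def circle_of_confusion(aperture, focal_length, focus_dist, z):
    """Thin-lens circle of confusion for a point at depth z."""
    return abs(aperture * focal_length * (focus_dist - z)) / (
        z * (focus_dist - focal_length))
```

A point exactly at the focus distance yields a CoC of zero; the blur radius used by any of the three methods is typically this value clamped to a maximum in pixels.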
Motion Blur
Smear pixels along their velocity vectors.
Per-object: Store velocity in a G-Buffer channel (current vs previous frame position).
velocity = current_screen_pos - previous_screen_pos // screen-space UV offset per frame
blurred = 0
for i in 0..N:
    offset = velocity * (i / N - 0.5)
    blurred += sample(color_buffer, uv + offset)
blurred /= N
Camera motion blur: Derive velocity from the difference in view-projection matrices between frames.
Volumetric Rendering
Participating Media
Media (fog, smoke, clouds) that absorb, emit, and scatter light. Governed by the radiative transfer equation:
dL/ds = -sigma_t * L + sigma_a * L_e + sigma_s * integral f_p(w, w') L(w') dw'
Where:
- sigma_t = sigma_a + sigma_s (extinction = absorption + scattering)
- f_p = phase function (angular scattering distribution)
Beer-Lambert Law
Transmittance through homogeneous media:
T(d) = exp(-sigma_t * d)
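As a direct translation of the law:

```python
import math

def transmittance(sigma_t, distance):
    """Beer-Lambert transmittance through a homogeneous medium."""
    return math.exp(-sigma_t * distance)
```

Transmittance is 1 at zero distance and decays exponentially; doubling either the extinction coefficient or the path length squares the surviving fraction.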
Phase Functions
Henyey-Greenstein: Single parameter g in [-1, 1] controlling forward/backward scattering:
f_HG(cos_theta) = (1 - g^2) / (4*pi * (1 + g^2 - 2g*cos_theta)^(3/2))
g=0 is isotropic, g>0 is forward scattering (typical for fog/clouds).
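The Henyey-Greenstein formula above as a function of the scattering angle:

```python
import math

def hg_phase(cos_theta, g):
    """Henyey-Greenstein phase function; integrates to 1 over the sphere."""
    denom = (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g * g) / (4.0 * math.pi * denom)
```

With g = 0 the value is the isotropic constant 1/(4*pi) regardless of angle; positive g concentrates energy toward cos_theta = 1 (forward scattering).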
Ray Marching
Step along a ray through the volume, accumulating color and transmittance:
color = 0, transmittance = 1
for each step along the ray:
    density = sample_volume(position)
    T_step = exp(-sigma_t * density * step_size)
    in_scatter = compute_lighting(position) * sigma_s * density
    color += transmittance * in_scatter * step_size
    transmittance *= T_step
    if transmittance < epsilon: break   // early exit
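The loop above becomes runnable once the medium is simplified to homogeneous density and constant in-scattered light; the coefficients and step count here are illustrative assumptions:

```python
import math

def march(sigma_t, sigma_s, light, length, steps):
    """Ray-march a homogeneous medium with constant lighting."""
    step = length / steps
    color, transmittance = 0.0, 1.0
    for _ in range(steps):
        T_step = math.exp(-sigma_t * step)     # per-step extinction
        color += transmittance * light * sigma_s * step
        transmittance *= T_step
        if transmittance < 1e-4:               # early exit once nearly opaque
            break
    return color, transmittance
```

In this homogeneous case the march converges to the analytic result sigma_s/sigma_t * (1 - exp(-sigma_t * L)), which is a useful sanity check for a real implementation.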
Volumetric Fog (Froxels)
Divide the view frustum into a 3D grid of frustum-aligned voxels (froxels). For each froxel:
- Accumulate density and in-scattering from lights
- March front-to-back, computing accumulated extinction and in-scattering
- Store result in a 3D texture
- Sample during shading to apply fog
This amortizes the cost across the entire view and supports temporal reprojection for stability.
Cloud Rendering
- Model cloud density with layered noise (Perlin-Worley for base shape, curl noise for wispy detail)
- Ray-march through the cloud layer with adaptive step sizes
- Multi-scattering approximation: use an octave-based approach where each bounce reduces energy and broadens the phase function
- Beer-powder approximation: energy = 2 * exp(-d) * (1 - exp(-2d)) for the bright edges of clouds
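The beer-powder term in the last bullet, written out (a sketch; the leading factor of 2 is the common normalization so the peak stays near 1):

```python
import math

def beer_powder(d):
    """Beer extinction times an inverted 'powder' term.

    Darkens thin, front-lit regions (small d) and keeps the characteristic
    bright edges as optical depth d grows.
    """
    return 2.0 * math.exp(-d) * (1.0 - math.exp(-2.0 * d))
```

Unlike plain Beer-Lambert, this curve starts at zero, rises to a maximum at moderate optical depth, and then decays, which is what produces the dark bases and bright rims of rendered clouds.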