Advanced Rendering
Overview
Advanced rendering techniques build on the basic pipeline to achieve higher visual quality, handle complex lighting scenarios, and optimize performance. These methods range from alternative pipeline architectures to screen-space effects and volumetric rendering.
Deferred Rendering
Motivation
Forward rendering evaluates lighting for every fragment, even those that are subsequently occluded. With many lights, this becomes O(fragments x lights), which is extremely expensive.
G-Buffer
Deferred rendering splits shading into two passes:
Geometry pass: Render scene geometry and store surface attributes into multiple render targets (the G-Buffer):
G-Buffer contents (typical):
RT0: Albedo.rgb + Metallic (RGBA8)
RT1: Normal.xyz + Roughness (RGB10A2 or RGBA16F)
RT2: Emissive.rgb + AO (RGBA8)
Depth: Hardware depth buffer (D32F)
Lighting pass: Render a full-screen quad (or light volumes). For each pixel, read G-Buffer data and compute lighting. Each light only shades the pixels it actually affects.
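The per-pixel work of the lighting pass can be sketched as reading stored attributes and looping over lights. This is a minimal Python sketch with Lambert-only shading; the dictionary layout (`albedo`, `normal`, `position`) and the light fields are illustrative assumptions, not a fixed G-Buffer format.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade_pixel(gbuffer, lights):
    """Deferred lighting for one pixel: read G-Buffer attributes, loop lights."""
    albedo = gbuffer["albedo"]
    normal = gbuffer["normal"]        # assumed already normalized
    position = gbuffer["position"]    # in practice reconstructed from depth
    color = [0.0, 0.0, 0.0]
    for light in lights:
        to_light = [l - p for l, p in zip(light["pos"], position)]
        dist = math.sqrt(dot(to_light, to_light))
        L = [c / dist for c in to_light]
        n_dot_l = max(dot(normal, L), 0.0)
        atten = light["intensity"] / (dist * dist)   # inverse-square falloff
        for i in range(3):
            color[i] += albedo[i] * light["color"][i] * n_dot_l * atten
    return color
```

Because the loop touches only pixels that survived the depth test, occluded surfaces contribute no lighting cost.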
Advantages and Disadvantages
Advantages:
- Lighting cost is O(pixels x lights_affecting_pixel)
- Decouples geometry complexity from lighting complexity
- Easy to add many lights
Disadvantages:
- High memory bandwidth (reading multiple textures per pixel)
- Difficult to handle transparency (no blending in G-Buffer)
- MSAA is expensive (must store G-Buffer per sample)
- Limited material variety (G-Buffer layout constrains material parameters)
Light Volumes
Instead of a full-screen pass per light, render the bounding geometry of each light:
- Point lights: sphere
- Spot lights: cone
- Directional lights: full-screen quad
Stencil optimization: mark pixels inside the light volume to skip shading where the light has no effect.
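For a point light with inverse-square falloff, the bounding-sphere radius can be chosen where intensity drops below a visibility cutoff. A hypothetical helper, with the cutoff value as a tunable assumption:

```python
import math

def point_light_radius(intensity, cutoff=0.01):
    """Radius at which an inverse-square light falls below `cutoff`.

    intensity / r^2 = cutoff  =>  r = sqrt(intensity / cutoff)
    """
    return math.sqrt(intensity / cutoff)
```

A smaller cutoff yields a larger (more conservative) sphere; engines typically clamp the radius to keep fill cost bounded.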
Forward+ (Tiled Forward)
Combines forward rendering's flexibility with deferred's light culling.
Algorithm
- Depth pre-pass: Render scene depth only
- Light culling (compute shader): Divide the screen into tiles (e.g., 16x16 pixels). For each tile, test all lights against the tile's frustum and min/max depth. Build a per-tile light list.
- Shading pass: Forward-render the scene. Each fragment reads its tile's light list and shades only relevant lights.
tile_index = (pixel.x / TILE_SIZE) + (pixel.y / TILE_SIZE) * num_tiles_x
light_list = tile_light_lists[tile_index]
for each light_index in light_list:
    color += shade(fragment, lights[light_index])
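The culling step can be sketched on the CPU, assuming each light already has a conservative screen-space rectangle and view-space depth range precomputed (a simplifying assumption; real implementations run this in a compute shader against per-tile frustum planes):

```python
def cull_lights_for_tile(tile_rect, tile_zmin, tile_zmax, lights):
    """Return indices of lights whose bounds overlap a screen tile.

    tile_rect = (x0, y0, x1, y1) in pixels; each light carries a projected
    screen-space box 'rect' and a view-space depth extent 'zmin'/'zmax'.
    """
    visible = []
    x0, y0, x1, y1 = tile_rect
    for i, light in enumerate(lights):
        lx0, ly0, lx1, ly1 = light["rect"]
        overlaps_xy = lx0 < x1 and lx1 > x0 and ly0 < y1 and ly1 > y0
        overlaps_z = light["zmin"] < tile_zmax and light["zmax"] > tile_zmin
        if overlaps_xy and overlaps_z:
            visible.append(i)
    return visible
```

The depth test against the tile's min/max depth is what makes the pre-pass worthwhile: without it, a distant light behind a wall would still land in the tile's list.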
Advantages over Deferred
- Supports MSAA natively
- Handles transparency
- No G-Buffer bandwidth cost
- More material flexibility
Clustered Shading
Extends tiling into 3D by subdividing the frustum along the depth axis (log-spaced slices). Each cluster is a small frustum volume. Reduces false positives from depth discontinuities within tiles.
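A common way to assign a fragment to a depth slice uses logarithmic spacing, so each slice covers an equal depth *ratio* rather than an equal distance. A sketch, with the slice count as a tunable assumption:

```python
import math

def cluster_z_slice(view_z, near, far, num_slices):
    """Logarithmic depth slice index for clustered shading."""
    return int(math.log(view_z / near) / math.log(far / near) * num_slices)
```

Combined with the 2D tile coordinates, this yields a 3D cluster index into the per-cluster light lists.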
Tile-Based Rendering (Mobile GPUs)
Mobile GPUs (Arm Mali, Apple, Qualcomm Adreno) divide the framebuffer into tiles processed in on-chip memory.
Geometry pass: Bin all triangles into tiles
Per-tile rendering: For each tile, rasterize and shade entirely in on-chip tile memory
Resolve: Write tile memory to main framebuffer
Implications for developers:
- Minimize render target loads/stores (e.g., Vulkan's LOAD_OP_CLEAR and STORE_OP_DONT_CARE)
- Use subpasses for deferred rendering to keep data on-chip
- Avoid reading from the framebuffer mid-pass
Screen-Space Effects
Screen-Space Ambient Occlusion (SSAO)
Approximates ambient occlusion using only depth buffer information.
Algorithm (normal-oriented hemisphere; Crytek's original SSAO sampled a full sphere around the point, which darkens even flat surfaces):
- For each pixel, sample random points in a hemisphere around the surface normal
- Project each sample to screen space, read its depth
- If the sample is closer than the stored depth, it contributes occlusion
occlusion = 0
for each sample_i in kernel:
    sample_pos = fragment_pos + TBN * sample_i * radius
    projected = project(sample_pos)            // to screen space
    sample_depth = depth_buffer[projected.xy]
    if sample_depth >= projected.z:            // stored geometry occludes the sample
        occlusion += range_check(sample_depth, fragment_depth)
ao = 1.0 - occlusion / num_samples
Noise from random kernel rotation is removed with a bilateral blur pass.
GTAO (Ground Truth AO): Integrates the visibility function along horizon angles in screen space. More accurate than random sampling, fewer artifacts.
HBAO+ (Horizon-Based AO): Ray-marches along the depth buffer in multiple directions per pixel. Physically motivated approach based on the horizon angle.
Screen-Space Reflections (SSR)
Ray-march along the reflection vector in screen space using the depth buffer.
Hi-Z tracing: Use a hierarchical depth buffer (min-max mip chain) for faster traversal. Start at a coarse mip level, step large intervals; refine at finer levels when a potential intersection is found.
Limitations: cannot reflect off-screen content, fails at edges, noisy. Fall back to reflection probes or IBL where SSR fails.
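The basic march (without Hi-Z acceleration) can be reduced to one dimension for illustration: a toy model where the depth buffer is a row of view-space depths and the "ray" advances in x and depth per step. The names and the hit criterion are illustrative assumptions:

```python
def ssr_march(depth_buffer, start_x, start_depth, dx, ddepth, max_steps=64):
    """Linear screen-space ray march; returns the hit column or None."""
    x, d = float(start_x), float(start_depth)
    for _ in range(max_steps):
        x += dx
        d += ddepth
        ix = int(x)
        if ix < 0 or ix >= len(depth_buffer):
            return None          # ray left the screen: SSR fails, fall back
        if d >= depth_buffer[ix]:
            return ix            # ray passed behind the stored surface
    return None
```

The two `None` paths correspond directly to the limitations above: off-screen content and step-count exhaustion both require a fallback to probes or IBL.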
Screen-Space Global Illumination (SSGI)
Trace short rays in screen space to approximate one bounce of indirect lighting. Combines SSR-like tracing with diffuse sampling.
Post-Processing
Bloom
Simulates light bleeding from bright areas.
1. Extract bright pixels: bright_pass = max(color - threshold, 0)
2. Downsample progressively (e.g., 1/2, 1/4, 1/8, 1/16 resolution)
3. Apply Gaussian blur at each level
4. Upsample and accumulate all levels
5. Add to the original image
The multi-resolution approach captures bloom at different scales efficiently.
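The chain above can be sketched in one dimension; real renderers run the same steps on 2D mip levels on the GPU. The threshold, kernel, and level count here are illustrative assumptions:

```python
def bright_pass(img, threshold):
    return [max(v - threshold, 0.0) for v in img]

def downsample(img):
    # Average adjacent pairs: half resolution.
    return [(img[i] + img[i + 1]) * 0.5 for i in range(0, len(img) - 1, 2)]

def blur3(img):
    n = len(img)
    # [1 2 1]/4 kernel with edge clamping.
    return [(img[max(i - 1, 0)] + 2.0 * img[i] + img[min(i + 1, n - 1)]) * 0.25
            for i in range(n)]

def upsample_to(img, n):
    out = list(img)
    while len(out) < n:
        out = [out[i // 2] for i in range(min(len(out) * 2, n))]
    return out

def bloom(img, threshold=1.0, levels=2):
    result = list(img)
    level = bright_pass(img, threshold)
    for _ in range(levels):
        level = blur3(downsample(level))
        result = [r + u for r, u in zip(result, upsample_to(level, len(img)))]
    return result
```

Each extra level blurs over a wider footprint at lower cost, which is why the multi-resolution chain is cheaper than one very wide blur at full resolution.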
Tone Mapping
Maps HDR radiance values to displayable LDR range [0,1].
Common operators:
Reinhard: L_mapped = L / (1 + L)
Reinhard extended: L_mapped = L * (1 + L/L_white^2) / (1 + L)
ACES filmic: Approximates film response curves (Academy Color Encoding System)
Uncharted 2: Custom curve with toe, shoulder, and linear section
AgX: Modern filmic mapping with improved color handling
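The two Reinhard operators listed above are simple enough to state directly, applied here to scalar luminance:

```python
def reinhard(L):
    """Basic Reinhard: asymptotically approaches 1, never reaches it."""
    return L / (1.0 + L)

def reinhard_extended(L, L_white):
    """Extended Reinhard: maps L == L_white exactly to 1.0."""
    return L * (1.0 + L / (L_white * L_white)) / (1.0 + L)
```

The extended form exists precisely so that a chosen white point burns out to full white instead of converging short of it.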
HDR Rendering Pipeline
Render to HDR buffer (RGBA16F) --> SSAO --> Lighting --> Bloom -->
Auto-exposure --> Tone mapping --> Color grading --> Gamma/sRGB output
Auto-exposure: Compute average luminance (log average or histogram), adjust exposure to match a target. Use temporal smoothing for gradual adaptation.
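A sketch of the log-average variant with exponential smoothing; the key value (`target`), smoothing factor, and epsilon are tunable assumptions:

```python
import math

def log_average_luminance(luminances, eps=1e-4):
    """Geometric mean of luminance; eps guards against log(0) on black pixels."""
    return math.exp(sum(math.log(l + eps) for l in luminances) / len(luminances))

def update_exposure(prev_exposure, luminances, target=0.18, smoothing=0.1):
    """Move exposure toward target / average luminance for gradual adaptation."""
    goal = target / log_average_luminance(luminances)
    return prev_exposure + (goal - prev_exposure) * smoothing
```

The log average is preferred over a plain mean because a few very bright pixels would otherwise dominate the exposure estimate.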
Depth of Field (DOF)
Simulates camera lens focus. Out-of-focus regions are blurred proportionally to the Circle of Confusion (CoC):
CoC = |aperture * focal_length * (focus_dist - z)| / (z * (focus_dist - focal_length))
Methods:
- Gather-based: For each pixel, sample neighbors weighted by CoC (expensive)
- Scatter-based: Splat each pixel as a bokeh-shaped sprite scaled by CoC
- Separable filter: Approximate with horizontal + vertical passes at half resolution
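The CoC formula above translates directly into a helper; all distances are assumed to be in the same units, with `aperture` as the lens diameter:

```python
def circle_of_confusion(aperture, focal_length, focus_dist, z):
    """Thin-lens circle of confusion for a point at depth z."""
    return abs(aperture * focal_length * (focus_dist - z)) / (
        z * (focus_dist - focal_length))
```

A point exactly at the focus distance yields a CoC of zero; the blur radius used by any of the three methods is typically this value clamped to a maximum in pixels.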
Motion Blur
Smear pixels along their velocity vectors.
Per-object: Store velocity in a G-Buffer channel (current vs previous frame position).
velocity = current_screen_pos - previous_screen_pos // screen-space UV offset per frame
blurred = 0
for i in 0..N:
    offset = velocity * (i / N - 0.5)
    blurred += sample(color_buffer, uv + offset)
blurred /= N
Camera motion blur: Derive velocity from the difference in view-projection matrices between frames.
Volumetric Rendering
Participating Media
Media (fog, smoke, clouds) that absorb, emit, and scatter light. Governed by the radiative transfer equation:
dL/ds = -sigma_t * L + sigma_a * L_e + sigma_s * integral f_p(w, w') L(w') dw'
Where:
- sigma_t = sigma_a + sigma_s (extinction = absorption + scattering)
- f_p = phase function (angular scattering distribution)
Beer-Lambert Law
Transmittance through homogeneous media:
T(d) = exp(-sigma_t * d)
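As a direct translation of the law:

```python
import math

def transmittance(sigma_t, distance):
    """Beer-Lambert transmittance through a homogeneous medium."""
    return math.exp(-sigma_t * distance)
```

Transmittance is 1 at zero distance and decays exponentially; doubling either the extinction coefficient or the path length squares the surviving fraction.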
Phase Functions
Henyey-Greenstein: Single parameter g in [-1, 1] controlling forward/backward scattering:
f_HG(cos_theta) = (1 - g^2) / (4*pi * (1 + g^2 - 2g*cos_theta)^(3/2))
g=0 is isotropic, g>0 is forward scattering (typical for fog/clouds).
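The Henyey-Greenstein formula above as a function of the scattering angle:

```python
import math

def hg_phase(cos_theta, g):
    """Henyey-Greenstein phase function; integrates to 1 over the sphere."""
    denom = (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g * g) / (4.0 * math.pi * denom)
```

With g = 0 the value is the isotropic constant 1/(4*pi) regardless of angle; positive g concentrates energy toward cos_theta = 1 (forward scattering).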
Ray Marching
Step along a ray through the volume, accumulating color and transmittance:
color = 0, transmittance = 1
for each step along the ray:
    density = sample_volume(position)
    T_step = exp(-sigma_t * density * step_size)
    in_scatter = compute_lighting(position) * sigma_s * density
    color += transmittance * in_scatter * step_size
    transmittance *= T_step
    if transmittance < epsilon: break   // early exit
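The loop above becomes runnable once the medium is simplified to homogeneous density and constant in-scattered light; the coefficients and step count here are illustrative assumptions:

```python
import math

def march(sigma_t, sigma_s, light, length, steps):
    """Ray-march a homogeneous medium with constant lighting."""
    step = length / steps
    color, transmittance = 0.0, 1.0
    for _ in range(steps):
        T_step = math.exp(-sigma_t * step)     # per-step extinction
        color += transmittance * light * sigma_s * step
        transmittance *= T_step
        if transmittance < 1e-4:               # early exit once nearly opaque
            break
    return color, transmittance
```

In this homogeneous case the march converges to the analytic result sigma_s/sigma_t * (1 - exp(-sigma_t * L)), which is a useful sanity check for a real implementation.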
Volumetric Fog (Froxels)
Divide the view frustum into a 3D grid of frustum-aligned voxels (froxels). For each froxel:
- Accumulate density and in-scattering from lights
- March front-to-back, computing accumulated extinction and in-scattering
- Store result in a 3D texture
- Sample during shading to apply fog
This amortizes the cost across the entire view and supports temporal reprojection for stability.
Cloud Rendering
- Model cloud density with layered noise (Perlin-Worley for base shape, curl noise for wispy detail)
- Ray-march through the cloud layer with adaptive step sizes
- Multi-scattering approximation: use an octave-based approach where each bounce reduces energy and broadens the phase function
- Beer-powder approximation: energy = 2 * exp(-d) * (1 - exp(-2d)) for the bright edges of clouds
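The beer-powder term in the last bullet, written out (a sketch; the leading factor of 2 is the common normalization so the peak stays near 1):

```python
import math

def beer_powder(d):
    """Beer extinction times an inverted 'powder' term.

    Darkens thin, front-lit regions (small d) and keeps the characteristic
    bright edges as optical depth d grows.
    """
    return 2.0 * math.exp(-d) * (1.0 - math.exp(-2.0 * d))
```

Unlike plain Beer-Lambert, this curve starts at zero, rises to a maximum at moderate optical depth, and then decays, which is what produces the dark bases and bright rims of rendered clouds.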