Advanced Data Structures
This file covers powerful data structures used in competitive programming, databases, graphics, and scientific computing.
Segment Trees
A segment tree answers range queries and supports point/range updates on an array in O(log n) per operation.
Structure
A complete binary tree where:
- Leaves store array elements.
- Internal nodes store aggregate values (sum, min, max, GCD, etc.) for their range.
Array: [1, 3, 5, 7, 9, 11]
Segment tree (range sums):
[36] (sum of [0..5])
/ \
[9] [27] (sum of [0..2], [3..5])
/ \ / \
[4] [5] [16] [11] (sum of [0..1], [2..2], [3..4], [5..5])
/ \ / \
[1] [3] [7] [9] (leaves)
Operations
Build: O(n). Query(l, r): O(log n) — combine relevant subtree nodes. Update(i, val): O(log n) — update leaf, propagate up.
STRUCTURE SegTree
n : integer
tree : array of integers
FUNCTION BUILD(arr)
n ← LENGTH(arr)
tree ← array of 4*n zeros
PROCEDURE BUILD_REC(v, l, r)
IF l = r THEN
tree[v] ← arr[l]
ELSE
mid ← (l + r) / 2
BUILD_REC(2*v, l, mid)
BUILD_REC(2*v + 1, mid + 1, r)
tree[v] ← tree[2*v] + tree[2*v + 1]
BUILD_REC(1, 0, n - 1)
RETURN new SegTree(n, tree)
FUNCTION QUERY(v, l, r, ql, qr)
IF ql > r OR qr < l THEN RETURN 0 // identity for sum
IF ql ≤ l AND r ≤ qr THEN RETURN tree[v]
mid ← (l + r) / 2
RETURN QUERY(2*v, l, mid, ql, qr) + QUERY(2*v + 1, mid + 1, r, ql, qr)
PROCEDURE UPDATE(v, l, r, pos, val)
IF l = r THEN
tree[v] ← val
ELSE
mid ← (l + r) / 2
IF pos ≤ mid THEN UPDATE(2*v, l, mid, pos, val)
ELSE UPDATE(2*v + 1, mid + 1, r, pos, val)
tree[v] ← tree[2*v] + tree[2*v + 1]
Lazy Propagation
For range updates (e.g., add val to all elements in [l, r]):
Store a "lazy" tag at each node. On update, mark the node lazy without propagating to children. On query, push lazy values down before processing.
Enables both range query and range update in O(log n).
Persistent Segment Tree
Create new versions of the tree without modifying previous versions. On update, create new path from root to updated leaf. Share unchanged subtrees.
Space: O(n + q log n) for n initial elements and q updates.
Applications: k-th smallest in a range, version control, functional data structures.
Fenwick Trees (Binary Indexed Trees)
A simpler alternative to segment trees for prefix queries and point updates.
Structure
Array-based. BIT[i] stores the sum of elements in a specific range determined by the lowest set bit of i.
BIT[1] = a[1]
BIT[2] = a[1] + a[2]
BIT[3] = a[3]
BIT[4] = a[1] + a[2] + a[3] + a[4]
BIT[5] = a[5]
BIT[6] = a[5] + a[6]
BIT[7] = a[7]
BIT[8] = a[1] + ... + a[8]
Operations
Prefix sum query(i): O(log n). Walk up by removing the lowest set bit. Point update(i, delta): O(log n). Walk up by adding the lowest set bit. Range query(l, r): query(r) - query(l-1).
STRUCTURE Fenwick
tree : array of integers
n : integer
FUNCTION NEW_FENWICK(n)
RETURN new Fenwick(array of (n + 1) zeros, n)
PROCEDURE UPDATE(i, delta)
WHILE i ≤ n DO
tree[i] ← tree[i] + delta
i ← i + LOWEST_SET_BIT(i) // i + (i AND -i)
FUNCTION QUERY(i)
sum ← 0
WHILE i > 0 DO
sum ← sum + tree[i]
i ← i - LOWEST_SET_BIT(i) // i - (i AND -i)
RETURN sum
FUNCTION RANGE_QUERY(l, r)
RETURN QUERY(r) - QUERY(l - 1)
Advantages over segment tree: Half the memory, faster constant factor, simpler code. Disadvantages: Harder to generalize to non-invertible operations (can't easily do range min/max).
2D Fenwick Tree
Nested Fenwick trees for 2D prefix sums. Update and query in O(log² n).
Interval Trees
Store intervals and efficiently find all intervals overlapping a query point or interval.
Augmented BST Approach
BST keyed by interval start. Each node stores the maximum end in its subtree.
Overlap query(point): O(k + log n) where k = number of overlapping intervals.
Overlap query(interval): Search both subtrees if they could contain overlapping intervals.
Applications: Calendar scheduling (find conflicting events), computational geometry, database temporal queries.
K-D Trees
A space-partitioning tree for k-dimensional points. Alternates splitting dimension at each level.
Construction
- Choose a dimension (cycle through dimensions by depth).
- Find the median point along that dimension.
- Median becomes the root. Left subtree = points below median. Right subtree = points above.
- Recurse.
Time: O(n log n).
Nearest Neighbor Query
- Recurse to the leaf containing the query point.
- Backtrack, checking if the other subtree could contain a closer point (using distance to splitting plane).
- Prune subtrees that can't contain closer points.
Average: O(log n). Worst case: O(n^(1-1/d) + k) for k nearest neighbors in d dimensions. Degrades in high dimensions.
Range Query
Find all points in a rectangular region. O(n^(1-1/d) + k).
Applications: Nearest neighbor search, range search, collision detection, ray tracing (photon mapping).
Range Trees
2D (or higher) range searching with O(log² n + k) query time.
Structure: Primary BST on x-coordinates. Each node stores a secondary BST on y-coordinates of all points in its subtree.
Fractional cascading: Technique to reduce query time from O(log² n) to O(log n + k) by threading pointers between secondary structures.
R-Trees
Spatial index for rectangles (bounding boxes) in 2D/3D.
Structure: Balanced tree where each node stores a minimum bounding rectangle (MBR) enclosing all children's MBRs.
Operations: Insert, delete, range query, nearest neighbor. All O(log n) expected.
Variants: R*-tree (optimized insertion), R+-tree (no overlap between siblings), Hilbert R-tree.
Used in: PostGIS (spatial database), spatial indexes in GIS, collision detection in games.
Quad-Trees and Octrees
Quad-Tree
Recursively subdivide 2D space into four quadrants.
Point quad-tree: Each node stores a point and divides space into 4 quadrants. Region quad-tree: Divide a grid into 4 equal sub-regions. Leaf stores a uniform value.
Applications: Image compression, spatial indexing, collision detection, Barnes-Hut algorithm (N-body simulation).
Octree
3D analog of quad-tree (8 children per node).
Applications: 3D spatial indexing, voxel rendering, point cloud processing, physics simulation (spatial hashing).
Wavelet Trees
A succinct data structure supporting rank, select, and access queries on sequences over alphabets.
Structure: Binary tree of bit vectors. Root splits alphabet in half. Left subtree handles lower half, right subtree handles upper half.
Operations: O(log σ) for access, rank, select (σ = alphabet size).
Applications: Compressed full-text indexes (FM-index), document retrieval, quantile queries, range frequency queries.
Comparison
| Structure | Build | Query | Update | Space | Best For | |---|---|---|---|---|---| | Segment Tree | O(n) | O(log n) | O(log n) | O(n) | Range queries + updates | | Fenwick Tree | O(n) | O(log n) | O(log n) | O(n) | Prefix sums | | Interval Tree | O(n log n) | O(log n + k) | O(log n) | O(n) | Interval overlap | | K-D Tree | O(n log n) | O(√n + k) | O(log n) | O(n) | Spatial nearest neighbor | | R-Tree | O(n log n) | O(log n + k) | O(log n) | O(n) | Spatial rectangles | | Quad/Octree | O(n log n) | O(log n) | O(log n) | O(n) | Spatial subdivision |
Applications in CS
- Competitive programming: Segment trees and Fenwick trees are among the most used data structures in contests.
- Databases: R-trees for spatial queries (PostGIS). Interval trees for temporal queries. Segment trees for range aggregations.
- Computer graphics: Octrees for spatial partitioning, K-D trees for photon mapping, BVH (related to R-trees) for ray tracing.
- Game development: Quad/octrees for collision detection, spatial partitioning, level-of-detail.
- GIS: R-trees for geographic data indexing. Quad-trees for map tiling.
- Computational geometry: Range trees for orthogonal range searching. Interval trees for sweep line algorithms.
- Text indexing: Wavelet trees for compressed indexes, document retrieval.