5 min read
On this page

Advanced Data Structures

This file covers powerful data structures used in competitive programming, databases, graphics, and scientific computing.

Segment Trees

A segment tree answers range queries and supports point/range updates on an array in O(log n) per operation.

Structure

A complete binary tree where:

  • Leaves store array elements.
  • Internal nodes store aggregate values (sum, min, max, GCD, etc.) for their range.
Array: [1, 3, 5, 7, 9, 11]

Segment tree (range sums):
           [36]           (sum of [0..5])
          /     \
       [9]      [27]      (sum of [0..2], [3..5])
      /   \    /    \
    [4]  [5] [16]  [11]   (sum of [0..1], [2..2], [3..4], [5..5])
   / \       /  \
 [1] [3]   [7]  [9]       (leaves)

Operations

Build: O(n). Query(l, r): O(log n) — combine relevant subtree nodes. Update(i, val): O(log n) — update leaf, propagate up.

STRUCTURE SegTree
    n    : integer
    tree : array of integers

FUNCTION BUILD(arr)
    n ← LENGTH(arr)
    tree ← array of 4*n zeros

    PROCEDURE BUILD_REC(v, l, r)
        IF l = r THEN
            tree[v] ← arr[l]
        ELSE
            mid ← (l + r) / 2
            BUILD_REC(2*v, l, mid)
            BUILD_REC(2*v + 1, mid + 1, r)
            tree[v] ← tree[2*v] + tree[2*v + 1]

    BUILD_REC(1, 0, n - 1)
    RETURN new SegTree(n, tree)

FUNCTION QUERY(v, l, r, ql, qr)
    IF ql > r OR qr < l THEN RETURN 0  // identity for sum
    IF ql ≤ l AND r ≤ qr THEN RETURN tree[v]
    mid ← (l + r) / 2
    RETURN QUERY(2*v, l, mid, ql, qr) + QUERY(2*v + 1, mid + 1, r, ql, qr)

PROCEDURE UPDATE(v, l, r, pos, val)
    IF l = r THEN
        tree[v] ← val
    ELSE
        mid ← (l + r) / 2
        IF pos ≤ mid THEN UPDATE(2*v, l, mid, pos, val)
        ELSE UPDATE(2*v + 1, mid + 1, r, pos, val)
        tree[v] ← tree[2*v] + tree[2*v + 1]

Lazy Propagation

For range updates (e.g., add val to all elements in [l, r]):

Store a "lazy" tag at each node. On update, mark the node lazy without propagating to children. On query, push lazy values down before processing.

Enables both range query and range update in O(log n).

Persistent Segment Tree

Create new versions of the tree without modifying previous versions. On update, create new path from root to updated leaf. Share unchanged subtrees.

Space: O(n + q log n) for n initial elements and q updates.

Applications: k-th smallest in a range, version control, functional data structures.

Fenwick Trees (Binary Indexed Trees)

A simpler alternative to segment trees for prefix queries and point updates.

Structure

Array-based. BIT[i] stores the sum of elements in a specific range determined by the lowest set bit of i.

BIT[1] = a[1]
BIT[2] = a[1] + a[2]
BIT[3] = a[3]
BIT[4] = a[1] + a[2] + a[3] + a[4]
BIT[5] = a[5]
BIT[6] = a[5] + a[6]
BIT[7] = a[7]
BIT[8] = a[1] + ... + a[8]

Operations

Prefix sum query(i): O(log n). Walk up by removing the lowest set bit. Point update(i, delta): O(log n). Walk up by adding the lowest set bit. Range query(l, r): query(r) - query(l-1).

STRUCTURE Fenwick
    tree : array of integers
    n    : integer

FUNCTION NEW_FENWICK(n)
    RETURN new Fenwick(array of (n + 1) zeros, n)

PROCEDURE UPDATE(i, delta)
    WHILE i ≤ n DO
        tree[i] ← tree[i] + delta
        i ← i + LOWEST_SET_BIT(i)  // i + (i AND -i)

FUNCTION QUERY(i)
    sum ← 0
    WHILE i > 0 DO
        sum ← sum + tree[i]
        i ← i - LOWEST_SET_BIT(i)  // i - (i AND -i)
    RETURN sum

FUNCTION RANGE_QUERY(l, r)
    RETURN QUERY(r) - QUERY(l - 1)

Advantages over segment tree: Half the memory, faster constant factor, simpler code. Disadvantages: Harder to generalize to non-invertible operations (can't easily do range min/max).

2D Fenwick Tree

Nested Fenwick trees for 2D prefix sums. Update and query in O(log² n).

Interval Trees

Store intervals and efficiently find all intervals overlapping a query point or interval.

Augmented BST Approach

BST keyed by interval start. Each node stores the maximum end in its subtree.

Overlap query(point): O(k + log n) where k = number of overlapping intervals.

Overlap query(interval): Search both subtrees if they could contain overlapping intervals.

Applications: Calendar scheduling (find conflicting events), computational geometry, database temporal queries.

K-D Trees

A space-partitioning tree for k-dimensional points. Alternates splitting dimension at each level.

Construction

  1. Choose a dimension (cycle through dimensions by depth).
  2. Find the median point along that dimension.
  3. Median becomes the root. Left subtree = points below median. Right subtree = points above.
  4. Recurse.

Time: O(n log n).

Nearest Neighbor Query

  1. Recurse to the leaf containing the query point.
  2. Backtrack, checking if the other subtree could contain a closer point (using distance to splitting plane).
  3. Prune subtrees that can't contain closer points.

Average: O(log n). Worst case: O(n^(1-1/d) + k) for k nearest neighbors in d dimensions. Degrades in high dimensions.

Range Query

Find all points in a rectangular region. O(n^(1-1/d) + k).

Applications: Nearest neighbor search, range search, collision detection, ray tracing (photon mapping).

Range Trees

2D (or higher) range searching with O(log² n + k) query time.

Structure: Primary BST on x-coordinates. Each node stores a secondary BST on y-coordinates of all points in its subtree.

Fractional cascading: Technique to reduce query time from O(log² n) to O(log n + k) by threading pointers between secondary structures.

R-Trees

Spatial index for rectangles (bounding boxes) in 2D/3D.

Structure: Balanced tree where each node stores a minimum bounding rectangle (MBR) enclosing all children's MBRs.

Operations: Insert, delete, range query, nearest neighbor. All O(log n) expected.

Variants: R*-tree (optimized insertion), R+-tree (no overlap between siblings), Hilbert R-tree.

Used in: PostGIS (spatial database), spatial indexes in GIS, collision detection in games.

Quad-Trees and Octrees

Quad-Tree

Recursively subdivide 2D space into four quadrants.

Point quad-tree: Each node stores a point and divides space into 4 quadrants. Region quad-tree: Divide a grid into 4 equal sub-regions. Leaf stores a uniform value.

Applications: Image compression, spatial indexing, collision detection, Barnes-Hut algorithm (N-body simulation).

Octree

3D analog of quad-tree (8 children per node).

Applications: 3D spatial indexing, voxel rendering, point cloud processing, physics simulation (spatial hashing).

Wavelet Trees

A succinct data structure supporting rank, select, and access queries on sequences over alphabets.

Structure: Binary tree of bit vectors. Root splits alphabet in half. Left subtree handles lower half, right subtree handles upper half.

Operations: O(log σ) for access, rank, select (σ = alphabet size).

Applications: Compressed full-text indexes (FM-index), document retrieval, quantile queries, range frequency queries.

Comparison

| Structure | Build | Query | Update | Space | Best For | |---|---|---|---|---|---| | Segment Tree | O(n) | O(log n) | O(log n) | O(n) | Range queries + updates | | Fenwick Tree | O(n) | O(log n) | O(log n) | O(n) | Prefix sums | | Interval Tree | O(n log n) | O(log n + k) | O(log n) | O(n) | Interval overlap | | K-D Tree | O(n log n) | O(√n + k) | O(log n) | O(n) | Spatial nearest neighbor | | R-Tree | O(n log n) | O(log n + k) | O(log n) | O(n) | Spatial rectangles | | Quad/Octree | O(n log n) | O(log n) | O(log n) | O(n) | Spatial subdivision |

Applications in CS

  • Competitive programming: Segment trees and Fenwick trees are among the most used data structures in contests.
  • Databases: R-trees for spatial queries (PostGIS). Interval trees for temporal queries. Segment trees for range aggregations.
  • Computer graphics: Octrees for spatial partitioning, K-D trees for photon mapping, BVH (related to R-trees) for ray tracing.
  • Game development: Quad/octrees for collision detection, spatial partitioning, level-of-detail.
  • GIS: R-trees for geographic data indexing. Quad-trees for map tiling.
  • Computational geometry: Range trees for orthogonal range searching. Interval trees for sweep line algorithms.
  • Text indexing: Wavelet trees for compressed indexes, document retrieval.