7 min read
On this page

Range Searching

Problem Statement

Spatial Data Structures Overview

Given a set SS of nn points in Rd\mathbb{R}^d and a query region QQ, report/count the points in SQS \cap Q. The region type defines the problem variant:

  • Orthogonal range searching: QQ is an axis-aligned box [a1,b1]×[a2,b2]××[ad,bd][a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d]
  • Halfplane/halfspace range searching: QQ is a halfplane {p:apb}\{p : a \cdot p \leq b\}
  • Simplex range searching: QQ is a simplex (triangle in 2D, tetrahedron in 3D)
  • Circular/spherical range searching: QQ is a disk/ball

Trade-offs are measured by preprocessing time, space, and query time.

Range Trees

1D Range Searching

A balanced BST on the points supports range queries [a,b][a, b]:

  • Search for aa and bb in the tree; the answer consists of all points in the subtrees between the two search paths
  • Query time: O(logn+k)O(\log n + k) where kk is the output size
  • Space: O(n)O(n)

2D Range Trees

A two-level tree:

  • Primary tree: balanced BST on the xx-coordinates
  • Secondary (associated) structure: each internal node vv stores a BST on the yy-coordinates of all points in vv's subtree

Query for [x1,x2]×[y1,y2][x_1, x_2] \times [y_1, y_2]:

  1. Search for x1x_1 and x2x_2 in the primary tree, identifying O(logn)O(\log n) canonical subtrees
  2. In each canonical subtree's associated BST, query for [y1,y2][y_1, y_2]

Complexity: O(log2n+k)O(\log^2 n + k) query time, O(nlogn)O(n \log n) space (each point stored in O(logn)O(\log n) secondary structures).

d-Dimensional Range Trees

Recursively nest BSTs for each dimension:

  • Query: O(logdn+k)O(\log^d n + k)
  • Space: O(nlogd1n)O(n \log^{d-1} n)
  • Preprocessing: O(nlogd1n)O(n \log^{d-1} n)

Fractional Cascading

Fractional cascading (Chazelle and Guibas, 1986) reduces the log\log factor at the last level by threading sorted arrays with cross-pointers.

Idea: instead of binary searching independently in each secondary structure, propagate the search position from one structure to the next via precomputed pointers.

For 2D range trees: reduces query to O(logn+k)O(\log n + k) (shaving one log\log factor). Space remains O(nlogn)O(n \log n).

For dd-dimensional: query becomes O(logd1n+k)O(\log^{d-1} n + k).

General fractional cascading: given a catalog graph where each node has a sorted list, and lists at adjacent nodes share elements, precompute merge-based pointers to eliminate binary searches at each step. Reduces iterated search from O(klogn)O(k \log n) to O(k+logn)O(k + \log n) for kk queries in a path of catalogs.

kd-Trees

A kd-tree partitions Rd\mathbb{R}^d by alternating axis-aligned hyperplane splits.

Construction: at each level, split along the next coordinate dimension at the median. Recursively build left and right subtrees.

  • Space: O(n)O(n)
  • Build time: O(nlogn)O(n \log n) (with linear-time median selection at each level)
  • Depth: O(logn)O(\log n)

Orthogonal Range Query

At each node, determine if the query box:

  1. Contains the node's region entirely \to report all points in subtree
  2. Intersects the node's region \to recurse on both children
  3. Is disjoint from the node's region \to prune

Query time: O(n11/d+k)O(n^{1-1/d} + k) for dd-dimensional orthogonal range queries. For d=2d=2: O(n+k)O(\sqrt{n} + k), significantly worse than range trees.

Nearest Neighbor Query

Kd-trees are widely used for nearest neighbor search:

  1. Descend to the leaf containing the query point
  2. Backtrack, pruning branches whose bounding box is farther than the current best

Expected time: O(logn)O(\log n) for random point sets in low dimensions. Worst case: O(n11/d)O(n^{1-1/d}) but rarely encountered in practice.

Limitations: kd-trees degrade badly in high dimensions (d20d \gtrsim 20). The fraction of the tree pruned during backtracking vanishes as dd grows.

Priority Search Trees

A priority search tree (McCreight, 1985) combines a heap on one coordinate with a BST on the other. Specifically, for points (x,y)(x, y):

  • Heap-ordered on yy (minimum yy at root)
  • BST-ordered on xx for splitting

Supports: three-sided range queries [x1,x2]×(,y2][x_1, x_2] \times (-\infty, y_2] (unbounded below in one coordinate) in O(logn+k)O(\log n + k) time with O(n)O(n) space.

This is optimal and more space-efficient than range trees for this specific query type. A general 2D range query [x1,x2]×[y1,y2][x_1, x_2] \times [y_1, y_2] can be decomposed into two three-sided queries using range tree techniques.

Applications: windowing queries in databases, time-interval queries (map intervals to points via (start,end)(start, end)).

Halfplane Range Searching

Query: given a halfplane h+h^+, report/count points in Sh+S \cap h^+.

Partition trees (Matousek, 1992): achieve O(nlogO(1)n+k)O(\sqrt{n} \log^{O(1)} n + k) query time with O(n)O(n) space (for 2D halfplane). This is near-optimal by lower bounds.

Simplicial partition: partition SS into O(r)O(r) subsets, each of size O(n/r)O(n/r), such that any hyperplane crosses at most O(r11/d)O(r^{1-1/d}) subset bounding simplices. For d=2d = 2: any line crosses O(r)O(\sqrt{r}) regions.

Duality: halfplane range searching is equivalent (via point-line duality) to finding all lines in a set that pass below a query point.

Simplex Range Searching

Query: given a triangle (in 2D) or simplex (in Rd\mathbb{R}^d), report/count points inside.

Known bounds for counting (2D):

| Space | Query Time | |---|---| | O(n)O(n) | O(n)O(\sqrt{n}) | | O(nlogn)O(n \log n) | O(n/logn)O(\sqrt{n / \log n}) | | O(n2)O(n^2) | O(logn)O(\log n) |

For reporting, add +k+k to query time. These bounds are essentially tight (matching lower bounds in the semigroup model).

Matousek's partition trees: recursively partition the point set using a simplicial partition. At each level, determine which partition cells are entirely inside/outside the query simplex and recurse on the O(r11/d)O(r^{1-1/d}) crossing cells.

Epsilon-Nets and VC Dimension

VC Dimension

The Vapnik-Chervonenkis dimension of a range space (X,R)(X, \mathcal{R}) is the largest dd such that some set of dd points is shattered by R\mathcal{R} (every subset is realizable as SRS \cap R for some RRR \in \mathcal{R}).

Examples:

  • Halfplanes in R2\mathbb{R}^2: VC dimension 3 (can shatter 3 non-collinear points, cannot shatter 4)
  • Disks in R2\mathbb{R}^2: VC dimension 3
  • Axis-aligned rectangles in R2\mathbb{R}^2: VC dimension 4
  • Halfspaces in Rd\mathbb{R}^d: VC dimension d+1d + 1
  • Convex sets in Rd\mathbb{R}^d: VC dimension \infty (for d2d \geq 2)

Epsilon-Nets

An ϵ\epsilon-net for a range space (S,R)(S, \mathcal{R}) is a subset NSN \subseteq S such that every range RRR \in \mathcal{R} with RSϵS|R \cap S| \geq \epsilon |S| satisfies RNR \cap N \neq \emptyset.

ϵ\epsilon-net theorem (Haussler and Welzl, 1987): if the VC dimension is dd, then a random sample of size O(dϵlogdϵ)O(\frac{d}{\epsilon} \log \frac{d}{\epsilon}) is an ϵ\epsilon-net with constant probability.

Improved bound (Komarath, Pach, and others): O(dϵlog1ϵ)O(\frac{d}{\epsilon} \log \frac{1}{\epsilon}) suffices for many geometric range spaces. For halfplanes: O(1ϵ)O(\frac{1}{\epsilon})-size ϵ\epsilon-nets exist (tight).

Applications to Range Searching

  • Partition construction: ϵ\epsilon-nets guide the construction of simplicial partitions. A (1/r)(1/r)-net of size O(r)O(r) defines a partition where each cell is crossed by few ranges
  • Approximation algorithms: ϵ\epsilon-nets provide set cover approximations for geometric covering problems
  • Geometric discrepancy: the maximum deviation between the fraction of points in a range and the measure of that range

Orthogonal Range Counting Lower Bounds

In the pointer machine model, orthogonal range counting in Rd\mathbb{R}^d requires:

Query time=Ω(logd1nloglogn)\text{Query time} = \Omega\left(\frac{\log^{d-1} n}{\log \log n}\right)

for space O(nlogO(1)n)O(n \log^{O(1)} n) (Afshani, 2012). This essentially matches the upper bound of O(logd1n)O(\log^{d-1} n) from range trees with fractional cascading.

Practical Data Structures

R-trees

B-tree-like hierarchical structure for spatial data:

  • Each node contains a bounding box of its children
  • Supports range queries, nearest neighbor, spatial joins
  • Not worst-case optimal but excellent in practice, especially for disk-based storage
  • Variants: R*-tree (optimized insertion), R+-tree (no overlap), bulk-loaded R-trees

Interval Trees

For 1D interval stabbing queries (which intervals contain a query point):

  • O(n)O(n) space, O(logn+k)O(\log n + k) query
  • Each node stores intervals spanning the node's split point, sorted by left and right endpoints
  • Generalization: segment trees for more complex interval operations

Segment Trees

A balanced binary tree on the elementary intervals induced by nn interval endpoints:

  • Each interval is stored in O(logn)O(\log n) nodes (canonical decomposition)
  • Supports stabbing queries in O(logn+k)O(\log n + k)
  • With fractional cascading, supports 2D windowing queries
  • Persistent segment trees: support queries on historical versions, used in computational geometry sweep algorithms

Multi-Level Data Structures

Many range searching structures compose via multi-level construction:

  1. Primary structure handles one dimension/constraint
  2. Each node's subtree is augmented with a secondary structure for the next dimension

Examples:

  • Range tree = BST (level 1) + BST (level 2) + ... (level dd)
  • Priority search tree = heap (level 1) + BST (level 2)
  • Segment tree + range tree for windowing queries

Each level typically adds a logn\log n factor to space and query time (mitigated by fractional cascading at the last level).