Range Searching

Problem Statement

Spatial Data Structures Overview

Given a set $S$ of $n$ points in $\mathbb{R}^d$ and a query region $Q$ , report/count the points in $S \cap Q$ . The region type defines the problem variant:

Orthogonal range searching: $Q$ is an axis-aligned box $[a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d]$
Halfplane/halfspace range searching: $Q$ is a halfplane $\{p : a \cdot p \leq b\}$
Simplex range searching: $Q$ is a simplex (triangle in 2D, tetrahedron in 3D)
Circular/spherical range searching: $Q$ is a disk/ball

Trade-offs are measured by preprocessing time, space, and query time.

Range Trees

1D Range Searching

A balanced BST on the points supports range queries $[a, b]$ :

Search for $a$ and $b$ in the tree; the answer consists of all points in the subtrees between the two search paths
Query time: $O(\log n + k)$ where $k$ is the output size
Space: $O(n)$

2D Range Trees

A two-level tree:

Primary tree: balanced BST on the $x$ -coordinates
Secondary (associated) structure: each internal node $v$ stores a BST on the $y$ -coordinates of all points in $v$ 's subtree

Query for $[x_1, x_2] \times [y_1, y_2]$ :

Search for $x_1$ and $x_2$ in the primary tree, identifying $O(\log n)$ canonical subtrees
In each canonical subtree's associated BST, query for $[y_1, y_2]$

Complexity: $O(\log^2 n + k)$ query time, $O(n \log n)$ space (each point stored in $O(\log n)$ secondary structures).

d-Dimensional Range Trees

Recursively nest BSTs for each dimension:

Query: $O(\log^d n + k)$
Space: $O(n \log^{d-1} n)$
Preprocessing: $O(n \log^{d-1} n)$

Fractional Cascading

Fractional cascading (Chazelle and Guibas, 1986) reduces the $\log$ factor at the last level by threading sorted arrays with cross-pointers.

Idea: instead of binary searching independently in each secondary structure, propagate the search position from one structure to the next via precomputed pointers.

For 2D range trees: reduces query to $O(\log n + k)$ (shaving one $\log$ factor). Space remains $O(n \log n)$ .

For $d$ -dimensional: query becomes $O(\log^{d-1} n + k)$ .

General fractional cascading: given a catalog graph where each node has a sorted list, and lists at adjacent nodes share elements, precompute merge-based pointers to eliminate binary searches at each step. Reduces iterated search from $O(k \log n)$ to $O(k + \log n)$ for $k$ queries in a path of catalogs.

kd-Trees

A kd-tree partitions $\mathbb{R}^d$ by alternating axis-aligned hyperplane splits.

Construction: at each level, split along the next coordinate dimension at the median. Recursively build left and right subtrees.

Space: $O(n)$
Build time: $O(n \log n)$ (with linear-time median selection at each level)
Depth: $O(\log n)$

Orthogonal Range Query

At each node, determine if the query box:

Contains the node's region entirely $\to$ report all points in subtree
Intersects the node's region $\to$ recurse on both children
Is disjoint from the node's region $\to$ prune

Query time: $O(n^{1-1/d} + k)$ for $d$ -dimensional orthogonal range queries. For $d=2$ : $O(\sqrt{n} + k)$ , significantly worse than range trees.

Nearest Neighbor Query

Kd-trees are widely used for nearest neighbor search:

Descend to the leaf containing the query point
Backtrack, pruning branches whose bounding box is farther than the current best

Expected time: $O(\log n)$ for random point sets in low dimensions. Worst case: $O(n^{1-1/d})$ but rarely encountered in practice.

Limitations: kd-trees degrade badly in high dimensions ( $d \gtrsim 20$ ). The fraction of the tree pruned during backtracking vanishes as $d$ grows.

Priority Search Trees

A priority search tree (McCreight, 1985) combines a heap on one coordinate with a BST on the other. Specifically, for points $(x, y)$ :

Heap-ordered on $y$ (minimum $y$ at root)
BST-ordered on $x$ for splitting

Supports: three-sided range queries $[x_1, x_2] \times (-\infty, y_2]$ (unbounded below in one coordinate) in $O(\log n + k)$ time with $O(n)$ space.

This is optimal and more space-efficient than range trees for this specific query type. A general 2D range query $[x_1, x_2] \times [y_1, y_2]$ can be decomposed into two three-sided queries using range tree techniques.

Applications: windowing queries in databases, time-interval queries (map intervals to points via $(start, end)$ ).

Halfplane Range Searching

Query: given a halfplane $h^+$ , report/count points in $S \cap h^+$ .

Partition trees (Matousek, 1992): achieve $O(\sqrt{n} \log^{O(1)} n + k)$ query time with $O(n)$ space (for 2D halfplane). This is near-optimal by lower bounds.

Simplicial partition: partition $S$ into $O(r)$ subsets, each of size $O(n/r)$ , such that any hyperplane crosses at most $O(r^{1-1/d})$ subset bounding simplices. For $d = 2$ : any line crosses $O(\sqrt{r})$ regions.

Duality: halfplane range searching is equivalent (via point-line duality) to finding all lines in a set that pass below a query point.

Simplex Range Searching

Query: given a triangle (in 2D) or simplex (in $\mathbb{R}^d$ ), report/count points inside.

Known bounds for counting (2D):

Space	Query Time
$O(n)$	$O(\sqrt{n})$
$O(n \log n)$	$O(\sqrt{n / \log n})$
$O(n^2)$	$O(\log n)$

For reporting, add $+k$ to query time. These bounds are essentially tight (matching lower bounds in the semigroup model).

Matousek's partition trees: recursively partition the point set using a simplicial partition. At each level, determine which partition cells are entirely inside/outside the query simplex and recurse on the $O(r^{1-1/d})$ crossing cells.

Epsilon-Nets and VC Dimension

VC Dimension

The Vapnik-Chervonenkis dimension of a range space $(X, \mathcal{R})$ is the largest $d$ such that some set of $d$ points is shattered by $\mathcal{R}$ (every subset is realizable as $S \cap R$ for some $R \in \mathcal{R}$ ).

Examples:

Halfplanes in $\mathbb{R}^2$ : VC dimension 3 (can shatter 3 non-collinear points, cannot shatter 4)
Disks in $\mathbb{R}^2$ : VC dimension 3
Axis-aligned rectangles in $\mathbb{R}^2$ : VC dimension 4
Halfspaces in $\mathbb{R}^d$ : VC dimension $d + 1$
Convex sets in $\mathbb{R}^d$ : VC dimension $\infty$ (for $d \geq 2$ )

Epsilon-Nets

An $\epsilon$ -net for a range space $(S, \mathcal{R})$ is a subset $N \subseteq S$ such that every range $R \in \mathcal{R}$ with $|R \cap S| \geq \epsilon |S|$ satisfies $R \cap N \neq \emptyset$ .

$\epsilon$ -net theorem (Haussler and Welzl, 1987): if the VC dimension is $d$ , then a random sample of size $O(\frac{d}{\epsilon} \log \frac{d}{\epsilon})$ is an $\epsilon$ -net with constant probability.

Improved bound (Komarath, Pach, and others): $O(\frac{d}{\epsilon} \log \frac{1}{\epsilon})$ suffices for many geometric range spaces. For halfplanes: $O(\frac{1}{\epsilon})$ -size $\epsilon$ -nets exist (tight).

Applications to Range Searching

Partition construction: $\epsilon$ -nets guide the construction of simplicial partitions. A $(1/r)$ -net of size $O(r)$ defines a partition where each cell is crossed by few ranges
Approximation algorithms: $\epsilon$ -nets provide set cover approximations for geometric covering problems
Geometric discrepancy: the maximum deviation between the fraction of points in a range and the measure of that range

Orthogonal Range Counting Lower Bounds

In the pointer machine model, orthogonal range counting in $\mathbb{R}^d$ requires:

$\text{Query time} = \Omega\left(\frac{\log^{d-1} n}{\log \log n}\right)$

for space $O(n \log^{O(1)} n)$ (Afshani, 2012). This essentially matches the upper bound of $O(\log^{d-1} n)$ from range trees with fractional cascading.

Practical Data Structures

R-trees

B-tree-like hierarchical structure for spatial data:

Each node contains a bounding box of its children
Supports range queries, nearest neighbor, spatial joins
Not worst-case optimal but excellent in practice, especially for disk-based storage
Variants: R*-tree (optimized insertion), R+-tree (no overlap), bulk-loaded R-trees

Interval Trees

For 1D interval stabbing queries (which intervals contain a query point):

$O(n)$ space, $O(\log n + k)$ query
Each node stores intervals spanning the node's split point, sorted by left and right endpoints
Generalization: segment trees for more complex interval operations

Segment Trees

A balanced binary tree on the elementary intervals induced by $n$ interval endpoints:

Each interval is stored in $O(\log n)$ nodes (canonical decomposition)
Supports stabbing queries in $O(\log n + k)$
With fractional cascading, supports 2D windowing queries
Persistent segment trees: support queries on historical versions, used in computational geometry sweep algorithms

Multi-Level Data Structures

Many range searching structures compose via multi-level construction:

Primary structure handles one dimension/constraint
Each node's subtree is augmented with a secondary structure for the next dimension

Examples:

Range tree = BST (level 1) + BST (level 2) + ... (level $d$ )
Priority search tree = heap (level 1) + BST (level 2)
Segment tree + range tree for windowing queries

Each level typically adds a $\log n$ factor to space and query time (mitigated by fractional cascading at the last level).