Range Searching
Problem Statement

Given a set of points in and a query region , report/count the points in . The region type defines the problem variant:
- Orthogonal range searching: is an axis-aligned box
- Halfplane/halfspace range searching: is a halfplane
- Simplex range searching: is a simplex (triangle in 2D, tetrahedron in 3D)
- Circular/spherical range searching: is a disk/ball
Trade-offs are measured by preprocessing time, space, and query time.
Range Trees
1D Range Searching
A balanced BST on the points supports range queries :
- Search for and in the tree; the answer consists of all points in the subtrees between the two search paths
- Query time: where is the output size
- Space:
2D Range Trees
A two-level tree:
- Primary tree: balanced BST on the -coordinates
- Secondary (associated) structure: each internal node stores a BST on the -coordinates of all points in 's subtree
Query for :
- Search for and in the primary tree, identifying canonical subtrees
- In each canonical subtree's associated BST, query for
Complexity: query time, space (each point stored in secondary structures).
d-Dimensional Range Trees
Recursively nest BSTs for each dimension:
- Query:
- Space:
- Preprocessing:
Fractional Cascading
Fractional cascading (Chazelle and Guibas, 1986) reduces the factor at the last level by threading sorted arrays with cross-pointers.
Idea: instead of binary searching independently in each secondary structure, propagate the search position from one structure to the next via precomputed pointers.
For 2D range trees: reduces query to (shaving one factor). Space remains .
For -dimensional: query becomes .
General fractional cascading: given a catalog graph where each node has a sorted list, and lists at adjacent nodes share elements, precompute merge-based pointers to eliminate binary searches at each step. Reduces iterated search from to for queries in a path of catalogs.
kd-Trees
A kd-tree partitions by alternating axis-aligned hyperplane splits.
Construction: at each level, split along the next coordinate dimension at the median. Recursively build left and right subtrees.
- Space:
- Build time: (with linear-time median selection at each level)
- Depth:
Orthogonal Range Query
At each node, determine if the query box:
- Contains the node's region entirely report all points in subtree
- Intersects the node's region recurse on both children
- Is disjoint from the node's region prune
Query time: for -dimensional orthogonal range queries. For : , significantly worse than range trees.
Nearest Neighbor Query
Kd-trees are widely used for nearest neighbor search:
- Descend to the leaf containing the query point
- Backtrack, pruning branches whose bounding box is farther than the current best
Expected time: for random point sets in low dimensions. Worst case: but rarely encountered in practice.
Limitations: kd-trees degrade badly in high dimensions (). The fraction of the tree pruned during backtracking vanishes as grows.
Priority Search Trees
A priority search tree (McCreight, 1985) combines a heap on one coordinate with a BST on the other. Specifically, for points :
- Heap-ordered on (minimum at root)
- BST-ordered on for splitting
Supports: three-sided range queries (unbounded below in one coordinate) in time with space.
This is optimal and more space-efficient than range trees for this specific query type. A general 2D range query can be decomposed into two three-sided queries using range tree techniques.
Applications: windowing queries in databases, time-interval queries (map intervals to points via ).
Halfplane Range Searching
Query: given a halfplane , report/count points in .
Partition trees (Matousek, 1992): achieve query time with space (for 2D halfplane). This is near-optimal by lower bounds.
Simplicial partition: partition into subsets, each of size , such that any hyperplane crosses at most subset bounding simplices. For : any line crosses regions.
Duality: halfplane range searching is equivalent (via point-line duality) to finding all lines in a set that pass below a query point.
Simplex Range Searching
Query: given a triangle (in 2D) or simplex (in ), report/count points inside.
Known bounds for counting (2D):
| Space | Query Time | |---|---| | | | | | | | | |
For reporting, add to query time. These bounds are essentially tight (matching lower bounds in the semigroup model).
Matousek's partition trees: recursively partition the point set using a simplicial partition. At each level, determine which partition cells are entirely inside/outside the query simplex and recurse on the crossing cells.
Epsilon-Nets and VC Dimension
VC Dimension
The Vapnik-Chervonenkis dimension of a range space is the largest such that some set of points is shattered by (every subset is realizable as for some ).
Examples:
- Halfplanes in : VC dimension 3 (can shatter 3 non-collinear points, cannot shatter 4)
- Disks in : VC dimension 3
- Axis-aligned rectangles in : VC dimension 4
- Halfspaces in : VC dimension
- Convex sets in : VC dimension (for )
Epsilon-Nets
An -net for a range space is a subset such that every range with satisfies .
-net theorem (Haussler and Welzl, 1987): if the VC dimension is , then a random sample of size is an -net with constant probability.
Improved bound (Komarath, Pach, and others): suffices for many geometric range spaces. For halfplanes: -size -nets exist (tight).
Applications to Range Searching
- Partition construction: -nets guide the construction of simplicial partitions. A -net of size defines a partition where each cell is crossed by few ranges
- Approximation algorithms: -nets provide set cover approximations for geometric covering problems
- Geometric discrepancy: the maximum deviation between the fraction of points in a range and the measure of that range
Orthogonal Range Counting Lower Bounds
In the pointer machine model, orthogonal range counting in requires:
for space (Afshani, 2012). This essentially matches the upper bound of from range trees with fractional cascading.
Practical Data Structures
R-trees
B-tree-like hierarchical structure for spatial data:
- Each node contains a bounding box of its children
- Supports range queries, nearest neighbor, spatial joins
- Not worst-case optimal but excellent in practice, especially for disk-based storage
- Variants: R*-tree (optimized insertion), R+-tree (no overlap), bulk-loaded R-trees
Interval Trees
For 1D interval stabbing queries (which intervals contain a query point):
- space, query
- Each node stores intervals spanning the node's split point, sorted by left and right endpoints
- Generalization: segment trees for more complex interval operations
Segment Trees
A balanced binary tree on the elementary intervals induced by interval endpoints:
- Each interval is stored in nodes (canonical decomposition)
- Supports stabbing queries in
- With fractional cascading, supports 2D windowing queries
- Persistent segment trees: support queries on historical versions, used in computational geometry sweep algorithms
Multi-Level Data Structures
Many range searching structures compose via multi-level construction:
- Primary structure handles one dimension/constraint
- Each node's subtree is augmented with a secondary structure for the next dimension
Examples:
- Range tree = BST (level 1) + BST (level 2) + ... (level )
- Priority search tree = heap (level 1) + BST (level 2)
- Segment tree + range tree for windowing queries
Each level typically adds a factor to space and query time (mitigated by fractional cascading at the last level).