6 min read
On this page

Sublinear Algorithms

Overview

Sublinear algorithms solve problems using resources (time, space, or communication) significantly less than the input size. They cannot read the entire input, so they rely on sampling, random access queries, and structural properties to provide approximate answers or distinguish between cases.

Property Testing

Given query access to an object, determine whether it satisfies a property P or is epsilon-far from satisfying P (i.e., more than epsilon fraction of the representation must change to satisfy P).

Framework

  • Input: object x accessible via queries
  • Property: P (set of valid objects)
  • Distance: fraction of representation that must change
  • epsilon-tester: accepts if x in P, rejects with probability >= 2/3 if x is epsilon-far from P, may do either if x is close but not in P

Query complexity: number of queries as a function of epsilon (and sometimes n).

Graph Property Testing

Dense Graph Model

  • Graph on n vertices, query: "is (i,j) an edge?" (adjacency matrix)
  • Distance: fraction of n^2 entries that must change

Testing Bipartiteness

  • Algorithm: sample O(1/epsilon) random vertices, check induced subgraph for odd cycles via BFS
  • Query complexity: O(1/epsilon^3) — independent of n
  • Reject if any odd cycle found in sampled subgraph

Testing Triangle-Freeness

  • Sample O(1/epsilon) triples, query all three edges
  • For dense graphs: O(1/epsilon^6) queries suffice
  • Fundamental result: every testable property in dense graphs has a tester with query complexity depending only on epsilon

Regularity Lemma Connection

  • Szemeredi's regularity lemma underlies many dense graph testers
  • Every dense graph can be approximated by a bounded partition into pseudorandom bipartite graphs
  • Property testing for any first-order property: constant queries (depends on epsilon)

Sparse/Bounded-Degree Graph Model

  • Graph with max degree d, queries: i-th neighbor of vertex v, degree of v
  • Distance: fraction of dn edges that must change

Testing Bipartiteness (Sparse)

  • Random walks from random vertices detect odd cycles
  • Query complexity: O(sqrt(n) * poly(1/epsilon))
  • Depends on n — unavoidable in sparse model

Testing Connectivity

  • O(1/epsilon) random BFS explorations
  • Check if large connected component exists
  • Constant query complexity for bounded-degree graphs

Testing Expansion

  • Test if graph is an expander or far from it
  • Random walks: mixing time reveals expansion
  • O(sqrt(n) * poly(log n, 1/epsilon)) queries

Testing Planarity

  • Can test if a bounded-degree graph is planar or epsilon-far from planar
  • Query complexity: O(sqrt(n) / epsilon^O(1))
  • Uses partition oracles and local exploration

Algebraic Property Testing

Testing Linearity (Blum-Luby-Rubinfeld)

Test if f: F_2^n -> F_2 is linear (f(x+y) = f(x) + f(y) for all x, y).

Repeat O(1/epsilon) times:
    Pick random x, y from F_2^n
    Query f(x), f(y), f(x + y)
    Check if f(x) + f(y) = f(x + y)
Accept if all checks pass
  • 3 queries per test, O(1/epsilon) total
  • Self-correction: if f is close to linear, can compute the nearest linear function at any point with O(1) additional queries
  • Foundation of PCP theorem and algebraic property testing

Testing Low-Degree Polynomials

  • Test if f: F^n -> F is a polynomial of degree <= d
  • Query complexity: O(1/epsilon) for fixed d over large fields
  • Uses self-correction along random lines/planes

Testing Reed-Solomon Codes

  • Given word, test proximity to a Reed-Solomon codeword
  • Poly(1/epsilon) queries
  • Central to PCP constructions and error-correcting codes

Sublinear-Time Algorithms

Estimating Average Degree

  • Sample O(sqrt(n) / epsilon) random vertices, query their degrees
  • Estimate average degree within (1 +/- epsilon) factor
  • Cannot do better than Omega(sqrt(n)) queries in general

Estimating the Number of Connected Components

  • Sample random vertices, explore local neighborhoods
  • Estimate the number of components within additive epsilon * n error
  • Query complexity: O(d / epsilon^2) for bounded-degree d graphs

Minimum Spanning Tree Weight Estimation

  • Estimate MST weight within (1+epsilon) factor without computing MST
  • O(d * n / epsilon^O(1)) time for graphs with max degree d
  • Based on counting connected components at different edge weight thresholds

Sublinear-Time Approximation of Maximum Matching

  • Estimate matching size within (1+epsilon) factor
  • O(n * poly(1/epsilon)) time for bounded-degree graphs
  • Uses local augmenting path exploration

Communication Complexity

Studies the minimum communication needed between parties to compute a function.

Two-Party Model

  • Alice holds x in {0,1}^n, Bob holds y in {0,1}^n
  • They exchange messages to compute f(x, y)
  • Communication complexity: minimum number of bits exchanged (worst case)

Key Problems and Bounds

| Problem | Deterministic | Randomized | |----------------------|---------------|----------------| | Equality | n + 1 | O(log n) | | Disjointness | n | Omega(n) | | Inner Product | n | Omega(n) | | Greater Than | O(log n) | O(log log n) | | Set Intersection Size | Theta(n) | Omega(n) |

Techniques for Lower Bounds

  • Fooling sets: find large fooling set implies high complexity
  • Rank method: D(f) >= log2(rank(M_f)) where M_f is communication matrix
  • Discrepancy method: for randomized lower bounds
  • Information complexity: mutual information between inputs and transcript

Applications to Streaming/Algorithms

  • Disjointness lower bound implies Omega(n) space for exact frequency moments
  • Index problem lower bound implies one-pass streaming lower bounds
  • Multi-party communication yields multi-pass streaming bounds

Decision Tree Complexity

Minimum number of input bits queried to determine f(x) in the worst case.

Measures

  • D(f): deterministic query complexity
  • R(f): randomized query complexity (bounded error)
  • Q(f): quantum query complexity
  • C(f): certificate complexity
  • bs(f): block sensitivity
  • s(f): sensitivity

Key Relationships

  • D(f) = O(R(f)^3) — not known to be tight in general
  • R(f) = O(Q(f)^6) (later improved)
  • Sensitivity conjecture (proved by Huang 2019): s(f)^4 >= bs(f)
    • Implies s(f) = poly(D(f), R(f), Q(f), C(f), bs(f))
    • All Boolean function complexity measures are polynomially related

Sensitivity Theorem (Huang 2019)

For any Boolean function f: {0,1}^n -> {0,1}:

  • s(f) >= sqrt(bs(f))
  • Proved via elegant spectral argument on the hypercube graph
  • Resolved a 30-year open problem in complexity theory

Information Complexity

Measures the minimum amount of information that must be revealed about inputs to compute a function.

Definition

IC(f, epsilon): minimum mutual information I(X; Pi | Y) + I(Y; Pi | X) over all protocols Pi computing f with error epsilon, for worst-case distribution.

Properties

  • IC(f, epsilon) <= CC(f, epsilon) (information <= communication)
  • Direct sum: IC(f^n) = n * IC(f) for independent copies
  • Used to prove tight communication bounds (e.g., disjointness)

Interactive Information Complexity

  • Extends to multi-round protocols
  • Information cost of each message depends on prior messages
  • Compression theorems: communication can be compressed to information cost

Distribution Testing

Given sample access to a distribution, determine its properties.

Identity Testing

  • Is distribution D equal to known distribution D*, or epsilon-far in total variation?
  • Optimal: O(sqrt(n) / epsilon^2) samples where n is support size

Closeness Testing

  • Are two unknown distributions D1, D2 equal or epsilon-far?
  • Optimal: O(n^(2/3) / epsilon^(4/3)) samples

Uniformity Testing

  • Special case of identity testing with D* = uniform
  • O(sqrt(n) / epsilon^2) samples

Monotonicity Testing of Distributions

  • Is distribution over [n] monotone non-increasing?
  • O(sqrt(n) / epsilon^2) samples

Local Computation Algorithms (LCA)

Compute individual parts of a global solution using sublinear time and space, while maintaining global consistency.

Examples

  • Maximal Independent Set LCA: query whether vertex v is in the MIS in O(poly(d * log n)) time
  • Graph Coloring LCA: query color of vertex v in sublinear time
  • Consistency: repeated queries always give the same answer; the union of all answers forms a valid global solution

Summary

Sublinear algorithms demonstrate that many problems can be solved approximately or decided with resources far less than the input size. Property testing provides the theoretical foundation for approximate decision-making. Communication complexity serves as both an independent area and a tool for proving lower bounds in streaming and other models. These techniques are essential for handling modern massive datasets.