4 min read
On this page

Searching

Searching algorithms find elements in collections. The right algorithm depends on whether the data is sorted, the access pattern, and the data structure.

Scan every element sequentially.

FUNCTION LINEAR_SEARCH(arr, target)
    FOR i ← 0 TO length(arr) - 1
        IF arr[i] = target
            RETURN i
    RETURN NOT_FOUND

Time: O(n). Space: O(1). Works on unsorted data.

When to use: Small arrays (n < ~32), unsorted data, or data that can't be sorted/indexed.

Sentinel optimization: Place the target at the end of the array. Eliminates the bounds check in the loop. Minor speedup.

For sorted arrays. Repeatedly halve the search space.

FUNCTION BINARY_SEARCH(arr, target)
    lo ← 0, hi ← length(arr)
    WHILE lo < hi
        mid ← lo + (hi - lo) / 2   // avoid overflow vs (lo + hi) / 2
        IF arr[mid] = target
            RETURN mid
        ELSE IF arr[mid] < target
            lo ← mid + 1
        ELSE
            hi ← mid
    RETURN NOT_FOUND

Time: O(log n). Space: O(1).

Variants

Lower Bound

Find the first position where arr[i] ≥ target (insertion point for target).

FUNCTION LOWER_BOUND(arr, target)
    lo ← 0, hi ← length(arr)
    WHILE lo < hi
        mid ← lo + (hi - lo) / 2
        IF arr[mid] < target
            lo ← mid + 1
        ELSE
            hi ← mid
    RETURN lo

Upper Bound

Find the first position where arr[i] > target.

FUNCTION UPPER_BOUND(arr, target)
    lo ← 0, hi ← length(arr)
    WHILE lo < hi
        mid ← lo + (hi - lo) / 2
        IF arr[mid] ≤ target
            lo ← mid + 1
        ELSE
            hi ← mid
    RETURN lo

First/Last Occurrence

  • First occurrence: Use lower_bound, then check if arr[result] == target.
  • Last occurrence: Use upper_bound - 1, then check.
  • Count occurrences: upper_bound - lower_bound.

Binary Search on Answer

Apply binary search to the answer space rather than an array. Works when the predicate is monotonic (false for small values, true for large values — or vice versa).

Example: Find the smallest x such that f(x) ≥ target, where f is monotonically increasing.

FUNCTION BINARY_SEARCH_ON_ANSWER(lo, hi, PREDICATE)
    WHILE lo < hi
        mid ← lo + (hi - lo) / 2
        IF PREDICATE(mid) = true
            hi ← mid
        ELSE
            lo ← mid + 1
    RETURN lo

Applications: Minimize maximum (minimax problems), capacity/threshold problems, square root computation, koko eating bananas, etc.

Common Pitfalls

  1. Integer overflow: Use lo + (hi - lo) / 2 instead of (lo + hi) / 2.
  2. Off-by-one errors: Use half-open intervals [lo, hi) consistently. Carefully choose lo = mid + 1 vs lo = mid.
  3. Infinite loops: Ensure search space shrinks each iteration. When lo + 1 == hi with inclusive bounds, mid == lo → if lo = mid, no progress.

Estimate the position based on the value (like looking up a name in a phone book — start near the right letter).

pos = lo + ((target - arr[lo]) × (hi - lo)) / (arr[hi] - arr[lo])

Time: O(log log n) for uniformly distributed data. O(n) worst case (non-uniform).

When to use: Uniformly distributed sorted data. Otherwise, binary search is safer.

Find the range containing the target, then binary search within it.

FUNCTION EXPONENTIAL_SEARCH(arr, target)
    n ← length(arr)
    IF n = 0 THEN RETURN NOT_FOUND
    IF arr[0] = target THEN RETURN 0

    bound ← 1
    WHILE bound < n AND arr[bound] < target
        bound ← bound * 2
    // Binary search in [bound/2, min(bound, n-1)]
    lo ← bound / 2
    hi ← MIN(bound, n - 1)
    RETURN BINARY_SEARCH(arr[lo..hi], target)

Time: O(log i) where i is the position of the target. Better than binary search when the target is near the beginning.

Use case: Unbounded or semi-infinite sorted lists. When the target is likely near the start.

For unimodal functions (one peak or valley). Finds the maximum (or minimum) of f(x) on [lo, hi].

FUNCTION TERNARY_SEARCH_MAX(lo, hi, f)
    REPEAT 200 TIMES   // sufficient iterations for precision
        m1 ← lo + (hi - lo) / 3
        m2 ← hi - (hi - lo) / 3
        IF f(m1) < f(m2)
            lo ← m1
        ELSE
            hi ← m2
    RETURN (lo + hi) / 2

Time: O(log n) (reduces range by 1/3 each iteration). Converges slower than binary search on monotonic functions, but works on unimodal functions.

Alternative: For differentiable functions, binary search on the derivative (find where f'(x) = 0) is usually better.

Search in Rotated Sorted Array

A sorted array rotated at an unknown pivot: [4, 5, 6, 7, 0, 1, 2].

Modified binary search: At each step, determine which half is sorted and whether the target is in that half.

FUNCTION SEARCH_ROTATED(arr, target)
    lo ← 0, hi ← length(arr) - 1
    WHILE lo ≤ hi
        mid ← lo + (hi - lo) / 2
        IF arr[mid] = target THEN RETURN mid

        IF arr[lo] ≤ arr[mid]   // left half is sorted
            IF arr[lo] ≤ target AND target < arr[mid]
                hi ← mid - 1
            ELSE
                lo ← mid + 1
        ELSE                     // right half is sorted
            IF arr[mid] < target AND target ≤ arr[hi]
                lo ← mid + 1
            ELSE
                hi ← mid - 1
    RETURN NOT_FOUND

Time: O(log n).

Two Pointers Technique

Use two pointers (indices) to scan the array, typically from both ends or both from the start.

Two Sum (Sorted Array)

Find two elements summing to target:

FUNCTION TWO_SUM_SORTED(arr, target)
    lo ← 0, hi ← length(arr) - 1
    WHILE lo < hi
        sum ← arr[lo] + arr[hi]
        IF sum = target THEN RETURN (lo, hi)
        IF sum < target
            lo ← lo + 1
        ELSE
            hi ← hi - 1
    RETURN NOT_FOUND

Time: O(n). Much better than brute-force O(n²).

Remove Duplicates

Maintain a "write" pointer and a "read" pointer. Write pointer advances only for unique elements.

Container With Most Water

Two pointers from both ends. Move the shorter side inward (since moving the taller side can't increase area).

Sliding Window

Maintain a window [left, right] over the array. Expand right to include more, shrink left to satisfy constraints.

Maximum Sum Subarray of Size k

FUNCTION MAX_SUM_K(arr, k)
    window_sum ← SUM(arr[0..k])
    max_sum ← window_sum
    FOR i ← k TO length(arr) - 1
        window_sum ← window_sum + arr[i] - arr[i - k]
        max_sum ← MAX(max_sum, window_sum)
    RETURN max_sum

Variable-Length Sliding Window

Expand right until condition violated, then shrink left until condition restored.

Example: Longest substring without repeating characters. Minimum window substring.

Time: O(n) — each element is added and removed at most once.

Applications in CS

  • Database indexing: B-tree search is a disk-optimized binary search. Range scans use lower/upper bound.
  • System calls: bsearch in C standard library. Binary search in sorted arrays for configuration lookup.
  • Debugging: git bisect is binary search on commit history to find the commit that introduced a bug.
  • Networking: Binary search for MTU path discovery. IP routing (longest prefix match).
  • Algorithms: Binary search on answer is a fundamental technique for optimization problems (minimize/maximize subject to constraint).
  • Machine learning: Hyperparameter tuning (though Bayesian optimization is usually better). Feature threshold selection in decision trees.
  • Compression: Huffman/arithmetic coding uses search in cumulative frequency tables.