
Non-Comparison and Hybrid Sorting

Non-comparison sorts sidestep the Ω(n log n) lower bound on comparison sorting by exploiting structure in the keys (integers, strings). Hybrid sorts combine the best properties of multiple algorithms.

Counting Sort

For integer keys in a known range [0, k).

Algorithm

  1. Count occurrences of each value: count[v] = number of elements equal to v.
  2. Compute prefix sums: count[v] = number of elements ≤ v.
  3. Place each element in its correct position using the prefix sums (scan right-to-left for stability).

FUNCTION COUNTING_SORT(arr, k)
    count ← array of size k, initialized to 0
    FOR EACH x IN arr
        count[x] ← count[x] + 1

    // Prefix sums
    FOR i ← 1 TO k - 1
        count[i] ← count[i] + count[i - 1]

    // Build output (right-to-left for stability)
    output ← array of size length(arr)
    FOR EACH x IN arr (reversed)
        count[x] ← count[x] - 1
        output[count[x]] ← x
    RETURN output

Time: O(n + k). Space: O(n + k). Stable. Not in-place.

When to use: k = O(n) (range is proportional to input size). If k ≫ n, use comparison sort instead.
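The pseudocode above translates almost line for line into Python; a minimal sketch (the function name is illustrative):

```python
def counting_sort(arr, k):
    """Stable counting sort for integers in [0, k)."""
    count = [0] * k
    for x in arr:
        count[x] += 1
    # Prefix sums: count[v] = number of elements <= v
    for i in range(1, k):
        count[i] += count[i - 1]
    # Right-to-left scan preserves the relative order of equal keys
    output = [0] * len(arr)
    for x in reversed(arr):
        count[x] -= 1
        output[count[x]] = x
    return output
```

The right-to-left scan is what makes the sort stable: equal keys are consumed in reverse, so they land in their slots in original order.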

Radix Sort

Sort integers digit by digit, from least significant to most significant (LSD) or vice versa (MSD).

LSD Radix Sort

Sort by each digit using a stable sort (typically counting sort) as a subroutine.

PROCEDURE RADIX_SORT_LSD(arr)
    max_val ← MAX(arr)
    exp ← 1
    output ← array of size length(arr)

    WHILE max_val DIV exp > 0        // DIV = integer division
        count ← array of size 10, initialized to 0
        FOR EACH x IN arr
            count[(x DIV exp) MOD 10] ← count[(x DIV exp) MOD 10] + 1
        FOR i ← 1 TO 9
            count[i] ← count[i] + count[i - 1]
        FOR EACH x IN arr (reversed)
            digit ← (x DIV exp) MOD 10
            count[digit] ← count[digit] - 1
            output[count[digit]] ← x
        COPY output INTO arr
        exp ← exp * 10

Time: O(d × (n + b)) where d = number of digits, b = base (radix).

For w-bit integers with base b = 2^r: d = w/r passes, each pass O(n + 2^r). Choose r ≈ log₂ n to minimize: O(wn/log n).

Best radix: For 32-bit integers and n ≈ 10⁶, use base 256 (r=8): 4 passes of counting sort with 256 buckets. Very fast in practice.
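A binary-radix version of the loop above, parameterized by r bits per pass (names and defaults are illustrative; assumes non-negative fixed-width integers):

```python
def radix_sort_lsd(arr, word_bits=32, r=8):
    """LSD radix sort with base 2^r: word_bits / r stable counting-sort passes."""
    mask = (1 << r) - 1
    for shift in range(0, word_bits, r):
        count = [0] * (1 << r)
        for x in arr:
            count[(x >> shift) & mask] += 1
        for i in range(1, 1 << r):
            count[i] += count[i - 1]
        output = [0] * len(arr)
        for x in reversed(arr):       # reversed scan keeps each pass stable
            d = (x >> shift) & mask
            count[d] -= 1
            output[count[d]] = x
        arr = output
    return arr
```

With the defaults this is exactly the "base 256, 4 passes" configuration recommended above for 32-bit keys.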

MSD Radix Sort

Sort by the most significant digit first, then recursively sort each bucket.

Advantage: Can stop early for strings of different lengths. Natural for string sorting. Disadvantage: The common in-place variant (American flag sort) is not stable; keeping MSD stable requires a stable, extra-space partition at every level. Requires recursion or an explicit stack.

For strings: MSD radix sort is the basis of most efficient string sorting algorithms. Process character by character from the first.
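A minimal MSD string sort sketch (dict buckets instead of a 256-entry count array, for brevity; strings that end at the current position sort before longer ones, matching lexicographic order):

```python
def msd_string_sort(strings, pos=0):
    """MSD radix sort on strings, recursing character by character."""
    if len(strings) <= 1:
        return strings
    buckets = {}  # character at `pos` -> bucket of strings
    done = []     # strings of length exactly `pos`: already fully sorted
    for s in strings:
        if len(s) <= pos:
            done.append(s)
        else:
            buckets.setdefault(s[pos], []).append(s)
    result = done
    for c in sorted(buckets):          # visit buckets in character order
        result += msd_string_sort(buckets[c], pos + 1)
    return result
```

Note the early-stopping advantage in action: once a bucket has a single string, no further characters of it are examined.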

Radix Sort Properties

Stable (LSD). Not in-place (O(n + b) extra). Linear time for fixed-width keys.

When to use: Large arrays of fixed-width integers. Particularly effective for 32-bit or 64-bit keys where d is small.

Bucket Sort

Distribute elements into buckets, sort each bucket, concatenate.

Algorithm

  1. Create n buckets for the range [0, 1) (or map keys to this range).
  2. Place each element in its bucket: bucket[⌊n × element⌋].
  3. Sort each bucket (insertion sort for small buckets).
  4. Concatenate all buckets.

Time: O(n) expected when elements are uniformly distributed (each bucket has O(1) elements on average). Worst case O(n²) if all elements go to one bucket.

When to use: Uniform or near-uniform distribution over a known range. Floating-point numbers.
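Under the uniformity assumption, the four steps above reduce to a few lines of Python (a sketch; Python's built-in sort stands in for the per-bucket insertion sort):

```python
def bucket_sort(arr):
    """Bucket sort for floats uniformly distributed in [0, 1)."""
    n = len(arr)
    if n == 0:
        return arr
    buckets = [[] for _ in range(n)]
    for x in arr:
        buckets[int(n * x)].append(x)   # map [0, 1) onto n buckets
    result = []
    for b in buckets:
        result += sorted(b)             # each bucket has O(1) elements on average
    return result
```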

Pigeonhole Sort

A variant of counting sort where we directly place elements into their "pigeonhole" position. For integer keys with range = O(n).

Time: O(n + range). Space: O(range).

External Sorting

Sorting data that doesn't fit in memory (stored on disk).

k-Way Merge Sort

  1. Run formation: Read chunks that fit in memory, sort them (using quicksort/Timsort), write sorted runs to disk.
  2. Merge: Merge k sorted runs using a min-heap of size k. Read from each run's buffer, write to output buffer.

I/O complexity: O((N/B) × log_{M/B}(N/B)) disk accesses, where N = data size, M = memory size, B = block size.

Practical optimizations:

  • Large buffers for sequential I/O (minimize seeks on HDD, less critical for SSD).
  • Replacement selection for run formation (generates runs ~2× memory size on average).
  • Polyphase merge for minimizing passes with limited temp files.
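Both phases can be sketched compactly in Python; `heapq.merge` performs exactly the heap-based k-way merge described above. Runs are kept as in-memory lists here for brevity; a real implementation would spill each run to a disk file and stream it back through a buffer:

```python
import heapq

def external_sort(stream, memory_limit):
    """External merge sort sketch: memory_limit = max records held at once."""
    # Phase 1: run formation
    runs, chunk = [], []
    for record in stream:
        chunk.append(record)
        if len(chunk) == memory_limit:
            runs.append(sorted(chunk))   # real system: write run to disk
            chunk = []
    if chunk:
        runs.append(sorted(chunk))
    # Phase 2: k-way merge via a min-heap over the run heads
    return list(heapq.merge(*runs))
```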

Partial Sorting

Quickselect

Find the k-th smallest element (or partition around the k-th element) in expected O(n) time.

FUNCTION QUICKSELECT(arr, k)
    n ← length(arr)
    IF n = 1 THEN RETURN arr[0]

    pivot_idx ← PARTITION(arr)

    IF k = pivot_idx
        RETURN arr[k]
    ELSE IF k < pivot_idx
        RETURN QUICKSELECT(arr[0..pivot_idx], k)
    ELSE
        RETURN QUICKSELECT(arr[pivot_idx + 1..n], k - pivot_idx - 1)

Expected time: O(n). Worst case: O(n²) (same pathological pivots as quicksort).
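A Python version with a randomized pivot and three-way partition (extra space for clarity; an in-place version follows the pseudocode above). Grouping equal elements also handles duplicate keys gracefully:

```python
import random

def quickselect(arr, k):
    """Return the k-th smallest element (0-indexed), expected O(n)."""
    assert 0 <= k < len(arr)
    while True:
        if len(arr) == 1:
            return arr[0]
        pivot = random.choice(arr)
        lo = [x for x in arr if x < pivot]
        eq = [x for x in arr if x == pivot]
        hi = [x for x in arr if x > pivot]
        if k < len(lo):
            arr = lo                     # answer is left of the pivot
        elif k < len(lo) + len(eq):
            return pivot                 # answer equals the pivot
        else:
            k -= len(lo) + len(eq)       # answer is right of the pivot
            arr = hi
```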

Introselect

Quickselect with fallback to median of medians (guaranteed O(n) worst case) when recursion depth exceeds a threshold. Used in C++ std::nth_element.

Median of Medians

Guaranteed O(n) selection by choosing a "good enough" pivot:

  1. Divide into groups of 5.
  2. Find the median of each group (O(1) per group, O(n/5) total).
  3. Recursively find the median of these medians.
  4. Use this as the pivot for partitioning.

The pivot guarantees at least 30% of elements on each side → T(n) = T(n/5) + T(7n/10) + O(n) = O(n).

In practice: The constant factor is large (roughly 5× slower than randomized quickselect), so introselect is preferred.
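The four steps above can be sketched as follows (illustrative, not optimized; the three-way partition handles duplicate keys):

```python
def median_of_medians(arr, k):
    """Worst-case O(n) selection of the k-th smallest (0-indexed)."""
    if len(arr) <= 5:
        return sorted(arr)[k]
    # Step 1-2: median of each group of 5
    medians = [sorted(arr[i:i + 5])[len(arr[i:i + 5]) // 2]
               for i in range(0, len(arr), 5)]
    # Step 3: recursively find the median of the medians
    pivot = median_of_medians(medians, len(medians) // 2)
    # Step 4: partition around the pivot and recurse on one side
    lo = [x for x in arr if x < pivot]
    eq = [x for x in arr if x == pivot]
    if k < len(lo):
        return median_of_medians(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return median_of_medians([x for x in arr if x > pivot],
                             k - len(lo) - len(eq))
```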

Hybrid Sorts

Timsort

Python's default sort (since 2002). Also used in Java (for objects), Android, Swift.

Key ideas:

  1. Identify natural runs (already-sorted subsequences, both ascending and descending).
  2. Extend short runs using insertion sort (to a minimum run length, typically 32-64).
  3. Merge runs using a merge policy that balances run sizes (maintaining invariants similar to Fibonacci numbers on a stack of pending runs).
  4. Galloping mode: When merging, if one run is "winning" consistently, switch to exponential search (gallop) for the merge point. Reduces comparisons for partially sorted data.

Time: O(n) best case (already sorted). O(n log n) worst case. Adaptive — takes advantage of existing order.

Space: O(n) for merge buffer.
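The galloping idea in step 4 can be isolated as a small helper: probe at exponentially growing offsets, then binary-search the last bracket. This is a sketch; CPython's actual gallop routines in `listobject.c` are considerably more elaborate:

```python
from bisect import bisect_left

def gallop_left(key, run, start=0):
    """Leftmost insertion point for key in sorted run, via exponential probes."""
    offset = 1
    while start + offset < len(run) and run[start + offset] < key:
        offset *= 2                       # probe at offsets 1, 2, 4, 8, ...
    # The answer now lies in (start + offset // 2, start + offset]
    lo = start + offset // 2
    hi = min(start + offset, len(run))
    return bisect_left(run, key, lo, hi)  # finish with a binary search
```

When one run keeps "winning" the merge, a single gallop replaces a long sequence of one-at-a-time comparisons, which is where the savings on partially sorted data come from.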

Introsort

C++ std::sort (typically). Combines quicksort + heapsort + insertion sort.

  1. Start with quicksort (fast in practice).
  2. If recursion depth exceeds 2 log₂ n, switch to heapsort (guaranteed O(n log n)).
  3. When subarray size ≤ 16, switch to insertion sort (fast for small arrays).

Time: O(n log n) guaranteed (heapsort backstop). Fast in practice (quicksort front-end).
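The three rules combine into a short Python sketch (illustrative only; real implementations work on raw arrays with far more tuned partitioning):

```python
import heapq
import math

def introsort(arr):
    """Introsort sketch: quicksort + depth limit + heapsort + insertion sort."""

    def insertion(a, lo, hi):
        # Insertion sort on a[lo..hi] inclusive: fast for small subarrays
        for i in range(lo + 1, hi + 1):
            x, j = a[i], i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def partition(a, lo, hi):
        # Median-of-three pivot, then Lomuto partition
        mid = (lo + hi) // 2
        if a[mid] < a[lo]:
            a[lo], a[mid] = a[mid], a[lo]
        if a[hi] < a[lo]:
            a[lo], a[hi] = a[hi], a[lo]
        if a[hi] < a[mid]:
            a[mid], a[hi] = a[hi], a[mid]
        a[mid], a[hi] = a[hi], a[mid]     # move the median pivot to a[hi]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i

    def sort(a, lo, hi, depth):
        while hi - lo > 16:               # rule 3: small subarrays fall through
            if depth == 0:                # rule 2: heapsort backstop
                sub = a[lo:hi + 1]
                heapq.heapify(sub)
                a[lo:hi + 1] = [heapq.heappop(sub) for _ in range(len(sub))]
                return
            depth -= 1
            p = partition(a, lo, hi)      # rule 1: quicksort front-end
            sort(a, lo, p - 1, depth)
            lo = p + 1                    # loop on the right half (no recursion)
        insertion(a, lo, hi)

    if arr:
        sort(arr, 0, len(arr) - 1, max(2, 2 * int(math.log2(len(arr)))))
    return arr
```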

pdqsort (Pattern-Defeating Quicksort)

A modern improvement on introsort by Orson Peters. Used in Rust's sort_unstable.

Improvements over introsort:

  • Block partitioning: Reduces branch mispredictions by processing elements in blocks.
  • Pattern detection: Detects already-sorted, reverse-sorted, and saw patterns → falls back to insertion sort or reversal.
  • Better pivot selection: The ninther (the median of three medians-of-three).
  • Adaptive: Near-O(n) for many practical input patterns.

Block Merge Sort (WikiSort/GrailSort)

Stable O(n log n) sorting with O(1) extra space, a combination long thought impractical.

Idea: Use block-level operations. Divide the array into blocks, use a buffer block for merging, rotate blocks into position. Very complex but achieves the theoretical ideal.

Summary

| Algorithm | Time | Space | Stable | When to Use |
|---|---|---|---|---|
| Counting sort | O(n + k) | O(n + k) | Yes | Small integer range |
| Radix sort (LSD) | O(d(n + b)) | O(n + b) | Yes | Fixed-width integers |
| Bucket sort | O(n) expected | O(n) | Yes | Uniform distribution |
| Timsort | O(n) to O(n log n) | O(n) | Yes | General purpose (Python, Java) |
| Introsort | O(n log n) | O(log n) | No | General purpose (C++) |
| pdqsort | O(n log n) | O(log n) | No | General purpose (Rust) |
| External merge | O((N/B) log_{M/B}(N/B)) | O(M) | Yes | Data larger than memory |
| Quickselect | O(n) expected | O(1) | No | k-th element / partition |

Applications in CS

  • Standard libraries: Every language uses a hybrid sort. Understanding the design helps predict performance.
  • Database sorting: External merge sort for large result sets. Radix sort for integer columns.
  • Data processing: MapReduce shuffle/sort phase. Distributed sorting (TeraSort).
  • Networking: Packet classification, IP lookup table construction.
  • Graphics: Depth sorting for transparency rendering. Radix sort on GPU (very fast — billions of keys/sec).