Kolmogorov Complexity
Algorithmic Information Theory
Kolmogorov complexity (algorithmic complexity, descriptive complexity) measures the information content of an individual string, independent of any probability distribution. While Shannon entropy characterizes the average information of a random source, Kolmogorov complexity captures the intrinsic complexity of a single object.
Definition: The Kolmogorov complexity of a string $x$ with respect to a universal Turing machine $U$ is:

$$C_U(x) = \min \{ |p| : U(p) = x \}$$

where $|p|$ is the length of the program $p$ in bits. Intuitively, $C_U(x)$ is the length of the shortest program that produces $x$.
Conditional complexity: $C(x \mid y) = \min\{|p| : U(p, y) = x\}$ --- the length of the shortest program that produces $x$ given $y$ as auxiliary input.
Joint complexity: $C(x, y) = C(\langle x, y \rangle)$, where $\langle \cdot, \cdot \rangle$ is an effective pairing function.
Invariance Theorem
Theorem (Solomonoff, Kolmogorov, Chaitin): For any two universal Turing machines $U$ and $V$:

$$|C_U(x) - C_V(x)| \leq c_{U,V} \quad \text{for all } x,$$

where $c_{U,V}$ is a constant depending only on the machines, not on $x$.
Proof sketch: $U$ can simulate $V$ by prepending a fixed interpreter program to each $V$-program, giving $C_U(x) \leq C_V(x) + c$, where the constant is the length of this interpreter. Swapping the roles of $U$ and $V$ gives the other inequality.
Consequence: $C(x)$ is well-defined up to an additive constant, so we fix a reference universal machine and suppress the subscript. All asymptotic statements about $C$ are machine-independent.
Prefix-Free Kolmogorov Complexity
To connect with coding theory and probability, we restrict to prefix-free programs (no valid program is a prefix of another):

$$K(x) = \min \{ |p| : U(p) = x \}, \qquad U \text{ a prefix-free universal machine.}$$

This variant satisfies Kraft's inequality, $\sum_x 2^{-K(x)} \leq 1$, enabling the interpretation of $2^{-K(x)}$ as a (sub)probability distribution (algorithmic probability / the universal prior).
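A quick way to see Kraft's inequality in action: the script below (a small illustrative check, not part of any library) verifies that a set of codewords is prefix-free and computes $\sum_w 2^{-|w|}$ exactly.

```python
from fractions import Fraction
from itertools import combinations

def is_prefix_free(codewords):
    """True if no codeword is a prefix of another."""
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(codewords, 2))

def kraft_sum(codewords):
    """Exact value of the Kraft sum: sum over codewords w of 2^{-|w|}."""
    return sum(Fraction(1, 2 ** len(w)) for w in codewords)

code = ["0", "10", "110", "111"]      # a complete prefix-free code
assert is_prefix_free(code)
print(kraft_sum(code))                # 1: Kraft holds with equality
```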
The prefix-free complexity and plain complexity are related: $C(x) \leq K(x) + O(1)$ and $K(x) \leq C(x) + 2 \log C(x) + O(1)$.
Basic Properties
- Upper bound: $K(x) \leq |x| + 2\log|x| + O(1)$ (the program can simply contain $x$ as a literal, preceded by a self-delimiting encoding of its length; see the encoding sketch after this list)
- Subadditivity: $K(x, y) \leq K(x) + K(y) + O(1)$
- Symmetry of information: $K(x, y) = K(x) + K(y \mid x^*) + O(1)$, where $x^*$ is the shortest program for $x$
- Uncomputability: $K$ is not computable (but it is upper semi-computable: by enumerating programs we can approximate $K(x)$ from above with ever-shorter witnesses)
- Non-monotonicity: $K$ is not monotone under substrings; a substring of a random string can be compressible (a random string of length $n$ typically contains a run of roughly $\log n$ identical bits)
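The upper bound is witnessed by a concrete "print the literal" encoding. The sketch below is illustrative (an Elias-gamma-style length header; the function names are made up here): it builds a self-delimiting program of length $|x| + 2\log|x| + O(1)$ together with its decoder.

```python
def elias_gamma(n):
    """Self-delimiting code for n >= 1: (bitlength - 1) zeros, then n in
    binary. Costs about 2*log2(n) + 1 bits."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def literal_program(x):
    """'Print literal' program: length header, then x verbatim.
    Witnesses K(x) <= |x| + 2 log|x| + O(1)."""
    return elias_gamma(len(x)) + x

def run(p):
    """The decoding 'machine': read the header, emit the literal."""
    zeros = 0
    while p[zeros] == "0":
        zeros += 1
    n = int(p[zeros:2 * zeros + 1], 2)       # length field
    return p[2 * zeros + 1:2 * zeros + 1 + n]

x = "1011001110001111"
p = literal_program(x)
assert run(p) == x
print(len(x), len(p))   # 16 data bits -> a 25-bit program
```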
Incompressibility Method
A string $x$ of length $n$ is $c$-incompressible if $C(x) \geq n - c$.
Counting argument: there are at most $2^{n-c} - 1$ programs shorter than $n - c$ bits, so fewer than $2^{n-c}$ strings of length $n$ have $C(x) < n - c$. Hence at least one string of each length is incompressible, and the fraction of $c$-incompressible strings is at least $1 - 2^{-c}$.
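The count is elementary enough to tabulate; a few lines of arithmetic (illustrative only, with $n = 30$ chosen arbitrarily) make the bound concrete:

```python
# There are 2^0 + 2^1 + ... + 2^{m-1} = 2^m - 1 binary programs shorter
# than m bits, so at most 2^{n-c} - 1 of the 2^n strings of length n
# can satisfy C(x) < n - c.
n = 30
for c in range(1, 8):
    compressible = 2 ** (n - c) - 1       # upper bound on such strings
    frac = 1 - compressible / 2 ** n      # guaranteed incompressible
    print(c, f"{frac:.6f}", f"(bound: {1 - 2.0 ** -c:.6f})")
```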
The incompressibility method proves combinatorial results by fixing an incompressible object and showing that failure of the desired property would yield a description shorter than the object's complexity allows. Applications:
- Lower bounds on sorting: any comparison sort of $n$ elements requires $\Omega(n \log n)$ comparisons. Take an incompressible permutation; if the sort used fewer comparisons, the sequence of comparison outcomes would encode a shorter description (see the computation after this list)
- Average-case analysis: the average number of comparisons in Quicksort is $\Theta(n \log n)$
- Graph theory: existence of graphs with specific properties (high girth and high chromatic number)
- Communication complexity: lower bounds on communication protocols
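For the sorting bound in particular, the arithmetic is easy to check: a run with $k$ binary comparison outcomes describes one of $n!$ input permutations in $k$ bits, so $k \geq \log_2 n!$ on incompressible permutations. A quick illustrative computation shows how closely $\log_2 n!$ tracks $n \log_2 n$:

```python
import math

# log2(n!) is the information needed to pin down one of n! permutations;
# a comparison sort's outcome sequence must supply at least this much.
for n in (10, 100, 1000, 10 ** 6):
    log2_fact = math.lgamma(n + 1) / math.log(2)   # log2(n!)
    print(f"n={n:>7}  log2(n!)={log2_fact:>12.0f}  "
          f"n*log2(n)={n * math.log2(n):>12.0f}")
```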
Algorithmic Randomness
Martin-Löf Randomness
An infinite binary sequence is Martin-Löf random if it passes all effective statistical tests. Formally, it avoids all constructive measure-zero sets.
A Martin-Löf test is a uniformly computably enumerable sequence of open sets $(U_n)_{n \geq 1}$ with $\mu(U_n) \leq 2^{-n}$. A sequence $X$ is ML-random iff $X \notin \bigcap_n U_n$ for every ML test.
Equivalent characterizations:
- Levin-Schnorr theorem: $X$ is ML-random iff $K(X \upharpoonright n) \geq n - O(1)$ for all $n$ (prefix-free complexity of initial segments stays near maximum)
- Unpredictability: no computable martingale succeeds on $X$ (no computable strategy for gambling on successive bits attains unbounded wealth; see the betting sketch after this list)
- Typicality: $X$ satisfies all effectively testable properties that hold with probability 1 (law of large numbers, law of the iterated logarithm, etc.)
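To make the martingale characterization concrete, here is a minimal sketch of a computable betting strategy (a Laplace-rule bettor; the sequence lengths, bias, and seed are arbitrary illustration choices). It wins exponentially on a biased, hence non-random, sequence, while on a typical fair-coin sequence its capital stays bounded:

```python
import math
import random

def log2_capital(seq):
    """Computable martingale: estimate p1 = (#ones + 1) / (t + 2) and
    stake proportionally; capital multiplies by 2 * p(outcome).
    The fairness condition holds since the two predictions sum to 1."""
    lc, ones = 0.0, 0
    for t, bit in enumerate(seq):
        p1 = (ones + 1) / (t + 2)
        lc += 1 + math.log2(p1 if bit else 1 - p1)
        ones += bit
    return lc

random.seed(0)
biased = [random.random() < 0.9 for _ in range(5000)]
fair = [random.random() < 0.5 for _ in range(5000)]
print(log2_capital(biased))   # large: succeeds on the biased sequence
print(log2_capital(fair))     # near 0: no success on typical fair bits
```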
Hierarchy of Randomness Notions
From weakest to strongest:
- Schnorr randomness: pass all computable tests (test measures must be computably convergent)
- Computable randomness: no computable martingale succeeds
- Martin-Löf randomness: pass all c.e. tests
- 2-randomness: pass all tests that are c.e. relative to $\emptyset'$ (ML-random relative to the halting problem)
Each level strictly contains the next. ML-randomness is the most widely accepted formalization of algorithmic randomness.
Chaitin's Omega
The halting probability of a universal prefix-free machine $U$:

$$\Omega = \sum_{p \,:\, U(p) \text{ halts}} 2^{-|p|}$$
Properties:
- $0 < \Omega < 1$ (the sum converges by Kraft's inequality)
- Martin-Löf random: the binary expansion of $\Omega$ is an ML-random sequence
- Computably enumerable from below: we can compute increasingly accurate lower bounds
- Not computable: knowing the first $n$ bits of $\Omega$ solves the halting problem for all programs of length at most $n$
- Maximally unknowable: no consistent formal system whose axioms have complexity $n$ can determine more than $n + O(1)$ bits of $\Omega$
$\Omega$ concentrates the difficulty of the halting problem: its first $n$ bits encode the answers to all halting questions for programs of length at most $n$. It is sometimes called the "number of wisdom."
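Because $\Omega$ is c.e. from below, its lower bounds can be enumerated by dovetailing halting computations. The sketch below is purely illustrative: the toy "machine" is a hypothetical lookup table of programs with known halting times, standing in for a real universal prefix-free machine, which would be simulated step by step instead.

```python
from fractions import Fraction

# Hypothetical prefix-free toy machine: program -> halting time in steps
# (None = diverges). A made-up stand-in for a universal machine.
TOY_MACHINE = {"0": 3, "10": 7, "110": None, "1110": 12}

def omega_lower_bounds(machine, max_steps):
    """After t steps of (simulated) dovetailing, credit 2^{-|p|} for each
    program p seen to halt within t steps: a nondecreasing lower bound."""
    for t in range(1, max_steps + 1):
        yield t, sum(Fraction(1, 2 ** len(p))
                     for p, steps in machine.items()
                     if steps is not None and steps <= t)

for t, bound in omega_lower_bounds(TOY_MACHINE, 12):
    print(t, float(bound))
```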
Connections to Gödel's Theorems
Kolmogorov complexity provides an elegant information-theoretic perspective on incompleteness:
Chaitin's incompleteness theorem: for any consistent, computably axiomatized formal system $F$ whose axioms have complexity $n$, there is a constant $c$ such that $F$ cannot prove any statement of the form $K(x) > n + c$.
Proof: if $F$ could prove statements $K(x) > m$ for arbitrarily large $m$, we could enumerate the theorems of $F$, find the first proof of some statement $K(x) > m$, and output that $x$ --- a description of $x$ using only $n + O(\log m)$ bits (to specify $F$, the search, and $m$), contradicting $K(x) > m$ for large enough $m$.
This mirrors Gödel's first incompleteness theorem: any sufficiently powerful formal system has true but unprovable statements. The information-theoretic version reveals that the unprovable statements are precisely those asserting high complexity (randomness) of specific strings.
Algorithmic Probability and Solomonoff Induction
Algorithmic probability (Solomonoff, 1964): the probability that a universal prefix-free machine $U$ outputs $x$ when fed uniformly random bits:

$$m(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|}$$
This is a universal semimeasure: it dominates every computable semimeasure up to a multiplicative constant. By the coding theorem, $-\log m(x) = K(x) + O(1)$.
Solomonoff's theory of induction: predict the next bit of a sequence by weighting all computable hypotheses by their algorithmic probability. This is the Bayesian-optimal predictor relative to the universal prior. It converges to the true distribution (if computable) at an optimal rate.
Connection to Occam's razor: simpler (shorter program) hypotheses receive higher prior weight. This provides a formal justification for preferring simpler explanations.
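A drastically simplified finite version conveys the mechanics. In the sketch below, the hypothesis class, prior weights, and data source are all made-up stand-ins for the full class of computable semimeasures weighted by $2^{-K(h)}$; it performs Bayesian mixture prediction over a handful of Bernoulli hypotheses:

```python
import math
import random

# Toy stand-in for the universal prior: a few Bernoulli(p) hypotheses,
# weighted 2^{-(description length)}, crudely measured by the length of
# p's decimal string. Full Solomonoff induction mixes over all
# computable semimeasures with weight 2^{-K(h)}.
HYPS = [0.1, 0.25, 0.5, 0.75, 0.9]
PRIOR = {p: 2.0 ** -len(str(p)) for p in HYPS}

def loglik(p, seq):
    ones = sum(seq)
    return ones * math.log(p) + (len(seq) - ones) * math.log(1 - p)

def predict_next(seq):
    """Posterior-weighted probability that the next bit is 1."""
    logpost = {p: math.log(PRIOR[p]) + loglik(p, seq) for p in HYPS}
    m = max(logpost.values())
    w = {p: math.exp(v - m) for p, v in logpost.items()}
    return sum(w[p] * p for p in HYPS) / sum(w.values())

random.seed(1)
data = [int(random.random() < 0.75) for _ in range(200)]
for n in (10, 50, 200):
    print(n, round(predict_next(data[:n]), 3))   # converges toward 0.75
```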
Minimum Description Length (MDL) Principle
MDL (Rissanen, 1978) is the practical operationalization of Kolmogorov complexity for statistical inference. Select the model $M$ and parameters minimizing:

$$L(M) + L(D \mid M)$$

where $L(\cdot)$ denotes description length (in bits).
Two-part MDL (crude): minimize model code length + data-given-model code length. Requires discretization of parameters.
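As a concrete sketch of the crude two-part code (illustrative helper names; parameter cost taken as the standard $(k/2)\log_2 n$ bits-per-parameter discretization), the script below selects between an i.i.d. Bernoulli model and a first-order Markov model:

```python
import math
import random

def bernoulli_bits(seq):
    """Codelength of seq under its own MLE Bernoulli parameter."""
    n, k = len(seq), sum(seq)
    if k in (0, n):
        return 0.0
    p = k / n
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p))

def markov_bits(seq):
    """First-order Markov: one Bernoulli per previous-bit context."""
    return sum(bernoulli_bits([b for a, b in zip(seq, seq[1:]) if a == c])
               for c in (0, 1))

def two_part_mdl(seq):
    """L(data | model) plus (k/2) * log2(n) bits for k parameters."""
    n = len(seq)
    scores = {"bernoulli (k=1)": bernoulli_bits(seq) + 0.5 * math.log2(n),
              "markov (k=2)": markov_bits(seq) + 1.0 * math.log2(n)}
    return min(scores.items(), key=lambda kv: kv[1])

random.seed(2)
iid = [int(random.random() < 0.5) for _ in range(1000)]
sticky = [0]
for _ in range(999):   # autocorrelated chain: flips with prob 0.1
    sticky.append(sticky[-1] if random.random() < 0.9 else 1 - sticky[-1])
print(two_part_mdl(iid))      # picks the Bernoulli model
print(two_part_mdl(sticky))   # picks the Markov model
```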
Refined MDL (normalized maximum likelihood): the minimax optimal universal code for a model class $\mathcal{M} = \{P_\theta\}$:

$$P_{\mathrm{NML}}(x^n) = \frac{P_{\hat\theta(x^n)}(x^n)}{\sum_{y^n} P_{\hat\theta(y^n)}(y^n)}$$
The stochastic complexity $-\log P_{\mathrm{NML}}(x^n)$ includes the parametric complexity (the log of the NML denominator), which asymptotically equals $\frac{k}{2}\log\frac{n}{2\pi} + \log\int\sqrt{\det I(\theta)}\,d\theta + o(1)$, connecting MDL to BIC and to the Fisher information $I(\theta)$.
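For the Bernoulli class the parametric complexity is directly computable, and the asymptotic formula reduces to $\frac{1}{2}\log_2\frac{n}{2\pi} + \log_2\pi$ (the Fisher-information integral for Bernoulli is $\pi$). The script below is an illustrative numerical comparison:

```python
import math

def bernoulli_parametric_complexity(n):
    """log2 of COMP(n) = sum over all length-n sequences of their
    maximized likelihood = sum_k C(n,k) (k/n)^k ((n-k)/n)^(n-k)."""
    total = sum(math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                for k in range(n + 1))
    return math.log2(total)

for n in (10, 100, 1000):
    exact = bernoulli_parametric_complexity(n)
    asym = 0.5 * math.log2(n / (2 * math.pi)) + math.log2(math.pi)
    print(n, round(exact, 3), round(asym, 3))
```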
MDL advantages over classical hypothesis testing:
- No need for significance levels or p-values
- Automatically penalizes complexity (Occam)
- Consistent model selection
- Works with nested and non-nested models
Normalized Compression Distance (NCD)
A practical approximation to the normalized information distance:

$$\mathrm{NID}(x, y) = \frac{\max\{K(x \mid y),\, K(y \mid x)\}}{\max\{K(x),\, K(y)\}}$$

which is a universal metric (up to additive precision): it minorizes every computable normalized distance.
NCD replaces $K$ with the compressed length $Z(\cdot)$ under a real compressor (gzip, bzip2, etc.):

$$\mathrm{NCD}(x, y) = \frac{Z(xy) - \min\{Z(x), Z(y)\}}{\max\{Z(x), Z(y)\}}$$
Applications: language classification, phylogenetic tree construction, plagiarism detection, malware analysis, music clustering. Remarkably parameter-free---the compressor implicitly captures whatever regularities it can model.
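A few lines of Python suffice; this is a minimal sketch using zlib as the stand-in compressor, with toy inputs chosen for illustration:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib approximating K."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = b"the quick brown fox leaps over the lazy cat " * 20
other = bytes(range(256)) * 4
print(ncd(text_a, text_b))   # small: the texts share most structure
print(ncd(text_a, other))    # near 1: essentially unrelated inputs
```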
Algorithmic Statistics
Algorithmic sufficient statistic: a finite set $S$ containing $x$ is an algorithmic sufficient statistic for $x$ if:
- $K(S)$ is small (the "model" is simple)
- $K(x \mid S) \approx \log|S|$ (within $S$, $x$ looks random)
- $K(S) + \log|S| \approx K(x)$ (the two-part description is near-optimal)
The structure function $h_x(\alpha) = \min\{\log|S| : x \in S,\ K(S) \leq \alpha\}$ characterizes the tradeoff between model complexity and randomness deficiency. Its shape reveals whether $x$ has useful structure (stochastic, non-stochastic, or random).
Connections and Applications
- Kolmogorov structure function links to rate-distortion theory: minimum "noise" for a given model complexity budget
- Resource-bounded complexity: restrict to polynomial-time programs, connecting to computational complexity (e.g., a pseudorandom string is one whose time-bounded complexity is near-maximal even though its unbounded complexity is small)
- Logical depth (Bennett): the running time of the shortest program for $x$. Captures "organized complexity"---deep objects are neither simple nor random
- Sophistication: the complexity of the simplest "meaningful" description of $x$ (the model part of an algorithmic sufficient statistic)
- Effective dimension (Lutz): use Kolmogorov complexity to define Hausdorff dimension for individual sequences, connecting algorithmic randomness to fractal geometry