Probabilistically Checkable Proofs

PCP: Definition

A probabilistically checkable proof (PCP) system for a language L provides:

A proof string pi (of polynomial length) written by an omniscient prover
A polynomial-time verifier V that uses r(n) random bits and queries q(n) bits of pi

The system satisfies:

Completeness: If x in L, there exists pi such that V^pi(x) accepts with probability 1
Soundness: If x not in L, for all pi, V^pi(x) accepts with probability at most 1/2

PCP[r(n), q(n)] denotes the class of languages with such verifiers.

The PCP Theorem

Theorem (Arora-Safra, 1998; Arora-Lund-Motwani-Sudan-Szegedy, 1998):

NP = PCP[O(log n), O(1)]

Every NP statement has a proof that can be verified by reading only a constant number of bits, using only O(log n) random bits.

This is extraordinary: a polynomial-length proof of a SAT instance can be checked by examining O(1) positions, with constant soundness error. The O(log n) random bits select which positions to examine (giving polynomially many possible checks, covering the proof).

Proof Overview

The proof proceeds in stages, combining algebraic and combinatorial techniques:

Arithmetization: Encode the NP witness as evaluations of a low-degree polynomial over a finite field
Linearity testing: Use Blum-Luby-Rubinfeld linearity tests to ensure the proof is "close to" a valid low-degree polynomial
Low-degree testing: Verify that the proof string represents evaluations of a polynomial of bounded degree
Consistency checking: Combine the above with the constraint structure of the original NP problem
Alphabet reduction and composition: Use PCP composition to reduce the query complexity to O(1)

Dinur (2007) gave a substantially simpler combinatorial proof using gap amplification: starting from a "trivial" PCP (NP = PCP[poly(n), poly(n)]), iteratively amplify the soundness gap through graph powering and alphabet reduction, maintaining O(log n) randomness.

Inapproximability: The Fundamental Connection

The PCP theorem is equivalent to the existence of a gap-introducing reduction from SAT to itself. Specifically:

There exists epsilon > 0 and a polynomial-time reduction f such that:

If phi is satisfiable, then f(phi) is satisfiable
If phi is unsatisfiable, then no assignment satisfies more than (1 - epsilon) fraction of clauses of f(phi)

This transforms the PCP theorem from a proof-theoretic statement into a tool for hardness of approximation.

Hardness of Approximation

MAX-3SAT

Theorem (Hastad, 2001): It is NP-hard to distinguish between 3SAT instances where:

At least (1 - epsilon) fraction of clauses are satisfiable
At most (7/8 + epsilon) fraction of clauses are satisfiable

Since a random assignment satisfies 7/8 of all 3SAT clauses in expectation, this means: no polynomial-time algorithm can beat random assignment for MAX-3SAT (unless P = NP). This is an optimal inapproximability result.

Hastad's proof uses the 3-query PCP with specific algebraic structure (linearity testing over GF(2), Fourier analysis of Boolean functions).

MAX-CLIQUE

Theorem (Hastad, 1996): For every epsilon > 0, it is NP-hard to approximate the maximum clique in an n-vertex graph within a factor of n^{1-epsilon}.

The proof uses the connection between PCPs and graph products. If V is a PCP verifier making q queries, construct a graph where vertices are (randomness, proof-bits) pairs and edges connect consistent pairs. The clique number relates to the satisfiability of the original instance, with gap amplification via graph powering.

Stronger result (Khot, 2001, assuming UGC): Clique cannot be approximated within n^{1-epsilon} even with quasi-polynomial time.

Set Cover

Theorem (Dinur-Steurer, 2014): Set Cover cannot be approximated within (1 - epsilon) ln n for any epsilon > 0 unless P = NP.

This matches the greedy algorithm's O(ln n) approximation ratio, showing the greedy algorithm is essentially optimal.

Vertex Cover

Theorem (Dinur-Safra, 2005): Vertex Cover is NP-hard to approximate within 1.3606. Assuming UGC (Khot-Regev, 2008), it is hard to approximate within 2 - epsilon, matching the LP-based 2-approximation.

Traveling Salesman Problem

Metric TSP: NP-hard to approximate within 123/122 (Karpinski-Lampis-Schmied, 2015). Christofides' algorithm gives 3/2. Karlin-Klein-Oveis Gharan (2021) slightly beat 3/2.
General TSP: NP-hard to approximate within any polynomial factor.

The Unique Games Conjecture (UGC)

Conjecture (Khot, 2002): For every epsilon, delta > 0, there exists a constant k such that it is NP-hard to determine, given a Unique Games instance with alphabet size k:

(YES) There is an assignment satisfying >= (1-epsilon) fraction of constraints
(NO) No assignment satisfies more than delta fraction of constraints

A Unique Game is a 2-variable CSP where each constraint is a bijection between label sets.

Implications of UGC

If UGC is true:

MAX-CUT: The Goemans-Williamson SDP-based 0.878-approximation is optimal
Vertex Cover: The LP-based 2-approximation is optimal
Every Max-CSP: The optimal approximation ratio is exactly characterized by a semidefinite programming relaxation (Raghavendra, 2008)
Sparsest Cut, Minimum Linear Arrangement: Current algorithms are optimal

Status of UGC

Not proven, but widely studied as a working hypothesis
Subexponential algorithms exist for Unique Games (Arora-Barak-Steurer, 2010), which do not refute UGC but constrain potential proofs
The 2-to-2 Games Theorem (Khot-Minzer-Safra, 2018) proves a variant with imperfect completeness, a major step toward UGC

Gap Amplification

Starting from a weak PCP (large query complexity, small gap), gap amplification transforms it into a strong PCP (small query complexity, large gap).

Dinur's approach:

Represent the constraint graph as an expander (via graph powering)
Amplify the gap: if alpha fraction of constraints are violated, after powering, Omega(alpha) fraction are violated even locally
Reduce alphabet size through composition with an "inner verifier"
Iterate O(log n) times to achieve constant gap

Each iteration maintains O(log n) randomness and constant query complexity. The key technical insight: expander-based amplification preserves the structure needed for composition.

Problem	Hardness Ratio	Best Algorithm	Match?
MAX-3SAT	7/8 + epsilon	7/8 (random)	Yes (optimal)
MAX-CLIQUE	n^{1-epsilon}	O(n/log^2 n)	Near (both ~n)
Set Cover	(1-epsilon) ln n	(1+o(1)) ln n (greedy)	Yes
Vertex Cover	1.3606 (2-epsilon under UGC)	2 (LP)	Under UGC
MAX-CUT	~0.878 (under UGC)	0.878 (GW)	Under UGC
Chromatic Number	n^{1-epsilon}	O(n/log n)	Near
Shortest Vector (CVP)	2^{log^{1-epsilon} n}	Exponential	Gap

Connections to Other Areas

Coding theory: PCP proofs are locally testable codes (LTCs). The PCP theorem is equivalent to the existence of certain LTCs.
Property testing: PCP verifiers are property testers for proof validity.
Cryptography: PCPs underlie succinct arguments (SNARGs) and verifiable computation.
Proof complexity: PCPs provide strong lower bounds on proof systems (automatizability).

Open Problems

Resolve the Unique Games Conjecture
Prove the Sliding Scale Conjecture
Determine the exact approximation threshold for Metric TSP
Construct linear-size PCPs with O(1) queries (currently, PCPs have quasi-linear size)
Can Dinur-style amplification be made to work for 2-query PCPs (for Label Cover with perfect completeness)?