
Approximation Algorithms

When exact solutions to NP-hard problems are infeasible, approximation algorithms find near-optimal solutions in polynomial time with provable quality guarantees.

Approximation Ratio

For a minimization problem, algorithm A has approximation ratio ρ if:

A(I) / OPT(I) ≤ ρ     for all instances I

For maximization: OPT(I) / A(I) ≤ ρ.

ρ = 1: Exact algorithm. ρ = 2: At most twice the optimal. ρ = O(log n): Logarithmic approximation.

Classic Approximation Algorithms

Vertex Cover — 2-Approximation

Problem: Find the smallest set of vertices covering all edges.

Algorithm: Find a maximal matching (greedily add non-adjacent edges). Take both endpoints of every matched edge.

PROCEDURE APPROX_VERTEX_COVER(graph)
    cover ← empty set
    used ← array of FALSE, size |graph|

    FOR u ← 0 TO |graph| - 1 DO
        IF used[u] THEN CONTINUE
        FOR EACH v IN graph[u] DO
            IF NOT used[v] THEN
                ADD u TO cover
                ADD v TO cover
                used[u] ← TRUE
                used[v] ← TRUE
                BREAK
    RETURN cover as list

Proof of ratio 2: Every edge in the matching must be covered. OPT must include at least one endpoint of each matched edge → OPT ≥ |matching|. Our cover has 2|matching| vertices → ratio ≤ 2.

Tight? Yes — consider a perfect matching (n/2 edges). OPT = n/2, our cover = n. Ratio = 2.
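The algorithm and its tight instance can be sketched in Python (the function name and toy graphs are illustrative):

```python
def approx_vertex_cover(graph):
    """Take both endpoints of a greedily built maximal matching.
    graph: adjacency list, graph[u] = neighbors of vertex u."""
    cover, used = [], [False] * len(graph)
    for u in range(len(graph)):
        if used[u]:
            continue
        for v in graph[u]:
            if not used[v]:          # edge (u, v) joins the matching
                cover += [u, v]
                used[u] = used[v] = True
                break
    return cover

# Tight instance: a perfect matching on 6 vertices (edges 0-1, 2-3, 4-5).
# OPT picks one endpoint per edge (3 vertices); the algorithm takes all 6.
print(len(approx_vertex_cover([[1], [0], [3], [2], [5], [4]])))  # → 6
```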

Can we do better? Under the Unique Games Conjecture, no polynomial algorithm achieves ratio < 2 - ε for vertex cover.

Set Cover — ln(n)-Approximation

Problem: Given universe U (|U| = n) and sets S₁, ..., Sₘ, find minimum number of sets covering U.

Greedy algorithm: Repeatedly pick the set covering the most uncovered elements.

while uncovered elements remain:
    pick the set Sᵢ covering the most uncovered elements
    add Sᵢ to the cover
    mark its elements as covered
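A minimal Python sketch of this greedy loop (the small instance is made up for illustration):

```python
def greedy_set_cover(universe, sets):
    """Greedy set cover: repeatedly take the set covering the most
    still-uncovered elements (an H_n <= ln n + 1 approximation)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda s: len(uncovered & s))
        chosen.append(best)
        uncovered -= best
    return chosen

# Illustrative instance: greedy grabs the biggest set first, then mops up.
cover = greedy_set_cover(range(1, 7), [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}])
print(len(cover))  # → 2
```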

Approximation ratio: Hₙ = 1 + 1/2 + 1/3 + ... + 1/n ≈ ln n.

Proof sketch: suppose k elements are still uncovered just before greedy's next pick. The optimal cover covers those k elements using at most OPT sets, so some available set covers at least k/OPT of them, and greedy does at least as well. Charging each newly covered element a "price" of at most OPT/k and summing as k runs from n down to 1 gives total cost ≤ OPT × (1/n + ... + 1/2 + 1) = OPT × Hₙ.

Tight? Yes — matching lower bound: (1 - ε)ln n is NP-hard to beat (Dinur-Steurer, 2014).

TSP — Christofides' 3/2-Approximation

For metric TSP (triangle inequality holds):

Algorithm (Christofides, 1976):

  1. Compute MST of the graph.
  2. Find the set S of odd-degree vertices in the MST.
  3. Compute minimum-weight perfect matching on S.
  4. Combine MST + matching → Eulerian multigraph.
  5. Find Euler tour.
  6. Shortcut: Skip repeated vertices (triangle inequality ensures shortcutting doesn't increase cost).

Approximation ratio: 3/2. The MST costs ≤ OPT (deleting one edge of the optimal tour leaves a spanning tree). Shortcutting the optimal tour down to S gives a tour on S of cost ≤ OPT; its alternating edges form two perfect matchings on S, so the cheaper one costs ≤ OPT/2. Total ≤ 3/2 × OPT.
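The six steps can be sketched in Python for tiny Euclidean instances. Step 3's matching is brute-forced here, so this is a didactic sketch, not a practical implementation:

```python
import itertools
import math

def christofides(points):
    """Christofides sketch for Euclidean TSP; only meant for tiny inputs."""
    n = len(points)
    d = lambda a, b: math.dist(points[a], points[b])

    # 1. Minimum spanning tree via Prim's algorithm (adjacency lists).
    adj = {v: [] for v in range(n)}
    in_tree = {0}
    while len(in_tree) < n:
        u, v = min(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                   key=lambda e: d(*e))
        adj[u].append(v)
        adj[v].append(u)
        in_tree.add(v)

    # 2. Odd-degree vertices of the MST (always evenly many).
    odd = [v for v in range(n) if len(adj[v]) % 2 == 1]

    # 3. Minimum-weight perfect matching on the odd vertices, by brute force.
    def min_matching(vs):
        if not vs:
            return []
        u, rest = vs[0], vs[1:]
        return min(([(u, rest[i])] + min_matching(rest[:i] + rest[i + 1:])
                    for i in range(len(rest))),
                   key=lambda m: sum(d(a, b) for a, b in m))

    # 4. Add the matching edges: every degree becomes even (Eulerian multigraph).
    for u, v in min_matching(odd):
        adj[u].append(v)
        adj[v].append(u)

    # 5. Euler tour via Hierholzer's algorithm.
    stack, euler = [0], []
    while stack:
        v = stack[-1]
        if adj[v]:
            w = adj[v].pop()
            adj[w].remove(v)
            stack.append(w)
        else:
            euler.append(stack.pop())

    # 6. Shortcut: keep only the first visit to each vertex.
    seen, tour = set(), []
    for v in euler:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour

points = [(0, 0), (2, 0), (2, 1), (1, 2), (0, 2), (1, 1)]
tour = christofides(points)
cost = sum(math.dist(points[tour[i]], points[tour[(i + 1) % 6]]) for i in range(6))
opt = min(sum(math.dist(points[p[i]], points[p[(i + 1) % 6]]) for i in range(6))
          for p in itertools.permutations(range(6)))
print(cost <= 1.5 * opt + 1e-9)  # True: the tour is within 3/2 of optimal
```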

Best known (as of 2020): slightly below 3/2 — Karlin, Klein, and Oveis Gharan give a (3/2 − ε)-approximation for some tiny constant ε > 0.

For general TSP (no triangle inequality): No constant-factor approximation unless P = NP (follows from Hamiltonian cycle NP-completeness).

Knapsack — FPTAS

Fully Polynomial-Time Approximation Scheme: for any ε > 0, produces a (1−ε)-approximation in time polynomial in both n and 1/ε — here O(n³/ε).

Algorithm: Scale down all values by a factor of ε × max_value / n. Round to integers. Solve the scaled problem exactly with DP.

Original values:     v₁, v₂, ..., vₙ
Scaled values:       ⌊vᵢ × n / (ε × v_max)⌋
DP on scaled values: O(n × total scaled value) = O(n × n²/ε) = O(n³/ε)

Rounding loses at most ε × v_max / n per item, hence at most ε × v_max ≤ ε × OPT in total (since OPT ≥ v_max). The result is therefore ≥ (1−ε) × OPT.
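A minimal sketch of the scheme, using the "minimum weight per achievable scaled value" form of the DP (the function name and example instance are illustrative):

```python
def knapsack_fptas(values, weights, capacity, eps):
    """Knapsack FPTAS sketch: scale values down, then run an exact DP
    keeping the minimum weight for each achievable scaled value."""
    n = len(values)
    K = eps * max(values) / n              # scaling factor
    scaled = [int(v / K) for v in values]  # floor(v_i * n / (eps * v_max))
    # dp maps an achievable scaled value -> (min weight, chosen item indices)
    dp = {0: (0, [])}
    for i in range(n):
        for val, (w, items) in list(dp.items()):  # snapshot: each item used once
            nv, nw = val + scaled[i], w + weights[i]
            if nw <= capacity and nw < dp.get(nv, (float("inf"),))[0]:
                dp[nv] = (nw, items + [i])
    return dp[max(dp)][1]                  # items reaching the best scaled value

# With eps = 0.1 this happens to recover the exact optimum on this instance:
print(knapsack_fptas([60, 100, 120], [10, 20, 30], 50, 0.1))  # → [1, 2]
```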

An FPTAS can exist only for weakly NP-hard problems: a strongly NP-hard problem with polynomially bounded objective values has no FPTAS unless P = NP.

MAX-SAT

Randomized: Assign each variable TRUE independently with probability 1/2. A clause with k literals is satisfied with probability 1 − 2⁻ᵏ, so if every clause has at least k literals, the expected number of satisfied clauses is ≥ (1 − 2⁻ᵏ) × OPT. For MAX-E3SAT (exactly 3 literals per clause): a 7/8-approximation.
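The 7/8 figure for exactly-3-literal clauses can be checked by brute force over all assignments (the tiny formula below is made up for illustration):

```python
from itertools import product

# Clauses over 3 variables; positive literal = variable index, negative = negated.
clauses = [(1, 2, 3), (-1, 2, 3), (1, -2, 3), (1, 2, -3)]

def satisfied(assign, clause):
    return any((lit > 0) == assign[abs(lit) - 1] for lit in clause)

# Average over all 2^3 uniform assignments: each exactly-3-literal clause
# fails for only one assignment, so it is satisfied with probability 7/8.
total = sum(sum(satisfied(a, c) for c in clauses)
            for a in product([False, True], repeat=3))
print(total / 8)  # → 3.5, i.e. 7/8 of the 4 clauses
```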

Deterministic (Johnson's algorithm): Greedily assign each variable to maximize satisfied clauses. 1/2-approximation.

LP relaxation + randomized rounding: Solve the LP relaxation of MAX-SAT, then set each variable TRUE with probability equal to its fractional value. A k-literal clause is satisfied with probability ≥ 1 − (1 − 1/k)ᵏ ≥ 1 − 1/e ≈ 0.632 times its LP contribution.

Best: run both the random-assignment and the LP-rounding algorithm and keep the better outcome → 3/4-approximation (Goemans-Williamson). Their SDP-based algorithm for MAX-2SAT achieves 0.878...

Scheduling

Longest Processing Time (LPT) for makespan minimization on identical machines:

Sort jobs by processing time (decreasing). Assign each to the least-loaded machine.

Ratio: 4/3 - 1/(3m) where m = number of machines. As m → ∞, ratio → 4/3.
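LPT is a few lines in Python with a min-heap of machine loads (the instance below is a classic near-worst case for two machines):

```python
import heapq

def lpt_makespan(jobs, m):
    """LPT rule: longest job first, always onto the least-loaded machine."""
    loads = [0] * m          # min-heap of machine loads
    for t in sorted(jobs, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)

# LPT gives 7 (machines {3,2,2} and {3,2}) while OPT = 6 ({3,3} and {2,2,2}):
# ratio 7/6 = 4/3 - 1/(3*2), matching the bound for m = 2.
print(lpt_makespan([3, 3, 2, 2, 2], 2))  # → 7
```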

Approximation Scheme Hierarchy

| Type | Definition | Example |
|---|---|---|
| ρ-approximation | Within factor ρ of optimal | 2-approx vertex cover |
| PTAS | (1+ε)-approximation for any ε, time O(n^f(1/ε)) | Euclidean TSP |
| EPTAS | PTAS with time O(f(1/ε) × n^c) (FPT in 1/ε) | Makespan scheduling |
| FPTAS | PTAS with time polynomial in BOTH n and 1/ε | Knapsack |

PTAS: Polynomial for each fixed ε, but the exponent may depend on 1/ε (e.g., O(n^(1/ε))). Can be impractical for small ε.

FPTAS: Time is polynomial in n AND 1/ε. The most desirable approximation guarantee.

Inapproximability Results

Some problems can't be approximated below certain thresholds (unless P = NP):

| Problem | Best Known Ratio | Inapproximability |
|---|---|---|
| Vertex Cover | 2 | < 2 − ε (under UGC) |
| Set Cover | ln n | < (1 − ε) ln n |
| MAX-3SAT | 7/8 | < 7/8 + ε |
| Chromatic Number | n / polylog(n) | < n^(1−ε) |
| Clique | n / polylog(n) | < n^(1−ε) |
| TSP (general) | — | No constant ratio |
| MAX-CUT | 0.878 (GW) | < 0.878 + ε (under UGC) |

Unique Games Conjecture (UGC): Khot (2002). If true, provides tight inapproximability for many problems (vertex cover, MAX-CUT, graph coloring). Currently unresolved — one of the most important open problems in complexity theory.

LP Relaxation and Rounding

A powerful framework for approximation algorithms:

  1. Formulate the optimization problem as an integer linear program (ILP).
  2. Relax integrality constraints (allow 0 ≤ x ≤ 1 instead of x ∈ {0, 1}).
  3. Solve the LP relaxation in polynomial time.
  4. Round the fractional solution to an integer solution.

LP optimum ≤ ILP optimum for minimization: the relaxation has weaker constraints, so its feasible region is larger and its optimum at least as good.

The integrality gap is the worst-case ratio of ILP optimum to LP optimum over all instances; it limits the best approximation ratio provable by comparing a rounded solution against the LP bound.

Randomized rounding: Round each fractional variable xᵢ to 1 with probability xᵢ. Expected cost matches the LP cost. Actual cost is close with high probability (Chernoff bounds).

Deterministic rounding: Round using specific rules (threshold, conditional probabilities). Guarantees on the worst case.
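Steps 1–4 with threshold rounding can be sketched for vertex cover, assuming SciPy is available (the graph is illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# ILP: minimize sum(x_v)  s.t.  x_u + x_v >= 1 per edge,  x in {0, 1}.
# LP relaxation: same with 0 <= x <= 1. Rounding at threshold 1/2 yields
# a valid cover of size <= 2 * LP optimum <= 2 * OPT (a 2-approximation).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # small 4-vertex graph
n = 4
A = np.zeros((len(edges), n))
for i, (u, v) in enumerate(edges):
    A[i, u] = A[i, v] = -1  # linprog wants A_ub @ x <= b_ub, so negate
res = linprog(c=np.ones(n), A_ub=A, b_ub=-np.ones(len(edges)),
              bounds=[(0, 1)] * n)
cover = [v for v in range(n) if res.x[v] >= 0.5 - 1e-9]
# Every edge has x_u + x_v >= 1, so at least one endpoint reaches 1/2:
print(all(u in cover or v in cover for u, v in edges))  # True
```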

Semidefinite Programming (SDP)

More powerful than LP. Variables are positive semidefinite matrices (or equivalently, vectors with inner product constraints).

MAX-CUT (Goemans-Williamson, 1995): SDP relaxation + random hyperplane rounding → 0.878-approximation. Assuming UGC, this is optimal!

Applications in CS

  • Operations research: Vehicle routing, scheduling, facility location — all NP-hard, all have practical approximation algorithms.
  • Network design: Steiner tree approximation for network layout.
  • Machine learning: Clustering (k-means++ is an O(log k)-approximation for k-means), feature selection.
  • Compiler optimization: Register allocation and instruction scheduling are NP-hard; heuristics with bounded approximation ratios are used.
  • Cloud computing: Virtual machine placement, load balancing, resource allocation — approximation algorithms guide practical heuristics.
  • Competitive programming: Recognizing that a problem is NP-hard and switching to approximation or heuristics is a key skill.