
Duality and KKT Conditions

The General Optimization Problem

Consider the standard form:

minimize    f(x)
subject to  g_i(x) <= 0,   i = 1,...,m
            h_j(x) = 0,    j = 1,...,p

with optimal value p*. The feasible set is the intersection of the regions defined by the m inequality constraints and the p equality constraints.

Lagrangian and Lagrange Multipliers

The Lagrangian associates a multiplier with each constraint:

L(x, lambda, nu) = f(x) + sum_i lambda_i * g_i(x) + sum_j nu_j * h_j(x)

where lambda in R^m (lambda >= 0) are the inequality multipliers and nu in R^p are the equality multipliers. The pair (lambda, nu) is called the dual variables.

Lagrangian Interpretation

For any lambda >= 0 and any nu, the Lagrangian provides a lower bound:

inf_x L(x, lambda, nu) <= p*

This holds because at any feasible x: g_i(x) <= 0 with lambda_i >= 0 gives lambda_i * g_i(x) <= 0, and h_j(x) = 0 gives nu_j * h_j(x) = 0. So L(x, lambda, nu) <= f(x) for feasible x.

The Dual Problem

The Lagrangian dual function is:

q(lambda, nu) = inf_x L(x, lambda, nu)

This is always concave (infimum of affine functions of lambda, nu), even when the primal is nonconvex. The dual problem is:

maximize    q(lambda, nu)
subject to  lambda >= 0

with optimal value d*.
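
As a minimal numeric sketch, take the toy problem minimize x^2 subject to x >= 1 (so g(x) = 1 - x <= 0, p* = 1). The inner minimization of the Lagrangian can be done by hand here, giving q(lambda) = lambda - lambda^2/4:

```python
import numpy as np

# Toy problem: minimize f(x) = x^2  subject to  g(x) = 1 - x <= 0.
# Primal optimum: x* = 1, p* = 1.

def dual(lam):
    # L(x, lam) = x^2 + lam*(1 - x); the inner minimizer is x = lam/2,
    # so q(lam) = lam - lam**2/4, which is concave in lam.
    return lam - lam**2 / 4.0

p_star = 1.0
lams = np.linspace(0.0, 5.0, 51)
assert all(dual(l) <= p_star + 1e-12 for l in lams)   # every q(lam) lower-bounds p*

# q is maximized at lam* = 2, where q(2) = 1 = p*: strong duality,
# as expected for this convex problem (Slater holds, e.g. x0 = 2).
d_star = dual(2.0)
print(d_star)   # 1.0
```

Every value of the dual function is a lower bound on p*, and the best such bound (here, exactly p*) is d*.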

Weak Duality

For any optimization problem: d* <= p*.

The difference p* - d* is the duality gap. Weak duality always holds, regardless of convexity. This makes dual solutions useful as certificates of optimality: if a feasible primal x and a dual-feasible (lambda, nu) satisfy f(x) = q(lambda, nu), then both are optimal.

Strong Duality

Strong duality holds when d* = p* (zero duality gap). This does not hold in general but is guaranteed under constraint qualifications.

When Strong Duality Holds

  • Convex problems with Slater's condition: always.
  • Linear programs: always (no constraint qualification needed beyond feasibility).
  • Conic programs (SOCP, SDP): under mild conditions.
  • Nonconvex problems: generally does not hold, but can for specific structures.

Constraint Qualifications

Constraint qualifications are regularity conditions ensuring that the geometry of the constraint set is well-behaved at an optimal point.

Slater's Condition

There exists a strictly feasible point x_0 with:

  • g_i(x_0) < 0 for all i (strict inequality for all inequality constraints)
  • h_j(x_0) = 0 for all j

For convex problems, Slater's condition guarantees strong duality. Note: affine inequality constraints need only be satisfied (not strictly). Slater's is the most commonly used CQ in convex optimization.

LICQ (Linear Independence Constraint Qualification)

At the optimal point x*, the gradients of the active inequality constraints and of all equality constraints,

{ grad(g_i(x*)) : g_i(x*) = 0 } ∪ { grad(h_j(x*)) : j = 1,...,p },

are linearly independent. LICQ is a pointwise condition that, unlike Slater's, applies to nonconvex problems as well; it guarantees that the KKT multipliers at x* are unique.

Other Constraint Qualifications

| CQ | Condition |
|----|-----------|
| MFCQ (Mangasarian-Fromovitz) | Active constraint gradients are positively linearly independent |
| CRCQ (Constant Rank) | Active constraint gradients have constant rank in a neighborhood |
| ACQ (Abadie) | Tangent cone equals linearized feasible cone |
| GCQ (Guignard) | Weakest CQ under which KKT is necessary |

LICQ => MFCQ => ACQ => GCQ. LICQ is sufficient for most practical purposes.

KKT Conditions

The Karush-Kuhn-Tucker (KKT) conditions are first-order necessary conditions for optimality (under a CQ). For differentiable f, g_i, h_j:

1. Stationarity:

grad(f(x*)) + sum_i lambda_i * grad(g_i(x*)) + sum_j nu_j * grad(h_j(x*)) = 0

2. Primal feasibility:

g_i(x*) <= 0,  h_j(x*) = 0

3. Dual feasibility:

lambda_i >= 0

4. Complementary slackness:

lambda_i * g_i(x*) = 0   for all i

Interpretation

  • Stationarity: the negative gradient of f at x* is a non-negative combination of the active inequality constraint gradients (plus a linear combination of the equality constraint gradients)—to first order, f cannot decrease in any feasible direction.
  • Complementary slackness: either a constraint is active (g_i = 0) or its multiplier is zero (lambda_i = 0). Inactive constraints do not influence the optimal solution.
  • The multiplier lambda_i measures the sensitivity of p* to perturbation of g_i. Specifically, if the constraint becomes g_i(x) <= epsilon, then dp*/d(epsilon) = -lambda_i at epsilon = 0.
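
All four conditions, plus the sensitivity interpretation, can be checked by hand on the toy problem minimize x^2 subject to x >= 1; the candidate point (x* = 1, lambda* = 2) below is worked out from stationarity:

```python
# Toy problem: min f(x) = x^2  s.t.  g(x) = 1 - x <= 0  (i.e. x >= 1).
# Candidate KKT point to verify: x* = 1, lambda* = 2.
x_star, lam_star = 1.0, 2.0

# 1. Stationarity: f'(x*) + lambda* g'(x*) = 2*x* + lambda**(-1) = 0
assert abs(2 * x_star - lam_star) < 1e-12
# 2. Primal feasibility: g(x*) <= 0
assert 1 - x_star <= 0
# 3. Dual feasibility: lambda* >= 0
assert lam_star >= 0
# 4. Complementary slackness: lambda* * g(x*) = 0
assert lam_star * (1 - x_star) == 0

# Sensitivity: with g(x) <= eps the constraint becomes x >= 1 - eps, so
# p*(eps) = (1 - eps)**2 and dp*/d(eps) at eps = 0 is -2 = -lambda*.
eps = 1e-6
p = lambda e: (1 - e) ** 2
deriv = (p(eps) - p(-eps)) / (2 * eps)   # central difference
assert abs(deriv + lam_star) < 1e-6
print("KKT and sensitivity checks pass")
```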

KKT for Convex Problems

For convex problems (f, g_i convex, h_j affine) with a CQ, KKT conditions are both necessary and sufficient for global optimality. This is the central result connecting duality and optimality in convex optimization.

Saddle Point Interpretation

A point (x*, lambda*, nu*) is a saddle point of the Lagrangian if:

L(x*, lambda, nu) <= L(x*, lambda*, nu*) <= L(x, lambda*, nu*)

for all x, all lambda >= 0, and all nu. Equivalently:

  • x* minimizes L over x (given lambda*, nu*)
  • (lambda*, nu*) maximizes L over dual variables (given x*)

A saddle point exists if and only if strong duality holds and both primal and dual optima are attained. The saddle point conditions are equivalent to KKT.

Duality Gap in Practice

Zero Gap (Strong Duality)

The dual solution provides an optimality certificate. Solvers use the duality gap f(x) - q(lambda, nu) as a stopping criterion: when it falls below a tolerance, both primal and dual solutions are near-optimal.
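
A minimal sketch of this stopping rule, using dual gradient ascent on the toy problem minimize x^2 subject to x >= 1 (the step size and projection onto the feasible set are illustrative choices, not a production solver):

```python
# Dual gradient ascent on: min x^2  s.t.  g(x) = 1 - x <= 0.
# The duality gap f(x_feas) - q(lam) serves as the stopping criterion.
lam, step, tol = 0.0, 0.5, 1e-8
gap = float("inf")
for k in range(10_000):
    x = lam / 2.0                     # argmin_x L(x, lam) = x^2 + lam*(1 - x)
    q = lam - lam**2 / 4.0            # dual value q(lam)
    x_feas = max(x, 1.0)              # project onto the feasible set x >= 1
    gap = x_feas**2 - q               # duality gap: bound on suboptimality
    if gap < tol:
        break
    lam = max(lam + step * (1.0 - x), 0.0)   # q'(lam) = g(x(lam)) = 1 - x
print(lam, gap)
```

When the loop stops, the gap certifies that x_feas is within tol of optimal, without needing to know p* in advance.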

Nonzero Gap

For nonconvex problems, the gap can be positive. The dual solution still provides a lower bound, useful for:

  • Bounding the suboptimality of heuristic solutions
  • Branch and bound (LP relaxation duality gap drives pruning)
  • Lagrangian relaxation in combinatorial optimization

Lagrangian Relaxation

For hard (e.g., integer) optimization problems, relax complicating constraints into the objective:

q(lambda) = min_x { f(x) + sum_i lambda_i * g_i(x) : x in X }

where X captures "easy" constraints. The dual function q(lambda) is concave and can be maximized via subgradient methods or bundle methods. This yields bounds for branch and bound.
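
A small sketch on an illustrative 0/1 knapsack instance (the data and step rule are made up for the example): relaxing the weight constraint makes the inner minimization separable, and projected subgradient ascent on lambda produces a lower bound on the integer optimum.

```python
import numpy as np

# Knapsack in min form: minimize -v.x  s.t.  w.x <= C,  x in {0,1}^n.
# Relaxed dual: q(lam) = min_x [ (lam*w - v).x ] - lam*C over x in {0,1}^n,
# which separates: x_i = 1 exactly when lam*w_i - v_i < 0.
v = np.array([10.0, 7.0, 5.0])
w = np.array([4.0, 3.0, 2.0])
C = 5.0

def dual_and_subgrad(lam):
    x = (lam * w - v < 0).astype(float)        # inner ("easy") minimizer
    q = float((lam * w - v) @ x) - lam * C     # dual value
    return q, float(w @ x - C)                 # subgradient of q at lam

lam, best = 0.0, -np.inf
for k in range(2000):
    q, g = dual_and_subgrad(lam)
    best = max(best, q)                        # best lower bound so far
    lam = max(lam + g / (k + 1), 0.0)          # projected subgradient step

# Brute-force integer optimum: take items 1 and 2, so p* = -12.
p_star = -12.0
assert best <= p_star + 1e-9                   # weak duality: q lower-bounds p*
print(best)   # a bound near -12.5, leaving a positive duality gap of about 0.5
```

The bound is not tight here (the problem is nonconvex), which is exactly the situation branch and bound exploits: the gap quantifies how far a heuristic solution can be from optimal.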

Examples

Equality-Constrained Quadratic

minimize  (1/2) x^T Q x + c^T x   subject to  Ax = b

KKT stationarity: Qx + c + A^T nu = 0. Combined with Ax = b, this gives the KKT system:

[Q   A^T] [x ]   [-c]
[A   0  ] [nu] = [ b]
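
This system is a single linear solve. A small numeric instance (the data below is illustrative; the solve assumes Q is positive definite on the null space of A so the KKT matrix is nonsingular):

```python
import numpy as np

# Equality-constrained QP: minimize (1/2) x'Qx + c'x  s.t.  Ax = b.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
c = np.array([1.0, -2.0])
A = np.array([[1.0, 1.0]])      # one equality constraint: x1 + x2 = 1
b = np.array([1.0])

n, p = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T],
              [A, np.zeros((p, p))]])        # the KKT matrix
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(K, rhs)
x, nu = sol[:n], sol[n:]

assert np.allclose(Q @ x + c + A.T @ nu, 0)  # stationarity
assert np.allclose(A @ x, b)                 # primal feasibility
print(x, nu)
```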

Support Vector Machine (SVM)

The SVM dual reveals the kernel trick: the dual involves only inner products x_i^T x_j, enabling nonlinear classification via kernel substitution. Complementary slackness identifies support vectors (points with lambda_i > 0).

Connections

  • LP duality as a special case (→ 01-linear-programming.md).
  • Convex optimization where KKT is sufficient (→ 03-convex-optimization.md).
  • Conic duality extending these ideas (→ 05-conic-programming.md).
  • Penalty and augmented Lagrangian methods for constrained optimization (→ 06-nonlinear-programming.md).