5 min read
On this page

Theorem Proving

Overview

Interactive theorem provers (ITPs) allow users to construct machine-checked proofs of mathematical theorems and program properties. Unlike model checking, theorem proving handles infinite-state systems and arbitrary mathematical statements, but requires human guidance.

The Curry-Howard Correspondence

The Curry-Howard isomorphism connects logic and computation:

| Logic | Programming | |-------|-------------| | Proposition | Type | | Proof | Program (term) | | Implication A → B | Function type A → B | | Conjunction A ∧ B | Product type (A, B) | | Disjunction A ∨ B | Sum type A + B | | Universal ∀x.P(x) | Dependent function (x:A) → P(x) | | Existential ∃x.P(x) | Dependent pair Σ(x:A).P(x) | | True | Unit type | | False | Empty type (no inhabitants) |

To prove a proposition, construct a term of the corresponding type. A type checker verifies the proof by type-checking the term. This is the foundation of proof assistants based on dependent type theory (Coq, Lean, Agda).

Major Theorem Provers

Coq

Based on the Calculus of Inductive Constructions (CIC). Uses a tactic language (Ltac/Ltac2) for interactive proof development.

Theorem plus_comm : forall n m : nat, n + m = m + n.
Proof.
  intros n m.
  induction n as [| n' IHn'].
  - simpl. rewrite Nat.add_0_r. reflexivity.
  - simpl. rewrite IHn'. rewrite Nat.add_succ_r. reflexivity.
Qed.

(* Certified programming: specify and verify in one language *)
Definition safe_head {A : Type} (l : list A) (H : l <> nil) : A.
  destruct l.
  - contradiction.
  - exact a.
Defined.

Key features:

  • Extraction: compile Coq programs to OCaml, Haskell, or Scheme
  • Program and Equations for dependently-typed programming
  • Rich ecosystem of libraries (MathComp, stdpp)

Notable projects: CompCert (verified C compiler), Fiat Cryptography (verified crypto), the Four Color Theorem proof.

Isabelle/HOL

Based on Higher-Order Logic (HOL). Uses the Isar structured proof language and powerful automation (Sledgehammer integrates external ATP provers).

theorem plus_comm: "n + m = m + (n::nat)"
  by (induction n) auto

lemma list_rev_rev: "rev (rev xs) = xs"
  by (induction xs) auto

(* Isar structured proof *)
theorem sqrt2_irrational:
  "sqrt 2 ∉ ℚ"
proof (rule notI)
  assume "sqrt 2 ∈ ℚ"
  then obtain p q :: int where
    "q ≠ 0" and "sqrt 2 = p / q" and "coprime p q"
    ...
qed

Key features:

  • Sledgehammer: automatically finds proofs using external provers (Z3, CVC5, E, SPASS)
  • Code generation to Haskell, Scala, OCaml, SML
  • Locales for modular mathematical reasoning
  • Archive of Formal Proofs (AFP): 700+ entries

Notable projects: seL4 (verified OS microkernel), CakeML (verified ML compiler).

Lean 4

A newer prover doubling as a general-purpose functional programming language. Based on the Calculus of Inductive Constructions with quotient types.

theorem plus_comm (n m : Nat) : n + m = m + n := by
  induction n with
  | zero => simp
  | succ n ih => simp [Nat.succ_add, ih]

-- Lean 4 as a programming language
def quicksort (xs : List Nat) : List Nat :=
  match xs with
  | [] => []
  | pivot :: rest =>
    let smaller := rest.filter (· ≤ pivot)
    let larger := rest.filter (· > pivot)
    quicksort smaller ++ [pivot] ++ quicksort larger
termination_by xs.length

Key features:

  • Metaprogramming framework in Lean itself
  • Mathlib: the largest unified library of formalized mathematics (200k+ theorems)
  • Fast compiler, can be used for systems programming
  • Strong tactic automation and term-mode proving

Agda

A dependently-typed programming language and proof assistant. No separate tactic language -- proofs are written as programs directly.

data Vec (A : Set) : Nat → Set where
  []  : Vec A zero
  _∷_ : {n : Nat} → A → Vec A n → Vec A (suc n)

head : {A : Set} {n : Nat} → Vec A (suc n) → A
head (x ∷ _) = x
-- No need for a "non-empty" check: the type guarantees it

-- Proof by pattern matching and recursion
+-comm : (m n : Nat) → m + n ≡ n + m
+-comm zero    n = sym (+-identityʳ n)
+-comm (suc m) n = cong suc (+-comm m n) ∙ sym (+-suc n m)

Key features:

  • Programs are proofs, proofs are programs (no tactic layer)
  • Cubical Agda supports Homotopy Type Theory (HoTT)
  • Excellent for teaching dependent types

Tactics

Tactics are commands that transform proof goals. They construct proof terms behind the scenes.

Common tactics (Coq names, similar in Lean/Isabelle):

| Tactic | Effect | |--------|--------| | intro / intros | Introduce hypotheses | | apply H | Apply hypothesis or lemma H | | rewrite H | Rewrite using equation H | | induction x | Structural induction on x | | cases x | Case split on x | | simp | Simplification using rewrite rules | | auto | Automated proof search | | omega / linarith | Linear arithmetic decision procedure | | ring | Ring equation solver | | exact t | Provide exact proof term t | | contradiction | Close goal from contradictory hypotheses |

Proof automation hierarchy:

  1. Decision procedures: always work for their domain (omega, ring)
  2. Tactics with search: try many approaches (auto, simp)
  3. Sledgehammer-style: call external provers
  4. Manual proof: human constructs the argument

Certified Programming

The ultimate goal: programs that carry machine-checked proofs of correctness.

Approaches:

  1. Extraction: write spec + proof in ITP, extract executable code (Coq → OCaml)
  2. Intrinsic verification: use dependent types so ill-formed programs don't typecheck
  3. Extrinsic verification: write program, then prove properties about it separately
  4. Refinement: start with abstract spec, refine to efficient implementation with proofs

Landmark verified software:

  • CompCert: optimizing C compiler, every optimization formally verified (Coq)
  • seL4: high-assurance OS microkernel with full functional correctness proof (Isabelle)
  • CakeML: verified ML compiler with verified garbage collector (HOL4)
  • Fiat Cryptography: generates verified field arithmetic for cryptographic curves (Coq)
  • Iris: framework for concurrent separation logic, used in RustBelt (Coq)

Mathematical Formalization

Theorem provers increasingly formalize pure mathematics:

  • Mathlib (Lean): 200k+ theorems covering algebra, analysis, topology, number theory
  • Perfectoid spaces (Lean): formalization of cutting-edge algebraic geometry
  • Liquid Tensor Experiment: formalization of a result by Peter Scholze (Lean)
  • Kepler Conjecture (HOL Light/Isabelle): Hales' Flyspeck project
  • Four Color Theorem (Coq): first major theorem formalized with a proof assistant

Comparison

| Feature | Coq | Isabelle/HOL | Lean 4 | Agda | |---------|-----|-------------|--------|------| | Logic | CIC (constructive) | HOL (classical) | CIC + quotients | Martin-Lof TT | | Proof style | Tactics (Ltac) | Isar + tactics | Tactics + terms | Direct terms | | Automation | Moderate | Strong (Sledgehammer) | Growing | Minimal | | Programming | Via extraction | Code generation | Native | Native | | Math library | MathComp | AFP | Mathlib | stdlib | | Learning curve | Steep | Moderate | Moderate | Steep |

Key Takeaways

  • Theorem provers provide the highest assurance level: proofs are machine-checked
  • Curry-Howard connects proofs to programs -- a proof of P is a program of type P
  • Tactics provide a high-level language for proof construction
  • Verified software (CompCert, seL4) demonstrates feasibility for real systems
  • Lean 4 and Mathlib are accelerating mathematical formalization
  • The gap between proving and programming continues to narrow