Theorem Proving

Overview

Interactive theorem provers (ITPs) allow users to construct machine-checked proofs of mathematical theorems and program properties. Unlike model checking, theorem proving handles infinite-state systems and arbitrary mathematical statements, but requires human guidance.

The Curry-Howard Correspondence

The Curry-Howard isomorphism connects logic and computation:

Logic	Programming
Proposition	Type
Proof	Program (term)
Implication A → B	Function type A → B
Conjunction A ∧ B	Product type (A, B)
Disjunction A ∨ B	Sum type A + B
Universal ∀x.P(x)	Dependent function (x:A) → P(x)
Existential ∃x.P(x)	Dependent pair Σ(x:A).P(x)
True	Unit type
False	Empty type (no inhabitants)

To prove a proposition, construct a term of the corresponding type. A type checker verifies the proof by type-checking the term. This is the foundation of proof assistants based on dependent type theory (Coq, Lean, Agda).

Major Theorem Provers

Coq

Based on the Calculus of Inductive Constructions (CIC). Uses a tactic language (Ltac/Ltac2) for interactive proof development.

Theorem plus_comm : forall n m : nat, n + m = m + n.
Proof.
  intros n m.
  induction n as [| n' IHn'].
  - simpl. rewrite Nat.add_0_r. reflexivity.
  - simpl. rewrite IHn'. rewrite Nat.add_succ_r. reflexivity.
Qed.

(* Certified programming: specify and verify in one language *)
Definition safe_head {A : Type} (l : list A) (H : l <> nil) : A.
  destruct l.
  - contradiction.
  - exact a.
Defined.

Key features:

Extraction: compile Coq programs to OCaml, Haskell, or Scheme
Program and Equations for dependently-typed programming
Rich ecosystem of libraries (MathComp, stdpp)

Notable projects: CompCert (verified C compiler), Fiat Cryptography (verified crypto), the Four Color Theorem proof.

Isabelle/HOL

Based on Higher-Order Logic (HOL). Uses the Isar structured proof language and powerful automation (Sledgehammer integrates external ATP provers).

theorem plus_comm: "n + m = m + (n::nat)"
  by (induction n) auto

lemma list_rev_rev: "rev (rev xs) = xs"
  by (induction xs) auto

(* Isar structured proof *)
theorem sqrt2_irrational:
  "sqrt 2 ∉ ℚ"
proof (rule notI)
  assume "sqrt 2 ∈ ℚ"
  then obtain p q :: int where
    "q ≠ 0" and "sqrt 2 = p / q" and "coprime p q"
    ...
qed

Key features:

Sledgehammer: automatically finds proofs using external provers (Z3, CVC5, E, SPASS)
Code generation to Haskell, Scala, OCaml, SML
Locales for modular mathematical reasoning
Archive of Formal Proofs (AFP): 700+ entries

Notable projects: seL4 (verified OS microkernel), CakeML (verified ML compiler).

Lean 4

A newer prover doubling as a general-purpose functional programming language. Based on the Calculus of Inductive Constructions with quotient types.

theorem plus_comm (n m : Nat) : n + m = m + n := by
  induction n with
  | zero => simp
  | succ n ih => simp [Nat.succ_add, ih]

-- Lean 4 as a programming language
def quicksort (xs : List Nat) : List Nat :=
  match xs with
  | [] => []
  | pivot :: rest =>
    let smaller := rest.filter (· ≤ pivot)
    let larger := rest.filter (· > pivot)
    quicksort smaller ++ [pivot] ++ quicksort larger
termination_by xs.length

Key features:

Metaprogramming framework in Lean itself
Mathlib: the largest unified library of formalized mathematics (200k+ theorems)
Fast compiler, can be used for systems programming
Strong tactic automation and term-mode proving

Agda

A dependently-typed programming language and proof assistant. No separate tactic language -- proofs are written as programs directly.

data Vec (A : Set) : Nat → Set where
  []  : Vec A zero
  _∷_ : {n : Nat} → A → Vec A n → Vec A (suc n)

head : {A : Set} {n : Nat} → Vec A (suc n) → A
head (x ∷ _) = x
-- No need for a "non-empty" check: the type guarantees it

-- Proof by pattern matching and recursion
+-comm : (m n : Nat) → m + n ≡ n + m
+-comm zero    n = sym (+-identityʳ n)
+-comm (suc m) n = cong suc (+-comm m n) ∙ sym (+-suc n m)

Key features:

Programs are proofs, proofs are programs (no tactic layer)
Cubical Agda supports Homotopy Type Theory (HoTT)
Excellent for teaching dependent types

Tactics

Tactics are commands that transform proof goals. They construct proof terms behind the scenes.

Common tactics (Coq names, similar in Lean/Isabelle):

Tactic	Effect
`intro` / `intros`	Introduce hypotheses
`apply H`	Apply hypothesis or lemma H
`rewrite H`	Rewrite using equation H
`induction x`	Structural induction on x
`cases x`	Case split on x
`simp`	Simplification using rewrite rules
`auto`	Automated proof search
`omega` / `linarith`	Linear arithmetic decision procedure
`ring`	Ring equation solver
`exact t`	Provide exact proof term t
`contradiction`	Close goal from contradictory hypotheses

Proof automation hierarchy:

Decision procedures: always work for their domain (omega, ring)
Tactics with search: try many approaches (auto, simp)
Sledgehammer-style: call external provers
Manual proof: human constructs the argument

Certified Programming

The ultimate goal: programs that carry machine-checked proofs of correctness.

Approaches:

Extraction: write spec + proof in ITP, extract executable code (Coq → OCaml)
Intrinsic verification: use dependent types so ill-formed programs don't typecheck
Extrinsic verification: write program, then prove properties about it separately
Refinement: start with abstract spec, refine to efficient implementation with proofs

Landmark verified software:

CompCert: optimizing C compiler, every optimization formally verified (Coq)
seL4: high-assurance OS microkernel with full functional correctness proof (Isabelle)
CakeML: verified ML compiler with verified garbage collector (HOL4)
Fiat Cryptography: generates verified field arithmetic for cryptographic curves (Coq)
Iris: framework for concurrent separation logic, used in RustBelt (Coq)

Mathematical Formalization

Theorem provers increasingly formalize pure mathematics:

Mathlib (Lean): 200k+ theorems covering algebra, analysis, topology, number theory
Perfectoid spaces (Lean): formalization of cutting-edge algebraic geometry
Liquid Tensor Experiment: formalization of a result by Peter Scholze (Lean)
Kepler Conjecture (HOL Light/Isabelle): Hales' Flyspeck project
Four Color Theorem (Coq): first major theorem formalized with a proof assistant

Comparison

Feature	Coq	Isabelle/HOL	Lean 4	Agda
Logic	CIC (constructive)	HOL (classical)	CIC + quotients	Martin-Lof TT
Proof style	Tactics (Ltac)	Isar + tactics	Tactics + terms	Direct terms
Automation	Moderate	Strong (Sledgehammer)	Growing	Minimal
Programming	Via extraction	Code generation	Native	Native
Math library	MathComp	AFP	Mathlib	stdlib
Learning curve	Steep	Moderate	Moderate	Steep

Key Takeaways

Theorem provers provide the highest assurance level: proofs are machine-checked
Curry-Howard connects proofs to programs -- a proof of P is a program of type P
Tactics provide a high-level language for proof construction
Verified software (CompCert, seL4) demonstrates feasibility for real systems
Lean 4 and Mathlib are accelerating mathematical formalization
The gap between proving and programming continues to narrow