Abstract Interpretation

Overview

Abstract interpretation, introduced by Patrick and Radhia Cousot (1977), is a theory of sound approximation of program semantics. Instead of computing exact program behavior (which is undecidable in general), abstract interpretation computes an over-approximation in a simpler mathematical domain. If the approximation satisfies a property, so does the concrete program. This enables fully automatic, sound static analysis.

Mathematical Foundations

Lattices

Abstract interpretation operates over complete lattices: partially ordered sets where every subset has a least upper bound (join, ⊔) and greatest lower bound (meet, ⊓).

        ⊤ (top = "don't know" / all values)
       / \
      /   \
    ...   ...
      \   /
       \ /
        ⊥ (bottom = "unreachable" / no values)

Key properties:

Partial order ⊑: defines precision (a ⊑ b means a is more precise than b)
Join (⊔): combines information (least upper bound, loses precision)
Meet (⊓): intersects information (greatest lower bound, gains precision)
⊤ (top): least precise element -- represents "any value"
⊥ (bottom): most precise element -- represents "no value" / unreachable

Fixpoints

Program loops and recursion correspond to fixpoint computations.

Knaster-Tarski theorem: every monotone function on a complete lattice has a least fixpoint and a greatest fixpoint.

For program analysis:

Start from ⊥ (nothing known)
Repeatedly apply the abstract transfer function
Converge to the least fixpoint = most precise sound result

⊥ ⊑ F(⊥) ⊑ F(F(⊥)) ⊑ ... ⊑ lfp(F)

Galois Connections

A Galois connection (α, γ) between concrete domain C and abstract domain A consists of:

Abstraction function α : C → A: maps concrete values to abstract values
Concretization function γ : A → C: maps abstract values back to concrete sets

Requirements:

∀c ∈ C, a ∈ A:  α(c) ⊑_A a  ⟺  c ⊑_C γ(a)

This ensures:

c ⊑ γ(α(c)): abstraction may lose information (over-approximation)
α(γ(a)) ⊑ a: concretization followed by abstraction is precise or an under-approximation
Soundness: if a property holds in the abstract, it holds in the concrete

Example: mapping integers to their sign

α({-3, -1, 0, 5}) = ⊤        (mixed signs → top)
α({-3, -1})       = Negative
α({0})             = Zero
γ(Positive)        = {1, 2, 3, ...}

Abstract Domains

Interval Domain

Tracks a lower and upper bound for each variable: x ∈ [lo, hi].

x := 5          → x ∈ [5, 5]
y := x + 3      → y ∈ [8, 8]
if (x > 0)      → x ∈ [1, 5] (in true branch)
z := x * y      → z ∈ [8, 40]

Properties:

Efficient: O(1) per variable per operation
No relational information: cannot express x ≤ y or x + y = 10
Widely used as a baseline domain

Octagon Domain (Mine, 2001)

Tracks constraints of the form ±x ± y ≤ c (unit two variable per inequality, UTVPI).

x - y ≤ 3       (x is at most 3 more than y)
x + y ≤ 10      (sum bounded)
-x + y ≤ 2      (y is at most 2 more than x)

Properties:

Captures relationships between pairs of variables
O(n^3) per operation for n variables (difference-bound matrices)
Good trade-off between precision and cost

Polyhedra Domain (Cousot & Halbwachs, 1978)

Tracks arbitrary linear inequalities: a1*x1 + a2*x2 + ... + an*xn ≤ c.

2x + 3y ≤ 12
x - y ≥ 0
x ≥ 0

Properties:

Most precise of the three relational numeric domains listed here
Exponential worst-case cost (doubly exponential for some operations)
Practical for moderate numbers of variables

Domain Hierarchy

Precision:    Signs  <  Intervals  <  Octagons  <  Polyhedra
Cost:         O(1)      O(1)         O(n³)        exponential

Other Notable Domains

Congruences: track x ≡ c (mod m) -- useful for alignment checks
Bit-vector domains: model machine arithmetic precisely
Pointer/shape domains: track heap structure (lists, trees)
String domains: track string lengths, prefixes, character sets
Reduced product: combine two domains, using each to refine the other

Widening and Narrowing

The Problem

Loops may require infinitely many fixpoint iterations:

x := 0; while (true) { x := x + 1; }
-- Iteration: x∈[0,0], x∈[0,1], x∈[0,2], ... never converges

Widening (∇)

Widening accelerates convergence by extrapolating the trend, jumping to a sound over-approximation.

[0,0] ∇ [0,1] = [0, +∞)

Rule: if the upper bound increased, jump to +∞. If the lower bound decreased, jump to -∞.

Widening guarantees termination in finite steps but may lose precision.

Narrowing (Δ)

After widening produces a post-fixpoint (over-approximation), narrowing recovers precision by iterating downward.

[0, +∞) Δ [0, 99] = [0, 99]    (if loop condition is x < 100)

Analysis pipeline:

Ascending (with widening) → Post-fixpoint → Descending (with narrowing) → Better fixpoint

Widening Strategies

Widening with thresholds: instead of jumping to ±∞, use a set of predefined constants (e.g., loop bounds, array sizes)
Delayed widening: apply widening only after N iterations, allowing initial precision
Widening at loop heads only: standard placement strategy

Astrée

Astrée is an industrial abstract interpreter developed at ENS Paris, specifically designed to prove the absence of runtime errors in safety-critical embedded C programs.

What Astrée proves:

No buffer overflows
No integer overflows (or defined wrapping behavior)
No division by zero
No invalid pointer dereferences
No uninitialized variable reads
No data races (in concurrent mode)

Key design decisions:

Domain combination tuned for control/command software (floating-point heavy, few pointers)
Trace partitioning: maintain separate abstract states for different execution paths
Parameterized by the target platform (integer sizes, float semantics)

Industrial use: used to verify Airbus fly-by-wire software (A340, A380) -- zero false alarms on ~130,000 lines of C, proving absence of all runtime errors.

Applications

Absence of Runtime Errors

The primary industrial application. Sound analysis proves no execution can trigger:

Array out-of-bounds access
Null pointer dereference
Arithmetic overflow/underflow
Division by zero
Use of uninitialized memory

Other Applications

Security: taint analysis, information flow tracking
Compiler optimization: value range analysis, constant propagation, dead code elimination
Termination analysis: ranking function synthesis
Resource analysis: stack depth, WCET (worst-case execution time)

Notable Tools

Tool	Domain	Application
Astrée	Embedded C	Absence of runtime errors
Polyspace	C, C++, Ada	Runtime error detection
Infer (Meta)	Java, C, ObjC	Memory safety, concurrency
Julia (AbsInt)	C, C++	Coding standards compliance
Mopsa	Python, C	Multi-language analysis
IKOS	LLVM IR	Buffer overflows

Soundness vs Completeness

Abstract interpretation is typically sound but incomplete:

Sound: if analysis says "no error," there truly is no error (no false negatives)
Incomplete: analysis may report false alarms (false positives) for safe programs

The art of abstract interpretation is designing domains and widening strategies that minimize false alarms while maintaining soundness and reasonable performance.

                    Concrete (undecidable)
                   /                      \
    Sound over-approximation     Unsound under-approximation
    (may have false positives)   (may miss real bugs)
    Abstract Interpretation      Testing, Fuzzing

Key Takeaways

Abstract interpretation provides a rigorous framework for sound static analysis
Galois connections formalize the relationship between concrete and abstract semantics
The interval/octagon/polyhedra hierarchy offers precision-cost trade-offs
Widening ensures termination; narrowing recovers precision
Astrée demonstrates industrial-scale verification: zero false alarms on real avionics code
Abstract interpretation underlies many practical tools (Infer, Polyspace, compiler analyses)