5 min read
On this page

Abstract Interpretation

Overview

Abstract interpretation, introduced by Patrick and Radhia Cousot (1977), is a theory of sound approximation of program semantics. Instead of computing exact program behavior (which is undecidable in general), abstract interpretation computes an over-approximation in a simpler mathematical domain. If the approximation satisfies a property, so does the concrete program. This enables fully automatic, sound static analysis.

Mathematical Foundations

Lattices

Abstract interpretation operates over complete lattices: partially ordered sets where every subset has a least upper bound (join, ⊔) and greatest lower bound (meet, ⊓).

        ⊤ (top = "don't know" / all values)
       / \
      /   \
    ...   ...
      \   /
       \ /
        ⊥ (bottom = "unreachable" / no values)

Key properties:

  • Partial order ⊑: defines precision (a ⊑ b means a is more precise than b)
  • Join (⊔): combines information (least upper bound, loses precision)
  • Meet (⊓): intersects information (greatest lower bound, gains precision)
  • (top): least precise element -- represents "any value"
  • (bottom): most precise element -- represents "no value" / unreachable

Fixpoints

Program loops and recursion correspond to fixpoint computations.

Knaster-Tarski theorem: every monotone function on a complete lattice has a least fixpoint and a greatest fixpoint.

For program analysis:

  • Start from ⊥ (nothing known)
  • Repeatedly apply the abstract transfer function
  • Converge to the least fixpoint = most precise sound result
⊥ ⊑ F(⊥) ⊑ F(F(⊥)) ⊑ ... ⊑ lfp(F)

Galois Connections

A Galois connection (α, γ) between concrete domain C and abstract domain A consists of:

  • Abstraction function α : C → A: maps concrete values to abstract values
  • Concretization function γ : A → C: maps abstract values back to concrete sets

Requirements:

∀c ∈ C, a ∈ A:  α(c) ⊑_A a  ⟺  c ⊑_C γ(a)

This ensures:

  • c ⊑ γ(α(c)): abstraction may lose information (over-approximation)
  • α(γ(a)) ⊑ a: concretization followed by abstraction is precise or an under-approximation
  • Soundness: if a property holds in the abstract, it holds in the concrete

Example: mapping integers to their sign

α({-3, -1, 0, 5}) = ⊤        (mixed signs → top)
α({-3, -1})       = Negative
α({0})             = Zero
γ(Positive)        = {1, 2, 3, ...}

Abstract Domains

Interval Domain

Tracks a lower and upper bound for each variable: x ∈ [lo, hi].

x := 5          → x ∈ [5, 5]
y := x + 3      → y ∈ [8, 8]
if (x > 0)      → x ∈ [1, 5] (in true branch)
z := x * y      → z ∈ [8, 40]

Properties:

  • Efficient: O(1) per variable per operation
  • No relational information: cannot express x ≤ y or x + y = 10
  • Widely used as a baseline domain

Octagon Domain (Mine, 2001)

Tracks constraints of the form ±x ± y ≤ c (unit two variable per inequality, UTVPI).

x - y ≤ 3       (x is at most 3 more than y)
x + y ≤ 10      (sum bounded)
-x + y ≤ 2      (y is at most 2 more than x)

Properties:

  • Captures relationships between pairs of variables
  • O(n^3) per operation for n variables (difference-bound matrices)
  • Good trade-off between precision and cost

Polyhedra Domain (Cousot & Halbwachs, 1978)

Tracks arbitrary linear inequalities: a1*x1 + a2*x2 + ... + an*xn ≤ c.

2x + 3y ≤ 12
x - y ≥ 0
x ≥ 0

Properties:

  • Most precise of the three relational numeric domains listed here
  • Exponential worst-case cost (doubly exponential for some operations)
  • Practical for moderate numbers of variables

Domain Hierarchy

Precision:    Signs  <  Intervals  <  Octagons  <  Polyhedra
Cost:         O(1)      O(1)         O(n³)        exponential

Other Notable Domains

  • Congruences: track x ≡ c (mod m) -- useful for alignment checks
  • Bit-vector domains: model machine arithmetic precisely
  • Pointer/shape domains: track heap structure (lists, trees)
  • String domains: track string lengths, prefixes, character sets
  • Reduced product: combine two domains, using each to refine the other

Widening and Narrowing

The Problem

Loops may require infinitely many fixpoint iterations:

x := 0; while (true) { x := x + 1; }
-- Iteration: x∈[0,0], x∈[0,1], x∈[0,2], ... never converges

Widening (∇)

Widening accelerates convergence by extrapolating the trend, jumping to a sound over-approximation.

[0,0] ∇ [0,1] = [0, +∞)

Rule: if the upper bound increased, jump to +∞. If the lower bound decreased, jump to -∞.

Widening guarantees termination in finite steps but may lose precision.

Narrowing (Δ)

After widening produces a post-fixpoint (over-approximation), narrowing recovers precision by iterating downward.

[0, +∞) Δ [0, 99] = [0, 99]    (if loop condition is x < 100)

Analysis pipeline:

Ascending (with widening) → Post-fixpoint → Descending (with narrowing) → Better fixpoint

Widening Strategies

  • Widening with thresholds: instead of jumping to ±∞, use a set of predefined constants (e.g., loop bounds, array sizes)
  • Delayed widening: apply widening only after N iterations, allowing initial precision
  • Widening at loop heads only: standard placement strategy

Astrée

Astrée is an industrial abstract interpreter developed at ENS Paris, specifically designed to prove the absence of runtime errors in safety-critical embedded C programs.

What Astrée proves:

  • No buffer overflows
  • No integer overflows (or defined wrapping behavior)
  • No division by zero
  • No invalid pointer dereferences
  • No uninitialized variable reads
  • No data races (in concurrent mode)

Key design decisions:

  • Domain combination tuned for control/command software (floating-point heavy, few pointers)
  • Trace partitioning: maintain separate abstract states for different execution paths
  • Parameterized by the target platform (integer sizes, float semantics)

Industrial use: used to verify Airbus fly-by-wire software (A340, A380) -- zero false alarms on ~130,000 lines of C, proving absence of all runtime errors.

Applications

Absence of Runtime Errors

The primary industrial application. Sound analysis proves no execution can trigger:

  • Array out-of-bounds access
  • Null pointer dereference
  • Arithmetic overflow/underflow
  • Division by zero
  • Use of uninitialized memory

Other Applications

  • Security: taint analysis, information flow tracking
  • Compiler optimization: value range analysis, constant propagation, dead code elimination
  • Termination analysis: ranking function synthesis
  • Resource analysis: stack depth, WCET (worst-case execution time)

Notable Tools

| Tool | Domain | Application | |------|--------|-------------| | Astrée | Embedded C | Absence of runtime errors | | Polyspace | C, C++, Ada | Runtime error detection | | Infer (Meta) | Java, C, ObjC | Memory safety, concurrency | | Julia (AbsInt) | C, C++ | Coding standards compliance | | Mopsa | Python, C | Multi-language analysis | | IKOS | LLVM IR | Buffer overflows |

Soundness vs Completeness

Abstract interpretation is typically sound but incomplete:

  • Sound: if analysis says "no error," there truly is no error (no false negatives)
  • Incomplete: analysis may report false alarms (false positives) for safe programs

The art of abstract interpretation is designing domains and widening strategies that minimize false alarms while maintaining soundness and reasonable performance.

                    Concrete (undecidable)
                   /                      \
    Sound over-approximation     Unsound under-approximation
    (may have false positives)   (may miss real bugs)
    Abstract Interpretation      Testing, Fuzzing

Key Takeaways

  • Abstract interpretation provides a rigorous framework for sound static analysis
  • Galois connections formalize the relationship between concrete and abstract semantics
  • The interval/octagon/polyhedra hierarchy offers precision-cost trade-offs
  • Widening ensures termination; narrowing recovers precision
  • Astrée demonstrates industrial-scale verification: zero false alarms on real avionics code
  • Abstract interpretation underlies many practical tools (Infer, Polyspace, compiler analyses)