Program Transformation
Overview
Program transformations systematically rewrite programs to improve performance, change representation, or enable further analysis. They preserve semantics while modifying structure.
Partial Evaluation
Specializes a program with respect to known (static) inputs, producing a residual program over the remaining (dynamic) inputs.
Basic Idea
Given prog(static_input, dynamic_input), a partial evaluator produces prog_specialized(dynamic_input) that is equivalent but faster because static computations are performed at specialization time.
power(n, x) = if n == 0 then 1 else x * power(n-1, x)
-- Specialize for n=3:
power_3(x) = x * x * x * 1 -- after constant folding: x * x * x
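The specialization step can be sketched in Python (a toy illustration, not a real partial evaluator): when n is static, the recursion unrolls at specialization time, leaving a residual function over the dynamic x only.

```python
def specialize_power(n):
    """Toy specializer: unroll the recursion while n is static (known now)."""
    if n == 0:
        return lambda x: 1              # base case folds to a constant
    rest = specialize_power(n - 1)      # static recursion unrolled at spec. time
    return lambda x: x * rest(x)        # only the dynamic multiplies remain

power_3 = specialize_power(3)           # residual program: x * x * x * 1
```

Calling `power_3(2)` performs three multiplications and no recursion tests, mirroring the residual program above.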
Binding-Time Analysis
Classifies expressions as static (known at specialization time) or dynamic (known only at runtime). Guides what to evaluate vs. what to residualize.
- Monovariant: each expression gets one classification
- Polyvariant: same expression may be classified differently in different contexts
The Futamura Projections
Three remarkable applications of partial evaluation, discovered by Futamura (1971).
First projection: Specializing an interpreter with respect to a source program yields a compiled program.
pe(interpreter, source) = target_program
The interpreter "runs" at specialization time; only dynamic operations remain.
Second projection: Specializing a partial evaluator with respect to an interpreter yields a compiler.
pe(pe, interpreter) = compiler
Third projection: Specializing a partial evaluator with respect to itself yields a compiler generator.
pe(pe, pe) = compiler_generator
compiler_generator(interpreter) = compiler
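The first projection can be made concrete with a toy sketch (hypothetical names, not a general partial evaluator): `interpret` is an ordinary interpreter for arithmetic over one dynamic variable, and `specialize` plays the role of pe applied to it with a static source AST, yielding a closure in which the AST dispatch has been evaluated away.

```python
def interpret(ast, x):                  # ordinary interpreter: walks the AST at runtime
    op, *args = ast
    if op == "lit": return args[0]
    if op == "var": return x
    if op == "add": return interpret(args[0], x) + interpret(args[1], x)
    if op == "mul": return interpret(args[0], x) * interpret(args[1], x)

def specialize(ast):                    # pe(interpreter, source) -> target
    op, *args = ast
    if op == "lit":
        v = args[0]
        return lambda x: v
    if op == "var":
        return lambda x: x
    if op == "add":
        f, g = specialize(args[0]), specialize(args[1])
        return lambda x: f(x) + g(x)    # AST dispatch is gone from the residual code
    if op == "mul":
        f, g = specialize(args[0]), specialize(args[1])
        return lambda x: f(x) * g(x)

# "Compiled" form of x*x + 1: the interpreter ran at specialization time.
target = specialize(("add", ("mul", ("var",), ("var",)), ("lit", 1)))
```

`target(3)` and `interpret(source, 3)` agree, but `target` no longer inspects the source program.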
Practical Challenges
- Termination: partial evaluation of recursive programs may not terminate; requires widening or memoization (polyvariant specialization)
- Code explosion: unfolding loops/recursion can produce exponentially large residual programs
- Online vs. offline: online PE decides what to specialize during specialization; offline PE uses a separate binding-time analysis phase
Supercompilation
A more aggressive program transformation that drives evaluation symbolically, tracking all possible execution paths.
Process
- Drive: symbolically evaluate the program, building a process tree
- Generalize: when the tree grows too large or a repeated pattern is detected, generalize (abstract) the state
- Fold: identify when a state is a renaming of a previously seen state; create a loop
Key Operations
- Driving: unfolds function calls and case-splits on unknown values
- Generalization (most specific generalization / anti-unification): finds a common pattern between two expressions
- Homeomorphic embedding: termination criterion; if a previously seen expression is embedded in a new one, generalize to prevent infinite unfolding
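The embedding check itself is small. A minimal Python sketch, assuming first-order terms encoded as `(symbol, child, ...)` tuples with variables as plain strings: s is embedded in t if s can be obtained from t by deleting symbols.

```python
def embeds(s, t):
    """Homeomorphic embedding s <| t: coupling at the root, or diving into t."""
    return coupling(s, t) or diving(s, t)

def diving(s, t):
    # s embeds in some strict subterm of t
    return isinstance(t, tuple) and any(embeds(s, c) for c in t[1:])

def coupling(s, t):
    if isinstance(s, str) and isinstance(t, str):
        return True                     # a variable embeds in a variable
    if isinstance(s, tuple) and isinstance(t, tuple):
        return (s[0] == t[0] and len(s) == len(t)
                and all(embeds(a, b) for a, b in zip(s[1:], t[1:])))
    return False
```

For example, `f(x)` embeds in `f(g(x))` (the whistle blows, so the supercompiler would generalize), but not in `g(x)`.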
Compared to Partial Evaluation
- Supercompilation considers all branches (not just the static path)
- Can discover optimizations that partial evaluation misses (e.g., deforestation emerges naturally)
- More expensive; harder to control termination and code size
- Implementations: SPSC, HOSC (Haskell), Distillation
Deforestation
Eliminates intermediate data structures in compositions of producers and consumers.
The Problem
sum (map (+1) (filter even xs))
Naive execution builds two intermediate lists: one from filter, one from map. Deforestation fuses these into a single traversal.
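The effect of fusion can be seen in a Python sketch: the naive pipeline materializes two intermediate lists, while the fused version makes a single pass with no intermediate structure.

```python
xs = range(1, 11)

# Naive: two intermediate lists are materialized.
evens  = [x for x in xs if x % 2 == 0]    # list produced by the filter
bumped = [x + 1 for x in evens]           # list produced by the map
total  = sum(bumped)

# Fused: one traversal, no intermediate lists.
fused = sum(x + 1 for x in range(1, 11) if x % 2 == 0)
```

Both compute the same result; the fused form is what deforestation derives automatically.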
Wadler's Deforestation
Applies to treeless programs (definitions that build no intermediate constructors). The transformation rewrites compositions so that each producer is fused with its consumer.
Shortcut Fusion (foldr/build)
Based on the foldr/build framework:
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
-- Fusion law:
foldr k z (build g) = g k z
Producers use build; consumers use foldr. GHC applies this as a rewrite rule.
-- map as build/foldr:
map f xs = build (\c n -> foldr (\x acc -> c (f x) acc) n xs)
-- After fusion with a consumer:
sum (map f xs) = foldr (\x acc -> f x + acc) 0 xs
Stream Fusion
Represents lists as unfold-based streams:
data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s
-- Operations defined on streams, not lists
mapS f (Stream step s) = Stream step' s
  where step' s = case step s of
          Yield a s' -> Yield (f a) s'
          Skip s'    -> Skip s'
          Done       -> Done
- Fusion is achieved by inlining: no intermediate streams are allocated
- Handles zip, filter, and concatMap better than shortcut fusion
- Used in the vector library and Data.Text
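A Python transcription of the Step/Stream idea (a sketch; real stream fusion relies on the compiler inlining everything): a stream is a step function plus a state, operations rewrap the step function, and only the final consumer loops.

```python
# Step results encoded as tuples: ("Done",), ("Skip", s'), ("Yield", a, s').

def from_list(xs):
    def step(i):
        return ("Yield", xs[i], i + 1) if i < len(xs) else ("Done",)
    return (step, 0)

def map_s(f, stream):
    step, s0 = stream
    def step2(s):                        # wrap the step function; no list built
        r = step(s)
        if r[0] == "Yield":
            return ("Yield", f(r[1]), r[2])
        return r                         # Skip and Done pass through unchanged
    return (step2, s0)

def sum_s(stream):                       # the consumer is the only loop
    step, s = stream
    total = 0
    while True:
        r = step(s)
        if r[0] == "Done":
            return total
        if r[0] == "Skip":
            s = r[1]
        else:
            total += r[1]
            s = r[2]
```

`sum_s(map_s(f, from_list(xs)))` traverses xs once; after inlining, a compiler can collapse the composed step functions into a single loop body.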
CPS Transformation
Converts direct-style programs to continuation-passing style, making control flow explicit.
Transformation Rules
CPS[[x]] = λk. k x
CPS[[λx. e]] = λk. k (λx. CPS[[e]])
CPS[[e1 e2]] = λk. CPS[[e1]] (λf. CPS[[e2]] (λa. f a k))
Every function takes an extra argument (the continuation). No function ever returns; instead it calls its continuation.
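A small Python sketch of the style (note that Python itself does not eliminate tail calls, so this only illustrates the shape): the CPS version never returns a result up the call chain; it hands it to the continuation k.

```python
def fact(n):
    # direct style: the multiplication happens after the recursive call returns
    return 1 if n == 0 else n * fact(n - 1)

def fact_cps(n, k):
    # CPS: every call is a tail call; "the rest of the work" lives in k
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))

result = fact_cps(5, lambda r: r)   # identity continuation
```

With an identity continuation, `fact_cps(5, lambda r: r)` yields the same value as `fact(5)`.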
Properties
- All calls are tail calls (no stack growth, suitable for tail-call elimination)
- Control flow is explicit: exceptions, backtracking, coroutines become simple
- Intermediate representation in compilers: SML/NJ and CPS-based Scheme compilers (Rabbit, Orbit) compile via a CPS IR
- Double-barreled CPS: pass two continuations (success and failure) for exception handling
Administrative Redexes
The naive CPS transform introduces many trivial beta-redexes, called administrative redexes. A one-pass CPS transform contracts them during translation; converting to ANF instead avoids them altogether.
ANF (A-normal form) is an alternative to CPS that names all intermediate results without making continuations explicit. Used by many modern compilers.
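The contrast is easy to show in a Python sketch: in A-normal form every intermediate result is bound to a name (a let in the lambda calculus, an assignment here) and every right-hand side is a single operation on variables or constants.

```python
# Direct style: intermediate results are anonymous, nested subexpressions.
def dist(a, b, c, d):
    return ((a - c) ** 2 + (b - d) ** 2) ** 0.5

# A-normal form: every intermediate result is named; control flow stays
# implicit (no continuations), unlike CPS.
def dist_anf(a, b, c, d):
    t1 = a - c
    t2 = t1 ** 2
    t3 = b - d
    t4 = t3 ** 2
    t5 = t2 + t4
    return t5 ** 0.5
```

Both functions compute the same value; the ANF version makes evaluation order and intermediate values explicit without threading continuations.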
Closure Conversion
Makes the free variables of a function explicit by converting closures to data structures.
Before
let x = 5 in
let f = λy. x + y in
f 3
After
let x = 5 in
let f_code = λ(env, y). env.x + y in
let f = (f_code, {x = 5}) in -- closure = (code, environment)
f.code(f.env, 3)
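The converted form runs directly in Python (a sketch mirroring the example above, with the environment as a dict): the code pointer is a closed function that receives its environment explicitly.

```python
def f_code(env, y):          # closed: no free variables, standard calling convention
    return env["x"] + y

x = 5
f = (f_code, {"x": x})       # closure = (code, environment)

code, env = f
result = code(env, 3)        # f 3  ==>  f.code(f.env, 3)
```

Because `f_code` is closed, it could be compiled as an ordinary top-level function; the closure record is just data.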
Flat vs. Linked Closures
- Flat closures: copy all free variables into the closure record. O(1) variable access, O(n) closure creation
- Linked closures: share environment frames via pointers. O(1) closure creation, O(depth) variable access
- Hybrid approaches used in practice
In Compilation
Closure conversion is a key step in compiling functional languages to low-level targets (C, LLVM, machine code). After closure conversion, all functions are closed (no free variables), enabling standard calling conventions.
Defunctionalization
Replaces higher-order functions with first-order data types and an apply function.
Before (Higher-Order)
map f [] = []
map f (x:xs) = f x : map f xs
main = map (λx. x + 1) [1, 2, 3]
After (First-Order)
data Fun = Inc -- one constructor per lambda
apply Inc x = x + 1 -- dispatch table
map f [] = []
map f (x:xs) = apply f x : map f xs
main = map Inc [1, 2, 3]
Properties
- Each lambda in the program becomes a constructor of a sum type
- Free variables of the lambda become fields of the constructor
- The apply function pattern-matches on the constructor to dispatch
- Refunctionalization, the inverse of defunctionalization, recovers higher-order code from the first-order program plus apply
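A Python sketch of these properties (constructors encoded as tagged tuples): the lambda `λx. x + 1` becomes a nullary constructor, while `λx. x + n` captures its free variable n as a field.

```python
def apply(fun, x):                         # dispatch table for all lambdas
    tag = fun[0]
    if tag == "Inc":                       # λx. x + 1 (no free variables)
        return x + 1
    if tag == "AddN":                      # λx. x + n; the field stores n
        return x + fun[1]

def map_fo(fun, xs):                       # first-order map: no function values
    return [apply(fun, x) for x in xs]

ones = map_fo(("Inc",), [1, 2, 3])
tens = map_fo(("AddN", 10), [1, 2, 3])
```

The program now passes only first-order data; all higher-order behavior is concentrated in apply.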
Relationship to Abstract Machines
Defunctionalizing a continuation-passing interpreter yields an abstract machine. The CEK machine is the defunctionalized CPS interpreter for the lambda calculus (Danvy and Nielsen).
Lambda Lifting
Transforms local function definitions into top-level (global) definitions by passing free variables as extra parameters.
Before
let f x =
let g y = x + y in
g 3 + g 4
After
g' x y = x + y -- x is now an explicit parameter
f x = g' x 3 + g' x 4
Algorithm
- Compute free variables of each local function
- Add free variables as extra parameters
- Update all call sites to pass the extra arguments
- Move the function to the top level
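The algorithm applied to the example above, as runnable Python: g's free variable x becomes an explicit parameter, the call sites pass it, and the lifted g moves to the top level.

```python
# Before: g is local and closes over x.
def f(x):
    def g(y):
        return x + y
    return g(3) + g(4)

# After lambda lifting: x is an explicit parameter and g_lifted is top-level,
# so no closure needs to be allocated for it.
def g_lifted(x, y):
    return x + y

def f_lifted(x):
    return g_lifted(x, 3) + g_lifted(x, 4)
```

Both versions agree; the lifted form trades the closure for an extra argument at each call site.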
Lambda Dropping (Inverse)
Moves top-level functions into local scope to reduce parameter passing. Useful when a function is only called within a specific scope.
Tradeoffs
- Lambda lifting eliminates closures but may increase parameter-passing overhead
- Closure conversion preserves calling conventions but requires heap allocation
- Choice depends on the target: lambda lifting suits targets without closures (C, hardware); closure conversion suits GC'd runtimes
Transformation Pipeline
A typical compiler applies these transformations in sequence:
Source program
-> Desugaring
-> Type checking
-> CPS / ANF conversion
-> Closure conversion (or lambda lifting)
-> Defunctionalization (for whole-program compilation)
-> Optimization (inlining, constant folding, dead code elimination)
-> Code generation
Each transformation makes some aspect explicit (control flow, environments, higher-order functions) while preserving semantics. The art of compiler construction lies in choosing the right transformations and their ordering.