Program Transformation
Overview
Program transformations systematically rewrite programs to improve performance, change representation, or enable further analysis. They preserve semantics while modifying structure.
Partial Evaluation
Specializes a program with respect to known (static) inputs, producing a residual program over the remaining (dynamic) inputs.
Basic Idea
Given prog(static_input, dynamic_input), a partial evaluator produces prog_specialized(dynamic_input) that is equivalent but faster because static computations are performed at specialization time.
power(n, x) = if n == 0 then 1 else x * power(n-1, x)
-- Specialize for n=3:
power_3(x) = x * x * x * 1 -- after constant folding: x * x * x
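The specialization step can be sketched in Python (a toy illustration, not a real partial evaluator): when n is static, the recursion unrolls at specialization time, leaving a residual function over the dynamic x only.

```python
def specialize_power(n):
    """Toy specializer: unroll the recursion while n is static (known now)."""
    if n == 0:
        return lambda x: 1              # base case folds to a constant
    rest = specialize_power(n - 1)      # static recursion unrolled at spec. time
    return lambda x: x * rest(x)        # only the dynamic multiplies remain

power_3 = specialize_power(3)           # residual program: x * x * x * 1
```

Calling `power_3(2)` performs three multiplications and no recursion tests, mirroring the residual program above.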
Binding-Time Analysis
Classifies expressions as static (known at specialization time) or dynamic (known only at runtime). Guides what to evaluate vs. what to residualize.
- Monovariant: each expression gets one classification
- Polyvariant: same expression may be classified differently in different contexts
The Futamura Projections
Three remarkable applications of partial evaluation, discovered by Futamura (1971).
First projection: Specializing an interpreter with respect to a source program yields a compiled program.
pe(interpreter, source) = target_program
The interpreter "runs" at specialization time; only dynamic operations remain.
Second projection: Specializing a partial evaluator with respect to an interpreter yields a compiler.
pe(pe, interpreter) = compiler
Third projection: Specializing a partial evaluator with respect to itself yields a compiler generator.
pe(pe, pe) = compiler_generator
compiler_generator(interpreter) = compiler
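The first projection can be made concrete with a toy sketch (hypothetical names, not a general partial evaluator): `interpret` is an ordinary interpreter for arithmetic over one dynamic variable, and `specialize` plays the role of pe applied to it with a static source AST, yielding a closure in which the AST dispatch has been evaluated away.

```python
def interpret(ast, x):                  # ordinary interpreter: walks the AST at runtime
    op, *args = ast
    if op == "lit": return args[0]
    if op == "var": return x
    if op == "add": return interpret(args[0], x) + interpret(args[1], x)
    if op == "mul": return interpret(args[0], x) * interpret(args[1], x)

def specialize(ast):                    # pe(interpreter, source) -> target
    op, *args = ast
    if op == "lit":
        v = args[0]
        return lambda x: v
    if op == "var":
        return lambda x: x
    if op == "add":
        f, g = specialize(args[0]), specialize(args[1])
        return lambda x: f(x) + g(x)    # AST dispatch is gone from the residual code
    if op == "mul":
        f, g = specialize(args[0]), specialize(args[1])
        return lambda x: f(x) * g(x)

# "Compiled" form of x*x + 1: the interpreter ran at specialization time.
target = specialize(("add", ("mul", ("var",), ("var",)), ("lit", 1)))
```

`target(3)` and `interpret(source, 3)` agree, but `target` no longer inspects the source program.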
Practical Challenges
- Termination: partial evaluation of recursive programs may not terminate; requires widening or memoization (polyvariant specialization)
- Code explosion: unfolding loops/recursion can produce exponentially large residual programs
- Online vs. offline: online PE decides what to specialize during specialization; offline PE uses a separate binding-time analysis phase
Supercompilation
A more aggressive program transformation that drives evaluation symbolically, tracking all possible execution paths.
Process
- Drive: symbolically evaluate the program, building a process tree
- Generalize: when the tree grows too large or a repeated pattern is detected, generalize (abstract) the state
- Fold: identify when a state is a renaming of a previously seen state; create a loop
Key Operations
- Driving: unfolds function calls and case-splits on unknown values
- Generalization (most specific generalization / anti-unification): finds a common pattern between two expressions
- Homeomorphic embedding: termination criterion; if a previously seen expression is embedded in a new one, generalize to prevent infinite unfolding
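The embedding check itself is small. A minimal Python sketch, assuming first-order terms encoded as `(symbol, child, ...)` tuples with variables as plain strings: s is embedded in t if s can be obtained from t by deleting symbols.

```python
def embeds(s, t):
    """Homeomorphic embedding s <| t: coupling at the root, or diving into t."""
    return coupling(s, t) or diving(s, t)

def diving(s, t):
    # s embeds in some strict subterm of t
    return isinstance(t, tuple) and any(embeds(s, c) for c in t[1:])

def coupling(s, t):
    if isinstance(s, str) and isinstance(t, str):
        return True                     # a variable embeds in a variable
    if isinstance(s, tuple) and isinstance(t, tuple):
        return (s[0] == t[0] and len(s) == len(t)
                and all(embeds(a, b) for a, b in zip(s[1:], t[1:])))
    return False
```

For example, `f(x)` embeds in `f(g(x))` (the whistle blows, so the supercompiler would generalize), but not in `g(x)`.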
Compared to Partial Evaluation
- Supercompilation considers all branches (not just the static path)
- Can discover optimizations that partial evaluation misses (e.g., deforestation emerges naturally)
- More expensive; harder to control termination and code size
- Implementations: SPSC, HOSC (Haskell), Distillation
Deforestation
Eliminates intermediate data structures in compositions of producers and consumers.
The Problem
sum (map (+1) (filter even xs))
Naive execution builds two intermediate lists: one from filter, one from map. Deforestation fuses these into a single traversal.
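The effect of fusion can be seen in a Python sketch: the naive pipeline materializes two intermediate lists, while the fused version makes a single pass with no intermediate structure.

```python
xs = range(1, 11)

# Naive: two intermediate lists are materialized.
evens  = [x for x in xs if x % 2 == 0]    # list produced by the filter
bumped = [x + 1 for x in evens]           # list produced by the map
total  = sum(bumped)

# Fused: one traversal, no intermediate lists.
fused = sum(x + 1 for x in range(1, 11) if x % 2 == 0)
```

Both compute the same result; the fused form is what deforestation derives automatically.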
Wadler's Deforestation
Applies to treeless programs (definitions that build no intermediate constructors). The transformation rewrites compositions so that each producer is fused with its consumer.
Shortcut Fusion (foldr/build)
Based on the foldr/build framework:
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
-- Fusion law:
foldr k z (build g) = g k z
Producers use build; consumers use foldr. GHC applies this as a rewrite rule.
-- map as build/foldr:
map f xs = build (\c n -> foldr (\x acc -> c (f x) acc) n xs)
-- After fusion with a consumer:
sum (map f xs) = foldr (\x acc -> f x + acc) 0 xs
Stream Fusion
Represents lists as unfold-based streams:
data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s
-- Operations defined on streams, not lists
mapS f (Stream step s) = Stream step' s
  where step' s = case step s of
          Yield a s' -> Yield (f a) s'
          Skip s'    -> Skip s'
          Done       -> Done
- Fusion is achieved by inlining: no intermediate streams are allocated
- Handles zip, filter, and concatMap better than shortcut fusion
- Used in the vector library and Data.Text
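A Python transcription of the Step/Stream idea (a sketch; real stream fusion relies on the compiler inlining everything): a stream is a step function plus a state, operations rewrap the step function, and only the final consumer loops.

```python
# Step results encoded as tuples: ("Done",), ("Skip", s'), ("Yield", a, s').

def from_list(xs):
    def step(i):
        return ("Yield", xs[i], i + 1) if i < len(xs) else ("Done",)
    return (step, 0)

def map_s(f, stream):
    step, s0 = stream
    def step2(s):                        # wrap the step function; no list built
        r = step(s)
        if r[0] == "Yield":
            return ("Yield", f(r[1]), r[2])
        return r                         # Skip and Done pass through unchanged
    return (step2, s0)

def sum_s(stream):                       # the consumer is the only loop
    step, s = stream
    total = 0
    while True:
        r = step(s)
        if r[0] == "Done":
            return total
        if r[0] == "Skip":
            s = r[1]
        else:
            total += r[1]
            s = r[2]
```

`sum_s(map_s(f, from_list(xs)))` traverses xs once; after inlining, a compiler can collapse the composed step functions into a single loop body.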
CPS Transformation
Converts direct-style programs to continuation-passing style, making control flow explicit.
Transformation Rules
CPS[[x]] = λk. k x
CPS[[λx. e]] = λk. k (λx. CPS[[e]])
CPS[[e1 e2]] = λk. CPS[[e1]] (λf. CPS[[e2]] (λa. f a k))
Every function takes an extra argument (the continuation). No function ever returns; instead it calls its continuation.
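A small Python sketch of the style (note that Python itself does not eliminate tail calls, so this only illustrates the shape): the CPS version never returns a result up the call chain; it hands it to the continuation k.

```python
def fact(n):
    # direct style: the multiplication happens after the recursive call returns
    return 1 if n == 0 else n * fact(n - 1)

def fact_cps(n, k):
    # CPS: every call is a tail call; "the rest of the work" lives in k
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))

result = fact_cps(5, lambda r: r)   # identity continuation
```

With an identity continuation, `fact_cps(5, lambda r: r)` yields the same value as `fact(5)`.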
Properties
- All calls are tail calls (no stack growth, suitable for tail-call elimination)
- Control flow is explicit: exceptions, backtracking, coroutines become simple
- Intermediate representation in compilers: SML/NJ and CPS-based Scheme compilers (Rabbit, Orbit) compile via a CPS IR
- Double-barreled CPS: pass two continuations (success and failure) for exception handling
Administrative Redexes
The naive CPS transform introduces many trivial beta-redexes, called administrative redexes. A one-pass CPS transform contracts them during translation; converting to ANF instead avoids them altogether.
ANF (A-normal form) is an alternative to CPS that names all intermediate results without making continuations explicit. Used by many modern compilers.
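The contrast is easy to show in a Python sketch: in A-normal form every intermediate result is bound to a name (a let in the lambda calculus, an assignment here) and every right-hand side is a single operation on variables or constants.

```python
# Direct style: intermediate results are anonymous, nested subexpressions.
def dist(a, b, c, d):
    return ((a - c) ** 2 + (b - d) ** 2) ** 0.5

# A-normal form: every intermediate result is named; control flow stays
# implicit (no continuations), unlike CPS.
def dist_anf(a, b, c, d):
    t1 = a - c
    t2 = t1 ** 2
    t3 = b - d
    t4 = t3 ** 2
    t5 = t2 + t4
    return t5 ** 0.5
```

Both functions compute the same value; the ANF version makes evaluation order and intermediate values explicit without threading continuations.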
Closure Conversion
Makes the free variables of a function explicit by converting closures to data structures.
Before
let x = 5 in
let f = λy. x + y in
f 3
After
let x = 5 in
let f_code = λ(env, y). env.x + y in
let f = (f_code, {x = 5}) in -- closure = (code, environment)
f.code(f.env, 3)
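The converted form runs directly in Python (a sketch mirroring the example above, with the environment as a dict): the code pointer is a closed function that receives its environment explicitly.

```python
def f_code(env, y):          # closed: no free variables, standard calling convention
    return env["x"] + y

x = 5
f = (f_code, {"x": x})       # closure = (code, environment)

code, env = f
result = code(env, 3)        # f 3  ==>  f.code(f.env, 3)
```

Because `f_code` is closed, it could be compiled as an ordinary top-level function; the closure record is just data.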
Flat vs. Linked Closures
- Flat closures: copy all free variables into the closure record. O(1) variable access, O(n) closure creation
- Linked closures: share environment frames via pointers. O(1) closure creation, O(depth) variable access
- Hybrid approaches used in practice
In Compilation
Closure conversion is a key step in compiling functional languages to low-level targets (C, LLVM, machine code). After closure conversion, all functions are closed (no free variables), enabling standard calling conventions.
Defunctionalization
Replaces higher-order functions with first-order data types and an apply function.
Before (Higher-Order)
map f [] = []
map f (x:xs) = f x : map f xs
main = map (λx. x + 1) [1, 2, 3]
After (First-Order)
data Fun = Inc -- one constructor per lambda
apply Inc x = x + 1 -- dispatch table
map f [] = []
map f (x:xs) = apply f x : map f xs
main = map Inc [1, 2, 3]
Properties
- Each lambda in the program becomes a constructor of a sum type
- Free variables of the lambda become fields of the constructor
- The apply function pattern-matches on the constructor to dispatch
- Refunctionalization, the inverse of defunctionalization, recovers higher-order code from the first-order program plus apply
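A Python sketch of these properties (constructors encoded as tagged tuples): the lambda `λx. x + 1` becomes a nullary constructor, while `λx. x + n` captures its free variable n as a field.

```python
def apply(fun, x):                         # dispatch table for all lambdas
    tag = fun[0]
    if tag == "Inc":                       # λx. x + 1 (no free variables)
        return x + 1
    if tag == "AddN":                      # λx. x + n; the field stores n
        return x + fun[1]

def map_fo(fun, xs):                       # first-order map: no function values
    return [apply(fun, x) for x in xs]

ones = map_fo(("Inc",), [1, 2, 3])
tens = map_fo(("AddN", 10), [1, 2, 3])
```

The program now passes only first-order data; all higher-order behavior is concentrated in apply.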
Relationship to Abstract Machines
Defunctionalizing a continuation-passing interpreter yields an abstract machine. The CEK machine is the defunctionalized CPS interpreter for the lambda calculus (Danvy and Nielsen).
Lambda Lifting
Transforms local function definitions into top-level (global) definitions by passing free variables as extra parameters.
Before
let f x =
let g y = x + y in
g 3 + g 4
After
g' x y = x + y -- x is now an explicit parameter
f x = g' x 3 + g' x 4
Algorithm
- Compute free variables of each local function
- Add free variables as extra parameters
- Update all call sites to pass the extra arguments
- Move the function to the top level
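The algorithm applied to the example above, as runnable Python: g's free variable x becomes an explicit parameter, the call sites pass it, and the lifted g moves to the top level.

```python
# Before: g is local and closes over x.
def f(x):
    def g(y):
        return x + y
    return g(3) + g(4)

# After lambda lifting: x is an explicit parameter and g_lifted is top-level,
# so no closure needs to be allocated for it.
def g_lifted(x, y):
    return x + y

def f_lifted(x):
    return g_lifted(x, 3) + g_lifted(x, 4)
```

Both versions agree; the lifted form trades the closure for an extra argument at each call site.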
Lambda Dropping (Inverse)
Moves top-level functions into local scope to reduce parameter passing. Useful when a function is only called within a specific scope.
Tradeoffs
- Lambda lifting eliminates closures but may increase parameter-passing overhead
- Closure conversion preserves calling conventions but requires heap allocation
- Choice depends on the target: lambda lifting suits targets without closures (C, hardware); closure conversion suits GC'd runtimes
Transformation Pipeline
A typical compiler applies these transformations in sequence:
Source program
-> Desugaring
-> Type checking
-> CPS / ANF conversion
-> Closure conversion (or lambda lifting)
-> Defunctionalization (for whole-program compilation)
-> Optimization (inlining, constant folding, dead code elimination)
-> Code generation
Each transformation makes some aspect explicit (control flow, environments, higher-order functions) while preserving semantics. The art of compiler construction lies in choosing the right transformations and their ordering.