Language Design
Syntax Design
Concrete vs. Abstract Syntax
- Concrete syntax: what programmers write (surface syntax with precedence, associativity, sugar)
- Abstract syntax: tree representation used by the compiler (AST). Strips away syntactic sugar and ambiguity
- Abstract binding trees (ABTs): extend ASTs with explicit binding structure; used in formal PL theory
Design Considerations
- Readability vs. writability tradeoff (Python favors readability; APL favors writability)
- Regularity: few special cases, uniform rules (Lisp's S-expressions are maximally regular)
- Syntactic sugar: convenient surface forms that desugar to core constructs.
let x = e1 in e2desugars to(λx. e2) e1 - Precedence and fixity: determines parsing of
a + b * c. Configurable in Haskell, Agda - Delimiter style: significant whitespace (Python, Haskell) vs. braces (C family) vs. keywords (Pascal, Ruby)
Parsing Considerations
- Most languages target LL(k) or LR(1) grammars for efficient parsing
- PEG grammars eliminate ambiguity by ordered choice but cannot express all CFLs
- Layout rules (offside rule): whitespace-sensitive parsing in Haskell, Python. Formally described as a lexer post-pass
Binding and Scope
Lexical (Static) Scoping
A variable refers to the binding in the nearest enclosing scope in the source text. Determined at compile time.
let x = 10 in
let f = λy. x + y in
let x = 20 in
f 5 -- returns 15 (captures x=10)
- Closures capture the environment at definition time
- Standard in nearly all modern languages
- Enables static analysis, optimization, and equational reasoning
Dynamic Scoping
A variable refers to the most recent binding on the call stack at runtime.
(defvar x 10)
(defun f (y) (+ x y))
(let ((x 20)) (f 5)) ;; returns 25 in dynamic scoping
- Used historically in early Lisps, still available in Emacs Lisp
- Useful for implicit parameters (thread-local state, configuration)
- Common Lisp's special variables use dynamic scoping explicitly
- Harder to reason about; inhibits optimization
Scope and Binding Mechanisms
letvsletrec: non-recursive vs. recursive binding.letrecallows mutual recursion- Shadowing: inner bindings hide outer ones with the same name
- Hygienic macros: maintain lexical scoping guarantees during expansion (Scheme's
syntax-rules, Rust's macro system) - Name resolution in modules: qualified vs. unqualified names, import/export lists
Control Flow
Exceptions
Structured non-local exit for error handling.
try { riskyOp() }
catch (e: IOException) { handleIO(e) }
finally { cleanup() }
- Semantically:
throwunwinds the stack to the nearest matchingcatch - Typed exceptions (Java) vs. untyped (Python, JS) vs. algebraic effects (as generalization)
- Effect on reasoning: exceptions break equational reasoning since any expression may diverge laterally
- Performance: zero-cost exceptions (table-based, no overhead on the happy path; costly on throw) vs. setjmp/longjmp
Continuations
A continuation represents "the rest of the computation" from a given point.
- First-class continuations: reify the continuation as a value that can be stored and invoked
- CPS (continuation-passing style): every function takes an extra argument representing its continuation. Makes control flow explicit
call/cc (Call with Current Continuation)
call/cc : ((α -> β) -> α) -> α
Captures the current continuation k and passes it to the given function. Invoking k abandons the current computation and returns to the capture point.
- Can implement exceptions, coroutines, backtracking, green threads
- Composability problems: captured continuation includes the entire rest of the program (undelimited)
- Present in Scheme, SML/NJ
Delimited Continuations
Capture only a slice of the continuation, bounded by a delimiter (prompt).
reset { ... shift k { body } ... }
shiftcaptures the continuation up to the enclosingreset(not the whole program)- The captured continuation is a regular function (composable)
- Strictly more expressive than call/cc while being more tractable
- Operators:
shift/reset,control/prompt,shift0/reset0 - Foundation for algebraic effects and effect handlers
Algebraic Effects (Brief)
Generalize exceptions and delimited continuations. Operations are declared, and handlers define their semantics:
effect Ask : unit -> int
handle (perform Ask()) with
| val x -> x
| Ask () k -> k 42
The handler receives the continuation k and decides how to resume.
Module Systems
Basic Modules
Group related definitions; provide namespace management and encapsulation.
module Stack = struct
type 'a t = 'a list
let empty = []
let push x s = x :: s
let pop = function x :: s -> (x, s) | [] -> failwith "empty"
end
Signatures (Interfaces)
Specify what a module exposes:
module type STACK = sig
type 'a t
val empty : 'a t
val push : 'a -> 'a t -> 'a t
val pop : 'a t -> 'a * 'a t
end
Abstract types (type 'a t without definition) enforce encapsulation.
ML Functors
Functions from modules to modules. Parameterize implementations over interfaces.
module MakeSet (Ord : ORDERED) : SET with type elem = Ord.t = struct
type elem = Ord.t
type t = elem list
let member x s = List.exists (fun y -> Ord.compare x y = 0) s
...
end
- First-class modules (OCaml): modules as values, enabling runtime module selection
- Applicative vs. generative functors: applicative functors produce compatible types when applied to the same argument; generative functors always produce fresh types
- Sharing constraints:
with type t = ...ensures type compatibility across module boundaries
Module Systems in Other Languages
- Haskell: type classes serve some roles of modules; no first-class module system
- Rust: traits + modules (crates). Traits provide bounded polymorphism
- Scala: objects as modules, path-dependent types
- 1ML (Rossberg): unifies modules and core language; modules are first-class values
Overloading Resolution
Ad-hoc Polymorphism
The same name refers to different implementations based on types.
Type Classes (Haskell)
class Eq a where
(==) :: a -> a -> Bool
instance Eq Int where
x == y = primEqInt x y
instance Eq a => Eq [a] where
[] == [] = True
(x:xs) == (y:ys) = x == y && xs == ys
_ == _ = False
- Resolved at compile time via dictionary passing
- Coherence: at most one instance per type (global uniqueness)
- Superclasses, multi-parameter type classes, functional dependencies, associated types
Other Approaches
- C++ templates: resolved at instantiation (monomorphization); SFINAE for conditional overloading
- Java/C# overloading: resolved by static argument types; generics add bounded polymorphism
- Rust traits: like type classes with coherence (orphan rules); monomorphized
- Multiple dispatch (Julia, CLOS): resolved at runtime based on all argument types
- Implicits (Scala): compiler searches for values of the required type in scope
Resolution Complexity
- Haskell type class resolution is essentially a logic program; can diverge with certain extensions
- Instance search with overlapping instances, backtracking, or higher-order unification is undecidable in general
- Practical systems impose restrictions (no overlapping instances, termination checks) to keep resolution predictable
Design Tradeoffs Summary
| Dimension | Spectrum | |---|---| | Typing | Static <-> Gradual <-> Dynamic | | Evaluation | Strict <-> Lazy | | Scoping | Lexical <-> Dynamic | | Mutability | Immutable-first <-> Mutable-first | | Effects | Pure + effect tracking <-> Unrestricted | | Modules | Structural <-> Nominal | | Dispatch | Static <-> Dynamic (single/multiple) |
Language design is the art of choosing coherent points in this multi-dimensional space, guided by the intended domain, user community, and implementation constraints.