5 min read
On this page

Language Design

Syntax Design

Concrete vs. Abstract Syntax

  • Concrete syntax: what programmers write (surface syntax with precedence, associativity, sugar)
  • Abstract syntax: tree representation used by the compiler (AST). Strips away syntactic sugar and ambiguity
  • Abstract binding trees (ABTs): extend ASTs with explicit binding structure; used in formal PL theory

Design Considerations

  • Readability vs. writability tradeoff (Python favors readability; APL favors writability)
  • Regularity: few special cases, uniform rules (Lisp's S-expressions are maximally regular)
  • Syntactic sugar: convenient surface forms that desugar to core constructs. let x = e1 in e2 desugars to (λx. e2) e1
  • Precedence and fixity: determines parsing of a + b * c. Configurable in Haskell, Agda
  • Delimiter style: significant whitespace (Python, Haskell) vs. braces (C family) vs. keywords (Pascal, Ruby)

Parsing Considerations

  • Most languages target LL(k) or LR(1) grammars for efficient parsing
  • PEG grammars eliminate ambiguity by ordered choice but cannot express all CFLs
  • Layout rules (offside rule): whitespace-sensitive parsing in Haskell, Python. Formally described as a lexer post-pass

Binding and Scope

Lexical (Static) Scoping

A variable refers to the binding in the nearest enclosing scope in the source text. Determined at compile time.

let x = 10 in
  let f = λy. x + y in
    let x = 20 in
      f 5    -- returns 15 (captures x=10)
  • Closures capture the environment at definition time
  • Standard in nearly all modern languages
  • Enables static analysis, optimization, and equational reasoning

Dynamic Scoping

A variable refers to the most recent binding on the call stack at runtime.

(defvar x 10)
(defun f (y) (+ x y))
(let ((x 20)) (f 5))   ;; returns 25 in dynamic scoping
  • Used historically in early Lisps, still available in Emacs Lisp
  • Useful for implicit parameters (thread-local state, configuration)
  • Common Lisp's special variables use dynamic scoping explicitly
  • Harder to reason about; inhibits optimization

Scope and Binding Mechanisms

  • let vs letrec: non-recursive vs. recursive binding. letrec allows mutual recursion
  • Shadowing: inner bindings hide outer ones with the same name
  • Hygienic macros: maintain lexical scoping guarantees during expansion (Scheme's syntax-rules, Rust's macro system)
  • Name resolution in modules: qualified vs. unqualified names, import/export lists

Control Flow

Exceptions

Structured non-local exit for error handling.

try { riskyOp() }
catch (e: IOException) { handleIO(e) }
finally { cleanup() }
  • Semantically: throw unwinds the stack to the nearest matching catch
  • Typed exceptions (Java) vs. untyped (Python, JS) vs. algebraic effects (as generalization)
  • Effect on reasoning: exceptions break equational reasoning since any expression may diverge laterally
  • Performance: zero-cost exceptions (table-based, no overhead on the happy path; costly on throw) vs. setjmp/longjmp

Continuations

A continuation represents "the rest of the computation" from a given point.

  • First-class continuations: reify the continuation as a value that can be stored and invoked
  • CPS (continuation-passing style): every function takes an extra argument representing its continuation. Makes control flow explicit

call/cc (Call with Current Continuation)

call/cc : ((α -> β) -> α) -> α

Captures the current continuation k and passes it to the given function. Invoking k abandons the current computation and returns to the capture point.

  • Can implement exceptions, coroutines, backtracking, green threads
  • Composability problems: captured continuation includes the entire rest of the program (undelimited)
  • Present in Scheme, SML/NJ

Delimited Continuations

Capture only a slice of the continuation, bounded by a delimiter (prompt).

reset { ... shift k { body } ... }
  • shift captures the continuation up to the enclosing reset (not the whole program)
  • The captured continuation is a regular function (composable)
  • Strictly more expressive than call/cc while being more tractable
  • Operators: shift/reset, control/prompt, shift0/reset0
  • Foundation for algebraic effects and effect handlers

Algebraic Effects (Brief)

Generalize exceptions and delimited continuations. Operations are declared, and handlers define their semantics:

effect Ask : unit -> int
handle (perform Ask()) with
| val x -> x
| Ask () k -> k 42

The handler receives the continuation k and decides how to resume.


Module Systems

Basic Modules

Group related definitions; provide namespace management and encapsulation.

module Stack = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
  let pop = function x :: s -> (x, s) | [] -> failwith "empty"
end

Signatures (Interfaces)

Specify what a module exposes:

module type STACK = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> 'a * 'a t
end

Abstract types (type 'a t without definition) enforce encapsulation.

ML Functors

Functions from modules to modules. Parameterize implementations over interfaces.

module MakeSet (Ord : ORDERED) : SET with type elem = Ord.t = struct
  type elem = Ord.t
  type t = elem list
  let member x s = List.exists (fun y -> Ord.compare x y = 0) s
  ...
end
  • First-class modules (OCaml): modules as values, enabling runtime module selection
  • Applicative vs. generative functors: applicative functors produce compatible types when applied to the same argument; generative functors always produce fresh types
  • Sharing constraints: with type t = ... ensures type compatibility across module boundaries

Module Systems in Other Languages

  • Haskell: type classes serve some roles of modules; no first-class module system
  • Rust: traits + modules (crates). Traits provide bounded polymorphism
  • Scala: objects as modules, path-dependent types
  • 1ML (Rossberg): unifies modules and core language; modules are first-class values

Overloading Resolution

Ad-hoc Polymorphism

The same name refers to different implementations based on types.

Type Classes (Haskell)

class Eq a where
  (==) :: a -> a -> Bool

instance Eq Int where
  x == y = primEqInt x y

instance Eq a => Eq [a] where
  []     == []     = True
  (x:xs) == (y:ys) = x == y && xs == ys
  _      == _      = False
  • Resolved at compile time via dictionary passing
  • Coherence: at most one instance per type (global uniqueness)
  • Superclasses, multi-parameter type classes, functional dependencies, associated types

Other Approaches

  • C++ templates: resolved at instantiation (monomorphization); SFINAE for conditional overloading
  • Java/C# overloading: resolved by static argument types; generics add bounded polymorphism
  • Rust traits: like type classes with coherence (orphan rules); monomorphized
  • Multiple dispatch (Julia, CLOS): resolved at runtime based on all argument types
  • Implicits (Scala): compiler searches for values of the required type in scope

Resolution Complexity

  • Haskell type class resolution is essentially a logic program; can diverge with certain extensions
  • Instance search with overlapping instances, backtracking, or higher-order unification is undecidable in general
  • Practical systems impose restrictions (no overlapping instances, termination checks) to keep resolution predictable

Design Tradeoffs Summary

| Dimension | Spectrum | |---|---| | Typing | Static <-> Gradual <-> Dynamic | | Evaluation | Strict <-> Lazy | | Scoping | Lexical <-> Dynamic | | Mutability | Immutable-first <-> Mutable-first | | Effects | Pure + effect tracking <-> Unrestricted | | Modules | Structural <-> Nominal | | Dispatch | Static <-> Dynamic (single/multiple) |

Language design is the art of choosing coherent points in this multi-dimensional space, guided by the intended domain, user community, and implementation constraints.