Language Design

Syntax Design

Concrete vs. Abstract Syntax

Concrete syntax: what programmers write (surface syntax with precedence, associativity, sugar)
Abstract syntax: tree representation used by the compiler (AST). Strips away syntactic sugar and ambiguity
Abstract binding trees (ABTs): extend ASTs with explicit binding structure; used in formal PL theory

Design Considerations

Readability vs. writability tradeoff (Python favors readability; APL favors writability)
Regularity: few special cases, uniform rules (Lisp's S-expressions are maximally regular)
Syntactic sugar: convenient surface forms that desugar to core constructs. let x = e1 in e2 desugars to (λx. e2) e1
Precedence and fixity: determines parsing of a + b * c. Configurable in Haskell, Agda
Delimiter style: significant whitespace (Python, Haskell) vs. braces (C family) vs. keywords (Pascal, Ruby)

Parsing Considerations

Most languages target LL(k) or LR(1) grammars for efficient parsing
PEG grammars eliminate ambiguity by ordered choice but cannot express all CFLs
Layout rules (offside rule): whitespace-sensitive parsing in Haskell, Python. Formally described as a lexer post-pass

Binding and Scope

Lexical (Static) Scoping

A variable refers to the binding in the nearest enclosing scope in the source text. Determined at compile time.

let x = 10 in
  let f = λy. x + y in
    let x = 20 in
      f 5    -- returns 15 (captures x=10)

Closures capture the environment at definition time
Standard in nearly all modern languages
Enables static analysis, optimization, and equational reasoning

Dynamic Scoping

A variable refers to the most recent binding on the call stack at runtime.

(defvar x 10)
(defun f (y) (+ x y))
(let ((x 20)) (f 5))   ;; returns 25 in dynamic scoping

Used historically in early Lisps, still available in Emacs Lisp
Useful for implicit parameters (thread-local state, configuration)
Common Lisp's special variables use dynamic scoping explicitly
Harder to reason about; inhibits optimization

Scope and Binding Mechanisms

let vs letrec: non-recursive vs. recursive binding. letrec allows mutual recursion
Shadowing: inner bindings hide outer ones with the same name
Hygienic macros: maintain lexical scoping guarantees during expansion (Scheme's syntax-rules, Rust's macro system)
Name resolution in modules: qualified vs. unqualified names, import/export lists

Control Flow

Exceptions

Structured non-local exit for error handling.

try { riskyOp() }
catch (e: IOException) { handleIO(e) }
finally { cleanup() }

Semantically: throw unwinds the stack to the nearest matching catch
Typed exceptions (Java) vs. untyped (Python, JS) vs. algebraic effects (as generalization)
Effect on reasoning: exceptions break equational reasoning since any expression may diverge laterally
Performance: zero-cost exceptions (table-based, no overhead on the happy path; costly on throw) vs. setjmp/longjmp

Continuations

A continuation represents "the rest of the computation" from a given point.

First-class continuations: reify the continuation as a value that can be stored and invoked
CPS (continuation-passing style): every function takes an extra argument representing its continuation. Makes control flow explicit

call/cc (Call with Current Continuation)

call/cc : ((α -> β) -> α) -> α

Captures the current continuation k and passes it to the given function. Invoking k abandons the current computation and returns to the capture point.

Can implement exceptions, coroutines, backtracking, green threads
Composability problems: captured continuation includes the entire rest of the program (undelimited)
Present in Scheme, SML/NJ

Delimited Continuations

Capture only a slice of the continuation, bounded by a delimiter (prompt).

reset { ... shift k { body } ... }

shift captures the continuation up to the enclosing reset (not the whole program)
The captured continuation is a regular function (composable)
Strictly more expressive than call/cc while being more tractable
Operators: shift/reset, control/prompt, shift0/reset0
Foundation for algebraic effects and effect handlers

Algebraic Effects (Brief)

Generalize exceptions and delimited continuations. Operations are declared, and handlers define their semantics:

effect Ask : unit -> int
handle (perform Ask()) with
| val x -> x
| Ask () k -> k 42

The handler receives the continuation k and decides how to resume.

Module Systems

Basic Modules

Group related definitions; provide namespace management and encapsulation.

module Stack = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
  let pop = function x :: s -> (x, s) | [] -> failwith "empty"
end

Signatures (Interfaces)

Specify what a module exposes:

module type STACK = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> 'a * 'a t
end

Abstract types (type 'a t without definition) enforce encapsulation.

ML Functors

Functions from modules to modules. Parameterize implementations over interfaces.

module MakeSet (Ord : ORDERED) : SET with type elem = Ord.t = struct
  type elem = Ord.t
  type t = elem list
  let member x s = List.exists (fun y -> Ord.compare x y = 0) s
  ...
end

First-class modules (OCaml): modules as values, enabling runtime module selection
Applicative vs. generative functors: applicative functors produce compatible types when applied to the same argument; generative functors always produce fresh types
Sharing constraints: with type t = ... ensures type compatibility across module boundaries

Module Systems in Other Languages

Haskell: type classes serve some roles of modules; no first-class module system
Rust: traits + modules (crates). Traits provide bounded polymorphism
Scala: objects as modules, path-dependent types
1ML (Rossberg): unifies modules and core language; modules are first-class values

Overloading Resolution

Ad-hoc Polymorphism

The same name refers to different implementations based on types.

Type Classes (Haskell)

class Eq a where
  (==) :: a -> a -> Bool

instance Eq Int where
  x == y = primEqInt x y

instance Eq a => Eq [a] where
  []     == []     = True
  (x:xs) == (y:ys) = x == y && xs == ys
  _      == _      = False

Resolved at compile time via dictionary passing
Coherence: at most one instance per type (global uniqueness)
Superclasses, multi-parameter type classes, functional dependencies, associated types

Other Approaches

C++ templates: resolved at instantiation (monomorphization); SFINAE for conditional overloading
Java/C# overloading: resolved by static argument types; generics add bounded polymorphism
Rust traits: like type classes with coherence (orphan rules); monomorphized
Multiple dispatch (Julia, CLOS): resolved at runtime based on all argument types
Implicits (Scala): compiler searches for values of the required type in scope

Resolution Complexity

Haskell type class resolution is essentially a logic program; can diverge with certain extensions
Instance search with overlapping instances, backtracking, or higher-order unification is undecidable in general
Practical systems impose restrictions (no overlapping instances, termination checks) to keep resolution predictable

Design Tradeoffs Summary

Dimension	Spectrum
Typing	Static <-> Gradual <-> Dynamic
Evaluation	Strict <-> Lazy
Scoping	Lexical <-> Dynamic
Mutability	Immutable-first <-> Mutable-first
Effects	Pure + effect tracking <-> Unrestricted
Modules	Structural <-> Nominal
Dispatch	Static <-> Dynamic (single/multiple)

Language design is the art of choosing coherent points in this multi-dimensional space, guided by the intended domain, user community, and implementation constraints.