Semantic Analysis
Semantic analysis checks the meaning of a syntactically valid program — types, scopes, name resolution, and constraint validation. It bridges parsing and code generation.
Attribute Grammars
Associate attributes (values) with grammar symbols. Compute attribute values using semantic rules attached to productions.
Synthesized Attributes
Computed bottom-up from child attributes.
Production: Semantic Rule:
E → E₁ + T E.val = E₁.val + T.val
E → T E.val = T.val
T → T₁ * F T.val = T₁.val * F.val
T → F T.val = F.val
F → (E) F.val = E.val
F → NUM F.val = NUM.value
Evaluate during a bottom-up parse — natural for LR parsers.
Inherited Attributes
Computed top-down — passed from parent/sibling to child.
Example: Type declarations. The declaration type is inherited by all variables in the declaration.
Decl → Type VarList VarList.type = Type.type
VarList → Var , VarList₁ Var.type = VarList.type; VarList₁.type = VarList.type
VarList → Var Var.type = VarList.type
Type Checking
Type Systems
Static (compile-time): Rust, C, Java. Errors caught before execution.
Dynamic (runtime): Python, Ruby, JavaScript. Errors caught during execution.
Strong: Rust, Python. No implicit unsafe conversions.
Weak: C, JavaScript. Implicit conversions between types.
Type Checking Rules
// Typing judgments: Γ ⊢ e : τ ("In context Γ, expression e has type τ")
// Variables
Γ ⊢ x : τ if (x : τ) ∈ Γ
// Function application
Γ ⊢ f : τ₁ → τ₂ Γ ⊢ a : τ₁
─────────────────────────────────
Γ ⊢ f(a) : τ₂
// Binary operators
Γ ⊢ a : int Γ ⊢ b : int
─────────────────────────────
Γ ⊢ a + b : int
// If-else
Γ ⊢ c : bool Γ ⊢ t : τ Γ ⊢ f : τ
──────────────────────────────────────────
Γ ⊢ if c then t else f : τ
Type Inference
Deduce types without explicit annotations.
x ← 42 // inferred: integer
y ← [1, 2, 3] // inferred: list of integers
z ← x + 1 // inferred: integer
f ← (a, b) -> a + b // inferred from usage
Hindley-Milner (Algorithm W): Complete type inference for polymorphic lambda calculus. Used by Haskell, ML, OCaml. Rust uses bidirectional type checking (local inference + explicit function signatures).
Type Equivalence
Structural equivalence: Types are equivalent if they have the same structure.
type A = { x: i32, y: f64 }
type B = { x: i32, y: f64 }
// Structurally equivalent (same fields, same types)
Nominal equivalence: Types are equivalent only if they have the same name.
TYPE Meters(value: number)
TYPE Seconds(value: number)
// Nominally different! Can't use Meters where Seconds expected.
Rust uses nominal equivalence for structs/enums, structural for tuples and closures (partially).
Type Coercion
Implicit conversion between types. Rust has limited coercions (e.g., &String → &str, &Vec<T> → &[T]). C has extensive coercions (int → float, char → int).
Widening: Safe (i32 → i64, int → float). Narrowing: Potentially lossy (i64 → i32, float → int). Rust requires explicit as for narrowing.
Symbol Tables
A symbol table maps identifiers to their attributes (type, scope, memory location).
Structure
RECORD Symbol
name: string
kind: Variable | Function | Type | Constant
typ: Type
scope_level: integer
offset: integer or NIL // stack offset for variables
CLASS SymbolTable
FIELDS: scopes (list of dictionaries)
PROCEDURE ENTER_SCOPE() PUSH empty dictionary TO scopes
PROCEDURE EXIT_SCOPE() POP from scopes
PROCEDURE DEFINE(name, sym)
INSERT (name, sym) INTO last scope
FUNCTION LOOKUP(name)
// Search from innermost scope outward
FOR EACH scope IN scopes (reversed)
IF name IN scope
RETURN scope[name]
RETURN NIL
Scope Management
Block-structured scope: Each block ({...}) creates a new scope. Inner scopes can shadow outer variables.
x ← 1 // scope 0
BEGIN BLOCK
x ← 2 // scope 1, shadows outer x
y ← 3 // scope 1
// x = 2, y = 3
END BLOCK
// x = 1, y not visible
Name Resolution
Resolve each identifier to its declaration. Check:
- Is the name declared? (Undeclared variable error)
- Is it declared in a visible scope? (Not in scope error)
- Does the usage match the declaration? (Using a type as a value, etc.)
Overload Resolution
When multiple functions have the same name but different signatures:
// Methods can be "overloaded" via different implementations
INTERFACE Display FUNCTION FMT() -> string
IMPLEMENT Display FOR Integer FUNCTION FMT() RETURN TO_STRING(self)
IMPLEMENT Display FOR String FUNCTION FMT() RETURN COPY(self)
// Compiler resolves based on the receiver type
Resolution process: Find all candidates → filter by argument count → filter by type compatibility → select the most specific match (or report ambiguity).
Static vs Dynamic Semantics
Static semantics: Checked at compile time. Types, scope, declarations, lifetime analysis.
Dynamic semantics: Behavior at runtime. The meaning of executing a statement. Defined by the language specification (operational semantics) or the compiler's code generation.
Abstract Interpretation (Preview)
Approximate program behavior at compile time to detect errors.
Example: Range analysis — track possible values of variables.
x: byte ← 200
y: byte ← 100
z ← x + y // Static analysis: x in [200,200], y in [100,100] -> z in [300,300] -> OVERFLOW!
Rust's const evaluation and bounds checking use forms of abstract interpretation. Full treatment in program analysis topic.
Semantic Errors
| Error | Example |
|---|---|
| Undeclared variable | x + 1 where x is not declared |
| Type mismatch | "hello" + 42 (string + integer) |
| Wrong number of arguments | f(1, 2) when f takes 3 args |
| Return type mismatch | Function declared to return i32, returns String |
| Duplicate declaration | Two variables with the same name in the same scope |
| Use before initialization | Read variable before assigning a value |
| Borrow checker violation | Mutable borrow while shared borrow exists (Rust-specific) |
| Lifetime violation | Reference outlives the data it points to (Rust-specific) |
Applications in CS
- IDE features: Go-to-definition, find-all-references, rename refactoring — all require semantic analysis (name resolution, symbol tables).
- Error messages: Good type error messages require understanding what the programmer likely intended. Rust's compiler errors are a model of clarity.
- Language servers: LSP (Language Server Protocol) implementations use semantic analysis for autocomplete, diagnostics, and hover information.
- Type-directed code generation: Types guide optimization decisions (monomorphization, vtable dispatch).