3 min read
On this page

Semantic Analysis

Semantic analysis checks the meaning of a syntactically valid program — types, scopes, name resolution, and constraint validation. It bridges parsing and code generation.

Attribute Grammars

Associate attributes (values) with grammar symbols. Compute attribute values using semantic rules attached to productions.

Synthesized Attributes

Computed bottom-up from child attributes.

Production:        Semantic Rule:
E → E₁ + T        E.val = E₁.val + T.val
E → T             E.val = T.val
T → T₁ * F        T.val = T₁.val * F.val
T → F             T.val = F.val
F → (E)           F.val = E.val
F → NUM           F.val = NUM.value

Evaluate during a bottom-up parse — natural for LR parsers.

Inherited Attributes

Computed top-down — passed from parent/sibling to child.

Example: Type declarations. The declaration type is inherited by all variables in the declaration.

Decl → Type VarList     VarList.type = Type.type
VarList → Var , VarList₁  Var.type = VarList.type; VarList₁.type = VarList.type
VarList → Var            Var.type = VarList.type

Type Checking

Type Systems

Static (compile-time): Rust, C, Java. Errors caught before execution.

Dynamic (runtime): Python, Ruby, JavaScript. Errors caught during execution.

Strong: Rust, Python. No implicit unsafe conversions.

Weak: C, JavaScript. Implicit conversions between types.

Type Checking Rules

// Typing judgments: Γ ⊢ e : τ  ("In context Γ, expression e has type τ")

// Variables
Γ ⊢ x : τ     if (x : τ) ∈ Γ

// Function application
Γ ⊢ f : τ₁ → τ₂    Γ ⊢ a : τ₁
─────────────────────────────────
         Γ ⊢ f(a) : τ₂

// Binary operators
Γ ⊢ a : int    Γ ⊢ b : int
─────────────────────────────
     Γ ⊢ a + b : int

// If-else
Γ ⊢ c : bool    Γ ⊢ t : τ    Γ ⊢ f : τ
──────────────────────────────────────────
        Γ ⊢ if c then t else f : τ

Type Inference

Deduce types without explicit annotations.

x ← 42                    // inferred: integer
y ← [1, 2, 3]             // inferred: list of integers
z ← x + 1                 // inferred: integer
f ← (a, b) -> a + b       // inferred from usage

Hindley-Milner (Algorithm W): Complete type inference for polymorphic lambda calculus. Used by Haskell, ML, OCaml. Rust uses bidirectional type checking (local inference + explicit function signatures).

Type Equivalence

Structural equivalence: Types are equivalent if they have the same structure.

type A = { x: i32, y: f64 }
type B = { x: i32, y: f64 }
// Structurally equivalent (same fields, same types)

Nominal equivalence: Types are equivalent only if they have the same name.

TYPE Meters(value: number)
TYPE Seconds(value: number)
// Nominally different! Can't use Meters where Seconds expected.

Rust uses nominal equivalence for structs/enums, structural for tuples and closures (partially).

Type Coercion

Implicit conversion between types. Rust has limited coercions (e.g., &String&str, &Vec<T>&[T]). C has extensive coercions (int → float, char → int).

Widening: Safe (i32 → i64, int → float). Narrowing: Potentially lossy (i64 → i32, float → int). Rust requires explicit as for narrowing.

Symbol Tables

A symbol table maps identifiers to their attributes (type, scope, memory location).

Structure

RECORD Symbol
    name: string
    kind: Variable | Function | Type | Constant
    typ: Type
    scope_level: integer
    offset: integer or NIL   // stack offset for variables

CLASS SymbolTable
    FIELDS: scopes (list of dictionaries)

    PROCEDURE ENTER_SCOPE()   PUSH empty dictionary TO scopes
    PROCEDURE EXIT_SCOPE()    POP from scopes

    PROCEDURE DEFINE(name, sym)
        INSERT (name, sym) INTO last scope

    FUNCTION LOOKUP(name)
        // Search from innermost scope outward
        FOR EACH scope IN scopes (reversed)
            IF name IN scope
                RETURN scope[name]
        RETURN NIL

Scope Management

Block-structured scope: Each block ({...}) creates a new scope. Inner scopes can shadow outer variables.

x ← 1             // scope 0
BEGIN BLOCK
    x ← 2         // scope 1, shadows outer x
    y ← 3         // scope 1
    // x = 2, y = 3
END BLOCK
// x = 1, y not visible

Name Resolution

Resolve each identifier to its declaration. Check:

  1. Is the name declared? (Undeclared variable error)
  2. Is it declared in a visible scope? (Not in scope error)
  3. Does the usage match the declaration? (Using a type as a value, etc.)

Overload Resolution

When multiple functions have the same name but different signatures:

// Methods can be "overloaded" via different implementations
INTERFACE Display   FUNCTION FMT() -> string
IMPLEMENT Display FOR Integer   FUNCTION FMT()  RETURN TO_STRING(self)
IMPLEMENT Display FOR String    FUNCTION FMT()  RETURN COPY(self)
// Compiler resolves based on the receiver type

Resolution process: Find all candidates → filter by argument count → filter by type compatibility → select the most specific match (or report ambiguity).

Static vs Dynamic Semantics

Static semantics: Checked at compile time. Types, scope, declarations, lifetime analysis.

Dynamic semantics: Behavior at runtime. The meaning of executing a statement. Defined by the language specification (operational semantics) or the compiler's code generation.

Abstract Interpretation (Preview)

Approximate program behavior at compile time to detect errors.

Example: Range analysis — track possible values of variables.

x: byte ← 200
y: byte ← 100
z ← x + y   // Static analysis: x in [200,200], y in [100,100] -> z in [300,300] -> OVERFLOW!

Rust's const evaluation and bounds checking use forms of abstract interpretation. Full treatment in program analysis topic.

Semantic Errors

| Error | Example | |---|---| | Undeclared variable | x + 1 where x is not declared | | Type mismatch | "hello" + 42 (string + integer) | | Wrong number of arguments | f(1, 2) when f takes 3 args | | Return type mismatch | Function declared to return i32, returns String | | Duplicate declaration | Two variables with the same name in the same scope | | Use before initialization | Read variable before assigning a value | | Borrow checker violation | Mutable borrow while shared borrow exists (Rust-specific) | | Lifetime violation | Reference outlives the data it points to (Rust-specific) |

Applications in CS

  • IDE features: Go-to-definition, find-all-references, rename refactoring — all require semantic analysis (name resolution, symbol tables).
  • Error messages: Good type error messages require understanding what the programmer likely intended. Rust's compiler errors are a model of clarity.
  • Language servers: LSP (Language Server Protocol) implementations use semantic analysis for autocomplete, diagnostics, and hover information.
  • Type-directed code generation: Types guide optimization decisions (monomorphization, vtable dispatch).