Probabilistic Reasoning

Probabilistic reasoning handles uncertainty — the real world is noisy, partially observable, and unpredictable. Bayesian networks and related models provide principled frameworks.

Bayesian Networks

A Bayesian network (belief network) is a directed acyclic graph (DAG) where:

  • Nodes represent random variables.
  • Edges represent direct dependencies.
  • Each node has a conditional probability table (CPT) given its parents.
         Burglary    Earthquake
            \          /
             Alarm
            /     \
        JohnCalls  MaryCalls

Joint probability (chain rule using the DAG structure):

P(B,E,A,J,M) = P(B)·P(E)·P(A|B,E)·P(J|A)·P(M|A)

The DAG encodes conditional independencies: given the alarm state, JohnCalls and MaryCalls are independent.
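
The factorization above can be evaluated by multiplying CPT entries directly. A minimal sketch, with assumed, illustrative CPT values for the alarm network (the variable names are the only thing taken from the text):

```python
# Illustrative (assumed) CPT values for the alarm network
P_B1 = 0.001                       # P(Burglary = 1)
P_E1 = 0.002                       # P(Earthquake = 1)
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_J1 = {1: 0.90, 0: 0.05}          # P(JohnCalls = 1 | A)
P_M1 = {1: 0.70, 0: 0.01}          # P(MaryCalls = 1 | A)

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B) * P(E) * P(A|B,E) * P(J|A) * P(M|A)."""
    def bern(p1, x):               # P(X = x) for a Bernoulli with P(X=1) = p1
        return p1 if x else 1 - p1
    return (bern(P_B1, b) * bern(P_E1, e) * bern(P_A1[(b, e)], a)
            * bern(P_J1[a], j) * bern(P_M1[a], m))

p = joint(1, 0, 1, 1, 1)   # burglary, no earthquake, alarm rings, both call
```

Summing `joint` over all 32 assignments gives exactly 1, which is a quick sanity check that the CPTs define a valid distribution.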

d-Separation

Determines conditional independence from the graph structure (without computing probabilities).

X and Y are d-separated given Z if every undirected path between X and Y is "blocked" by Z:

  • Chain (A → B → C): B in Z blocks.
  • Fork (A ← B → C): B in Z blocks.
  • Collider (A → B ← C): blocked when neither B nor any descendant of B is in Z. Conditioning on B (or one of its descendants) opens the path.
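
The three rules can be turned into a small checker. A sketch, assuming the DAG is given as a parent → children dict; it enumerates all undirected paths (fine only for tiny graphs) and applies the blocking rules, including the descendant rule for colliders:

```python
def d_separated(dag, x, y, z):
    """True iff x and y are d-separated given the set z in a small DAG."""
    z = set(z)
    nbrs = {}                                   # undirected adjacency
    for par, children in dag.items():
        for c in children:
            nbrs.setdefault(par, set()).add(c)
            nbrs.setdefault(c, set()).add(par)
    parents = {n: {p for p, cs in dag.items() if n in cs} for n in nbrs}

    def descendants(n):
        seen, stack = set(), list(dag.get(n, []))
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(dag.get(c, []))
        return seen

    def blocked(path):
        for i in range(1, len(path) - 1):
            a, b, c = path[i - 1], path[i], path[i + 1]
            if a in parents[b] and c in parents[b]:      # collider at b
                if not (({b} | descendants(b)) & z):
                    return True                          # closed: b unobserved
            elif b in z:                                 # chain or fork at b
                return True
        return False

    def paths(cur, path, seen):
        if cur == y:
            yield path
        else:
            for n in nbrs.get(cur, ()):
                if n not in seen:
                    yield from paths(n, path + [n], seen | {n})

    return all(blocked(p) for p in paths(x, [x], {x}))
```

On the alarm network this reproduces the claims above: JohnCalls and MaryCalls are d-separated given Alarm, while Burglary and Earthquake become dependent once Alarm (or a caller) is observed.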

Inference

Exact inference: Variable elimination, junction tree algorithm. Cost is exponential in the graph's treewidth (exact inference is NP-hard in general).

Variable elimination: Eliminate variables one at a time by summing over them (marginalization). Order of elimination affects efficiency.
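
As a concrete sketch, summing out Earthquake and Alarm one at a time answers a query on the alarm network. The CPT values are assumed, illustrative numbers:

```python
# Illustrative (assumed) CPT values for the alarm network
P_B = {1: 0.001, 0: 0.999}
P_E = {1: 0.002, 0: 0.998}
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_J1 = {1: 0.90, 0: 0.05}   # P(J=1 | A)
P_M1 = {1: 0.70, 0: 0.01}   # P(M=1 | A)

def unnormalized(b):
    """Sum out Earthquake and Alarm for the evidence J=1, M=1."""
    total = 0.0
    for e in (0, 1):
        for a in (0, 1):
            p_a = P_A1[(b, e)] if a else 1 - P_A1[(b, e)]
            total += P_B[b] * P_E[e] * p_a * P_J1[a] * P_M1[a]
    return total

num = unnormalized(1)
posterior = num / (num + unnormalized(0))   # P(B=1 | J=1, M=1) ≈ 0.284 here
```

Full variable elimination caches the intermediate sums as factors instead of recomputing them inside the loop; this brute-force version shows the marginalization itself.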

Belief propagation: Message-passing on trees. Each node sends messages to neighbors. Converges to exact marginals on trees. Approximate (loopy BP) on general graphs.

Hidden Markov Models (HMMs)

Model sequential data with hidden states.

S₁ → S₂ → S₃ → S₄ → ...    (hidden states)
↓     ↓     ↓     ↓
O₁    O₂    O₃    O₄    ...    (observations)

Parameters: Initial distribution π, transition probabilities A(sᵢ → sⱼ), emission probabilities B(sᵢ → oₖ).

Three Problems

Evaluation (Forward algorithm): P(O₁...Oₜ | model). Uses dynamic programming. O(T·N²).
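
A minimal forward-algorithm sketch (unscaled, so suitable only for short sequences; production code rescales each step or works in log space):

```python
def forward_prob(obs, trans, emit, init):
    """P(O1..OT | model) via the forward recursion, O(T * N^2).

    obs: list of observation indices; trans[p][s], emit[s][o], init[s]
    are row-stochastic transition, emission, and initial probabilities.
    """
    n = len(trans)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]   # base case
    for o in obs[1:]:
        # alpha[s] = sum over predecessors, then weight by the emission
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)
```

The `trans`/`emit`/`init` names mirror the Viterbi sketch below; only the max is replaced by a sum.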

Decoding (Viterbi algorithm): Find the most likely state sequence given observations. DP: track the best path to each state. O(T·N²).

Learning (Baum-Welch / EM): Estimate model parameters from observations. Iteratively: E-step (compute expected state occupancies using forward-backward), M-step (update parameters). Converges to local optimum.

# Viterbi algorithm (log-space to avoid underflow)
import math

def ln(p):
    return math.log(p) if p > 0 else -math.inf   # guard against zero probabilities

def viterbi(obs, trans, emit, init):
    n_states, t_len = len(trans), len(obs)
    dp = [[-math.inf] * n_states for _ in range(t_len)]
    backptr = [[0] * n_states for _ in range(t_len)]

    # Initialize with the first observation
    for s in range(n_states):
        dp[0][s] = ln(init[s]) + ln(emit[s][obs[0]])

    # Forward pass: best log-score of any path ending in state s at time t
    for t in range(1, t_len):
        for s in range(n_states):
            for prev in range(n_states):
                score = dp[t - 1][prev] + ln(trans[prev][s]) + ln(emit[s][obs[t]])
                if score > dp[t][s]:
                    dp[t][s] = score
                    backptr[t][s] = prev

    # Backtrace from the best final state
    path = [0] * t_len
    path[-1] = max(range(n_states), key=lambda s: dp[-1][s])
    for t in range(t_len - 2, -1, -1):
        path[t] = backptr[t + 1][path[t + 1]]
    return path

Kalman Filters

Continuous-state HMM with linear-Gaussian dynamics. Covered in DSP topic. Optimal for linear systems with Gaussian noise.
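
Although the details live in the DSP topic, the core predict/update loop is tiny in one dimension. A sketch, assuming a random-walk state observed directly with noise (`q` and `r` are assumed process- and measurement-noise variances):

```python
def kalman_1d(measurements, x0=0.0, p0=1.0, q=1e-3, r=0.1):
    """Minimal 1D Kalman filter; returns the state estimate after each step."""
    x, p, estimates = x0, p0, []
    for z in measurements:
        p += q                    # predict: variance grows by process noise
        k = p / (p + r)           # Kalman gain: trust in the new measurement
        x += k * (z - x)          # update with the measurement residual
        p *= (1 - k)              # posterior variance shrinks
        estimates.append(x)
    return estimates
```

Fed a constant signal, the estimate converges to it while the gain settles to a steady-state value, which is the optimality claim in miniature.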

Dynamic Bayesian Networks

Generalize HMMs: arbitrary Bayesian network structure at each time step, with edges across time steps.

Markov Random Fields (MRFs)

Undirected graphical models. Factors (potential functions) encode local compatibility.

P(x₁,...,xₙ) = (1/Z) Πₖ φₖ(xₖ), where xₖ denotes the variables in the k-th clique and Z is the partition function (normalizing constant).

Used in: Image segmentation (pixels connected to neighbors), spatial statistics, CRF for sequence labeling.
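
A toy instance of the factorized form above: a three-node chain MRF over binary variables with an illustrative pairwise "agreement" potential, normalized by brute-force enumeration:

```python
import itertools

def phi(a, b):                        # illustrative pairwise potential (assumed values)
    return 2.0 if a == b else 0.5     # favors neighboring variables agreeing

def unnorm(x):                        # product of factors along the chain x1 - x2 - x3
    return phi(x[0], x[1]) * phi(x[1], x[2])

configs = list(itertools.product((0, 1), repeat=3))
Z = sum(unnorm(x) for x in configs)               # partition function
p = {x: unnorm(x) / Z for x in configs}           # normalized distribution
```

Note that unlike a Bayesian network's CPTs, the potentials are not probabilities; only after dividing by Z does the product become a distribution.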

Probabilistic Programming

Specify generative models as programs. Inference engine automatically computes posteriors.

Languages: Stan (MCMC), Pyro (variational inference, deep learning), Gen (Julia), Turing.jl.

# Pyro-style probabilistic program
import pyro
from pyro.distributions import Normal, HalfNormal

def model(data):
    mu = pyro.sample("mu", Normal(0., 10.))
    sigma = pyro.sample("sigma", HalfNormal(5.))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", Normal(mu, sigma), obs=data)

Applications in CS

  • Speech recognition: HMMs for acoustic modeling (before deep learning). Still used in hybrid systems.
  • Natural language: Part-of-speech tagging (HMM tagger). Named entity recognition (CRF). Language models.
  • Robotics: Kalman filter for localization. Particle filters for SLAM. Bayesian planning.
  • Medical diagnosis: Bayesian networks model disease-symptom relationships.
  • Spam filtering: Naive Bayes (simplified Bayesian network).
  • Finance: HMMs for regime detection (bull/bear markets). Bayesian inference for risk.
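
The Naive Bayes spam filter mentioned above fits in a few lines: class-conditional word probabilities with add-one smoothing, combined under the conditional-independence assumption. A sketch with hypothetical toy data:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns log-prior and smoothed
    log-likelihoods per label."""
    counts, labels = defaultdict(Counter), Counter()
    for tokens, label in docs:
        labels[label] += 1
        counts[label].update(tokens)
    vocab = {w for c in counts.values() for w in c}
    total = sum(labels.values())
    model = {}
    for label in labels:
        n = sum(counts[label].values())
        model[label] = (
            math.log(labels[label] / total),                    # log P(label)
            {w: math.log((counts[label][w] + 1) / (n + len(vocab)))
             for w in vocab},                                   # log P(word | label)
        )
    return model

def classify(model, tokens):
    def score(label):
        prior, lik = model[label]
        return prior + sum(lik[w] for w in tokens if w in lik)  # skip unseen words
    return max(model, key=score)
```

Structurally this is the simplest Bayesian network of all: the class node is the single parent of every word node.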