Probabilistic Reasoning
Probabilistic reasoning handles uncertainty — the real world is noisy, partially observable, and unpredictable. Bayesian networks and related graphical models provide principled frameworks for representing uncertain knowledge and updating beliefs from evidence.
Bayesian Networks
A Bayesian network (belief network) is a directed acyclic graph (DAG) where:
- Nodes represent random variables.
- Edges represent direct dependencies.
- Each node has a conditional probability table (CPT) given its parents.
Burglary      Earthquake
        \      /
         Alarm
        /      \
JohnCalls      MaryCalls
Joint probability (chain rule using the DAG structure):
P(B,E,A,J,M) = P(B)·P(E)·P(A|B,E)·P(J|A)·P(M|A)
The DAG encodes conditional independencies: given the alarm state, JohnCalls and MaryCalls are independent.
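The factored joint above can be evaluated directly from the CPTs. A minimal sketch, using the commonly cited textbook numbers for this network (illustrative values, not part of the notes above):

```python
from itertools import product

# Illustrative CPT values for the burglary network.
P_B = {True: 0.001, False: 0.999}                       # P(Burglary)
P_E = {True: 0.002, False: 0.998}                       # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}      # P(Alarm=true | B, E)
P_J = {True: 0.90, False: 0.05}                         # P(JohnCalls=true | A)
P_M = {True: 0.70, False: 0.01}                         # P(MaryCalls=true | A)

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B)·P(E)·P(A|B,E)·P(J|A)·P(M|A)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Probability of alarm plus both calls, with no burglary and no earthquake:
p = joint(False, False, True, True, True)   # ≈ 0.000628
```

Note that five CPT entries (plus complements) suffice, versus 2⁵ − 1 = 31 numbers for an unstructured joint table.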
d-Separation
Determines conditional independence from the graph structure (without computing probabilities).
X and Y are d-separated given Z if every undirected path between X and Y is "blocked" by Z:
- Chain (A → B → C): B in Z blocks.
- Fork (A ← B → C): B in Z blocks.
- Collider (A → B ← C): blocked when neither B nor any descendant of B is in Z. Conditioning on B (or a descendant of B) opens the path.
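The collider rule can be checked numerically on the burglary network: B and E are marginally independent, but conditioning on the collider A makes them dependent ("explaining away"). A brute-force sketch, using the same illustrative CPT values as the textbook version of the network:

```python
from itertools import product

# Illustrative CPTs for B -> A <- E (burglary-network numbers).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}      # P(A=true | B, E)

def joint(b, e, a):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return P_B[b] * P_E[e] * pa

def prob(event):
    """Probability of an event (a predicate over b, e, a) by enumeration."""
    return sum(joint(b, e, a)
               for b, e, a in product([True, False], repeat=3)
               if event(b, e, a))

# Collider A unobserved: the path B -> A <- E is blocked, so P(B) = P(B|E).
p_b         = prob(lambda b, e, a: b)
p_b_given_e = prob(lambda b, e, a: b and e) / prob(lambda b, e, a: e)

# Observing A opens the path: learning E=true "explains away" the alarm,
# so P(B|A) and P(B|A,E) differ sharply.
p_b_given_a   = prob(lambda b, e, a: b and a) / prob(lambda b, e, a: a)
p_b_given_a_e = (prob(lambda b, e, a: b and a and e)
                 / prob(lambda b, e, a: a and e))
```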
Inference
Exact inference: Variable elimination, junction tree algorithm. Exponential in treewidth.
Variable elimination: Eliminate variables one at a time by summing over them (marginalization). Order of elimination affects efficiency.
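A minimal variable-elimination sketch on a three-node chain A → B → C, with made-up CPTs: each elimination step multiplies a factor in and sums the variable out.

```python
import numpy as np

# Assumed CPTs for a chain A -> B -> C (binary variables).
P_A = np.array([0.6, 0.4])              # P(A)
P_B_given_A = np.array([[0.7, 0.3],     # row i = P(B | A=i)
                        [0.2, 0.8]])
P_C_given_B = np.array([[0.9, 0.1],     # row i = P(C | B=i)
                        [0.5, 0.5]])

# Eliminate A: multiply P(A) into P(B|A), then sum over A -> factor over B.
f_B = (P_A[:, None] * P_B_given_A).sum(axis=0)      # = P(B)
# Eliminate B: multiply f_B into P(C|B), then sum over B -> marginal P(C).
f_C = (f_B[:, None] * P_C_given_B).sum(axis=0)      # = P(C)
```

Each step keeps only a factor over the remaining variables, which is why elimination order (and hence treewidth) governs the cost.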
Belief propagation: Message-passing on trees. Each node sends messages to neighbors. Converges to exact marginals on trees. Approximate (loopy BP) on general graphs.
Hidden Markov Models (HMMs)
Model sequential data with hidden states.
S₁ → S₂ → S₃ → S₄ → ...   (hidden states)
↓    ↓    ↓    ↓
O₁   O₂   O₃   O₄  ...    (observations)
Parameters: Initial distribution π, transition probabilities A(sᵢ → sⱼ), emission probabilities B(sᵢ → oₖ).
Three Problems
Evaluation (Forward algorithm): P(O₁...Oₜ | model). Uses dynamic programming. O(T·N²).
Decoding (Viterbi algorithm): Find the most likely state sequence given observations. DP: track the best path to each state. O(T·N²).
Learning (Baum-Welch / EM): Estimate model parameters from observations. Iteratively: E-step (compute expected state occupancies using forward-backward), M-step (update parameters). Converges to local optimum.
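The forward recursion (problem 1) fits in a few lines; the two-state parameters below are toy values chosen for illustration:

```python
import numpy as np

# Toy two-state HMM parameters (assumed for illustration).
pi = np.array([0.6, 0.4])                  # initial distribution
A = np.array([[0.7, 0.3],                  # A[i, j] = P(s_j | s_i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                  # B[i, k] = P(o_k | s_i)
              [0.2, 0.8]])

def forward(obs):
    """P(O_1..O_T | model) via the forward algorithm, O(T·N²)."""
    alpha = pi * B[:, obs[0]]              # alpha_1(s) = pi(s)·B(s, o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # sum over previous states
    return alpha.sum()

likelihood = forward([0, 1, 0])
```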
// Viterbi algorithm (sketch)
PROCEDURE VITERBI(obs, trans, emit, init) → path
    n_states ← LENGTH(trans)
    t_len ← LENGTH(obs)
    dp ← 2D array [t_len x n_states], filled with -INFINITY
    backptr ← 2D array [t_len x n_states], filled with 0
    // Initialize: log-space scores for the first observation
    FOR s ← 0 TO n_states - 1 DO
        dp[0][s] ← LN(init[s]) + LN(emit[s][obs[0]])
    // Forward pass: best score ending in each state at each time
    FOR t ← 1 TO t_len - 1 DO
        FOR s ← 0 TO n_states - 1 DO
            FOR prev ← 0 TO n_states - 1 DO
                score ← dp[t-1][prev] + LN(trans[prev][s]) + LN(emit[s][obs[t]])
                IF score > dp[t][s] THEN
                    dp[t][s] ← score
                    backptr[t][s] ← prev
    // Backtrace: follow back-pointers from the best final state
    path ← array of size t_len
    path[t_len - 1] ← ARGMAX over s of dp[t_len - 1][s]
    FOR t ← t_len - 2 DOWNTO 0 DO
        path[t] ← backptr[t + 1][path[t + 1]]
    RETURN path
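A runnable Python translation of the sketch, working in log space as above; the two-state model parameters at the bottom are toy values for illustration:

```python
import math

def viterbi(obs, trans, emit, init):
    """Most likely hidden-state sequence (log-space DP with backtrace)."""
    n = len(trans)
    dp = [[-math.inf] * n for _ in obs]
    back = [[0] * n for _ in obs]
    for s in range(n):
        dp[0][s] = math.log(init[s]) + math.log(emit[s][obs[0]])
    for t in range(1, len(obs)):
        for s in range(n):
            for prev in range(n):
                score = (dp[t - 1][prev] + math.log(trans[prev][s])
                         + math.log(emit[s][obs[t]]))
                if score > dp[t][s]:
                    dp[t][s] = score
                    back[t][s] = prev
    path = [0] * len(obs)
    path[-1] = max(range(n), key=lambda s: dp[-1][s])
    for t in range(len(obs) - 2, -1, -1):
        path[t] = back[t + 1][path[t + 1]]
    return path

# Toy two-state model (assumed parameters):
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
init = [0.6, 0.4]
states = viterbi([0, 1, 1, 0], trans, emit, init)
```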
Kalman Filters
Continuous-state HMM with linear-Gaussian dynamics. Covered in DSP topic. Optimal for linear systems with Gaussian noise.
Dynamic Bayesian Networks
Generalize HMMs: arbitrary Bayesian network structure at each time step, with edges across time steps.
Markov Random Fields (MRFs)
Undirected graphical models. Factors (potential functions) encode local compatibility.
P(x₁,...,xₙ) = (1/Z) Πₖ φₖ(x_Cₖ), where each potential φₖ is defined over a clique Cₖ of the graph and Z (the partition function) normalizes the product.
Used in: Image segmentation (pixels connected to neighbors), spatial statistics, CRF for sequence labeling.
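For intuition, a tiny pairwise MRF on a three-node chain x₁ – x₂ – x₃, with an assumed agreement-favoring edge potential, normalized by brute-force enumeration of Z:

```python
from itertools import product

def phi(a, b):
    """Edge potential (assumed): favors neighboring variables agreeing."""
    return 2.0 if a == b else 1.0

def unnormalized(x1, x2, x3):
    # One factor per edge of the chain x1 - x2 - x3.
    return phi(x1, x2) * phi(x2, x3)

# Partition function Z: sum of unnormalized scores over all assignments.
Z = sum(unnormalized(*x) for x in product([0, 1], repeat=3))

def p(x1, x2, x3):
    return unnormalized(x1, x2, x3) / Z
```

Computing Z by enumeration is exponential in general; in practice one uses belief propagation or sampling instead.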
Probabilistic Programming
Specify generative models as programs. Inference engine automatically computes posteriors.
Languages: Stan (MCMC), Pyro (variational inference, deep learning), Gen (Julia), Turing.jl.
# Pyro-style probabilistic program
import pyro
from pyro.distributions import Normal, HalfNormal

def model(data):
    mu = pyro.sample("mu", Normal(0.0, 10.0))
    sigma = pyro.sample("sigma", HalfNormal(5.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", Normal(mu, sigma), obs=data)
Applications in CS
- Speech recognition: HMMs for acoustic modeling (before deep learning). Still used in hybrid systems.
- Natural language: Part-of-speech tagging (HMM tagger). Named entity recognition (CRF). Language models.
- Robotics: Kalman filter for localization. Particle filters for SLAM. Bayesian planning.
- Medical diagnosis: Bayesian networks model disease-symptom relationships.
- Spam filtering: Naive Bayes (simplified Bayesian network).
- Finance: HMMs for regime detection (bull/bear markets). Bayesian inference for risk.
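The Naive Bayes spam filter mentioned above can be sketched in a few lines: class prior times per-word likelihoods, in log space, with Laplace smoothing. The word counts and document counts below are made up for illustration:

```python
import math

# Made-up training counts: occurrences of each word per class.
spam_counts = {"free": 20, "money": 15, "meeting": 2}
ham_counts = {"free": 3, "money": 2, "meeting": 25}
n_spam, n_ham = 40, 60                      # documents per class
vocab = set(spam_counts) | set(ham_counts)

def log_post(words, counts, n_docs, total):
    """log P(class) + Σ log P(word | class), Laplace-smoothed."""
    lp = math.log(n_docs / total)
    denom = sum(counts.values()) + len(vocab)
    for w in words:
        lp += math.log((counts.get(w, 0) + 1) / denom)
    return lp

def classify(words):
    s = log_post(words, spam_counts, n_spam, n_spam + n_ham)
    h = log_post(words, ham_counts, n_ham, n_spam + n_ham)
    return "spam" if s > h else "ham"
```

The "naive" assumption is exactly a Bayesian network in which every word node has the class as its only parent.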