Monte Carlo Methods
Monte Carlo Integration
Monte Carlo integration estimates integrals by random sampling. For integral I = ∫f(x)dx over domain D:
Î = (Volume of D / N) · Σ f(xᵢ), xᵢ ~ Uniform(D)
The estimator is unbiased with standard error O(1/√N), independent of dimension — a decisive advantage over deterministic quadrature in high dimensions.
// Estimate pi by sampling points in the unit square
PROCEDURE ESTIMATE_PI(n) → real
    inside ← 0
    FOR i ← 1 TO n DO
        x ← RANDOM_UNIFORM(0, 1)
        y ← RANDOM_UNIFORM(0, 1)
        IF x*x + y*y ≤ 1.0 THEN
            inside ← inside + 1
    RETURN 4.0 * inside / n
// Monte Carlo integration of f over [a, b]
PROCEDURE MC_INTEGRATE(f, a, b, n) → (estimate, std_err)
    sum ← 0.0
    sum_sq ← 0.0
    FOR i ← 1 TO n DO
        x ← a + (b - a) * RANDOM_UNIFORM(0, 1)
        val ← f(x)
        sum ← sum + val
        sum_sq ← sum_sq + val * val
    mean ← sum / n
    variance ← sum_sq / n - mean * mean
    estimate ← (b - a) * mean
    std_err ← (b - a) * SQRT(variance / n)
    RETURN (estimate, std_err)
Importance Sampling
Standard Monte Carlo fails when the integrand has sharp peaks that random samples rarely hit. Importance sampling draws from a proposal distribution q(x) that concentrates samples in important regions:
Î = (1/N) · Σ f(xᵢ) · p(xᵢ) / q(xᵢ), xᵢ ~ q
where p is the original density. The optimal q is proportional to |f(x)|·p(x), but finding it exactly solves the original problem. In practice, choose q that roughly matches the shape of |f·p|.
Pitfalls: If q has thinner tails than f·p, the estimator can have infinite variance. Always ensure q's support covers f·p's support.
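As a sketch, importance sampling makes the classic rare-tail estimate P(X > 4), X ~ N(0,1), tractable; the shifted proposal q = N(4, 1) is an illustrative choice, not prescribed above:

```python
import math
import random

def norm_pdf(x, mu=0.0):
    """Normal density with standard deviation 1."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def is_tail_prob(n, seed=0):
    """Estimate p = P(X > 4) for X ~ N(0,1).  Crude MC almost never
    lands in the tail; sampling from the shifted proposal q = N(4,1)
    puts roughly half the draws there, each reweighted by p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(4.0, 1.0)            # x ~ q
        if x > 4.0:                         # f(x) = 1{x > 4}
            total += norm_pdf(x) / norm_pdf(x, mu=4.0)
    return total / n
```

The true value is Φ(−4) ≈ 3.17 × 10⁻⁵; a crude estimate with the same budget would rest on only a handful of tail hits, while the proposal wastes almost no samples.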
Self-Normalized Importance Sampling
When p is known only up to a normalizing constant:
Î = Σ wᵢ f(xᵢ) / Σ wᵢ, where wᵢ = p(xᵢ)/q(xᵢ)
Biased but consistent. The Effective Sample Size ESS = (Σwᵢ)² / Σwᵢ² diagnoses weight degeneracy.
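A minimal sketch, assuming a target known only up to its normalizing constant (p̃(x) = exp(−x²/2), i.e. a standard normal) and an over-dispersed proposal N(0, 2²):

```python
import math
import random

def snis_second_moment(n, seed=0):
    """Self-normalized IS estimate of E_p[X²] for p(x) ∝ exp(-x²/2),
    with proposal q = N(0, 2²).  Constant factors cancel in both the
    estimate and the ESS, so the weights keep only the x-dependent part."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    # w_i ∝ p̃(x_i)/q(x_i) = exp(-x²/2) / exp(-x²/8) = exp(-3x²/8)
    ws = [math.exp(-0.375 * x * x) for x in xs]
    est = sum(w * x * x for w, x in zip(ws, xs)) / sum(ws)
    ess = sum(ws) ** 2 / sum(w * w for w in ws)
    return est, ess
```

Here E_p[X²] = 1; an ESS close to n signals healthy weights, while an ESS far below n warns of degeneracy.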
Stratified Sampling
Partition the domain into K strata S₁, ..., Sₖ. Sample nₖ points within each stratum. The variance reduction comes from eliminating between-stratum variance.
Domain [0, 1] with K = 4 strata:
S₁ = [0, 0.25) n₁ samples
S₂ = [0.25, 0.5) n₂ samples
S₃ = [0.5, 0.75) n₃ samples
S₄ = [0.75, 1.0] n₄ samples
Optimal allocation (Neyman): nₖ ∝ pₖ·σₖ, where pₖ is the stratum's volume fraction and σₖ its within-stratum standard deviation — strata with higher variance receive more samples.
With proportional allocation (nₖ ∝ pₖ), variance reduction is guaranteed: Var(stratified) ≤ Var(crude MC).
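A sketch of the [0, 1] example above, with equal-width strata and equal per-stratum allocation (proportional, since the strata have equal volume):

```python
import random

def stratified_mc(f, n_per_stratum, k=4, seed=0):
    """Stratified estimate of the integral of f over [0, 1]: k
    equal-width strata, each contributing its volume times its
    sample mean."""
    rng = random.Random(seed)
    width = 1.0 / k
    total = 0.0
    for j in range(k):
        lo = j * width
        s = sum(f(lo + width * rng.random()) for _ in range(n_per_stratum))
        total += width * s / n_per_stratum
    return total
```

For f(x) = x² the exact value is 1/3; the stratified estimate's error is driven only by the small within-stratum variances.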
Variance Reduction Techniques
Antithetic Variates
For each sample using uniform draws U₁, ..., Uₖ, create a paired sample using 1-U₁, ..., 1-Uₖ. The estimator averages both:
Î = (1/2N) · Σ [f(Uᵢ) + f(1-Uᵢ)]
Works when f is monotone (negative correlation between pairs reduces variance).
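A sketch for the one-dimensional integral over [0, 1]; the monotone test integrand e^x used below is an assumption for illustration:

```python
import math
import random

def antithetic_mc(f, n_pairs, seed=0):
    """Antithetic estimate of the integral of f over [0, 1]: every
    uniform draw u is paired with its reflection 1 - u and the two
    evaluations averaged, so negatively correlated errors cancel."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += 0.5 * (f(u) + f(1.0 - u))
    return total / n_pairs
```

Because e^x is monotone, f(U) and f(1−U) are negatively correlated; the exact integral is e − 1.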
Control Variates
If g(X) has a known expectation E[g(X)] = μ_g and is correlated with f(X):
Î_cv = (1/N) Σ f(Xᵢ) - c · [(1/N) Σ g(Xᵢ) - μ_g]
Optimal coefficient: c* = Cov(f,g) / Var(g). With c*, the variance is multiplied by 1 − ρ²(f,g), so a strongly correlated control removes almost all of it.
// Control variate MC integration of f on [0,1]
// using g(x) = x as control (E[g] = 0.5)
PROCEDURE MC_CONTROL_VARIATE(f, n) → real
    f_vals ← empty list
    g_vals ← empty list
    FOR i ← 1 TO n DO
        x ← RANDOM_UNIFORM(0, 1)
        APPEND f(x) TO f_vals
        APPEND x TO g_vals
    f_mean ← SUM(f_vals) / n
    g_mean ← SUM(g_vals) / n
    // Estimate optimal c from the same samples
    cov ← 0.0
    var_g ← 0.0
    FOR i ← 1 TO n DO
        df ← f_vals[i] - f_mean
        dg ← g_vals[i] - g_mean
        cov ← cov + df * dg
        var_g ← var_g + dg * dg
    c ← cov / var_g
    RETURN f_mean - c * (g_mean - 0.5)
Quasi-Monte Carlo Methods
Replace pseudorandom sequences with low-discrepancy sequences that fill space more uniformly. Convergence improves to O(1/N) or O((log N)^d / N) versus O(1/√N) for standard MC.
Halton Sequence
Generates points by radical inversion in different prime bases for each dimension.
PROCEDURE HALTON(index, base) → real
    result ← 0.0
    f ← 1.0 / base
    i ← index
    WHILE i > 0 DO
        result ← result + (i MOD base) * f
        i ← i DIV base    // integer division
        f ← f / base
    RETURN result

// 2D Halton sequence point
PROCEDURE HALTON_2D(index) → (real, real)
    RETURN (HALTON(index, 2), HALTON(index, 3))
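A direct Python port of the radical-inverse procedure, checked against the known opening values of the base-2 sequence (1/2, 1/4, 3/4, ...) and base-3 sequence (1/3, 2/3, 1/9, ...):

```python
def halton(index, base):
    """Radical inverse of `index` in the given base: reflect the
    base-b digits of the index about the radix point."""
    result, f, i = 0.0, 1.0 / base, index
    while i > 0:
        result += (i % base) * f
        i //= base
        f /= base
    return result
```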
Sobol Sequence
Direction-number-based construction producing better uniformity than Halton in high dimensions. Uses Gray code ordering for incremental generation.
Randomized QMC
Randomly shift or scramble the low-discrepancy sequence to enable error estimation (standard QMC gives a point estimate with no error bound). Common: Cranley-Patterson rotation — add a random uniform shift modulo 1.
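A sketch of randomized QMC via Cranley-Patterson rotation over the 2D Halton points; the test integrand and replication count below are illustrative choices:

```python
import random

def halton(index, base):
    """Radical inverse of `index` in the given base."""
    result, f, i = 0.0, 1.0 / base, index
    while i > 0:
        result += (i % base) * f
        i //= base
        f /= base
    return result

def cp_rotated_halton_2d(n, shift):
    """Cranley-Patterson rotation: add one fixed uniform shift per
    dimension to every low-discrepancy point, modulo 1."""
    return [((halton(i, 2) + shift[0]) % 1.0,
             (halton(i, 3) + shift[1]) % 1.0) for i in range(1, n + 1)]

def rqmc_estimate(f, n, replications, seed=0):
    """Average f over independently shifted copies of the point set;
    the spread across replications yields an error estimate that
    plain QMC cannot provide."""
    rng = random.Random(seed)
    ests = []
    for _ in range(replications):
        shift = (rng.random(), rng.random())
        pts = cp_rotated_halton_2d(n, shift)
        ests.append(sum(f(x, y) for x, y in pts) / n)
    return sum(ests) / replications, ests
```

Each shifted replication is an unbiased estimate, so the sample variance of `ests` gives a legitimate confidence interval.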
Sequential Monte Carlo (Particle Filters)
Track a probability distribution over time as new observations arrive.
Algorithm (Bootstrap Particle Filter)
1. Initialize: draw particles x₁⁽¹⁾, ..., x₁⁽ᴺ⁾ from prior
2. For each time step t:
   a. Propagate: x̃ₜ⁽ⁱ⁾ ~ p(xₜ | xₜ₋₁⁽ⁱ⁾)
   b. Weight: wₜ⁽ⁱ⁾ = p(yₜ | x̃ₜ⁽ⁱ⁾)
   c. Normalize: Wₜ⁽ⁱ⁾ = wₜ⁽ⁱ⁾ / Σⱼ wₜ⁽ʲ⁾
   d. Resample: draw N particles from {x̃ₜ⁽ⁱ⁾} with probabilities {Wₜ⁽ⁱ⁾}
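The four steps can be sketched for a toy linear-Gaussian model; the transition x_t = 0.9·x_{t−1} + N(0, 1) and observation y_t = x_t + N(0, 0.5²) are assumptions for illustration:

```python
import math
import random

def bootstrap_filter(ys, n_particles=1000, seed=0):
    """Bootstrap particle filter for the toy model
    x_t = 0.9 x_{t-1} + N(0,1), y_t = x_t + N(0, 0.5²).
    Returns the filtered posterior mean after each observation."""
    rng = random.Random(seed)
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # a. propagate each particle through the transition density
        parts = [0.9 * x + rng.gauss(0.0, 1.0) for x in parts]
        # b. weight by the Gaussian observation likelihood (sigma = 0.5)
        ws = [math.exp(-((y - x) ** 2) / (2 * 0.5 ** 2)) for x in parts]
        # c. normalize the weights
        total = sum(ws)
        ws = [w / total for w in ws]
        means.append(sum(w * x for w, x in zip(ws, parts)))
        # d. multinomial resampling
        parts = rng.choices(parts, weights=ws, k=n_particles)
    return means
```

This resamples at every step; an adaptive variant would skip step d whenever the ESS of the weights stays above a threshold.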
Degeneracy problem: After resampling, many particles are copies. Mitigations:
- Resample only when ESS drops below threshold (adaptive resampling)
- Use better proposal distributions (optimal, locally linearized)
- Regularization (add small noise after resampling)
Applications: robot localization, target tracking, financial volatility estimation.
Rare Event Simulation
Estimating probabilities of events with p < 10⁻⁶ requires specialized techniques: standard MC needs on the order of 1/p samples just to observe the event once, and roughly 100/p for 10% relative error.
Splitting
When a rare event requires crossing multiple barriers:
- Define intermediate thresholds L₁ < L₂ < ... < Lₖ = L
- Run N₀ simulations; those crossing L₁ are "split" into copies
- Continue each copy; split again at L₂
- P(rare event) ≈ ∏ P(cross Lᵢ | crossed Lᵢ₋₁)
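A fixed-effort splitting sketch for a Gaussian random walk (the walk, levels, and sample budgets are illustrative assumptions), with a crude MC baseline for comparison:

```python
import random

def splitting_estimate(levels, T=10, n=2000, seed=0):
    """Fixed-effort multilevel splitting for P(max_t S_t >= levels[-1]),
    where S_t is a T-step Gaussian random walk.  Trajectories that
    reach a level are split: n continuations are resampled from the
    entrance states of the successful ones."""
    rng = random.Random(seed)
    states = [(0.0, 0)] * n        # (position, steps used) at stage start
    prob = 1.0
    for level in levels:
        hits = []
        for pos, t in states:
            while t < T and pos < level:
                pos += rng.gauss(0.0, 1.0)
                t += 1
            if pos >= level:
                hits.append((pos, t))
        prob *= len(hits) / len(states)   # estimate of P(cross L_i | crossed L_{i-1})
        if not hits:
            return 0.0
        states = [hits[rng.randrange(len(hits))] for _ in range(n)]
    return prob

def crude_estimate(level, T=10, n=50000, seed=1):
    """Crude MC baseline for the same barrier-crossing event."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        pos = 0.0
        for _ in range(T):
            pos += rng.gauss(0.0, 1.0)
            if pos >= level:
                count += 1
                break
    return count / n
```

The event here is only moderately rare so that the two estimates can be compared directly; the payoff of splitting grows as the target level (and hence 1/p) increases.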
Cross-Entropy Method
Iteratively adapt the sampling distribution to concentrate on rare event paths:
- Start with original distribution
- Sample and identify top-performing (rarest) paths
- Fit new distribution parameters by minimizing cross-entropy to elite samples
- Repeat until convergence
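The loop above can be sketched for the textbook case p = P(X > γ), X ~ N(0,1), tilting within the family N(v, 1); the family, γ = 4, and sample sizes are illustrative assumptions:

```python
import math
import random

def ce_tail_prob(gamma=4.0, n=2000, rho=0.1, seed=0):
    """Cross-entropy sketch for p = P(X > gamma), X ~ N(0,1).  The mean
    v of the N(v,1) proposal is adapted until the elite threshold
    reaches gamma; one final importance-sampling pass gives the estimate."""
    rng = random.Random(seed)
    v = 0.0
    for _ in range(50):                     # safety cap on iterations
        xs = sorted(rng.gauss(v, 1.0) for _ in range(n))
        gamma_t = min(gamma, xs[int((1 - rho) * n)])   # elite threshold
        elites = [x for x in xs if x >= gamma_t]
        # likelihood ratios w = p(x)/q(x; v); the CE update for a
        # normal mean is the weighted elite average
        ws = [math.exp(-0.5 * x * x + 0.5 * (x - v) ** 2) for x in elites]
        v = sum(w * x for w, x in zip(ws, elites)) / sum(ws)
        if gamma_t >= gamma:
            break
    total = 0.0                             # final IS pass with proposal N(v, 1)
    for _ in range(5 * n):
        x = rng.gauss(v, 1.0)
        if x > gamma:
            total += math.exp(-0.5 * x * x + 0.5 * (x - v) ** 2)
    return total / (5 * n)
```

For γ = 4 the true value is Φ(−4) ≈ 3.17 × 10⁻⁵; the adapted mean v typically settles just above γ, which is exactly where a good proposal should concentrate.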
Applications: network reliability, financial risk (VaR estimation), safety-critical systems.
Convergence and Error Analysis
| Method              | Convergence Rate            | Error Estimable?        |
|---------------------|-----------------------------|-------------------------|
| Crude MC            | O(N⁻¹/²)                    | Yes (CLT)               |
| Importance sampling | O(N⁻¹/²) but lower constant | Yes                     |
| Stratified MC       | O(N⁻¹/²) but lower constant | Yes                     |
| Quasi-MC            | O(N⁻¹ (log N)^d)            | Only with randomization |
| Antithetic          | O(N⁻¹/²) but lower constant | Yes                     |
The dimension-independent convergence of MC methods makes them the method of choice for integrals in d > ~5 dimensions, where grid-based quadrature suffers the curse of dimensionality.