Monte Carlo Methods
Monte Carlo Integration
Monte Carlo integration estimates integrals by random sampling. For integral I = ∫f(x)dx over domain D:
Î = (Volume of D / N) · Σ f(xᵢ), xᵢ ~ Uniform(D)
The estimator is unbiased with standard error O(1/√N), independent of dimension — a decisive advantage over deterministic quadrature in high dimensions.
// Estimate pi by sampling points in the unit square
PROCEDURE ESTIMATE_PI(n) → real
    inside ← 0
    FOR i ← 1 TO n DO
        x ← RANDOM_UNIFORM(0, 1)
        y ← RANDOM_UNIFORM(0, 1)
        IF x*x + y*y ≤ 1.0 THEN
            inside ← inside + 1
    RETURN 4.0 * inside / n
// Monte Carlo integration of f over [a, b]
PROCEDURE MC_INTEGRATE(f, a, b, n) → (estimate, std_err)
    sum ← 0.0
    sum_sq ← 0.0
    FOR i ← 1 TO n DO
        x ← a + (b - a) * RANDOM_UNIFORM(0, 1)
        val ← f(x)
        sum ← sum + val
        sum_sq ← sum_sq + val * val
    mean ← sum / n
    variance ← sum_sq / n - mean * mean
    estimate ← (b - a) * mean
    std_err ← (b - a) * SQRT(variance / n)
    RETURN (estimate, std_err)
Importance Sampling
Standard Monte Carlo fails when the integrand has sharp peaks that random samples rarely hit. Importance sampling draws from a proposal distribution q(x) that concentrates samples in important regions:
Î = (1/N) · Σ f(xᵢ) · p(xᵢ) / q(xᵢ), xᵢ ~ q
where p is the original density. The optimal q is proportional to |f(x)|·p(x), but finding it exactly solves the original problem. In practice, choose q that roughly matches the shape of |f·p|.
Pitfalls: If q has thinner tails than f·p, the estimator can have infinite variance. Always ensure q's support covers f·p's support.
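As a sketch, importance sampling makes the classic rare-tail estimate P(X > 4), X ~ N(0,1), tractable; the shifted proposal q = N(4, 1) is an illustrative choice, not prescribed above:

```python
import math
import random

def norm_pdf(x, mu=0.0):
    """Normal density with standard deviation 1."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def is_tail_prob(n, seed=0):
    """Estimate p = P(X > 4) for X ~ N(0,1).  Crude MC almost never
    lands in the tail; sampling from the shifted proposal q = N(4,1)
    puts roughly half the draws there, each reweighted by p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(4.0, 1.0)            # x ~ q
        if x > 4.0:                         # f(x) = 1{x > 4}
            total += norm_pdf(x) / norm_pdf(x, mu=4.0)
    return total / n
```

The true value is Φ(−4) ≈ 3.17 × 10⁻⁵; a crude estimate with the same budget would rest on only a handful of tail hits, while the proposal wastes almost no samples.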
Self-Normalized Importance Sampling
When p is known only up to a normalizing constant:
Î = Σ wᵢ f(xᵢ) / Σ wᵢ, where wᵢ = p(xᵢ)/q(xᵢ)
Biased but consistent. The Effective Sample Size ESS = (Σwᵢ)² / Σwᵢ² diagnoses weight degeneracy.
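A minimal sketch, assuming a target known only up to its normalizing constant (p̃(x) = exp(−x²/2), i.e. a standard normal) and an over-dispersed proposal N(0, 2²):

```python
import math
import random

def snis_second_moment(n, seed=0):
    """Self-normalized IS estimate of E_p[X²] for p(x) ∝ exp(-x²/2),
    with proposal q = N(0, 2²).  Constant factors cancel in both the
    estimate and the ESS, so the weights keep only the x-dependent part."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    # w_i ∝ p̃(x_i)/q(x_i) = exp(-x²/2) / exp(-x²/8) = exp(-3x²/8)
    ws = [math.exp(-0.375 * x * x) for x in xs]
    est = sum(w * x * x for w, x in zip(ws, xs)) / sum(ws)
    ess = sum(ws) ** 2 / sum(w * w for w in ws)
    return est, ess
```

Here E_p[X²] = 1; an ESS close to n signals healthy weights, while an ESS far below n warns of degeneracy.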
Stratified Sampling
Partition the domain into K strata S₁, ..., Sₖ. Sample nₖ points within each stratum. The variance reduction comes from eliminating between-stratum variance.
Domain [0, 1] with K = 4 strata:
S₁ = [0, 0.25) n₁ samples
S₂ = [0.25, 0.5) n₂ samples
S₃ = [0.5, 0.75) n₃ samples
S₄ = [0.75, 1.0] n₄ samples
Optimal allocation (Neyman): nₖ ∝ pₖ·σₖ, where pₖ is the stratum's volume fraction and σₖ its within-stratum standard deviation — strata with higher variance receive more samples.
With proportional allocation (nₖ ∝ pₖ), variance reduction is guaranteed: Var(stratified) ≤ Var(crude MC).
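A sketch of the [0, 1] example above, with equal-width strata and equal per-stratum allocation (proportional, since the strata have equal volume):

```python
import random

def stratified_mc(f, n_per_stratum, k=4, seed=0):
    """Stratified estimate of the integral of f over [0, 1]: k
    equal-width strata, each contributing its volume times its
    sample mean."""
    rng = random.Random(seed)
    width = 1.0 / k
    total = 0.0
    for j in range(k):
        lo = j * width
        s = sum(f(lo + width * rng.random()) for _ in range(n_per_stratum))
        total += width * s / n_per_stratum
    return total
```

For f(x) = x² the exact value is 1/3; the stratified estimate's error is driven only by the small within-stratum variances.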
Variance Reduction Techniques
Antithetic Variates
For each sample using uniform draws U₁, ..., Uₖ, create a paired sample using 1-U₁, ..., 1-Uₖ. The estimator averages both:
Î = (1/2N) · Σ [f(Uᵢ) + f(1-Uᵢ)]
Works when f is monotone (negative correlation between pairs reduces variance).
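A sketch for the one-dimensional integral over [0, 1]; the monotone test integrand e^x used below is an assumption for illustration:

```python
import math
import random

def antithetic_mc(f, n_pairs, seed=0):
    """Antithetic estimate of the integral of f over [0, 1]: every
    uniform draw u is paired with its reflection 1 - u and the two
    evaluations averaged, so negatively correlated errors cancel."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += 0.5 * (f(u) + f(1.0 - u))
    return total / n_pairs
```

Because e^x is monotone, f(U) and f(1−U) are negatively correlated; the exact integral is e − 1.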
Control Variates
If g(X) has a known expectation E[g(X)] = μ_g and is correlated with f(X):
Î_cv = (1/N) Σ f(Xᵢ) - c · [(1/N) Σ g(Xᵢ) - μ_g]
Optimal coefficient: c* = Cov(f,g) / Var(g). With c*, the variance is multiplied by 1 − ρ²(f,g), so a strongly correlated control removes almost all of it.
// Control variate MC integration of f on [0,1]
// using g(x) = x as control (E[g] = 0.5)
PROCEDURE MC_CONTROL_VARIATE(f, n) → real
    f_vals ← empty list
    g_vals ← empty list
    FOR i ← 1 TO n DO
        x ← RANDOM_UNIFORM(0, 1)
        APPEND f(x) TO f_vals
        APPEND x TO g_vals
    f_mean ← SUM(f_vals) / n
    g_mean ← SUM(g_vals) / n
    // Estimate optimal c from the same samples
    cov ← 0.0
    var_g ← 0.0
    FOR i ← 1 TO n DO
        df ← f_vals[i] - f_mean
        dg ← g_vals[i] - g_mean
        cov ← cov + df * dg
        var_g ← var_g + dg * dg
    c ← cov / var_g
    RETURN f_mean - c * (g_mean - 0.5)
Quasi-Monte Carlo Methods
Replace pseudorandom sequences with low-discrepancy sequences that fill space more uniformly. Convergence improves to O(1/N) or O((log N)^d / N) versus O(1/√N) for standard MC.
Halton Sequence
Generates points by radical inversion in different prime bases for each dimension.
PROCEDURE HALTON(index, base) → real
    result ← 0.0
    f ← 1.0 / base
    i ← index
    WHILE i > 0 DO
        result ← result + (i MOD base) * f
        i ← i DIV base    // integer division
        f ← f / base
    RETURN result

// 2D Halton sequence point
PROCEDURE HALTON_2D(index) → (real, real)
    RETURN (HALTON(index, 2), HALTON(index, 3))
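A direct Python port of the radical-inverse procedure, checked against the known opening values of the base-2 sequence (1/2, 1/4, 3/4, ...) and base-3 sequence (1/3, 2/3, 1/9, ...):

```python
def halton(index, base):
    """Radical inverse of `index` in the given base: reflect the
    base-b digits of the index about the radix point."""
    result, f, i = 0.0, 1.0 / base, index
    while i > 0:
        result += (i % base) * f
        i //= base
        f /= base
    return result
```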
Sobol Sequence
Direction-number-based construction producing better uniformity than Halton in high dimensions. Uses Gray code ordering for incremental generation.
Randomized QMC
Randomly shift or scramble the low-discrepancy sequence to enable error estimation (standard QMC gives a point estimate with no error bound). Common: Cranley-Patterson rotation — add a random uniform shift modulo 1.
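A sketch of randomized QMC via Cranley-Patterson rotation over the 2D Halton points; the test integrand and replication count below are illustrative choices:

```python
import random

def halton(index, base):
    """Radical inverse of `index` in the given base."""
    result, f, i = 0.0, 1.0 / base, index
    while i > 0:
        result += (i % base) * f
        i //= base
        f /= base
    return result

def cp_rotated_halton_2d(n, shift):
    """Cranley-Patterson rotation: add one fixed uniform shift per
    dimension to every low-discrepancy point, modulo 1."""
    return [((halton(i, 2) + shift[0]) % 1.0,
             (halton(i, 3) + shift[1]) % 1.0) for i in range(1, n + 1)]

def rqmc_estimate(f, n, replications, seed=0):
    """Average f over independently shifted copies of the point set;
    the spread across replications yields an error estimate that
    plain QMC cannot provide."""
    rng = random.Random(seed)
    ests = []
    for _ in range(replications):
        shift = (rng.random(), rng.random())
        pts = cp_rotated_halton_2d(n, shift)
        ests.append(sum(f(x, y) for x, y in pts) / n)
    return sum(ests) / replications, ests
```

Each shifted replication is an unbiased estimate, so the sample variance of `ests` gives a legitimate confidence interval.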
Sequential Monte Carlo (Particle Filters)
Track a probability distribution over time as new observations arrive.
Algorithm (Bootstrap Particle Filter)
1. Initialize: draw particles x₁⁽¹⁾, ..., x₁⁽ᴺ⁾ from prior
2. For each time step t:
   a. Propagate: x̃ₜ⁽ⁱ⁾ ~ p(xₜ | xₜ₋₁⁽ⁱ⁾)
   b. Weight: wₜ⁽ⁱ⁾ = p(yₜ | x̃ₜ⁽ⁱ⁾)
   c. Normalize: Wₜ⁽ⁱ⁾ = wₜ⁽ⁱ⁾ / Σⱼ wₜ⁽ʲ⁾
   d. Resample: draw N particles from {x̃ₜ⁽ⁱ⁾} with probabilities {Wₜ⁽ⁱ⁾}
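The four steps can be sketched for a toy linear-Gaussian model; the transition x_t = 0.9·x_{t−1} + N(0, 1) and observation y_t = x_t + N(0, 0.5²) are assumptions for illustration:

```python
import math
import random

def bootstrap_filter(ys, n_particles=1000, seed=0):
    """Bootstrap particle filter for the toy model
    x_t = 0.9 x_{t-1} + N(0,1), y_t = x_t + N(0, 0.5²).
    Returns the filtered posterior mean after each observation."""
    rng = random.Random(seed)
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # a. propagate each particle through the transition density
        parts = [0.9 * x + rng.gauss(0.0, 1.0) for x in parts]
        # b. weight by the Gaussian observation likelihood (sigma = 0.5)
        ws = [math.exp(-((y - x) ** 2) / (2 * 0.5 ** 2)) for x in parts]
        # c. normalize the weights
        total = sum(ws)
        ws = [w / total for w in ws]
        means.append(sum(w * x for w, x in zip(ws, parts)))
        # d. multinomial resampling
        parts = rng.choices(parts, weights=ws, k=n_particles)
    return means
```

This resamples at every step; an adaptive variant would skip step d whenever the ESS of the weights stays above a threshold.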
Degeneracy problem: After resampling, many particles are copies. Mitigations:
- Resample only when ESS drops below threshold (adaptive resampling)
- Use better proposal distributions (optimal, locally linearized)
- Regularization (add small noise after resampling)
Applications: robot localization, target tracking, financial volatility estimation.
Rare Event Simulation
Estimating probabilities of events with p < 10⁻⁶ requires specialized techniques: standard MC needs on the order of 1/p samples just to observe the event once, and roughly 100/p for 10% relative error.
Splitting
When a rare event requires crossing multiple barriers:
- Define intermediate thresholds L₁ < L₂ < ... < Lₖ = L
- Run N₀ simulations; those crossing L₁ are "split" into copies
- Continue each copy; split again at L₂
- P(rare event) ≈ ∏ P(cross Lᵢ | crossed Lᵢ₋₁)
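A fixed-effort splitting sketch for a Gaussian random walk (the walk, levels, and sample budgets are illustrative assumptions), with a crude MC baseline for comparison:

```python
import random

def splitting_estimate(levels, T=10, n=2000, seed=0):
    """Fixed-effort multilevel splitting for P(max_t S_t >= levels[-1]),
    where S_t is a T-step Gaussian random walk.  Trajectories that
    reach a level are split: n continuations are resampled from the
    entrance states of the successful ones."""
    rng = random.Random(seed)
    states = [(0.0, 0)] * n        # (position, steps used) at stage start
    prob = 1.0
    for level in levels:
        hits = []
        for pos, t in states:
            while t < T and pos < level:
                pos += rng.gauss(0.0, 1.0)
                t += 1
            if pos >= level:
                hits.append((pos, t))
        prob *= len(hits) / len(states)   # estimate of P(cross L_i | crossed L_{i-1})
        if not hits:
            return 0.0
        states = [hits[rng.randrange(len(hits))] for _ in range(n)]
    return prob

def crude_estimate(level, T=10, n=50000, seed=1):
    """Crude MC baseline for the same barrier-crossing event."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        pos = 0.0
        for _ in range(T):
            pos += rng.gauss(0.0, 1.0)
            if pos >= level:
                count += 1
                break
    return count / n
```

The event here is only moderately rare so that the two estimates can be compared directly; the payoff of splitting grows as the target level (and hence 1/p) increases.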
Cross-Entropy Method
Iteratively adapt the sampling distribution to concentrate on rare event paths:
- Start with original distribution
- Sample and identify top-performing (rarest) paths
- Fit new distribution parameters by minimizing cross-entropy to elite samples
- Repeat until convergence
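The loop above can be sketched for the textbook case p = P(X > γ), X ~ N(0,1), tilting within the family N(v, 1); the family, γ = 4, and sample sizes are illustrative assumptions:

```python
import math
import random

def ce_tail_prob(gamma=4.0, n=2000, rho=0.1, seed=0):
    """Cross-entropy sketch for p = P(X > gamma), X ~ N(0,1).  The mean
    v of the N(v,1) proposal is adapted until the elite threshold
    reaches gamma; one final importance-sampling pass gives the estimate."""
    rng = random.Random(seed)
    v = 0.0
    for _ in range(50):                     # safety cap on iterations
        xs = sorted(rng.gauss(v, 1.0) for _ in range(n))
        gamma_t = min(gamma, xs[int((1 - rho) * n)])   # elite threshold
        elites = [x for x in xs if x >= gamma_t]
        # likelihood ratios w = p(x)/q(x; v); the CE update for a
        # normal mean is the weighted elite average
        ws = [math.exp(-0.5 * x * x + 0.5 * (x - v) ** 2) for x in elites]
        v = sum(w * x for w, x in zip(ws, elites)) / sum(ws)
        if gamma_t >= gamma:
            break
    total = 0.0                             # final IS pass with proposal N(v, 1)
    for _ in range(5 * n):
        x = rng.gauss(v, 1.0)
        if x > gamma:
            total += math.exp(-0.5 * x * x + 0.5 * (x - v) ** 2)
    return total / (5 * n)
```

For γ = 4 the true value is Φ(−4) ≈ 3.17 × 10⁻⁵; the adapted mean v typically settles just above γ, which is exactly where a good proposal should concentrate.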
Applications: network reliability, financial risk (VaR estimation), safety-critical systems.
Convergence and Error Analysis
| Method              | Convergence Rate            | Error Estimable?        |
|---------------------|-----------------------------|-------------------------|
| Crude MC            | O(N⁻¹/²)                    | Yes (CLT)               |
| Importance sampling | O(N⁻¹/²) but lower constant | Yes                     |
| Stratified MC       | O(N⁻¹/²) but lower constant | Yes                     |
| Quasi-MC            | O(N⁻¹ (log N)^d)            | Only with randomization |
| Antithetic          | O(N⁻¹/²) but lower constant | Yes                     |
The dimension-independent convergence of MC methods makes them the method of choice for integrals in d > ~5 dimensions, where grid-based quadrature suffers the curse of dimensionality.