Numerical Differentiation and Integration
Finite Differences
Approximate derivatives using function values at discrete points.
Forward Difference
f'(x) ≈ (f(x+h) - f(x)) / h
Error: O(h). First-order accurate.
Backward Difference
f'(x) ≈ (f(x) - f(x-h)) / h
Error: O(h).
Central Difference
f'(x) ≈ (f(x+h) - f(x-h)) / (2h)
Error: O(h²). Second-order accurate — much better than forward/backward.
Second Derivative
f''(x) ≈ (f(x+h) - 2f(x) + f(x-h)) / h²
Error: O(h²).
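As a minimal sketch, the difference formulas above translate directly into code (the default `h` values are illustrative, not tuned):

```python
import math

def forward_diff(f, x, h=1e-6):
    # O(h): first-order forward difference
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-6):
    # O(h^2): second-order central difference
    return (f(x + h) - f(x - h)) / (2 * h)

def second_diff(f, x, h=1e-4):
    # O(h^2): central approximation to f''
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
```

For f = sin at x = 1, `central_diff` typically lands several orders of magnitude closer to cos(1) than `forward_diff`, matching the stated orders.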
Higher-Order Formulas
Using more points increases accuracy:
f'(x) ≈ (-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)) / (12h)
Error: O(h⁴). Fourth-order central difference.
Optimal Step Size
Two competing errors:
- Truncation error: ∝ h^p (decreases with smaller h)
- Rounding error: ∝ ε/h (increases with smaller h)
Optimal h minimizes total error. For central differences: h_opt ≈ ε^(1/3) ≈ 6×10⁻⁶ for f64, where ε ≈ 2.2×10⁻¹⁶ is machine epsilon.
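The tradeoff is easy to observe empirically. A sketch using exp (whose derivative is itself); the exact minimizer varies with f and x:

```python
import math

f = math.exp          # f'(x) = exp(x), so the true derivative is known
x = 1.0
true = math.exp(x)

errors = {}
for k in range(1, 13):
    h = 10.0 ** -k
    approx = (f(x + h) - f(x - h)) / (2 * h)   # central difference
    errors[h] = abs(approx - true)

# the best h sits between the truncation-dominated (large h)
# and rounding-dominated (tiny h) regimes
best_h = min(errors, key=errors.get)
```

Plotting `errors` against h on log-log axes shows the characteristic V shape, with the minimum near ε^(1/3).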
Automatic Differentiation vs Finite Differences
Finite differences: Simple but approximate. Choosing h is tricky. O(n) evaluations for gradient in ℝⁿ.
Automatic differentiation (AD): Exact (to machine precision). No step size issues. Forward mode: O(n) evaluations for a gradient in ℝⁿ. Reverse mode (backpropagation): the full gradient at a constant multiple of one function evaluation, regardless of n. AD is preferred in modern ML frameworks.
Richardson Extrapolation
Improve accuracy by combining approximations at different step sizes.
If A(h) ≈ A + c₁h^p + c₂h^(p+1) + ..., then:
A_improved = (2^p · A(h/2) - A(h)) / (2^p - 1)
Error becomes O(h^(p+1)) — one order higher.
Example: Central difference has error O(h²), and its error expansion contains only even powers of h. Richardson with h and h/2:
f'_improved = (4·f'(h/2) - f'(h)) / 3
gives O(h⁴) accuracy (two orders at once, since the h³ term is absent).
Can be applied repeatedly to build a Richardson table (Neville-like).
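A sketch of one Richardson step applied to the central difference (p = 2):

```python
import math

def central(f, x, h):
    # O(h^2) central difference
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h, p=2):
    # A(h) ~ A + c1*h^p + ...: combine A(h) and A(h/2)
    # so the leading h^p error term cancels
    a_h, a_h2 = central(f, x, h), central(f, x, h / 2)
    return (2 ** p * a_h2 - a_h) / (2 ** p - 1)
```

With h = 0.01 the extrapolated value is far closer to the true derivative than either central difference alone.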
Numerical Integration (Quadrature)
Newton-Cotes Formulas
Integrate by interpolating with a polynomial on equally spaced points, then integrating the polynomial.
Trapezoidal Rule (linear interpolation):
∫ₐᵇ f(x)dx ≈ (b-a)/2 · (f(a) + f(b))
Error: O(h³f''), where h = b - a.
Composite trapezoidal (n subintervals, h = (b-a)/n):
∫ₐᵇ f(x)dx ≈ h/2 · (f(x₀) + 2f(x₁) + 2f(x₂) + ... + 2f(xₙ₋₁) + f(xₙ))
Error: O(h²).
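A minimal composite trapezoidal implementation; halving h should cut the error by about 4×, confirming O(h²):

```python
import math

def trapezoid(f, a, b, n):
    # composite trapezoidal rule with n subintervals
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)
```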
Simpson's Rule (quadratic interpolation):
∫ₐᵇ f(x)dx ≈ (b-a)/6 · (f(a) + 4f((a+b)/2) + f(b))
Error: O(h⁵f⁽⁴⁾). Exact for polynomials up to degree 3 (one degree higher than expected!).
Composite Simpson (n subintervals, n even):
∫ₐᵇ f(x)dx ≈ h/3 · (f(x₀) + 4f(x₁) + 2f(x₂) + 4f(x₃) + ... + 4f(xₙ₋₁) + f(xₙ))
Error: O(h⁴).
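Composite Simpson as a sketch; note it inherits the cubic-exactness property of the single-interval rule:

```python
import math

def simpson(f, a, b, n):
    # composite Simpson's rule; n must be even
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))   # weight 4
    even = sum(f(a + i * h) for i in range(2, n, 2))  # weight 2
    return h / 3 * (f(a) + f(b) + 4 * odd + 2 * even)
```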
Simpson's 3/8 Rule: Uses cubic interpolation on 4 points.
Gaussian Quadrature
Choose both nodes and weights optimally (not restricted to equal spacing).
∫₋₁¹ f(x)dx ≈ Σᵢ₌₁ⁿ wᵢ f(xᵢ)
Gauss-Legendre: Nodes are roots of Legendre polynomials. Exact for polynomials up to degree 2n-1 using only n points, roughly double the degree of exactness Newton-Cotes achieves with the same number of evaluations.
| n points | Exact for degree | Newton-Cotes needs |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 3 | 3 |
| 3 | 5 | 5 |
| n | 2n-1 | 2n-1 (n+1 points) |
Other Gaussian rules:
- Gauss-Hermite: ∫ f(x)e^(-x²) dx. Used in statistics.
- Gauss-Laguerre: ∫₀^∞ f(x)e^(-x) dx. Used in physics.
- Gauss-Chebyshev: ∫ f(x)/√(1-x²) dx.
For a general interval [a, b]: transform via x = (b+a)/2 + (b-a)/2 · t.
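A sketch using the well-known 3-point Gauss-Legendre rule (nodes ±√(3/5), 0 with weights 5/9, 8/9, 5/9) together with the affine transform above; exact through degree 5:

```python
import math

# 3-point Gauss-Legendre nodes and weights on [-1, 1]
NODES = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
WEIGHTS = [5 / 9, 8 / 9, 5 / 9]

def gauss3(f, a, b):
    # map [-1, 1] -> [a, b] via x = (b+a)/2 + (b-a)/2 * t
    mid, half = (b + a) / 2, (b - a) / 2
    return half * sum(w * f(mid + half * t) for w, t in zip(WEIGHTS, NODES))
```

For arbitrary n, libraries generate the nodes and weights (e.g. numpy.polynomial.legendre.leggauss).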
Adaptive Quadrature
Automatically refine the mesh where the integrand is difficult.
Algorithm:
- Estimate integral on [a, b] with a simple rule (S₁).
- Subdivide: estimate on [a, m] and [m, b] separately (S₂).
- If |S₂ - S₁| < tolerance: accept S₂.
- Else: recursively subdivide each half.
Concentrates function evaluations where the integrand varies rapidly.
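The steps above can be sketched with Simpson's rule as the base estimate; the conventional factor 15 in the acceptance test comes from Simpson's O(h⁴) error expansion:

```python
import math

def simpson_est(f, a, b):
    # single-interval Simpson estimate
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))

def adaptive(f, a, b, tol=1e-8):
    m = (a + b) / 2
    whole = simpson_est(f, a, b)
    halves = simpson_est(f, a, m) + simpson_est(f, m, b)
    if abs(halves - whole) < 15 * tol:  # standard Simpson error heuristic
        return halves
    # refine only where needed, splitting the tolerance between halves
    return adaptive(f, a, m, tol / 2) + adaptive(f, m, b, tol / 2)
```

On ∫₀¹ √x dx the recursion automatically concentrates near x = 0, where the derivative blows up, and still converges.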
Gauss-Kronrod: Extends an n-point Gauss rule with n+1 additional nodes, reusing the original evaluations, so a higher-order estimate and an error estimate come from a single set of function values. Used in QUADPACK (the de facto standard) and GSL.
Romberg Integration
Apply Richardson extrapolation to the composite trapezoidal rule.
Build a table:
T(1,1)
T(2,1) T(2,2)
T(3,1) T(3,2) T(3,3)
...
where T(k,1) is the composite trapezoidal rule with 2^(k-1) subintervals, and T(k,j) = (4^(j-1)·T(k,j-1) - T(k-1,j-1)) / (4^(j-1) - 1).
T(k,2) = Simpson's rule. T(k,3) = Boole's rule. Each column gains two orders of accuracy.
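A sketch of the table construction (0-indexed columns, so `T[k][0]` plays the role of T(k,1)); each trapezoidal row reuses the previous one, adding only the new midpoints:

```python
import math

def romberg(f, a, b, max_k=6):
    # T[k][j]: row k uses 2^k subintervals; column j is the j-th extrapolation
    T = [[0.0] * (max_k + 1) for _ in range(max_k + 1)]
    T[0][0] = (b - a) / 2 * (f(a) + f(b))
    for k in range(1, max_k + 1):
        n = 2 ** k
        h = (b - a) / n
        # halve the step: old points are reused, only midpoints are new
        new = sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
        T[k][0] = T[k - 1][0] / 2 + h * new
        for j in range(1, k + 1):
            # Richardson step: cancel the h^(2j) error term
            T[k][j] = (4 ** j * T[k][j - 1] - T[k - 1][j - 1]) / (4 ** j - 1)
    return T[max_k][max_k]
```

With max_k = 6 the underlying trapezoidal rule uses only 64 subintervals, yet the extrapolated value is accurate to near machine precision for smooth integrands.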
Monte Carlo Integration
Estimate ∫f(x)dx by random sampling:
∫ₐᵇ f(x)dx ≈ (b-a)/N · Σᵢ f(xᵢ) where xᵢ ~ Uniform(a, b)
Error: O(1/√N) regardless of dimension. Crucial for high-dimensional integrals where deterministic methods fail (curse of dimensionality).
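A plain Monte Carlo sketch in 1-D (a fixed seed keeps it reproducible); with N around 10⁵ the O(1/√N) error is typically in the third decimal place:

```python
import math
import random

def mc_integrate(f, a, b, n, seed=0):
    # average f at uniform samples, scaled by the interval length
    rng = random.Random(seed)
    total = sum(f(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n
```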
Importance sampling: Sample from a distribution q(x) concentrated where f is large:
∫ f(x)dx = ∫ (f(x)/q(x)) · q(x)dx ≈ (1/N) Σ f(xᵢ)/q(xᵢ)
Reduces variance when q roughly follows the shape of |f|.
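A sketch: to estimate ∫₀^∞ sin²(x)·e⁻ˣ dx = 2/5 (a standard table integral), sample from q(x) = e⁻ˣ, i.e. an Exponential(1) proposal chosen here for illustration, so the weight f/q reduces to sin²(x):

```python
import math
import random

def importance_estimate(n, seed=0):
    # target f(x) = sin^2(x) * exp(-x), proposal q(x) = exp(-x)
    # weight f(x)/q(x) = sin^2(x), averaged over samples x ~ Exp(1)
    rng = random.Random(seed)
    total = sum(math.sin(rng.expovariate(1.0)) ** 2 for _ in range(n))
    return total / n
```

A naive uniform-sampling estimate would need a truncated domain and waste samples where e⁻ˣ is negligible; the proposal absorbs that factor exactly.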
Quasi-Monte Carlo: Use low-discrepancy sequences (Halton, Sobol) instead of random points. Achieves roughly O((log N)^d / N) error for smooth integrands (vs O(1/√N) for random).
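Halton points are cheap to generate via the radical-inverse function; a sketch in 2-D (bases 2 and 3) estimating the smooth integral ∫₀¹∫₀¹ x·y dx dy = 1/4:

```python
def radical_inverse(i, base):
    # reflect the base-b digits of i about the radix point
    inv, frac = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * frac
        i //= base
        frac /= base
    return inv

def halton_2d(n):
    # 2-D Halton sequence: coprime bases 2 and 3, one per dimension
    return [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, n + 1)]

points = halton_2d(4096)
estimate = sum(x * y for x, y in points) / len(points)
```

The deterministic, evenly spread points converge faster here than pseudorandom sampling with the same budget.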
Multi-Dimensional Integration
Product rules: Tensor product of 1D rules. n^d points for d dimensions — exponential growth!
Sparse grids (Smolyak): Reduce the number of points from n^d to O(n(log n)^(d-1)). Practical for d ≤ 10-20.
Monte Carlo: Error O(1/√N) independent of d. The only practical choice for d ≫ 10.
Applications in CS
- Machine learning: Gradient computation (automatic differentiation preferred over finite differences). Numerical integration in Bayesian inference (MCMC, variational).
- Computer graphics: Integrating lighting equations (Monte Carlo path tracing). Importance sampling for noise reduction.
- Scientific computing: Solving PDEs (finite differences for spatial derivatives). Quadrature for finite element methods.
- Signal processing: Numerical Fourier transform. Filter coefficient computation.
- Finance: Option pricing via Monte Carlo. Greeks (sensitivities) via finite differences.
- Physics simulation: Numerical integration of forces, energy computation.
- Probabilistic programming: Computing expectations, marginal likelihoods, evidence.