Numerical Differentiation and Integration
Finite Differences
Approximate derivatives using function values at discrete points.
Forward Difference
f'(x) ≈ (f(x+h) - f(x)) / h
Error: O(h). First-order accurate.
Backward Difference
f'(x) ≈ (f(x) - f(x-h)) / h
Error: O(h).
Central Difference
f'(x) ≈ (f(x+h) - f(x-h)) / (2h)
Error: O(h²). Second-order accurate — much better than forward/backward.
Second Derivative
f''(x) ≈ (f(x+h) - 2f(x) + f(x-h)) / h²
Error: O(h²).
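As a minimal sketch, the difference formulas above translate directly into code (the default `h` values are illustrative, not tuned):

```python
import math

def forward_diff(f, x, h=1e-6):
    # O(h): first-order forward difference
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-6):
    # O(h^2): second-order central difference
    return (f(x + h) - f(x - h)) / (2 * h)

def second_diff(f, x, h=1e-4):
    # O(h^2): central approximation to f''
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
```

For f = sin at x = 1, `central_diff` typically lands several orders of magnitude closer to cos(1) than `forward_diff`, matching the stated orders.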
Higher-Order Formulas
Using more points increases accuracy:
f'(x) ≈ (-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)) / (12h)
Error: O(h⁴). Fourth-order central difference.
Optimal Step Size
Two competing errors:
- Truncation error: ∝ h^p (decreases with smaller h)
- Rounding error: ∝ ε/h (increases with smaller h)
Optimal h minimizes total error. For central differences: h_opt ≈ ε^(1/3) ≈ 6×10⁻⁶ for f64, where ε ≈ 2.2×10⁻¹⁶ is machine epsilon.
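The tradeoff is easy to observe empirically. A sketch using exp (whose derivative is itself); the exact minimizer varies with f and x:

```python
import math

f = math.exp          # f'(x) = exp(x), so the true derivative is known
x = 1.0
true = math.exp(x)

errors = {}
for k in range(1, 13):
    h = 10.0 ** -k
    approx = (f(x + h) - f(x - h)) / (2 * h)   # central difference
    errors[h] = abs(approx - true)

# the best h sits between the truncation-dominated (large h)
# and rounding-dominated (tiny h) regimes
best_h = min(errors, key=errors.get)
```

Plotting `errors` against h on log-log axes shows the characteristic V shape, with the minimum near ε^(1/3).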
Automatic Differentiation vs Finite Differences
Finite differences: Simple but approximate. Choosing h is tricky. O(n) evaluations for gradient in ℝⁿ.
Automatic differentiation (AD): Exact (to machine precision). No step size issues. Forward mode: O(n) evaluations for a gradient in ℝⁿ. Reverse mode (backpropagation): the full gradient at a constant multiple of one function evaluation, regardless of n. AD is preferred in modern ML frameworks.
Richardson Extrapolation
Improve accuracy by combining approximations at different step sizes.
If A(h) ≈ A + c₁h^p + c₂h^(p+1) + ..., then:
A_improved = (2^p · A(h/2) - A(h)) / (2^p - 1)
Error becomes O(h^(p+1)) — one order higher.
Example: Central difference has error O(h²), and its error expansion contains only even powers of h. Richardson with h and h/2:
f'_improved = (4·f'(h/2) - f'(h)) / 3
gives O(h⁴) accuracy (two orders at once, since the h³ term is absent).
Can be applied repeatedly to build a Richardson table (Neville-like).
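A sketch of one Richardson step applied to the central difference (p = 2):

```python
import math

def central(f, x, h):
    # O(h^2) central difference
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h, p=2):
    # A(h) ~ A + c1*h^p + ...: combine A(h) and A(h/2)
    # so the leading h^p error term cancels
    a_h, a_h2 = central(f, x, h), central(f, x, h / 2)
    return (2 ** p * a_h2 - a_h) / (2 ** p - 1)
```

With h = 0.01 the extrapolated value is far closer to the true derivative than either central difference alone.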
Numerical Integration (Quadrature)
Newton-Cotes Formulas
Integrate by interpolating with a polynomial on equally spaced points, then integrating the polynomial.
Trapezoidal Rule (linear interpolation):
∫ₐᵇ f(x)dx ≈ (b-a)/2 · (f(a) + f(b))
Error: O(h³f''), where h = b - a.
Composite trapezoidal (n subintervals, h = (b-a)/n):
∫ₐᵇ f(x)dx ≈ h/2 · (f(x₀) + 2f(x₁) + 2f(x₂) + ... + 2f(xₙ₋₁) + f(xₙ))
Error: O(h²).
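A minimal composite trapezoidal implementation; halving h should cut the error by about 4×, confirming O(h²):

```python
import math

def trapezoid(f, a, b, n):
    # composite trapezoidal rule with n subintervals
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)
```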
Simpson's Rule (quadratic interpolation):
∫ₐᵇ f(x)dx ≈ (b-a)/6 · (f(a) + 4f((a+b)/2) + f(b))
Error: O(h⁵f⁽⁴⁾). Exact for polynomials up to degree 3 (one degree higher than expected!).
Composite Simpson (n subintervals, n even):
∫ₐᵇ f(x)dx ≈ h/3 · (f(x₀) + 4f(x₁) + 2f(x₂) + 4f(x₃) + ... + 4f(xₙ₋₁) + f(xₙ))
Error: O(h⁴).
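Composite Simpson as a sketch; note it inherits the cubic-exactness property of the single-interval rule:

```python
import math

def simpson(f, a, b, n):
    # composite Simpson's rule; n must be even
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))   # weight 4
    even = sum(f(a + i * h) for i in range(2, n, 2))  # weight 2
    return h / 3 * (f(a) + f(b) + 4 * odd + 2 * even)
```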
Simpson's 3/8 Rule: Uses cubic interpolation on 4 points.
Gaussian Quadrature
Choose both nodes and weights optimally (not restricted to equal spacing).
∫₋₁¹ f(x)dx ≈ Σᵢ₌₁ⁿ wᵢ f(xᵢ)
Gauss-Legendre: Nodes are roots of Legendre polynomials. Exact for polynomials up to degree 2n-1 using only n points, roughly double the degree of exactness Newton-Cotes achieves with the same number of evaluations.
| n points | Exact for degree | Newton-Cotes needs |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 3 | 3 |
| 3 | 5 | 5 |
| n | 2n-1 | 2n-1 (n+1 points) |
Other Gaussian rules:
- Gauss-Hermite: ∫ f(x)e^(-x²) dx. Used in statistics.
- Gauss-Laguerre: ∫₀^∞ f(x)e^(-x) dx. Used in physics.
- Gauss-Chebyshev: ∫ f(x)/√(1-x²) dx.
For a general interval [a, b]: transform via x = (b+a)/2 + (b-a)/2 · t.
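A sketch using the well-known 3-point Gauss-Legendre rule (nodes ±√(3/5), 0 with weights 5/9, 8/9, 5/9) together with the affine transform above; exact through degree 5:

```python
import math

# 3-point Gauss-Legendre nodes and weights on [-1, 1]
NODES = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
WEIGHTS = [5 / 9, 8 / 9, 5 / 9]

def gauss3(f, a, b):
    # map [-1, 1] -> [a, b] via x = (b+a)/2 + (b-a)/2 * t
    mid, half = (b + a) / 2, (b - a) / 2
    return half * sum(w * f(mid + half * t) for w, t in zip(WEIGHTS, NODES))
```

For arbitrary n, libraries generate the nodes and weights (e.g. numpy.polynomial.legendre.leggauss).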
Adaptive Quadrature
Automatically refine the mesh where the integrand is difficult.
Algorithm:
- Estimate integral on [a, b] with a simple rule (S₁).
- Subdivide: estimate on [a, m] and [m, b] separately (S₂).
- If |S₂ - S₁| < tolerance: accept S₂.
- Else: recursively subdivide each half.
Concentrates function evaluations where the integrand varies rapidly.
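The steps above can be sketched with Simpson's rule as the base estimate; the conventional factor 15 in the acceptance test comes from Simpson's O(h⁴) error expansion:

```python
import math

def simpson_est(f, a, b):
    # single-interval Simpson estimate
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))

def adaptive(f, a, b, tol=1e-8):
    m = (a + b) / 2
    whole = simpson_est(f, a, b)
    halves = simpson_est(f, a, m) + simpson_est(f, m, b)
    if abs(halves - whole) < 15 * tol:  # standard Simpson error heuristic
        return halves
    # refine only where needed, splitting the tolerance between halves
    return adaptive(f, a, m, tol / 2) + adaptive(f, m, b, tol / 2)
```

On ∫₀¹ √x dx the recursion automatically concentrates near x = 0, where the derivative blows up, and still converges.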
Gauss-Kronrod: Extends an n-point Gauss rule with n+1 additional nodes, reusing the original evaluations, so a higher-order estimate and an error estimate come from a single set of function values. Used in QUADPACK (the de facto standard) and GSL.
Romberg Integration
Apply Richardson extrapolation to the composite trapezoidal rule.
Build a table:
T(1,1)
T(2,1) T(2,2)
T(3,1) T(3,2) T(3,3)
...
where T(k,1) is the composite trapezoidal rule with 2^(k-1) subintervals, and T(k,j) = (4^(j-1)·T(k,j-1) - T(k-1,j-1)) / (4^(j-1) - 1).
T(k,2) = Simpson's rule. T(k,3) = Boole's rule. Each column gains two orders of accuracy.
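A sketch of the table construction (0-indexed columns, so `T[k][0]` plays the role of T(k,1)); each trapezoidal row reuses the previous one, adding only the new midpoints:

```python
import math

def romberg(f, a, b, max_k=6):
    # T[k][j]: row k uses 2^k subintervals; column j is the j-th extrapolation
    T = [[0.0] * (max_k + 1) for _ in range(max_k + 1)]
    T[0][0] = (b - a) / 2 * (f(a) + f(b))
    for k in range(1, max_k + 1):
        n = 2 ** k
        h = (b - a) / n
        # halve the step: old points are reused, only midpoints are new
        new = sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
        T[k][0] = T[k - 1][0] / 2 + h * new
        for j in range(1, k + 1):
            # Richardson step: cancel the h^(2j) error term
            T[k][j] = (4 ** j * T[k][j - 1] - T[k - 1][j - 1]) / (4 ** j - 1)
    return T[max_k][max_k]
```

With max_k = 6 the underlying trapezoidal rule uses only 64 subintervals, yet the extrapolated value is accurate to near machine precision for smooth integrands.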
Monte Carlo Integration
Estimate ∫f(x)dx by random sampling:
∫ₐᵇ f(x)dx ≈ (b-a)/N · Σᵢ f(xᵢ) where xᵢ ~ Uniform(a, b)
Error: O(1/√N) regardless of dimension. Crucial for high-dimensional integrals where deterministic methods fail (curse of dimensionality).
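A plain Monte Carlo sketch in 1-D (a fixed seed keeps it reproducible); with N around 10⁵ the O(1/√N) error is typically in the third decimal place:

```python
import math
import random

def mc_integrate(f, a, b, n, seed=0):
    # average f at uniform samples, scaled by the interval length
    rng = random.Random(seed)
    total = sum(f(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n
```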
Importance sampling: Sample from a distribution q(x) concentrated where f is large:
∫ f(x)dx = ∫ (f(x)/q(x)) · q(x)dx ≈ (1/N) Σ f(xᵢ)/q(xᵢ)
Reduces variance when q roughly follows the shape of |f|.
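A sketch: to estimate ∫₀^∞ sin²(x)·e⁻ˣ dx = 2/5 (a standard table integral), sample from q(x) = e⁻ˣ, i.e. an Exponential(1) proposal chosen here for illustration, so the weight f/q reduces to sin²(x):

```python
import math
import random

def importance_estimate(n, seed=0):
    # target f(x) = sin^2(x) * exp(-x), proposal q(x) = exp(-x)
    # weight f(x)/q(x) = sin^2(x), averaged over samples x ~ Exp(1)
    rng = random.Random(seed)
    total = sum(math.sin(rng.expovariate(1.0)) ** 2 for _ in range(n))
    return total / n
```

A naive uniform-sampling estimate would need a truncated domain and waste samples where e⁻ˣ is negligible; the proposal absorbs that factor exactly.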
Quasi-Monte Carlo: Use low-discrepancy sequences (Halton, Sobol) instead of random points. Achieves roughly O((log N)^d / N) error for smooth integrands (vs O(1/√N) for random).
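Halton points are cheap to generate via the radical-inverse function; a sketch in 2-D (bases 2 and 3) estimating the smooth integral ∫₀¹∫₀¹ x·y dx dy = 1/4:

```python
def radical_inverse(i, base):
    # reflect the base-b digits of i about the radix point
    inv, frac = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * frac
        i //= base
        frac /= base
    return inv

def halton_2d(n):
    # 2-D Halton sequence: coprime bases 2 and 3, one per dimension
    return [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, n + 1)]

points = halton_2d(4096)
estimate = sum(x * y for x, y in points) / len(points)
```

The deterministic, evenly spread points converge faster here than pseudorandom sampling with the same budget.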
Multi-Dimensional Integration
Product rules: Tensor product of 1D rules. n^d points for d dimensions — exponential growth!
Sparse grids (Smolyak): Reduce the number of points from n^d to O(n(log n)^(d-1)). Practical for d ≤ 10-20.
Monte Carlo: Error O(1/√N) independent of d. The only practical choice for d ≫ 10.
Applications in CS
- Machine learning: Gradient computation (automatic differentiation preferred over finite differences). Numerical integration in Bayesian inference (MCMC, variational).
- Computer graphics: Integrating lighting equations (Monte Carlo path tracing). Importance sampling for noise reduction.
- Scientific computing: Solving PDEs (finite differences for spatial derivatives). Quadrature for finite element methods.
- Signal processing: Numerical Fourier transform. Filter coefficient computation.
- Finance: Option pricing via Monte Carlo. Greeks (sensitivities) via finite differences.
- Physics simulation: Numerical integration of forces, energy computation.
- Probabilistic programming: Computing expectations, marginal likelihoods, evidence.