Numerical Linear Algebra

Numerical linear algebra solves linear systems, eigenvalue problems, and least squares — the computational core of scientific computing, ML, and data analysis.

Direct Methods for Ax = b

LU with Pivoting

LU factorization itself is covered in the linear algebra topic; the key numerical points:

Partial pivoting: At each step, swap the row with the largest absolute value in the current column to the pivot position. Prevents division by small numbers.

Growth factor: With partial pivoting, the growth factor g ≤ 2^(n-1) in theory, but in practice g = O(1). Without pivoting, g can be enormous → catastrophic instability.

Complete pivoting: Swap both rows and columns. Growth factor ≤ n^(1/2) · Πₖ₌₂ⁿ k^(1/(2(k-1))). Rarely needed in practice.

Cost: ≈ 2n³/3 flops.
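A minimal sketch of an LU solve with partial pivoting, using SciPy's factor/solve pair (the matrix and right-hand side are illustrative; the tiny pivot in the first row is exactly the case where pivoting matters):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# A tiny leading entry would wreck unpivoted LU; lu_factor pivots rows.
A = np.array([[1e-10, 1.0, 0.0],
              [1.0,   2.0, 1.0],
              [0.0,   1.0, 3.0]])
b = np.array([1.0, 4.0, 4.0])

lu, piv = lu_factor(A)       # PA = LU; piv records the row swaps
x = lu_solve((lu, piv), b)   # forward + back substitution

assert np.allclose(A @ x, b)
```

Factoring once and reusing `(lu, piv)` amortizes the O(n³) cost over many right-hand sides, each solved in O(n²).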

Cholesky for SPD Systems

For symmetric positive definite A:

A = LLᵀ

Cost: ≈ n³/3 flops — half of LU. No pivoting needed (stable for any SPD matrix).

When to use: Covariance matrices, kernel matrices, normal equations (AᵀA), any SPD system.

QR for Least Squares

A = QR (Householder). Solve Rx = Qᵀb by back-substitution.

More stable than normal equations (AᵀA can square the condition number: κ(AᵀA) = κ(A)²).

Cost: ≈ 2mn² − 2n³/3 flops for an m × n matrix (m ≥ n).
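The QR route to least squares, sketched with NumPy's reduced QR on a random overdetermined system (dimensions are arbitrary for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))   # m = 20, n = 3, full column rank
b = rng.standard_normal(20)

Q, R = np.linalg.qr(A)             # reduced QR: Q is 20x3, R is 3x3
x = np.linalg.solve(R, Q.T @ b)    # solve Rx = Q^T b

# Agrees with the library least-squares solver
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```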

Iterative Methods for Ax = b

For large sparse systems, direct methods are too expensive (O(n³)). Iterative methods exploit sparsity.

Jacobi Method

Split A = D + (L + U), where D is the diagonal and L, U are the strictly lower and upper triangular parts:

x^(k+1) = D⁻¹(b - (L + U)x^(k))

Component-wise: xᵢ^(k+1) = (bᵢ - Σⱼ≠ᵢ aᵢⱼxⱼ^(k)) / aᵢᵢ.

Convergence: Converges if A is strictly diagonally dominant (|aᵢᵢ| > Σⱼ≠ᵢ |aᵢⱼ|).
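The component-wise update above translates directly into code; a minimal sketch on a strictly diagonally dominant test matrix (values chosen only to satisfy the convergence condition):

```python
import numpy as np

def jacobi(A, b, tol=1e-10, max_iter=500):
    """Jacobi iteration: x_{k+1} = D^{-1}(b - (L+U) x_k)."""
    d = np.diag(A)
    R = A - np.diag(d)               # off-diagonal part L + U
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_new = (b - R @ x) / d
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Strictly diagonally dominant, so Jacobi converges
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b)
assert np.allclose(A @ x, b, atol=1e-8)
```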

Gauss-Seidel Method

Like Jacobi, but uses updated values immediately:

xᵢ^(k+1) = (bᵢ - Σⱼ<ᵢ aᵢⱼxⱼ^(k+1) - Σⱼ>ᵢ aᵢⱼxⱼ^(k)) / aᵢᵢ

Typically converges faster than Jacobi. Converges for SPD matrices.
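The same test problem with Gauss-Seidel; the only change from Jacobi is that each component reads the freshly updated entries of x (a sketch, not a production solver):

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-10, max_iter=500):
    """Gauss-Seidel: use updated components as soon as they are available."""
    n = len(b)
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])   # SPD, so Gauss-Seidel converges
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b)
assert np.allclose(A @ x, b, atol=1e-8)
```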

Successive Over-Relaxation (SOR)

Accelerated Gauss-Seidel with relaxation parameter ω:

xᵢ^(k+1) = (1-ω)xᵢ^(k) + ω · GS_update

Optimal ω is problem-dependent (1 < ω < 2 for over-relaxation). Can dramatically speed convergence.
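SOR is a one-line change to Gauss-Seidel: blend the old component with the Gauss-Seidel update. A sketch with an arbitrarily chosen ω = 1.2 (the optimal ω for this matrix is not computed here):

```python
import numpy as np

def sor(A, b, omega=1.5, tol=1e-10, max_iter=500):
    """SOR: x_i <- (1-omega) x_i + omega * (Gauss-Seidel update)."""
    n = len(b)
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            gs = (b[i] - s) / A[i, i]
            x[i] = (1 - omega) * x[i] + omega * gs
        if np.linalg.norm(x - x_old) < tol:
            break
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])   # SPD: SOR converges for 0 < omega < 2
b = np.array([1.0, 2.0, 3.0])
x = sor(A, b, omega=1.2)
assert np.allclose(A @ x, b, atol=1e-8)
```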

Conjugate Gradient (CG)

The method of choice for large sparse SPD systems.

Idea: Minimize f(x) = ½xᵀAx - bᵀx (whose gradient is Ax - b). Search along A-conjugate directions.

Algorithm:

r₀ = b - Ax₀,  p₀ = r₀
For k = 0, 1, 2, ...:
    αₖ = rₖᵀrₖ / (pₖᵀApₖ)
    xₖ₊₁ = xₖ + αₖpₖ
    rₖ₊₁ = rₖ - αₖApₖ
    if ‖rₖ₊₁‖ < tol: stop
    βₖ = rₖ₊₁ᵀrₖ₊₁ / (rₖᵀrₖ)
    pₖ₊₁ = rₖ₊₁ + βₖpₖ

Properties:

  • Converges in at most n iterations (exact arithmetic)
  • Each iteration: one matrix-vector multiply + O(n) vector operations
  • Convergence rate: ‖eₖ‖_A ≤ 2((√κ-1)/(√κ+1))^k ‖e₀‖_A
  • Fast when κ(A) is small (well-conditioned)

Cost per iteration: O(nnz) where nnz = number of non-zeros in A.
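The algorithm above, transcribed line for line (the dense SPD test matrix is synthetic; in practice A would be sparse and `A @ p` would be an O(nnz) SpMV):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """CG for SPD A, following the residual/direction recurrences above."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x                    # r0 = b - A x0
    p = r.copy()                     # p0 = r0
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:    # ||r_{k+1}|| < tol
            break
        p = r + (rs_new / rs) * p    # beta_k = rs_new / rs
        rs = rs_new
    return x

rng = np.random.default_rng(2)
G = rng.standard_normal((50, 50))
A = G @ G.T + 50 * np.eye(50)        # well-conditioned SPD matrix
b = rng.standard_normal(50)
x = conjugate_gradient(A, b)
assert np.allclose(A @ x, b, atol=1e-6)
```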

GMRES

Generalized Minimum Residual — the standard for non-symmetric sparse systems.

Minimizes ‖Ax - b‖ over x in x₀ + K_k, where K_k = span{r₀, Ar₀, A²r₀, ..., A^(k-1)r₀} is the Krylov subspace.

Properties: Converges in at most n iterations (in exact arithmetic). Memory and work grow with each iteration, so it is typically restarted (GMRES(m)) to bound memory.
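In practice one calls a library implementation; a sketch with SciPy's `gmres` on a non-symmetric sparse tridiagonal system (the stencil values are illustrative, chosen to be strictly diagonally dominant):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Non-symmetric sparse tridiagonal system
n = 100
A = diags([-1.0, 3.0, -0.5], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = gmres(A, b)    # info == 0 means converged
assert info == 0
assert np.linalg.norm(A @ x - b) < 1e-4 * np.linalg.norm(b)
```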

BiCGSTAB

Bi-Conjugate Gradient Stabilized. For non-symmetric systems. More stable than plain BiCG.

Preconditioning

Transform Ax = b into M⁻¹Ax = M⁻¹b where M⁻¹A has a smaller condition number.

A good preconditioner has M ≈ A while M⁻¹ remains cheap to apply.

Common preconditioners:

  • Jacobi (diagonal): M = diag(A). Simplest.
  • SSOR: Symmetric SOR as preconditioner.
  • Incomplete LU (ILU): LU factorization keeping only the sparsity pattern of A.
  • Incomplete Cholesky (IC): For SPD systems.
  • Algebraic multigrid (AMG): Automatic multilevel preconditioner. Very effective.

Preconditioning can reduce iterations from thousands to tens.
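A sketch of ILU preconditioning with SciPy: `spilu` computes the incomplete factorization and a `LinearOperator` wraps its solve as the action of M⁻¹ (the test matrix is an arbitrary diagonally dominant tridiagonal):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres, spilu, LinearOperator

n = 200
A = diags([-1.0, 2.4, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spilu(A)                             # incomplete LU of A
M = LinearOperator(A.shape, ilu.solve)     # apply M^{-1} via the ILU solve

x, info = gmres(A, b, M=M)                 # preconditioned GMRES
assert info == 0
assert np.linalg.norm(A @ x - b) < 1e-4 * np.linalg.norm(b)
```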

Sparse Matrix Storage

| Format | Description | Best for |
|---|---|---|
| COO | (row, col, val) triples | Construction, conversion |
| CSR | Row pointers + col indices + values | Row access, SpMV |
| CSC | Col pointers + row indices + values | Column access, direct solvers |
| Diagonal | Store diagonals only | Banded matrices |
| BSR | Block sparse row | Block-structured matrices |

SpMV (Sparse Matrix-Vector multiply): O(nnz), the dominant cost in iterative methods.
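The usual workflow: build in COO (easy to assemble), convert to CSR for fast SpMV. A sketch with SciPy on a toy 3×3 matrix:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Assemble from (row, col, val) triples, then convert for fast row access
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
vals = np.array([4.0, 1.0, 5.0, 2.0, 3.0])
A = coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()

x = np.array([1.0, 1.0, 1.0])
y = A @ x                          # SpMV: O(nnz) work

assert np.allclose(y, [5.0, 5.0, 5.0])
```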

Eigenvalue Computation

Power Iteration

Find the dominant eigenvalue (largest |λ|) and its eigenvector:

v^(k+1) = Av^(k) / ‖Av^(k)‖
λ ≈ (v^(k))ᵀ Av^(k) (Rayleigh quotient)

Convergence: Linear, rate |λ₂/λ₁|. Slow if the gap is small.
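A minimal sketch of power iteration; the diagonal test matrix makes the answer known in advance (dominant eigenvalue 5):

```python
import numpy as np

def power_iteration(A, num_iter=200, seed=0):
    """Dominant eigenvalue (largest |lambda|) and its eigenvector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    for _ in range(num_iter):
        w = A @ v
        v = w / np.linalg.norm(w)    # normalize each step
    lam = v @ A @ v                  # Rayleigh quotient (v is unit-norm)
    return lam, v

A = np.diag([5.0, 2.0, 1.0])         # eigenvalues known by construction
lam, v = power_iteration(A)
assert abs(lam - 5.0) < 1e-8
```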

Inverse Iteration

Find eigenvalue closest to a shift σ:

(A - σI) v^(k+1) = v^(k)
normalize v^(k+1)

Convergence: Linear, rate |λ_closest - σ| / |λ_second_closest - σ|. Very fast when σ ≈ λ.
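Inverse iteration in code: factor A − σI once, then reuse the factorization every step (the shift σ = 1.9 is chosen to target the known eigenvalue 2 of the toy matrix):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, sigma, num_iter=50, seed=0):
    """Eigenvalue of A closest to sigma, via (A - sigma I) v_new = v."""
    n = A.shape[0]
    lu, piv = lu_factor(A - sigma * np.eye(n))   # factor once, reuse each step
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    for _ in range(num_iter):
        v = lu_solve((lu, piv), v)
        v /= np.linalg.norm(v)
    return v @ A @ v, v                          # Rayleigh quotient estimate

A = np.diag([5.0, 2.0, 1.0])
lam, v = inverse_iteration(A, sigma=1.9)         # targets the eigenvalue 2.0
assert abs(lam - 2.0) < 1e-8
```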

QR Algorithm

The standard method for computing all eigenvalues.

A₀ = A
For k = 0, 1, 2, ...:
    Qₖ Rₖ = Aₖ   (QR factorization)
    Aₖ₊₁ = Rₖ Qₖ  (reverse multiply)

Aₖ converges to upper triangular (Schur form). Diagonal entries → eigenvalues.

With shifts (Wilkinson shift): Cubic convergence; each eigenvalue is deflated after O(1) iterations, so computing the whole spectrum takes only O(n) QR steps instead of the many required by the unshifted iteration.

Cost: O(n³) total (after initial reduction to Hessenberg form in O(n³)).
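The unshifted iteration above is a few lines in NumPy; this sketch omits the shifts and Hessenberg reduction that a real implementation needs, and uses a small symmetric test matrix with distinct eigenvalues:

```python
import numpy as np

def qr_algorithm(A, num_iter=500):
    """Unshifted QR iteration; diag(A_k) approaches the eigenvalues."""
    Ak = A.copy()
    for _ in range(num_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                # similarity transform: A_{k+1} = Q^T A_k Q
    return np.diag(Ak)

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric, distinct eigenvalues
eigs = np.sort(qr_algorithm(A))
assert np.allclose(eigs, np.sort(np.linalg.eigvalsh(A)), atol=1e-6)
```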

Lanczos Algorithm

For large sparse symmetric matrices. Finds extreme eigenvalues.

Builds an orthonormal basis for the Krylov subspace. The eigenvalues of the tridiagonal projection approximate the extreme eigenvalues of A.

Cost: O(k · nnz) for k eigenvalues. Much cheaper than QR for sparse matrices.

Numerical issue: Loss of orthogonality. Requires reorthogonalization.
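In practice one uses ARPACK through SciPy's `eigsh`, which handles reorthogonalization internally. A sketch on the 1-D Laplacian stencil, whose spectrum 2 − 2cos(jπ/(n+1)) is known in closed form:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Large sparse symmetric matrix: 1-D Laplacian stencil
n = 500
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

# Lanczos-based ARPACK solver: the 3 largest-algebraic eigenvalues
vals, vecs = eigsh(A, k=3, which="LA")

# Known spectrum: 2 - 2 cos(j*pi/(n+1)), j = 1..n; largest three are j = n, n-1, n-2
exact = 2 - 2 * np.cos(np.arange(n, n - 3, -1) * np.pi / (n + 1))
assert np.allclose(np.sort(vals), np.sort(exact), atol=1e-8)
```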

Arnoldi Algorithm

Lanczos generalized to non-symmetric matrices. Builds an upper Hessenberg projection.

SVD Computation

  1. Reduce to bidiagonal form via Householder: O(mn²).
  2. Apply iterative bidiagonal SVD (Golub-Kahan, divide-and-conquer): O(n²).

For large sparse: randomized SVD — project to low rank via random matrix, then compute SVD of the small matrix. O(mn log k) for rank-k approximation.
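A minimal sketch of randomized SVD as just described: random projection, orthonormalize, SVD of the small projected matrix. The test matrix has exact rank 5, so rank-5 recovery is exact up to roundoff (the oversampling parameter is a standard heuristic):

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Rank-k SVD sketch: random projection, then SVD of a small matrix."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for range(A @ Omega)
    B = Q.T @ A                           # small (k + p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                            # lift back to the original space
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))  # exact rank 5
U, s, Vt = randomized_svd(A, k=5)
assert np.allclose((U * s) @ Vt, A, atol=1e-8)
```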

Summary: Which Method to Use

| Problem | Small/Dense | Large/Sparse |
|---|---|---|
| Ax = b (general) | LU with pivoting | GMRES + preconditioner |
| Ax = b (SPD) | Cholesky | Conjugate gradient + preconditioner |
| Ax ≈ b (least squares) | QR or SVD | LSQR, CGLS |
| All eigenvalues | QR algorithm | N/A (use subset methods) |
| Few eigenvalues (symmetric) | N/A | Lanczos |
| Few eigenvalues (non-symmetric) | N/A | Arnoldi (ARPACK) |
| SVD | Golub-Kahan | Randomized SVD |

Applications in CS

  • Machine learning: Solving linear systems in regression, kernel methods. SVD for PCA, recommendations. Sparse solvers for large-scale problems.
  • Graph algorithms: Spectral clustering uses Lanczos. PageRank uses power iteration.
  • Physics simulation: Finite element methods produce large sparse SPD systems → CG with AMG preconditioner.
  • Computer graphics: Solving Poisson equations for seamless cloning, mesh smoothing.
  • Optimization: Newton's method requires solving Hx = -g at each step. CG can solve approximately (truncated Newton).
  • Signal processing: Large Toeplitz systems from convolution → fast solvers (FFT-based).
  • Network analysis: Eigenvalues of adjacency/Laplacian matrices reveal graph properties.