Applications of Linear Algebra
Linear algebra is not just theory — it powers the most impactful algorithms in computer science. This file connects the mathematical foundations to real systems.
Principal Component Analysis (PCA)
PCA finds the directions of maximum variance in high-dimensional data, enabling dimensionality reduction.
Algorithm
Given data matrix X ∈ ℝⁿˣᵈ (n samples, d features):
- Center the data: X̄ = X - mean(X).
- Compute covariance matrix: C = (1/n) X̄ᵀX̄ ∈ ℝᵈˣᵈ.
- Eigendecompose: C = VΛVᵀ. Eigenvectors V are the principal components, eigenvalues Λ are the variances.
- Project: X_reduced = X̄V_k where V_k contains the top k eigenvectors.
Equivalently, compute the SVD of X̄ = UΣVᵀ. The right singular vectors V are the principal components, with variances λᵢ = σᵢ²/n.
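The steps above can be sketched in a few lines of NumPy; the random data matrix here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # 200 samples, 5 features

# Center the data
Xc = X - X.mean(axis=0)

# PCA via SVD of the centered matrix: Xc = U S Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                         # rows are principal components
variances = S**2 / len(X)               # eigenvalues of the covariance matrix

# Project onto the top k components
k = 2
X_reduced = Xc @ Vt[:k].T               # shape (200, 2)

# Cross-check against eigendecomposition of the covariance matrix
C = Xc.T @ Xc / len(X)
eigvals = np.linalg.eigvalsh(C)[::-1]   # descending order
assert np.allclose(eigvals, variances)
```

The SVD route is preferred in practice: it avoids forming XᵀX explicitly, which squares the condition number.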
Choosing k
- Explained variance ratio: λₖ / Σ λᵢ. Choose k so cumulative ratio ≥ 95%.
- Scree plot: Plot eigenvalues and look for an "elbow."
- Kaiser criterion: Keep components with λ > 1 (for standardized data).
Applications
- Data visualization (project to 2D/3D)
- Noise reduction (discard small-eigenvalue components)
- Feature extraction (face recognition: eigenfaces)
- Preprocessing for ML (reduce dimensionality, remove multicollinearity)
PageRank
Google's original ranking algorithm models the web as a Markov chain.
Setup
- Web pages are vertices, hyperlinks are directed edges.
- Transition matrix M: Mᵢⱼ = 1/outdeg(j) if j → i, else 0.
- Add a damping factor α (typically 0.85) so the chain is irreducible and aperiodic; dangling pages with no outlinks additionally need their zero columns replaced, e.g. by the uniform distribution:
G = αM + (1-α)(1/n)eeᵀ
Computing PageRank
PageRank vector r satisfies: r = Gr (stationary distribution).
Power iteration: r(t+1) = Gr(t). Converges because G is a positive column-stochastic matrix: by the Perron–Frobenius theorem it has a unique stationary distribution, and all other eigenvalues satisfy |λ| < 1.
Key insight: This is an eigenvector problem — r is the dominant eigenvector of G (eigenvalue 1).
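A minimal NumPy sketch of power iteration on a tiny four-page web; the link structure is invented for illustration, and each column of M is the outlink distribution of one page:

```python
import numpy as np

# Column j holds the outlink distribution of page j.
# Assumed links: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {0, 2}
M = np.array([
    [0.0, 0.0, 1.0, 0.5],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
])

n = M.shape[0]
alpha = 0.85
G = alpha * M + (1 - alpha) * np.ones((n, n)) / n

# Power iteration: r <- G r until convergence
r = np.ones(n) / n
for _ in range(100):
    r_next = G @ r
    if np.linalg.norm(r_next - r, 1) < 1e-12:
        break
    r = r_next

# r is the dominant eigenvector of G (eigenvalue 1), normalized to sum to 1
assert np.isclose(r.sum(), 1.0)
```

Because G is column-stochastic, each iteration preserves the total probability, so no renormalization is needed inside the loop.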
Why It Works
- Pages linked by many important pages get high rank.
- The random surfer model: the long-run probability of finding a random surfer on each page.
- The damping factor ensures ergodicity (random teleportation prevents getting stuck).
Markov Chains and Linear Algebra
A Markov chain with transition matrix P and state distribution π:
π(t+1) = Pᵀ π(t) → π(∞) = stationary distribution
Stationary distribution: Left eigenvector of P for eigenvalue 1.
Mixing time: How fast the chain converges to stationarity. Determined by the spectral gap (1 - |λ₂|), where λ₂ is the second-largest eigenvalue magnitude.
Larger spectral gap → faster mixing.
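These two quantities are easy to compute directly; the 3-state transition matrix below is illustrative. The stationary distribution is the left eigenvector for eigenvalue 1, and the spectral gap comes from the second-largest eigenvalue magnitude:

```python
import numpy as np

# Row-stochastic transition matrix of a 3-state chain (illustrative numbers)
P = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])

# Stationary distribution: left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(vals - 1.0))
pi = np.real(vecs[:, i])
pi /= pi.sum()                       # normalize to a probability vector

# Spectral gap 1 - |lambda_2| governs the mixing rate
mags = np.sort(np.abs(vals))[::-1]
gap = 1.0 - mags[1]

assert np.allclose(pi @ P, pi)       # pi is stationary: pi P = pi
```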
Systems of Differential Equations
Linear ODEs: x'(t) = Ax(t).
Solution: x(t) = e^{At} x(0).
If A is diagonalizable (A = PDP⁻¹):
x(t) = P · diag(e^{λ₁t}, e^{λ₂t}, ..., e^{λₙt}) · P⁻¹ · x(0)
Stability: All solutions decay to 0 iff all eigenvalues have negative real part.
| Eigenvalue | System behavior |
|---|---|
| Re(λ) < 0 | Stable (decaying) |
| Re(λ) = 0 | Marginally stable (oscillating) |
| Re(λ) > 0 | Unstable (growing) |
| Complex eigenvalues | Oscillation (spiral in/out) |
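A sketch of the diagonalization formula in NumPy, using an assumed 2×2 stable system (eigenvalues -1 and -2):

```python
import numpy as np

# A stable system: both eigenvalues have negative real part
A = np.array([[-1.0,  1.0],
              [ 0.0, -2.0]])

# Diagonalize A = P D P^-1, then e^{At} = P diag(e^{lambda_i t}) P^-1
lam, P = np.linalg.eig(A)

def expAt(t):
    return np.real(P @ np.diag(np.exp(lam * t)) @ np.linalg.inv(P))

x0 = np.array([1.0, 1.0])
x = expAt(3.0) @ x0       # state at t = 3 has decayed toward zero

assert np.linalg.norm(x) < np.linalg.norm(x0)
```

For non-diagonalizable A one would use a matrix-exponential routine (e.g. `scipy.linalg.expm`) instead of the eigendecomposition shortcut.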
Computer Graphics Transformations
3D graphics use 4×4 matrices with homogeneous coordinates to represent all transformations uniformly.
A 3D point (x, y, z) becomes (x, y, z, 1) in homogeneous coordinates.
Transformations
Translation by (tx, ty, tz):
T = [1 0 0 tx]
[0 1 0 ty]
[0 0 1 tz]
[0 0 0 1]
Scaling by (sx, sy, sz):
S = [sx 0 0 0]
[0 sy 0 0]
[0 0 sz 0]
[0 0 0 1]
Rotation around z-axis by θ:
Rz = [cos θ -sin θ 0 0]
[sin θ cos θ 0 0]
[ 0 0 1 0]
[ 0 0 0 1]
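These matrices compose by multiplication. A small sketch, building the translation and z-rotation matrices above and applying their product to an illustrative homogeneous point:

```python
import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

# Rotate 90 degrees about z, then translate by (1, 0, 0).
# Matrices apply right-to-left, so the rotation goes on the right.
M = translation(1, 0, 0) @ rotation_z(np.pi / 2)

p = np.array([1.0, 0.0, 0.0, 1.0])   # point (1, 0, 0) in homogeneous coords
q = M @ p                            # rotated to (0, 1, 0), then shifted to (1, 1, 0)

assert np.allclose(q, [1.0, 1.0, 0.0, 1.0])
```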
The Graphics Pipeline
Model Matrix (M) → World coordinates
View Matrix (V) → Camera coordinates
Projection Matrix (P) → Clip coordinates
Viewport Transform → Screen coordinates
Combined: MVP = P · V · M. Each vertex is transformed by this single matrix multiplication.
Perspective projection:
P = [2n/(r-l) 0 (r+l)/(r-l) 0 ]
[ 0 2n/(t-b) (t+b)/(t-b) 0 ]
[ 0 0 -(f+n)/(f-n) -2fn/(f-n)]
[ 0 0 -1 0 ]
Least Squares Regression
Simple linear regression: y = β₀ + β₁x.
Multiple linear regression: y = Xβ + ε, solved by β̂ = (XᵀX)⁻¹Xᵀy.
Polynomial Regression
Fit y = β₀ + β₁x + β₂x² + ... + βₖxᵏ. Create the Vandermonde matrix:
X = [1 x₁ x₁² ... x₁ᵏ]
[1 x₂ x₂² ... x₂ᵏ]
[⋮ ⋮ ⋮ ⋱ ⋮ ]
[1 xₙ xₙ² ... xₙᵏ]
Then β̂ = (XᵀX)⁻¹Xᵀy (least squares).
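A sketch of the fit in NumPy, recovering assumed quadratic coefficients from noisy synthetic data; `np.vander` builds the Vandermonde matrix and `lstsq` solves the least-squares problem without forming XᵀX explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
y = 2.0 - 1.0 * x + 0.5 * x**2 + 0.01 * rng.normal(size=50)  # noisy quadratic

# Vandermonde matrix with columns [1, x, x^2]
X = np.vander(x, 3, increasing=True)

# Least-squares solution of X beta ~= y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta should be close to the true coefficients (2, -1, 0.5)
assert np.allclose(beta, [2.0, -1.0, 0.5], atol=0.05)
```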
Ridge Regression (L2 Regularization)
When XᵀX is ill-conditioned (nearly singular), add regularization:
β̂ = (XᵀX + λI)⁻¹Xᵀy
This is equivalent to adding a penalty λ‖β‖² to the loss function. The SVD provides insight: the component of the solution along each singular direction is multiplied by the filter factor σᵢ²/(σᵢ² + λ), which shrinks directions with small singular values the most.
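A sketch verifying the two equivalent forms on synthetic ill-conditioned data (a near-duplicate column makes XᵀX nearly singular); the ridge solution via the regularized normal equations matches the SVD filter-factor form β̂ = V diag(σᵢ/(σᵢ²+λ)) Uᵀy:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
X[:, 4] = X[:, 3] + 1e-6 * rng.normal(size=30)  # near-duplicate column
y = rng.normal(size=30)

lam = 1.0
d = X.shape[1]

# Ridge via the regularized normal equations
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Equivalent SVD form: each singular direction scaled by sigma / (sigma^2 + lam)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((S / (S**2 + lam)) * (U.T @ y))

assert np.allclose(beta_ridge, beta_svd)
```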
Signal Processing (DFT as Linear Map)
The Discrete Fourier Transform is a linear transformation:
X = Fₙ x
where Fₙ is the n × n DFT matrix:
(Fₙ)ⱼₖ = ω^(jk), ω = e^(-2πi/n)
Fₙ is symmetric and (1/√n)Fₙ is unitary: Fₙ⁻¹ = (1/n)F̄ₙ.
The FFT computes Fₙx in O(n log n) by exploiting the structure of Fₙ (recursive factorization using Cooley-Tukey).
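A sketch constructing the DFT matrix explicitly and checking it against NumPy's FFT (which uses the same sign convention) and the unitarity claim:

```python
import numpy as np

n = 8
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
omega = np.exp(-2j * np.pi / n)
F = omega ** (j * k)                 # the n x n DFT matrix

x = np.arange(n, dtype=float)

# Dense matrix-vector product (O(n^2)) matches the FFT (O(n log n))
assert np.allclose(F @ x, np.fft.fft(x))

# (1/sqrt(n)) F is unitary
Fu = F / np.sqrt(n)
assert np.allclose(Fu @ Fu.conj().T, np.eye(n))
```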
Recommendation Systems
Matrix factorization for collaborative filtering: approximate the user-item rating matrix R ≈ UVᵀ where U ∈ ℝⁿˣᵏ and V ∈ ℝᵐˣᵏ.
- U: user feature matrix (n users, k latent features)
- V: item feature matrix (m items, k latent features)
- Predicted rating: r̂ᵤᵢ = uᵤᵀ vᵢ
Training: Minimize ‖R - UVᵀ‖² + regularization. Solved via alternating least squares (ALS) or stochastic gradient descent.
Connection to SVD: for a fully observed matrix, the optimal rank-k approximation is the truncated SVD (Eckart–Young theorem); with missing ratings, ALS or SGD is used instead. The Netflix Prize (2009) demonstrated the power of matrix factorization.
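A sketch on an invented, fully observed toy rating matrix: the truncated SVD gives the best rank-k approximation, and the factors can be split into the user and item matrices U and V:

```python
import numpy as np

# Toy rating matrix (5 users x 4 items); real systems have missing
# entries and use ALS or SGD rather than a plain SVD.
R = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [1, 2, 4, 5],
    [5, 5, 1, 1],
], dtype=float)

k = 2
U, S, Vt = np.linalg.svd(R, full_matrices=False)
Rk = U[:, :k] @ np.diag(S[:k]) @ Vt[:k]   # best rank-k approximation

# Factor form R ~= U_f V_f^T, k latent features per user and per item
U_f = U[:, :k] * np.sqrt(S[:k])
V_f = Vt[:k].T * np.sqrt(S[:k])
assert np.allclose(U_f @ V_f.T, Rk)

# Predicted rating for user 0, item 2: the inner product u_0 . v_2
r_hat = U_f[0] @ V_f[2]
```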
Graph Spectral Theory
The Laplacian matrix L = D - A (degree matrix minus adjacency matrix) encodes graph structure.
Properties of L:
- Symmetric positive semidefinite
- Smallest eigenvalue is 0 (eigenvector: all-ones vector)
- Number of zero eigenvalues = number of connected components
- Second-smallest eigenvalue λ₂ (algebraic connectivity / Fiedler value) measures how well-connected the graph is
Spectral clustering: Use eigenvectors of L to embed vertices in ℝᵏ, then cluster with k-means.
Cheeger's inequality (stated for the normalized Laplacian): λ₂/2 ≤ h(G) ≤ √(2λ₂), where h(G) is the conductance (Cheeger constant). It links spectral and combinatorial properties.
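The zero-eigenvalue property is easy to see numerically. A sketch on an assumed graph made of two disjoint triangles, where the Laplacian spectrum reveals the two components:

```python
import numpy as np

# Two disjoint triangles: adjacency matrix of a 6-vertex graph
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # graph Laplacian

eigvals = np.linalg.eigvalsh(L)   # sorted ascending, all >= 0 up to roundoff

# Number of (near-)zero eigenvalues = number of connected components
n_components = int(np.sum(eigvals < 1e-10))
assert n_components == 2
```

For a connected graph the count would be 1, and the Fiedler value λ₂ would be strictly positive.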
Compressed Sensing
Recover a sparse signal x ∈ ℝⁿ from m ≪ n measurements y = Ax.
If x is k-sparse and A satisfies the Restricted Isometry Property (RIP), then x can be recovered exactly by solving:
minimize ‖x‖₁ subject to Ax = y
Random matrices (Gaussian, Bernoulli) satisfy RIP with high probability when m = O(k log(n/k)).
Applications: MRI acceleration, single-pixel cameras, radar.
Quantum Computing
Quantum states are unit vectors in ℂ^(2ⁿ) (n qubits).
- Quantum gates are unitary matrices: UU† = I.
- Measurement is projection onto orthogonal subspaces.
- Composite systems live in tensor-product spaces: a two-qubit state is a vector in ℂ² ⊗ ℂ². Entangled states such as |ψ⟩ = α|00⟩ + β|11⟩ (with α, β ≠ 0) cannot be factored into a tensor product of single-qubit states.
- Quantum algorithms are sequences of unitary transformations.
| Gate | Matrix |
|---|---|
| Hadamard | (1/√2)[[1,1],[1,-1]] |
| Pauli-X (NOT) | [[0,1],[1,0]] |
| CNOT | [[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]] |
The entire field of quantum computing is linear algebra over ℂ.
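Indeed, a standard circuit is just a product of the matrices above. A sketch preparing the Bell state (|00⟩ + |11⟩)/√2 by applying a Hadamard to the first qubit and then a CNOT:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

# Start in |00> and apply (H x I), then CNOT
psi0 = np.array([1.0, 0.0, 0.0, 0.0])          # |00> in C^2 (x) C^2
psi = CNOT @ np.kron(H, I2) @ psi0

# Result is the Bell state (|00> + |11>)/sqrt(2): entangled
assert np.allclose(psi, [1/np.sqrt(2), 0, 0, 1/np.sqrt(2)])
```

The state has unit norm throughout, since every gate is unitary.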
Summary
Linear algebra is not an isolated mathematical topic — it is the computational language of modern science and engineering:
| Domain | Key Linear Algebra Concept |
|---|---|
| Machine learning | SVD, eigendecomposition, matrix factorization |
| Computer graphics | 4×4 transformation matrices |
| Signal processing | DFT/FFT as linear transforms |
| Control theory | Eigenvalues (stability), state-space models |
| Quantum computing | Unitary matrices, Hilbert spaces |
| Graph algorithms | Laplacian, adjacency matrix spectra |
| Optimization | Positive definiteness, condition number |
| Statistics | Covariance matrices, projection |
| Cryptography | Lattice problems, matrix operations mod q |
| Data science | PCA, regression, dimensionality reduction |