
Applications of Linear Algebra

Linear algebra is not just theory: it powers some of the most impactful algorithms in computer science. This page connects the mathematical foundations to real systems.

Principal Component Analysis (PCA)

PCA finds the directions of maximum variance in high-dimensional data, enabling dimensionality reduction.

Algorithm

Given data matrix X ∈ ℝⁿˣᵈ (n samples, d features):

  1. Center the data: X̄ = X - mean(X).
  2. Compute covariance matrix: C = (1/n) X̄ᵀX̄ ∈ ℝᵈˣᵈ.
  3. Eigendecompose: C = VΛVᵀ. Eigenvectors V are the principal components, eigenvalues Λ are the variances.
  4. Project: X_reduced = X̄V_k where V_k contains the top k eigenvectors.

Equivalently, compute the SVD of X̄ = UΣVᵀ. The right singular vectors V are the principal components.
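A minimal numpy sketch of the steps above, run on synthetic data (the random matrix and k = 2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated synthetic data

# 1. Center the data
Xc = X - X.mean(axis=0)

# 2. Covariance matrix
C = (Xc.T @ Xc) / len(X)

# 3. Eigendecompose (eigh, since C is symmetric); sort by decreasing variance
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project onto the top k principal components
k = 2
X_reduced = Xc @ eigvecs[:, :k]

# Equivalent route: SVD of the centered data; sigma_i^2 / n = lambda_i
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
```

The rows of Vt match the eigenvectors up to sign, and S**2 / len(X) reproduces the eigenvalues.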

Choosing k

  • Explained variance ratio: component i explains λᵢ / Σⱼ λⱼ of the total variance. Choose the smallest k whose cumulative ratio is ≥ 95%.
  • Scree plot: Plot eigenvalues and look for an "elbow."
  • Kaiser criterion: Keep components with λ > 1 (for standardized data).

Applications

  • Data visualization (project to 2D/3D)
  • Noise reduction (discard small-eigenvalue components)
  • Feature extraction (face recognition: eigenfaces)
  • Preprocessing for ML (reduce dimensionality, remove multicollinearity)

PageRank

Google's original ranking algorithm models the web as a Markov chain.

Setup

  • Web pages are vertices, hyperlinks are directed edges.
  • Transition matrix M: Mᵢⱼ = 1/outdeg(j) if j → i, else 0.
  • Add a damping factor α (typically 0.85): with probability 1−α the surfer teleports to a uniformly random page, which makes the chain irreducible and aperiodic (dangling pages with no outlinks are typically patched the same way):
G = αM + (1-α)(1/n)eeᵀ,   where e is the all-ones vector

Computing PageRank

PageRank vector r satisfies: r = Gr (stationary distribution).

Power iteration: r(t+1) = Gr(t). This converges because G is a positive stochastic matrix: by the Perron–Frobenius theorem, its dominant eigenvalue is 1 with a unique stationary distribution, and the error contracts by a factor of at most α per step.

Key insight: This is an eigenvector problem — r is the dominant eigenvector of G (eigenvalue 1).
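Power iteration in a few lines of numpy, on a hypothetical four-page web (the link graph and α = 0.85 are illustrative):

```python
import numpy as np

# Tiny web: links[j] lists the pages that page j links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = 4

# Column-stochastic transition matrix: M[i, j] = 1/outdeg(j) if j -> i
M = np.zeros((n, n))
for j, outs in links.items():
    for i in outs:
        M[i, j] = 1.0 / len(outs)

alpha = 0.85
G = alpha * M + (1 - alpha) / n * np.ones((n, n))

# Power iteration: r <- G r, starting from the uniform distribution
r = np.full(n, 1.0 / n)
for _ in range(200):
    r = G @ r

# r is now (numerically) the dominant eigenvector of G, eigenvalue 1
```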

Why It Works

  • Pages linked by many important pages get high rank.
  • The random surfer model: r gives the long-run probability that a surfer who follows random links (and occasionally teleports) is on each page.
  • The damping factor ensures ergodicity (random teleportation prevents getting stuck).

Markov Chains and Linear Algebra

A Markov chain with transition matrix P and state distribution π:

π(t+1) = Pᵀ π(t)     →     π(∞) = stationary distribution

Stationary distribution: Left eigenvector of P for eigenvalue 1.

Mixing time: How fast the chain converges to stationarity. Determined by the spectral gap (1 - |λ₂|), where λ₂ is the second-largest eigenvalue magnitude.

Larger spectral gap → faster mixing.
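Both facts can be checked in a few lines of numpy, on a hypothetical three-state chain:

```python
import numpy as np

# Row-stochastic transition matrix of a small illustrative chain
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

# Stationary distribution: left eigenvector of P for eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()                      # normalize to a probability vector

# Spectral gap: 1 - |lambda_2| (second-largest eigenvalue magnitude)
mags = np.sort(np.abs(eigvals))[::-1]
gap = 1.0 - mags[1]
```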

Systems of Differential Equations

Linear ODEs: x'(t) = Ax(t).

Solution: x(t) = e^{At} x(0).

If A is diagonalizable (A = PDP⁻¹):

x(t) = P · diag(e^{λ₁t}, e^{λ₂t}, ..., e^{λₙt}) · P⁻¹ · x(0)

Stability: All solutions decay to 0 iff all eigenvalues have negative real part.

| Eigenvalue | System behavior |
|---|---|
| Re(λ) < 0 | Stable (decaying) |
| Re(λ) = 0 | Marginally stable (oscillating) |
| Re(λ) > 0 | Unstable (growing) |
| Complex eigenvalues | Oscillation (spiral in/out) |
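The diagonalization formula can be verified numerically; a sketch with an assumed stable 2×2 system (eigenvalues −1 and −3, so all solutions decay):

```python
import numpy as np

# x'(t) = A x(t); A is upper triangular with eigenvalues -1 and -3
A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])
x0 = np.array([1.0, 1.0])

def solve(t):
    # x(t) = P diag(e^{lambda_i t}) P^{-1} x0, assuming A is diagonalizable
    lam, P = np.linalg.eig(A)
    return np.real(P @ np.diag(np.exp(lam * t)) @ np.linalg.inv(P) @ x0)
```

For this A the exact solution is x(t) = (2e^{−t} − e^{−3t}, e^{−3t}), which the code reproduces.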

Computer Graphics Transformations

3D graphics use 4×4 matrices with homogeneous coordinates to represent all transformations uniformly.

A 3D point (x, y, z) becomes (x, y, z, 1) in homogeneous coordinates.

Transformations

Translation by (tx, ty, tz):

T = [1  0  0  tx]
    [0  1  0  ty]
    [0  0  1  tz]
    [0  0  0   1]

Scaling by (sx, sy, sz):

S = [sx  0   0  0]
    [0   sy  0  0]
    [0   0  sz  0]
    [0   0   0  1]

Rotation around z-axis by θ:

Rz = [cos θ  -sin θ  0  0]
     [sin θ   cos θ  0  0]
     [  0       0    1  0]
     [  0       0    0  1]

The Graphics Pipeline

Model Matrix (M)        →  World coordinates
View Matrix (V)         →  Camera coordinates
Projection Matrix (P)   →  Clip coordinates
Viewport Transform      →  Screen coordinates

Combined: MVP = P · V · M. Each vertex is transformed by this single matrix multiplication.
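The three transformation matrices above, built and composed in numpy (the concrete transform parameters are arbitrary):

```python
import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

# Compose right-to-left: scale, then rotate 90 degrees about z, then translate
M = translation(1, 2, 3) @ rotation_z(np.pi / 2) @ scaling(2, 2, 2)

p = np.array([1.0, 0.0, 0.0, 1.0])   # homogeneous point (1, 0, 0)
q = M @ p                            # -> (1, 4, 3)
```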

Perspective projection:

P = [2n/(r-l)    0      (r+l)/(r-l)    0    ]
    [   0     2n/(t-b)  (t+b)/(t-b)    0    ]
    [   0        0     -(f+n)/(f-n)  -2fn/(f-n)]
    [   0        0         -1          0    ]

Least Squares Regression

Simple linear regression: y = β₀ + β₁x.

Multiple linear regression: y = Xβ + ε, solved by the normal equations β̂ = (XᵀX)⁻¹Xᵀy (assuming X has full column rank).

Polynomial Regression

Fit y = β₀ + β₁x + β₂x² + ... + βₖxᵏ. Create the Vandermonde matrix:

X = [1   x₁   x₁²  ...  x₁ᵏ]
    [1   x₂   x₂²  ...  x₂ᵏ]
    [⋮    ⋮    ⋮    ⋱    ⋮  ]
    [1   xₙ   xₙ²  ...  xₙᵏ]

Then β̂ = (XᵀX)⁻¹Xᵀy (least squares).
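A sketch of the fit in numpy (degree 2, synthetic noisy data; np.linalg.lstsq is used rather than forming (XᵀX)⁻¹ explicitly, which is numerically safer):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
y = 2 + 3 * x - 5 * x**2 + 0.01 * rng.normal(size=50)  # noisy quadratic

# Vandermonde matrix, columns [1, x, x^2]
X = np.vander(x, N=3, increasing=True)

# Least-squares solution to X beta = y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With this little noise, beta recovers the true coefficients (2, 3, −5) closely.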

Ridge Regression (L2 Regularization)

When XᵀX is ill-conditioned (nearly singular), add regularization:

β̂ = (XᵀX + λI)⁻¹Xᵀy

This is equivalent to adding a penalty λ‖β‖² to the loss function. The SVD provides insight: singular values σᵢ are replaced by σᵢ²/(σᵢ² + λ), shrinking small singular values.
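The closed form and the SVD view can be checked against each other in numpy (synthetic nearly-collinear design matrix, λ = 0.1 assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
X[:, 3] = X[:, 2] + 1e-6 * rng.normal(size=30)   # nearly collinear columns
y = X @ np.array([1.0, -2.0, 0.5, 0.5]) + 0.01 * rng.normal(size=30)

lam = 0.1
d = X.shape[1]

# Closed-form ridge: (X^T X + lam I)^{-1} X^T y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# SVD form: coefficients use sigma_i / (sigma_i^2 + lam), i.e. the fitted
# components are shrunk by the factor sigma_i^2 / (sigma_i^2 + lam)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))
```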

Signal Processing (DFT as Linear Map)

The Discrete Fourier Transform is a linear transformation:

X = Fₙ x

where Fₙ is the n × n DFT matrix:

(Fₙ)ⱼₖ = ω^(jk),   ω = e^(-2πi/n)

Fₙ is symmetric and (1/√n)Fₙ is unitary: Fₙ⁻¹ = (1/n)F̄ₙ.

The FFT computes Fₙx in O(n log n) by exploiting the structure of Fₙ (recursive factorization using Cooley-Tukey).
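Building Fₙ explicitly and comparing it against the FFT (n = 8 and the input vector are arbitrary):

```python
import numpy as np

n = 8
omega = np.exp(-2j * np.pi / n)
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = omega ** (j * k)            # DFT matrix, (F)_{jk} = omega^{jk}

x = np.arange(n, dtype=float)
X1 = F @ x                      # direct O(n^2) transform
X2 = np.fft.fft(x)              # FFT, O(n log n), same result
```

The tests below also confirm the symmetry of F and the inverse formula Fₙ⁻¹ = (1/n)F̄ₙ.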

Recommendation Systems

Matrix factorization for collaborative filtering: approximate the user-item rating matrix R ∈ ℝⁿˣᵐ as R ≈ UVᵀ, where U ∈ ℝⁿˣᵏ and V ∈ ℝᵐˣᵏ.

  • U: user feature matrix (n users, k latent features)
  • V: item feature matrix (m items, k latent features)
  • Predicted rating: r̂ᵤᵢ = uᵤᵀ vᵢ

Training: Minimize ‖R - UVᵀ‖² + regularization. Solved via alternating least squares (ALS) or stochastic gradient descent.

Connection to SVD: For a fully observed matrix, the optimal rank-k approximation is the truncated SVD (Eckart–Young theorem). The Netflix Prize (2009) demonstrated the power of matrix factorization for collaborative filtering.
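A sketch of the SVD connection on a synthetic, fully observed rank-3 rating matrix (real systems must instead handle missing entries, e.g. via ALS):

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic ratings: 20 users x 15 items, exactly rank 3
U_true = rng.normal(size=(20, 3))
V_true = rng.normal(size=(15, 3))
R = U_true @ V_true.T

# Truncated SVD gives the best rank-k approximation
k = 3
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = (U[:, :k] * s[:k]) @ Vt[:k]   # rank-k reconstruction
```

Because R has rank 3 exactly, the rank-3 truncation reconstructs it to machine precision.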

Graph Spectral Theory

The Laplacian matrix L = D - A (degree matrix minus adjacency matrix) encodes graph structure.

Properties of L:

  • Symmetric positive semidefinite
  • Smallest eigenvalue is 0 (eigenvector: all-ones vector)
  • Number of zero eigenvalues = number of connected components
  • Second-smallest eigenvalue λ₂ (algebraic connectivity / Fiedler value) measures how well-connected the graph is

Spectral clustering: Use eigenvectors of L to embed vertices in ℝᵏ, then cluster with k-means.

Cheeger's inequality: λ₂/2 ≤ h(G) ≤ √(2λ₂), where λ₂ is the second-smallest eigenvalue of the normalized Laplacian and h(G) is the conductance (Cheeger constant). It links spectral and combinatorial notions of connectivity.
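The Laplacian's spectral properties, checked on a small example graph (two triangles joined by a bridge; a hypothetical graph):

```python
import numpy as np

# Edge list: triangle {0,1,2}, triangle {3,4,5}, bridge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1

L = np.diag(A.sum(axis=1)) - A        # Laplacian L = D - A

eigvals = np.linalg.eigvalsh(L)       # ascending; all >= 0 (PSD)
num_components = int(np.sum(eigvals < 1e-10))
fiedler_value = eigvals[1]            # algebraic connectivity
```

The graph is connected, so exactly one eigenvalue is zero and the Fiedler value is positive (small, reflecting the bottleneck bridge).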

Compressed Sensing

Recover a sparse signal x ∈ ℝⁿ from m ≪ n measurements y = Ax.

If x is k-sparse and A satisfies the Restricted Isometry Property (RIP), then x can be recovered exactly by solving:

minimize ‖x‖₁ subject to Ax = y

Random matrices (Gaussian, Bernoulli) satisfy RIP with high probability when m = O(k log(n/k)).

Applications: MRI acceleration, single-pixel cameras, radar.
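The ℓ₁ minimization can be posed as a linear program by splitting x into positive and negative parts; a sketch using scipy.optimize.linprog (random Gaussian A, dimensions illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, m, k = 50, 25, 3
A = rng.normal(size=(m, n)) / np.sqrt(m)     # Gaussian measurement matrix

x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)  # k-sparse signal
y = A @ x_true

# Basis pursuit as an LP: x = xp - xn with xp, xn >= 0;
# minimize sum(xp + xn) subject to A(xp - xn) = y
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
```

The recovered x_hat satisfies the measurements and has ℓ₁ norm no larger than the true signal's; with these dimensions, exact recovery holds with high probability.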

Quantum Computing

Quantum states are unit vectors in ℂ^(2ⁿ) (n qubits).

  • Quantum gates are unitary matrices: UU† = I.
  • Measurement is projection onto orthogonal subspaces.
  • Entanglement arises from tensor products: |ψ⟩ = α|00⟩ + β|11⟩ ∈ ℂ² ⊗ ℂ².
  • Quantum algorithms are sequences of unitary transformations.

| Gate | Matrix |
|---|---|
| Hadamard | (1/√2)[[1,1],[1,-1]] |
| Pauli-X (NOT) | [[0,1],[1,0]] |
| CNOT | [[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]] |
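These gates, and the standard Bell-state construction CNOT·(H ⊗ I)|00⟩, in a few lines of numpy:

```python
import numpy as np

# Single- and two-qubit gates from the table
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

# Bell state: apply H to the first qubit, then CNOT
psi0 = np.array([1, 0, 0, 0], dtype=complex)   # |00>
psi = CNOT @ np.kron(H, I2) @ psi0             # (|00> + |11>)/sqrt(2)
```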

The entire field of quantum computing is linear algebra over ℂ.

Summary

Linear algebra is not an isolated mathematical topic — it is the computational language of modern science and engineering:

| Domain | Key Linear Algebra Concept |
|---|---|
| Machine learning | SVD, eigendecomposition, matrix factorization |
| Computer graphics | 4×4 transformation matrices |
| Signal processing | DFT/FFT as linear transforms |
| Control theory | Eigenvalues (stability), state-space models |
| Quantum computing | Unitary matrices, Hilbert spaces |
| Graph algorithms | Laplacian, adjacency matrix spectra |
| Optimization | Positive definiteness, condition number |
| Statistics | Covariance matrices, projection |
| Cryptography | Lattice problems, matrix operations mod q |
| Data science | PCA, regression, dimensionality reduction |