Inner Product Spaces
Inner products generalize the dot product, providing notions of length, angle, distance, and orthogonality that are central to geometry, optimization, and approximation.
Inner Products
An inner product on a vector space V over F (ℝ or ℂ) is a function ⟨·, ·⟩: V × V → F satisfying:
- Linearity in first argument: ⟨au + bv, w⟩ = a⟨u, w⟩ + b⟨v, w⟩
- Conjugate symmetry: ⟨u, v⟩ equals the complex conjugate of ⟨v, u⟩ (for ℝ this is plain symmetry: ⟨u, v⟩ = ⟨v, u⟩)
- Positive definiteness: ⟨v, v⟩ ≥ 0, with equality iff v = 0
A vector space with an inner product is an inner product space.
Standard Inner Products
ℝⁿ (dot product):
⟨u, v⟩ = uᵀv = Σᵢ uᵢvᵢ
ℂⁿ (Hermitian inner product):
⟨u, v⟩ = u*v = Σᵢ ūᵢvᵢ
(Note: this formula is conjugate-linear in the first argument and linear in the second, the physics convention; texts that require linearity in the first argument, as in the axioms above, conjugate the second factor instead. The two conventions differ only in which argument is conjugated.)
Function space C([a,b]):
⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx
Weighted inner product: ⟨u, v⟩_W = uᵀWv for positive definite W.
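These formulas can be sketched in code (NumPy, with made-up vectors for illustration):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Real dot product: <u, v> = u^T v
dot = u @ v

# Hermitian inner product on C^n: np.vdot conjugates its first argument,
# matching <u, v> = sum_i conj(u_i) v_i
a = np.array([1 + 1j, 2 - 1j])
b = np.array([3 + 0j, 1 + 2j])
herm = np.vdot(a, b)

# Weighted inner product <u, v>_W = u^T W v with positive definite W
W = np.diag([1.0, 2.0, 3.0])
weighted = u @ W @ v
```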
Norms
The inner product induces a norm (length):
‖v‖ = √⟨v, v⟩
For ℝⁿ: ‖v‖ = √(v₁² + v₂² + ... + vₙ²) (Euclidean norm).
A unit vector is one with ‖v‖ = 1. Any nonzero v can be normalized to a unit vector: v̂ = v/‖v‖.
Other Common Norms
| Norm | Definition | Notation |
|---|---|---|
| L¹ (Manhattan) | Σᵢ \|vᵢ\| | ‖v‖₁ |
| L² (Euclidean) | √(Σᵢ vᵢ²) | ‖v‖₂ |
| L∞ (Chebyshev) | maxᵢ \|vᵢ\| | ‖v‖_∞ |
| Lᵖ | (Σᵢ \|vᵢ\|ᵖ)^(1/p) | ‖v‖_p |
| Frobenius (matrices) | √(Σᵢⱼ aᵢⱼ²) | ‖A‖_F |
Among the Lᵖ norms, only L² is induced by an inner product: it alone satisfies the parallelogram law ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².
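All of these norms are available through `np.linalg.norm` (a small illustrative sketch):

```python
import numpy as np

v = np.array([3.0, -4.0])

l1   = np.linalg.norm(v, 1)       # |3| + |-4| = 7
l2   = np.linalg.norm(v)          # default is the Euclidean norm: 5
linf = np.linalg.norm(v, np.inf)  # max(|3|, |-4|) = 4

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro = np.linalg.norm(A, 'fro')    # sqrt(1 + 4 + 9 + 16) = sqrt(30)
```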
Key Inequalities
Cauchy-Schwarz inequality:
|⟨u, v⟩| ≤ ‖u‖ · ‖v‖
Equality iff u and v are linearly dependent. This is perhaps the most important inequality in mathematics.
Triangle inequality:
‖u + v‖ ≤ ‖u‖ + ‖v‖
Follows from Cauchy-Schwarz.
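Both inequalities are easy to confirm numerically on random vectors (a quick sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(5)
v = rng.standard_normal(5)

# Cauchy-Schwarz: |<u, v>| <= ||u|| * ||v||
cs_holds = abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v)

# Triangle inequality: ||u + v|| <= ||u|| + ||v||
tri_holds = np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v)
```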
Orthogonality
Two vectors are orthogonal (perpendicular) if ⟨u, v⟩ = 0, written u ⊥ v.
A set of vectors is orthogonal if every pair is orthogonal.
A set is orthonormal if orthogonal and every vector has unit norm.
Properties
- A set of pairwise orthogonal non-zero vectors is linearly independent.
- An orthonormal basis {e₁, ..., eₙ} satisfies ⟨eᵢ, eⱼ⟩ = δᵢⱼ (Kronecker delta).
- In an orthonormal basis, coordinates are simply: cᵢ = ⟨v, eᵢ⟩.
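A small sketch of reading off coordinates in an orthonormal basis of ℝ² (here, the standard basis rotated 45°):

```python
import numpy as np

e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2)

v = np.array([2.0, 4.0])

# Coordinates are plain inner products: c_i = <v, e_i>
c1, c2 = v @ e1, v @ e2
reconstructed = c1 * e1 + c2 * e2   # recovers v
```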
Orthogonal Complement
The orthogonal complement of subspace W:
W⊥ = {v ∈ V | ⟨v, w⟩ = 0 for all w ∈ W}
Properties (for finite-dimensional V):
- W⊥ is a subspace.
- dim(W) + dim(W⊥) = dim(V).
- V = W ⊕ W⊥ (direct sum).
- (W⊥)⊥ = W.
For a matrix A: (col(A))⊥ = null(Aᵀ) and (row(A))⊥ = null(A). This gives the fundamental theorem of linear algebra (four fundamental subspaces).
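The relation (col(A))⊥ = null(Aᵀ) can be checked numerically; here the null space of Aᵀ is read off from an SVD (a sketch using a rank-1 example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])   # rank 1

# Rows of Vt beyond the rank of A^T span null(A^T)
U, s, Vt = np.linalg.svd(A.T)
rank = int(np.sum(s > 1e-12))
null_AT = Vt[rank:].T          # columns form a basis for null(A^T)

# Every vector in null(A^T) is orthogonal to every column of A
check = A.T @ null_AT
```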
Gram-Schmidt Process
Converts any basis {v₁, ..., vₙ} into an orthonormal basis {e₁, ..., eₙ}.
Algorithm:
u₁ = v₁
u₂ = v₂ - proj_{u₁}(v₂)
u₃ = v₃ - proj_{u₁}(v₃) - proj_{u₂}(v₃)
⋮
uₖ = vₖ - Σᵢ₌₁ᵏ⁻¹ proj_{uᵢ}(vₖ)
eᵢ = uᵢ / ‖uᵢ‖
where proj_u(v) = (⟨v, u⟩ / ⟨u, u⟩) · u.
Time: O(m²n) for m vectors in ℝⁿ (each vector is projected against up to m − 1 previously computed vectors, and each projection costs O(n)).
Numerical note: Classical Gram-Schmidt is numerically unstable: rounding errors accumulate and the computed vectors lose orthogonality. Modified Gram-Schmidt subtracts each projection immediately, so later projections use the already-updated vectors; it is considerably more stable. For the best stability, use Householder reflections (as in QR decomposition).
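A minimal sketch of modified Gram-Schmidt (assuming linearly independent columns; not production code):

```python
import numpy as np

def modified_gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    V = np.array(V, dtype=float)   # work on a copy
    m, n = V.shape
    Q = np.zeros((m, n))
    for i in range(n):
        q = V[:, i] / np.linalg.norm(V[:, i])
        Q[:, i] = q
        # Subtract the q-component from the remaining columns immediately,
        # so later projections use already-orthogonalized data
        for j in range(i + 1, n):
            V[:, j] -= (q @ V[:, j]) * q
    return Q

V = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
Q = modified_gram_schmidt(V)
```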
Orthogonal Projection
The orthogonal projection of v onto subspace W is the closest point in W to v.
If {u₁, ..., uₖ} is an orthonormal basis for W:
proj_W(v) = Σᵢ₌₁ᵏ ⟨v, uᵢ⟩ · uᵢ
The projection matrix: P = UUᵀ where U = [u₁ | ... | uₖ].
Properties of projection matrices:
- P² = P (idempotent)
- Pᵀ = P (symmetric)
- I - P is the projection onto W⊥
Projection onto column space
For projection onto col(A), where A has linearly independent columns:
proj_{col(A)}(v) = A(AᵀA)⁻¹Aᵀv
The projection matrix is P = A(AᵀA)⁻¹Aᵀ.
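A sketch verifying the projection-matrix properties on a small example:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # linearly independent columns

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection onto col(A)

idempotent = np.allclose(P @ P, P)     # P^2 = P
symmetric  = np.allclose(P.T, P)       # P^T = P

b = np.array([1.0, 3.0, 2.0])
residual = b - P @ b
orthogonal_residual = np.allclose(A.T @ residual, 0)   # residual is in col(A)-perp
```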
QR Decomposition
Every m × n matrix A (with m ≥ n) can be factored as:
A = QR
where Q is m × n with orthonormal columns (QᵀQ = I) and R is n × n upper triangular.
Computing QR:
- Gram-Schmidt: Apply to columns of A. Q = orthonormalized columns, R = upper triangular of coefficients. Numerically unstable.
- Householder reflections: Apply reflections to zero out below-diagonal entries. Numerically stable. About 2mn² − 2n³/3 flops.
- Givens rotations: Zero out individual entries. Good for sparse matrices.
Applications:
- Solving least squares: Ax ≈ b → Rx = Qᵀb (back-substitution).
- Eigenvalue computation: QR algorithm.
- Orthonormalization.
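In NumPy the (reduced) factorization is available directly (a quick sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

Q, R = np.linalg.qr(A)   # reduced QR: Q is 3x2, R is 2x2

has_orthonormal_cols = np.allclose(Q.T @ Q, np.eye(2))
is_upper_triangular  = np.allclose(R, np.triu(R))
reconstructs_A       = np.allclose(Q @ R, A)
```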
Least Squares Approximation
When Ax = b has no exact solution (overdetermined system), find x that minimizes ‖Ax - b‖².
Normal Equations
The solution satisfies:
AᵀAx = Aᵀb
If AᵀA is invertible (i.e. A has linearly independent columns): x̂ = (AᵀA)⁻¹Aᵀb.
The matrix A(AᵀA)⁻¹Aᵀ is the projection onto col(A). The residual b − Ax̂ is orthogonal to col(A).
Via QR Decomposition
A = QR, so AᵀAx = Aᵀb becomes RᵀQᵀQRx = RᵀQᵀb; since QᵀQ = I and R is invertible, this reduces to Rx = Qᵀb. Solve by back-substitution. More numerically stable than forming the normal equations.
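On a well-conditioned problem the normal equations and the QR route agree (a sketch; the data is made up):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Normal equations (squares the condition number: cond(A^T A) = cond(A)^2)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QR route: R x = Q^T b (R is triangular, so back-substitution applies;
# np.linalg.solve is used here for brevity)
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)
```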
Via SVD
The most numerically robust method. See next file.
Geometric Interpretation
Least squares finds the projection of b onto the column space of A. The solution x̂ gives the coordinates in terms of the columns of A.
Ax̂ = proj_{col(A)}(b)
Example: Linear regression. Fit y = β₀ + β₁x to data points (xᵢ, yᵢ).
A = [1  x₁]      b = [y₁]
    [1  x₂]          [y₂]
    [⋮   ⋮]          [ ⋮]
    [1  xₙ]          [yₙ]
β = (AᵀA)⁻¹Aᵀb
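A sketch of this setup with `np.linalg.lstsq` (the data points are invented):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Design matrix: a column of ones for the intercept, then the x values
A = np.column_stack([np.ones_like(x), x])

beta, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
b0, b1 = beta   # intercept and slope of the fitted line
```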
Fourier Series as Orthogonal Projection
The functions {1, cos(x), sin(x), cos(2x), sin(2x), ...} form an orthogonal set in C([−π, π]) with inner product ⟨f, g⟩ = ∫ f(x)g(x) dx taken over [−π, π].
The Fourier series of f is its projection onto this orthogonal set:
f(x) ≈ a₀/2 + Σₙ (aₙ cos(nx) + bₙ sin(nx))
where aₙ = (1/π)⟨f, cos(nx)⟩ and bₙ = (1/π)⟨f, sin(nx)⟩.
This is a best approximation in the L² sense.
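A sketch computing the first coefficients numerically for f(x) = x, approximating the integrals by a Riemann sum on a fine grid:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
f = x                                     # f(x) = x on [-pi, pi]

# a_n = (1/pi) <f, cos(nx)>,  b_n = (1/pi) <f, sin(nx)>
a1 = np.sum(f * np.cos(x)) * dx / np.pi   # near 0: f is odd
b1 = np.sum(f * np.sin(x)) * dx / np.pi   # exact value is 2
```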
Real-World Applications
- Linear regression: Least squares fitting is orthogonal projection.
- Signal processing: Fourier analysis decomposes signals into orthogonal frequency components.
- Computer graphics: Orthogonal projection for rendering, normal vectors for lighting.
- Machine learning: Kernel methods define inner products on feature spaces. SVM uses the inner product to compute margins.
- Quantum mechanics: States are vectors in a Hilbert space (inner product space). Measurements are projections.
- Data compression: Project data onto principal components (orthogonal directions of maximum variance).
- Numerical methods: QR decomposition for solving linear systems, eigenvalue computation, and least squares.
- Error-correcting codes: Orthogonality of codewords enables error detection and correction.