Inner Product Spaces
Inner products generalize the dot product, providing notions of length, angle, distance, and orthogonality that are central to geometry, optimization, and approximation.
Inner Products
An inner product on a vector space V over F (ℝ or ℂ) is a function ⟨·, ·⟩: V × V → F satisfying:
- Linearity in first argument: ⟨au + bv, w⟩ = a⟨u, w⟩ + b⟨v, w⟩
- Conjugate symmetry: ⟨u, v⟩ equals the complex conjugate of ⟨v, u⟩ (for ℝ this is plain symmetry: ⟨u, v⟩ = ⟨v, u⟩)
- Positive definiteness: ⟨v, v⟩ ≥ 0, with equality iff v = 0
A vector space with an inner product is an inner product space.
Standard Inner Products
ℝⁿ (dot product):
⟨u, v⟩ = uᵀv = Σᵢ uᵢvᵢ
ℂⁿ (Hermitian inner product):
⟨u, v⟩ = u*v = Σᵢ ūᵢvᵢ
(Note: this formula is conjugate-linear in the first argument and linear in the second, the physics convention; texts that require linearity in the first argument, as in the axioms above, conjugate the second factor instead. The two conventions differ only in which argument is conjugated.)
Function space C([a,b]):
⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx
Weighted inner product: ⟨u, v⟩_W = uᵀWv for positive definite W.
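These formulas can be sketched in code (NumPy, with made-up vectors for illustration):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Real dot product: <u, v> = u^T v
dot = u @ v

# Hermitian inner product on C^n: np.vdot conjugates its first argument,
# matching <u, v> = sum_i conj(u_i) v_i
a = np.array([1 + 1j, 2 - 1j])
b = np.array([3 + 0j, 1 + 2j])
herm = np.vdot(a, b)

# Weighted inner product <u, v>_W = u^T W v with positive definite W
W = np.diag([1.0, 2.0, 3.0])
weighted = u @ W @ v
```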
Norms
The inner product induces a norm (length):
‖v‖ = √⟨v, v⟩
For ℝⁿ: ‖v‖ = √(v₁² + v₂² + ... + vₙ²) (Euclidean norm).
A unit vector is one with ‖v‖ = 1. Any nonzero v can be normalized to a unit vector: v̂ = v/‖v‖.
Other Common Norms
| Norm | Definition | Notation |
|---|---|---|
| L¹ (Manhattan) | Σᵢ \|vᵢ\| | ‖v‖₁ |
| L² (Euclidean) | √(Σᵢ vᵢ²) | ‖v‖₂ |
| L∞ (Chebyshev) | maxᵢ \|vᵢ\| | ‖v‖_∞ |
| Lᵖ | (Σᵢ \|vᵢ\|ᵖ)^(1/p) | ‖v‖_p |
| Frobenius (matrices) | √(Σᵢⱼ aᵢⱼ²) | ‖A‖_F |
Among the Lᵖ norms, only L² is induced by an inner product: it alone satisfies the parallelogram law ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².
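All of these norms are available through `np.linalg.norm` (a small illustrative sketch):

```python
import numpy as np

v = np.array([3.0, -4.0])

l1   = np.linalg.norm(v, 1)       # |3| + |-4| = 7
l2   = np.linalg.norm(v)          # default is the Euclidean norm: 5
linf = np.linalg.norm(v, np.inf)  # max(|3|, |-4|) = 4

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro = np.linalg.norm(A, 'fro')    # sqrt(1 + 4 + 9 + 16) = sqrt(30)
```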
Key Inequalities
Cauchy-Schwarz inequality:
|⟨u, v⟩| ≤ ‖u‖ · ‖v‖
Equality iff u and v are linearly dependent. This is perhaps the most important inequality in mathematics.
Triangle inequality:
‖u + v‖ ≤ ‖u‖ + ‖v‖
Follows from Cauchy-Schwarz.
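Both inequalities are easy to confirm numerically on random vectors (a quick sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(5)
v = rng.standard_normal(5)

# Cauchy-Schwarz: |<u, v>| <= ||u|| * ||v||
cs_holds = abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v)

# Triangle inequality: ||u + v|| <= ||u|| + ||v||
tri_holds = np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v)
```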
Orthogonality
Two vectors are orthogonal (perpendicular) if ⟨u, v⟩ = 0, written u ⊥ v.
A set of vectors is orthogonal if every pair is orthogonal.
A set is orthonormal if orthogonal and every vector has unit norm.
Properties
- A set of pairwise orthogonal non-zero vectors is linearly independent.
- An orthonormal basis {e₁, ..., eₙ} satisfies ⟨eᵢ, eⱼ⟩ = δᵢⱼ (Kronecker delta).
- In an orthonormal basis, coordinates are simply: cᵢ = ⟨v, eᵢ⟩.
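A small sketch of reading off coordinates in an orthonormal basis of ℝ² (here, the standard basis rotated 45°):

```python
import numpy as np

e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2)

v = np.array([2.0, 4.0])

# Coordinates are plain inner products: c_i = <v, e_i>
c1, c2 = v @ e1, v @ e2
reconstructed = c1 * e1 + c2 * e2   # recovers v
```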
Orthogonal Complement
The orthogonal complement of subspace W:
W⊥ = {v ∈ V | ⟨v, w⟩ = 0 for all w ∈ W}
Properties (for finite-dimensional V):
- W⊥ is a subspace.
- dim(W) + dim(W⊥) = dim(V).
- V = W ⊕ W⊥ (direct sum).
- (W⊥)⊥ = W.
For a matrix A: (col(A))⊥ = null(Aᵀ) and (row(A))⊥ = null(A). This gives the fundamental theorem of linear algebra (four fundamental subspaces).
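The relation (col(A))⊥ = null(Aᵀ) can be checked numerically; here the null space of Aᵀ is read off from an SVD (a sketch using a rank-1 example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])   # rank 1

# Rows of Vt beyond the rank of A^T span null(A^T)
U, s, Vt = np.linalg.svd(A.T)
rank = int(np.sum(s > 1e-12))
null_AT = Vt[rank:].T          # columns form a basis for null(A^T)

# Every vector in null(A^T) is orthogonal to every column of A
check = A.T @ null_AT
```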
Gram-Schmidt Process
Converts any basis {v₁, ..., vₙ} into an orthonormal basis {e₁, ..., eₙ}.
Algorithm:
u₁ = v₁
u₂ = v₂ - proj_{u₁}(v₂)
u₃ = v₃ - proj_{u₁}(v₃) - proj_{u₂}(v₃)
⋮
uₖ = vₖ - Σᵢ₌₁ᵏ⁻¹ proj_{uᵢ}(vₖ)
eᵢ = uᵢ / ‖uᵢ‖
where proj_u(v) = (⟨v, u⟩ / ⟨u, u⟩) · u.
Time: O(m²n) for m vectors in ℝⁿ (each vector is projected against up to m − 1 previously computed vectors, and each projection costs O(n)).
Numerical note: Classical Gram-Schmidt is numerically unstable: rounding errors accumulate and the computed vectors lose orthogonality. Modified Gram-Schmidt subtracts each projection immediately, so later projections use the already-updated vectors; it is considerably more stable. For the best stability, use Householder reflections (as in QR decomposition).
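A minimal sketch of modified Gram-Schmidt (assuming linearly independent columns; not production code):

```python
import numpy as np

def modified_gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    V = np.array(V, dtype=float)   # work on a copy
    m, n = V.shape
    Q = np.zeros((m, n))
    for i in range(n):
        q = V[:, i] / np.linalg.norm(V[:, i])
        Q[:, i] = q
        # Subtract the q-component from the remaining columns immediately,
        # so later projections use already-orthogonalized data
        for j in range(i + 1, n):
            V[:, j] -= (q @ V[:, j]) * q
    return Q

V = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
Q = modified_gram_schmidt(V)
```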
Orthogonal Projection
The orthogonal projection of v onto subspace W is the closest point in W to v.
If {u₁, ..., uₖ} is an orthonormal basis for W:
proj_W(v) = Σᵢ₌₁ᵏ ⟨v, uᵢ⟩ · uᵢ
The projection matrix: P = UUᵀ where U = [u₁ | ... | uₖ].
Properties of projection matrices:
- P² = P (idempotent)
- Pᵀ = P (symmetric)
- I - P is the projection onto W⊥
Projection onto column space
For projection onto col(A), where A has linearly independent columns:
proj_{col(A)}(v) = A(AᵀA)⁻¹Aᵀv
The projection matrix is P = A(AᵀA)⁻¹Aᵀ.
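A sketch verifying the projection-matrix properties on a small example:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # linearly independent columns

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection onto col(A)

idempotent = np.allclose(P @ P, P)     # P^2 = P
symmetric  = np.allclose(P.T, P)       # P^T = P

b = np.array([1.0, 3.0, 2.0])
residual = b - P @ b
orthogonal_residual = np.allclose(A.T @ residual, 0)   # residual is in col(A)-perp
```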
QR Decomposition
Every m × n matrix A (with m ≥ n) can be factored as:
A = QR
where Q is m × n with orthonormal columns (QᵀQ = I) and R is n × n upper triangular.
Computing QR:
- Gram-Schmidt: Apply to columns of A. Q = orthonormalized columns, R = upper triangular of coefficients. Numerically unstable.
- Householder reflections: Apply reflections to zero out below-diagonal entries. Numerically stable. About 2mn² − 2n³/3 flops.
- Givens rotations: Zero out individual entries. Good for sparse matrices.
Applications:
- Solving least squares: Ax ≈ b → Rx = Qᵀb (back-substitution).
- Eigenvalue computation: QR algorithm.
- Orthonormalization.
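In NumPy the (reduced) factorization is available directly (a quick sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

Q, R = np.linalg.qr(A)   # reduced QR: Q is 3x2, R is 2x2

has_orthonormal_cols = np.allclose(Q.T @ Q, np.eye(2))
is_upper_triangular  = np.allclose(R, np.triu(R))
reconstructs_A       = np.allclose(Q @ R, A)
```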
Least Squares Approximation
When Ax = b has no exact solution (overdetermined system), find x that minimizes ‖Ax - b‖².
Normal Equations
The solution satisfies:
AᵀAx = Aᵀb
If AᵀA is invertible (i.e. A has linearly independent columns): x̂ = (AᵀA)⁻¹Aᵀb.
The matrix A(AᵀA)⁻¹Aᵀ is the projection onto col(A). The residual b − Ax̂ is orthogonal to col(A).
Via QR Decomposition
A = QR, so AᵀAx = Aᵀb becomes RᵀQᵀQRx = RᵀQᵀb; since QᵀQ = I and R is invertible, this reduces to Rx = Qᵀb. Solve by back-substitution. More numerically stable than forming the normal equations.
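On a well-conditioned problem the normal equations and the QR route agree (a sketch; the data is made up):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Normal equations (squares the condition number: cond(A^T A) = cond(A)^2)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QR route: R x = Q^T b (R is triangular, so back-substitution applies;
# np.linalg.solve is used here for brevity)
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)
```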
Via SVD
The most numerically robust method. See next file.
Geometric Interpretation
Least squares finds the projection of b onto the column space of A. The solution x̂ gives the coordinates in terms of the columns of A.
Ax̂ = proj_{col(A)}(b)
Example: Linear regression. Fit y = β₀ + β₁x to data points (xᵢ, yᵢ).
A = [1  x₁]      b = [y₁]
    [1  x₂]          [y₂]
    [⋮   ⋮]          [ ⋮]
    [1  xₙ]          [yₙ]
β = (AᵀA)⁻¹Aᵀb
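A sketch of this setup with `np.linalg.lstsq` (the data points are invented):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Design matrix: a column of ones for the intercept, then the x values
A = np.column_stack([np.ones_like(x), x])

beta, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
b0, b1 = beta   # intercept and slope of the fitted line
```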
Fourier Series as Orthogonal Projection
The functions {1, cos(x), sin(x), cos(2x), sin(2x), ...} form an orthogonal set in C([−π, π]) with inner product ⟨f, g⟩ = ∫ f(x)g(x) dx taken over [−π, π].
The Fourier series of f is its projection onto this orthogonal set:
f(x) ≈ a₀/2 + Σₙ (aₙ cos(nx) + bₙ sin(nx))
where aₙ = (1/π)⟨f, cos(nx)⟩ and bₙ = (1/π)⟨f, sin(nx)⟩.
This is a best approximation in the L² sense.
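A sketch computing the first coefficients numerically for f(x) = x, approximating the integrals by a Riemann sum on a fine grid:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
f = x                                     # f(x) = x on [-pi, pi]

# a_n = (1/pi) <f, cos(nx)>,  b_n = (1/pi) <f, sin(nx)>
a1 = np.sum(f * np.cos(x)) * dx / np.pi   # near 0: f is odd
b1 = np.sum(f * np.sin(x)) * dx / np.pi   # exact value is 2
```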
Real-World Applications
- Linear regression: Least squares fitting is orthogonal projection.
- Signal processing: Fourier analysis decomposes signals into orthogonal frequency components.
- Computer graphics: Orthogonal projection for rendering, normal vectors for lighting.
- Machine learning: Kernel methods define inner products on feature spaces. SVM uses the inner product to compute margins.
- Quantum mechanics: States are vectors in a Hilbert space (inner product space). Measurements are projections.
- Data compression: Project data onto principal components (orthogonal directions of maximum variance).
- Numerical methods: QR decomposition for solving linear systems, eigenvalue computation, and least squares.
- Error-correcting codes: Orthogonality of codewords enables error detection and correction.