Linear Transformations
A linear transformation (or linear map) is a function between vector spaces that preserves the two fundamental operations: addition and scalar multiplication.
Definition
A function T: V → W is a linear transformation if for all u, v ∈ V and c ∈ F:
T(u + v) = T(u) + T(v) (preserves addition)
T(cv) = cT(v) (preserves scalar multiplication)
Equivalently (combined): T(au + bv) = aT(u) + bT(v) for all scalars a, b.
Consequence: T(0) = 0 (linear maps always map the zero vector to zero).
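The two axioms (and the T(0) = 0 consequence) can be checked numerically. A minimal sketch, using a hypothetical matrix map T(v) = Av with illustrative values for A, u, v, and c:

```python
import numpy as np

# Hypothetical example: verify the linearity axioms for T(v) = A @ v.
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])

def T(v):
    return A @ v

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
c = 4.0

assert np.allclose(T(u + v), T(u) + T(v))   # preserves addition
assert np.allclose(T(c * v), c * T(v))      # preserves scalar multiplication
assert np.allclose(T(np.zeros(2)), 0.0)     # consequence: T(0) = 0
```

A finite check like this cannot prove linearity, but it is a quick sanity test for any map you implement.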
Examples
Rotation in ℝ² by angle θ:
T(x, y) = (x cos θ - y sin θ, x sin θ + y cos θ)
Projection onto the x-axis: T(x, y) = (x, 0).
Scaling by factor c: T(v) = cv.
Differentiation: T: Pₙ → Pₙ₋₁, T(p) = p' (derivative). Linear because (f + g)' = f' + g' and (cf)' = cf'.
Integration: T: C([0,1]) → ℝ, T(f) = ∫₀¹ f(x)dx. Linear by linearity of integration.
Matrix multiplication: T(x) = Ax for fixed matrix A. This is the canonical example — every linear transformation between finite-dimensional spaces can be represented this way.
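The rotation example above fits this matrix form directly. A sketch (the function name `rotation_matrix` is our own):

```python
import numpy as np

def rotation_matrix(theta):
    """2x2 matrix of counterclockwise rotation by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# Rotating (1, 0) by 90 degrees should give (0, 1).
R = rotation_matrix(np.pi / 2)
result = R @ np.array([1.0, 0.0])
```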
Non-Examples
- T(x) = x + 1 (not linear: T(0) = 1 ≠ 0)
- T(x) = x² (not linear: T(2x) = 4x² ≠ 2x² = 2T(x))
- T(x) = |x| (not linear: T(-1 + 1) = T(0) = 0, but T(-1) + T(1) = 1 + 1 = 2)

Kernel (Null Space)
The kernel (or null space) of T: V → W:
ker(T) = {v ∈ V | T(v) = 0}
The kernel is always a subspace of V.
Proof: 0 ∈ ker(T) since T(0) = 0. If u, v ∈ ker(T), then T(u + v) = T(u) + T(v) = 0 + 0 = 0. If v ∈ ker(T) and c ∈ F, then T(cv) = cT(v) = c · 0 = 0. ∎
For T(x) = Ax: ker(T) = null space of A = solution set of Ax = 0.
T is injective (one-to-one) iff ker(T) = {0}.
Proof: (⟹) If T(v) = 0 = T(0) and T is injective, then v = 0. (⟸) If T(u) = T(v), then T(u - v) = 0, so u - v ∈ ker(T) = {0}, so u = v. ∎
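For the matrix case, the kernel can be computed numerically; `scipy.linalg.null_space` returns an orthonormal basis for it. A sketch using the projection-onto-the-x-axis example from above:

```python
import numpy as np
from scipy.linalg import null_space

# Projection onto the x-axis: its kernel is the y-axis.
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

K = null_space(A)   # orthonormal basis for ker(A); one column here
assert K.shape == (2, 1)          # 1-dimensional kernel
assert np.allclose(A @ K, 0.0)    # every basis vector maps to 0
```

A nonzero kernel here confirms the projection is not injective, consistent with the criterion above.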
Image (Range)
The image (or range) of T: V → W:
im(T) = {T(v) | v ∈ V} = {w ∈ W | ∃v ∈ V, T(v) = w}
The image is always a subspace of W.
For T(x) = Ax: im(T) = column space of A = span of columns of A.
T is surjective (onto) iff im(T) = W.
Rank-Nullity Theorem
The most important theorem about linear transformations:
dim(V) = dim(ker(T)) + dim(im(T))
Or equivalently:
dim(V) = nullity(T) + rank(T)
where nullity = dim(ker(T)) and rank = dim(im(T)).
For a matrix A ∈ ℝᵐˣⁿ:
n = nullity(A) + rank(A)
Example: A is 5 × 3 with rank 2. Then nullity = 3 - 2 = 1 (the null space is 1-dimensional).
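The 5 × 3 example can be verified numerically. A sketch, with an illustrative matrix built to have rank 2 (the third column is the sum of the first two):

```python
import numpy as np
from scipy.linalg import null_space

# A 5x3 matrix of rank 2: the third column depends on the first two.
A = np.zeros((5, 3))
A[:, 0] = [1, 0, 0, 1, 2]
A[:, 1] = [0, 1, 0, 1, 0]
A[:, 2] = A[:, 0] + A[:, 1]   # dependent column forces rank 2

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]
# Rank-nullity: rank + nullity equals n = 3, the number of columns.
```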
Consequences
- If dim(V) = dim(W) (both finite-dimensional), then T is injective iff T is surjective.
- If dim(V) > dim(W), T cannot be injective (ker must be non-trivial).
- If dim(V) < dim(W), T cannot be surjective (im is too small).
Matrix Representation
Every linear transformation T: ℝⁿ → ℝᵐ can be represented by a unique m × n matrix A such that T(x) = Ax.
Finding the Matrix
If T: V → W with bases B = {v₁, ..., vₙ} and C = {w₁, ..., wₘ}:
The matrix [T]_B^C has columns that are the C-coordinates of T(v₁), T(v₂), ..., T(vₙ):
Column j of [T]_B^C = [T(vⱼ)]_C
Standard matrix: When B and C are standard bases, A = [T(e₁) | T(e₂) | ... | T(eₙ)].
Example: T: ℝ² → ℝ², T(x,y) = (2x + y, x - y).
T(e₁) = T(1,0) = (2, 1)
T(e₂) = T(0,1) = (1, -1)
A = [[2, 1], [1, -1]]
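The construction generalizes: stack the images of the standard basis vectors as columns. A sketch for the example above:

```python
import numpy as np

def T(v):
    x, y = v
    return np.array([2*x + y, x - y])

# Columns of the standard matrix are T(e1), T(e2).
A = np.column_stack([T(np.array([1.0, 0.0])),
                     T(np.array([0.0, 1.0]))])

# A @ v now agrees with T(v) for any v.
v = np.array([3.0, 5.0])
assert np.allclose(A @ v, T(v))
```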
Change of Basis for Transformations
If B and B' are bases for V, and [T]_B is the matrix of T in basis B:
[T]_{B'} = P⁻¹ [T]_B P
where P is the change of basis matrix from B' to B.
Two matrices A and B are similar (A = P⁻¹BP for some invertible P) iff they represent the same linear transformation in different bases.
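The change-of-basis formula can be carried out numerically. A sketch with illustrative values: the columns of P are a hypothetical new basis B' written in B-coordinates, and invariants like trace and determinant survive the conjugation:

```python
import numpy as np

# [T]_B in the original basis, and a change-of-basis matrix P whose
# columns are the B'-basis vectors in B-coordinates (hypothetical values).
T_B = np.array([[2.0, 1.0],
                [1.0, -1.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])

T_Bp = np.linalg.inv(P) @ T_B @ P   # the same map expressed in basis B'

# Similar matrices share trace and determinant.
assert np.isclose(np.trace(T_Bp), np.trace(T_B))
assert np.isclose(np.linalg.det(T_Bp), np.linalg.det(T_B))
```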
Composition
If T: U → V and S: V → W, the composition S ∘ T: U → W is linear:
(S ∘ T)(u) = S(T(u)) for u ∈ U
The matrix of S ∘ T is the product of matrices: [S ∘ T] = [S][T].
This is why matrix multiplication is defined the way it is.
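A sketch of this correspondence, composing two of the example transformations (rotation by 90° followed by projection onto the x-axis):

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # rotation by pi/2
Pxy = np.array([[1.0, 0.0],
                [0.0, 0.0]])  # projection onto the x-axis

v = np.array([1.0, 2.0])
# Applying T then S agrees with multiplying by the single matrix [S][T].
assert np.allclose(Pxy @ (R @ v), (Pxy @ R) @ v)
```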
Isomorphisms
A linear transformation T: V → W is an isomorphism if it is bijective (both injective and surjective).
Equivalently: T is invertible; the inverse T⁻¹ is then automatically linear.
For finite-dimensional spaces: V ≅ W iff dim(V) = dim(W).
In particular: Every n-dimensional vector space over F is isomorphic to Fⁿ.
Consequence: The choice of basis establishes an isomorphism between V and Fⁿ. This is why we can "always work with matrices."
Similarity
Matrices A and B are similar if there exists an invertible P such that B = P⁻¹AP.
Similar matrices represent the same linear transformation in different bases.
Similar matrices share:
- Determinant
- Trace
- Eigenvalues (with same multiplicities)
- Rank
- Characteristic polynomial
- Minimal polynomial
The goal of many algorithms (diagonalization, Jordan form) is to find a "nice" similar matrix.
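The shared invariants can be spot-checked numerically. A sketch with a random matrix and a random (generically invertible) conjugating matrix, both illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))   # generically invertible
B = np.linalg.inv(P) @ A @ P      # B is similar to A

# Trace, determinant, and characteristic polynomial coincide.
assert np.isclose(np.trace(B), np.trace(A))
assert np.isclose(np.linalg.det(B), np.linalg.det(A))
assert np.allclose(np.poly(B), np.poly(A))   # char. poly coefficients
```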
Important Linear Transformations
| Transformation | Matrix (2D) | Effect |
|---|---|---|
| Identity | I | No change |
| Scaling by c | cI | Uniform scale |
| Rotation by θ | [[cos θ, -sin θ], [sin θ, cos θ]] | Rotate counterclockwise |
| Reflection over x-axis | [[1, 0], [0, -1]] | Flip vertically |
| Shear | [[1, k], [0, 1]] | Horizontal shear |
| Projection onto x-axis | [[1, 0], [0, 0]] | Collapse to x-axis |
In 3D (and higher), transformations compose by matrix multiplication, which is the foundation of computer graphics pipelines.
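The table above translates directly into code. A sketch applying each matrix to the illustrative point (1, 1), with k = 2 chosen for the shear:

```python
import numpy as np

theta = np.pi / 2
transforms = {
    "identity":  np.eye(2),
    "scale_3":   3 * np.eye(2),
    "rotate_90": np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta),  np.cos(theta)]]),
    "reflect_x": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "shear_k2":  np.array([[1.0, 2.0], [0.0, 1.0]]),
    "project_x": np.array([[1.0, 0.0], [0.0, 0.0]]),
}

v = np.array([1.0, 1.0])
images = {name: M @ v for name, M in transforms.items()}
```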
Dual Space
The dual space V* is the vector space of all linear functionals f: V → F.
dim(V*) = dim(V).
The dual basis {f₁, ..., fₙ} satisfies fᵢ(vⱼ) = δᵢⱼ (Kronecker delta).
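In coordinates, if the basis vectors vⱼ form the columns of a matrix V, then the dual basis functionals fᵢ are the rows of V⁻¹, so that fᵢ(vⱼ) = (V⁻¹V)ᵢⱼ = δᵢⱼ. A sketch with a hypothetical basis of ℝ²:

```python
import numpy as np

# Basis of R^2 as columns (illustrative choice).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Rows of V^{-1} are the dual basis functionals.
F = np.linalg.inv(V)
assert np.allclose(F @ V, np.eye(2))   # f_i(v_j) = Kronecker delta
```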
The dual space is important in:
- Optimization (Lagrange multipliers are dual vectors)
- Differential geometry (covectors, 1-forms)
- Quantum mechanics (bra vectors ⟨ψ|)
- Functional analysis
Real-World Applications
- Computer graphics: All 3D transformations (rotation, translation, scaling, projection) are linear maps represented as 4×4 matrices (using homogeneous coordinates for translation).
- Machine learning: Neural network layers are affine maps (a linear transformation plus a bias) followed by non-linear activations: y = σ(Wx + b).
- Image processing: Convolution (a linear operation) filters images. The blur, sharpen, and edge-detect operations are linear transformations.
- Control theory: State-space models use x(t+1) = Ax(t) + Bu(t). The matrix A describes system dynamics.
- Quantum computing: Quantum gates are unitary linear transformations on qubit state vectors.
- Signal processing: The Fourier transform is a linear transformation. So are wavelets, DCT, etc.
- Data science: Dimensionality reduction (PCA) projects data onto a lower-dimensional subspace via a linear map.
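The homogeneous-coordinates trick mentioned in the graphics bullet can be sketched in 2D (a 3×3 matrix instead of the 4×4 used in 3D pipelines); the function name `translation` and the sample point are illustrative:

```python
import numpy as np

# Translation is affine, not linear, but in homogeneous coordinates it
# becomes a single linear map (matrix multiplication).
def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

p = np.array([2.0, 3.0, 1.0])      # the point (2, 3) in homogeneous form
q = translation(5.0, -1.0) @ p     # translated to (7, 2)
```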