Linear Transformations
A linear transformation (or linear map) is a function between vector spaces that preserves the two fundamental operations: addition and scalar multiplication.
Definition
A function T: V → W is a linear transformation if for all u, v ∈ V and c ∈ F:
T(u + v) = T(u) + T(v) (preserves addition)
T(cv) = cT(v) (preserves scalar multiplication)
Equivalently (combined): T(au + bv) = aT(u) + bT(v) for all scalars a, b.
Consequence: T(0) = 0 (linear maps always map the zero vector to zero).
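The two axioms (and the T(0) = 0 consequence) can be checked numerically. A minimal sketch, using a hypothetical matrix map T(v) = Av with illustrative values for A, u, v, and c:

```python
import numpy as np

# Hypothetical example: verify the linearity axioms for T(v) = A @ v.
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])

def T(v):
    return A @ v

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
c = 4.0

assert np.allclose(T(u + v), T(u) + T(v))   # preserves addition
assert np.allclose(T(c * v), c * T(v))      # preserves scalar multiplication
assert np.allclose(T(np.zeros(2)), 0.0)     # consequence: T(0) = 0
```

A finite check like this cannot prove linearity, but it is a quick sanity test for any map you implement.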
Examples
Rotation in ℝ² by angle θ:
T(x, y) = (x cos θ - y sin θ, x sin θ + y cos θ)
Projection onto the x-axis: T(x, y) = (x, 0).
Scaling by factor c: T(v) = cv.
Differentiation: T: Pₙ → Pₙ₋₁, T(p) = p' (derivative). Linear because (f + g)' = f' + g' and (cf)' = cf'.
Integration: T: C([0,1]) → ℝ, T(f) = ∫₀¹ f(x)dx. Linear by linearity of integration.
Matrix multiplication: T(x) = Ax for fixed matrix A. This is the canonical example — every linear transformation between finite-dimensional spaces can be represented this way.
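The rotation example above fits this matrix form directly. A sketch (the function name `rotation_matrix` is our own):

```python
import numpy as np

def rotation_matrix(theta):
    """2x2 matrix of counterclockwise rotation by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# Rotating (1, 0) by 90 degrees should give (0, 1).
R = rotation_matrix(np.pi / 2)
result = R @ np.array([1.0, 0.0])
```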
Non-Examples
- T(x) = x + 1 (not linear: T(0) = 1 ≠ 0)
- T(x) = x² (not linear: T(2x) = 4x² ≠ 2x² = 2T(x))
- T(x) = |x| (not linear: T(-1 + 1) = T(0) = 0, but T(-1) + T(1) = 1 + 1 = 2)

Kernel (Null Space)
The kernel (or null space) of T: V → W:
ker(T) = {v ∈ V | T(v) = 0}
The kernel is always a subspace of V.
Proof: 0 ∈ ker(T) since T(0) = 0. If u, v ∈ ker(T), then T(u + v) = T(u) + T(v) = 0 + 0 = 0. If v ∈ ker(T) and c ∈ F, then T(cv) = cT(v) = c · 0 = 0. ∎
For T(x) = Ax: ker(T) = null space of A = solution set of Ax = 0.
T is injective (one-to-one) iff ker(T) = {0}.
Proof: (⟹) If T(v) = 0 = T(0) and T is injective, then v = 0. (⟸) If T(u) = T(v), then T(u - v) = 0, so u - v ∈ ker(T) = {0}, so u = v. ∎
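For the matrix case, the kernel can be computed numerically; `scipy.linalg.null_space` returns an orthonormal basis for it. A sketch using the projection-onto-the-x-axis example from above:

```python
import numpy as np
from scipy.linalg import null_space

# Projection onto the x-axis: its kernel is the y-axis.
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

K = null_space(A)   # orthonormal basis for ker(A); one column here
assert K.shape == (2, 1)          # 1-dimensional kernel
assert np.allclose(A @ K, 0.0)    # every basis vector maps to 0
```

A nonzero kernel here confirms the projection is not injective, consistent with the criterion above.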
Image (Range)
The image (or range) of T: V → W:
im(T) = {T(v) | v ∈ V} = {w ∈ W | ∃v ∈ V, T(v) = w}
The image is always a subspace of W.
For T(x) = Ax: im(T) = column space of A = span of columns of A.
T is surjective (onto) iff im(T) = W.
Rank-Nullity Theorem
The most important theorem about linear transformations:
dim(V) = dim(ker(T)) + dim(im(T))
Or equivalently:
dim(V) = nullity(T) + rank(T)
where nullity = dim(ker(T)) and rank = dim(im(T)).
For a matrix A ∈ ℝᵐˣⁿ:
n = nullity(A) + rank(A)
Example: A is 5 × 3 with rank 2. Then nullity = 3 - 2 = 1 (the null space is 1-dimensional).
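The 5 × 3 example can be verified numerically. A sketch, with an illustrative matrix built to have rank 2 (the third column is the sum of the first two):

```python
import numpy as np
from scipy.linalg import null_space

# A 5x3 matrix of rank 2: the third column depends on the first two.
A = np.zeros((5, 3))
A[:, 0] = [1, 0, 0, 1, 2]
A[:, 1] = [0, 1, 0, 1, 0]
A[:, 2] = A[:, 0] + A[:, 1]   # dependent column forces rank 2

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]
# Rank-nullity: rank + nullity equals n = 3, the number of columns.
```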
Consequences
- If dim(V) = dim(W) (both finite-dimensional), then T is injective iff T is surjective.
- If dim(V) > dim(W), T cannot be injective (ker must be non-trivial).
- If dim(V) < dim(W), T cannot be surjective (im is too small).
Matrix Representation
Every linear transformation T: ℝⁿ → ℝᵐ can be represented by a unique m × n matrix A such that T(x) = Ax.
Finding the Matrix
If T: V → W with bases B = {v₁, ..., vₙ} and C = {w₁, ..., wₘ}:
The matrix [T]_B^C has columns that are the C-coordinates of T(v₁), T(v₂), ..., T(vₙ):
Column j of [T]_B^C = [T(vⱼ)]_C
Standard matrix: When B and C are standard bases, A = [T(e₁) | T(e₂) | ... | T(eₙ)].
Example: T: ℝ² → ℝ², T(x,y) = (2x + y, x - y).
T(e₁) = T(1,0) = (2, 1)
T(e₂) = T(0,1) = (1, -1)
A = [[2, 1], [1, -1]]
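The construction generalizes: stack the images of the standard basis vectors as columns. A sketch for the example above:

```python
import numpy as np

def T(v):
    x, y = v
    return np.array([2*x + y, x - y])

# Columns of the standard matrix are T(e1), T(e2).
A = np.column_stack([T(np.array([1.0, 0.0])),
                     T(np.array([0.0, 1.0]))])

# A @ v now agrees with T(v) for any v.
v = np.array([3.0, 5.0])
assert np.allclose(A @ v, T(v))
```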
Change of Basis for Transformations
If B and B' are bases for V, and [T]_B is the matrix of T in basis B:
[T]_{B'} = P⁻¹ [T]_B P
where P is the change of basis matrix from B' to B.
Two matrices A and B are similar (A = P⁻¹BP for some invertible P) iff they represent the same linear transformation in different bases.
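The change-of-basis formula can be carried out numerically. A sketch with illustrative values: the columns of P are a hypothetical new basis B' written in B-coordinates, and invariants like trace and determinant survive the conjugation:

```python
import numpy as np

# [T]_B in the original basis, and a change-of-basis matrix P whose
# columns are the B'-basis vectors in B-coordinates (hypothetical values).
T_B = np.array([[2.0, 1.0],
                [1.0, -1.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])

T_Bp = np.linalg.inv(P) @ T_B @ P   # the same map expressed in basis B'

# Similar matrices share trace and determinant.
assert np.isclose(np.trace(T_Bp), np.trace(T_B))
assert np.isclose(np.linalg.det(T_Bp), np.linalg.det(T_B))
```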
Composition
If T: U → V and S: V → W, the composition S ∘ T: U → W is linear:
(S ∘ T)(u) = S(T(u)) for u ∈ U
The matrix of S ∘ T is the product of matrices: [S ∘ T] = [S][T].
This is why matrix multiplication is defined the way it is.
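A sketch of this correspondence, composing two of the example transformations (rotation by 90° followed by projection onto the x-axis):

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # rotation by pi/2
Pxy = np.array([[1.0, 0.0],
                [0.0, 0.0]])  # projection onto the x-axis

v = np.array([1.0, 2.0])
# Applying T then S agrees with multiplying by the single matrix [S][T].
assert np.allclose(Pxy @ (R @ v), (Pxy @ R) @ v)
```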
Isomorphisms
A linear transformation T: V → W is an isomorphism if it is bijective (both injective and surjective).
Equivalently: T is invertible; the inverse T⁻¹ is then automatically linear.
For finite-dimensional spaces: V ≅ W iff dim(V) = dim(W).
In particular: Every n-dimensional vector space over F is isomorphic to Fⁿ.
Consequence: The choice of basis establishes an isomorphism between V and Fⁿ. This is why we can "always work with matrices."
Similarity
Matrices A and B are similar if there exists an invertible P such that B = P⁻¹AP.
Similar matrices represent the same linear transformation in different bases.
Similar matrices share:
- Determinant
- Trace
- Eigenvalues (with same multiplicities)
- Rank
- Characteristic polynomial
- Minimal polynomial
The goal of many algorithms (diagonalization, Jordan form) is to find a "nice" similar matrix.
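The shared invariants can be spot-checked numerically. A sketch with a random matrix and a random (generically invertible) conjugating matrix, both illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))   # generically invertible
B = np.linalg.inv(P) @ A @ P      # B is similar to A

# Trace, determinant, and characteristic polynomial coincide.
assert np.isclose(np.trace(B), np.trace(A))
assert np.isclose(np.linalg.det(B), np.linalg.det(A))
assert np.allclose(np.poly(B), np.poly(A))   # char. poly coefficients
```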
Important Linear Transformations
| Transformation | Matrix (2D) | Effect |
|---|---|---|
| Identity | I | No change |
| Scaling by c | cI | Uniform scale |
| Rotation by θ | [[cos θ, -sin θ], [sin θ, cos θ]] | Rotate counterclockwise |
| Reflection over x-axis | [[1, 0], [0, -1]] | Flip vertically |
| Shear | [[1, k], [0, 1]] | Horizontal shear |
| Projection onto x-axis | [[1, 0], [0, 0]] | Collapse to x-axis |
In 3D (and higher), transformations compose by matrix multiplication, which is the foundation of computer graphics pipelines.
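The table above translates directly into code. A sketch applying each matrix to the illustrative point (1, 1), with k = 2 chosen for the shear:

```python
import numpy as np

theta = np.pi / 2
transforms = {
    "identity":  np.eye(2),
    "scale_3":   3 * np.eye(2),
    "rotate_90": np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta),  np.cos(theta)]]),
    "reflect_x": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "shear_k2":  np.array([[1.0, 2.0], [0.0, 1.0]]),
    "project_x": np.array([[1.0, 0.0], [0.0, 0.0]]),
}

v = np.array([1.0, 1.0])
images = {name: M @ v for name, M in transforms.items()}
```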
Dual Space
The dual space V* is the vector space of all linear functionals f: V → F.
dim(V*) = dim(V).
The dual basis {f₁, ..., fₙ} satisfies fᵢ(vⱼ) = δᵢⱼ (Kronecker delta).
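In coordinates, if the basis vectors vⱼ form the columns of a matrix V, then the dual basis functionals fᵢ are the rows of V⁻¹, so that fᵢ(vⱼ) = (V⁻¹V)ᵢⱼ = δᵢⱼ. A sketch with a hypothetical basis of ℝ²:

```python
import numpy as np

# Basis of R^2 as columns (illustrative choice).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Rows of V^{-1} are the dual basis functionals.
F = np.linalg.inv(V)
assert np.allclose(F @ V, np.eye(2))   # f_i(v_j) = Kronecker delta
```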
The dual space is important in:
- Optimization (Lagrange multipliers are dual vectors)
- Differential geometry (covectors, 1-forms)
- Quantum mechanics (bra vectors ⟨ψ|)
- Functional analysis
Real-World Applications
- Computer graphics: All 3D transformations (rotation, translation, scaling, projection) are linear maps represented as 4×4 matrices (using homogeneous coordinates for translation).
- Machine learning: Neural network layers are affine maps (a linear transformation plus a bias) followed by non-linear activations: y = σ(Wx + b).
- Image processing: Convolution (a linear operation) filters images. The blur, sharpen, and edge-detect operations are linear transformations.
- Control theory: State-space models use x(t+1) = Ax(t) + Bu(t). The matrix A describes system dynamics.
- Quantum computing: Quantum gates are unitary linear transformations on qubit state vectors.
- Signal processing: The Fourier transform is a linear transformation. So are wavelets, DCT, etc.
- Data science: Dimensionality reduction (PCA) projects data onto a lower-dimensional subspace via a linear map.
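The homogeneous-coordinates trick mentioned in the graphics bullet can be sketched in 2D (a 3×3 matrix instead of the 4×4 used in 3D pipelines); the function name `translation` and the sample point are illustrative:

```python
import numpy as np

# Translation is affine, not linear, but in homogeneous coordinates it
# becomes a single linear map (matrix multiplication).
def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

p = np.array([2.0, 3.0, 1.0])      # the point (2, 3) in homogeneous form
q = translation(5.0, -1.0) @ p     # translated to (7, 2)
```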