Matrices

A matrix is a rectangular array of numbers arranged in rows and columns. Matrices represent linear transformations, systems of equations, graphs, and data.

Basic Definitions

An m × n matrix A has m rows and n columns:

A = [a₁₁  a₁₂  ...  a₁ₙ]
    [a₂₁  a₂₂  ...  a₂ₙ]
    [ ⋮    ⋮   ⋱    ⋮  ]
    [aₘ₁  aₘ₂  ...  aₘₙ]

We write A ∈ ℝᵐˣⁿ or Aᵢⱼ for the entry in row i, column j.

A square matrix has m = n.

Matrix Operations

Addition and Scalar Multiplication

Element-wise (matrices must have the same dimensions):

(A + B)ᵢⱼ = Aᵢⱼ + Bᵢⱼ
(cA)ᵢⱼ = c · Aᵢⱼ

Matrix Multiplication

For A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ, the product C = AB ∈ ℝᵐˣᵖ:

Cᵢⱼ = Σₖ₌₁ⁿ Aᵢₖ Bₖⱼ

Row i of A dotted with column j of B.

Properties:

  • Associative: (AB)C = A(BC)
  • Distributive: A(B + C) = AB + AC
  • NOT commutative: AB ≠ BA in general
  • Identity: AI = IA = A where I is the identity matrix
  • (AB)ᵀ = BᵀAᵀ (transpose reverses order)

Complexity: Naive multiplication of two n × n matrices: O(n³). Strassen: O(n^2.807). Best known: O(n^2.371) (theoretical).
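The defining sum Cᵢⱼ = Σₖ Aᵢₖ Bₖⱼ translates directly into the naive O(mnp) triple loop; a minimal pure-Python sketch (the function name is illustrative):

```python
def matmul(A, B):
    """Naive matrix product: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
matmul(A, B)   # [[19, 22], [43, 50]]
```

Swapping the arguments gives a different result, illustrating that matrix multiplication is not commutative.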

Transpose

(Aᵀ)ᵢⱼ = Aⱼᵢ

Flips rows and columns.

Properties:

  • (Aᵀ)ᵀ = A
  • (A + B)ᵀ = Aᵀ + Bᵀ
  • (cA)ᵀ = cAᵀ
  • (AB)ᵀ = BᵀAᵀ

Special Matrix Types

Diagonal Matrix

Non-zero entries only on the main diagonal: Aᵢⱼ = 0 for i ≠ j.

D = diag(d₁, d₂, ..., dₙ) = [d₁  0  ...  0 ]
                            [0  d₂  ...  0 ]
                            [⋮   ⋮   ⋱   ⋮ ]
                            [0   0  ... dₙ ]

Diagonal matrices multiply by scaling each coordinate: Dv = (d₁v₁, d₂v₂, ..., dₙvₙ).
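Because Dv only scales each coordinate, code never needs to store the full n × n matrix — keeping just the diagonal gives an O(n) product instead of O(n²). A sketch:

```python
def diag_matvec(d, v):
    """Multiply D = diag(d) by v in O(n), without forming the n x n matrix."""
    assert len(d) == len(v)
    return [di * vi for di, vi in zip(d, v)]

diag_matvec([2, 3, -1], [1, 1, 1])   # [2, 3, -1]
```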

Triangular Matrices

Upper triangular: Aᵢⱼ = 0 for i > j (zeros below diagonal). Lower triangular: Aᵢⱼ = 0 for i < j (zeros above diagonal).

Properties:

  • Product of upper triangular matrices is upper triangular.
  • Determinant = product of diagonal entries.
  • Eigenvalues = diagonal entries.
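Both diagonal-entry properties are easy to check numerically (example matrix chosen here; for this U the diagonal is 1, 4, 6):

```python
import numpy as np

# Upper triangular: determinant and eigenvalues both come from the diagonal.
U = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])

det = np.linalg.det(U)                      # 1 * 4 * 6 = 24
eigs = np.sort(np.linalg.eigvals(U).real)   # [1, 4, 6]
```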

Symmetric Matrix

Aᵀ = A (Aᵢⱼ = Aⱼᵢ).

Properties:

  • All eigenvalues are real.
  • Eigenvectors for distinct eigenvalues are orthogonal.
  • Always diagonalizable: A = QΛQᵀ where Q is orthogonal.
  • Positive definite if all eigenvalues > 0.

Applications: Covariance matrices, graph Laplacians, kernel matrices.
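The spectral decomposition A = QΛQᵀ can be verified with NumPy's `eigh`, which is specialized for symmetric (Hermitian) matrices; a small example chosen here:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric, eigenvalues 1 and 3

lam, Q = np.linalg.eigh(A)   # real eigenvalues, orthogonal eigenvectors
reconstructed = Q @ np.diag(lam) @ Q.T   # equals A up to rounding
```

Since both eigenvalues are positive, this A is also positive definite.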

Orthogonal Matrix

QᵀQ = QQᵀ = I (columns and rows are orthonormal).

Properties:

  • Q⁻¹ = Qᵀ (inverse = transpose)
  • |det(Q)| = 1
  • Preserves lengths: ‖Qv‖ = ‖v‖
  • Preserves angles: (Qu) · (Qv) = u · v
  • Represents rotations (det = +1) and reflections (det = -1)
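A 2D rotation matrix exhibits all of these properties at once; a quick numerical check (angle chosen arbitrarily):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by theta

v = np.array([3.0, 4.0])
# Q^T Q = I, ||Qv|| = ||v||, and det(Q) = +1 (a rotation, not a reflection)
```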

Unitary Matrix (Complex)

U*U = UU* = I where U* = Ūᵀ (conjugate transpose).

The complex analog of orthogonal matrices. Preserves the complex inner product.

Hermitian Matrix (Complex)

A* = A. Complex analog of symmetric matrices. All eigenvalues are real.

Row Echelon Form

Row Operations

Three elementary row operations (don't change the solution set of a linear system):

  1. Swap two rows: Rᵢ ↔ Rⱼ
  2. Scale a row by a non-zero constant: Rᵢ → cRᵢ
  3. Add a multiple of one row to another: Rᵢ → Rᵢ + cRⱼ

Row Echelon Form (REF)

A matrix is in REF if:

  1. All zero rows are at the bottom.
  2. The leading entry (pivot) of each non-zero row is to the right of the pivot above it.
  3. All entries below a pivot are zero.
[1  2  3  4]
[0  0  5  6]     (REF — pivots in columns 1, 3, 4)
[0  0  0  7]
[0  0  0  0]

Reduced Row Echelon Form (RREF)

REF plus:

  4. Each pivot is 1.
  5. Each pivot is the only non-zero entry in its column.

[1  2  0  0]
[0  0  1  0]     (RREF)
[0  0  0  1]
[0  0  0  0]

RREF is unique for a given matrix.

Gaussian Elimination

Algorithm to convert a matrix to REF (or RREF):

  1. Find the leftmost non-zero column (pivot column).
  2. Swap rows to bring a non-zero entry to the pivot position.
  3. Eliminate entries below the pivot using row operations.
  4. Move to the next row and column; repeat.
  5. (For RREF) Back-substitute to eliminate entries above pivots and scale pivots to 1.

Time: O(mn · min(m,n)) for an m × n matrix. O(n³) for square.
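The steps above can be sketched in NumPy (partial pivoting added for numerical stability; a sketch, not production code):

```python
import numpy as np

def ref(A):
    """Reduce A to row echelon form. Returns a new array."""
    A = A.astype(float).copy()
    m, n = A.shape
    row = 0
    for col in range(n):                        # 1. scan columns left to right
        pivot = row + np.argmax(np.abs(A[row:, col]))
        if np.isclose(A[pivot, col], 0.0):
            continue                            # no pivot in this column
        A[[row, pivot]] = A[[pivot, row]]       # 2. swap a non-zero entry up
        for r in range(row + 1, m):             # 3. eliminate below the pivot
            A[r] -= (A[r, col] / A[row, col]) * A[row]
        row += 1                                # 4. next row; repeat
        if row == m:
            break
    return A

R = ref(np.array([[ 2.0,  1.0, -1.0],
                  [-3.0, -1.0,  2.0],
                  [-2.0,  1.0,  2.0]]))
# R has zeros below the diagonal
```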

Solving Linear Systems

Augmented matrix [A | b]:

[1  2  1 | 5]     →     [1  0  0 | 1]
[2  3  1 | 7]     →     [0  1  0 | 1]     (RREF)
[1  1  2 | 6]     →     [0  0  1 | 2]

Solution: x₁ = 1, x₂ = 1, x₃ = 2.

Cases:

  • Unique solution: n pivots for n unknowns.
  • Infinitely many: Free variables (columns without pivots).
  • No solution: Pivot in the augmented column (inconsistent: 0 = non-zero).
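In practice a small system like this is handed to a library solver; a quick check with NumPy on an invertible 3×3 system (values chosen here):

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([5.0, 7.0, 6.0])

x = np.linalg.solve(A, b)   # internally uses LU factorization, not RREF
# A has 3 pivots for 3 unknowns, so this solution is unique
```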

Matrix Rank

The rank of A, written rank(A), is:

  • The number of pivots in REF.
  • The dimension of the column space.
  • The dimension of the row space.
  • The maximum number of linearly independent rows (or columns).

Properties:

  • rank(A) ≤ min(m, n)
  • rank(A) = rank(Aᵀ) (row rank = column rank)
  • rank(AB) ≤ min(rank(A), rank(B))
  • rank(A + B) ≤ rank(A) + rank(B)
  • Full rank: rank(A) = min(m, n)

Rank and Systems of Equations

For Ax = b:

  • Consistent iff rank(A) = rank([A|b]).
  • Unique solution iff rank(A) = n (number of unknowns).
  • The solution space of Ax = 0 has dimension n - rank(A) (nullity).
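Rank and nullity are easy to compute numerically (example matrix chosen here, with a deliberate dependency):

```python
import numpy as np

# Row 3 = row 1 + row 2, so only two rows are independent.
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [3.0, 5.0, 2.0]])

r = np.linalg.matrix_rank(A)   # 2
nullity = A.shape[1] - r       # 3 - 2 = 1
```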

Matrix Inverse

For a square matrix A, the inverse A⁻¹ satisfies:

AA⁻¹ = A⁻¹A = I

A is invertible (non-singular) iff:

  • det(A) ≠ 0
  • rank(A) = n
  • Ax = 0 has only the trivial solution
  • Columns are linearly independent
  • 0 is not an eigenvalue

Computing the Inverse

Method 1: Row reduce [A | I] to [I | A⁻¹].

Method 2: A⁻¹ = (1/det(A)) · adj(A) where adj(A) is the adjugate (matrix of cofactors, transposed). Useful for 2×2 and 3×3, impractical for larger.

2×2 inverse:

[a  b]⁻¹ = (1/(ad-bc)) [ d  -b]
[c  d]                 [-c   a]
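Method 1 can be sketched in a few lines of NumPy (partial pivoting added; assumes A is square and invertible, with no singularity checks):

```python
import numpy as np

def inverse_gauss_jordan(A):
    """Invert A by row-reducing [A | I] to [I | A^-1]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                  # scale pivot to 1
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]     # clear the rest of the column
    return M[:, n:]

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
Ainv = inverse_gauss_jordan(A)
# matches the 2x2 formula: (1/(4*6 - 7*2)) * [[6, -7], [-2, 4]]
```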

Properties

  • (A⁻¹)⁻¹ = A
  • (AB)⁻¹ = B⁻¹A⁻¹ (reverse order)
  • (Aᵀ)⁻¹ = (A⁻¹)ᵀ
  • (cA)⁻¹ = (1/c)A⁻¹
  • det(A⁻¹) = 1/det(A)

Warning: In numerical computation, explicitly computing A⁻¹ is almost always a bad idea. Use LU decomposition or other factorizations instead.

Block Matrices

Matrices can be partitioned into blocks (sub-matrices):

M = [A  B]
    [C  D]

Block multiplication works if the blocks are compatible:

[A B][E F]   [AE+BG  AF+BH]
[C D][G H] = [CE+DG  CF+DH]

Schur complement: If D is invertible, the Schur complement of D in M is A - BD⁻¹C. Used in block LU decomposition, Gaussian elimination on block matrices, and control theory.
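The block-multiplication rule can be checked numerically with `np.block` (random 2×2 blocks chosen here):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))
E, F, G, H = (rng.standard_normal((2, 2)) for _ in range(4))

M = np.block([[A, B], [C, D]])
N = np.block([[E, F], [G, H]])

# Top-left block of M @ N should equal AE + BG, per the formula above.
top_left = (M @ N)[:2, :2]
```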

Sparse Matrices

A matrix where most entries are zero. Common in practice (graphs, PDEs, NLP).

Storage formats:

  • COO (Coordinate): Store (row, col, value) triples.
  • CSR (Compressed Sparse Row): Row pointers + column indices + values.
  • CSC (Compressed Sparse Column): Column pointers + row indices + values.

CSR is efficient for row access and matrix-vector multiplication. CSC for column access.
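A minimal pure-Python sketch of CSR, showing why matrix-vector multiplication only touches stored non-zeros (library code such as scipy.sparse does this far more efficiently):

```python
def dense_to_csr(A):
    """Build CSR arrays (values, col_indices, row_ptr) from a dense matrix."""
    values, col_indices, row_ptr = [], [], [0]
    for row in A:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_indices.append(j)
        row_ptr.append(len(values))   # row i spans row_ptr[i]:row_ptr[i+1]
    return values, col_indices, row_ptr

def csr_matvec(values, col_indices, row_ptr, v):
    """y = A v, iterating only over stored non-zeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * v[col_indices[k]] for k in range(start, end)))
    return y

A = [[5, 0, 0], [0, 0, 3], [0, 2, 0]]
vals, cols, ptr = dense_to_csr(A)        # vals=[5, 3, 2], cols=[0, 2, 1], ptr=[0, 1, 2, 3]
csr_matvec(vals, cols, ptr, [1, 1, 1])   # [5, 3, 2]
```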

Real-World Applications

  • Systems of equations: Every linear system Ax = b is a matrix equation. Gaussian elimination is the workhorse.
  • Computer graphics: 4×4 matrices represent 3D transformations (rotation, translation, scaling, perspective projection).
  • Machine learning: Data stored as matrices (rows = samples, columns = features). Weight matrices in neural networks.
  • Graph theory: Adjacency matrix, Laplacian matrix, incidence matrix.
  • Image processing: Images are matrices of pixel values. Convolutions are matrix operations.
  • Cryptography: Hill cipher uses matrix multiplication mod 26. Lattice-based crypto uses matrix operations.
  • Economics: Input-output models (Leontief), transition matrices for economic models.
  • Quantum computing: Quantum gates are unitary matrices. State evolution is matrix multiplication.