Matrices

A matrix is a rectangular array of numbers arranged in rows and columns. Matrices represent linear transformations, systems of equations, graphs, and data.

Basic Definitions

An m × n matrix A has m rows and n columns:

A = [a₁₁  a₁₂  ...  a₁ₙ]
    [a₂₁  a₂₂  ...  a₂ₙ]
    [ ⋮    ⋮   ⋱    ⋮  ]
    [aₘ₁  aₘ₂  ...  aₘₙ]

We write A ∈ ℝᵐˣⁿ or Aᵢⱼ for the entry in row i, column j.

A square matrix has m = n.

Matrix Operations

Addition and Scalar Multiplication

Element-wise (matrices must have the same dimensions):

(A + B)ᵢⱼ = Aᵢⱼ + Bᵢⱼ
(cA)ᵢⱼ = c · Aᵢⱼ

Matrix Multiplication

For A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ, the product C = AB ∈ ℝᵐˣᵖ:

Cᵢⱼ = Σₖ₌₁ⁿ Aᵢₖ Bₖⱼ

Row i of A dotted with column j of B.

Properties:

  • Associative: (AB)C = A(BC)
  • Distributive: A(B + C) = AB + AC
  • NOT commutative: AB ≠ BA in general
  • Identity: AI = IA = A where I is the identity matrix
  • (AB)ᵀ = BᵀAᵀ (transpose reverses order)

Complexity: Naive multiplication of two n × n matrices: O(n³). Strassen: O(n^2.807). Best known: O(n^2.371) (theoretical).
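The defining sum Cᵢⱼ = Σₖ Aᵢₖ Bₖⱼ translates directly into the naive O(mnp) triple loop; a minimal pure-Python sketch (the function name is illustrative):

```python
def matmul(A, B):
    """Naive matrix product: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
matmul(A, B)   # [[19, 22], [43, 50]]
```

Swapping the arguments gives a different result, illustrating that matrix multiplication is not commutative.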

Transpose

(Aᵀ)ᵢⱼ = Aⱼᵢ

Flips rows and columns.

Properties:

  • (Aᵀ)ᵀ = A
  • (A + B)ᵀ = Aᵀ + Bᵀ
  • (cA)ᵀ = cAᵀ
  • (AB)ᵀ = BᵀAᵀ

Special Matrix Types

Diagonal Matrix

Non-zero entries only on the main diagonal: Aᵢⱼ = 0 for i ≠ j.

D = diag(d₁, d₂, ..., dₙ) = [d₁  0  ...  0 ]
                            [0  d₂  ...  0 ]
                            [⋮   ⋮   ⋱   ⋮ ]
                            [0   0  ... dₙ ]

Diagonal matrices multiply by scaling each coordinate: Dv = (d₁v₁, d₂v₂, ..., dₙvₙ).
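Because Dv only scales each coordinate, code never needs to store the full n × n matrix — keeping just the diagonal gives an O(n) product instead of O(n²). A sketch:

```python
def diag_matvec(d, v):
    """Multiply D = diag(d) by v in O(n), without forming the n x n matrix."""
    assert len(d) == len(v)
    return [di * vi for di, vi in zip(d, v)]

diag_matvec([2, 3, -1], [1, 1, 1])   # [2, 3, -1]
```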

Triangular Matrices

Upper triangular: Aᵢⱼ = 0 for i > j (zeros below diagonal). Lower triangular: Aᵢⱼ = 0 for i < j (zeros above diagonal).

Properties:

  • Product of upper triangular matrices is upper triangular.
  • Determinant = product of diagonal entries.
  • Eigenvalues = diagonal entries.
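Both diagonal-entry properties are easy to check numerically (example matrix chosen here; for this U the diagonal is 1, 4, 6):

```python
import numpy as np

# Upper triangular: determinant and eigenvalues both come from the diagonal.
U = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])

det = np.linalg.det(U)                      # 1 * 4 * 6 = 24
eigs = np.sort(np.linalg.eigvals(U).real)   # [1, 4, 6]
```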

Symmetric Matrix

Aᵀ = A (Aᵢⱼ = Aⱼᵢ).

Properties:

  • All eigenvalues are real.
  • Eigenvectors for distinct eigenvalues are orthogonal.
  • Always diagonalizable: A = QΛQᵀ where Q is orthogonal.
  • Positive definite if all eigenvalues > 0.

Applications: Covariance matrices, graph Laplacians, kernel matrices.
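The spectral decomposition A = QΛQᵀ can be verified with NumPy's `eigh`, which is specialized for symmetric (Hermitian) matrices; a small example chosen here:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric, eigenvalues 1 and 3

lam, Q = np.linalg.eigh(A)   # real eigenvalues, orthogonal eigenvectors
reconstructed = Q @ np.diag(lam) @ Q.T   # equals A up to rounding
```

Since both eigenvalues are positive, this A is also positive definite.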

Orthogonal Matrix

QᵀQ = QQᵀ = I (columns and rows are orthonormal).

Properties:

  • Q⁻¹ = Qᵀ (inverse = transpose)
  • |det(Q)| = 1
  • Preserves lengths: ‖Qv‖ = ‖v‖
  • Preserves angles: (Qu) · (Qv) = u · v
  • Represents rotations (det = +1) and reflections (det = -1)
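A 2D rotation matrix exhibits all of these properties at once; a quick numerical check (angle chosen arbitrarily):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by theta

v = np.array([3.0, 4.0])
# Q^T Q = I, ||Qv|| = ||v||, and det(Q) = +1 (a rotation, not a reflection)
```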

Unitary Matrix (Complex)

U*U = UU* = I where U* = Ūᵀ (conjugate transpose).

The complex analog of orthogonal matrices. Preserves the complex inner product.

Hermitian Matrix (Complex)

A* = A. Complex analog of symmetric matrices. All eigenvalues are real.

Row Echelon Form

Row Operations

Three elementary row operations (don't change the solution set of a linear system):

  1. Swap two rows: Rᵢ ↔ Rⱼ
  2. Scale a row by a non-zero constant: Rᵢ → cRᵢ
  3. Add a multiple of one row to another: Rᵢ → Rᵢ + cRⱼ

Row Echelon Form (REF)

A matrix is in REF if:

  1. All zero rows are at the bottom.
  2. The leading entry (pivot) of each non-zero row is to the right of the pivot above it.
  3. All entries below a pivot are zero.
[1  2  3  4]
[0  0  5  6]     (REF — pivots in columns 1, 3, 4)
[0  0  0  7]
[0  0  0  0]

Reduced Row Echelon Form (RREF)

REF plus:

  4. Each pivot is 1.
  5. Each pivot is the only non-zero entry in its column.

[1  2  0  0]
[0  0  1  0]     (RREF)
[0  0  0  1]
[0  0  0  0]

RREF is unique for a given matrix.

Gaussian Elimination

Algorithm to convert a matrix to REF (or RREF):

  1. Find the leftmost non-zero column (pivot column).
  2. Swap rows to bring a non-zero entry to the pivot position.
  3. Eliminate entries below the pivot using row operations.
  4. Move to the next row and column; repeat.
  5. (For RREF) Back-substitute to eliminate entries above pivots and scale pivots to 1.

Time: O(mn · min(m,n)) for an m × n matrix. O(n³) for square.
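The steps above can be sketched in NumPy (partial pivoting added for numerical stability; a sketch, not production code):

```python
import numpy as np

def ref(A):
    """Reduce A to row echelon form. Returns a new array."""
    A = A.astype(float).copy()
    m, n = A.shape
    row = 0
    for col in range(n):                        # 1. scan columns left to right
        pivot = row + np.argmax(np.abs(A[row:, col]))
        if np.isclose(A[pivot, col], 0.0):
            continue                            # no pivot in this column
        A[[row, pivot]] = A[[pivot, row]]       # 2. swap a non-zero entry up
        for r in range(row + 1, m):             # 3. eliminate below the pivot
            A[r] -= (A[r, col] / A[row, col]) * A[row]
        row += 1                                # 4. next row; repeat
        if row == m:
            break
    return A

R = ref(np.array([[ 2.0,  1.0, -1.0],
                  [-3.0, -1.0,  2.0],
                  [-2.0,  1.0,  2.0]]))
# R has zeros below the diagonal
```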

Solving Linear Systems

Augmented matrix [A | b]:

[1  2  1 | 5]     →     [1  0  0 | 1]
[2  3  1 | 7]     →     [0  1  0 | 1]     (RREF)
[1  1  2 | 6]     →     [0  0  1 | 2]

Solution: x₁ = 1, x₂ = 1, x₃ = 2.

Cases:

  • Unique solution: n pivots for n unknowns.
  • Infinitely many: Free variables (columns without pivots).
  • No solution: Pivot in the augmented column (inconsistent: 0 = non-zero).
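In practice a small system like this is handed to a library solver; a quick check with NumPy on an invertible 3×3 system (values chosen here):

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([5.0, 7.0, 6.0])

x = np.linalg.solve(A, b)   # internally uses LU factorization, not RREF
# A has 3 pivots for 3 unknowns, so this solution is unique
```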

Matrix Rank

The rank of A, written rank(A), is:

  • The number of pivots in REF.
  • The dimension of the column space.
  • The dimension of the row space.
  • The maximum number of linearly independent rows (or columns).

Properties:

  • rank(A) ≤ min(m, n)
  • rank(A) = rank(Aᵀ) (row rank = column rank)
  • rank(AB) ≤ min(rank(A), rank(B))
  • rank(A + B) ≤ rank(A) + rank(B)
  • Full rank: rank(A) = min(m, n)

Rank and Systems of Equations

For Ax = b:

  • Consistent iff rank(A) = rank([A|b]).
  • Unique solution iff rank(A) = n (number of unknowns).
  • The solution space of Ax = 0 has dimension n - rank(A) (nullity).
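Rank and nullity are easy to compute numerically (example matrix chosen here, with a deliberate dependency):

```python
import numpy as np

# Row 3 = row 1 + row 2, so only two rows are independent.
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [3.0, 5.0, 2.0]])

r = np.linalg.matrix_rank(A)   # 2
nullity = A.shape[1] - r       # 3 - 2 = 1
```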

Matrix Inverse

For a square matrix A, the inverse A⁻¹ satisfies:

AA⁻¹ = A⁻¹A = I

A is invertible (non-singular) iff:

  • det(A) ≠ 0
  • rank(A) = n
  • Ax = 0 has only the trivial solution
  • Columns are linearly independent
  • 0 is not an eigenvalue

Computing the Inverse

Method 1: Row reduce [A | I] to [I | A⁻¹].

Method 2: A⁻¹ = (1/det(A)) · adj(A) where adj(A) is the adjugate (matrix of cofactors, transposed). Useful for 2×2 and 3×3, impractical for larger.

2×2 inverse:

[a  b]⁻¹ = (1/(ad-bc)) [ d  -b]
[c  d]                 [-c   a]
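Method 1 can be sketched in a few lines of NumPy (partial pivoting added; assumes A is square and invertible, with no singularity checks):

```python
import numpy as np

def inverse_gauss_jordan(A):
    """Invert A by row-reducing [A | I] to [I | A^-1]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                  # scale pivot to 1
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]     # clear the rest of the column
    return M[:, n:]

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
Ainv = inverse_gauss_jordan(A)
# matches the 2x2 formula: (1/(4*6 - 7*2)) * [[6, -7], [-2, 4]]
```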

Properties

  • (A⁻¹)⁻¹ = A
  • (AB)⁻¹ = B⁻¹A⁻¹ (reverse order)
  • (Aᵀ)⁻¹ = (A⁻¹)ᵀ
  • (cA)⁻¹ = (1/c)A⁻¹
  • det(A⁻¹) = 1/det(A)

Warning: In numerical computation, explicitly computing A⁻¹ is almost always a bad idea. Use LU decomposition or other factorizations instead.

Block Matrices

Matrices can be partitioned into blocks (sub-matrices):

M = [A  B]
    [C  D]

Block multiplication works if the blocks are compatible:

[A B][E F]   [AE+BG  AF+BH]
[C D][G H] = [CE+DG  CF+DH]

Schur complement: If D is invertible, the Schur complement of D in M is A - BD⁻¹C. Used in block LU decomposition, Gaussian elimination on block matrices, and control theory.
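The block-multiplication rule can be checked numerically with `np.block` (random 2×2 blocks chosen here):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))
E, F, G, H = (rng.standard_normal((2, 2)) for _ in range(4))

M = np.block([[A, B], [C, D]])
N = np.block([[E, F], [G, H]])

# Top-left block of M @ N should equal AE + BG, per the formula above.
top_left = (M @ N)[:2, :2]
```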

Sparse Matrices

A matrix where most entries are zero. Common in practice (graphs, PDEs, NLP).

Storage formats:

  • COO (Coordinate): Store (row, col, value) triples.
  • CSR (Compressed Sparse Row): Row pointers + column indices + values.
  • CSC (Compressed Sparse Column): Column pointers + row indices + values.

CSR is efficient for row access and matrix-vector multiplication. CSC for column access.
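A minimal pure-Python sketch of CSR, showing why matrix-vector multiplication only touches stored non-zeros (library code such as scipy.sparse does this far more efficiently):

```python
def dense_to_csr(A):
    """Build CSR arrays (values, col_indices, row_ptr) from a dense matrix."""
    values, col_indices, row_ptr = [], [], [0]
    for row in A:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_indices.append(j)
        row_ptr.append(len(values))   # row i spans row_ptr[i]:row_ptr[i+1]
    return values, col_indices, row_ptr

def csr_matvec(values, col_indices, row_ptr, v):
    """y = A v, iterating only over stored non-zeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * v[col_indices[k]] for k in range(start, end)))
    return y

A = [[5, 0, 0], [0, 0, 3], [0, 2, 0]]
vals, cols, ptr = dense_to_csr(A)        # vals=[5, 3, 2], cols=[0, 2, 1], ptr=[0, 1, 2, 3]
csr_matvec(vals, cols, ptr, [1, 1, 1])   # [5, 3, 2]
```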

Real-World Applications

  • Systems of equations: Every linear system Ax = b is a matrix equation. Gaussian elimination is the workhorse.
  • Computer graphics: 4×4 matrices represent 3D transformations (rotation, translation, scaling, perspective projection).
  • Machine learning: Data stored as matrices (rows = samples, columns = features). Weight matrices in neural networks.
  • Graph theory: Adjacency matrix, Laplacian matrix, incidence matrix.
  • Image processing: Images are matrices of pixel values. Convolutions are matrix operations.
  • Cryptography: Hill cipher uses matrix multiplication mod 26. Lattice-based crypto uses matrix operations.
  • Economics: Input-output models (Leontief), transition matrices for economic models.
  • Quantum computing: Quantum gates are unitary matrices. State evolution is matrix multiplication.