Trees
Trees are the most important graph structure in computer science — they model hierarchies, enable efficient search, and underpin countless algorithms.
Definition and Properties
A tree is a connected acyclic graph.
Equivalent Definitions
For a graph G with n vertices, the following are equivalent:
- G is a tree (connected and acyclic).
- G is connected and has exactly n - 1 edges.
- G is acyclic and has exactly n - 1 edges.
- There is exactly one path between every pair of vertices.
- G is connected, but removing any edge disconnects it (every edge is a bridge).
- G is acyclic, but adding any edge creates exactly one cycle.
Proof sketch (1 ⟹ 2): By induction on n. Base: n = 1 has 0 edges. For n ≥ 2, a tree has a leaf (vertex of degree 1) — removing it gives a tree with n - 1 vertices and n - 2 edges. So the original has n - 1 edges. ∎
Leaf Theorem
Every tree with n ≥ 2 vertices has at least 2 leaves (vertices of degree 1).
Proof: Sum of degrees = 2(n-1) = 2n - 2. If at most 1 leaf, then at least n - 1 vertices have degree ≥ 2. Sum of degrees ≥ 1 + 2(n-1) = 2n - 1 > 2n - 2. Contradiction. ∎
Forests
A forest is an acyclic graph (not necessarily connected). A forest is a disjoint union of trees.
A forest with n vertices and k components has exactly n - k edges.
Rooted Trees
A rooted tree has a designated vertex called the root. This induces a natural hierarchy:
- Parent: The vertex adjacent to v on the path from v to the root.
- Child: A vertex for which v is the parent.
- Siblings: Vertices with the same parent.
- Ancestor: Any vertex on the path from v to the root (including v itself).
- Descendant: Any vertex in the subtree rooted at v.
- Leaf: A vertex with no children.
- Internal node: A vertex with at least one child.
- Depth (level) of v: Length of the path from root to v.
- Height of v: Length of the longest path from v to any leaf in its subtree.
- Height of the tree: Height of the root = maximum depth of any leaf.
Ordered Trees
A rooted tree where the children of each node have a specific left-to-right ordering. Most trees in CS are ordered.
Binary Trees
A binary tree is a rooted tree where each node has at most 2 children (left child and right child).
Types
Full binary tree: Every node has 0 or 2 children (no node has exactly 1 child).
Complete binary tree: All levels are fully filled except possibly the last, which is filled left-to-right.
Perfect binary tree: All internal nodes have 2 children and all leaves are at the same depth.
Properties
For a binary tree with n nodes, ℓ leaves, and height h:
| Property | Formula | |---|---| | Full binary tree: leaves vs internals | ℓ = i + 1, n = 2ℓ - 1 | | Perfect binary tree of height h | n = 2^(h+1) - 1, ℓ = 2^h | | Minimum height | h ≥ ⌈log₂(n+1)⌉ - 1 | | Maximum height | h ≤ n - 1 (degenerate/skewed) | | Leaves in any binary tree | ℓ ≤ 2^h |
Counting Binary Trees
The number of distinct binary trees with n nodes is the Catalan number:
Cₙ = C(2n, n) / (n + 1)
C₀ = 1, C₁ = 1, C₂ = 2, C₃ = 5, C₄ = 14.
This counts structurally distinct trees (not labeled).
Tree Traversals
For Binary Trees
In-order (Left, Root, Right):
fn inorder(node):
if node is null: return
inorder(node.left)
visit(node)
inorder(node.right)
For BSTs, in-order gives sorted order.
Pre-order (Root, Left, Right):
fn preorder(node):
if node is null: return
visit(node)
preorder(node.left)
preorder(node.right)
Useful for: serialization, copying a tree, prefix notation.
Post-order (Left, Right, Root):
fn postorder(node):
if node is null: return
postorder(node.left)
postorder(node.right)
visit(node)
Useful for: deletion, evaluating expressions, computing sizes.
Level-order (BFS):
fn levelorder(root):
queue = [root]
while queue not empty:
node = queue.dequeue()
visit(node)
if node.left: queue.enqueue(node.left)
if node.right: queue.enqueue(node.right)
Reconstruction
A binary tree can be uniquely reconstructed from:
- In-order + pre-order
- In-order + post-order
But not from pre-order + post-order alone (ambiguous for trees with single children).
Spanning Trees
A spanning tree of a connected graph G is a subgraph that:
- Is a tree (connected, acyclic).
- Contains all vertices of G.
Every connected graph has a spanning tree. It has exactly n - 1 edges.
Counting Spanning Trees
Cayley's Formula: The number of labeled spanning trees of the complete graph Kₙ is:
τ(Kₙ) = nⁿ⁻²
For n = 4: 4² = 16 spanning trees of K₄.
Kirchhoff's Matrix Tree Theorem: For any graph G, the number of spanning trees equals any cofactor of the Laplacian matrix L = D - A, where D is the degree matrix and A is the adjacency matrix.
Prüfer Sequences
A Prüfer sequence provides a bijection between labeled trees on n vertices and sequences of length n - 2 over {1, 2, ..., n}.
Encoding (tree → sequence):
- Find the leaf with the smallest label.
- Record the label of its neighbor.
- Remove the leaf.
- Repeat until 2 vertices remain.
Decoding (sequence → tree): Reverse the process using the observation that vertices not in the sequence (at each step) are leaves.
This bijection proves Cayley's formula: there are nⁿ⁻² sequences of length n - 2 over n symbols.
Minimum Spanning Trees
Given a connected weighted graph, a minimum spanning tree (MST) is a spanning tree with the smallest total edge weight.
Cut Property
For any cut of the graph (partition of vertices into two non-empty sets), the lightest edge crossing the cut is in every MST (assuming unique edge weights).
Cycle Property
For any cycle in the graph, the heaviest edge in the cycle is not in any MST (assuming unique edge weights).
Kruskal's Algorithm
- Sort edges by weight.
- For each edge (in order), add it to the MST if it doesn't create a cycle.
Uses Union-Find for cycle detection. Time: O(E log E).
FUNCTION KRUSKAL(n, edges)
SORT edges by weight
uf ← new UNION_FIND(n)
mst ← empty list
FOR EACH (w, u, v) IN edges DO
IF FIND(uf, u) ≠ FIND(uf, v) THEN
UNION(uf, u, v)
APPEND (w, u, v) TO mst
RETURN mst
Prim's Algorithm
- Start with any vertex. Maintain a set S of visited vertices.
- Repeatedly add the lightest edge connecting S to V \ S.
- Until all vertices are in S.
Uses a priority queue. Time: O(E log V) with a binary heap, O(E + V log V) with a Fibonacci heap.
Borůvka's Algorithm
- For each component, find the lightest edge leaving it.
- Add all such edges.
- Repeat until one component remains.
Time: O(E log V). Each iteration at least halves the number of components. Parallelizes well.
Uniqueness
If all edge weights are distinct, the MST is unique.
If some weights are equal, there may be multiple MSTs, but the set of edge weights in any MST is the same (the multiset of MST edge weights is unique).
Real-World Applications
- Network design: MSTs minimize the cost of connecting all nodes. Used in telecommunications, power grids, and pipeline networks.
- Cluster analysis: Single-linkage clustering is equivalent to building an MST and removing the heaviest edges.
- File systems: Directory structures are trees.
- DOM trees: Web pages are represented as trees (HTML DOM).
- Decision trees: Machine learning models that recursively split data.
- Expression trees: Compilers represent mathematical expressions as binary trees.
- Merge trees: Version control merge operations on tree-structured data.
- Phylogenetic trees: Evolutionary relationships in biology.
- Huffman trees: Optimal prefix codes use binary trees.
- Game trees: Game AI uses trees to explore possible moves (minimax, MCTS).