6 min read
On this page

Trees

Trees are the most important graph structure in computer science — they model hierarchies, enable efficient search, and underpin countless algorithms.

Definition and Properties

A tree is a connected acyclic graph.

Equivalent Definitions

For a graph G with n vertices, the following are equivalent:

  1. G is a tree (connected and acyclic).
  2. G is connected and has exactly n - 1 edges.
  3. G is acyclic and has exactly n - 1 edges.
  4. There is exactly one path between every pair of vertices.
  5. G is connected, but removing any edge disconnects it (every edge is a bridge).
  6. G is acyclic, but adding any edge creates exactly one cycle.

Proof sketch (1 ⟹ 2): By induction on n. Base: n = 1 has 0 edges. For n ≥ 2, a tree has a leaf (vertex of degree 1) — removing it gives a tree with n - 1 vertices and n - 2 edges. So the original has n - 1 edges. ∎

Leaf Theorem

Every tree with n ≥ 2 vertices has at least 2 leaves (vertices of degree 1).

Proof: Sum of degrees = 2(n-1) = 2n - 2. If at most 1 leaf, then at least n - 1 vertices have degree ≥ 2. Sum of degrees ≥ 1 + 2(n-1) = 2n - 1 > 2n - 2. Contradiction. ∎

Forests

A forest is an acyclic graph (not necessarily connected). A forest is a disjoint union of trees.

A forest with n vertices and k components has exactly n - k edges.

Rooted Trees

A rooted tree has a designated vertex called the root. This induces a natural hierarchy:

  • Parent: The vertex adjacent to v on the path from v to the root.
  • Child: A vertex for which v is the parent.
  • Siblings: Vertices with the same parent.
  • Ancestor: Any vertex on the path from v to the root (including v itself).
  • Descendant: Any vertex in the subtree rooted at v.
  • Leaf: A vertex with no children.
  • Internal node: A vertex with at least one child.
  • Depth (level) of v: Length of the path from root to v.
  • Height of v: Length of the longest path from v to any leaf in its subtree.
  • Height of the tree: Height of the root = maximum depth of any leaf.

Ordered Trees

A rooted tree where the children of each node have a specific left-to-right ordering. Most trees in CS are ordered.

Binary Trees

A binary tree is a rooted tree where each node has at most 2 children (left child and right child).

Types

Full binary tree: Every node has 0 or 2 children (no node has exactly 1 child).

Complete binary tree: All levels are fully filled except possibly the last, which is filled left-to-right.

Perfect binary tree: All internal nodes have 2 children and all leaves are at the same depth.

Properties

For a binary tree with n nodes, ℓ leaves, and height h:

| Property | Formula | |---|---| | Full binary tree: leaves vs internals | ℓ = i + 1, n = 2ℓ - 1 | | Perfect binary tree of height h | n = 2^(h+1) - 1, ℓ = 2^h | | Minimum height | h ≥ ⌈log₂(n+1)⌉ - 1 | | Maximum height | h ≤ n - 1 (degenerate/skewed) | | Leaves in any binary tree | ℓ ≤ 2^h |

Counting Binary Trees

The number of distinct binary trees with n nodes is the Catalan number:

Cₙ = C(2n, n) / (n + 1)

C₀ = 1, C₁ = 1, C₂ = 2, C₃ = 5, C₄ = 14.

This counts structurally distinct trees (not labeled).

Tree Traversals

For Binary Trees

In-order (Left, Root, Right):

fn inorder(node):
    if node is null: return
    inorder(node.left)
    visit(node)
    inorder(node.right)

For BSTs, in-order gives sorted order.

Pre-order (Root, Left, Right):

fn preorder(node):
    if node is null: return
    visit(node)
    preorder(node.left)
    preorder(node.right)

Useful for: serialization, copying a tree, prefix notation.

Post-order (Left, Right, Root):

fn postorder(node):
    if node is null: return
    postorder(node.left)
    postorder(node.right)
    visit(node)

Useful for: deletion, evaluating expressions, computing sizes.

Level-order (BFS):

fn levelorder(root):
    queue = [root]
    while queue not empty:
        node = queue.dequeue()
        visit(node)
        if node.left: queue.enqueue(node.left)
        if node.right: queue.enqueue(node.right)

Reconstruction

A binary tree can be uniquely reconstructed from:

  • In-order + pre-order
  • In-order + post-order

But not from pre-order + post-order alone (ambiguous for trees with single children).

Spanning Trees

A spanning tree of a connected graph G is a subgraph that:

  1. Is a tree (connected, acyclic).
  2. Contains all vertices of G.

Every connected graph has a spanning tree. It has exactly n - 1 edges.

Counting Spanning Trees

Cayley's Formula: The number of labeled spanning trees of the complete graph Kₙ is:

τ(Kₙ) = nⁿ⁻²

For n = 4: 4² = 16 spanning trees of K₄.

Kirchhoff's Matrix Tree Theorem: For any graph G, the number of spanning trees equals any cofactor of the Laplacian matrix L = D - A, where D is the degree matrix and A is the adjacency matrix.

Prüfer Sequences

A Prüfer sequence provides a bijection between labeled trees on n vertices and sequences of length n - 2 over {1, 2, ..., n}.

Encoding (tree → sequence):

  1. Find the leaf with the smallest label.
  2. Record the label of its neighbor.
  3. Remove the leaf.
  4. Repeat until 2 vertices remain.

Decoding (sequence → tree): Reverse the process using the observation that vertices not in the sequence (at each step) are leaves.

This bijection proves Cayley's formula: there are nⁿ⁻² sequences of length n - 2 over n symbols.

Minimum Spanning Trees

Given a connected weighted graph, a minimum spanning tree (MST) is a spanning tree with the smallest total edge weight.

Cut Property

For any cut of the graph (partition of vertices into two non-empty sets), the lightest edge crossing the cut is in every MST (assuming unique edge weights).

Cycle Property

For any cycle in the graph, the heaviest edge in the cycle is not in any MST (assuming unique edge weights).

Kruskal's Algorithm

  1. Sort edges by weight.
  2. For each edge (in order), add it to the MST if it doesn't create a cycle.

Uses Union-Find for cycle detection. Time: O(E log E).

FUNCTION KRUSKAL(n, edges)
    SORT edges by weight
    uf ← new UNION_FIND(n)
    mst ← empty list
    FOR EACH (w, u, v) IN edges DO
        IF FIND(uf, u) ≠ FIND(uf, v) THEN
            UNION(uf, u, v)
            APPEND (w, u, v) TO mst
    RETURN mst

Prim's Algorithm

  1. Start with any vertex. Maintain a set S of visited vertices.
  2. Repeatedly add the lightest edge connecting S to V \ S.
  3. Until all vertices are in S.

Uses a priority queue. Time: O(E log V) with a binary heap, O(E + V log V) with a Fibonacci heap.

Borůvka's Algorithm

  1. For each component, find the lightest edge leaving it.
  2. Add all such edges.
  3. Repeat until one component remains.

Time: O(E log V). Each iteration at least halves the number of components. Parallelizes well.

Uniqueness

If all edge weights are distinct, the MST is unique.

If some weights are equal, there may be multiple MSTs, but the set of edge weights in any MST is the same (the multiset of MST edge weights is unique).

Real-World Applications

  • Network design: MSTs minimize the cost of connecting all nodes. Used in telecommunications, power grids, and pipeline networks.
  • Cluster analysis: Single-linkage clustering is equivalent to building an MST and removing the heaviest edges.
  • File systems: Directory structures are trees.
  • DOM trees: Web pages are represented as trees (HTML DOM).
  • Decision trees: Machine learning models that recursively split data.
  • Expression trees: Compilers represent mathematical expressions as binary trees.
  • Merge trees: Version control merge operations on tree-structured data.
  • Phylogenetic trees: Evolutionary relationships in biology.
  • Huffman trees: Optimal prefix codes use binary trees.
  • Game trees: Game AI uses trees to explore possible moves (minimax, MCTS).