5 min read
On this page

Trees & Graphs

Trees and graphs are non-linear data structures that model hierarchical and networked relationships. Binary search trees provide O(log n) lookup. Heaps provide O(1) access to the minimum or maximum element. Graphs model connections between entities. In C, you implement these with structs and pointers, managing all memory manually. For complex balanced trees, using a well-tested library is often better than writing your own.

Binary Search Tree

A binary search tree (BST) stores elements such that for every node, all values in the left subtree are smaller and all values in the right subtree are larger.

#include <stdio.h>
#include <stdlib.h>

struct TreeNode {
    int key;
    struct TreeNode *left;
    struct TreeNode *right;
};

struct TreeNode *create_tree_node(int key) {
    struct TreeNode *node = malloc(sizeof(struct TreeNode));
    if (node == NULL) return NULL;
    node->key = key;
    node->left = NULL;
    node->right = NULL;
    return node;
}

Insert

struct TreeNode *bst_insert(struct TreeNode *root, int key) {
    if (root == NULL) {
        return create_tree_node(key);
    }
    if (key < root->key) {
        root->left = bst_insert(root->left, key);
    } else if (key > root->key) {
        root->right = bst_insert(root->right, key);
    }
    /* Duplicate keys are ignored */
    return root;
}
struct TreeNode *bst_search(struct TreeNode *root, int key) {
    if (root == NULL || root->key == key) {
        return root;
    }
    if (key < root->key) {
        return bst_search(root->left, key);
    }
    return bst_search(root->right, key);
}

Search follows one path from root to leaf, checking at most O(h) nodes where h is the tree height. For a balanced tree, h = O(log n).

Delete

Deletion has three cases: leaf node, node with one child, and node with two children.

struct TreeNode *find_min(struct TreeNode *root) {
    while (root->left != NULL) {
        root = root->left;
    }
    return root;
}

struct TreeNode *bst_delete(struct TreeNode *root, int key) {
    if (root == NULL) return NULL;

    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else {
        /* Found the node to delete */
        if (root->left == NULL) {
            struct TreeNode *right = root->right;
            free(root);
            return right;
        }
        if (root->right == NULL) {
            struct TreeNode *left = root->left;
            free(root);
            return left;
        }
        /* Two children: replace with in-order successor */
        struct TreeNode *successor = find_min(root->right);
        root->key = successor->key;
        root->right = bst_delete(root->right, successor->key);
    }
    return root;
}

Traversals

void inorder(struct TreeNode *root) {
    if (root == NULL) return;
    inorder(root->left);
    printf("%d ", root->key);
    inorder(root->right);
}

void preorder(struct TreeNode *root) {
    if (root == NULL) return;
    printf("%d ", root->key);
    preorder(root->left);
    preorder(root->right);
}

void free_tree(struct TreeNode *root) {
    if (root == NULL) return;
    free_tree(root->left);
    free_tree(root->right);
    free(root);
}

In-order traversal of a BST produces keys in sorted order.

int main(void) {
    struct TreeNode *root = NULL;
    root = bst_insert(root, 50);
    root = bst_insert(root, 30);
    root = bst_insert(root, 70);
    root = bst_insert(root, 20);
    root = bst_insert(root, 40);

    printf("In-order: ");
    inorder(root);
    printf("\n");

    root = bst_delete(root, 30);
    printf("After deleting 30: ");
    inorder(root);
    printf("\n");

    free_tree(root);
    return 0;
}
In-order: 20 30 40 50 70
After deleting 30: 20 40 50 70

Balanced Trees

An unbalanced BST degrades to a linked list if elements are inserted in sorted order, giving O(n) operations. Balanced trees maintain O(log n) height.

AVL Trees

AVL trees maintain a balance factor (height difference between left and right subtrees) of at most 1 for every node. After each insertion or deletion, rotations restore balance. AVL trees are strictly balanced and provide the fastest lookups, but rotations on every modification add overhead.

Red-Black Trees

Red-black trees are less strictly balanced than AVL trees but require fewer rotations on average. Each node is colored red or black, and the tree maintains invariants that guarantee the longest path is at most twice the shortest.

Implementing either balanced tree correctly in C is complex — hundreds of lines for insertion alone. Unless you are writing an operating system kernel or a database, use a library.

The Linux Kernel's rbtree

The Linux kernel includes a red-black tree implementation (include/linux/rbtree.h) used throughout the kernel for process scheduling, memory management, and file system indexing. It uses the intrusive pattern: the rb_node struct is embedded in your data structure.

#include <linux/rbtree.h>

struct MyData {
    int key;
    int value;
    struct rb_node node; /* Embedded red-black tree node */
};

This is the production-quality approach for performance-critical C code. For application code, consider using an existing hash table instead of implementing a balanced tree.

Heap: Priority Queue

A binary heap is a complete binary tree stored in an array. A min-heap ensures every parent is smaller than its children, so the minimum element is always at index 0.

struct MinHeap {
    int *data;
    size_t size;
    size_t capacity;
};

struct MinHeap *heap_create(size_t capacity) {
    struct MinHeap *heap = malloc(sizeof(struct MinHeap));
    if (heap == NULL) return NULL;
    heap->data = malloc(capacity * sizeof(int));
    if (heap->data == NULL) { free(heap); return NULL; }
    heap->size = 0;
    heap->capacity = capacity;
    return heap;
}

Array Representation

For a node at index i:

  • Parent: (i - 1) / 2
  • Left child: 2 * i + 1
  • Right child: 2 * i + 2

No pointers are needed. The tree structure is implicit in the array indices.

Insert (Sift Up)

static void swap(int *a, int *b) {
    int tmp = *a; *a = *b; *b = tmp;
}

void heap_push(struct MinHeap *heap, int value) {
    if (heap->size >= heap->capacity) return; /* Full */

    heap->data[heap->size] = value;
    size_t i = heap->size;
    heap->size++;

    /* Sift up: swap with parent while smaller */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (heap->data[i] >= heap->data[parent]) break;
        swap(&heap->data[i], &heap->data[parent]);
        i = parent;
    }
}

Extract Minimum (Sift Down)

int heap_pop(struct MinHeap *heap) {
    int min = heap->data[0];
    heap->size--;
    heap->data[0] = heap->data[heap->size];

    /* Sift down: swap with smaller child while larger */
    size_t i = 0;
    while (1) {
        size_t left = 2 * i + 1;
        size_t right = 2 * i + 2;
        size_t smallest = i;

        if (left < heap->size && heap->data[left] < heap->data[smallest])
            smallest = left;
        if (right < heap->size && heap->data[right] < heap->data[smallest])
            smallest = right;
        if (smallest == i) break;

        swap(&heap->data[i], &heap->data[smallest]);
        i = smallest;
    }
    return min;
}

Both push and pop are O(log n). The heap is the simplest way to implement a priority queue in C.

Graphs

Graphs consist of vertices (nodes) connected by edges. They can be directed or undirected, weighted or unweighted.

Adjacency List Representation

struct Edge {
    int dest;
    int weight;
    struct Edge *next;
};

struct Graph {
    int num_vertices;
    struct Edge **adj; /* Array of adjacency lists */
};

struct Graph *graph_create(int num_vertices) {
    struct Graph *g = malloc(sizeof(struct Graph));
    g->num_vertices = num_vertices;
    g->adj = calloc(num_vertices, sizeof(struct Edge *));
    return g;
}

void graph_add_edge(struct Graph *g, int src, int dest, int weight) {
    struct Edge *edge = malloc(sizeof(struct Edge));
    edge->dest = dest;
    edge->weight = weight;
    edge->next = g->adj[src];
    g->adj[src] = edge;
}
void bfs(struct Graph *g, int start) {
    int *visited = calloc(g->num_vertices, sizeof(int));
    int *queue = malloc(g->num_vertices * sizeof(int));
    int front = 0, back = 0;

    visited[start] = 1;
    queue[back++] = start;

    while (front < back) {
        int v = queue[front++];
        printf("%d ", v);

        struct Edge *e = g->adj[v];
        while (e != NULL) {
            if (!visited[e->dest]) {
                visited[e->dest] = 1;
                queue[back++] = e->dest;
            }
            e = e->next;
        }
    }
    printf("\n");

    free(visited);
    free(queue);
}
static void dfs_visit(struct Graph *g, int v, int *visited) {
    visited[v] = 1;
    printf("%d ", v);

    struct Edge *e = g->adj[v];
    while (e != NULL) {
        if (!visited[e->dest]) {
            dfs_visit(g, e->dest, visited);
        }
        e = e->next;
    }
}

void dfs(struct Graph *g, int start) {
    int *visited = calloc(g->num_vertices, sizeof(int));
    dfs_visit(g, start, visited);
    printf("\n");
    free(visited);
}

Adjacency Matrix vs Adjacency List

An adjacency matrix uses O(V^2) space and provides O(1) edge lookup. An adjacency list uses O(V + E) space and provides O(degree) edge lookup. For sparse graphs (few edges relative to vertices), adjacency lists are more memory-efficient. For dense graphs, a matrix is simpler.

When to Implement vs When to Use a Library

Implement yourself:

  • Simple BSTs for learning
  • Heaps (straightforward, hard to get wrong)
  • Adjacency list graphs for specific algorithms

Use a library:

  • Balanced trees (AVL, red-black) — the implementation is complex and error-prone
  • Graph algorithms like Dijkstra, A*, minimum spanning tree — use well-tested implementations
  • Any data structure in production code where correctness is critical

Common Pitfalls

  • Unbalanced BSTs — Inserting sorted data into a plain BST creates a linked list with O(n) operations. Use a balanced tree or randomize insertion order.
  • Stack overflow from deep recursion — A tree with 1 million nodes can be 1 million levels deep if unbalanced. Use iterative traversal or ensure the tree is balanced.
  • Memory leaks in tree deletion — Deleting a subtree requires recursively freeing all nodes. Forgetting to free children leaks their entire subtree.
  • Off-by-one in heap indexing — Parent, left child, and right child formulas differ depending on whether the array is 0-indexed or 1-indexed. Be consistent.
  • Graph memory management — Each edge is a separate allocation. Freeing a graph requires iterating all adjacency lists and freeing every edge node, then the lists, then the graph.
  • Confusing graph representations — Adjacency matrix edge addition is O(1) but uses O(V^2) memory. Adjacency list edge addition is O(1) but edge lookup is O(degree). Choose based on your access patterns.

Key Takeaways

  • Binary search trees provide O(log n) operations when balanced. Unbalanced BSTs degrade to O(n). Use balanced trees (AVL, red-black) for guaranteed performance.
  • Balanced tree implementations are complex. Use the Linux kernel's rbtree or a library rather than writing your own for production code.
  • Binary heaps store a tree in an array with implicit parent-child relationships. Push and pop are O(log n). Heaps are the simplest priority queue implementation.
  • Graphs use adjacency lists (sparse graphs) or adjacency matrices (dense graphs). BFS uses a queue; DFS uses recursion or a stack.
  • Implement simple BSTs and heaps for learning. Use well-tested libraries for balanced trees and complex graph algorithms in production.