Tries

A trie (prefix tree) is a tree where each edge represents a character and paths from root to nodes represent prefixes. Tries excel at string-related operations: prefix search, autocomplete, and dictionary lookup.

Standard Trie

Each node has up to |Σ| children (one per alphabet character). A flag marks word endings.

Root
├── a
│   ├── n (word: "an")
│   │   └── d (word: "and")
│   └── p
│       └── p
│           └── l
│               └── e (word: "apple")
├── b
│   └── e (word: "be")
└── t
    └── o (word: "to")

Operations

Operation	Time
Insert word	O(m) where m = word length
Search word	O(m)
Prefix search	O(p + k) where p = prefix length, k = results
Delete word	O(m)

Implementation

STRUCTURE TrieNode
    children : array of 26 entries (one per lowercase letter), initially all NULL
    is_end   : boolean, initially false

PROCEDURE INSERT(root, word)
    node ← root
    FOR EACH character c IN word DO
        idx ← ORD(c) - ORD('a')
        IF node.children[idx] = NULL THEN
            node.children[idx] ← new TrieNode
        node ← node.children[idx]
    node.is_end ← true

FUNCTION SEARCH(root, word)
    node ← root
    FOR EACH character c IN word DO
        idx ← ORD(c) - ORD('a')
        IF node.children[idx] = NULL THEN RETURN false
        node ← node.children[idx]
    RETURN node.is_end

Space Analysis

Worst case: O(n × m × |Σ|) where n = number of words, m = average length. Each node has |Σ| child pointers.

Space optimization: Use hash maps instead of arrays for children (sparse nodes). Or use compressed tries.

Compressed Trie (Patricia / Radix Tree)

Compress chains of single-child nodes into one edge labeled with a string.

Standard trie:           Radix tree:
  r                        r
  └─ o                     └─ omane → (word)
     └─ m                     └─ tic → (word)
        └─ a
           └─ n
              └─ e (word: "romane")
              └─ tic (word: "romantic" via backtrack... actually:)

Better example:
Words: "romane", "romanus", "romulus"

Radix tree:
  rom
  ├── an
  │   ├── e (romane)
  │   └── us (romanus)
  └── ulus (romulus)

Advantages: Fewer nodes. Less memory. Fewer pointer dereferences.

Used in: Linux kernel routing tables, HTTP routers (httprouter, Actix-web), in-memory databases.

Ternary Search Trie (TST)

Each node has three children: less-than, equal-to, greater-than.

Advantages: Space-efficient for sparse alphabets (no wasted child pointers). Combines trie efficiency with BST space usage.

Operations: O(m + log n) search (m = key length, log n from BST comparisons).

Suffix Trie and Suffix Tree

Suffix Trie

A trie of all suffixes of a string.

For "banana $": Insert "banana$ ", "anana $", "nana$ ", "ana $", "na$ ", "a $", "$ ".

Space: O(n²) — too large for practical use.

Suffix Tree

A compressed suffix trie. Each edge stores a substring (represented by start/end indices into the original string).

Space: O(n) — linear!

Construction: Ukkonen's algorithm builds a suffix tree in O(n) time (online, left to right).

Operations:

Pattern search: O(m) — follow the path matching the pattern
Number of occurrences: O(m) — count leaves in the subtree
Longest repeated substring: Deepest internal node
Longest common substring of two strings: Build generalized suffix tree, find deepest node with leaves from both strings

Suffix Automaton

A minimal DFA recognizing all suffixes of a string. O(n) states and transitions.

Equivalent power to suffix tree but often more practical (simpler to implement, natural for online construction).

Applications: Count distinct substrings, find shortest non-occurring substring, string matching.

Applications

Autocomplete

Prefix search in a trie. Given prefix "app":

Navigate to the node for "app".
Enumerate all words in the subtree (DFS/BFS).
Return top-k results (by frequency, recency, or relevance).

Spell Checking

Check if the word exists in the dictionary trie.
If not, generate candidates:
- Delete a character
- Insert a character
- Replace a character
- Transpose adjacent characters
Check candidates against the trie.

IP Routing (Longest Prefix Match)

IP routing tables use longest prefix matching: find the most specific route matching a destination IP.

Tries (specifically Patricia tries / radix trees) are the standard data structure for this. Binary trie on IP address bits, compressed with path compression.

Linux kernel: Uses a compressed trie (LC-trie) for the routing table.

Longest Common Substring

Build a generalized suffix tree for strings S₁ and S₂. The deepest internal node with leaves from both strings represents the longest common substring.

Time: O(n + m) construction + O(n + m) traversal.

Applications in CS

Search engines: Autocomplete, spell correction, prefix search.
Text editors/IDEs: Autocompletion, syntax highlighting (keyword tries).
Networking: IP routing tables (longest prefix match). DNS lookup.
Compression: LZW compression uses a trie to track seen substrings.
Bioinformatics: Suffix trees for DNA sequence analysis, pattern matching, repeat finding.
Databases: Trie-based indexes for string columns. Prefix queries.
Web frameworks: URL routing (radix tree matching request paths to handlers).
Natural language processing: Morphological analysis, dictionary lookup, n-gram language models.