Tries
A trie (prefix tree) is a tree where each edge represents a character and paths from root to nodes represent prefixes. Tries excel at string-related operations: prefix search, autocomplete, and dictionary lookup.
Standard Trie
Each node has up to |Σ| children (one per alphabet character). A flag marks word endings.
Root
├── a
│ ├── n (word: "an")
│ │ └── d (word: "and")
│ └── p
│ └── p
│ └── l
│ └── e (word: "apple")
├── b
│ └── e (word: "be")
└── t
└── o (word: "to")
Operations
| Operation | Time | |---|---| | Insert word | O(m) where m = word length | | Search word | O(m) | | Prefix search | O(p + k) where p = prefix length, k = results | | Delete word | O(m) |
Implementation
STRUCTURE TrieNode
children : array of 26 entries (one per lowercase letter), initially all NULL
is_end : boolean, initially false
PROCEDURE INSERT(root, word)
node ← root
FOR EACH character c IN word DO
idx ← ORD(c) - ORD('a')
IF node.children[idx] = NULL THEN
node.children[idx] ← new TrieNode
node ← node.children[idx]
node.is_end ← true
FUNCTION SEARCH(root, word)
node ← root
FOR EACH character c IN word DO
idx ← ORD(c) - ORD('a')
IF node.children[idx] = NULL THEN RETURN false
node ← node.children[idx]
RETURN node.is_end
Space Analysis
Worst case: O(n × m × |Σ|) where n = number of words, m = average length. Each node has |Σ| child pointers.
Space optimization: Use hash maps instead of arrays for children (sparse nodes). Or use compressed tries.
Compressed Trie (Patricia / Radix Tree)
Compress chains of single-child nodes into one edge labeled with a string.
Standard trie: Radix tree:
r r
└─ o └─ omane → (word)
└─ m └─ tic → (word)
└─ a
└─ n
└─ e (word: "romane")
└─ tic (word: "romantic" via backtrack... actually:)
Better example:
Words: "romane", "romanus", "romulus"
Radix tree:
rom
├── an
│ ├── e (romane)
│ └── us (romanus)
└── ulus (romulus)
Advantages: Fewer nodes. Less memory. Fewer pointer dereferences.
Used in: Linux kernel routing tables, HTTP routers (httprouter, Actix-web), in-memory databases.
Ternary Search Trie (TST)
Each node has three children: less-than, equal-to, greater-than.
b
/|\
a e t
| |
n o
|
d
Advantages: Space-efficient for sparse alphabets (no wasted child pointers). Combines trie efficiency with BST space usage.
Operations: O(m + log n) search (m = key length, log n from BST comparisons).
Suffix Trie and Suffix Tree
Suffix Trie
A trie of all suffixes of a string.
For "banana", "anana", "ana", "a".
Space: O(n²) — too large for practical use.
Suffix Tree
A compressed suffix trie. Each edge stores a substring (represented by start/end indices into the original string).
Space: O(n) — linear!
Construction: Ukkonen's algorithm builds a suffix tree in O(n) time (online, left to right).
Operations:
- Pattern search: O(m) — follow the path matching the pattern
- Number of occurrences: O(m) — count leaves in the subtree
- Longest repeated substring: Deepest internal node
- Longest common substring of two strings: Build generalized suffix tree, find deepest node with leaves from both strings
Suffix Automaton
A minimal DFA recognizing all suffixes of a string. O(n) states and transitions.
Equivalent power to suffix tree but often more practical (simpler to implement, natural for online construction).
Applications: Count distinct substrings, find shortest non-occurring substring, string matching.
Applications
Autocomplete
Prefix search in a trie. Given prefix "app":
- Navigate to the node for "app".
- Enumerate all words in the subtree (DFS/BFS).
- Return top-k results (by frequency, recency, or relevance).
Spell Checking
- Check if the word exists in the dictionary trie.
- If not, generate candidates:
- Delete a character
- Insert a character
- Replace a character
- Transpose adjacent characters
- Check candidates against the trie.
IP Routing (Longest Prefix Match)
IP routing tables use longest prefix matching: find the most specific route matching a destination IP.
Tries (specifically Patricia tries / radix trees) are the standard data structure for this. Binary trie on IP address bits, compressed with path compression.
Linux kernel: Uses a compressed trie (LC-trie) for the routing table.
Longest Common Substring
Build a generalized suffix tree for strings S₁ and S₂. The deepest internal node with leaves from both strings represents the longest common substring.
Time: O(n + m) construction + O(n + m) traversal.
Applications in CS
- Search engines: Autocomplete, spell correction, prefix search.
- Text editors/IDEs: Autocompletion, syntax highlighting (keyword tries).
- Networking: IP routing tables (longest prefix match). DNS lookup.
- Compression: LZW compression uses a trie to track seen substrings.
- Bioinformatics: Suffix trees for DNA sequence analysis, pattern matching, repeat finding.
- Databases: Trie-based indexes for string columns. Prefix queries.
- Web frameworks: URL routing (radix tree matching request paths to handlers).
- Natural language processing: Morphological analysis, dictionary lookup, n-gram language models.