4 min read
On this page

Autocomplete & Typeahead

Autocomplete predicts what a user is typing and offers suggestions before they finish. It reduces keystrokes, corrects spelling, guides users toward valid queries, and surfaces popular content. The system must respond within 100 milliseconds to feel instantaneous.

Trie Data Structure

A trie (prefix tree) is the foundational data structure for autocomplete. Each node represents a character, and paths from root to leaf represent complete strings.

Basic Trie Structure

Inserting: "cat", "car", "card", "care", "dog"

         (root)
        /      \
       c        d
       |        |
       a        o
      / \       |
     t   r      g*
     *   |
        / \
       d   e
       *   *

* marks end of a valid word

Looking up all words starting with "car" means traversing to the "r" node under "c-a" and collecting all descendants: "car", "card", "care".

Trie Complexity

Operations:
  Insert: O(m) where m = length of string
  Search prefix: O(m) to reach prefix node + O(n) to collect results
  Space: O(alphabet_size * m * number_of_strings)

For autocomplete with 10 million search queries:
  Naive trie: several GB of memory
  Compressed trie: reduces to hundreds of MB

Compressed Tries (Radix Trees)

Compressing chains of single-child nodes dramatically reduces memory usage.

Uncompressed:
  c -> a -> r -> d
                 e

Compressed (radix tree):
  "ca" -> "r" -> "d"
                  "e"

Single-child chains become single nodes with multi-character labels.

Google's autocomplete handles billions of queries. Their system uses compressed trie variants distributed across data centers, with each node storing aggregate query frequency data.

Prefix Matching

Prefix matching retrieves all entries that start with the user's current input. It is the core operation behind typeahead.

User types: "sys"

Prefix search returns:
  "system design" (2.1M searches/month)
  "system of a down" (890K searches/month)
  "system requirements" (450K searches/month)
  "system32" (320K searches/month)
  "systematic review" (210K searches/month)

Matching Strategies Beyond Prefix

Pure prefix matching misses valid suggestions when users type middle words or make typos.

Prefix match:
  Input: "design"
  Matches: "design patterns", "design thinking"
  Misses: "system design" (query starts with "system", not "design")

Infix match:
  Input: "design"
  Also matches: "system design", "web design principles"
  More results but more expensive (cannot use trie directly)

Fuzzy match:
  Input: "desgin" (typo)
  Matches: "design" with edit distance 1
  Uses Levenshtein distance or similar

Spotify search uses fuzzy prefix matching. Typing "beatls" still suggests "The Beatles" by combining prefix search with edit distance tolerance. This is critical for mobile users where typos are frequent.

Multi-Word Prefix Matching

Input: "new york p"
Strategy: match prefix of last word, exact match earlier words

  Tokenized: ["new", "york", "p"]
  Filter: entries containing "new" AND "york"
  Prefix match last token: entries starting with "p"
  
  Results:
    "new york pizza"
    "new york parks"
    "new york post"

Suggestion Ranking

Returning all prefix matches is not enough. The top 5-10 suggestions must be the most useful ones.

Frequency-Based Ranking

The simplest approach ranks by how often each query was searched.

Prefix: "pyth"
Suggestions ranked by search frequency:
  1. "python" (45M/month)
  2. "python tutorial" (12M/month)
  3. "python download" (8M/month)
  4. "python list comprehension" (3M/month)
  5. "pythagorean theorem" (2M/month)

Store frequency counts at each trie node. When traversing to collect suggestions, use a priority queue to return the top-K by frequency without scanning all descendants.

Trie with frequency:
  "pyth" node:
    Top suggestions pre-computed:
      ["python":45M, "python tutorial":12M, "python download":8M, ...]
    
  No need to traverse subtree at query time.
  Pre-computed top-K stored at each internal node.

Personalized Ranking

Combine global frequency with user-specific signals.

Ranking score = global_frequency * 0.4 + personal_frequency * 0.3 
                + recency * 0.2 + context * 0.1

User A (data scientist):
  "pandas" --> personal boost (searched 50 times this month)
  Suggestion: "pandas dataframe" ranked #1

User B (animal enthusiast):
  "pandas" --> no personal boost
  Suggestion: "pandas bear habitat" ranked #1

YouTube personalizes autocomplete based on your watch history and subscriptions. Two users typing the same prefix see different suggestions tailored to their interests.

Contextual Ranking

The same prefix should suggest different completions based on context.

Context signals:
  - Current page or section (shopping, news, code editor)
  - Time of day (morning: "coffee near me", evening: "dinner near me")
  - Location (local businesses and landmarks)
  - Device type (shorter suggestions on mobile)
  - Recent searches in this session

Suggestions should reflect current trends, not just historical popularity.

Time-weighted frequency:
  score = sum of (search_count * decay_factor^age_in_days)
  
  Recent searches contribute more than old ones.
  
  Example decay factor: 0.95 per day
    Today's searches: count * 1.0
    Yesterday: count * 0.95
    Last week: count * 0.70
    Last month: count * 0.21

Twitter trending topics use aggressive time decay. A sudden spike in searches for a topic pushes it into autocomplete suggestions rapidly, even if its all-time search count is low.

Real-Time Updates

Autocomplete data must reflect new content and changing search patterns without rebuilding the entire index.

Updating Frequency Counts

Batch update approach:
  1. Collect search logs from the past hour
  2. Aggregate query frequencies
  3. Merge with existing trie data
  4. Rebuild pre-computed top-K at affected nodes
  5. Swap in new data atomically
  
  Latency: minutes to hours
  Suitable for: most autocomplete systems

Real-time update approach:
  1. Each search query updates a streaming counter (Redis, Kafka Streams)
  2. Counter updates propagate to trie nodes
  3. Top-K suggestions at affected nodes are recomputed
  
  Latency: seconds
  Suitable for: trending topics, breaking news

Adding New Suggestions

When a new product, article, or entity is created, it should appear in autocomplete without waiting for the next full rebuild.

New content ingestion:
  1. New product "iPhone 17 Pro" is listed
  2. Product title is tokenized and added to the autocomplete index
  3. Initial frequency is set based on category average or editorial boost
  4. As users search for it, organic frequency takes over

Amazon updates autocomplete for new product listings within minutes of publication. When a new product launches, its title appears in suggestions almost immediately.

Serving Architecture

Autocomplete serving at scale:

  Client types "sys"
    --> CDN cache check (popular prefixes cached at edge)
    --> If cache miss, request to autocomplete service
    --> Autocomplete service queries in-memory trie
    --> Returns top 10 suggestions in under 50ms
    --> Response cached at CDN for next user

  Scaling:
    Shard by prefix range: a-f on shard 1, g-m on shard 2, etc.
    Replicate each shard for read throughput
    In-memory data for speed (trie fits in RAM for most applications)

Debouncing & Client-Side Optimization

Not every keystroke should trigger a server request.

Client-side optimizations:
  - Debounce: wait 100-200ms after last keystroke before requesting
  - Local cache: store recent prefix-to-suggestions mappings
  - Prefetch: when user types "sys", prefetch "syst" results
  - Abort: cancel in-flight request when user types next character
  - Minimum prefix length: do not query until 2-3 characters typed

Data Collection & Privacy

Autocomplete suggestions are derived from user behavior, which raises privacy considerations.

Privacy-conscious data collection:
  - Aggregate queries: store counts, not individual searches
  - Minimum threshold: only suggest queries searched by 10+ unique users
  - Filtering: remove suggestions containing personal information, hate speech
  - Right to be forgotten: ability to remove specific suggestions
  - Anonymous aggregation: separate query logs from user identity

Google has removed autocomplete suggestions that were defamatory or misleading, and provides mechanisms for requesting removal. Their suggestion generation explicitly filters sensitive categories.

Common Pitfalls

  • Querying on every keystroke. Without debouncing, a fast typist generates 10+ requests per second per user. Debounce at 100-200ms on the client.
  • Not pre-computing top-K results. Traversing the full subtree on each request is too slow. Store pre-computed top suggestions at each trie node.
  • Ignoring offensive or sensitive suggestions. Autocomplete amplifies visibility. A blocklist and content moderation pipeline is essential.
  • Stale suggestions. If the autocomplete index only refreshes daily, trending topics and new products are invisible for hours. Balance freshness with rebuild cost.
  • Treating all characters equally. Autocomplete in languages with complex scripts (CJK, Arabic) requires specialized tokenization, not just character-level trie traversal.
  • No fallback for empty results. When the prefix matches nothing, show popular queries or recent searches rather than an empty dropdown.

Key Takeaways

  • Tries provide O(m) prefix lookup but need compression and pre-computed top-K at each node for production performance.
  • Suggestion ranking combines global frequency, personalization, recency, and context. Pure frequency ranking misses user intent.
  • Autocomplete must respond in under 100ms. This requires in-memory data structures, edge caching, and client-side debouncing.
  • Real-time updates keep suggestions relevant for trending topics and new content. Batch rebuilds handle the long tail.
  • Privacy filtering and offensive content moderation are not optional. Autocomplete amplifies the visibility of every suggestion it shows.