Autocomplete / Type-ahead System for a Search Box – Part 2
In Part 1, we covered the problem statement, requirements, and capacity estimation.
In this part, we'll deep-dive into the core data structures, ranking optimizations, and high-level system architecture that make autocomplete systems fast and scalable at Google-level traffic.
Table of Contents (Part 2)
- Choosing the Core Data Structure
- Optimizing the Trie Structure
- How Top K Suggestions Are Retrieved
- Building and Maintaining Top K Lists
- High Level System Architecture
- Storage, Replication, and Availability
Choosing the Core Data Structure
Autocomplete is fundamentally a prefix-matching problem.
Given a prefix, return the most relevant strings that start with it, fast.
This immediately eliminates naive solutions like scanning a list or database table. We need a structure that is prefix-aware by design.
Why a Trie Works Well
A Trie (Prefix Tree) stores strings character by character.
- Each node represents a prefix
- All descendants share that prefix
Key Advantages
- Lookup time depends only on prefix length
- No scanning of unrelated strings
- Naturally groups similar queries
- Perfect fit for prefix search
Example
Stored queries:
be, bee, bell, bent, belt
Trie traversal for prefix "be" leads to a subtree containing:
bee, bell, bent, belt
This is exactly what autocomplete needs: fast prefix isolation.
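To make this concrete, here is a minimal Python sketch of a character-by-character trie (illustrative only; `TrieNode`, `insert`, and `collect` are names chosen for this example):

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # character -> TrieNode
        self.is_terminal = False  # True if a stored query ends at this node

def insert(root, query):
    node = root
    for ch in query:
        node = node.children.setdefault(ch, TrieNode())
    node.is_terminal = True

def collect(node, prefix):
    """Return every stored query under this node (shown only to illustrate prefix isolation)."""
    results = [prefix] if node.is_terminal else []
    for ch, child in node.children.items():
        results.extend(collect(child, prefix + ch))
    return results

root = TrieNode()
for q in ["be", "bee", "bell", "bent", "belt"]:
    insert(root, q)

# Walk to the "be" node; everything below it shares that prefix.
node = root
for ch in "be":
    node = node.children[ch]
print(collect(node, "be"))  # ['be', 'bee', 'bell', 'belt', 'bent']
```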
Optimizing the Trie Structure
A naive trie becomes extremely large when storing billions of queries.
Problem with a Naive Trie
- Too many nodes
- Deep chains with single children
- High memory overhead
- Excessive pointer chasing
Compressed Trie (Radix Tree)
In many cases, nodes have only one child.
These chains can be merged into a single edge storing multiple characters.
Benefits
- Reduced memory usage
- Fewer pointer traversals
- Faster cache-friendly lookups
This optimized trie is often called a Radix Tree or Compact Trie.
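As a rough illustration of the compression step, building on the `TrieNode` sketch above, single-child chains can be merged at build time into edges that carry whole substrings (`RadixNode` and `compress` are illustrative names):

```python
class RadixNode:
    def __init__(self):
        self.children = {}        # edge label (one or more characters) -> RadixNode
        self.is_terminal = False

def compress(trie_node):
    """Convert a plain trie into a radix tree by merging single-child, non-terminal chains."""
    radix = RadixNode()
    radix.is_terminal = trie_node.is_terminal
    for ch, child in trie_node.children.items():
        label = ch
        # Keep absorbing characters while the chain has exactly one child and no terminal in between.
        while len(child.children) == 1 and not child.is_terminal:
            next_ch, next_child = next(iter(child.children.items()))
            label += next_ch
            child = next_child
        radix.children[label] = compress(child)
    return radix
```

Running this on the trie above collapses the single-child chain under "b" into one "be" edge.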
Storing Popularity Information
How popular is a full search query?
Each complete query ends at a terminal node in the trie.
At that node, we store:
- Search frequency
- Or a weighted score combining frequency + recency
Example
User searches:
"apple" β 100 searches
"application" β 60 searches
"app store" β 200 searches
Trie view:
app
βββ apple (freq=100)
βββ application (freq=60)
βββ app store (freq=200)
At this stage:
- We know the popularity of each full query
- We cannot yet answer prefix queries quickly
This layer answers:
"How popular is each query?"
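A small sketch of this data layer, reusing the `TrieNode` class from the earlier example (the `freq` field and `record_search` helper are invented for illustration; in practice this update runs offline from query logs):

```python
def record_search(root, query, count=1):
    """Offline aggregation: bump the frequency stored at the query's terminal node."""
    node = root
    for ch in query:
        node = node.children.setdefault(ch, TrieNode())
    node.is_terminal = True
    node.freq = getattr(node, "freq", 0) + count

root = TrieNode()
for query, searches in [("apple", 100), ("application", 60), ("app store", 200)]:
    record_search(root, query, searches)
```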
How Top K Suggestions Are Retrieved
Given a prefix, how do we return the best suggestions instantly?
Naive Approach
For prefix "be":
- Traverse to prefix node
- Traverse entire subtree
- Collect all terminal queries
- Sort by frequency
- Return top K
This is far too slow for real-time systems.
Pre-Computed Top-K Optimization
To guarantee fast responses:
Each trie node stores the top K suggestions for that prefix.
So at query time:
Traverse to the prefix node → return the stored top-K list.
No subtree traversal.
No sorting.
No computation on request path.
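Assuming every node already carries a pre-computed `top_k` list (how it gets built is covered in the next section), the serving-path lookup reduces to a walk down the prefix; a minimal sketch:

```python
def suggest(root, prefix, k=5):
    """Serving path: walk to the prefix node and return its pre-computed suggestions."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []         # unknown prefix: nothing to suggest
    return node.top_k[:k]     # already ranked offline; no traversal, no sorting
```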
Storage Optimization
To reduce memory overhead:
- Store references to terminal nodes
- Store frequency alongside references
This is a classic space-versus-latency trade-off, and autocomplete is extremely read-heavy, so it's worth it.
Building and Maintaining Top K Lists
Top-K lists are built bottom-up.
Build Process
- Leaf nodes know their own frequency
- Parent nodes merge children's top-K lists
- Only top K entries are retained
- Process propagates upward to root
This ensures:
- Every prefix node has pre-sorted suggestions
- Lookup time is O(prefix length)
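Here is a minimal sketch of that offline build pass, assuming the frequency-annotated trie from the data layer above (the value of `K` and the propagation of `(freq, query)` pairs are illustrative choices):

```python
import heapq

K = 5

def build_top_k(node, prefix=""):
    """Post-order pass: each node keeps only the K best queries found in its subtree."""
    candidates = []
    if node.is_terminal:
        candidates.append((node.freq, prefix))
    for ch, child in node.children.items():
        # Each child has already reduced its subtree to at most K entries.
        candidates.extend(build_top_k(child, prefix + ch))
    best = heapq.nlargest(K, candidates)
    node.top_k = [query for _, query in best]  # what the serving path reads
    return best                                # propagate (freq, query) pairs upward

build_top_k(root)
```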
Two-Layer Mental Model
Think of autocomplete as two separate layers:
Layer 1: Popularity Tracking (Data Layer)
- Tracks how often full queries are searched
- Stored at terminal nodes
- Updated offline via logs and aggregation
Layer 2: Fast Retrieval (Serving Layer)
- Uses popularity data to compute rankings
- Stores only top-K per prefix
- Optimized purely for read latency
High Level System Architecture
Autocomplete is a read-heavy, latency-critical pipeline.
Serve from memory first → fall back to structured storage → never compute on the request path.
Autocomplete is not a compute problem; it's a data-access problem.
Logical Components
User
↓
Browser / Mobile Client
↓
Global Load Balancer
↓
Stateless Application Servers
↓
In-Memory Cache (Redis)
↓
Distributed Trie Storage
Each layer has a single responsibility, which keeps the system scalable and debuggable.
Step-by-Step Request Flow
Step 1: Client Layer (Browser / Mobile)
- Sends only the current prefix
- Requests are debounced (e.g., 150 ms)
- Avoids flooding backend while typing
Example:
"prefix": "how",
"k": 5,
"locale": "en-IN"
Step 2: Load Balancer
Responsibilities:
- Route to nearest region
- Distribute traffic evenly
- Remove unhealthy servers
At scale:
- Global LB (GeoDNS / Anycast)
- Regional LB inside data centers
Step 3: Application Servers (Stateless)
Act as coordinators, not data holders.
They handle:
- Input normalization ("How " → "how")
- Validation
- Cache orchestration
- Fallback logic
Stateless design means:
- Easy horizontal scaling
- No data migration during scale-up
Step 4: Redis Cache (Hot Prefix Layer)
First data stop.
Cache Key
autocomplete:locale:prefix
Example:
autocomplete:en:how
Value:
[
"how to cook rice",
"how to lose weight",
"how to make tea"
]
Why Redis works well:
- Small set of prefixes dominates traffic
- Sub-millisecond memory access
- TTL removes stale data automatically
Prefixes like "a", "th", and "how" are often served 99% of the time from the cache.
Step 5: Cache Miss → Trie Lookup
On cache miss:
- Query distributed trie storage
- Trie node already contains pre-computed top-K
- No traversal or sorting at runtime
Important:
- Trie servers are read-only on serving path
- Writes and rebuilds happen offline
Step 6: Response & Cache Population
- Results returned to client
- Same response written back to Redis
- TTL applied (10-30 minutes)
Cold prefixes naturally become hot.
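Putting steps 3 to 6 together, here is a rough cache-aside sketch of the application-server read path; `trie_lookup` is a placeholder for whatever client talks to the distributed trie shards, and the key format and TTL follow the conventions above:

```python
import json
import redis

r = redis.Redis()        # assumes a reachable Redis instance
TTL_SECONDS = 15 * 60    # within the 10-30 minute range mentioned above

def trie_lookup(locale, prefix, k):
    """Placeholder for the distributed-trie client used on a cache miss (step 5)."""
    raise NotImplementedError

def autocomplete(prefix, locale="en", k=5):
    normalized = prefix.strip().lower()                  # input normalization ("How " -> "how")
    key = f"autocomplete:{locale}:{normalized}"
    cached = r.get(key)
    if cached is not None:                               # cache hit: sub-millisecond path
        return json.loads(cached)[:k]
    suggestions = trie_lookup(locale, normalized, k)     # cache miss: read pre-computed top-K
    r.setex(key, TTL_SECONDS, json.dumps(suggestions))   # populate cache; cold prefixes become hot
    return suggestions
```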
Failure Handling
Redis Down
- App servers bypass cache
- Read directly from trie
- Slight latency increase, system remains available
Trie Shard Down
- Traffic routed to replica
- Load balancer avoids unhealthy node
App Server Down
- Removed from the load balancer pool immediately
- No data loss (servers are stateless)
Regional Deployment
User (India)
↓
India Load Balancer
↓
India App Servers
↓
India Redis
↓
India Trie Replica
Each region:
- Serves local users
- Has independent cache
- Shares replicated trie data
Conceptual Architecture Diagram
+--------+     +---------------+     +--------------------+
| Client | --> | Load Balancer | --> | Application Server |
+--------+     +---------------+     +--------------------+
                                        |              |
                                    cache hit      cache miss
                                        |              |
                                        v              v
                                  +-----------+   +--------------------+
                                  |   Redis   |   |  Distributed Trie  |
                                  +-----------+   |      Storage       |
                                                  +--------------------+
Storage, Replication, and Availability
A full autocomplete trie cannot live on a single machine.
Distributed Trie Storage
The trie is:
- Logically one structure
- Physically distributed across many servers
Partitioning Strategy
Common approach: prefix-based partitioning
Examples:
- a-c → Shard 1
- d-f → Shard 2
- Hot prefixes (a, s, th) are further subdivided
This keeps shards balanced and scalable.
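A purely illustrative routing sketch; real deployments typically load these ranges from partition metadata (ZooKeeper, discussed below) rather than hard-coding them:

```python
# Hypothetical prefix ranges; actual boundaries come from partition metadata.
SHARD_RANGES = {
    "shard-1": ("a", "c"),
    "shard-2": ("d", "f"),
    "shard-3": ("g", "z"),
}

def shard_for(prefix):
    """Pick a trie shard based on the first character of the prefix."""
    first = prefix[0].lower()
    for shard, (low, high) in SHARD_RANGES.items():
        if low <= first <= high:
            return shard
    return "shard-fallback"   # digits, punctuation, non-Latin scripts, etc.
```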
Replication Strategy
Each trie partition is replicated across multiple servers.
Why Replication Matters
- High Availability: node failures don't impact the service
- Fault Tolerance: handles hardware failures and deployments
- Read Scalability: replicas share the massive read QPS load
Storage Technology Choices
Cassandra (Durable Storage)
Best fit when:
- Large trie partitions
- Very high read throughput
- Cross-region replication needed
Usage pattern:
- Trie partitions stored as serialized blocks
- Loaded into memory on startup
- Cassandra acts as durable backing store
Strengths:
- Horizontal scalability
- High availability
- Tunable consistency
ZooKeeper (Metadata & Coordination)
ZooKeeper is not used to store trie data.
It is used for:
- Partition metadata
- Prefix-to-server mappings
- Leader election (if required)
- Atomic version switching during trie rebuilds
Replication Guarantees
Replication ensures:
- High availability
- Fault tolerance
- Read scalability
Final Takeaway (Part 2)
A real-world autocomplete system succeeds because it:
- Uses tries for prefix isolation
- Pre-computes top-K suggestions
- Serves requests from memory first
- Pushes all heavy computation offline
- Scales via partitioning and replication
This is the level of clarity interviewers look for.
