Design Typeahead System

A typeahead system, often called autocomplete, is a feature in software programs that predicts the rest of a word or phrase that a user is typing. This functionality helps users to find and select from a pre-populated list of values as they type, reducing the amount of typing needed and making the input process faster and more efficient.

An everyday example of a typeahead system is Google's search box. As you begin typing your search query, Google provides a dropdown list of suggested search terms that match what you have typed so far. These suggestions are based on popular searches and the content indexed by Google's search engine. This allows you to see and select your intended search without having to type out the entire term, making the search process faster and more efficient.

When a user begins to type in the search box, the typeahead feature could suggest book titles, authors, or genres that match the typed characters. For instance, if the user types in "Har", the system could suggest "Harry Potter", "Harper Lee", etc. This kind of typeahead system can greatly enhance the user experience by providing instant feedback and helping users formulate their search queries.

In this problem, we will design a typeahead system for a large e-commerce site. The system will provide search suggestions to users as they type in the search box. The suggestions will be based on the most popular searches and the content indexed by the e-commerce site's search engine.

Functional requirement:

  • The system should return search suggestions as the user types in the search box.
  • The system should return top 10 results for each query.
  • The system should prioritize more relevant suggestions (e.g., based on popularity or other relevance metrics).
  • The system should handle updates to the data, e.g., new entries added.
  • Use a Trie to store the data.

Scale requirement:

  • 100 million daily active users.
  • Search data should be updated daily
  • Assuming average search queries/user/day is 20.
  • Assuming the average number of characters of a query is 10.
  • Assuming 10% of the queries are new, each query is 10 character, each character is 2 bytes, then each query adds 20 bytes, and new data will be stored for one year.
1. Resource Estimation