Back to Problems
Functional RequirementsNon-Functional RequirementsCapacity PlanningHigh Level DesignDetailed Design
The System Design Courses

Go beyond memorizing solutions to specific problems. Learn the core concepts, patterns and templates to solve any problem.

Start Learning

Top K Heavy Hitter

Functional Requirements

For a platform like Spotify, design a sytem that returns the top K songs played in the last 7 days. The same solution can also be applied to getting the top K most popular items of an e-commerce platform.

  • When a user plays a song, increment count for that song
  • Return the top K songs played for the last 7 days

Non-Functional Requirements

How many users are there?

  • 100,000,000 DAU

How many times does a user interact with content (listens to a song)?

  • 100 times a day.

How many times does a user request top k?

  • 2 times a day for last 7 days

How accurate do the results need to be?

  • Very accurate for time periods earlier than yesterday and
  • Somewhat accurate for the current hour.

Does the response need to be immediate for the last 7 days Top K?

  • Yes, an immediate response is needed.

How long should the raw data be retained?

  • Data retention for 1 year.

Capacity Planning

A user interacts with the content 100 times a day but only requests top K two times a day. Therefore we have a write-heavy system. Let's calculate the write QPS first.

QPS = Total number of query requests per day / (Number of seconds in a day)

Given a DAU (Daily Active Users) of 100M, and makes 100 read equest per day per user, the write QPS is about 100,000 and the read QPS is about 2000.

Before we jump into storage estimates, let's take a rough look at the event data schema that will be recorded.

{ "songId": "string", "userId": "string", "timestamp": "ISO 8601 datetime", "location": "string", "deviceType": "string", "playDuration": "integer" }

String Fields:

  • Assuming UTF-8 encoding (which is typical for JSON), most characters will use 1 byte each. The length of the string fields (songId, userId, location, deviceType) will vary, but let's estimate:
  • songId and userId: 36 characters each (assuming UUIDs) = 36 bytes each.
  • timestamp: ISO 8601 datetime string, e.g., "2023-04-01T12:00:00Z", approximately 20 characters = 20 bytes.
  • location: Varies widely, but let's estimate 30 characters = 30 bytes.
  • deviceType: Short string, e.g., "smartphone", approximately 10 characters = 10 bytes.

Integer Field:

  • playDuration is an integer. In JSON, numbers are encoded as text, but let's assume an average of 3 digits for the play duration, equating to approximately 3 characters = 3 bytes.

In total, we are looking at about 200 bytes of data per event. With this assumption, we can calculate 200 byte * 100MM users * 100 writes per user per day = 2TB of storage per day and about 700TB for 12 months data retention period.

High Level Design

We have a unique design requirement of somewhat accurate for the current hour and very accurate count for time periods earlier than yesterday. This unique requirement means a natural way to design the system is to have two data flow paths.

For the daily count that requires more accurate data, we can take our time and process the data accurately with a batch pipeline.

The current hour has looser accurate requirement which means we could leverage some sort of probabilistic data structure that offers speed with the cost of slightly lower accuracy.

The write QPS is quite large so we'd need a message queue as a buffer.

Since we want two consumers to consumes messages in different paths, we would need a log-based message queue.

Top K Heavy Hitter Design Diagram

Fast Path

The fast path is designed to handle real-time data with a focus on speed over accuracy. This path utilizes probabilistic data structures such as a Count-Min Sketch to estimate the frequency of songs being played. Count-Min Sketch provides a way to count the frequency of events in a stream with high accuracy using less memory but introduces a possibility of overestimation.

For the fast path, we implement the following steps:

  • Data Ingestion: As events come in, they're immediately pushed to a log-based message queue to handle the high write QPS efficiently.
  • Real-Time Processing: Events are streamed from the queue to a real-time processing system (like Apache Flink or Kafka Streams) where a Count-Min Sketch data structure is updated in real-time with each song play event.
  • Top K Estimation: The real-time processing system uses the Count-Min Sketch to estimate the Top K songs in real-time or relys on . This estimation can be refreshed every minute to provide near real-time insights.

Slow Path

The slow path focuses on accuracy and processes data with a delay, typically in batches. This path is ideal for generating accurate counts for periods earlier than the current hour or day.

  • Batch Processing: At regular intervals hourly, the system batches events from the message queue and processes them to update the counts of each song accurately.
  • Data Aggregation: The batch processing system aggregates song play counts using a more traditional database or data warehouse solution, ensuring very accurate counts.
  • Top K Calculation: Once the batch processing is complete, a Top K algorithm is applied to the accurate counts to determine the Top K songs for the period.

Detailed Design

We have glossed over how Count Service returns the real-time 'mostly accurate' count. Let's dive deep into how this is achieved.

Getting Top K using Count-Min Sketch and Min Heap

To efficiently track the Top K most played songs, we can use a combination of a Count-Min Sketch and a Min Heap. This method balances the need for memory efficiency and the ability to quickly update and retrieve the Top K items in a streaming dataset.

How it Works

Grasping the building blocks ("the lego pieces")

This part of the guide will focus on the various components that are often used to construct a system (the building blocks), and the design templates that provide a framework for structuring these blocks.

Core Building blocks

At the bare minimum you should know the core building blocks of system design

  • Scaling stateless services with load balancing
  • Scaling database reads with replication and caching
  • Scaling database writes with partition (aka sharding)
  • Scaling data flow with message queues

System Design Template

With these building blocks, you will be able to apply our template to solve many system design problems. We will dive into the details in the Design Template section. Here’s a sneak peak:

System Design Template

Additional Building Blocks

Additionally, you will want to understand these concepts

  • Processing large amount of data (aka “big data”) with batch and stream processing
    • Particularly useful for solving data-intensive problems such as designing an analytics app
  • Achieving consistency across services using distribution transaction or event sourcing
    • Particularly useful for solving problems that require strict transactions such as designing financial apps
  • Full text search: full-text index
  • Storing data for the long term: data warehousing

On top of these, there are ad hoc knowledge you would want to know tailored to certain problems. For example, geohashing for designing location-based services like Yelp or Uber, operational transform to solve problems like designing Google Doc. You can learn these these on a case-by-case basis. System design interviews are supposed to test your general design skills and not specific knowledge.

Working through problems and building solutions using the building blocks

Finally, we have a series of practical problems for you to work through. You can find the problem in /problems. This hands-on practice will not only help you apply the principles learned but will also enhance your understanding of how to use the building blocks to construct effective solutions. The list of questions grow. We are actively adding more questions to the list.

Read the rest of this article and practice this problem with a FREE account
In terms of scaling the design, we need to consider partitioning the incoming stream. That means we have 'n' real-time aggregations to deal with. All of these will produce its min heap as the output and we would need a step to merge the n minheaps. The output of aggregations would be timestamp[minute]vs top-k heap. This (merging n sorted list) can be done by O(n*k) and so can be solved using an active-passive node setup. The solution article could detail this aspect
Tue Jun 04 2024