Problem description
Design a coding contest platform like LeetCode and HackerRank.
Functional requirements
- Users should be able to solve coding questions by submitting their solution code and running it against built-in test cases to get results.
- Users can participate in daily contests, which involve solving one or many coding questions. Users will be ranked based on the questions they solve and the time it took them. A leaderboard will be displayed during the contest, showing live scores of the top 50 users with their usernames and scores.
Scalability requirements
- Supporting 10K users participating in contests concurrently
- There is one contest a day
- Each contest lasts for 2 hours
- Each user submits solutions 20 times on average
- Each submission executes 20 test cases on average
- User-submitted code has a retention policy of one month, after which it is deleted.
Non-functional requirements
- Low latency: The leaderboard should be updated live.
- High availability: Code submission should be available at all times.
- Security: User-submitted code should not pose a threat to the service.
Resource Estimation
Assuming that the storage space required for the solution of each coding question is 1KB.
Assuming that the Read:Write ratio is 100:1.
QPS: 463
Storage: 12GB
Using the resource estimator
QPS Calculations
To calculate Queries Per Second (QPS), we need to look at the peak load or the maximum number of operations that our system will need to support. Services we need to support
- Executing user-submitted solutions
- Updating leaderboard based on grading of user-submitted solutions
- User profile creation and updates, etc.
During the contest, each submission needs to execute against 20 test cases on average so this is clearly the bottleneck of the entire service.
Let's start by understanding the total number of submissions per day: 10k users * 20 submissions/user = 200k submissions
Since each submission executes 20 test cases on average: 200k submissions * 20 test cases/submission = 4 million test case executions
Assuming that the contest runs for 2 hours (or 7200 seconds), then we can calculate the peak QPS:
4 million test case executions / 7200 seconds = ~556 QPS
This estimate assumes that the load is evenly distributed throughout the contest duration, but in real-world scenarios, we often see peak loads at the start and end of contests. Let’s add a 2x multiplier to the QPS and get 556*6 ~= 1200QPS
Storage Calculations
Let’s first take a look at the types of data we need to store:
- User information: This is provided by the user upon sign-up, such as email, username.
- Contest information: Contest details, problem sets, test case inputs and expected outputs (in file formats).
- User submissions: Based on the scalability requirements in the problem description, each user makes 20 submissions per contest and each submission is about 100KB (including the code, metadata, etc.), then the total data for submissions in a day would be
10k users * 20 submissions/user * 100KB/submission = 200MB/day
. In a month of time, this will be200MB * 30 = 6GB
. - Test case execution results: Each submission has 20 test case results and each result is 10KB (including input, expected output, actual output, pass/fail status), the total data for test case results in a day would be
200k submissions * 20 test cases/submission * 10KB/test case = 40GB/day
. In a month of time, this will be 1.2TB.
Clearly, the storage for executed test case results is the bottleneck and it’s about 1.2TB a month. Since the retention policy is for one month, we can assume that this is the maximum storage our system needs.
API design
Code submission:
POST /problems/{problem_id}/submit
Request body:
{ user_id: string code: string language: string }
Response body:
{ status: success or fail test_case: [ { status: success or fail | timeout } … ] }
Fetching the leaderboard
GET /contests/{contest_id}/leaderboard
Response body:
{ ranking: [ {user_id, score} ] }
High Level Design
Services
- User service: ① Responsible for user sign-up, updating, and retrieving user information to be displayed on the user profile page, backed by the user database.
- Code submission service: the main workhorse of the platform. ② The service accepts user-submitted code and ③ enqueues them as messages into a message queue (see the Message Queue section for detailed examples). ④ The workers fetch from the queue, extract the user code and run them against the test cases inside docker containers and ⑤ writes the output to the test result database (or cloud storage). ⑥ The outputs are compared with the expected outputs and the results are sent back to the client. User scores and rankings are also updated based on output comparisons. If user code passes all test cases for a problem, the problem is marked as a success and its corresponding score is added to the user’s total. ⑦ User scores and rankings are stored in an in-memory data store, such as Redis. Redis has a built-in data structure called sorted set that is tailored for the leaderboard purpose. We can use (contest_id, user_id) as the key and scores as the value and Redis sorted set will automatically ranks the users by their scores.
- Contest service: ⑧ responsible for retrieving contest related information such as questions, date of contest etc from the contest database.
Detailed Design Q&A
Q1: How to prevent users from submitting malicious code that messes up our service internals?
Grasping the building blocks ("the lego pieces")
This part of the guide will focus on the various components that are often used to construct a system (the building blocks), and the design templates that provide a framework for structuring these blocks.
Core Building blocks
At the bare minimum you should know the core building blocks of system design
- Scaling stateless services with load balancing
- Scaling database reads with replication and caching
- Scaling database writes with partition (aka sharding)
- Scaling data flow with message queues
System Design Template
With these building blocks, you will be able to apply our template to solve many system design problems. We will dive into the details in the Design Template section. Here’s a sneak peak:
Additional Building Blocks
Additionally, you will want to understand these concepts
- Processing large amount of data (aka “big data”) with batch and stream processing
- Particularly useful for solving data-intensive problems such as designing an analytics app
- Achieving consistency across services using distribution transaction or event sourcing
- Particularly useful for solving problems that require strict transactions such as designing financial apps
- Full text search: full-text index
- Storing data for the long term: data warehousing
On top of these, there are ad hoc knowledge you would want to know tailored to certain problems. For example, geohashing for designing location-based services like Yelp or Uber, operational transform to solve problems like designing Google Doc. You can learn these these on a case-by-case basis. System design interviews are supposed to test your general design skills and not specific knowledge.
Working through problems and building solutions using the building blocks
Finally, we have a series of practical problems for you to work through. You can find the problem in /problems. This hands-on practice will not only help you apply the principles learned but will also enhance your understanding of how to use the building blocks to construct effective solutions. The list of questions grow. We are actively adding more questions to the list.