Design LeetCode

Introduction

There is a question bank that stores thousands of programming questions. For each question, users can write, compile and submit code against test cases and get immediate feedback on correctness. User submitted code needs to be persisted for review and tracking.

Every week, there is a contest where thousands of people compete concurrently to solve questions as quickly as possible. Users are ranked based on their accuracy and speed. The leaderboard provides real-time updates during the 2-hour contest, showing rankings as participants submit solutions. After the contest ends, the leaderboard becomes a frozen historical view preserving the final standings.

Functional Requirements

We extract verbs from the problem statement to identify core operations:

  • "view" and implicit "browse" (question bank) → READ operation (View Problems)
  • "write, compile and submit code" + "get immediate feedback" → CREATE/READ operations (Submit Solution)
  • "compete" + "ranked" + "real-time updates" → UPDATE/READ operations (Coding Contest with real-time leaderboard)

Each verb maps to a functional requirement that defines what the system must do.

Core Requirements

  1. View Problems: Users should be able to view problem descriptions, examples, constraints, and browse a list of problems.

  2. Submit Solution: Users should be able to solve coding questions by submitting their solution code and running it against built-in test cases to get results.

  3. Coding Contest: Users can participate in coding contests. A contest is a timed event with a fixed duration of 2 hours, consisting of four questions. The score is calculated based on the number of questions solved and the time taken to solve them. Results are displayed in real time, and the leaderboard shows the top 50 users with their usernames and scores.

Out of Scope

  • Authentication
  • Authorization
  • User management
  • Contest history

Scale Requirements

  • Support 10k users participating in contests concurrently
  • There is at most one contest a day
  • Each contest lasts 2 hours
  • Each user submits solutions 20 times on average
  • Each submission executes 20 test cases on average
  • User-submitted code has a retention policy of 1 month, after which it is deleted.
  • Assume the storage space required for each submitted solution is 10 KB.
  • Assume a read:write ratio of 2:1.
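
A quick back-of-envelope check based on these numbers (derived here, not part of the original statement): 10k users × 20 submissions ≈ 200k submissions per contest. At 10 KB each, that is about 2 GB of submission storage per contest, or roughly 60 GB for the 1-month retention window at one contest per day. Averaged over the 2-hour contest, 200k submissions is about 28 writes per second, with far sharper peaks near the start and end.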

Non-Functional Requirements

We extract adjectives and descriptive phrases from the problem statement to identify quality constraints:

  • "thousands of people compete concurrently" + "10k users participating contests concurrently" → System must handle high concurrency and scale
  • "immediate feedback" + "real-time updates" → System must have low latency for code execution results and leaderboard updates
  • "persisted" (user submitted code) → System must ensure data durability and not lose submissions
  • Implicit: executing untrusted user code → System must provide security and isolation to prevent malicious code from affecting the server

Each adjective and quality descriptor becomes a non-functional requirement that constrains our design choices.

  • High availability: the website should be accessible 24/7
  • High scalability: the website should be able to handle 10k users submitting code concurrently
  • Low latency: during a contest, the leaderboard should be updated in real time
  • Security: user code must not be able to compromise or crash the server

API Endpoints

We derive API endpoints directly from the functional requirements (the verbs identified above):

  • READ operations: implicit "view" and "browse" for question bank → GET /problems?start={start}&end={end} (list problems with pagination) → GET /problems/:problem_id (get specific problem details)

  • CREATE operation: "write, compile and submit code" → POST /problems/:problem_id/submission (submit solution code for evaluation)

  • READ operations: "get immediate feedback" and "view leaderboard" → GET /submissions/:submission_id/status (poll for submission results) → GET /contests/:contest_id/leaderboard (fetch contest rankings)

Each endpoint maps to exactly one core operation, following RESTful conventions where HTTP methods indicate operation type.

GET
/problems?start={start}&end={end}

Get problems in a range. If start and end are not provided, return the first page of problems using a default page size.

GET
/problems/:problem_id

Get problem details, including description, constraints, examples, and starter code.

POST
/problems/:problem_id/submission

Submit code for a problem and get results.

GET
/contests/:contest_id/leaderboard

Get leaderboard for a contest.

High Level Design

1. View Problems

Users should be able to view problem descriptions, examples, constraints, and browse a list of problems.

Let's start with the simplest requirement first. Viewing problems is straightforward and follows a standard three-tier architecture:

  1. Client: Frontend application sending HTTP GET requests
  2. Problems Service: Backend service with endpoints:
    • GET /problems?start={start}&end={end}: List problems
    • GET /problems/:problem_id: Get specific problem details
  3. Database: Stores problem data (id, title, description, difficulty, examples, constraints)
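
As a minimal sketch of the Problems Service (the Flask framework and the in-memory problem list are assumptions for illustration, not part of the original design):

from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory stand-in for the Problems database (illustration only).
PROBLEMS = [{"id": i, "title": f"Problem {i}", "difficulty": "easy"} for i in range(1, 101)]

@app.get("/problems")
def list_problems():
    # Pagination via ?start=&end=; falls back to a default page size.
    start = int(request.args.get("start", 0))
    end = int(request.args.get("end", start + 50))
    return jsonify(PROBLEMS[start:end])

@app.get("/problems/<int:problem_id>")
def get_problem(problem_id):
    problem = next((p for p in PROBLEMS if p["id"] == problem_id), None)
    return (jsonify(problem), 200) if problem else (jsonify({"error": "not found"}), 404)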

Leetcode Design Diagram 0 View Problem

This simple setup efficiently handles problem viewing requests while maintaining separation of concerns. Now let's tackle the more interesting challenge: executing user code safely at scale.


2. Submit Solution

Users should be able to solve coding questions by submitting their solution code and running it against built-in test cases to get results.

A user clicks "Submit" on their solution. The system must execute untrusted code against test cases and return results. But here's the constraint: 10,000 users submit during contests, each averaging 20 submissions. That's 200,000 code executions in 2 hours. Each execution takes CPU and memory to compile, run, and validate against test cases.

If we handle this in a single monolithic service, what happens when someone submits malicious code that crashes the process? Or when execution takes 10 seconds and blocks other users? We need separation. The service that receives submissions shouldn't be the same process running untrusted code.

This leads us to three components: a Problems Service for viewing problems, a Code Evaluation Service for receiving submissions, and Code Execution Workers for running the actual code. The Problems Service has steady traffic all day—users browse problems continuously. The Code Evaluation Service sees spikes during contest start and end. And the workers need to scale independently based on queue depth. Different traffic patterns, different scaling needs, different blast radius concerns.

But receiving 200,000 submissions in 2 hours creates another problem: what if submissions arrive faster than workers can execute them? At peak, 5,000 submissions might arrive in the first minute. We can't drop them, and we can't make users wait. This requires a buffer—a Message Queue that absorbs the spike and lets workers pull submissions at their own pace.

Leetcode Design Diagram 1 Submit Solution

The flow: Code Evaluation Service validates the submission and pushes a message (user ID, problem ID, code) to the queue. This happens in milliseconds—the service doesn't wait for execution. The user gets a submission ID immediately.
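
Here is a minimal sketch of that submit path (the queue client, database helper, and field names are assumptions for illustration):

import json
import uuid

def handle_submission(user_id, problem_id, code, queue, submissions_db):
    # Generate the ID the client will use to poll for results.
    submission_id = str(uuid.uuid4())
    # Persist the submission as "pending" before enqueueing, so it is never lost.
    submissions_db.insert({
        "id": submission_id,
        "user_id": user_id,
        "problem_id": problem_id,
        "code": code,
        "status": "pending",
    })
    # Push a small message; workers pull and execute at their own pace.
    queue.publish(json.dumps({"submission_id": submission_id, "problem_id": problem_id}))
    # Return immediately; execution happens asynchronously.
    return {"submission_id": submission_id, "status": "pending"}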

Code Execution Workers pull messages from the queue. Each worker fetches test cases from the Problems database, executes the code, and records results. But where does the code actually run? We're executing untrusted code—a user could submit an infinite loop, a fork bomb, or code that tries to read our database credentials. The execution environment must provide isolation.

Code Execution Options

API Server

Not Suitable

Direct execution on the API server

How it works:

The user-submitted code is directly executed on the same server that hosts the API. The code is typically saved as a file on the server's filesystem and then run using the appropriate interpreter or compiler.

Pros:

  • Simple to implement
  • Low latency as there's no additional layer between the API and code execution

Cons:

  • Severe security risks: Malicious code can potentially access or modify server data
  • Poor isolation: A bug in user code could crash the entire server
  • Resource management issues: One user's code could consume all server resources

Conclusion:

Not suitable for running user-submitted code. The security and stability risks far outweigh the simplicity benefits.

Serverless functions scale automatically, but they have a problem: cold starts. The first invocation can take 500ms-2s while the function container initializes. During a contest with 5,000 submissions in the first minute, most would hit cold starts. Containers let us pre-scale—we spin up 100 containers 5 minutes before the contest starts, avoiding cold-start latency entirely. We know the schedule; we don't need auto-scaling's flexibility. Containers also give us tighter control over resource limits (CPU, memory, network) and avoid vendor lock-in.

Each submission runs in a fresh container with strict resource limits. After execution completes, results (pass/fail, execution time, memory usage) are saved to the Submissions table and the container terminates.

So far we've covered how to receive submissions, queue them, and execute them securely in containers. But there's one piece missing: how does the user actually get their results back? Code execution can take several seconds, so we can't have the API call just wait. We need an async approach.

Async Result Retrieval

The user submitted code. Execution takes 3 seconds. Do we keep the HTTP connection open for 3 seconds? During a contest with 10,000 concurrent users, that's 10,000 open connections waiting. If execution takes 10 seconds, the connection might timeout. We need async communication.

The solution: return a submission ID immediately, then let the client check back for results.

  1. Immediate response: Code Evaluation Service returns {"submission_id": "abc123", "status": "pending"} in under 100ms
  2. Polling: Client polls GET /submissions/abc123/status every 1-2 seconds
  3. Result delivery: When execution completes, the endpoint returns {"status": "success", "passed": 8, "failed": 2, "time_ms": 45}

Async Polling Flow

This is how LeetCode works in production—you can see it in Chrome DevTools: LeetCode polling endpoint
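
On the client side, a polling loop might look like this sketch (the requests library, interval, and timeout values are assumptions):

import time
import requests

def wait_for_result(submission_id, base_url, interval_seconds=1.0, timeout_seconds=60):
    # Poll the status endpoint until the submission leaves the "pending" state.
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        resp = requests.get(f"{base_url}/submissions/{submission_id}/status")
        result = resp.json()
        if result.get("status") != "pending":
            return result  # e.g. {"status": "success", "passed": 8, "failed": 2, "time_ms": 45}
        time.sleep(interval_seconds)
    raise TimeoutError("submission result not ready in time")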

But why polling instead of WebSockets or Server-Sent Events?

WebSockets maintain bidirectional connections. During a contest, 10,000 concurrent WebSocket connections create memory pressure on the server—each connection holds buffers and state. We only need one-way communication (server → client) for status updates. The bidirectional capability adds complexity without benefit.

Server-Sent Events (SSE) are one-way but still require long-lived connections. With 10,000 concurrent users, that's 10,000 open HTTP connections the server must maintain.

Polling is stateless. The server handles a request, sends a response, and closes the connection. No memory held between requests. The "cost" of polling—1 request every 1-2 seconds per active submission—is negligible compared to maintaining 10,000 persistent connections. LeetCode proves this works at scale.


3. Coding Contest

Users can participate in coding contests. A contest is a timed event with a fixed duration of 2 hours, consisting of four questions. The score is calculated based on the number of questions solved and the time taken to solve them. Results are displayed in real time, and the leaderboard shows the top 50 users with their usernames and scores.

A contest is four problems with a 2-hour time window. The submission infrastructure already works—users submit code, it gets executed, results come back. So what's new?

The leaderboard. A user solves a hard problem at minute 45. They refresh the page: rank 47. Thirty seconds later, they refresh again: rank 52. Five users just submitted faster solutions. During the first 10 minutes of a contest, rankings change every few seconds as 10,000 developers compete.

This creates a data problem. Every submission that passes test cases updates that user's score. Their rank might change. Everyone below them shifts down one position. At peak, that's hundreds of score updates per second, each potentially reshuffling thousands of ranks. We need fast writes (update scores) and fast reads (query top 50).

A SQL database would struggle. The query SELECT * FROM leaderboard ORDER BY score DESC LIMIT 50 scans and sorts the entire leaderboard on every request. With scores changing hundreds of times per second, the ORDER BY never gets to use a stable index—it's recalculating constantly. We need a data structure that maintains sorted order automatically.

This points to an in-memory data structure like Redis sorted sets. When you insert a score, Redis maintains order using a skip list internally—no separate sort step needed. Querying top 50 is O(log n + 50), not O(n log n). Updates happen in O(log n). This matches our access pattern: frequent writes, frequent sorted queries. We will discuss more on this in the deep dive section.

We'll use a Contest Service to orchestrate this. When Code Evaluation Service finishes grading a submission, it notifies Contest Service. Contest Service calculates the new score (combining points and time penalties) and updates Redis. The leaderboard API queries Redis sorted set for top 50.

Leetcode Design Diagram 2 Coding Contest


Deep Dive Questions

How does the Code Evaluation Service decide if a submission is correct?

Mid-Level

This is one of those questions that tests whether you can move beyond box-drawing into actual implementation. Let's break this down with a concrete example.

The Code Evaluation Service decides if a submission is correct by comparing the output of the code with the expected output. The key insight: we need a language-agnostic way to store test cases so we don't have to maintain separate test cases for Python, Java, C++, etc.

Here's how this works with a simple example:

Requirement:

  • Problem: Add two numbers
  • Input: 2, 3
  • Expected Output: 5
  • Time Limit: 1 second

User submitted code:

def add(a, b): return a + b

Test case files:

test_case_1.in

2, 3

test_case_1_expected_output.out

5

test_case_1.out

5

The Code Evaluation Service will run the code with the input and get an output file test_case_1.out. It will then compare test_case_1.out with test_case_1_expected_output.out. If they are the same, the submission is marked as correct. Otherwise, it's marked as incorrect.

If the code fails to execute, the submission is marked as failed. If the code execution time exceeds the time limit, the submission is marked as timeout.
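
Here is a sketch of that comparison step (the subprocess settings and function names are assumptions; the file layout matches the example above):

import subprocess

def judge(run_cmd, input_path, expected_path, time_limit_seconds=1.0):
    # Run the user's program against the test input and capture its output.
    try:
        with open(input_path) as stdin_file:
            completed = subprocess.run(
                run_cmd, stdin=stdin_file, capture_output=True,
                text=True, timeout=time_limit_seconds,
            )
    except subprocess.TimeoutExpired:
        return "timeout"
    if completed.returncode != 0:
        return "failed"  # runtime error or non-zero exit
    expected = open(expected_path).read().strip()
    actual = completed.stdout.strip()
    # Compare the produced output with the expected output file.
    return "correct" if actual == expected else "incorrect"

# For the example above:
# judge(["python3", "solution.py"], "test_case_1.in", "test_case_1_expected_output.out")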

Test Case Verification Flow

  • The test case files, such as test_case_1.in and test_case_1_expected_output.out, are stored in the database.

How to ensure security and isolation in code execution?

Mid-Level

Running the code in a sandboxed environment like a container already provides a measure of security and isolation. We can further enhance security by limiting the resources that each code execution can use, such as CPU, memory, and disk I/O. We can also use technologies like seccomp to further limit the system calls that the code can make.

To be specific, to prevent malicious code from affecting other users' submissions, we can use the following techniques:

  • Resource Limitation: Limit the resources that each code execution can use, such as CPU, memory, and disk I/O.
  • Time Limitation: Limit the execution time of each code execution.
  • System Call Limitation: Use technologies like seccomp to limit the system calls that the code can make.
  • User Isolation: Run each user's code in a separate container.
  • Network Isolation: Limit the network access of each code execution.
  • File System Isolation: Limit the file system access of each code execution.

Here's a sample set of flags to pass to docker run:

--cpus="0.5" --memory="512M" --cap-drop="ALL" --network="none" --read-only

This limits the code execution to at most 0.5 CPU and 512 MB of memory, drops all Linux capabilities, disables network access, and makes the filesystem read-only.


How to implement a leaderboard that supports top N queries in real time?

Mid-Level

This is one of the most common interview deep dives for contest/gaming systems, so let's break it down carefully. We have several options for implementing a real-time leaderboard. The key trade-off is between query speed, update speed, and storage cost.

Leaderboard Implementation Options

Redis Sorted Set

Very Suitable

In-memory data structure store, using ZSET for leaderboard

How it works:

Uses a sorted set data structure to maintain scores. Each user is a member of the set, with their score as the sorting criteria.

Pros:

  • Extremely fast read and write operations
  • Built-in ranking and range queries
  • Scalable and can handle high concurrency

Cons:

  • Data is volatile (in-memory), requiring persistence strategies
  • May require additional infrastructure setup

Conclusion:

Highly suitable for real-time leaderboards with frequent updates and queries.

Looking at these options, I'm going to choose Redis Sorted Set. Here's why: the leaderboard data is temporary—only needed during and shortly after the contest—so in-memory storage makes sense. We don't need the durability guarantees of a traditional database. The built-in ZSET data structure is purpose-built for rankings, giving us O(log n) updates and O(log n + N) top-N queries. This is exactly the performance profile we need.

Let's see how this works under the hood. Redis actually comes with a built-in leaderboard implementation using sorted sets (ZSET). Sorted sets combine unique elements (members) with a score, and they keep the elements ordered by the score automatically.

In Redis, this is implemented as ZSET. We use ZADD to add a user and its score, and use ZREVRANGE to get the top N users (highest scores first).

Leaderboard Example

Here's a simple example:

ZADD leaderboard 100 user1
ZADD leaderboard 200 user2
ZADD leaderboard 300 user3
ZREVRANGE leaderboard 0 1  # returns the top 2 users
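
The same operations from application code, as a sketch using the redis-py client (host, port, and key names are assumptions):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def record_score(contest_id, user_id, score):
    # ZADD inserts or updates the member's score in O(log n).
    r.zadd(f"leaderboard:{contest_id}", {user_id: score})

def top_n(contest_id, n=50):
    # ZREVRANGE returns the highest-scoring members first.
    return r.zrevrange(f"leaderboard:{contest_id}", 0, n - 1, withscores=True)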

How are contest scores calculated?

Mid-Level

A natural follow-up question after discussing the leaderboard is how scores are calculated. A fair ranking system must answer: who solved harder problems, who solved them faster, and who made fewer failed submissions to discourage guessing. This leads us to a three-component score.

Points: The primary factor. Harder problems award more points. If User A solves 3 problems worth 500+1000+2000=3500 points and User B solves 2 problems worth 500+1000=1500 points, User A ranks higher regardless of time.

Time: The tie-breaker when points are equal. We track when you submit your last correct solution. Between two users with 1500 points, the one who finished at minute 30 beats the one who finished at minute 45.

Penalties: Wrong submissions add time. Each failed attempt typically adds 5 minutes. This discourages guessing.

Here's how these combine for a concrete user:

  • Solves Problem A (500 points) at minute 15 after 1 wrong attempt
  • Solves Problem B (1000 points) at minute 45 with no mistakes
  • Total points: 1500
  • Finish time: 45 minutes
  • Penalty: 1 × 5 = 5 minutes
  • Final time: 50 minutes

LeetCode uses this static scoring model. Codeforces adds time pressure by decaying problem values (a 500-point problem drops ~4 points per minute) and deducting 50 points per wrong submission.

When a submission completes evaluation, we update the user's score: if accepted and previously unsolved, add the problem points and update finish time to current contest time; if rejected, increment the wrong attempt counter.
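
Written out as a sketch (the data structures are assumptions; the 5-minute penalty follows the description above, and counting only wrong attempts on eventually-solved problems is a common convention, not stated in the original):

PENALTY_MINUTES = 5

def apply_submission(user_state, problem, verdict, minutes_elapsed):
    # user_state: {"points": int, "finish_time": int, "wrong": {problem_id: count}, "solved": set()}
    if verdict == "accepted" and problem["id"] not in user_state["solved"]:
        user_state["solved"].add(problem["id"])
        user_state["points"] += problem["points"]
        # Finish time = time of the latest accepted solution plus accumulated
        # penalties for wrong attempts on problems that were eventually solved.
        penalties = sum(
            count for pid, count in user_state["wrong"].items()
            if pid in user_state["solved"]
        ) * PENALTY_MINUTES
        user_state["finish_time"] = minutes_elapsed + penalties
    elif verdict != "accepted":
        user_state["wrong"][problem["id"]] = user_state["wrong"].get(problem["id"], 0) + 1
    return user_state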


How do we store composite scores (points + time) in Redis sorted sets?

Senior

We established that rankings depend on two factors: points (primary) and time (tie-breaker). Redis sorted sets only accept one numeric score per member. The challenge: pack both values into a single number that preserves the correct ranking order.

The constraint is strict. If User A has 5 problems solved in 120 seconds and User B has 5 problems in 90 seconds, User B must rank higher. If User C solves 6 problems in 500 seconds, they must beat both A and B despite taking longer. The single Redis score must encode this logic.
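
One common encoding, shown as a sketch (the multiplier is an assumption; the 2-hour bound comes from the contest duration): multiply points by a constant larger than any possible finish time, then add the time remaining, so that more points always win and, on equal points, an earlier finish wins.

CONTEST_SECONDS = 2 * 60 * 60  # 7200, the largest possible finish time

def composite_score(points, finish_time_seconds):
    # Points dominate; for equal points, an earlier finish gives a larger score.
    return points * (CONTEST_SECONDS + 1) + (CONTEST_SECONDS - finish_time_seconds)

def decode(score):
    points, remainder = divmod(int(score), CONTEST_SECONDS + 1)
    return points, CONTEST_SECONDS - remainder

# With this encoding, B (5 solved, 90 s) outranks A (5 solved, 120 s),
# and C (6 solved, 500 s) outranks both, as required.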


How to handle a large number of contestants submitting solutions at the same time?

Senior

This is where things get interesting. Think about what happens when a contest starts: 10,000 developers all click "submit" within the first few minutes. Then there's another spike at the end as people rush to finish. We've already built some resilience with the message queue, but let's think through whether that's enough.

The message queue helps, but here's the problem: if we have 5,000 submissions in the first minute and each takes 3 seconds to execute against 100 test cases, we'd need massive compute capacity sitting idle before the contest starts. Auto-scaling won't help—by the time new containers spin up, users are already frustrated by slow results.
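
To put a number on that: 5,000 submissions arriving in one minute at roughly 3 seconds of execution each is about 15,000 compute-seconds of work in a 60-second window, which means on the order of 250 executions running in parallel just to keep pace, before accounting for retries or slower languages.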

Let's walk through a comprehensive strategy that combines several approaches:

  1. Message Queue: Use a message queue to decouple the submission process from the execution process. This allows us to buffer submissions during peak times and process them as resources become available. We already have this from our earlier design.

  2. Pre-scaling: Since auto-scaling may not be fast enough to handle sudden spikes, pre-scale the execution servers before the contest starts. This is especially important for the first and last 10 minutes of a contest when submission rates are typically highest.

  3. Two-Phase Processing:

    • Phase 1 (During Contest): Implement partial evaluation
      • Run submissions against only about 10% of the test cases.
      • Provide a partial standing for users, giving them immediate feedback.
    • Phase 2 (Post-Contest): Complete full evaluation
      • After the contest deadline, run the submissions on the remaining test cases.
      • Determine the official results based on full evaluation.

Two-Phase Processing Timeline

This two-phase approach:

  • Reduces the load on the system during the contest by 90%.
  • Allows for more accurate final results after full evaluation.
  • Is similar to the approach used by platforms like Codeforces for large contests.

Now, you might think: "Won't users be upset if they don't get final results immediately?" Here's why this works: users care most about immediate feedback during the contest—they want to know if they're on the right track so they can debug and resubmit. Running 10% of test cases gives them that signal within seconds. The final validation with all test cases can happen after the deadline because at that point, they can't change their code anyway. This trade-off lets us handle 10x more concurrent submissions with the same infrastructure. Platforms like Codeforces have proven this works at scale—it's a battle-tested approach for large coding competitions.


Staff-Level Discussion Topics

Staff

The following topics contain open-ended architectural questions without prescriptive solutions. They are designed for staff+ conversations where you demonstrate systems thinking, trade-off analysis, and strategic decision-making.


Comments

Giorgio Pea
"Note that most system design questions requires microservices. For a simpler problem like this, we can use a single service to handle both problem service and code evaluation. However, code submission can be compute heavy and we do not want to bring down the entire service when handling a large number of submissions. Therefore, we opt to separate the two service". Solutions evaluation is part of the business logic of a problem so it is reasonable to put it within a "Problems" service, but the statement "code submission can be compute heavy and we do not want to bring down ..." contradicts whats mentioned in the previous paragraph and should not be an argument to state that we should not use a single service; the "Problems" service itself containing the logic for submissions can make use of a message queue to delegate code execution to workers
Fri Dec 05 2025
SystemDesignSchool
Nice catch. The reason for separation is more on loose coupling, security isolation and independent scaling. The message queue doesn't remove the risk completely, someone has to handle things like rate limits/quotas and result integrity and ops like backpressure handling. For larger scale, it's better to have a service handling that.
Tue Dec 23 2025
Arian Jafari
Even for partial evaluation, we can execute the code on the top 10% trickiest test cases to increase the consistency of the partial and final results.
Fri Dec 05 2025
Ankit Jain
I think the leaderboard should be a separate service. Contest should just handle showing and creating contests. The Leaderboard service should independently get called from the Evaluation service if the problem is part of a contest, and the leaderboard should be calculated accordingly.
Mon Aug 18 2025
Harley Pasoz
yeah, but what might be the improvement with this? I believe a simple check inside the worker execution to use the Leaderboard DB can avoid overwhelming the system at the beginning with one more service.
Fri Oct 17 2025
Joe Mcerlane
One thing that seems missing is how the client gets the results from a submission how do we return the payload (see below) when we have now given the submission to an async process? { status: success | fail test_cases: [ { status: success | fail | timeout } ] }
Mon Jun 23 2025
SystemDesignSchool
After a user submits their solution, the system returns a submission ID, which the client uses to poll the backend periodically (e.g., every second) for the result. If the submission is still being processed, the polling endpoint returns a status like "pending". Once the execution is complete, the endpoint responds with the final result, including the overall status and individual test case outcomes in the payload.
Thu Jun 26 2025
Prakash Prabhu.M
Since we're using 2 phase processing technique, does that mean the leaderboard we display realtime has the possibility to change at the end when all test cases are evaluated?
Thu May 15 2025
Kshitij Sabale
yes
Sat May 17 2025
pankaj Modi
Skip list requires O(M+logK) time complexity and not O(M(logK))
Mon May 12 2025