The System Design Courses

Go beyond memorizing solutions to specific problems. Learn the core concepts, patterns and templates to solve any problem.

Rate Limiter System Design

A rate limiter is a system designed to prevent a user from making too many requests to a service within a certain timeframe. This is used to protect services from abuse and can be implemented for various scopes, such as per user, per IP address, etc.

Functional Requirements

The client can be identified using either the cookie, user ID, or IP address.
If the throttling threshold is surpassed, the rate limiter will reject the request and send a "429 Too Many Requests" status code back to the client.
If the throttling threshold is not surpassed, the rate limiter will forward the request to the downstream server.
The rate limiter should "smooth out" traffic spikes. If there is a surge in traffic within a short period of time, the rate limiter should reject the surging requests.

Nonfunctional Requirements

Scale requirements:

100 million daily active users
Data retention is not needed

Other requirements:

Scalability: The rate limiter should handle a large number of requests. As the user base grows, the system should scale accordingly.
Reliability: It should accurately track and limit requests without errors.
Availability: The rate limiter should be highly available; otherwise, it could become a single point of failure.
Consistency: It should ensure all nodes see the same request count for each client to prevent exceeding the limit.
Latency: The overhead added by the rate limiter should be minimal, to not affect the system's performance.
Efficiency: It should have optimized resource usage to handle a high volume of requests.
Privacy: No specific privacy requirements for rate limiting, as it doesn't require storing sensitive data.

Resource Estimation

Assumptions

Each user makes 1,000 requests per day.
Each request and response are 1 KB in size.
We need to store each user's remaining capacity and the timestamp of their last request. Let's say that takes 50 bytes.

Traffic Estimates: For 100 million daily active users, each making 1,000 requests, if we assume an even distribution, that would roughly be around 1,150,000 requests per second.

Storage Estimates: N/A since data retention is not needed as stated in the requirement.

Bandwidth Estimates: If every write operation has an average size of 500 bytes, with 1 million write operations daily, we will have approximately 500MB of incoming data daily. For the read requests, given the 100:1 read to write ratio, there will be 100 million read requests daily. Therefore, the service will serve approximately 50GB of data daily for the read requests.

Memory Estimates for Cache: If we want to cache some of the hot URLs that are frequently accessed, and let's say we want to cache 20% of daily read requests, we would need to cache 20 million URLs. Assuming each URL takes 500 bytes in memory, we would need about 10GB of memory.

API Design

Our system can expose the following REST APIs:

POST /shouldAllowRequest

Request body:

{
  "clientId": "123",
  "timestamp": "2023-07-13T07:20:50.52Z"
}

Response:

{ "allowed": true }

{ "allowed": false }

High-Level Design

Design Rate Limiter

Incoming Request: The data flow starts when the client (e.g., a user's web browser) makes a request to the service. This request will include some form of client identifier, such as an IP address or user ID, so the rate limiter can track requests on a per-client basis.
Load Balancer: The request first hits the Load Balancer, which routes it to one of the Rate Limiter nodes. If we use sticky sessions based on the client identifier, all requests from the same client will be routed to the same Rate Limiter node.
Rate Limiter Node and In-memory Data Store: The Rate Limiter node receives the request and first checks its local cache (e.g., Redis) to retrieve the current count of requests for this client. If the client is not in the cache (which would be the case for the client's first request), the count is initialized to 1. If the client is in the cache, the count is incremented. Rate Check: The Rate Limiter then checks if the incremented count exceeds the rate limit. If it does, the Rate Limiter responds to the client's request with an error message indicating that they've exceeded their rate limit.
Forward to Service: If the count does not exceed the rate limit, the Rate Limiter forwards the request to the service. If the count does exceed the rate limit, the Rate Limiter returns 429 to the load balancer to return to the client.

Finally, the service processes the request and generates a response, which is sent back to the Rate Limiter. The Rate Limiter node then forwards the service's response back to load balancer and to the client.

Detailed Design

Which Rate Limiting Algorithm Should We Use?

The choice of a rate-limiting algorithm often depends on your specific use case and system requirements, including factors such as how strictly you need to enforce the rate limits and how much complexity you're willing to manage. Here are a few common rate-limiting algorithms and their trade-offs:

Fixed Window Counter

Grasping the building blocks ("the lego pieces")

This part of the guide will focus on the various components that are often used to construct a system (the building blocks), and the design templates that provide a framework for structuring these blocks.

Core Building blocks

At the bare minimum you should know the core building blocks of system design

Scaling stateless services with load balancing
Scaling database reads with replication and caching
Scaling database writes with partition (aka sharding)
Scaling data flow with message queues

System Design Template

With these building blocks, you will be able to apply our template to solve many system design problems. We will dive into the details in the Design Template section. Here’s a sneak peak:

System Design Template

Additional Building Blocks

Additionally, you will want to understand these concepts

Processing large amount of data (aka “big data”) with batch and stream processing
- Particularly useful for solving data-intensive problems such as designing an analytics app
Achieving consistency across services using distribution transaction or event sourcing
- Particularly useful for solving problems that require strict transactions such as designing financial apps
Full text search: full-text index
Storing data for the long term: data warehousing

On top of these, there are ad hoc knowledge you would want to know tailored to certain problems. For example, geohashing for designing location-based services like Yelp or Uber, operational transform to solve problems like designing Google Doc. You can learn these these on a case-by-case basis. System design interviews are supposed to test your general design skills and not specific knowledge.

Working through problems and building solutions using the building blocks

Finally, we have a series of practical problems for you to work through. You can find the problem in /problems. This hands-on practice will not only help you apply the principles learned but will also enhance your understanding of how to use the building blocks to construct effective solutions. The list of questions grow. We are actively adding more questions to the list.

Pro Member Exclusive

Upgrade your account to continue

Benefits

Unlimited access to practice tool with AI grading

Unlimited access to expert-written solutions

Unlock 100+ lessons with full course access

Access to all future content while subscribed

The System Design Courses

Go beyond memorizing solutions to specific problems. Learn the core concepts, patterns and templates to solve any problem.

Start Learning

System Design Master Template

Comments

Ashish Gusain

Is Leaky bucket code correct ?

Sat May 17 2025

jugpreet singh

if remaining_capacity is None or last_request_time is None: This is meant to initialize data for new user. However, when a user has no remaining capacity, then we don't set that value in redis # If not, update the last request time but don't decrement the capacity r.hset(user_id, 'last_request_time', current_time) This will cause the initialization to happen again for the user and they get more requests to go through. So, it is not correct.

Mon May 26 2025

Nikita V

It looks like the Leaky Bucket code example is Token Bucket, again.

Fri May 16 2025

mrwarpigs

Noticed the requirements for the practice especification differs from the solution. In the solution there is no data retention, while it happens in the practice. No big deal though.

Tue Jun 11 2024

The System Design Courses

Go beyond memorizing solutions to specific problems. Learn the core concepts, patterns and templates to solve any problem.

Start Learning

On This Page

Functional Requirements Nonfunctional Requirements Resource Estimation API Design High-Level Design Detailed Design Which Rate Limiting Algorithm Should We Use?Data Store Database Type Database Partitioning Database Replication Data Retention and Cleanup Concurrency Analytics Sample Implementation