A rate limiter is a system designed to prevent a user from making too many requests to a service within a certain timeframe. This is used to protect services from abuse and can be implemented for various scopes, such as per user, per IP address, etc.
Functional Requirements
- The client can be identified using a cookie, user ID, or IP address.
- If the throttling threshold is surpassed, the rate limiter will reject the request and send a "429 Too Many Requests" status code back to the client.
- If the throttling threshold is not surpassed, the rate limiter will forward the request to the downstream server.
- The rate limiter should smooth out traffic spikes: if there is a surge of requests within a short period, it should reject the requests that exceed the allowed rate rather than pass the spike through to the service.
Nonfunctional Requirements
Scale requirements:
- 100 million daily active users
- Data retention is not needed
Other requirements:
- Scalability: The rate limiter should handle a large number of requests. As the user base grows, the system should scale accordingly.
- Reliability: It should accurately track and limit requests without errors.
- Availability: The rate limiter should be highly available; otherwise, it could become a single point of failure.
- Consistency: It should ensure all nodes see the same request count for each client to prevent exceeding the limit.
- Latency: The overhead added by the rate limiter should be minimal, to not affect the system's performance.
- Efficiency: It should have optimized resource usage to handle a high volume of requests.
- Privacy: No specific privacy requirements for rate limiting, as it doesn't require storing sensitive data.
Resource Estimation
Assumptions
- Each user makes 1,000 requests per day.
- Each request and response are 1 KB in size.
- We need to store each user's remaining capacity and the timestamp of their last request. Let's say that takes 50 bytes.
Traffic Estimates: 100 million daily active users, each making 1,000 requests, comes to 100 billion requests per day. Assuming an even distribution over 86,400 seconds, that is roughly 1.16 million requests per second.
Storage Estimates: N/A since data retention is not needed as stated in the requirement.
Bandwidth Estimates: At roughly 1.16 million requests per second with 1 KB per request, the rate limiter handles about 1.2 GB/s of incoming traffic, and about the same outgoing, since each response is also 1 KB.
Memory Estimates: Keeping 50 bytes of state (remaining capacity and last-request timestamp) for each of the 100 million daily active users requires about 5 GB, which fits comfortably in an in-memory data store.
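A quick back-of-envelope check of these numbers, under the assumptions above (1 KB per request and response, 50 bytes of per-user state):

```python
DAU = 100_000_000          # daily active users
REQ_PER_USER = 1_000       # requests per user per day
REQ_SIZE = 1_000           # ~1 KB per request or response
STATE_PER_USER = 50        # bytes of rate-limit state per user

requests_per_day = DAU * REQ_PER_USER    # 100 billion requests/day
rps = requests_per_day / 86_400          # ~1.16 million requests/second
bandwidth_bps = rps * REQ_SIZE           # ~1.2 GB/s in each direction
state_bytes = DAU * STATE_PER_USER       # ~5 GB of in-memory state

print(f"{rps:,.0f} req/s, {bandwidth_bps / 1e9:.2f} GB/s, {state_bytes / 1e9:.0f} GB")
```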
API Design
Our system can expose the following REST APIs:
POST /shouldAllowRequest
Request body:
{
"clientId": "123",
"timestamp": "2023-07-13T07:20:50.52Z"
}
Response:
{ "allowed": true }
or
{ "allowed": false }
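To illustrate this contract, here is a minimal in-process handler for the endpoint. The `RATE_LIMIT` value and the module-level dict are assumptions for the sketch; a real deployment would keep counts in a shared store and reset them per time window:

```python
import json

# Hypothetical in-process state; a production rate limiter would use a
# shared store (e.g., Redis) and reset counts at each window boundary.
_counts: dict[str, int] = {}
RATE_LIMIT = 100  # assumed maximum requests per window

def should_allow_request(body: str) -> str:
    """Handle POST /shouldAllowRequest with the JSON shapes shown above."""
    client_id = json.loads(body)["clientId"]
    _counts[client_id] = _counts.get(client_id, 0) + 1
    return json.dumps({"allowed": _counts[client_id] <= RATE_LIMIT})
```

Calling `should_allow_request` with the request body shown above returns `{"allowed": true}` until the client exceeds the limit, after which it returns `{"allowed": false}`.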
High-Level Design
- Incoming Request: The data flow starts when the client (e.g., a user's web browser) makes a request to the service. The request includes some form of client identifier, such as an IP address or user ID, so the rate limiter can track requests on a per-client basis.
- Load Balancer: The request first hits the load balancer, which routes it to one of the Rate Limiter nodes. With sticky sessions keyed on the client identifier, all requests from the same client are routed to the same Rate Limiter node.
- Rate Limiter Node and In-memory Data Store: The Rate Limiter node receives the request and checks the in-memory data store (e.g., Redis) for the client's current request count. If the client is not in the store (as on its first request), the count is initialized to 1; otherwise it is incremented.
- Rate Check: The Rate Limiter then checks whether the incremented count exceeds the rate limit.
- Forward to Service: If the count is within the rate limit, the Rate Limiter forwards the request to the service. If the count exceeds the limit, the Rate Limiter returns a 429 response to the load balancer, which relays it to the client.
Finally, the service processes the request and generates a response, which is sent back to the Rate Limiter; the Rate Limiter node then forwards it through the load balancer to the client.
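The request path described above can be sketched as follows; `CounterStore` stands in for the shared in-memory data store, and the names are illustrative rather than a definitive implementation:

```python
class CounterStore:
    """Stand-in for the in-memory data store (e.g., a Redis INCR)."""
    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def incr(self, key: str) -> int:
        # Initialize-or-increment in one step.
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

def handle_request(client_id: str, store: CounterStore, limit: int, forward):
    """One Rate Limiter node's request path."""
    if store.incr(client_id) > limit:
        return 429, "Too Many Requests"  # rejected before reaching the service
    return forward(client_id)            # forwarded; the response is relayed back
```

For example, with `limit=2`, a client's first two requests are forwarded and the third receives a 429.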
Detailed Design
Which Rate Limiting Algorithm Should We Use?
The choice of a rate-limiting algorithm often depends on your specific use case and system requirements, including factors such as how strictly you need to enforce the rate limits and how much complexity you're willing to manage. Here are a few common rate-limiting algorithms and their trade-offs:
Fixed Window Counter
Grasping the building blocks ("the lego pieces")
This part of the guide will focus on the various components that are often used to construct a system (the building blocks), and the design templates that provide a framework for structuring these blocks.
Core Building blocks
At a bare minimum, you should know the core building blocks of system design:
- Scaling stateless services with load balancing
- Scaling database reads with replication and caching
- Scaling database writes with partitioning (aka sharding)
- Scaling data flow with message queues
System Design Template
With these building blocks, you will be able to apply our template to solve many system design problems. We will dive into the details in the Design Template section. Here’s a sneak peek:
Additional Building Blocks
Additionally, you will want to understand these concepts:
- Processing large amounts of data (aka “big data”) with batch and stream processing
- Particularly useful for solving data-intensive problems such as designing an analytics app
- Achieving consistency across services using distributed transactions or event sourcing
- Particularly useful for solving problems that require strict transactions such as designing financial apps
- Full text search: full-text index
- Storing data for the long term: data warehousing
On top of these, there is ad hoc knowledge you will want to pick up for certain problems. For example, geohashing for designing location-based services like Yelp or Uber, or operational transforms for problems like designing Google Docs. You can learn these on a case-by-case basis. System design interviews are supposed to test your general design skills, not specific knowledge.
Working through problems and building solutions using the building blocks
Finally, we have a series of practical problems for you to work through. You can find them in /problems. This hands-on practice will not only help you apply the principles learned but also deepen your understanding of how to use the building blocks to construct effective solutions. The list of questions is growing; we are actively adding more.