
Design a Ride-Hailing Service Like Uber

Let us design a ride-hailing App like Uber from scratch. We will only design the core flow of servicing a ride. This includes both the rider's and the driver's experience: the rider requests a ride, our service finds a driver, and the driver serves it. We will focus on the hard and meaty design problems specific to a ride-hailing App. For readers unfamiliar with the building blocks and basic concepts, we will provide references to the corresponding materials.

Design Uber

Functional Requirements

Below is a high-level overview of the life of a ride.

  1. The rider sends the ride request to our backend service (a payload sketch follows this list). The request consists of:

    • The rider's pickup location
    • The rider's destination
  2. The backend service finds a driver to serve the ride and sends the ride information to the driver.

    2.1 The service first finds a list of nearby available drivers

    2.2 Then it sends the ride information to all of them

    2.3 It picks the first driver who confirms

    2.4 It sends the final ride information to that driver

  3. The rider and the driver both go to the pickup location. In the meantime, the rider receives real-time updates of the driver's location.
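To make the request concrete, here is a minimal sketch of what the ride request payload could look like in Python; the field names and types are assumptions for illustration, not a fixed API.

```python
import uuid
from dataclasses import dataclass


@dataclass
class RideRequest:
    """Hypothetical payload sent by the rider's App to the backend."""
    ride_id: str      # client-generated, e.g. a UUID (see "Ride ID generation" below)
    rider_id: int
    pickup_x: float   # pickup location
    pickup_y: float
    dest_x: float     # destination
    dest_y: float


request = RideRequest(
    ride_id=str(uuid.uuid4()),
    rider_id=42,
    pickup_x=3.2, pickup_y=7.9,
    dest_x=12.5, dest_y=1.4,
)
```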

Design Uber Functional Requirement

Problem decomposition

Problem decomposition is a key aspect of system design. It involves breaking down a complex problem into smaller, more manageable subproblems. One way to break a problem down is to list the operation flow and identify the complex or ambiguous steps; each such step is a subproblem. Let's review the functional requirements from the last section; the flow can be specified as below.

  1. The rider sends a ride request to the backend.
  2. The backend receives the ride request.
  3. The backend finds nearby drivers and sends each of them the ride request.
  4. Many drivers confirm the request.
  5. Among those who confirmed, the backend selects one and only one driver to serve the ride.
  6. The selected driver goes to the pickup location.
  7. The rider receives the driver's location updates in real time.

The obviously complex steps are #3, #5, and #7. Therefore, we turn them into three subproblems.

[S1] Locating nearby drivers: Efficiently finding available drivers near the rider's pickup location.

[S2] Managing driver confirmations: Handling multiple drivers confirming the same ride request simultaneously.

[S3] Real-time location updates: Providing the rider with accurate and up-to-date information about the driver's location.

The other steps are commonplace, simple, and clearly defined. Once S1, S2, and S3 are solved, they will be plugged into the above flow to form the final solution. In the following sections, we will delve into the implementation details of each subproblem and highlight how we overcome the key challenges involved.

Building blocks

The design uses simple building blocks from our basics class. First, we will explain what each element does on its own. Then we will show how they work together to implement each subproblem and handle the scalability challenges. We call the elements [A], [B], and so on; the endpoints where they receive requests are called [X1], [X2], ..., [Xn].

[Z] Driver and rider mobile Apps

Design Uber Mobile App

Mobile apps perform three functions: they accept user input, display status updates to the users, and automatically manage failures, retries, and other details on behalf of the users. As clients, they do not have any endpoints, but they connect to the endpoints of WebSocket servers, as described below.

[A] Driver WebSocket service

Design Uber Driver Websocket Service

[A] is a WebSocket (WS) service that acts as the bridge between the driver's App and our backend services. Our backend services need to push new ride requests to the driver and get real-time responses. Meanwhile the driver App provides information about the driver's status (location, app closed, etc.). WebSocket is ideal for such constant bidirectional communication. [A] has two endpoints.

  • [A1] accepts WebSocket connections from the driver's App.
  • [A2] accepts ride requests from the Ride Request Service (component [D]).

[B] Rider WebSocket service

Design Uber Rider Websocket Service

[B]'s role is similar to [A]'s, but for riders. There is two-way communication between the rider App and the servers: the rider App needs to receive asynchronous responses about the request result, the driver's new locations, payment results, etc. [B] has two endpoints.

  • [B1] accepts WebSocket connections from the rider's App.
  • [B2] accepts driver location updates from the Location Update Service (component [C]).

[C] Location Update Service

Design Uber Location Update Service

This service receives the driver's location updates from [Z], the driver's App, and processes them. Therefore it has one endpoint:

  • [C1] receives location updates from the driver's App.

Upon receiving each update, it responds 200 OK to the App immediately. The actual processing of the location update is done by a background process/thread pool running on each server. The processing has two parts:

  • Update the Driver Location Realtime Service (component [E]) with the new location
  • Query the Ongoing Ride DB (component [F]) to see if the given driver_id is serving a ride. If so, send the update to the rider's WS service, which in turn pushes the new location to the rider.

Each request is served on a best-effort basis. If there are any failures during processing, we simply log the error and drop the request. This is acceptable because each individual update is non-critical, and transient errors won't stop later location updates from the driver from being processed correctly.

[D] Ride Request Service

Design Uber Ride Request Service

The Ride Request Service acts as the broker between the rider and drivers. It receives the ride request from the rider's App, then queries the Driver Location Realtime Service (component [E]) to find available drivers close to the pickup location. After that, it sends the ride request to those drivers. The driver who confirms first will serve the ride. [D] then sends the ride status update to [B] and registers the <driver_id, rider_id> pair in the Ongoing Ride DB (component [F]). Therefore it has two endpoints.

  • [D1] accepts new ride requests from [B], the rider's WS service.
  • [D2] accepts ride-serving confirmations from [A], the driver's WS service.

[D] needs to consider edge cases, including race conditions among drivers and random failures. We will discuss those cases below.

[E] Driver Location Realtime Service

Design Uber Driver Location Realtime Service

This service serves internal queries about drivers' locations. The server has an in-memory map from driver_id to a coordinate <x, y> that denotes the driver's last reported location. This service needs to be sharded to store the large number of online drivers. It also needs an efficient index structure to serve queries like "find all drivers within 5 miles of location <x, y>." [E] has two endpoints for the reads and writes described above.

  • [E1] answers queries about nearby driver IDs for an input location and radius limit.
  • [E2] updates a driver's new location.

We will discuss its sharding and indexing later on.

[F] Ongoing Ride DB

Design Uber On-going DB

This is the DB that stores ongoing rides. Each ride has a driver_id and a rider_id, among other fields. This table is used in the following cases (a schema sketch follows the list):

  • [F1] appends a new ride to the table in [F]. This is used by [D], the Ride Request Service.
  • [F2] returns the ride status for a given driver_id. For example, after a driver accepts a new ride, the corresponding WS service (component [A]) needs to query this DB to ensure that the driver's confirmation is effective and that she is expected to serve the ride.
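For concreteness, below is a minimal sketch of the ongoing-ride table and the [F1]/[F2] accesses, using SQLite purely for illustration; the table name, columns, and status values are assumptions, not the actual schema.

```python
import sqlite3

db = sqlite3.connect("ongoing_rides.db")
db.execute("""
CREATE TABLE IF NOT EXISTS ongoing_ride (
    ride_id   TEXT PRIMARY KEY,   -- client-generated ride ID
    rider_id  INTEGER NOT NULL,
    driver_id INTEGER,            -- NULL while the ride is pending confirmation
    status    TEXT NOT NULL       -- e.g. 'pending', 'ongoing'
)""")


# [F1]: append a new ride, still pending a driver.
def add_ride(ride_id: str, rider_id: int) -> None:
    db.execute(
        "INSERT INTO ongoing_ride (ride_id, rider_id, driver_id, status) "
        "VALUES (?, ?, NULL, 'pending')",
        (ride_id, rider_id),
    )
    db.commit()


# [F2]: look up the ride status for a given driver.
def ride_for_driver(driver_id: int):
    return db.execute(
        "SELECT ride_id, status FROM ongoing_ride WHERE driver_id = ?",
        (driver_id,),
    ).fetchone()
```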

Complete architecture

Below is the integration of the aforementioned building blocks.

Design Uber Complete Design Architecture

Let's now dive into the details of how they cooperate to solve the three subproblems we defined previously.

Subproblem 1: Finding nearby drivers

The hard part is [E1], which answers queries for nearby drivers. In API form, [E] needs to implement an RPC with the following signature. The return value is a list of driver IDs.

GetNearbyDrivers(x: Int, y: Int, radius: Float) → List[Int]

Design Uber Find Nearby Drivers

The flow is straightforward.

  1. The driver App periodically updates the driver's location in [E]. (See the section Subproblem 3: Driver location realtime update for more details.)
  2. [D] queries [E1] to get a list of nearby drivers.

As we mentioned in the high-level design section, this API needs to solve two challenges.

  • Accommodate the large number of online drivers.
  • Provide a reasonable speed for each API call.

The large number of drivers

[E] needs to store all drivers' locations. When the number of drivers is large, a single server can't hold all the data. Sharding is the typical solution and our choice. The question is how to shard. Intuitively, we should take advantage of the fact that GetNearbyDrivers returns drivers near <x, y>. Therefore, one way of sharding goes like this:

  1. Model the physical world as a 2D plane and divide it into squares of 2 miles by 2 miles.
  2. Estimate the average number of drivers in each square. This can be done with historical data as our service ramps up, or with public data.
  3. Specify a range, [low, high], for the number of drivers in each shard. The range depends on the serving capacity of each physical server. Each server should be able to hold multiple shards: if our service grows and the number of drivers per shard grows, re-sharding will be easier. For squares with too many drivers, divide them until the number in each square is small enough. For squares with too few, combine adjacent ones until the resulting number is big enough.
  4. For each query of <x, y, radius>, the service finds all shards that could contain nearby drivers. (The algorithm for answering the query is in the next section.) Then it queries each shard and combines the results as the response. In the illustration below, the example query is served by asking three shards; a sketch of the cell-to-shard lookup follows the illustration.

Illustration of sharding for the GetNearbyDrivers API. Each red dot is a driver; the green circle is an example query for drivers near <x, y>.
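To make the query routing concrete, here is a minimal sketch that maps a query circle onto candidate grid cells and their shards. It assumes fixed 2-mile cells and a precomputed cell-to-shard map, and it leaves out the splitting and merging of cells described above.

```python
import math

CELL_MILES = 2.0  # side length of each grid square (assumed unit: miles)
cell_to_shard: dict[tuple[int, int], int] = {}  # precomputed (cell -> shard_id) map


def cell_of(x: float, y: float) -> tuple[int, int]:
    """Grid cell containing the point <x, y>."""
    return (math.floor(x / CELL_MILES), math.floor(y / CELL_MILES))


def shards_for_query(x: float, y: float, radius: float) -> set[int]:
    """All shards whose cells overlap the query circle's bounding box."""
    cx_min, cy_min = cell_of(x - radius, y - radius)
    cx_max, cy_max = cell_of(x + radius, y + radius)
    return {cell_to_shard[(cx, cy)]
            for cx in range(cx_min, cx_max + 1)
            for cy in range(cy_min, cy_max + 1)
            if (cx, cy) in cell_to_shard}
```

In the illustration above, this lookup would return the three shards touched by the green circle.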

Notice that there are more optimized and sophisticated sharding methods, which are also more complicated, but the method above is sufficient and possibly best for bootstrapping a new product or for job interviews.

<aside> 💡 **Design pattern knowledge card: Sharding**

Sharding is the "standard" solution for a dataset too large to be stored or queried by one machine. It divides the dataset in some way so that the whole dataset is spread across many machines.

</aside>

Fast API speed

To respond fast, each shard needs to find drivers near <x, y> quickly enough. When the shard is small enough, we can do a linear search over its drivers. Otherwise it requires an efficient indexing structure.

As with sharding, the indexing below is not optimal for mature businesses that need to squeeze out the last bit of performance, but it is sufficient for early-stage products and job interviews. A minimal sketch appears after the list.

  1. Use a binary search tree to store the x coordinates and the y coordinates separately. Each driver location has an entry in both indexes.
  2. For each query, translate it into two ranges, [Xmin, Xmax] and [Ymin, Ymax], such that drivers inside the specified box could be close enough to <x, y>.
  3. Do an [Xmin, Xmax] range search on the x-axis index and a [Ymin, Ymax] range search on the y-axis index. Take the intersection of the two results.
  4. Filter each location in the intersection and return those whose distance to <x, y> is less than the given radius.
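Below is a minimal sketch of such a per-shard index. It substitutes two sorted lists with binary search for the binary search trees, and it builds the index from a snapshot of locations instead of supporting incremental updates; both are simplifications for illustration.

```python
import bisect
import math


class DriverIndex:
    """Per-shard index: drivers sorted by x and by y, queried by range intersection."""

    def __init__(self, locations: dict[int, tuple[float, float]]):
        self.locations = locations  # driver_id -> (x, y)
        self.by_x = sorted((x, driver_id) for driver_id, (x, _) in locations.items())
        self.by_y = sorted((y, driver_id) for driver_id, (_, y) in locations.items())

    @staticmethod
    def _ids_in_range(index: list, lo: float, hi: float) -> set:
        start = bisect.bisect_left(index, (lo, -math.inf))
        end = bisect.bisect_right(index, (hi, math.inf))
        return {driver_id for _, driver_id in index[start:end]}

    def nearby(self, x: float, y: float, radius: float) -> list[int]:
        # 1) Range search on each axis, 2) intersect, 3) exact distance filter.
        box = (self._ids_in_range(self.by_x, x - radius, x + radius)
               & self._ids_in_range(self.by_y, y - radius, y + radius))
        return [d for d in box if math.dist(self.locations[d], (x, y)) <= radius]
```

For example, `DriverIndex({7: (1.0, 2.0), 9: (5.0, 5.0)}).nearby(0.0, 0.0, 3.0)` returns `[7]`: driver 9 falls outside the box, and driver 7 passes the exact distance filter.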

High API throughput

Another challenge is that the incoming query volume can become too high for a single server to handle. This results in denial of service even when each individual query can be served very fast. Just as sharding is the classic solution for data size, the classic solution here is replication.

We will have multiple servers that answer queries for the same shard. This replication will result in inconsistencies between the copies: the same query may get different results, some with more up-to-date driver locations and some more outdated. This is acceptable given the approximate nature of this API and the ever-changing nature of each driver's location.

<aside> 💡 **Design pattern knowledge card: Replication** When the query speed is fast enough but the request volume is too high to be handled by one machine, the design pattern is replication. It's an intuitive idea: replicate the server so that the request volume is shared among the copies. Replication appears in caching, load balancing, master-slave setups, etc.; they are different forms of the same idea. </aside>

Subproblem 2: Drivers’ confirmation for a new ride

Design Uber Driver Confirmation

After getting the driver ID list from [E], the Ride Request Service [D] sends the ride information to each driver's App. Then each driver decides whether to accept the ride request. Those who accept send their confirmation to the [D2] endpoint. This can be described as follows.

  1. [D] sends the new ride's data to each driver's connected WebSocket via the [A2] endpoint.

  2. Some drivers choose to accept the ride on their App.

  3. Their [A] WS service calls the [D2] endpoint to deliver the driver’s confirmation.

  4. [D] processes each driver's confirmation:

    4.1 Query [F], via the [F1] endpoint, to see if the ride is still pending driver confirmation.

    4.2 If so, update [F] and set the driver_id for the given ride.

  5. For those drivers who confirmed, their WS service periodically calls [F2] to check the ride's status. When the status is no longer pending driver confirmation, it notifies the driver to serve the ride if the confirmed driver's ID matches the current driver. Otherwise, it notifies the driver that the ride is no longer available.

While the flow is easy to define, several issues arise because of the distributed nature of the coordinating servers.

  1. Multiple drivers could confirm concurrently, yet only one driver can serve the ride.
  2. [D] could crash during the process or lose its connection to [A], [B], [E], or [F]. Left unhandled, this could result in unacceptable user experiences, such as the driver arriving at the pickup location only to find the rider has left or canceled the ride.

Race conditions from drivers

Let's look at the first one. First, we use a ride ID generated by the rider's App, such as a UUID, to distinguish each ride. Throughout the life of the ride, the ID is unique.

To serve each confirmation arriving at [D2], we run a transaction on [F] (the [F1] endpoint). The transaction has two parts (a sketch follows the list).

  • Query [F] to find the status of the given ride ID.
  • If the ride is not yet being served by any driver, change the status to ongoing and set the driver_id; otherwise, abort the transaction.
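Here is a minimal sketch of that conditional write, reusing the hypothetical SQLite table from the earlier schema sketch. The key point is that a single conditional UPDATE decides the winner; everything else is an assumption for illustration.

```python
import sqlite3


def try_confirm(db: sqlite3.Connection, ride_id: str, driver_id: int) -> bool:
    """Attempt to claim the ride for driver_id; returns True only for the winner."""
    with db:  # one transaction; concurrent confirmations are serialized by the DB
        cur = db.execute(
            "UPDATE ongoing_ride "
            "SET driver_id = ?, status = 'ongoing' "
            "WHERE ride_id = ? AND status = 'pending' AND driver_id IS NULL",
            (driver_id, ride_id),
        )
        return cur.rowcount == 1  # 0 means another driver already claimed the ride
```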

In essence, the racing requests are serialized by the DB's writer. While this solves the race condition, too many concurrent writes can overwhelm the DB. To avoid that, we can put a message queue or some other buffering writer in front of the DB. We won't go into detail here because it should be straightforward after reading our master template.

After sending the confirmation via [D2], [A] queries [F2] to check the status of the ride until it knows whether the ride should be served by the given driver, then it notifies the driver's App accordingly.

Fault recovery with eventual consistency

Let's look at the second challenge: fault recovery. Fault recovery is a common topic in system design and often the real question behind interview problems.

The rider App must have retry logic after sending the ride request. We won't go into the details of how it works because it's not critical to this design; devoted readers should work it out as an exercise after finishing this material. In a nutshell, after sending the ride request to [D], the rider App [Z] will repeatedly query [F] to make sure the ride is being served. If there is no entry in [F] for the ride ID, [Z] will reconnect to [D] and submit the identical ride (same ride ID). In this way, the ride will eventually be served. But such retries could result in inconsistencies, like two drivers trying to serve the same ride, each confirmed by a different instance of [D].

There are other failure scenarios. For example, the rider App will retry with the same ride ID when it loses its connection to [B], and [B] will also retry when it loses its connection to [D]. Therefore it's possible that multiple [D] instances are trying to serve the same ride ID.

Our setup handles this case easily because [F] serves as the single source of truth. Only one driver will be in the DB table for a given ride ID. For this reason, only that driver's App will direct its driver to the pickup location.

<aside> 💡 **Design pattern knowledge card: Write-process-read** The driver confirmation implementation is a classic example of the write-process-read design pattern. The general form is below.

  1. The client sends an update request to a server.
  2. That server responds immediately that the request has been registered.
  3. A backend server different from the one in #2 begins processing the request.
  4. In the meantime, the client periodically queries another server to read the result of the processing.

Notice there are four roles:

  • The client
  • The request receiving server
  • The request processing server
  • The query server for the read requests

This pattern is the solution for most write requests because it beautifully breaks down a data-critical and potentially complex, slow process into simple and reliable parts. It makes it easy for the system to achieve the following requirements:

  • Fast response to the client.
  • Eventual consistency from fault recovery.
  • Separation of concerns, and therefore separate scaling.

This pattern is harder to grasp, but you will feel confident about it after reading more solution examples.

</aside>

Subproblem 3: Driver location realtime update

Design Uber Driver Location Realtime Update

The flow of the real-time driver location update (a worker sketch follows the list):

  1. The driver App periodically sends the driver's location to the connected WS service (the [A1] endpoint), for example as <x, y, driver_id>.

  2. [A] relays the update to [C], the Location Update Service, via the [C1] endpoint.

  3. [C] responds 200 OK to [A] immediately after receiving the data.

  4. [C] then puts the location data on a queue. A pool of workers picks it up and processes it with the following routine:

    4.1 Send the data to the [E2] endpoint so that the Driver Location Realtime Service can update the driver's location.

    4.2 Query [F2] to find out if the given driver_id is on an ongoing ride. If so, send a request to the [B2] endpoint so that the rider's WebSocket server will push the new location to the rider's App.
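Below is a minimal sketch of steps 3 and 4, assuming an in-process queue and a worker thread; the stand-in functions for [E2], [F2], and [B2] are placeholders for the real RPC calls, not actual APIs.

```python
import logging
import queue
import threading

updates: "queue.Queue[tuple[int, float, float]]" = queue.Queue()


# --- Stand-ins for calls to other components (assumptions for illustration) ---
def update_driver_location(driver_id, x, y):   # [E2]
    pass

def ride_for_driver(driver_id):                # [F2]; returns rider_id or None
    return None

def push_to_rider(rider_id, x, y):             # [B2]
    pass
# ------------------------------------------------------------------------------


def handle_c1(driver_id: int, x: float, y: float) -> int:
    """[C1] handler: enqueue the update and respond 200 OK immediately."""
    updates.put((driver_id, x, y))
    return 200


def worker() -> None:
    while True:
        driver_id, x, y = updates.get()
        try:
            update_driver_location(driver_id, x, y)        # step 4.1
            rider_id = ride_for_driver(driver_id)          # step 4.2
            if rider_id is not None:
                push_to_rider(rider_id, x, y)
        except Exception:
            logging.exception("location update dropped")   # best effort: log and drop
        finally:
            updates.task_done()


threading.Thread(target=worker, daemon=True).start()
```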

Async update processing

In step 4, we use a queue to process the updates so that receiving requests at [C1] won't be blocked by the time-consuming processing. This allows [C1] to sustain a very high throughput when the incoming volume is high. The processing may lag, which means the rider's updates will have higher latency. But without this buffering, the alternative is to drop updates, which means some unlucky rider may see no driver movement for a long time; that is a worse experience than higher latency.

Errors are ignored

In the whole location update flow, errors are logged so that the monitoring system can pick up systematic error patterns. However, we don't try to recover or retry when an error happens; transient errors are simply dropped. This is because the updates in step 1 are periodic, so in effect they are "retried" already.

Other essential questions

The subproblems above are the core content of the design, and we addressed the hard questions in each one. They make up the majority of the discussion during a system design interview. Below are some other essential questions that could come up in an interview.

Location update service scalability

The location update service itself can be load balanced with any method, such as round-robin.

Ride Request Service scalability

Load balancing works fine here as well: instances of [D] are identical and not sharded. The service is stateless “in nature,” so it can be scaled just like the location update service above. It may seem counterintuitive to call it stateless, because its flow is a multi-step process that involves writing to the DB. However, the DB update of appending the new ride row is the last step of the processing and the only data write, so the service itself keeps no state between requests.

In fact, most Internet-scale services are built from purely stateless components. That's why serverless cloud computing like AWS Lambda has gained so much traction.

WebSocket server scalability

What if there are more drivers or riders than one WebSocket server can handle?

We can load balance and distribute user connections by sharding on user_id. Notice that load balancing without sharding won't work, because our servers need to push messages to the user's App and therefore must know which WS server the user is connected to, if she is online.

How to add more shards?

The clean solution is to have a sharding service that stores the map from user_id to shard_id. Other methods depend on keeping the client-side and server-side sharding algorithms in sync; while that saves one service, in the long run it is more expensive to maintain and makes other parts of the system harder to extend.
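Here is a minimal sketch of such a sharding service. The explicit map plus a hash-based default is an assumption for illustration; a real implementation would persist the map and handle rebalancing.

```python
class ShardingService:
    """Maps each user_id to the WebSocket shard holding that user's connection."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.assignments: dict[int, int] = {}  # explicit user_id -> shard_id map

    def shard_for(self, user_id: int) -> int:
        # Explicit assignment wins; a hash-based default covers unassigned users.
        return self.assignments.get(user_id, user_id % self.num_shards)

    def reassign(self, user_id: int, shard_id: int) -> None:
        # Adding shards only requires updating this service, not every client.
        self.assignments[user_id] = shard_id
```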

Ride ID generation

We let the rider App generate the ID because the rider App queries the ride status and initiates retries when necessary. If the ID were generated by our backend, there would be two main disadvantages:

  • ID generation becomes a single point of failure; obviously, the whole system can't function without ride IDs.
  • A new service adds complexity to maintain and operate.

Generating the ID on the client has disadvantages too. For example, it's prone to security attacks, and we need to balance ID collision risk against ID length; usually a UUID works, but it's long. We choose client-generated IDs for simplicity, which is usually the priority when starting a new project. However, the choice is debatable.
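As a small illustration, a client-generated ride ID could simply be a UUID created on the rider's App; server-side validation of the ID is omitted here.

```python
import uuid

# Generated once on the rider's App; stays the same across retries of this ride.
ride_id = str(uuid.uuid4())
```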

DB scalability

To simplify the already complex design, our solution reads from and writes to the DB directly. In a real production environment, this is uncommon, to say the least. We usually have a message queue to buffer and serialize write requests and a cache to scale read requests. That's part of our master template.

A more realistic [F] would have the following structure.

Design Uber Realistic F

FAQ

Why not process driver location updates in [A], the driver's WebSocket service?

It's a good practice (principle) to separate request accepting from request processing. This principle makes system failures less likely and maintenance easier. For example, if the location update processing lived inside the WS service, then every time a bug in the processing logic crashed the server, drivers would lose their connections to our backend! It would also make deployment much harder. Our original design doesn't have this issue: bugs are local to the Location Update Service, and the rest of the system continues to function.

