Functional Requirements
We will focus on the core functionalities:
- Ride Request: Users should be able to request a ride by providing their location and destination. The system should find the nearest available driver to fulfill the ride request.
- Driver Tracking: Once a rider is matched with a driver, the system should track the driver's real-time location, maintain the driver's status (available, busy, offline), and keep the rider updated accordingly.
Non-functional Requirements
- 100M Daily Active Users
- Read:write ratio = 10:1
- Data retention for 5 years
- Assuming 10 million ride requests per day
- Assuming each ride (including all data information related to the ride) is about 1KB
Using our resource calculator, we get about 1000 read RPS and 100 write RPS.
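For reference, the arithmetic behind these figures can be sketched as follows. This is a back-of-the-envelope estimate, assuming each ride request counts as one write and that 1KB means 1,000 bytes; the variable names are illustrative and not part of any calculator.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

rides_per_day = 10_000_000       # from the requirements above
read_write_ratio = 10            # 10 reads for every write
ride_size_bytes = 1_000          # ~1KB per ride (assumption: decimal KB)
retention_years = 5

# Assumption: one ride request is one write.
write_rps = rides_per_day / SECONDS_PER_DAY
read_rps = write_rps * read_write_ratio

# Total storage over the retention window, ignoring indexes and replication.
storage_bytes = rides_per_day * ride_size_bytes * 365 * retention_years

print(f"write RPS ~ {write_rps:.0f}")                # ~116, i.e. on the order of 100
print(f"read RPS  ~ {read_rps:.0f}")                 # ~1157, i.e. on the order of 1000
print(f"storage   ~ {storage_bytes / 1e12:.2f} TB")  # ~18.25 TB
```

The results round to the 100 write RPS and 1000 read RPS quoted above, and the 5-year retention works out to roughly 18 TB of raw ride data.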
Since this is one of the hard system design questions, we won't spend too much time on these cookie-cutter calculations and will jump into the designs which are more interesting.
High-level Design
Overview
This design diagram may be intimidating at first with arrows going seemingly in all directions. Let's break it down into pieces, look at each one, and go through the sequence diagram of data flow so it will all make sense.
Entities
To satisfy our key functional requirements, we'll need the following entities:
1. Rider
- RiderID (Primary Key)
This table stores information about users who use the platform to request rides. It includes personal information such as name and contact details, and preferred payment methods for ride transactions.
2. Driver
- DriverID (Primary Key)
- Status (Available, Busy, Offline)
This table stores information specific to users who are registered as drivers on the platform and provide transportation services. It includes their personal details, vehicle information (make, model, year, etc.), preferences, and availability status.
3. Ride
- RideID (Primary Key)
- RiderID (Foreign Key from Rider)
- DriverID (Foreign Key from Driver)
- Status (Requested, In Progress, Completed, Cancelled)
This entity represents an individual ride from the moment a rider requests an estimated fare all the way until its completion. It records all pertinent details of the ride, including the identities of the rider and the driver, vehicle details, state, the planned route, the actual fare charged at the end of the trip, and timestamps marking the pickup and drop-off.
4. Location
- LocationID (Primary Key)
- DriverID (Foreign Key from Driver)
- Latitude
- Longitude
This entity stores the real-time location of drivers. It includes the latitude and longitude coordinates, as well as the timestamp of the last update. This entity is crucial for matching riders with nearby drivers and for tracking the progress of a ride.
5. RideRequest
- RequestID (Primary Key)
- RiderID (Foreign Key from Rider)
- Status (Pending, Accepted, Declined)
This table logs ride requests made by riders, tracking the request's status from pending to accepted or declined.
We can store everything in a SQL database, but we should store location data in an in-memory database because it requires frequent reads and writes.
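To make the entities more concrete, here is a minimal sketch of a few of them as Python dataclasses. The field names follow the lists above; the default values, the `updated_at` field name, and the plain dictionary standing in for the in-memory location database are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class DriverStatus(Enum):
    AVAILABLE = "available"
    BUSY = "busy"
    OFFLINE = "offline"


class RideStatus(Enum):
    REQUESTED = "requested"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


@dataclass
class Driver:
    driver_id: str
    status: DriverStatus = DriverStatus.OFFLINE  # assumption: drivers start offline


@dataclass
class Ride:
    ride_id: str
    rider_id: str
    driver_id: Optional[str] = None              # assumption: unset until matched
    status: RideStatus = RideStatus.REQUESTED


@dataclass
class Location:
    driver_id: str
    latitude: float
    longitude: float
    updated_at: float                            # Unix timestamp of the last update


# A dict keyed by driver ID stands in for the in-memory location database:
# each update overwrites the driver's previous position.
location_db: Dict[str, Location] = {}
location_db["d1"] = Location("d1", 37.7749, -122.4194, updated_at=1_700_000_000.0)
```

Keeping only the latest position per driver is what makes an in-memory store a natural fit here: writes are frequent but each one replaces the previous value.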
Components
- Rider App: The app riders use to request rides and get updates about their ride.
- Driver App: The app drivers use to get ride requests and update their location.
- Load Balancer and Firewall: This makes sure the WebSocket connections are always available and secure by distributing traffic and blocking unauthorized access.
- Rider WebSocket Service: Manages real-time communication between the Rider App and the backend services.
- Driver WebSocket Service: Manages real-time communication between the Driver App and the backend services.
- Ride Matcher: The main service that handles ride requests, finds available drivers, and updates ride statuses.
- Ride DB: A database that stores information about rides, like their status, driver assignments, and details.
- Location Service: Tracks and manages the real-time locations of drivers.
- Location DB (in-memory): A fast in-memory database that stores current driver locations for quick access.
You might notice two dedicated WebSocket services: one for riders and one for drivers. This is crucial for maintaining a live feed of driver locations, which is essential for both riders (to track their ride's progress) and Uber itself (to manage its fleet effectively). This is where WebSocket excels: two-way communication. We give them dedicated services because they are user-facing request handlers that scale differently from other services such as the Ride Matcher.
WebSocket is what you would typically suggest in a system design interview. In production, Uber actually uses a more modern technology - QUIC/HTTP/3. We will cover this in the detailed design section as well, where we compare SSE, WebSocket, and long polling.
Also note that even though they are called "WebSocket Service", they are essentially request handlers that handle HTTP REST APIs too.
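As an illustration of what the Ride Matcher does with data from the Location DB, here is a minimal sketch of finding the nearest available driver. It assumes driver positions are available as a simple mapping of driver ID to status and coordinates, and it uses a linear scan with the haversine great-circle distance. The function names and data shape are hypothetical; a real system would use a spatial index rather than scanning every driver.

```python
import math
from typing import Dict, Optional, Tuple

EARTH_RADIUS_KM = 6371.0
Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_available_driver(
    rider: Point, drivers: Dict[str, Tuple[str, Point]]
) -> Optional[str]:
    """Return the ID of the closest driver whose status is 'available', if any."""
    best_id, best_dist = None, math.inf
    for driver_id, (status, position) in drivers.items():
        if status != "available":
            continue  # busy and offline drivers cannot take the ride
        dist = haversine_km(rider, position)
        if dist < best_dist:
            best_id, best_dist = driver_id, dist
    return best_id


drivers = {
    "d1": ("available", (37.8000, -122.4100)),
    "d2": ("busy", (37.7750, -122.4195)),      # closest, but not available
    "d3": ("available", (37.7800, -122.4200)),
}
match = nearest_available_driver((37.7749, -122.4194), drivers)
```

The linear scan costs O(n) per request, which is fine for a sketch but is the part a production matcher would replace with a geospatial lookup.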
Data Flow and Interactions
1. Driver Signs On and Sends Its Location
When a driver comes online, the Driver App needs to start sharing its location with Uber so the system can match the driver with nearby riders. The sequence of operations is:
- Establish WebSocket Connection: Driver App establishes a WebSocket connection with the Driver WebSocket Service.
- Send Location (Every Few Seconds): Driver App sends the driver's current location to the Driver WebSocket Service.
- Forward Location to Location Service: Driver WebSocket Service forwards the driver's location data to the Location Service.
- Update Driver Location: Location Service updates the driver's location in the Location DB.
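The steps above can be sketched as a chain of three small classes. This is an in-process illustration only: there is no real WebSocket or network code, and the class names, method names, and message shape (`lat` and `lon` keys) are assumptions chosen to mirror the component names.

```python
import time
from typing import Dict, Tuple


class LocationDB:
    """Stand-in for the in-memory Location DB: latest position per driver."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[float, float, float]] = {}

    def upsert(self, driver_id: str, lat: float, lon: float, ts: float) -> None:
        # Overwrite the previous entry; only the most recent position is kept.
        self._latest[driver_id] = (lat, lon, ts)

    def get(self, driver_id: str) -> Tuple[float, float, float]:
        return self._latest[driver_id]


class LocationService:
    """Receives forwarded locations and writes them to the Location DB."""

    def __init__(self, db: LocationDB) -> None:
        self.db = db

    def update_location(self, driver_id: str, lat: float, lon: float) -> None:
        self.db.upsert(driver_id, lat, lon, time.time())


class DriverWebSocketService:
    """Handles messages from the Driver App and forwards locations."""

    def __init__(self, location_service: LocationService) -> None:
        self.location_service = location_service

    def on_location_message(self, driver_id: str, message: dict) -> None:
        # Hypothetical message shape: {"lat": ..., "lon": ...}
        self.location_service.update_location(driver_id, message["lat"], message["lon"])


db = LocationDB()
ws = DriverWebSocketService(LocationService(db))

# The Driver App sends a new location every few seconds.
ws.on_location_message("d1", {"lat": 37.7749, "lon": -122.4194})
ws.on_location_message("d1", {"lat": 37.7750, "lon": -122.4190})
```

After both messages, the Location DB holds only the second position for driver `d1`, which is the behavior the matching step relies on.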
2. Rider Requesting a Ride and Ride Matching
Grasping the building blocks ("the lego pieces")
This part of the guide will focus on the various components that are often used to construct a system (the building blocks), and the design templates that provide a framework for structuring these blocks.
Core Building blocks
At the bare minimum, you should know the core building blocks of system design:
- Scaling stateless services with load balancing
- Scaling database reads with replication and caching
- Scaling database writes with partition (aka sharding)
- Scaling data flow with message queues
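As a small taste of one of these building blocks, here is a minimal sketch of hash-based partitioning: a stable hash of a record's key decides which database shard receives the write. The function name and key format are hypothetical. `hashlib` is used instead of Python's built-in `hash()` because the latter is randomized per process for strings.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same key always routes to the same shard, so writes for different
# keys spread across shards while each key's data stays in one place.
shard = shard_for("rider-42", num_shards=4)
```

One known limitation of this modulo scheme is that changing `num_shards` remaps most keys; consistent hashing is a common alternative when shards are added or removed often.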
System Design Template
With these building blocks, you will be able to apply our template to solve many system design problems. We will dive into the details in the Design Template section. Here's a sneak peek:
Additional Building Blocks
Additionally, you will want to understand these concepts:
- Processing large amounts of data (aka "big data") with batch and stream processing
- Particularly useful for solving data-intensive problems such as designing an analytics app
- Achieving consistency across services using distributed transactions or event sourcing
- Particularly useful for solving problems that require strict transactions such as designing financial apps
- Full text search: full-text index
- Storing data for the long term: data warehousing
On top of these, there is ad hoc knowledge tailored to certain problems that you will want to pick up. For example, geohashing for designing location-based services like Yelp or Uber, or operational transformation for problems like designing Google Docs. You can learn these on a case-by-case basis. System design interviews are supposed to test your general design skills and not specific knowledge.
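To give a flavor of what such problem-specific knowledge looks like, here is a minimal sketch of geohash encoding. It interleaves longitude and latitude bits by repeatedly halving each range, then emits one base-32 character per 5 bits. The function name and default precision are illustrative choices, not a reference implementation.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0

    return "".join(chars)


cell = geohash_encode(57.64911, 10.40744, precision=6)  # "u4pruy"
```

Points in the same cell share a prefix, so a prefix lookup can find drivers or businesses in the same area. Two points can still be close together while falling on opposite sides of a cell boundary, which is why implementations usually check neighboring cells as well.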
Working through problems and building solutions using the building blocks
Finally, we have a series of practical problems for you to work through. You can find the problems in /problems. This hands-on practice will not only help you apply the principles learned but will also enhance your understanding of how to use the building blocks to construct effective solutions. The list keeps growing, as we are actively adding more questions.