The System Design Courses

Go beyond memorizing solutions to specific problems. Learn the core concepts, patterns and templates to solve any problem.

Start Learning

System Design Uber, Lyft

Functional Requirements

We will focus on the core functionalities:

  • Ride Request: Users should be able to request a ride by providing their location and destination. The system should find the nearest available driver to fulfill the ride request.
  • Driver Tracking: When a rider is matched to a driver, the system should be able to track the real-time location of drivers and update their status (available, busy, offline) and update the rider accordingly.

Non-functional Requirements

  • 100M Daily Active Users
  • Read:write ratio = 10:1
  • Data retention for 5 years
  • Assuming 10 million ride requests per day
  • Assuming each ride (including all data information related to the ride) is about 1KB

Using our resource calculator, we get about 1000 read RPS and 100 write RPS.

Since this is one of the hard system design questions, we won't spend too much time on these cookie-cutter calculations and will jump into the designs which are more interesting.

High-level Design

Overview

Uber System Design Diagram

This design diagram may be intimidating at first with arrows going seemingly in all directions. Let's break it down into pieces, look at each one, and go through the sequence diagram of data flow so it will all make sense.

Entities

To satisfy our key functional requirements, we'll need the following entities:

1. Rider

  • RiderID (Primary Key)

This table stores information about users who use the platform to request rides. It includes personal information such as name and contact details, and preferred payment methods for ride transactions.

2. Driver

  • DriverID (Primary Key)
  • Status (Available, Busy, Offline)

This table stores information specific to users who are registered as drivers on the platform and provide transportation services. It includes their personal details, vehicle information (make, model, year, etc.), preferences, and availability status.

3. Ride

  • RideID (Primary Key)
  • RiderID (Foreign Key from User)
  • DriverID (Foreign Key from Driver)
  • Status (Requested, In Progress, Completed, Cancelled)

This entity represents an individual ride from the moment a rider requests an estimated fare all the way until its completion. It records all pertinent details of the ride, including the identities of the rider and the driver, vehicle details, state, the planned route, the actual fare charged at the end of the trip, and timestamps marking the pickup and drop-off.

4. Location

  • LocationID (Primary Key)
  • DriverID (Foreign Key from Driver)
  • Latitude
  • Longitude

This entity stores the real-time location of drivers. It includes the latitude and longitude coordinates, as well as the timestamp of the last update. This entity is crucial for matching riders with nearby drivers and for tracking the progress of a ride.

5. RideRequest

  • RequestID (Primary Key)
  • RiderID (Foreign Key from User)
  • Status (Pending, Accepted, Declined)

This table logs ride requests made by riders, tracking the request's status from pending to accepted or declined.

We can store everything in a SQL database but we should store location in an in-memory database it requires frequent read and write.

Components

  1. Rider App: The app riders use to request rides and get updates about their ride.
  2. Driver App: The app drivers use to get ride requests and update their location.
  3. Load Balancer and Firewall: This makes sure the WebSocket connections are always available and secure by distributing traffic and blocking unauthorized access.
  4. Rider WebSocket Service: Manages real-time communication between the Rider App and the backend services.
  5. Driver WebSocket Service: Manages real-time communication between the Driver App and the backend services.
  6. Ride Matcher: The main service that handles ride requests, finds available drivers, and updates ride statuses.
  7. Ride DB: A database that stores information about rides, like their status, driver assignments, and details.
  8. Location Service: Tracks and manages the real-time locations of drivers.
  9. Location DB (in-memory): A fast in-memory database that stores current driver locations for quick access.

You might notice two dedicated WebSocket services: one for riders and one for drivers. This is crucial for maintaining a live feed of driver locations, which is essential for both riders (to track their ride's progress) and Uber itself (to manage its fleet effectively). This is where WebSocket excels - two way communication. We have dedicated service for them because they are user-facing request handlers that would scale differently than the other services such as Rider Matcher.

WebSocket is what you would typically suggest in a system design interview. In production, Uber actually uses a more modern technology - QUIC/http3. We will cover this in the detailed design section as well comparing SSE, WebSocket and long polling.

Also note that even though they are called "WebSocket Service", they are essentially request handlers that handle HTTP REST APIs too.

Data Flow and Interactions

1. Driver Sign on and Sends Its Location

Uber System Design Driver Location Update Flow

When a driver comes online it needs to start sharing its location data with Uber so the system can match it with nearby riders. The sequence of operations are:

  • Establish WebSocket Connection: Driver App establishes a WebSocket connection with the Driver WebSocket Service.
  • Send Location (Every Few Seconds): Driver App sends the driver's current location to the Driver WebSocket Service.
  • Forward Location to Location Service: Driver WebSocket Service forwards the driver's location data to the Location Service.
  • Update Driver Location: Location Service updates the driver's location in the Location DB.

2. Rider Requesting a Ride and Rider Matching

Uber System Design Driver Rider Matching Flow

  • Request a Ride: Rider App sends a ride request to the Rider WebSocket Service.
  • Forward Ride Request: Rider WebSocket Service forwards the request to the Ride Matcher.
  • Create Ride Request: Ride Matcher creates a new ride entry in the Database with status created.
  • Match Nearby Driver: Ride Matcher queries the Location Service to find nearby drivers.
  • Find Nearby Drivers: Location Service retrieves driver locations from the Location DB.
  • Respond with Matched Driver: Location Service sends matched driver info to the Ride Matcher.
  • Update Ride Status: Ride Matcher updates ride status to in_progress and assigns the driver in the Database.
  • Respond to WebSocket Service: Ride Matcher sends matched driver and ride info back to the Rider WebSocket Service.
  • Respond to Rider App: Rider WebSocket Service sends driver and ride info to the Rider App.

3. Driver Updates Location

Uber System Design Driver Location Update to Rider Flow

  • Establish WebSocket Connection: Driver App establishes a WebSocket connection with the Driver WebSocket Service.
  • Send Location (Every Few Seconds): Driver App sends the driver's current location to the Driver WebSocket Service.
  • Forward Location to Location Service: Driver WebSocket Service forwards the driver's location data to the Location Service.
  • Update Driver Location: Location Service updates the driver's location in the Location DB. Note that this is for the system to keep track of the fleet status. We don't actually need this step for the rider to see the driver's location.
  • Send Driver Location to Rider WebSocket Service: Location Service sends the driver's location to the Rider WebSocket Service.
  • Forward Location to Rider App: Rider WebSocket Service sends the driver's location to the Rider App.

Detailed Design and Deep Dives

WebSocket Services

As mentioned above, you might notice two dedicated WebSocket services: one for riders and one for drivers. This is crucial for maintaining a live feed of driver locations, which is essential for both riders (to track their ride's progress) and Uber itself (to manage its fleet effectively). Sending from client to server is easy and standard. Sending from server to client though, requires more thining. Here are a few ways to do it:

WebSockets:

  • Overview: WebSockets provide a full-duplex communication channel over a single, long-lived connection established via an HTTP handshake. This allows for real-time, bidirectional data exchange between client and server.
  • Pros: Low latency, real-time bidirectional communication, efficient for high-frequency updates.
  • Cons: More complex to implement and manage, requires robust connection handling.

Server-Sent Events (SSE):

  • Overview: SSE is a standard allowing servers to push updates to the client over a single, long-lived HTTP connection. Communication is unidirectional, from server to client.
  • Pros: Simpler to implement than WebSockets, efficient for server-to-client updates, built-in reconnection.
  • Cons: Unidirectional (only server to client), less suitable for scenarios requiring frequent client-to-server communication.

HTTP Long Polling:

  • Overview: HTTP Long Polling is a technique where the client makes a request to the server, and the server holds the request open until new data is available. Once data is sent, the client immediately makes a new request, simulating real-time communication.
  • Pros: Simple to set up, compatible with all clients, works with standard HTTP.
  • Cons: Higher latency due to repeated connections, increased server and network load, less efficient for high-frequency updates.

QUIC/HTTP3:

  • Overview: QUIC is a transport protocol developed by Google, which serves as the foundation for HTTP/3. It provides features like reduced connection establishment time, improved congestion control, and multiplexing without head-of-line blocking, offering low-latency, reliable communication.
  • Pros: Provides low latency and high efficiency, with better connection management, multiplexing, and congestion control. Suitable for real-time applications requiring reliable and fast communication.
  • Cons: Relatively new and may not be supported across all environments. Implementation can be more complex than traditional HTTP but less so than WebSockets.
FeatureWebSocketsServer-Sent Events (SSE)HTTP Long PollingQUIC/HTTP3
OverviewFull-duplex, real-time communicationServer-to-client updates over HTTPEmulates real-time via repeated requestsLow-latency, modern transport protocol
Communication TypeBidirectionalUnidirectional (server to client)Unidirectional (server to client)Bidirectional
LatencyLowLowHighLow
Connection ManagementComplex (requires handling reconnects)built-in simple reconnectionSimpleEfficient (built-in features)
EfficiencyHigh (persistent connection)Medium (persistent connection)Low (frequent connections)High (optimized for modern networks)
Use Case SuitabilityReal-time, high-frequency updatesPeriodic server-to-client updatesFallback or low-frequency updatesReal-time, high-reliability applications
Server LoadLowLowHighLow (optimized performance)

In summary,

  • WebSockets: Best suited for applications requiring continuous, real-time communication with minimal delay, such as live location sharing.
  • SSE: Ideal for use cases where the server needs to push updates to the client periodically without requiring a bidirectional communication channel.
  • HTTP Long Polling: Useful as a fallback mechanism in environments where WebSockets and SSE are not available or practical, though it comes with higher latency and server load.
  • QUIC/HTTP3: Combines the benefits of low latency and high efficiency with modern protocol advantages, making it suitable for real-time applications needing reliable and fast communication, such as live video streaming or gaming.
Which option Uber actually use in production?

Uber initially used SSE (server-sent event) for this. Later they moved to QUIC/HTTP3 for all push messages including. QUIC is a modern transport protocol built on UDP that enhances web performance and reliability by offering reduced latency, multiplexing without head-of-line blocking, connection migration, and integrated security. Uber sees 10-30 percent reduction in tail-end latency by switching to QUIC.

How to Find Nearby Drivers

Naive Solution

Using a database to store the latitude and longitude of drivers and matching them with a SELECT query. This method is straightforward but can be inefficient as the number of drivers increases. Each query has to scan through potentially all driver records to find matches, leading to slow performance and high computational cost.

Here's an example:

SELECT driver_id, ( 3959 * acos( cos( radians(:latitude) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians(:longitude) ) + sin( radians(:latitude) ) * sin( radians( latitude ) ) ) ) AS distance FROM drivers WHERE distance < :radius ORDER BY distance;

Database Extensions like PostGIS

PostGIS extends PostgreSQL to handle geographic objects, allowing spatial queries.

  • How it works: PostGIS adds support for geographic objects to the PostgreSQL database. It can index these objects using R-trees or GiST indexes, enabling efficient spatial queries.

Here's an example:

CREATE EXTENSION postgis; SELECT driver_id, ST_Distance( ST_SetSRID(ST_MakePoint(:longitude, :latitude), 4326), location::geography ) AS distance FROM drivers WHERE ST_DWithin( location::geography, ST_SetSRID(ST_MakePoint(:longitude, :latitude), 4326)::geography, :radius ) ORDER BY distance;
  • Pros: Efficient for handling complex geographic queries, integrates well with PostgreSQL, supports advanced spatial functions.
  • Cons: Adds complexity to the database setup and maintenance, can have a steeper learning curve.

Geohashing

Geohashing encodes latitude and longitude into a single string, dividing the world into a grid of cells.

  • How it works: Geohashing converts geographic coordinates into a base32 string. Nearby locations have similar prefixes, allowing quick searches by comparing string prefixes.

Here's an example:

import geohash latitude, longitude = 37.7749, -122.4194 geohash_precision = 6 geo_hash = geohash.encode(latitude, longitude, precision=geohash_precision) # Store geo_hash in the database, and search using prefix matching # Example SQL for searching: SELECT * FROM drivers WHERE geohash LIKE '9q8yy%';
  • Pros: Simple to implement, allows quick proximity searches, reduces the number of comparisons.
  • Cons: Precision varies with the length of the hash, potential edge cases where nearby points are in different hash cells, which can complicate the search logic.

Quadtree

A tree data structure where each node has four children, used to partition a two-dimensional space by recursively subdividing it into four quadrants.

  • How it works: The space is divided into four quadrants. Each quadrant is recursively subdivided until each node contains a small number of points or a single point.

Uber System Design Quad Tree

Here's an sample implementation:

class Point: def __init__(self, x, y): self.x = x self.y = y class Rectangle: def __init__(self, x, y, w, h): self.x = x self.y = y self.w = w self.h = h def contains(self, point): return (point.x >= self.x - self.w and point.x < self.x + self.w and point.y >= self.y - self.h and point.y < self.y + self.h) def intersects(self, range): return not (range.x - range.w > self.x + self.w or range.x + range.w < self.x - self.w or range.y - range.h > self.y + self.h or range.y + range.h < self.y - self.h) class QuadTree: def __init__(self, boundary, capacity): self.boundary = boundary # Rectangle self.capacity = capacity # int self.points = [] # list of Points self.divided = False def subdivide(self): x = self.boundary.x y = self.boundary.y w = self.boundary.w / 2 h = self.boundary.h / 2 ne = Rectangle(x + w, y - h, w, h) self.northeast = QuadTree(ne, self.capacity) nw = Rectangle(x - w, y - h, w, h) self.northwest = QuadTree(nw, self.capacity) se = Rectangle(x + w, y + h, w, h) self.southeast = QuadTree(se, self.capacity) sw = Rectangle(x - w, y + h, w, h) self.southwest = QuadTree(sw, self.capacity) self.divided = True def insert(self, point): if not self.boundary.contains(point): return False if len(self.points) < self.capacity: self.points.append(point) return True else: if not self.divided: self.subdivide() if self.northeast.insert(point): return True elif self.northwest.insert(point): return True elif self.southeast.insert(point): return True elif self.southwest.insert(point): return True def query(self, range, found): if not self.boundary.intersects(range): return else: for p in self.points: if range.contains(p): found.append(p) if self.divided: self.northwest.query(range, found) self.northeast.query(range, found) self.southwest.query(range, found) self.southeast.query(range, found) # Example usage boundary = Rectangle(200, 200, 200, 200) qtree = QuadTree(boundary, 4) # Insert some points points = [Point(100, 100), Point(200, 200), Point(300, 300), Point(150, 150)] for p in points: qtree.insert(p) # Query range range_query = Rectangle(200, 200, 100, 100) found_points = [] qtree.query(range_query, found_points) print(f"Points found: {[ (p.x, p.y) for p in found_points ]}")
  • Pros: Efficient spatial indexing, handles dynamic data well, supports fast range and nearest-neighbor queries.
  • Cons:
    • Can become unbalanced if data is unevenly distributed, complexity in implementing and maintaining the structure.
    • Quadtree doesn't deal with dynamic data well. Imagine having to restructure the tree when drivers move around. It adds a massive overhead. It would make more sense for static data such as restaurants in Yelp.

Learn more about QuadTree in the System Design Domain Knowledge Course.

Hilbert Curves and Google S2

Hilbert curves map multi-dimensional data to one dimension while preserving locality of the data points, and Google S2 library partitions the earth's surface into cells using a similar method.

  • How it works: Hilbert curves fill space-filling curves that map multi-dimensional space into a one-dimensional sequence, maintaining locality. Google S2 uses spherical geometry to divide the Earth's surface into a hierarchy of cells, indexed for fast querying.

Here's an example:

import s2sphere latitude, longitude = 37.7749, -122.4194 level = 12 # S2 cell level for desired precision point = s2sphere.LatLng.from_degrees(latitude, longitude) cell = s2sphere.CellId.from_lat_lng(point).parent(level) # Store cell.id() in the database, and search using S2 cell covering # Example search for nearby cells region = s2sphere.LatLngRect.from_point_pair(s2sphere.LatLng.from_degrees(37.0, -123.0), s2sphere.LatLng.from_degrees(38.0, -121.0)) coverer = s2sphere.RegionCoverer() coverer.min_level = level coverer.max_level = level covering = coverer.get_covering(region) # Query database for drivers in these cells cell_ids = [cell.id() for cell in covering] SELECT * FROM drivers WHERE s2_cell_id IN (cell_ids);
  • Pros: Excellent locality preservation, efficient for large-scale geographic data, supports fast and scalable spatial queries.
  • Cons:
    • Complex to implement and understand, can require significant computational resources for large datasets. May introduce addtional latency to Uber's time-sensitive ride matching service.
    • S2 is designed to handle data on a sphere, making it ideal for representing the Earth's surface. However, Uber's primary concern is efficiently finding nearby drivers within relatively small geographic areas, where the curvature of the Earth is less of a factor.
    • The library is open source but not the usability is low, limiting Uber's ability to customize and integrate it into their existing systems

Learn more about Hilber Curve and Google S2 in the System Design Domain Knowledge Course.

Which Method Does Uber Actually Use? H3: Uber’s Hexagonal Hierarchical Spatial Index

H3 is a geospatial indexing system developed by Uber that uses a hexagonal grid to partition the world. It is designed for efficient and accurate spatial queries and analysis, addressing limitations found in other geospatial indexing methods. Feel free to play with it using H3's interactive map.

Uber System Design H3 Spatial Index

  • Hexagonal Grid System:

    • Six equidistant neighbors simplify spatial algorithms like nearest neighbor searches.
    • Hexagons are also excellent at filling space efficiently. On average, they can tile a polygon with a smaller margin of error than square tiles.
  • Hierarchical Indexing:

    • Supports 16 levels of resolution for flexible scaling and precision.
    • Enables efficient aggregation and detailed analysis.
  • Performance and Integration:

    • Optimized for large-scale data operations.
    • Integrates well with platforms like Amazon Redshift.

Here's a comparison between the technologies:

FeaturePostGISGeohashingQuadtreeGoogle S2Uber's H3
Grid ShapeVariable (supports multiple geometries)SquareSquareSquareHexagon
Resolution LevelsCustomizable (depends on spatial queries)Fixed, coarse to fineFixed, coarse to fineFixed, coarse to fine16 fixed levels
Real-World ApplicationsUrban planning, utilities managementSimple location-based servicesSimple location-based servicesRide-sharing, map servicesRide-sharing, logistics, urban planning
Reason Not Used by UberHigh complexity and overhead in relational databaseNon-uniform cell sizes, edge effectsVarying distances between neighborsComplexity in computationN/A (chosen for uniformity, performance)

Given Uber's reliance on the mission critical spatial indexing feature, it makes sense to build this in-house. It's an important In an interview though, geohashing is likely simple enough to explain and discuss with the interviewer.

Uber's Tech Stack

In an interview, the design is often simplified for clarity and time constraints. However, real-world systems like Uber's are more complex due to additional requirements, constraints, and legacy systems that influence architecture choices. Established companies need to integrate new technologies with existing systems, manage large-scale operations, and ensure global availability and compliance. They might also invest in cutting-edge technologies like QUIC/HTTP3 or develop proprietary frameworks, which wouldn't be practical in a simplified interview scenario. Let's explore Uber's actual tech stack based on their engineering blog.

Uber's tech stack is built on a robust and scalable architecture designed to handle its global operations, detailed across several layers.

Foundational Layer

Microservices Architecture

  • Microservices: Modular and scalable, allowing independent development, deployment, and scaling.
  • Languages: Predominantly Node.js, Java, and Go.
  • Containerization: Docker ensures consistent environments.

Real-Time Data Management

  • Apache Kafka: Manages real-time data streaming and event sourcing.
  • Cassandra: Provides distributed database management, ensuring high availability and fault tolerance.

Service Mesh

  • Communication Management: Built using Envoy, it manages microservice communication, providing load balancing, failure recovery, and security.

Edge and Customer-Facing Technologies

Marketplace System

  • Real-Time Transactions: Built with Python, Node.js, Go, and Java to handle real-time transactions reliably.

Resilience and Availability

  • Ringpop: Ensures high availability and resilience, providing a distributed systems framework.

Caching and Databases

  • Redis: Used for in-memory data caching to enhance performance.
  • Riak: Serves as a distributed NoSQL database for large-scale data storage and retrieval.

Web and Mobile Stack

Frontend Development

  • React and Redux: Used for developing the web frontend, providing a responsive and dynamic user experience.
  • Mobile Development: Uses native mobile frameworks and React Native for performance and consistency across platforms.

Infrastructure

Hybrid Cloud

  • Private and Public Cloud: Combines private data centers and public cloud services for scalability, flexibility, and cost-efficiency.

Monitoring and Observability

Tools and Practices

  • Prometheus and Grafana: Prometheus for metrics collection and Grafana for visualization, ensuring proactive performance tracking and issue identification.

References

The System Design Courses

Go beyond memorizing solutions to specific problems. Learn the core concepts, patterns and templates to solve any problem.

Start Learning

System Design Master Template

Comments