Back to Problems
Solution
Functional RequirementsNone-functional RequirementsHigh-level DesignData Flow and InteractionsDetailed Design and Deep DivesHow to Find Nearby DriversUber's Tech Stack
The System Design Courses

Go beyond memorizing solutions to specific problems. Learn the core concepts, patterns and templates to solve any problem.

Start Learning

System Design Uber

Functional Requirements

We will focus on the core functionalities:

  • Ride Request: Users should be able to request a ride by providing their location and destination. The system should find the nearest available driver to fulfill the ride request.
  • Driver Tracking: When a rider is matched to a driver, the system should be able to track the real-time location of drivers and update their status (available, busy, offline) and update the rider accordingly.

None-functional Requirements

  • 100M Daily Active Users
  • Read:write ratio = 10:1
  • Data retention for 5 years
  • Assuming 10 million ride requests per day
  • Assuming each ride (including all data information related to the ride) is about 1KB

Using our resource calculator, we get about 1000 read RPS and 100 write RPS.

Since this is one of the hard system design questions, we won't spend too much time on these cookie-cutter calculations and will jump into the designs which are more interesting.

High-level Design

Overview

Uber System Design Diagram

This design diagram may intimidating at first with arrows going seemingly all directions. Let's break it down into pieces, look at each one and go through the sequence diagram of data flow so it will all make sense.

Components

  1. Rider App: The app riders use to request rides and get updates about their ride.
  2. Driver App: The app drivers use to get ride requests and update their location.
  3. Load Balancer and Firewall: This makes sure the WebSocket connections are always available and secure by distributing traffic and blocking unauthorized access.
  4. Rider WebSocket Service: Manages real-time communication between the Rider App and the backend services.
  5. Driver WebSocket Service: Manages real-time communication between the Driver App and the backend services.
  6. Ride Matcher: The main service that handles ride requests, finds available drivers, and updates ride statuses.
  7. Ride DB: A database that stores information about rides, like their status, driver assignments, and details.
  8. Location Service: Tracks and manages the real-time locations of drivers.
  9. Location DB (in-memory): A fast in-memory database that stores current driver locations for quick access.

You might notice two dedicated WebSocket services: one for riders and one for drivers. This is crucial for maintaining a live feed of driver locations, which is essential for both riders (to track their ride's progress) and Uber itself (to manage its fleet effectively). This is where WebSocket excels - two way communication. We have dedicated service for them because they are user-facing request handlers that would scale differently than the other services such as Rider Matcher.

WebSocket is what you would typically suggest in a system design interview. In production, Uber actually uses a more advanced and modern technology - QUIC/http3. We will cover this in the detailed design section as well comparing SSE, WebSocket and long polling.

Also note that even though they are called "WebSocket Service", they are essentially request handlers that handle HTTP REST APIs too.

Data Flow and Interactions

1. Driver Sign on and Sends Its Location

Uber System Design Driver Location Update Flow

When a driver comes online it needs to start sharing its location data with Uber so the system can match it with nearby riders. The sequence of operations are:

  • Establish WebSocket Connection: Driver App establishes a WebSocket connection with the Driver WebSocket Service.
  • Send Location (Every Few Seconds): Driver App sends the driver's current location to the Driver WebSocket Service.
  • Forward Location to Location Service: Driver WebSocket Service forwards the driver's location data to the Location Service.
  • Update Driver Location: Location Service updates the driver's location in the Location DB.

2. Rider Requesting a Ride and Rider Matching

Uber System Design Driver Rider Matching Flow

  • Request a Ride: Rider App sends a ride request to the Rider WebSocket Service.
  • Forward Ride Request: Rider WebSocket Service forwards the request to the Ride Matcher.
  • Create Ride Request: Ride Matcher creates a new ride entry in the Database with status created.
  • Match Nearby Driver: Ride Matcher queries the Location Service to find nearby drivers.
  • Find Nearby Drivers: Location Service retrieves driver locations from the Location DB.
  • Respond with Matched Driver: Location Service sends matched driver info to the Ride Matcher.
  • Update Ride Status: Ride Matcher updates ride status to in_progress and assigns the driver in the Database.
  • Respond to WebSocket Service: Ride Matcher sends matched driver and ride info back to the Rider WebSocket Service.
  • Respond to Rider App: Rider WebSocket Service sends driver and ride info to the Rider App.

3. Driver Updates Location:

Uber System Design Driver Location Update to Rider Flow

  • Establish WebSocket Connection: Driver App establishes a WebSocket connection with the Driver WebSocket Service.
  • Send Location (Every Few Seconds): Driver App sends the driver's current location to the Driver WebSocket Service.
  • Forward Location to Location Service: Driver WebSocket Service forwards the driver's location data to the Location Service.
  • Update Driver Location: Location Service updates the driver's location in the Location DB. Note that this is for the system to keep track of the fleet status. We don't actually need this step for the rider to see the driver's location.
  • Send Driver Location to Rider WebSocket Service: Location Service sends the driver's location to the Rider WebSocket Service.
  • Forward Location to Rider App: Rider WebSocket Service sends the driver's location to the Rider App.

Detailed Design and Deep Dives

WebSocket Services

You might notice two dedicated WebSocket services: one for riders and one for drivers. This is crucial for maintaining a live feed of driver locations, which is essential for both riders (to track their ride's progress) and Uber itself (to manage its fleet effectively). Sending from client to server is easy and standard. Sending from server to client though, requires more thining. Here are a few ways to do it:

WebSockets:

  • Overview: WebSockets provide a full-duplex communication channel over a single, long-lived connection established via an HTTP handshake. This allows for real-time, bidirectional data exchange between client and server.
  • Pros: Low latency, real-time bidirectional communication, efficient for high-frequency updates.
  • Cons: More complex to implement and manage, requires robust connection handling.

Server-Sent Events (SSE):

  • Overview: SSE is a standard allowing servers to push updates to the client over a single, long-lived HTTP connection. Communication is unidirectional, from server to client.
  • Pros: Simpler to implement than WebSockets, efficient for server-to-client updates, built-in reconnection.
  • Cons: Unidirectional (only server to client), less suitable for scenarios requiring frequent client-to-server communication.

HTTP Long Polling:

  • Overview: HTTP Long Polling is a technique where the client makes a request to the server, and the server holds the request open until new data is available. Once data is sent, the client immediately makes a new request, simulating real-time communication.
  • Pros: Simple to set up, compatible with all clients, works with standard HTTP.
  • Cons: Higher latency due to repeated connections, increased server and network load, less efficient for high-frequency updates.

QUIC/HTTP3:

  • Overview: QUIC is a transport protocol developed by Google, which serves as the foundation for HTTP/3. It provides features like reduced connection establishment time, improved congestion control, and multiplexing without head-of-line blocking, offering low-latency, reliable communication.
  • Pros: Provides low latency and high efficiency, with better connection management, multiplexing, and congestion control. Suitable for real-time applications requiring reliable and fast communication.
  • Cons: Relatively new and may not be supported across all environments. Implementation can be more complex than traditional HTTP but less so than WebSockets.
FeatureWebSocketsServer-Sent Events (SSE)HTTP Long PollingQUIC/HTTP3
OverviewFull-duplex, real-time communicationServer-to-client updates over HTTPEmulates real-time via repeated requestsLow-latency, modern transport protocol
Communication TypeBidirectionalUnidirectional (server to client)Unidirectional (server to client)Bidirectional
LatencyLowLowHighLow
Connection ManagementComplex (requires handling reconnects)Simple (built-in reconnection)SimpleEfficient (built-in features)
EfficiencyHigh (persistent connection)Medium (persistent connection)Low (frequent connections)High (optimized for modern networks)
Use Case SuitabilityReal-time, high-frequency updatesPeriodic server-to-client updatesFallback or low-frequency updatesReal-time, high-reliability applications
Server LoadLowLowHighLow (optimized performance)

In summary,

  • WebSockets: Best suited for applications requiring continuous, real-time communication with minimal delay, such as live location sharing.
  • SSE: Ideal for use cases where the server needs to push updates to the client periodically without requiring a bidirectional communication channel.
  • HTTP Long Polling: Useful as a fallback mechanism in environments where WebSockets and SSE are not available or practical, though it comes with higher latency and server load.
  • QUIC/HTTP3: Combines the benefits of low latency and high efficiency with modern protocol advantages, making it suitable for real-time applications needing reliable and fast communication, such as live video streaming or gaming.
Which option Uber actually use in production?

Uber initially used SSE (server-sent event) for this. Later they moved to QUIC/HTTP3 for all push messages including. QUIC is a modern transport protocol built on UDP that enhances web performance and reliability by offering reduced latency, multiplexing without head-of-line blocking, connection migration, and integrated security. Uber sees 10-30 percent reduction in tail-end latency by switching to QUIC.

How to Find Nearby Drivers

Naive Solution

Using a database to store the latitude and longitude of drivers and matching them with a SELECT query. This method is straightforward but can be inefficient as the number of drivers increases. Each query has to scan through potentially all driver records to find matches, leading to slow performance and high computational cost.

Here's an example:

SELECT driver_id, ( 3959 * acos( cos( radians(:latitude) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians(:longitude) ) + sin( radians(:latitude) ) * sin( radians( latitude ) ) ) ) AS distance FROM drivers HAVING distance < :radius ORDER BY distance;

Database Extensions like PostGIS

PostGIS extends PostgreSQL to handle geographic objects, allowing spatial queries.

  • How it works: PostGIS adds support for geographic objects to the PostgreSQL database. It can index these objects using R-trees or GiST indexes, enabling efficient spatial queries.

Here's an example:

CREATE EXTENSION postgis; SELECT driver_id, ST_Distance( ST_SetSRID(ST_MakePoint(:longitude, :latitude), 4326), location::geography ) AS distance FROM drivers WHERE ST_DWithin( location::geography, ST_SetSRID(ST_MakePoint(:longitude, :latitude), 4326)::geography, :radius ) ORDER BY distance;
  • Pros: Efficient for handling complex geographic queries, integrates well with PostgreSQL, supports advanced spatial functions.
  • Cons: Adds complexity to the database setup and maintenance, can have a steeper learning curve.

Geohashing

Geohashing encodes latitude and longitude into a single string, dividing the world into a grid of cells.

  • How it works: Geohashing converts geographic coordinates into a base32 string. Nearby locations have similar prefixes, allowing quick searches by comparing string prefixes.

Here's an example:

import geohash latitude, longitude = 37.7749, -122.4194 geohash_precision = 6 geo_hash = geohash.encode(latitude, longitude, precision=geohash_precision) # Store geo_hash in the database, and search using prefix matching # Example SQL for searching: SELECT * FROM drivers WHERE geohash LIKE '9q8yy%';
  • Pros: Simple to implement, allows quick proximity searches, reduces the number of comparisons.
  • Cons: Precision varies with the length of the hash, potential edge cases where nearby points are in different hash cells, which can complicate the search logic.

Quadtree

A tree data structure where each node has four children, used to partition a two-dimensional space by recursively subdividing it into four quadrants.

  • How it works: The space is divided into four quadrants. Each quadrant is recursively subdivided until each node contains a small number of points or a single point.

Here's an example:

class QuadTree: def __init__(self, boundary, capacity): self.boundary = boundary # Boundary is a rectangle self.capacity = capacity # Capacity of points in this quad self.points = [] self.divided = False def subdivide(self): # Code to subdivide the current quad into four smaller quads def insert(self, point): if not self.boundary.contains(point): return False if len(self.points) < self.capacity: self.points.append(point) return True if not self.divided: self.subdivide() return (self.northeast.insert(point) or self.northwest.insert(point) or self.southeast.insert(point) or self.southwest.insert(point)) def query(self, range, found): if not self.boundary.intersects(range): return for point in self.points: if range.contains(point): found.append(point) if self.divided: self.northeast.query(range, found) self.northwest.query(range, found) self.southeast.query(range, found) self.southwest.query(range, found)
  • Pros: Efficient spatial indexing, handles dynamic data well, supports fast range and nearest-neighbor queries.
  • Cons: Can become unbalanced if data is unevenly distributed, complexity in implementing and maintaining the structure.

Learn more about QuadTree in the System Design Domain Knowledge Course.

Hilbert Curves and Google S2

Hilbert curves map multi-dimensional data to one dimension while preserving locality of the data points, and Google S2 library partitions the earth's surface into cells using a similar method.

  • How it works: Hilbert curves fill space-filling curves that map multi-dimensional space into a one-dimensional sequence, maintaining locality. Google S2 uses spherical geometry to divide the Earth's surface into a hierarchy of cells, indexed for fast querying.

Here's an example:

import s2sphere latitude, longitude = 37.7749, -122.4194 level = 12 # S2 cell level for desired precision point = s2sphere.LatLng.from_degrees(latitude, longitude) cell = s2sphere.CellId.from_lat_lng(point).parent(level) # Store cell.id() in the database, and search using S2 cell covering # Example search for nearby cells region = s2sphere.LatLngRect.from_point_pair(s2sphere.LatLng.from_degrees(37.0, -123.0), s2sphere.LatLng.from_degrees(38.0, -121.0)) coverer = s2sphere.RegionCoverer() coverer.min_level = level coverer.max_level = level covering = coverer.get_covering(region) # Query database for drivers in these cells cell_ids = [cell.id() for cell in covering] SELECT * FROM drivers WHERE s2_cell_id IN (cell_ids);
  • Pros: Excellent locality preservation, efficient for large-scale geographic data, supports fast and scalable spatial queries.
  • Cons: Complex to implement and understand, can require significant computational resources for large datasets.

Learn more about Hilber Curve and Google S2 in the System Design Domain Knowledge Course.

Uber's Tech Stack

In an interview, the design is often simplified for clarity and time constraints. However, real-world systems like Uber's are more complex due to additional requirements, constraints, and legacy systems that influence architecture choices. Established companies need to integrate new technologies with existing systems, manage large-scale operations, and ensure global availability and compliance. They might also invest in cutting-edge technologies like QUIC/HTTP3 or develop proprietary frameworks, which wouldn't be practical in a simplified interview scenario. Let's explore Uber's actual tech stack based on their engineering blog.

Uber's tech stack is built on a robust and scalable architecture designed to handle its global operations, detailed across several layers.

Foundational Layer

Microservices Architecture

  • Microservices: Modular and scalable, allowing independent development, deployment, and scaling.
  • Languages: Predominantly Node.js, Java, and Go.
  • Containerization: Docker ensures consistent environments.

Real-Time Data Management

  • Apache Kafka: Manages real-time data streaming and event sourcing.
  • Cassandra: Provides distributed database management, ensuring high availability and fault tolerance.

Service Mesh

  • Communication Management: Built using Envoy, it manages microservice communication, providing load balancing, failure recovery, and security.

Edge and Customer-Facing Technologies

Marketplace System

  • Real-Time Transactions: Built with Python, Node.js, Go, and Java to handle real-time transactions reliably.

Resilience and Availability

  • Ringpop: Ensures high availability and resilience, providing a distributed systems framework.

Caching and Databases

  • Redis: Used for in-memory data caching to enhance performance.
  • Riak: Serves as a distributed NoSQL database for large-scale data storage and retrieval.

Web and Mobile Stack

Frontend Development

  • React and Redux: Used for developing the web frontend, providing a responsive and dynamic user experience.
  • Mobile Development: Uses native mobile frameworks and React Native for performance and consistency across platforms.

Infrastructure

Hybrid Cloud

  • Private and Public Cloud: Combines private data centers and public cloud services for scalability, flexibility, and cost-efficiency.

Monitoring and Observability

Tools and Practices

  • Prometheus and Grafana: Prometheus for metrics collection and Grafana for visualization, ensuring proactive performance tracking and issue identification.

Security and Compliance

Data Protection

  • Robust Security Practices: Implements strong security measures to protect user data and ensure compliance with global regulations, including encryption and regular security audits.

Overall Architecture

Uber's overall architecture integrates microservices, real-time data processing, robust infrastructure, and advanced monitoring tools to ensure efficient operations and a seamless user experience.

References