Solution

System Design for Ticketmaster

Designing a system like Ticketmaster has some unique complexities. This is due to the nature of selling tickets: it is a high-stakes transactional process involving elastic demand, heightened during events like a Taylor Swift concert. This demand can cause an unprecedented increase in traffic, stressing components like the API, database, and payment processing service. Individuals seeking to score Taylor Swift tickets, for example, add both volume and velocity to the system's requirements.

Selling tickets appears deceptively simple. But the difference between efficiently serving millions of live users and collapsing under the pressure can decide whether a business succeeds or fails. A system must withstand not just the daily volume of traffic but also scale up massively during peak times, be it for movie tickets, theater events, or concerts in cities worldwide.

Functional Requirements

  • Users must be able to search for events
  • Users should be able to get all available seats for an event
  • Users should be able to reserve and then book seats for an event (using some black-box payment provider).
  • There must not be any double bookings: one ticket can be booked by only one user.
  • (follow-up question) If a seat is reserved, the attempted booker joins a waiting list for that event. If the reservation doesn’t go through, then the next user on the waiting list who can be fulfilled should be able to reserve and be given a chance to book the seat.

Non-Functional Requirements

  • 10,000,000 DAU
  • 100,000 peak concurrent users
  • 20,000 of those users are concurrently trying to book; the remaining 80,000 are only viewing. That is roughly 4 viewers for every booker, so the system is read-heavy.
  • Consistency
  • Reliability
  • Security

Back-of-the-Envelope Resource Estimation

Let's assume each user makes 5 write requests per day and that each event or user record takes 1 KB. Using the back-of-the-envelope calculator, we get:

[Figure: TicketMaster System Design Capacity Planning]
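The arithmetic behind such an estimate can also be reproduced by hand. The following is a minimal sketch that uses only the figures stated above (10,000,000 DAU, 5 writes per user per day, 1 KB per record). The decimal unit convention (1 KB = 1,000 bytes) and all variable names are assumptions, and the result is a daily average, not a peak.

```python
# Back-of-the-envelope estimate from the stated assumptions.
DAU = 10_000_000               # daily active users (non-functional requirement)
WRITES_PER_USER_PER_DAY = 5    # assumption stated in the text
RECORD_SIZE_BYTES = 1_000      # 1 KB per record, decimal units (assumption)
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = DAU * WRITES_PER_USER_PER_DAY
avg_write_qps = writes_per_day / SECONDS_PER_DAY
storage_per_day_gb = writes_per_day * RECORD_SIZE_BYTES / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1_000

print(f"writes per day:      {writes_per_day:,}")
print(f"average write QPS:   {avg_write_qps:,.0f}")
print(f"new data per day:    {storage_per_day_gb:,.1f} GB")
print(f"new data per year:   {storage_per_year_tb:,.2f} TB")
```

Under these assumptions the average write rate is about 579 requests per second and the data grows by about 50 GB per day. Peak traffic during a popular on-sale would be far above the average, which is what the queue in the write path is meant to absorb.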

High-level Design

At the highest level, Ticketmaster's design comprises distinct services for handling specific domain areas and storage components that efficiently manage data.

We apply our general-purpose system design template: read from cache, write to a message queue. Here is the high-level design diagram:

[Figure: TicketMaster System Design Diagram]

The QPS figures are significant. Proper caching, load balancing, and possibly database sharding and replication are needed to handle the load.

1. Searching for Events and Seats

The Search Service enables users to find events. This is the system's read pathway. Given that event and venue information changes infrequently, it is cached locally on each server for efficient access.

[Figure: TicketMaster Seats]

When a user selects an event to attend, the next step involves checking seat availability. This information is visually represented on a map within the user interface, with different colors indicating the status of each seat.

The Search Service retrieves seat availability details from the Seat Cache.
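The read pathway described above can be sketched as a small read-through cache. This is an illustration under stated assumptions, not the actual Search Service: the class name `TTLCache`, the loader `load_seats_from_db`, the 5-second expiry, and the injectable clock are all hypothetical choices made so the example is self-contained.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny read-through cache with per-entry expiry (illustrative only)."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit, no database read
        value = self._loader(key)                # miss or expired: read through
        self._entries[key] = (now + self._ttl, value)
        return value


db_reads = []  # records every call that reaches the (fake) database


def load_seats_from_db(event_id: str) -> dict:
    # Hypothetical stand-in for a query against the Seat table.
    db_reads.append(event_id)
    return {"A1": "available", "A2": "unavailable"}


fake_now = [0.0]  # controllable clock so the example is deterministic
seat_cache = TTLCache(load_seats_from_db, ttl_seconds=5.0, clock=lambda: fake_now[0])

first = seat_cache.get("evt-1")    # miss: loads from the database
second = seat_cache.get("evt-1")   # hit: served from the cache
fake_now[0] = 6.0                  # advance past the expiry
third = seat_cache.get("evt-1")    # expired: loads again
```

A long expiry suits event and venue data, which changes infrequently. Seat availability changes quickly during a sale, so it would need a short expiry or explicit updates from the write path.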

2. Reserving a Seat

Upon choosing a seat, the user initiates a reservation. This action represents the system's write pathway.

  • The user requests a seat reservation via the Booking Service, providing the event ID, seat ID, and their user ID.
  • The Booking Service queues the request in an ordered fashion. The queue serves dual purposes: acting as a buffer and ensuring requests are processed on a first-come, first-served basis. This method helps manage excess requests beyond seat availability by potentially dropping them.
  • A consumer service then processes requests from the queue, marking the specified seat as 'unavailable' in both the cache and the database concurrently. While the cache facilitates rapid search query responses, the database acts as the definitive source of truth. Should a request attempt to book an already unavailable seat, it is declined. This design is an instantiation of our system design template.
  • Following queue entry, the Booking Service informs the user that their seat is being reserved, typically indicated on the front end by a loading spinner.
  • The consumer service, after processing the reservation request, prompts the Booking Service to engage with a payment service. This step generates a payment link for the user, who is given a finite timeframe to complete the payment.
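The queue-and-consumer steps above can be sketched as follows. This is a single-process illustration, assuming an in-memory `deque` in place of a real message queue and plain dictionaries in place of the cache and database; `ReservationRequest` and `ReservationConsumer` are hypothetical names. It also writes the database and then the cache sequentially rather than concurrently, to keep the example short.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReservationRequest:
    event_id: str
    seat_id: str
    user_id: str


class ReservationConsumer:
    """Processes queued requests one at a time, in arrival order."""

    def __init__(self, seats: Dict[Tuple[str, str], str]) -> None:
        self.db = dict(seats)      # stand-in for the Seat table (source of truth)
        self.cache = dict(seats)   # stand-in for the Seat Cache

    def process(self, request: ReservationRequest) -> bool:
        key = (request.event_id, request.seat_id)
        if self.db.get(key) != "available":
            return False                   # seat already taken: decline
        self.db[key] = "unavailable"       # update the source of truth
        self.cache[key] = "unavailable"    # keep the read path in sync
        return True


queue = deque()  # FIFO buffer between the Booking Service and the consumer
queue.append(ReservationRequest("evt-1", "A1", "alice"))
queue.append(ReservationRequest("evt-1", "A1", "bob"))   # same seat, arrives second

consumer = ReservationConsumer({("evt-1", "A1"): "available"})
results = []
while queue:
    request = queue.popleft()
    results.append((request.user_id, consumer.process(request)))
```

Because a single consumer drains the queue in order, the first request for a seat wins and later requests for the same seat are declined.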

3. Finalizing the Reservation

  • The Payment Service coordinates with external payment processors like Stripe or PayPal, returning a payment link to the user via the Booking Service.
  • The user submits their payment details and finalizes the transaction.
  • Successful payment prompts the external provider to notify our Payment Service via a webhook, leading to transaction details being recorded in the Transaction table.
  • Subsequently, the Payment Service notifies the Booking Service of the successful payment, triggering an update to the Booking table. A message is also sent to the message queue to revise the seat's status to 'paid'.
  • The consumer service updates the Seat table upon processing this message.
  • Finally, the Booking Service confirms the successful booking to the user, with the confirmation displayed on the user interface.
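The finalization steps above can be sketched in one process. The payload shape (`booking_id`, `status`, `provider_id`), the status strings other than 'paid', and the function names are hypothetical; real providers define their own webhook formats and require signature verification, which is omitted here. The duplicate-delivery guard is an added assumption, since webhooks may be delivered more than once.

```python
from collections import deque

# In-memory stand-ins for the tables and the message queue.
transactions = {}   # Transaction table: booking_id -> payment record
bookings = {"bk-1": {"seat_id": "A1", "status": "pending_payment"}}
seats = {"A1": "unavailable"}
seat_queue = deque()


def on_payment_webhook(payload: dict) -> bool:
    """Handle a provider notification; returns True if the booking was confirmed."""
    booking_id = payload["booking_id"]
    previous = transactions.get(booking_id)
    if previous is not None and previous["payment_status"] == "succeeded":
        return False    # duplicate delivery: already recorded (assumed guard)
    transactions[booking_id] = {
        "payment_status": payload["status"],
        "payment_provider_id": payload["provider_id"],
    }
    if payload["status"] != "succeeded":
        return False
    bookings[booking_id]["status"] = "confirmed"           # Booking table update
    seat_queue.append({"seat_id": bookings[booking_id]["seat_id"],
                       "status": "paid"})                  # message for the consumer
    return True


def drain_seat_queue() -> None:
    """Consumer side: apply queued seat status changes to the Seat table."""
    while seat_queue:
        message = seat_queue.popleft()
        seats[message["seat_id"]] = message["status"]


event = {"booking_id": "bk-1", "status": "succeeded", "provider_id": "provider-ref-1"}
first_delivery = on_payment_webhook(event)
second_delivery = on_payment_webhook(event)   # simulated redelivery
drain_seat_queue()
```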

Detailed Design

A couple of things come to mind in the design above.

How Does Booking Service Notify User

The Booking Service has to notify the user when:

  • the seat is being reserved and the user needs to proceed to payment
  • the payment is successful and the booking is complete

This is done through another message queue. A dedicated Notification Service takes messages from the queue and pushes them to the user using either Server-Sent Events (SSE) or WebSockets.

[Figure: TicketMaster Notification Service]
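As a rough illustration of the notification path, the sketch below drains a queue and formats each message as an SSE frame. The `event:` and `data:` fields followed by a blank line are the standard SSE wire format; the message shape, the event names, and the `outbox` dictionary standing in for open client connections are assumptions.

```python
import json
from collections import defaultdict, deque


def format_sse(event: str, data: dict) -> str:
    """Serialize one message as a Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class NotificationService:
    def __init__(self) -> None:
        # user_id -> frames to send; stand-in for per-user SSE connections
        self.outbox = defaultdict(list)

    def consume(self, queue: deque) -> None:
        while queue:
            message = queue.popleft()
            frame = format_sse(message["type"], message["payload"])
            self.outbox[message["user_id"]].append(frame)


notification_queue = deque([
    {"user_id": "alice", "type": "seat_reserved", "payload": {"seat_id": "A1"}},
    {"user_id": "alice", "type": "booking_confirmed", "payload": {"seat_id": "A1"}},
])
service = NotificationService()
service.consume(notification_queue)
```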

How Do We Release a Seat

If a user does not complete a booking (by not paying within the reservation window), we have to release the seat by setting its status back to 'available'. This can be done by sending a message to a delayed task scheduler at the same time we push the booking request to the queue. The scheduler runs the task after the time window (e.g. 2 minutes) and checks whether the seat is in 'paid' status. If not, the user did not complete the booking in time, and the task sets the status back to 'available'.

[Figure: TicketMaster Scheduler]
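The release mechanism can be sketched with a priority queue ordered by due time. This is a minimal in-memory illustration, assuming a hypothetical `DelayedScheduler` class, explicit timestamps in seconds instead of a real clock, and 'paid' as the status of a completed booking.

```python
import heapq
from typing import Dict, List


class DelayedScheduler:
    """Runs seat-release checks once their delay has elapsed."""

    def __init__(self) -> None:
        self._heap = []   # entries are (run_at, sequence, seat_id)
        self._sequence = 0

    def schedule(self, run_at: float, seat_id: str) -> None:
        heapq.heappush(self._heap, (run_at, self._sequence, seat_id))
        self._sequence += 1   # tie-breaker keeps insertion order stable

    def run_due(self, now: float, seats: Dict[str, str]) -> List[str]:
        released = []
        while self._heap and self._heap[0][0] <= now:
            _, _, seat_id = heapq.heappop(self._heap)
            if seats.get(seat_id) != "paid":
                seats[seat_id] = "available"   # payment never completed
                released.append(seat_id)
        return released


RESERVATION_WINDOW = 120.0   # seconds, matching the 2-minute example
seats = {"A1": "unavailable", "A2": "paid"}
scheduler = DelayedScheduler()
scheduler.schedule(0.0 + RESERVATION_WINDOW, "A1")   # reserved at t=0, never paid
scheduler.schedule(0.0 + RESERVATION_WINDOW, "A2")   # reserved at t=0, paid in time

released_early = scheduler.run_due(60.0, seats)    # nothing is due yet
released_late = scheduler.run_due(120.0, seats)    # only the unpaid seat is released
```

A production version would likely also verify that the reservation being expired is the one that scheduled the task, so that a stale timer cannot release a newer reservation of the same seat.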

Data Storage Design

First, each microservice should have its own database to ensure loose coupling, scalability, and ease of maintenance. This separation also confines any data corruption to the affected service instead of spreading it across the system.

User Table
+--------------+---------------+------------------+
| user_id (PK) | username      | ...              |
+--------------+---------------+------------------+


Events Table
+---------------+--------------+------------------+--------------+
| event_id (PK) | title        | location         | ...          |
+---------------+--------------+------------------+--------------+


Seat Table
+--------------+---------------+----------------+
| seat_id (PK) | event_id (FK) | user_id (FK)   |
+--------------+---------------+----------------+


Booking Table
+-----------------+---------------+--------------+
| booking_id (PK) | event_id (FK) | user_id (FK) |
+-----------------+---------------+--------------+


Transaction Table
+-----------------+---------------+--------------------+
| booking_id (PK) | payment_status| payment_provider_id|
+-----------------+---------------+--------------------+

(PK): Primary Key, (FK): Foreign Key

Interactions between tables across microservices are predominantly via API calls respecting defined contracts, rather than direct database links.

Sharding and Replication

Ticketmaster manages thousands of events, each with potentially tens of thousands of seats. The volume of data for events, tickets, and user transactions is enormous and grows with each event. Consider sharding by event or geographical location. For instance, sharding the database based on event locations can help localize data access and improve performance for both ticket buyers and event management operations.
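Sharding by event can be sketched as a stable hash of the event ID. The shard count of 8 and the function name `shard_for_event` are assumptions for illustration only.

```python
import hashlib

NUM_SHARDS = 8   # hypothetical shard count


def shard_for_event(event_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an event ID to a shard index using a stable hash."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    # Python's built-in hash() is randomized per process, so a
    # cryptographic digest is used to get the same answer everywhere.
    return int.from_bytes(digest[:8], "big") % num_shards


assignments = {event_id: shard_for_event(event_id)
               for event_id in ["evt-1", "evt-2", "evt-3", "evt-4"]}
```

Keeping all seats of one event on one shard lets a reservation touch a single database, but it also means a very popular event concentrates its load on that shard. Plain modulo placement also moves most keys when the shard count changes, which is why consistent hashing is often considered instead.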

Replication is a must for high availability and disaster recovery. Read replicas can take load off the primary writer node by serving the majority of read requests.
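Routing reads to replicas can be sketched as below. The class name `QueryRouter`, the node names, and the round-robin policy are assumptions. The sketch inspects the SQL text only to stay self-contained; real systems usually route at the connection or driver level.

```python
import itertools
from typing import List


class QueryRouter:
    """Sends plain reads to replicas (round-robin) and everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self._primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        statement = sql.strip().upper()
        # Locking reads must see current data and take locks, so they go to the primary.
        is_plain_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if is_plain_read and self._replicas is not None:
            return next(self._replicas)
        return self._primary


router = QueryRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM events WHERE event_id = 7"),
    router.route("SELECT * FROM events WHERE event_id = 8"),
    router.route("SELECT * FROM events WHERE event_id = 9"),
    router.route("UPDATE seats SET user_id = 42 WHERE seat_id = 1"),
    router.route("SELECT * FROM tickets WHERE ticket_id = 123 FOR UPDATE"),
]
```

Replicas can lag behind the primary, so reads that must be fresh, such as the final availability check before a reservation, are better served by the primary.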

Deep Dive

How to Prevent Double Booking?

Preventing double booking is a critical concern for any ticketing system. We already use a queue to order booking requests, so a request for a seat that has become unavailable is simply rejected without initiating a reservation. This "upfront" safeguard prevents most double bookings.

However, there may still be scenarios where there are inconsistencies between cache and database. Two users might pass the availability check almost simultaneously for the last available seat. There are several approaches to handle these edge cases.

One approach is pessimistic locking, where a record is locked in the database when a user starts the booking process, preventing others from booking the same ticket simultaneously. This approach is straightforward and reliable, but it can lead to a poor user experience if the lock duration is too long, causing unnecessary wait times.

Example:

SELECT * FROM tickets WHERE ticket_id = 123 FOR UPDATE;

The FOR UPDATE clause locks the row with ticket_id 123 inside the current transaction; other transactions cannot modify it until that transaction commits or rolls back and the lock is released.

Another method is optimistic concurrency control, which lets multiple transactions proceed without locking and checks at update time whether another transaction has modified the record. Each record carries a version number, and an update succeeds only if the version still matches the one read earlier. This maintains data integrity even in high-concurrency environments.

Example:

-- T-SQL syntax (SQL Server)
-- Fetch the current version
DECLARE @CurrentVersion INT;
SELECT @CurrentVersion = Version FROM reservation WHERE id = 1;

-- Attempt the update, guarded by a version check
UPDATE reservation
SET status = 'unavailable', Version = @CurrentVersion + 1
WHERE id = 1 AND Version = @CurrentVersion;

-- No row updated means another transaction changed the record first
IF @@ROWCOUNT = 0
    RAISERROR ('Update failed due to concurrent modification', 16, 1);

The update marks the reservation as unavailable only if its version is unchanged since it was read. If no row is updated, another transaction got there first and the request fails.
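The same version-check pattern can be run end to end with Python's built-in `sqlite3` module. This is a minimal sketch: the `reservation` table mirrors the SQL example, while the helper names `read_version` and `try_reserve` and the in-memory database are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservation ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO reservation (id, status, version) VALUES (1, 'available', 0)")
conn.commit()


def read_version(connection: sqlite3.Connection, reservation_id: int) -> int:
    row = connection.execute(
        "SELECT version FROM reservation WHERE id = ?", (reservation_id,)
    ).fetchone()
    return row[0]


def try_reserve(connection: sqlite3.Connection, reservation_id: int,
                expected_version: int) -> bool:
    """Update only if nobody changed the row since expected_version was read."""
    cursor = connection.execute(
        "UPDATE reservation SET status = 'unavailable', version = version + 1 "
        "WHERE id = ? AND version = ?",
        (reservation_id, expected_version),
    )
    connection.commit()
    return cursor.rowcount == 1   # 0 rows means a concurrent modification won


# Two users read the same version before either one writes.
alice_version = read_version(conn, 1)
bob_version = read_version(conn, 1)
alice_ok = try_reserve(conn, 1, alice_version)   # first writer succeeds
bob_ok = try_reserve(conn, 1, bob_version)       # stale version, update matches no row
```

The losing request can be rejected outright or retried after re-reading the row, depending on the desired user experience.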

Lastly, we could use a distributed locking mechanism. A data store like Redis can help manage concurrency with more granular control and better performance than database-level locks.

However, since we only require locking within a microservice (booking, payment), we don't really need a distributed lock.
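Should locking ever need to span services, the Redis-style approach mentioned above usually relies on a "set if absent, with expiry" operation plus an owner token so that only the holder can release the lock. The sketch below imitates that behavior with an in-memory store so it runs without a server; `InMemoryLockStore` and its method names are hypothetical and are not a Redis client API.

```python
import uuid
from typing import Dict, Tuple


class InMemoryLockStore:
    """Stand-in for a key-value store offering set-if-absent with expiry."""

    def __init__(self) -> None:
        self._locks: Dict[str, Tuple[str, float]] = {}   # key -> (token, expires_at)

    def acquire(self, key: str, token: str, ttl: float, now: float) -> bool:
        current = self._locks.get(key)
        if current is not None and current[1] > now:
            return False                      # held by someone else and not expired
        self._locks[key] = (token, now + ttl)
        return True

    def release(self, key: str, token: str, now: float) -> bool:
        current = self._locks.get(key)
        if current is not None and current[0] == token and current[1] > now:
            del self._locks[key]              # only the current owner may release
            return True
        return False


store = InMemoryLockStore()
alice_token = uuid.uuid4().hex
bob_token = uuid.uuid4().hex

alice_acquired = store.acquire("seat:evt-1:A1", alice_token, ttl=10.0, now=0.0)
bob_blocked = store.acquire("seat:evt-1:A1", bob_token, ttl=10.0, now=1.0)
bob_after_expiry = store.acquire("seat:evt-1:A1", bob_token, ttl=10.0, now=11.0)
alice_late_release = store.release("seat:evt-1:A1", alice_token, now=12.0)
```

The expiry prevents a crashed holder from blocking a seat forever, and the token check prevents a late release from deleting a lock that has since passed to another user.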

How to Implement a Waitlist
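
One way to satisfy the waitlist requirement from the functional requirements is a per-event FIFO queue: a user who asks for a seat that is already reserved joins the queue, and when a reservation falls through, the first waiting user who can be fulfilled is offered the seat. The sketch below assumes a hypothetical `Waitlist` class and a caller-supplied `can_fulfill` predicate.

```python
from collections import defaultdict, deque
from typing import Callable, Optional


class Waitlist:
    """Per-event first-come, first-served waiting list."""

    def __init__(self) -> None:
        self._queues = defaultdict(deque)   # event_id -> waiting user_ids

    def join(self, event_id: str, user_id: str) -> None:
        if user_id not in self._queues[event_id]:   # avoid duplicate entries
            self._queues[event_id].append(user_id)

    def next_user(self, event_id: str,
                  can_fulfill: Callable[[str], bool] = lambda user_id: True
                  ) -> Optional[str]:
        """Remove and return the first waiting user who can be fulfilled."""
        queue = self._queues[event_id]
        for user_id in list(queue):
            if can_fulfill(user_id):
                queue.remove(user_id)
                return user_id
        return None


waitlist = Waitlist()
waitlist.join("evt-1", "bob")
waitlist.join("evt-1", "carol")

# A seat is released, but suppose bob cannot be fulfilled with it.
offered_first = waitlist.next_user("evt-1", can_fulfill=lambda user: user != "bob")
offered_second = waitlist.next_user("evt-1")
offered_third = waitlist.next_user("evt-1")
```

The seat-release task is a natural place to call `next_user`, so that a freed seat is offered to the waitlist before it returns to general availability.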

Grasping the building blocks ("the lego pieces")

This part of the guide will focus on the various components that are often used to construct a system (the building blocks), and the design templates that provide a framework for structuring these blocks.

Core Building blocks

At the bare minimum you should know the core building blocks of system design

  • Scaling stateless services with load balancing
  • Scaling database reads with replication and caching
  • Scaling database writes with partition (aka sharding)
  • Scaling data flow with message queues

System Design Template

With these building blocks, you will be able to apply our template to solve many system design problems. We will dive into the details in the Design Template section. Here’s a sneak peek:

[Figure: System Design Template]

Additional Building Blocks

Additionally, you will want to understand these concepts

  • Processing large amounts of data (aka “big data”) with batch and stream processing
    • Particularly useful for solving data-intensive problems such as designing an analytics app
  • Achieving consistency across services using distributed transactions or event sourcing
    • Particularly useful for solving problems that require strict transactions such as designing financial apps
  • Full text search: full-text index
  • Storing data for the long term: data warehousing

On top of these, there is ad hoc knowledge tailored to certain problems: for example, geohashing for designing location-based services like Yelp or Uber, or operational transformation for problems like designing Google Docs. You can learn these on a case-by-case basis. System design interviews are supposed to test your general design skills and not specific knowledge.

Working through problems and building solutions using the building blocks

Finally, we have a series of practical problems for you to work through. You can find the problems in /problems. This hands-on practice will not only help you apply the principles learned but will also enhance your understanding of how to use the building blocks to construct effective solutions. The list of questions keeps growing as we actively add more.
