Solution

System Design Netflix

Netflix, a leading streaming service, demands a robust system design that can handle millions of active users simultaneously.

Designing Netflix is similar to designing YouTube, with one key distinction: Netflix has a much smaller content library but much greater watch time per video. This gives it an opportunity to optimize video streaming at the lowest level with a hardware-based CDN called Open Connect, installed directly at ISPs.

Functional Requirements

  • Instantaneous video streaming upon selection.
  • A uniform and seamless viewing experience regardless of the number of users online or the time of day.
  • Accessibility across a wide variety of devices (smart TVs, smartphones, tablets, PCs) and different internet connection speeds.

Non-Functional Requirements

  • 100M Daily Active Users (DAU).
  • Each user watches 1 hour of video per day.
  • Data retention policy of 10 years, considering licensing agreements and the evergreen nature of content.
  • An average video file size of 500MB, reflecting the high-definition quality of the content.
  • Low latency for playback start and streaming.

Capacity Planning

Let’s begin with QPS estimates. With 100M DAU and 1 hour of watch time per user per day, and assuming 1 request per minute of playback, each user makes 60 requests per day. The average QPS would be:

(60 requests per user * 100,000,000 users) / 86,400 seconds/day ≈ 70,000 QPS

However, load spikes during peak hours, so the system should be designed to handle at least twice the average load, or around 140,000 QPS at peak.

Now, addressing storage needs: with an average video size of 500MB, if Netflix releases 1,000 new videos daily, the daily storage increase is:

500MB/video * 1,000 videos = 500,000MB or 500GB per day

Annually, this amounts to 182.5TB, and over a decade considering the data retention policy, we could be looking at nearly 2 Petabytes of new content, excluding existing content and replication for redundancy and reliability.

For a website like Netflix, the bottleneck is the sheer volume of video data being transferred. Let's calculate the throughput. Assuming 100MB per minute for 1080p video, we get

(100MB per minute * 60 minutes per day * 100,000,000 users) / 86,400 seconds/day ≈ 7,000,000MB per second = 7TB/s
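
The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the constant and function names are illustrative, the peak factor of 2 is the assumption stated earlier, and decimal units (1TB = 1,000,000MB) are assumed throughout.

```python
# Back-of-the-envelope capacity estimates using the figures stated above.
# Decimal units are assumed: 1 GB = 1,000 MB, 1 TB = 1,000,000 MB.

DAU = 100_000_000                # daily active users
REQUESTS_PER_USER_PER_DAY = 60   # 1 request per minute over 1 hour of viewing
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 2                  # assumed peak-to-average ratio

VIDEO_SIZE_MB = 500              # average video file size
NEW_VIDEOS_PER_DAY = 1_000       # assumed release rate
RETENTION_YEARS = 10

STREAM_MB_PER_MINUTE = 100       # assumed 1080p data rate
WATCH_MINUTES_PER_DAY = 60


def average_qps() -> float:
    """Average requests per second across the whole day."""
    return DAU * REQUESTS_PER_USER_PER_DAY / SECONDS_PER_DAY


def daily_storage_gb() -> float:
    """New storage added per day, in GB."""
    return VIDEO_SIZE_MB * NEW_VIDEOS_PER_DAY / 1_000


def retention_storage_pb() -> float:
    """New storage accumulated over the retention period, in PB."""
    return daily_storage_gb() * 365 * RETENTION_YEARS / 1_000_000


def average_throughput_tb_per_s() -> float:
    """Average egress in TB/s if viewing were spread evenly over the day."""
    mb_per_day = STREAM_MB_PER_MINUTE * WATCH_MINUTES_PER_DAY * DAU
    return mb_per_day / SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    print(f"Average QPS:        {average_qps():,.0f}")
    print(f"Peak QPS:           {average_qps() * PEAK_FACTOR:,.0f}")
    print(f"Storage per day:    {daily_storage_gb():,.0f} GB")
    print(f"Storage, 10 years:  {retention_storage_pb():.3f} PB")
    print(f"Average throughput: {average_throughput_tb_per_s():.2f} TB/s")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when an assumption changes, such as a different bitrate or release rate.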

High-Level Design

To grasp the system architecture of a video streaming service like Netflix, we rely on a high-level diagram that outlines the interaction between services.

Netflix System Design Diagram

The key components in the high-level architecture for video streaming include:

  • API Gateway: The entry point for all client requests, directing to appropriate services.
  • Video Playback Service: Manages video streaming logic and directs requests to video storage or CDN.
  • Object Storage: Stores the actual video files.
  • CDN (Content Delivery Network): Distributes video content to users to minimize latency.

Read Path: When a user requests a video, the request passes through the load balancer and API Gateway, which route it to the Video Playback Service. This service first checks its caching layers and, on a miss, queries the Video Metadata Storage for the video's URL. The video is then streamed from the nearest CDN node to the user's device, ensuring a seamless playback experience. Two components do most of the work on this path:

  1. The Content Delivery Network (CDN) is crucial for delivering cached videos from a location nearest to the user, significantly reducing latency and enhancing the viewing experience.
  2. The metadata databases are responsible for managing video titles, descriptions, and user interactions such as likes and comments. These databases are optimized to support high volumes of read operations efficiently.
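
The read path can be sketched as a cache lookup followed by a metadata query and a CDN URL. This is a simplified illustration, not Netflix's actual implementation: the in-memory dictionaries stand in for the cache and the Video Metadata Storage, and the function names, record fields, and host name are all hypothetical.

```python
# Sketch of the read path: cache first, then the metadata store, then a CDN URL.
# All names, fields, and URLs below are hypothetical.

METADATA_STORE = {  # stand-in for the Video Metadata Storage
    "1234567890": {
        "title": "Exciting Action Movie",
        "path": "/v/1234567890/manifest.mpd",
    },
}
metadata_cache = {}  # stand-in for the caching layer


def get_metadata(video_id):
    """Return metadata from the cache, falling back to the store on a miss."""
    if video_id in metadata_cache:
        return metadata_cache[video_id]
    record = METADATA_STORE.get(video_id)
    if record is None:
        raise KeyError(f"unknown video: {video_id}")
    metadata_cache[video_id] = record  # populate the cache for later reads
    return record


def resolve_playback_url(video_id, cdn_host):
    """Build the URL the client streams from, given the chosen CDN node."""
    return f"https://{cdn_host}{get_metadata(video_id)['path']}"
```

The playback service returns only a URL; the video bytes themselves never pass through it, which is what lets the CDN absorb the bulk of the traffic.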

The idea is simple, but the devil is in the details. Because of the extremely high throughput and low latency requirements we calculated above, traditional CDNs would not be cost-effective. Netflix has gone to great lengths here and developed its own custom-built servers, called Open Connect Appliances, designed for the single task of delivering streaming video content. These servers are deployed at the ISP level so that they sit as close to viewers as possible and provide the lowest latency. Open Connect is at the core of Netflix's low-latency streaming architecture.

Detailed Design

Now let's take a look at API and database design.

API Design

  • GET /videos - Retrieves a list of videos.
  • POST /videos - Uploads a new video.
  • GET /videos/{videoId}/play - Streams a video.
  • POST /users/{userId}/history - Updates user's viewing history.

For each API endpoint, having well-defined inputs and outputs ensures clarity and efficiency in service integration. See the sample API requests and responses below:

// Sample API Request to GET /videos
{
  "method": "GET",
  "url": "/videos",
  "query": { "genre": "action", "language": "en" }
}

// Sample API Response
{
  "status": 200,
  "data": [
    {
      "id": "1234567890",
      "title": "Exciting Action Movie",
      "description": "An action-packed journey."
    },
    ...
  ]
}
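
A handler behind GET /videos might look like the following sketch. It filters an in-memory catalog by the query parameters and returns a response in the same shape as the sample. The catalog contents, the second entry, and the function name are hypothetical; a real service would query the metadata store instead.

```python
# Hypothetical in-memory catalog standing in for the metadata store.
CATALOG = [
    {
        "id": "1234567890",
        "title": "Exciting Action Movie",
        "description": "An action-packed journey.",
        "genre": "action",
        "language": "en",
    },
    {
        "id": "2345678901",
        "title": "Quiet Drama",
        "description": "A slow-burning story.",
        "genre": "drama",
        "language": "en",
    },
]


def list_videos(query):
    """Return videos whose attributes match every key/value pair in the query."""
    matches = [
        {"id": v["id"], "title": v["title"], "description": v["description"]}
        for v in CATALOG
        if all(v.get(key) == value for key, value in query.items())
    ]
    return {"status": 200, "data": matches}
```

Calling list_videos({"genre": "action", "language": "en"}) returns a response containing only the action title.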

Database Schema

Netflix requires a database schema adept at handling diverse and vast video streaming and metadata storage needs. Below is a visual representation of a simplified database schema:

+----------------+     +--------------------+     +----------------+
| Videos         |     | UserViewingHistory |     | Users          |
|----------------|     |--------------------|     |----------------|
| videoId PK     | <-+ | historyId PK       |     | userId PK      |
| title          |   | | userId FK          | <-> | username       |
| description    |   | | videoId FK         |     | password_hash  |
+----------------+   | | timestamp          |     +----------------+
                     | +--------------------+
                     |
                     + +---------------------+
                       | Categories          |
                       |---------------------|
                       | categoryId PK       |
                       | name                |
                       +---------------------+


The primary tables involved are:

  • Videos: Stores video metadata keyed by a unique identifier (videoId), including titles and descriptions. The video files themselves live in object storage.
  • Users: Holds user-specific information, including a unique user ID (userId) used in other tables for relationship mapping.
  • UserViewingHistory: Keeps records of users’ viewing history, associated with both the Videos and Users tables through foreign keys.
  • Categories: Manages video categorization which helps in filtering and recommendations.
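
The tables above can be expressed as a small relational schema. The sketch below uses Python's built-in sqlite3 module purely for illustration; the column types, the categoryId column linking Videos to Categories, and the sample rows are assumptions, since the diagram does not specify them.

```python
import sqlite3

# Column types and the Videos.categoryId link are assumptions for illustration.
SCHEMA = """
CREATE TABLE Categories (
    categoryId INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE Videos (
    videoId     INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    categoryId  INTEGER REFERENCES Categories(categoryId)
);
CREATE TABLE Users (
    userId        INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE UserViewingHistory (
    historyId INTEGER PRIMARY KEY,
    userId    INTEGER NOT NULL REFERENCES Users(userId),
    videoId   INTEGER NOT NULL REFERENCES Videos(videoId),
    timestamp TEXT NOT NULL
);
"""


def create_schema(conn):
    """Create the four tables with foreign-key enforcement switched on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def viewing_history(conn, user_id):
    """Return (username, title, timestamp) rows for one user, newest first."""
    return conn.execute(
        """
        SELECT u.username, v.title, h.timestamp
        FROM UserViewingHistory AS h
        JOIN Users  AS u ON u.userId  = h.userId
        JOIN Videos AS v ON v.videoId = h.videoId
        WHERE h.userId = ?
        ORDER BY h.timestamp DESC
        """,
        (user_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO Categories VALUES (1, 'action')")
conn.execute(
    "INSERT INTO Videos VALUES (1, 'Exciting Action Movie', 'An action-packed journey.', 1)"
)
conn.execute("INSERT INTO Users VALUES (1, 'alice', 'not-a-real-hash')")
conn.execute("INSERT INTO UserViewingHistory VALUES (1, 1, 1, '2024-01-01T20:00:00Z')")
```

At Netflix's scale this data would be spread across a distributed store rather than a single relational database, but the relationships between the tables stay the same.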

Data Partitioning and Distribution Strategies

Netflix’s database design relies heavily on effective partitioning and shard key selection. For partitioning, particularly in NoSQL databases like Cassandra, two strategies apply:

  • List-based partitioning: Video metadata might be partitioned based on attributes such as genre or director, making it easier to fetch all videos within these categories.
  • Date and time-based partitioning: UserViewingHistory could be partitioned by timestamp, allowing efficient queries over viewing periods.
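
Both strategies come down to a function that maps a record to a partition. The sketch below is one possible mapping, with hypothetical partition names and an assumed monthly granularity for the time-based case; a production shard key would also need to account for hot partitions.

```python
from datetime import datetime, timezone

# Hypothetical list-based mapping from genre to partition name.
GENRE_PARTITIONS = {"action": "p_action", "drama": "p_drama"}
DEFAULT_PARTITION = "p_other"


def genre_partition(genre):
    """List-based partitioning: each listed genre maps to its own partition."""
    return GENRE_PARTITIONS.get(genre.lower(), DEFAULT_PARTITION)


def history_partition(watched_at):
    """Time-based partitioning: one partition per calendar month (UTC).

    Expects a timezone-aware datetime.
    """
    utc = watched_at.astimezone(timezone.utc)
    return f"history_{utc:%Y_%m}"
```

With monthly buckets, a query for recent viewing history touches only the latest one or two partitions instead of scanning the full table.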

Deep Dives

The following deep dives look at the technical areas that make a video streaming platform like Netflix work: content delivery, the streaming pipeline, geo-blocking, and recommendations.

Content Delivery Network (CDN) Integration

A CDN is the backbone of video streaming services, designed to deliver content with reduced latency and ensure content is readily available worldwide. This network of distributed servers serves video content from the closest geographical location to the user, which is fundamental in facilitating a high-definition viewing experience with minimal delay.

Integrating a CDN with a video streaming platform involves:

  1. Choosing a CDN provider that aligns with the platform's scale and global reach.
  2. Establishing secure connections between the video storage solutions and the CDN.
  3. Configuring the CDN to cache video content dynamically based on viewership and bandwidth.
  4. Implementing geo-replication to serve content to users from the nearest data center.
  5. Continually monitoring CDN performance and making necessary adjustments for optimization.
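
Step 3 is essentially a cache eviction policy running on each edge server. The sketch below uses least-recently-used (LRU) eviction as one simple stand-in for viewership-based caching; it is not a description of any particular CDN's policy, and the class and callback names are hypothetical.

```python
from collections import OrderedDict


class EdgeCache:
    """A fixed-capacity edge cache that evicts the least recently used segment."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin  # called on a cache miss
        self._segments = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Return a segment, fetching from the origin and caching it on a miss."""
        if key in self._segments:
            self._segments.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._segments[key]
        self.misses += 1
        data = self.fetch_from_origin(key)
        self._segments[key] = data
        if len(self._segments) > self.capacity:
            self._segments.popitem(last=False)  # evict least recently used
        return data
```

The hit and miss counters support step 5: the hit ratio is one of the main signals for judging whether an edge node is sized and populated well.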

Netflix's Open Connect CDN

Netflix's proprietary CDN, Open Connect, is a specialized solution tailored to optimize its streaming delivery. Running its own CDN gives Netflix direct control over how content is stored and transferred. Open Connect appliances are deployed with internet service providers worldwide, placing content as close as possible to the end user.

Key implementations and innovations in Open Connect:

  • Hardware Design: Customized storage and network equipment tailored for high-volume streaming.
  • Network Architecture: A system built to minimize routing distance and enhance content delivery speed.
  • Peering Strategies: Directly interconnecting with ISPs to reduce latency and improve quality of service.

Video Streaming Essentials

The core technology stack for video streaming comprises protocols and codecs critical to its operation:

  • Adaptive Streaming Protocols: HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP).
  • Codecs: Efficient compression and decompression methods like H.264 and HEVC.

From encoding to playback, the sequence includes:

  1. Source video content is encoded into compatible streaming formats and codecs.
  2. Encoded videos are segmented and packaged for adaptive streaming.
  3. Packaged content is uploaded to video storage and distributed through the CDN.
  4. Upon a playback request, the client's device selects the appropriate quality based on connectivity.
  5. The device streams the video from the CDN, with the client player switching bitrate as network conditions change.
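
On the client side, quality selection can be sketched as picking the highest rendition that fits within the measured bandwidth. The bitrate ladder and the 0.8 safety margin below are illustrative assumptions; real players also take buffer level and recent throughput history into account.

```python
# Hypothetical bitrate ladder in kbps, one entry per encoded rendition.
BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]


def select_bitrate(measured_kbps, ladder=BITRATE_LADDER_KBPS, safety=0.8):
    """Pick the highest rendition that fits within a fraction of measured bandwidth.

    Falls back to the lowest rendition when even that exceeds the budget.
    """
    budget = measured_kbps * safety
    candidates = [bitrate for bitrate in ladder if bitrate <= budget]
    return max(candidates) if candidates else min(ladder)
```

The player would call select_bitrate before requesting each segment, so a drop in bandwidth lowers the quality of the next segment rather than stalling playback.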

Geo-Blocking and Content Availability

Geo-blocking serves the legal requirement of content licensing that varies across countries. It ensures providers like Netflix show content only where they have distribution rights, thus respecting regional copyright laws.

Mechanisms for implementing geo-blocking include:

  • IP Address Filtering: Determining user location based on IP address to allow or block content.
  • DNS Redirection: Redirecting DNS requests to serve different content based on geographic location.
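
IP address filtering can be sketched with Python's standard ipaddress module. The IP-to-country table below uses documentation-only address ranges and the licensing table is made up; a real system would rely on a commercial geolocation database and the actual rights data.

```python
import ipaddress

# Hypothetical IP-to-country table using documentation-only ranges (RFC 5737).
COUNTRY_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}

# Hypothetical licensing table: video ID -> countries where playback is allowed.
LICENSED_REGIONS = {"1234567890": {"US"}}


def country_for_ip(ip):
    """Return the country code for an IP address, or None if it is unknown."""
    address = ipaddress.ip_address(ip)
    for network, country in COUNTRY_BY_NETWORK.items():
        if address in network:
            return country
    return None


def is_playable(video_id, ip):
    """Allow playback only when the viewer's country is licensed for the video."""
    country = country_for_ip(ip)
    return country is not None and country in LICENSED_REGIONS.get(video_id, set())
```

This sketch denies playback when the location cannot be determined, which is the conservative choice when licensing is a legal obligation.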

How to Build a Recommender System

At the heart of Netflix's user experience lies its sophisticated recommender system, a pivotal factor in driving user engagement and satisfaction. This complex algorithm sifts through immense datasets to present viewers with content tailored to their preferences, thereby increasing the likelihood of longer viewing sessions and sustained subscriptions. The recommender system in a video streaming platform involves:

  • Understanding user preferences and viewing history.
  • Utilizing this data to suggest personalized content relevant to the user.

Algorithms and data inputs for personalizing content recommendations:

  • Collaborative Filtering: Makes predictions based on the viewing patterns of similar users.
  • Content-Based Filtering: Suggests content similar to what a user has liked in the past.
  • Machine Learning Techniques: Employs complex models to predict and refine recommendation accuracy.

The recommender system draws on a variety of data points:

  • Viewing History: Tracks what users have watched to predict future preferences.
  • Search Queries: Uses entered search terms to understand user interests more deeply.
  • Ratings and Feedback: Incorporates users' ratings and feedback to refine recommendations.
  • Watch Times and Habits: Observes when and how users watch content for contextual recommendations.
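
Collaborative filtering can be illustrated with a small user-based example: find users with similar ratings, then score the titles they rated that the target user has not seen. The ratings data and function names are made up, and this is a toy sketch rather than a description of Netflix's production models.

```python
from math import sqrt

# Hypothetical ratings: user -> {title: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 5, "C": 4},
    "carol": {"D": 5},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings=RATINGS, top_n=3):
    """Rank unseen titles by similarity-weighted ratings from other users."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        weight = cosine(seen, other_ratings)
        if weight == 0.0:
            continue  # no overlap, so this user carries no signal
        for item, rating in other_ratings.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Here alice and bob agree on titles A and B, so bob's rating of C becomes a recommendation for alice, while carol shares no titles with anyone and contributes nothing.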

Netflix's Tech Stack

Now that we have a high-level understanding of Netflix's architecture, let's take a look at the specific technologies used by Netflix.

Grasping the building blocks ("the lego pieces")

This part of the guide will focus on the various components that are often used to construct a system (the building blocks), and the design templates that provide a framework for structuring these blocks.

Core Building blocks

At a bare minimum, you should know the core building blocks of system design:

  • Scaling stateless services with load balancing
  • Scaling database reads with replication and caching
  • Scaling database writes with partitioning (aka sharding)
  • Scaling data flow with message queues

System Design Template

With these building blocks, you will be able to apply our template to solve many system design problems. We will dive into the details in the Design Template section. Here’s a sneak peek:

System Design Template

Additional Building Blocks

Additionally, you will want to understand these concepts:

  • Processing large amounts of data (aka “big data”) with batch and stream processing
    • Particularly useful for solving data-intensive problems such as designing an analytics app
  • Achieving consistency across services using distributed transactions or event sourcing
    • Particularly useful for solving problems that require strict transactions such as designing financial apps
  • Full text search: full-text index
  • Storing data for the long term: data warehousing

On top of these, there is ad hoc knowledge tailored to certain problems: for example, geohashing for designing location-based services like Yelp or Uber, or operational transformation for problems like designing Google Docs. You can learn these on a case-by-case basis. System design interviews are supposed to test your general design skills, not specific knowledge.

Working through problems and building solutions using the building blocks

Finally, we have a series of practical problems for you to work through. You can find the problems in /problems. This hands-on practice will not only help you apply the principles learned but will also enhance your understanding of how to use the building blocks to construct effective solutions. The list of questions is growing; we are actively adding more.
