Solution

Dropbox System Design

Resource Estimation

Queries per second (QPS)

Assuming each sync involves a read and a write operation, let's calculate the number of operations per second. We note that there are 86,400 seconds in a day.

Using the resource estimator, we get the following results:

Design Dropbox Resource Estimation

High Level Design

How Dropbox Works

Before getting into the details of high level design, let's get the background knowledge of how files are stored in a cloud storage service like Dropbox.

File Chunking

In the beginning, one might think that the simplest way to handle file uploads and downloads in a cloud storage system would be to treat each file as a singular entity. Let's call this the naïve approach. In this scenario, whenever a user wants to upload a file, the entire file would be sent to the server. Similarly, when downloading, the entire file is fetched all at once. At first glance, this seems straightforward and uncomplicated.

However, problems arise when we delve deeper into real-world scenarios. Imagine a professional video editor working on a 10GB video file. After making a tiny edit, such as adjusting the color on a single frame, they need to save and backup their work. In the naïve approach, this minuscule change would necessitate re-uploading the entire 10GB file. Such a method is clearly inefficient, consuming unnecessary bandwidth and time.

This is where the chunk-based approach comes into play. Instead of treating a file as one large block, it's divided into smaller, manageable pieces or chunks. Let's visualize this with the video editor's dilemma. Using the chunk-based approach, the 10GB video might be split into 1000 chunks of 10MB each. When the editor makes that small change, perhaps only one or two of these chunks get altered. Now, instead of re-uploading 10GB, only 20MB needs to be transmitted. The savings in time and bandwidth are evident.

But the benefits don't end there. The chunking strategy has multiple advantages:

  • Resilience: If our video editor had an unstable internet connection and the upload was interrupted, the naïve method would potentially start over, causing frustration. In contrast, with chunking, only the interrupted chunks need to be retransmitted. This resilience becomes a boon in unreliable network conditions.
  • Deduplication: Over time, our editor might have multiple projects with some shared footage. Rather than storing duplicate data, the system can recognize identical chunks and store them just once. This not only saves storage space but also upload time.
  • Parallelism: Uploading or downloading several chunks simultaneously can maximize bandwidth utilization. For our editor, this means quicker backup and retrieval times, especially vital when deadlines are looming.
  • Streaming: If our editor wanted to preview a clip from the cloud, chunking allows them to stream just the part they need. They don't have to wait for the entire file; they can start viewing as soon as the relevant chunks are loaded.
  • Security: Each chunk can be encrypted individually. If there were a security breach and a chunk's encryption was compromised, the entire file wouldn't necessarily be vulnerable.

This progressive shift from the naïve to the chunk-based approach showcases a classic evolution in system design. What starts as a straightforward solution soon reveals its shortcomings under real-world pressures. By addressing these issues with a more sophisticated method, like chunking, systems can provide improved performance, efficiency, and user experience.

How Files are Stored on Dropbox

Now that we have a good idea how file chunking works. Let's take a look at how Dropbox divides and stores a file.

Dropbox System Design File Chunking, Block and Metadata Server

Dividing the File into 4MB Blocks

Imagine you've just completed a high-definition video project that's about 40MB in size. When you decide to back up this video to the cloud, the system doesn't just take the video as one large data blob. Instead, it intricately slices this file into ten distinct blocks, each 4MB in size. This segmented approach is the first step in ensuring an efficient storage and retrieval process.

Hashing Each Block for Uniqueness

Now, with ten blocks ready, the system must ensure that each block can be uniquely identified. For this purpose, it utilizes a cryptographic hashing function, such as SHA-256. This function takes in the data from each 4MB block (a rather arbituary choice according to Dropbox team's tech talk) and outputs a unique hash value. Even a minor change in the block's data would result in a drastically different hash, ensuring that each block's content can be reliably identified and verified.

Storing Blocks on the Block Server

With each of the ten blocks hashed, they're ready to be sent to the block server. Here, the previously calculated hash serves a dual purpose: it acts as a unique identifier (like a fingerprint) and a lookup key. The block server, in its vast digital vault, stores each block in a manner reminiscent of a key-value storage system. The hash (key) provides a direct path to the block's data (value). This methodology ensures swift retrieval and also offers benefits like deduplication. For instance, if another user uploads a file with a block identical to one already in the system, there's no need to store that block again. The system can simply reference the existing block using its hash.

Metadata Management and Its Storage

While the block server is busy safeguarding the data blocks, another equally crucial process unfolds: the organization and storage of metadata. Metadata, in this context, is the information about the file that doesn't include its actual content. This encompasses the file's name, its path, and a specially designated 'namespace'. A namespace is a unique identifier, often used to differentiate files from different users or different directories.

As the video file's blocks are stored, the system crafts a comprehensive metadata record. This record, beyond the file's name and path, includes an ordered list of the blocks (or their hashes) that constitute the complete file. This list is pivotal, as it maps out the sequence to reconstruct the original file from its constituent blocks.

This metadata is then sent to its dedicated storage space, often managed by a specialized metadata server. This server acts as the system's directory or index, guiding it on how to piece together files from the blocks when necessary.

To encapsulate, this orchestrated dance between dividing files, hashing blocks, managing two distinct servers, and meticulously tracking metadata exemplifies the depth of sophistication present in modern cloud storage systems. It showcases the lengths to which technology goes to ensure efficient, reliable, and organized storage for users.

Dropbox Design Diagram

Dropbox System Design Diagram Write Path Uploading a New File (Write Path)

When a user adds a new file to their Dropbox folder:

  1. File Preparation: The Dropbox client on the user's device divides the file into 4MB blocks, calculating a hash for each block. These blocks are then transmitted to the Block Service.
  2. Block Storage: The Block Service stores these blocks, using the block hashes as reference keys, in its Block Storage system. 3, 4. Metadata Transmission: The client sends the file's metadata, which includes information such as namespace, path, and the list of blocks, to the Metadata Service via its load balancer. 5, 6. Metadata Storage: The chosen Metadata Server then writes these details into the Metadata Database and simultaneously caches this information for quick access.
  3. Notification: After recording the metadata, the Metadata Service alerts the Notification Service about the new file.
  4. Syncing Alert: The Notification Service informs other client installations about the new file addition, prompting them to synchronize.

Dropbox System Design Diagram Write Path

Downloading the New File (Read Path)

When another device attempts to sync the new file from Dropbox:

  1. Notification Receipt: The client on the user's secondary device receives the update from the Notification Service about the newly added file and initiates the syncing process. 2, 3, 4, 5. Metadata Retrieval: The client communicates with the Metadata Service's load balancer, which routes the request to an appropriate Metadata Server. This server checks its cache for the required metadata. If unavailable, it fetches the data from the Metadata Database.
  2. Block Request: Armed with the metadata, the client sends a request to the Block Service to retrieve the associated file blocks.
  3. Block Retrieval: The Block Service fetches the necessary blocks, identified by their hashes, and sends them to the client.
  4. File Reconstruction: Upon receipt, the client assembles these blocks to reconstruct the original file and saves it to the device's storage.

API Design

Now that we have a good idea how the system works, let's flesh out the API endpoints.

Metadata Server

/save_metadata

  • Request Type: POST
  • Request Parameters: None

Request JSON:

{ "filename": "example.txt", "path": "/user/directory/example.txt", "block_hashes": ["hash1", "hash2", ...] }

Response JSON:

{ "status": "success", // or "error" "message": "Metadata saved successfully." // or error message }

/get_metadata

  • Request Type: GET
  • Request Parameters: path (e.g., /get_metadata?path=/user/directory/example.txt)

Response JSON:

{ "filename": "example.txt", "path": "/user/directory/example.txt", "block_hashes": ["hash1", "hash2", ...] }

Block Server

/upload/<block_hash>

  • Request Type: POST
  • Request Parameters: block_hash (e.g., hash1)

Request Data: Binary data of the block

Response JSON:

{ "status": "success", // or "error" "message": "Block uploaded successfully." // or error message }

/download/<block_hash>

  • Request Type: GET
  • Path Parameters: block_hash (e.g., hash1)

Response: Binary data of the block

Error Response Headers:

  • Status Code: 404

Error Response JSON:

{ "status": "error", "message": "Block not found." }

Notification Server

/long_poll

  • Request Type: GET
  • Request Parameters: None

Response: Connection kept open until a change occurs.

Response JSON (upon change):

{ "status": "change_detected", "message": "A change has been detected.", "changed_files": [ { "filename": "example.txt", "path": "/user/directory/example.txt" } // ... potentially other changed files ] }

Detailed Design

Database

The core function of the Metadata Database in Dropbox is to store essential details about the files without actually storing the file content itself. One of the core requirements is maintaining strong ACID properties and consistency across devices. We a Relational Database Management System (RDBMS) structure to manage and query the metadata generated by users.

File Metadata Table:

This table tracks individual files and their metadata.

Columns:

  • FileID: Unique identifier for each file.
  • NamespaceID: Identifier for the user or shared folder.
  • RelativePath: Path of the file relative to the Dropbox root.
  • BlockList: List of SHA-256 hashes representing the chunks of the file.
  • LatestJournalID: Reference to the latest version of the file.

Journal Table:

This table tracks file changes over time. Each entry represents a version of the file.

Columns:

  • JournalID: Unique identifier for each version of the file.
  • FileID: Reference to the file in the File Metadata Table.
  • Timestamp: Time when this version was created.
  • ChangeType: Type of change (e.g., modification, deletion, addition).
  • ChangedBlockList: List of blocks that were changed in this version.

Users & Permissions Table:

This table identifies users, their files, and their permissions.

Columns:

  • UserID: Unique identifier for each user.
  • NamespaceID: Reference to the Namespace ID in the File Metadata Table.
  • UserDetails: Email, Name, etc.
  • Permissions: Type of permission on shared files (e.g., read, write, admin).

Relations:

  • File Metadata <-> Journal:

    • One-to-many relation. One file can have multiple versions, each represented in the Journal table. The LatestJournalID in the File Metadata Table points to the latest version in the Journal table.
  • Users & Permissions <-> File Metadata:

    • One-to-many relation. One user can have multiple files, and each file's NamespaceID references back to its user.

To track file changes:

  1. Addition: Add a new file entry in the File Metadata Table and an associated initial version in the Journal Table.
  2. Modification: Add a new version to the Journal Table, detailing the changes.
  3. Deletion: Add a new version with the change type "deletion" to the Journal Table.

Devices can compare the LatestJournalID of files with the one in the File Metadata Table during syncing. If they're different, the system knows updates have occurred. The Journal table helps identify changes, aiding synchronization.

Notification Service

The Notification Service continuously monitors for modifications across Dropbox accounts. Its primary purpose isn't to process file data or metadata; instead, its responsibility lies in notifying relevant clients about any changes made.

To facilitate this, each Dropbox client sets up what is termed a "long poll connection" to the Notification Service. In essence, the client establishes this connection and patiently waits, effectively asking, "Has there been any change?". Instead of the Notification Service actively sending a message to inform the client of a change, it simply closes the long poll connection when a modification is detected. This action serves as a trigger for the client, indicating that something has changed. As a result, the client knows it needs to re-establish a secure connection to the Metadata Servers to synchronize and reflect the most recent changes.

This approach might lead one to question why Dropbox chose long poll connections over other technologies. Let's delve into a comparison:

  • Long Poll Connection: Here, the client sends a request to the server and waits. If there's a change while the connection is open, the server responds, effectively ending the request. The client then processes this response and sends another request, waiting again. This method offers a simple and effective way to detect changes without constant server-client chatter, making it efficient for systems like Dropbox.
  • Server-Sent Events (SSE): SSE is a technology where the server can push information to a web client over an HTTP connection, without the client needing to request it.
  • WebSocket: WebSocket is a protocol that establishes a two-way communication channel over a single, long-lived connection. It allows both the server and client to send messages at any time. While it offers more flexibility than long poll connections, it might be overkill for Dropbox's needs. The added complexity of maintaining a two-way communication might not provide significant advantages for simply notifying of changes.

Notification Service in Dropbox uses long poll connections as a strategic choice to balance efficiency and simplicity.

How to Store Blocks More Efficiently

When dealing with data storage, especially in cloud environments like Amazon's S3, costs can quickly accumulate. This isn't just about the monetary costs of storing data, but also the bandwidth consumed when transferring files, a concern that becomes particularly pronounced when considering mobile devices with limited data plans.

Imagine a scenario where multiple users are uploading files to a platform. Now, it's not too far-fetched to consider that some of these files might be identical or, at the very least, share significant portions of content. For instance, two users might upload the same popular software installer, or perhaps a widely shared document or media file. If the platform were to store each of these files without any checks for redundancy, it would be wasting precious storage space.

Since we are already breaking the files into 4MB blocks, we can identify and eliminate redundancies at the block level rather than the file level. This process, known as deduplication, ensures that even if two files are slightly different, the blocks of data that are identical aren't stored multiple times. Since each block of data is hashed and these hashes are used as unique identifiers or keys, the process becomes streamlined. If two blocks produce the same hash, they're identical. Instead of storing both, we can simply store one and reference it for both files. In this way, by comparing hashes, we can efficiently deduplicate data, optimizing storage costs and ensuring efficient utilization of bandwidth during transfers.

Note that for the cloud storage, Dropbox initially used AWS's S3 and later on moved to build their own solution called "Magic Pocket"

How to Track Changes More Efficiently

In the design we've discussed thus far, files are segmented into 4MB blocks. This approach is straightforward and works effectively for many scenarios. However, a challenge arises when there's a modification in the middle of a file, such as an insertion or a deletion. Such a change could potentially shift the boundaries of all subsequent blocks, causing them to be marked as "changed" even if their actual content remains the same. Recognizing all these blocks as changes would be quite inefficient, especially when considering the associated costs of storage, bandwidth, and processing time.

This is where the concept of "delta sync" proves its worth. Delta sync is a method that identifies and transmits only the parts of a file that have changed, instead of the entire file. It's particularly beneficial when dealing with large files, where small modifications could result in significant unnecessary data transfers under a simple block comparison method.

To make delta sync work, the process employs various algorithms and techniques. For instance, using a rolling checksum or rolling hash enables efficient identification of changed sections within a file without rehashing the entire block. When a file is modified, this method allows for the computation of hash values in a manner that's sensitive to the actual content changes, rather than just block shifts.

Furthermore, delta sync employs methods like variable-sized chunking, which doesn't rely on fixed block sizes. Instead, it sets chunk boundaries based on the content of the file. Techniques like the Rabin-Karp algorithm are frequently used in this context. As a result, when content is added or removed from the file, only the chunks adjacent to the change are affected, leaving distant chunks untouched.

So, how does delta sync handle file modifications specifically? When a file is changed, delta sync analyzes both the old and new versions of the file. It then identifies the "deltas" or differences between the two versions and syncs only these deltas. This is accomplished by matching chunks from the modified file to chunks in the original file. Once a mismatch is found, delta sync looks for the next matching chunk. The data between the two matching chunks is what has been modified. By syncing only this modified data, delta sync ensures efficient use of storage and bandwidth, making the entire synchronization process faster and more resource-friendly.

After device D1 completes the upload of file f, and device D2 remains offline, how does device D2 determine which files to synchronize once it regains online connectivity?

Once D2 comes online, it firsts sends a request to the Metadata Service to ask if there is any new changes. It sends the last timestamp it was online. Metadata Service queries the Journal table to find the changes and the File Metadata table to find block lists for D2 to fetch. D2 then downloads the blocks using the same flow as described in the "Read Path" section. a

References

Under the hood: Architecture overview by Dropbox

Inside the Magic Pocket

How Dropbox Handles File Chunking and Streaming Synchronization