
Dropbox Clone

We want to design a file hosting service like Dropbox or Google Drive.

Requirements and goals of the system

  1. Users can upload and download their files/photos from any device
  2. Users can share files or folders with other users
  3. Automatic sync between devices: updates made on one device should be reflected on all others
  4. Support for storing files up to a GB in size
  5. ACID properties should be guaranteed for all operations
  6. Offline editing: users can add/delete/modify files offline, and the changes should be synced once they come back online
  7. Snapshotting of data, so users can retrieve older versions of their files

Design considerations

  • Expect a high volume of reads and writes, in roughly a 1:1 ratio
  • We can store files in smaller chunks (say 4 MB) to improve performance on failure (retrieve only the missing chunks when an operation fails)
  • We can reduce data exchange by transferring only the updated chunks
  • By removing duplicate chunks we can save storage space and bandwidth usage
  • Keeping a local copy of the metadata (file name, size, etc.) on the client can save round trips to the server
  • For small changes, the client can just upload the diffs

Capacity Estimation and Constraints

  • Assume 500M total users and 100M daily active users (DAU)
  • Assume that, on average, each user connects from three devices (PC, tablet, phone)
  • Assume each user has ~200 files/photos => 500M * 200 = 100 billion files
  • Assume an average file size of 100 KB => 100 billion * 100 KB = 10 PB of total storage
  • Finally, assume 1M active connections per minute
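
As a sanity check, here is the arithmetic behind these estimates in a few lines of Python:

```python
users = 500_000_000                 # total users
files_per_user = 200
avg_file_size_bytes = 100 * 1_000   # 100 KB

total_files = users * files_per_user             # 100 billion files
total_bytes = total_files * avg_file_size_bytes

print(f"{total_files:,} files")        # 100,000,000,000 files
print(f"{total_bytes / 1e15:.0f} PB")  # 10 PB
```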

High Level Design

The user will specify a folder as the workspace on their device. Any file/photo/folder placed in this folder will be uploaded to the cloud, and whenever a file is modified or deleted, the change will be reflected in cloud storage the same way.

The user can specify similar workspaces on all their devices, and modifications are propagated across the same workspace on every device.

At a high level, we need to store files and their metadata like file name, file size, directory, etc. and who this file is shared with.

We need servers to store files and servers to store metadata. We also need a mechanism to notify clients whenever an update happens so they can synchronize.

[Figure: High-level design of the Dropbox clone]

Component Design

Client

  • The client application monitors the workspace folder on the user's machine and syncs all files/folders in it with the remote cloud storage.
  • The client uses storage servers to upload, download, and modify files in the backend Cloud Storage.
  • Client interacts with remote synchronization service to handle any file metadata updates (change file name, file size, modification date, etc.).

Essential operations

  • Upload and download files
  • Detect file changes in the workspace folder
  • Handle conflicts due to offline or concurrent updates

How to efficiently handle file transfer

We can break each file into smaller chunks so that we transfer only the chunks that were modified rather than the whole file. If we keep track of the chunks that constitute a file in its metadata, we can also retransmit only the missing chunks if something goes wrong during an upload/download. We can tune the chunk size based on bandwidth, storage, operations per second, etc.
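
A minimal chunker sketch, assuming the fixed-size 4 MB chunks and SHA-256 chunk hashes discussed in this design:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB

def chunk_file(path: str):
    """Yield (index, sha256_hex, data) for each fixed-size chunk of a file."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            yield index, hashlib.sha256(data).hexdigest(), data
            index += 1
```

Comparing the per-chunk hashes of the local and remote versions of a file tells us exactly which chunks to retransmit.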

Copy of metadata in the client

Keeping a local copy of the metadata enables offline updates and reduces the number of round trips needed to read or update the remote metadata.

Efficiently listen to changes within other clients

Clients could periodically check with the server for changes, but this delays responsiveness compared to the server notifying the clients, and if a client checks too often it also wastes bandwidth on empty responses. This approach does not scale.

We can instead use HTTP long polling: clients poll with the expectation that the server may not respond immediately; the server holds the request open and generates a response only when there is a change.
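
A minimal client-side long-polling loop; the endpoint, parameters, and response format are assumptions for illustration, not a defined API:

```python
import requests

POLL_URL = "https://sync.example.com/changes"  # hypothetical endpoint

def long_poll(client_id: str, last_seen_version: int) -> None:
    while True:
        try:
            # The server holds the request open and responds only when there
            # is a change newer than last_seen_version (or its hold time ends).
            resp = requests.get(
                POLL_URL,
                params={"client": client_id, "since": last_seen_version},
                timeout=65,  # slightly longer than the server's hold time
            )
            if resp.status_code == 200:
                for change in resp.json()["changes"]:
                    last_seen_version = max(last_seen_version, change["version"])
                    apply_change(change)  # hypothetical local handler
        except requests.Timeout:
            continue  # no change during this window; reconnect and wait again
```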

Based on these considerations, we can divide the client into the following four parts:

  • Internal Metadata Database: Keeps track of all files, chunks, their versions, and their location in the file system.
  • Chunker: Splits files into smaller pieces and reconstructs a file from its chunks. Also detects the parts of a file that have been modified.
  • Watcher: Monitors the local workspace folder and notifies the Indexer of any action performed by the user (create, delete, update). It also listens for changes broadcast by the Synchronization Service (see the sketch after this list).
  • Indexer: Processes Watcher events and updates the internal metadata DB. Once chunks are uploaded to or downloaded from Cloud Storage, the Indexer interacts with the Synchronization Service to broadcast changes to other clients and update the remote metadata database.
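
A sketch of the Watcher using the third-party watchdog package (an assumption; any file-system notification API would do), with a placeholder Indexer:

```python
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class Indexer:
    """Placeholder: the real Indexer updates the internal metadata DB."""
    def handle(self, action: str, path: str) -> None:
        print(action, path)

class WorkspaceWatcher(FileSystemEventHandler):
    """Forwards local file events (create/update/delete) to the Indexer."""
    def __init__(self, indexer: Indexer):
        self.indexer = indexer

    def on_created(self, event):
        if not event.is_directory:
            self.indexer.handle("create", event.src_path)

    def on_modified(self, event):
        if not event.is_directory:
            self.indexer.handle("update", event.src_path)

    def on_deleted(self, event):
        if not event.is_directory:
            self.indexer.handle("delete", event.src_path)

observer = Observer()
observer.schedule(WorkspaceWatcher(Indexer()), "/path/to/workspace", recursive=True)
observer.start()
```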

Other considerations:

  • Clients should back off exponentially if the server is busy or not responding (see the sketch below).
  • Mobile clients should only sync on demand to save the user's bandwidth and space.
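
A minimal sketch of exponential backoff with jitter; the retry count and delay bounds are arbitrary illustrative choices:

```python
import random
import time

def with_backoff(request_fn, max_retries: int = 6, base: float = 0.5, cap: float = 60.0):
    """Retry request_fn, sleeping base * 2^attempt (capped, with jitter) between tries."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except ConnectionError:
            delay = min(cap, base * (2 ** attempt))
            # Jitter spreads out retries so clients don't hammer the server in sync.
            time.sleep(delay * random.uniform(0.5, 1.5))
    raise RuntimeError("server unavailable after retries")
```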

Metadata Database

Responsible for maintaining the versioning and metadata information about files/chunks, users, and workspaces.

We could use a relational database like MySQL or PostgreSQL, or a NoSQL store.

The Synchronization Service should provide a consistent view of the files across users. Since NoSQL databases typically trade ACID guarantees for scalability and performance, a relational DB is the better choice in this case.

The metadata DB will store information about the following objects (a minimal schema sketch follows the list):

  • Chunks
  • Files
  • Users
  • Devices
  • Workspace
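
A minimal relational sketch of these objects (SQLite for brevity; all table and column names are illustrative assumptions, not a definitive layout):

```python
import sqlite3

conn = sqlite3.connect("metadata.db")
conn.executescript("""
CREATE TABLE users      (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE devices    (device_id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users);
CREATE TABLE workspaces (workspace_id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users);
CREATE TABLE files      (file_id INTEGER PRIMARY KEY,
                         workspace_id INTEGER REFERENCES workspaces,
                         path TEXT, size INTEGER, version INTEGER);
CREATE TABLE chunks     (chunk_hash TEXT PRIMARY KEY,  -- SHA-256; also enables dedup
                         size INTEGER);
CREATE TABLE file_chunks(file_id INTEGER REFERENCES files,
                         idx INTEGER,                  -- chunk position within the file
                         chunk_hash TEXT REFERENCES chunks,
                         PRIMARY KEY (file_id, idx));
""")
```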

Synchronization Service

This service processes file updates made by a client and applies changes to other subscribed clients. It also synchronizes clients' local databases with the information stored in the remote Metadata DB.

Clients communicate with this service to either obtain updates from Cloud Storage or send files and updates to the Cloud Storage and, potentially, to other users.

When this service receives an update request, it checks with the Metadata Database for consistency and then proceeds with the update. Subsequently, all subscribers are notified of the update.

This service must be designed to reduce data transmission to achieve better response time. We can use differencing algorithms to reduce the amount of data that needs to be synchronized so that we can just transmit differences between two versions of a file.

We assumed files are divided into 4 MB chunks. Server and clients can calculate a hash (e.g. SHA-256) of each chunk to decide whether a local copy of a chunk needs updating. On the server, if we already have a chunk with the same hash (even from another user), we don't need to create another copy; we can reference the same chunk.

To provide an efficient, scalable synchronization protocol we can place a communication middleware such as Kafka or NATS between the clients and the Synchronization Service; we will call this the Message Queueing Service.

Message Queueing Service

We will configure two queues:

  • The Request Queue is a global queue shared by all clients. Clients' requests to update the Metadata Database are sent to the Request Queue first; from there, the Synchronization Service picks them up to update the metadata.
  • The Response Queues are used by the Synchronization Service to deliver update messages to each client. There is one Response Queue per subscribed client, so that each client can process updates at its own speed (see the sketch after this list).
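
A toy in-process model of this two-queue topology (a real deployment would use Kafka/NATS topics rather than Python queues):

```python
import queue

request_queue = queue.Queue()  # global, shared by all clients
response_queues = {}           # one queue per subscribed client

def client_sends_update(client_id, update):
    request_queue.put((client_id, update))

def synchronization_service_step():
    sender, update = request_queue.get()   # next metadata update to process
    # ... validate against the Metadata DB and commit here ...
    for client_id, q in response_queues.items():
        if client_id != sender:            # fan out to every other subscriber
            q.put(update)                  # each client drains at its own pace

response_queues["B"] = queue.Queue()
response_queues["C"] = queue.Queue()
client_sends_update("A", {"file_id": 42, "version": 7})
synchronization_service_step()
print(response_queues["B"].get_nowait())   # {'file_id': 42, 'version': 7}
```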

Cloud/Block Storage

The Block Storage stores chunks of files uploaded by the user. Clients directly interact with it to send and receive objects.

File Processing Workflow

What happens when A updates a file that is shared with B and C?

  • A uploads chunks to cloud storage
  • A updates metadata and commits changes
  • A gets confirmation and notifications are sent to B and C about the changes
  • B and C receive metadata changes and download updated chunks.

Data Deduplication

Deduplication eliminates duplicate copies of data to improve storage and network utilization. For each chunk we can calculate a hash and compare it with the hashes of existing chunks to see whether we already store the same chunk.

We can implement deduplication in two ways:

  • Post-process deduplication: New chunks are first written to storage, and later a background process analyzes the data looking for duplicates. Clients don't need to wait for hash lookups, but we transfer data that may later be discarded and temporarily need extra storage.
  • In-line deduplication: Hashes are calculated in real time as clients upload data. If the system identifies a chunk that is already stored, only a reference to the existing chunk is added to the metadata rather than a full copy of the chunk. This approach gives the best network and storage usage (see the sketch after this list).
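
A minimal in-line deduplication sketch, with an in-memory dict standing in for both the block storage and the chunk index:

```python
import hashlib

block_storage = {}  # stand-in: chunk hash -> chunk bytes

def store_chunk(data: bytes) -> str:
    """Upload a chunk only if its hash has not been seen before."""
    h = hashlib.sha256(data).hexdigest()
    if h not in block_storage:   # in production: a lookup in the chunk index
        block_storage[h] = data  # actual upload to block storage
    return h                     # file metadata references the chunk by hash

a = store_chunk(b"hello world")
b = store_chunk(b"hello world")  # duplicate chunk: nothing new is stored
assert a == b and len(block_storage) == 1
```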

Metadata partitioning

To scale the metadata DB we need to partition it so it can store information about millions of users and billions of files/chunks.

  • Vertical Partitioning: Store tables related to one particular feature on one server; for example, all user-related tables in one DB and all files/chunks tables in another. This approach is straightforward, but it has scalability issues once a single table grows too large, as well as performance and consistency issues for joins that need data from multiple DBs.
  • Range-Based Partitioning: Store files/chunks in separate partitions based on the first letter of the file path. This can lead to unbalanced servers, as more files will end up on the servers responsible for commonly used folder names.
  • Hash-Based Partitioning: Take a hash of the object and store it based on that hash; for example, hash the FileID to determine the partition. The hash function will distribute objects roughly uniformly across partitions, and we can rely on Consistent Hashing to reduce overloading when partitions are added or removed (see the sketch after this list).
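
A minimal consistent-hashing sketch with virtual nodes; the node names, virtual-node count, and use of MD5 are illustrative choices:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps file IDs to partitions; adding/removing a node moves only ~1/N of the keys."""

    def __init__(self, nodes, vnodes: int = 100):
        self.ring = sorted(
            (self._hash(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, file_id: str) -> str:
        i = bisect.bisect(self.keys, self._hash(file_id)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["meta-db-1", "meta-db-2", "meta-db-3"])
print(ring.node_for("file:42"))
```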

Caching

We can use two kinds of cache: one for hot chunks in front of the block storage, and similarly one for hot entries in the Metadata DB.

  • To deal with hot files/chunks, we can use Memcached to cache chunks keyed by their IDs/hashes and check whether a chunk is already cached before hitting the block storage. A server with 144 GB of memory can cache roughly 36K 4 MB chunks.
  • As an eviction policy we can use LRU, discarding the least recently used chunks first: chunks that have not been touched in a long time are likely to remain untouched, while files people are actively working on may be viewed or updated again soon (see the sketch after this list).
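
A minimal LRU cache sketch over chunk hashes, built on an OrderedDict:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used chunk once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, chunk_hash):
        if chunk_hash not in self.data:
            return None                       # miss: fetch from block storage
        self.data.move_to_end(chunk_hash)     # mark as most recently used
        return self.data[chunk_hash]

    def put(self, chunk_hash, chunk):
        self.data[chunk_hash] = chunk
        self.data.move_to_end(chunk_hash)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)     # drop the least recently used entry
```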

Load Balancers

Two good places to load balance in this system are:

  • Between clients and block servers
  • Between clients and metadata servers

We can start with simple round robin to distribute requests. If server loads become too uneven, we can switch to a smarter algorithm that routes requests to the least-loaded servers first (see the sketch below).
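
A sketch of both policies; the server names and the load metric are placeholders:

```python
import itertools

servers = ["block-1", "block-2", "block-3"]

rr = itertools.cycle(servers)         # simple round robin
def pick_round_robin() -> str:
    return next(rr)

loads = {s: 0 for s in servers}       # e.g. open connections per server
def pick_least_loaded() -> str:
    return min(loads, key=loads.get)  # route to the least-loaded server
```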

Security permissions and file sharing

We need to store the permissions of each file in the metadata DB to reflect what is visible or modifiable by each user.

[Figure: Dropbox clone with caching layer]