Systems Design Examples v2 Google Drive - herougo/SoftwareEngineerKnowledgeRepository GitHub Wiki

  • create/upload files
  • files shared across multiple devices
  • changes should be propagated (notifications)

Questions

  • Will we support:
    • privacy settings -> Yes
    • version retrieval -> No
  • What kinds of devices will we support (desktop, mobile)?
  • How should we handle 2 conflicting sync changes to the same file?

Functional Requirements

  • create/upload files
  • files shared across multiple devices
  • changes should be propagated

Capacity Estimates

  • 1 million users
  • 1 million DAU
  • 1 file uploaded per user per day
  • average file size 5 MB
  • storage limit 15 GB per user
  • size limit 10 GB per file
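
The numbers above can be turned into a quick back-of-envelope calculation. This is a rough sketch only: it assumes "1 file per day" means one upload per daily active user, uses decimal units (1 TB = 1,000,000 MB), and the variable names are made up for illustration.

```python
# Back-of-envelope capacity estimate from the figures listed above.
DAU = 1_000_000               # daily active users
TOTAL_USERS = 1_000_000       # registered users
FILES_PER_USER_PER_DAY = 1    # assumption: "1 file per day" is per user
AVG_FILE_MB = 5               # average file size
QUOTA_GB_PER_USER = 15        # per-user storage limit

MB_PER_TB = 1_000_000         # decimal units
GB_PER_PB = 1_000_000
SECONDS_PER_DAY = 86_400

# New data written per day and per year.
daily_upload_tb = DAU * FILES_PER_USER_PER_DAY * AVG_FILE_MB / MB_PER_TB
yearly_upload_pb = daily_upload_tb * 365 / 1_000

# Worst case: every user fills their whole quota.
max_storage_pb = TOTAL_USERS * QUOTA_GB_PER_USER / GB_PER_PB

# Average load if uploads were spread evenly over the day (peaks will be higher).
avg_uploads_per_sec = DAU * FILES_PER_USER_PER_DAY / SECONDS_PER_DAY
avg_ingest_mb_per_sec = avg_uploads_per_sec * AVG_FILE_MB

print(f"daily upload volume: {daily_upload_tb:.1f} TB")
print(f"yearly upload volume: {yearly_upload_pb:.3f} PB")
print(f"worst-case storage: {max_storage_pb:.1f} PB")
print(f"average uploads/sec: {avg_uploads_per_sec:.1f}")
print(f"average ingest: {avg_ingest_mb_per_sec:.1f} MB/s")
```

So roughly 5 TB of new data per day and a 15 PB ceiling if every user hits their quota. Replication and metadata overhead are not included.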

API

  • upload
  • download
  • get file version
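
One possible shape for these three calls, as a toy in-memory sketch. Everything here is an assumption for illustration: the class name, the parameters, and the reading of "get file version" as "return the latest version number so a client can tell whether its copy is stale" (which fits the earlier decision not to support version retrieval).

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryFileService:
    """Toy stand-in for the three operations listed above (not a real API)."""

    # (user_id, path) -> list of contents, one entry per uploaded version
    _store: dict = field(default_factory=dict)

    def upload(self, user_id: int, path: str, content: bytes) -> int:
        """Store a new version of the file and return its 1-based version number."""
        versions = self._store.setdefault((user_id, path), [])
        versions.append(content)
        return len(versions)

    def download(self, user_id: int, path: str) -> bytes:
        """Return the latest content; raises KeyError if the file does not exist."""
        return self._store[(user_id, path)][-1]

    def get_file_version(self, user_id: int, path: str) -> int:
        """Return the latest version number; raises KeyError if the file does not exist."""
        return len(self._store[(user_id, path)])


svc = InMemoryFileService()
svc.upload(1, "/notes.txt", b"first draft")
svc.upload(1, "/notes.txt", b"second draft")
print(svc.get_file_version(1, "/notes.txt"))  # latest version number
print(svc.download(1, "/notes.txt"))          # latest content
```

A device that holds version 1 locally can call get_file_version, see a higher number, and then download.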

Data

  • user
    • id
    • username
    • email
    • password_hash
  • device
    • id
    • user_id
  • file
    • id
    • path
    • is_folder
    • last_version
    • file_hash
    • owner_id
  • file_version
    • id
    • file_id
    • version_number
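
The entities above written out as Python dataclasses, as a sketch. The field names come from the list; the field types and the foreign-key comments are assumptions, since the notes do not specify them.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    username: str
    email: str
    password_hash: str


@dataclass
class Device:
    id: int
    user_id: int          # assumed to reference User.id


@dataclass
class File:
    id: int
    path: str
    is_folder: bool
    last_version: int     # assumed to match the newest FileVersion.version_number
    file_hash: str        # assumed to be a hex digest of the file content
    owner_id: int         # assumed to reference User.id


@dataclass
class FileVersion:
    id: int
    file_id: int          # assumed to reference File.id
    version_number: int


owner = User(id=1, username="alice", email="alice@example.com", password_hash="<hash>")
laptop = Device(id=10, user_id=owner.id)
doc = File(id=100, path="/notes.txt", is_folder=False,
           last_version=1, file_hash="<hash>", owner_id=owner.id)
v1 = FileVersion(id=1000, file_id=doc.id, version_number=doc.last_version)
```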

Discussion

  • (index/shard by user_id)
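
A minimal sketch of sharding by user_id, assuming simple hash-modulo placement. The shard count and function name are hypothetical; the notes only say to shard by user_id.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user_id to a shard index in [0, num_shards).

    A stable hash (SHA-256) is used instead of Python's built-in hash()
    so the mapping is identical across processes and restarts.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All of one user's file metadata lands on the same shard.
print(shard_for_user(42))
```

Keeping a user's rows on one shard means listing that user's files touches a single shard. The downside of plain modulo is that changing the shard count moves most keys; consistent hashing is a common alternative.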

Challenges

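  • compression on the client vs. on the server is a trade-off (use client-side compression on desktop)
  • best way to upload big files -> use S3 multipart uploads: the client splits the file into chunks and uploads the chunks in parallel; use checksums to verify the upload
  • polling, long polling, or websocket for keeping files synced?
    • websocket, because the server can push changes to clients immediately over a persistent connection instead of waiting for them to poll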
  • DB for the metadata
    • Why? -> strong consistency
  • a file with 100k users to sync -> hybrid approach with a popular-document service
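
The chunk-and-checksum idea behind the big-file upload point can be sketched without any cloud SDK. This is only an illustration of the client-side split and the receiver-side verification; it does not call S3, and the chunk size, function names, and choice of SHA-256 are assumptions.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical chunk size (8 MiB)

Part = Tuple[int, bytes, str]  # (part_number, chunk, sha256 hex digest)


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Part]:
    """Split data into fixed-size chunks, each tagged with a 1-based part
    number and a checksum, so parts can be sent independently."""
    for part_number, offset in enumerate(range(0, len(data), chunk_size), start=1):
        chunk = data[offset:offset + chunk_size]
        yield part_number, chunk, hashlib.sha256(chunk).hexdigest()


def reassemble(parts: Iterable[Part]) -> bytes:
    """Verify every chunk against its checksum, then join chunks in part order.
    Parts may arrive in any order, as they would with parallel uploads."""
    verified = []
    for part_number, chunk, checksum in sorted(parts, key=lambda p: p[0]):
        if hashlib.sha256(chunk).hexdigest() != checksum:
            raise ValueError(f"checksum mismatch in part {part_number}")
        verified.append(chunk)
    return b"".join(verified)


payload = bytes(range(256)) * 10              # 2560 bytes of sample data
parts = list(iter_chunks(payload, chunk_size=1000))
restored = reassemble(reversed(parts))        # simulate out-of-order arrival
print(len(parts), restored == payload)
```

Because each part carries its own checksum, a failed or corrupted part can be retried on its own instead of restarting the whole upload.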

Improvements

  • Scaling to multiple regions
  • encryption

Sources