Design Image Sharing Website (Flickr or Instagram) - mbhushan/system-design GitHub Wiki

1. Core Use Cases:

a. Upload image(s).
b. View image(s).

Important aspects of the system are:

* There is no limit on the number of images that will be stored, so storage
  must scale with image count.
* There needs to be low latency for image downloads/requests.
* If a user uploads an image, the image should always be there (data reliability for images).
* The system should be easy to maintain (manageability).
* Since image hosting doesn't have high profit margins, the system needs to be cost-effective.

2. Design goals:

a. Low latency (fast retrieval).
b. High availability.
c. Consistency.
d. Cost effectiveness.

3. Dive Deep:

* Download Speed: Upload Speed => 3:1
* Reads will mostly be served from cache, while writes must eventually be persisted
  to the DB (SSDs), so writes will be slower than reads.
* It makes sense to have different services for reads and writes.
  • Web servers like Apache typically have an upper limit of around 500-1000 simultaneous connections, and writes can quickly exhaust them, because an upload keeps its connection open for the duration of the transfer.
  • With separate read and write services, each can be scaled independently.
  • Each service can optimize its own performance with service-appropriate methods (for example, queuing up requests or caching popular images).
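
The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: `ImageStore`, `WriteService`, and `ReadService` are hypothetical names, the store is an in-memory dict standing in for the DB, and the cache is a simple LRU with a tiny capacity.

```python
from collections import OrderedDict
from queue import Queue


class ImageStore:
    """Stand-in for the durable image store (hypothetical; a dict replaces the DB)."""

    def __init__(self):
        self._blobs = {}

    def put(self, image_id, data):
        self._blobs[image_id] = data

    def get(self, image_id):
        return self._blobs[image_id]


class WriteService:
    """Accepts uploads by queuing them; flush() persists them to the store."""

    def __init__(self, store):
        self._store = store
        self._pending = Queue()

    def upload(self, image_id, data):
        self._pending.put((image_id, data))

    def flush(self):
        """Drain the queue into the store and return the number of images written."""
        written = 0
        while not self._pending.empty():
            image_id, data = self._pending.get()
            self._store.put(image_id, data)
            written += 1
        return written


class ReadService:
    """Serves images from an LRU cache, falling back to the store on a miss."""

    def __init__(self, store, capacity=2):
        self._store = store
        self._capacity = capacity
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, image_id):
        if image_id in self._cache:
            self._cache.move_to_end(image_id)  # mark as most recently used
            self.hits += 1
            return self._cache[image_id]
        self.misses += 1
        data = self._store.get(image_id)
        self._cache[image_id] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return data
```

In this sketch an image becomes readable only after `flush()` runs, so queuing uploads trades some read-after-write consistency for faster acceptance of writes.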

Key Ideas:

1. Photos should be served quickly to facilitate a good user experience. To achieve
high throughput and low latency, each read should need at most one disk operation.
We can accomplish this by keeping all metadata in main memory.

2. By contrast, a store that keeps one file per image on a conventional filesystem
would still generally incur 3 disk operations to fetch an image: one to
read the directory metadata into memory, a second to load the inode into memory,
and a third to read the file contents.

3. Application metadata describes the information needed to construct a URL that 
a browser can use to retrieve a photo. Filesystem metadata identifies the data necessary 
for a host to retrieve the photos that reside on that host’s disk.

4. Photo URL format: http://⟨CDN⟩/⟨Cache⟩/⟨Machine id⟩/⟨Logical volume, Photo⟩
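
The ideas above can be tied together in a short Python sketch. It is illustrative only and rests on several assumptions: the `Volume` class, the `build_photo_url` and `parse_photo_url` helpers, the host names, and the choice to join the logical volume and photo id with a comma are all hypothetical. The point is that once the in-memory index maps a photo id to an offset and size inside one large volume file, a read is a single seek and read instead of a directory lookup, an inode load, and a file read.

```python
import os
import tempfile
from urllib.parse import urlparse


class Volume:
    """One large file holding many photos, plus an in-memory index.

    The index maps photo_id -> (offset, size). This sketch always starts
    with an empty file; a real store would rebuild the index on startup.
    """

    def __init__(self, path):
        self._file = open(path, "w+b")
        self._index = {}

    def put(self, photo_id, data):
        self._file.seek(0, os.SEEK_END)  # append at the end of the volume
        offset = self._file.tell()
        self._file.write(data)
        self._file.flush()
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self._index[photo_id]  # metadata lookup stays in memory
        self._file.seek(offset)
        return self._file.read(size)

    def close(self):
        self._file.close()


def build_photo_url(cdn, cache, machine_id, volume_id, photo_id):
    """Assemble a URL in the CDN/Cache/Machine id/(Logical volume, Photo) layout."""
    return f"http://{cdn}/{cache}/{machine_id}/{volume_id},{photo_id}"


def parse_photo_url(url):
    """Split a photo URL back into its routing components."""
    parts = urlparse(url)
    cache, machine_id, last = parts.path.strip("/").split("/")
    volume_id, photo_id = last.split(",")
    return parts.netloc, cache, machine_id, int(volume_id), int(photo_id)


# Usage: store two photos, then resolve a URL to bytes with one seek and read.
volume_path = os.path.join(tempfile.mkdtemp(), "volume_7.dat")
volume = Volume(volume_path)
volume.put(1001, b"first photo bytes")
volume.put(1002, b"second photo bytes")

url = build_photo_url("cdn.example.com", "cache1", "machine42", 7, 1002)
_, _, _, volume_id, photo_id = parse_photo_url(url)
photo = volume.get(photo_id)
```

Each component of the URL names the next hop to try, so a request that misses the CDN and the cache can still be routed to the machine and logical volume that hold the photo.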