8: DESIGN A URL SHORTENER - swchen1234/systemDesign GitHub Wiki

Step 1 - Understand the problem and establish design scope

常见可以问的问题:

  • traffic Volume?
  • how short?
  • what characters allowed?

Back of the envelope estimation

  • Write operation: 100 million URLs are generated per day. • Write operation per second: 100 million / 24 /3600 = 1160
  • Read operation: Assuming ratio of read operation to write operation is 10:1, read operation per second: 1160 * 10 = 11,600
  • Assuming the URL shortener service will run for 10 years, this means we must support 100 million * 365 * 10 = 365 billion records.
  • Assume average URL length is 100.
  • Storage requirement over 10 years: 365 billion * 100 bytes * 10 years = 365 TB

Step 2 - Propose high-level design and get buy-in

API Endpoints

We will design the APIs REST-style. A URL shortener primary needs two API endpoints.

  1. URL Shortening

Post(longURL): return shortURL

  1. URL redirecting

Get(shortURL): return longURL

URL redirecting

  • 301 Redirect: A 301 redirect shows that the requested URL is “permanently” moved to the long URL. Since it is permanently redirected, the browser caches the response, and subsequent requests for the same URL will not be sent to the URL shortening service. Instead, requests are redirected to the long URL server directly.

  • 302 Redirect: A 302 redirect means that the URL is “temporarily” moved to the long URL, meaning that subsequent requests for the same URL will be sent to the URL shortening service first. Then, they are redirected to the long URL server.

  • Implementation of URL redirecting -> hash tables

Step 3 - Design deep dive

Data Model

Hash table 不够scalable -> store in a Relational Database

Hash function

1. hash + collision

常见hash function有CRC32, MD5, SHA-1, 但即使最短的CRC32产生的hash value仍有超过7个字符 ->解决:取hash value前7个字符+predefined string

2. Base 62 conversion

将数字化为62进制

URL shortening deep dive

URL redirecting deep dive

Step 4 - Wrap up

additional talking points

  • Rate limiter: A potential security problem we could face is that malicious users send an overwhelmingly large number of URL shortening requests. Rate limiter helps to filter out requests based on IP address or other filtering rules. If you want to refresh your memory about rate limiting, refer to “Chapter 4: Design a rate limiter”.
  • Web server scaling: Since the web tier is stateless, it is easy to scale the web tier by adding or removing web servers.
  • Database scaling: Database replication and sharding are common techniques.
  • Analytics: Data is increasingly important for business success. Integrating an analytics solution to the URL shortener could help to answer important questions like how many people click on a link? When do they click the link? etc.
  • Availability, consistency, and reliability. These concepts are at the core of any large system’s success. We discussed them in detail in Chapter 1, please refresh your memory on these topics.