8: DESIGN A URL SHORTENER - swchen1234/systemDesign GitHub Wiki
Step 1 - Understand the problem and establish design scope
常见可以问的问题:
- traffic Volume?
- how short?
- what characters allowed?
Back of the envelope estimation
- Write operation: 100 million URLs are generated per day. • Write operation per second: 100 million / 24 /3600 = 1160
- Read operation: Assuming ratio of read operation to write operation is 10:1, read operation per second: 1160 * 10 = 11,600
- Assuming the URL shortener service will run for 10 years, this means we must support 100 million * 365 * 10 = 365 billion records.
- Assume average URL length is 100.
- Storage requirement over 10 years: 365 billion * 100 bytes * 10 years = 365 TB
Step 2 - Propose high-level design and get buy-in
API Endpoints
We will design the APIs REST-style. A URL shortener primary needs two API endpoints.
- URL Shortening
Post(longURL): return shortURL
- URL redirecting
Get(shortURL): return longURL
URL redirecting
-
301 Redirect: A 301 redirect shows that the requested URL is “permanently” moved to the long URL. Since it is permanently redirected, the browser caches the response, and subsequent requests for the same URL will not be sent to the URL shortening service. Instead, requests are redirected to the long URL server directly.
-
302 Redirect: A 302 redirect means that the URL is “temporarily” moved to the long URL, meaning that subsequent requests for the same URL will be sent to the URL shortening service first. Then, they are redirected to the long URL server.
-
Implementation of URL redirecting -> hash tables
Step 3 - Design deep dive
Data Model
Hash table 不够scalable -> store in a Relational Database
Hash function
1. hash + collision
常见hash function有CRC32, MD5, SHA-1, 但即使最短的CRC32产生的hash value仍有超过7个字符 ->解决:取hash value前7个字符+predefined string
2. Base 62 conversion
将数字化为62进制
URL shortening deep dive
URL redirecting deep dive
Step 4 - Wrap up
additional talking points
- Rate limiter: A potential security problem we could face is that malicious users send an overwhelmingly large number of URL shortening requests. Rate limiter helps to filter out requests based on IP address or other filtering rules. If you want to refresh your memory about rate limiting, refer to “Chapter 4: Design a rate limiter”.
- Web server scaling: Since the web tier is stateless, it is easy to scale the web tier by adding or removing web servers.
- Database scaling: Database replication and sharding are common techniques.
- Analytics: Data is increasingly important for business success. Integrating an analytics solution to the URL shortener could help to answer important questions like how many people click on a link? When do they click the link? etc.
- Availability, consistency, and reliability. These concepts are at the core of any large system’s success. We discussed them in detail in Chapter 1, please refresh your memory on these topics.