7: DESIGN A UNIQUE ID GENERATOR IN DISTRIBUTED SYSTEMS - swchen1234/systemDesign GitHub Wiki
在传统database 中,用auto_increment即可解决。但在 a distributed environment中,因为 a single database server is not large enough and generating unique IDs across multiple databases with minimal delay is challenging.
Step 1 - Understand the problem and establish design scope
可以问的问题:
- character of unique IDs
- does it increment by 1?
- only numerical values?
- ID length requirement
- scale of the system
Step 2 - Propose high-level design and get buy-in
常见方法:
Multi-master replication
拓展databases’ auto_increment, 每个db添加新id时+k, 其中k为is the number of database servers in use。 缺点:
- Hard to scale with multiple data centers • IDs do not go up with time across multiple servers. • It does not scale well when a server is added or removed.
Universally unique identifier (UUID)
UUID is a 128-bit number used to identify information in computer systems. 每个server独自生成,UUID has a very low probability of getting collusion. example of UUID: 09c93e62-50b4-468d-bf8a-c07e1040bfb2
Pros: • Generating UUID is simple. No coordination between servers is needed so there will not be any synchronization issues. • The system is easy to scale because each web server is responsible for generating IDs they consume. ID generator can easily scale with web servers. Cons: • IDs are 128 bits long, but our requirement is 64 bits. • IDs do not go up with time. • IDs could be non-numeric.
Ticket server
use a centralized auto_increment feature in a single database server (Ticket Server)
Twitter snowflake approach
<img
Step 3 - Design deep dive
深入探讨Twitter snowflake approach
- Sign bit: 1 bit. It will always be 0. This is reserved for future uses. It can potentially be used to distinguish between signed and unsigned numbers.
- timestamp: convert UTC <-> binary representation
- Sequence number: 12 bits. For every ID generated on that machine/process, the sequence number is incremented by 1. The number is reset to 0 every millisecond.
Step 4 - Wrap up
Additional Points:
- Clock synchronization: 不同server的clock不同步
- Section length tuning: 根据concurrency和long-term application决定sequence number 和timestamp 的长度。
- High availability.