CHAPTER 7: DESIGN A UNIQUE ID GENERATOR IN DISTRIBUTED SYSTEMS

Article link: https://blog.csdn.net/HuiFeiDeTuoNiaoGZ/article/details/133011987

A natural first idea is to use a primary key with the auto_increment attribute in a traditional database. However, auto_increment does not work in a distributed environment: a single database server is not large enough, and generating unique IDs across multiple databases with minimal delay is challenging.

Step 1 - Understand the problem and establish design scope

Candidate: What are the characteristics of unique IDs?
Interviewer: IDs must be unique and sortable.
Candidate: For each new record, does the ID increment by 1?
Interviewer: The ID increments by time but not necessarily by 1. IDs created in the evening are larger than those created in the morning on the same day.
Candidate: Do IDs only contain numerical values?
Interviewer: Yes, that is correct.
Candidate: What is the ID length requirement?
Interviewer: IDs should fit into 64 bits.
Candidate: What is the scale of the system?
Interviewer: The system should be able to generate 10,000 IDs per second.

Based on these answers, the requirements are:
• IDs must be unique.
• IDs are numerical values only.
• IDs fit into 64 bits.
• IDs are ordered by date.
• Ability to generate over 10,000 unique IDs per second.

Step 2 - Propose high-level design and get buy-in

Several options can generate unique IDs in a distributed system:
• Multi-master replication
• Universally unique identifier (UUID)
• Ticket server
• Twitter snowflake approach

Multi-master replication

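This approach uses the databases' auto_increment feature. Instead of increasing the next ID by 1, we increase it by k, where k is the number of database servers in use. With two servers, for example, the next ID on each server equals the previous ID on that same server plus 2.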

Cons:
• Hard to scale with multiple data centers.
• IDs do not go up with time across multiple servers.
• It does not scale well when a server is added or removed.
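To make the ordering drawback concrete, here is a minimal Python simulation, assuming two servers and an in-memory model; `AutoIncrementServer` is a hypothetical class, not a database API. (In MySQL the same scheme is usually configured through the `auto_increment_increment` and `auto_increment_offset` settings.)

```python
class AutoIncrementServer:
    """Hypothetical model of one database server whose auto_increment step is k."""

    def __init__(self, offset: int, k: int) -> None:
        self.next_id = offset  # first ID this server hands out (1..k)
        self.k = k             # step size = number of servers in the cluster

    def generate(self) -> int:
        current = self.next_id
        self.next_id += self.k
        return current


K = 2  # assumed number of database servers
s1 = AutoIncrementServer(offset=1, k=K)  # issues 1, 3, 5, ...
s2 = AutoIncrementServer(offset=2, k=K)  # issues 2, 4, 6, ...

issued = []  # IDs in the order they were issued, i.e. by wall-clock time
for _ in range(4):
    issued.append(s1.generate())  # server 1 takes the early traffic
issued.append(s2.generate())      # server 2 issues its first ID afterwards

print(issued)  # [1, 3, 5, 7, 2]: unique, but the latest ID (2) is the smallest
```

Changing K when a server is added or removed shifts every server's sequence, which is why the scheme copes poorly with membership changes.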

UUID

UUID is a 128-bit number used to identify information in computer systems. UUIDs have a very low probability of collision, and they can be generated independently without coordination between servers.
Pros:
• Generating UUIDs is simple. No coordination between servers is needed, so there are no synchronization issues.
• The system is easy to scale because each web server generates the IDs it consumes. The ID generator scales easily with the web servers.

Cons:
• IDs are 128 bits long, but our requirement is 64 bits.
• IDs do not go up with time.
• IDs could be non-numeric.
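Python's standard `uuid` module illustrates both sides: `uuid.uuid4()` needs no coordination, but the result is a 128-bit value whose canonical form is a hex string. A minimal sketch:

```python
import uuid

u = uuid.uuid4()  # version-4 UUID: random bits, generated locally with no coordination

print(str(u))          # canonical form: 36 characters of hex digits and hyphens
print(u.version)       # 4
# The version field sits at bits 76-79, so a v4 UUID's integer value never fits in 64 bits.
print(u.int >= 2**64)  # True
```

Two UUIDs generated one after another compare in random order, which is the "IDs do not go up with time" drawback.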

Ticket Server

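Flickr developed ticket servers to generate distributed primary keys. The idea is to use a centralized auto_increment feature in a single database server (the ticket server), which every web server asks for new IDs.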
Pros:
• Numeric IDs.
• It is easy to implement, and it works for small to medium-scale applications.
Cons:
• Single point of failure. With a single ticket server, if that server goes down, all systems that depend on it face issues. To avoid a single point of failure, we can set up multiple ticket servers; however, this introduces new challenges such as data synchronization.
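The core of a ticket server is one shared counter that every caller goes through. Below is a minimal in-process Python sketch, assuming threads stand in for web servers and a lock stands in for the database serializing its auto_increment; `TicketServer` is a hypothetical name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TicketServer:
    """Hypothetical stand-in for a single ticket server backed by one counter."""

    def __init__(self) -> None:
        self._next = 1
        self._lock = threading.Lock()  # plays the role of the database serializing inserts

    def next_id(self) -> int:
        with self._lock:
            ticket = self._next
            self._next += 1
            return ticket


server = TicketServer()

# Eight "web servers" (threads) request 1,000 IDs from the same ticket server.
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = list(pool.map(lambda _: server.next_id(), range(1000)))

print(sorted(ids) == list(range(1, 1001)))  # True: numeric, unique, no gaps
```

Every request funnels through `server`; if that one object (in production, that one database) is unavailable, nobody can get an ID, which is exactly the single point of failure listed above.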

Twitter snowflake approach

Instead of generating an ID directly, we divide a 64-bit ID into the following sections:
• Sign bit: 1 bit. It is always 0 and is reserved for future use. It could potentially be used to distinguish between signed and unsigned numbers.
• Timestamp: 41 bits. Milliseconds since the epoch or a custom epoch. We use the Twitter snowflake default epoch 1288834974657, equivalent to Nov 04, 2010, 01:42:54 UTC.
• Datacenter ID: 5 bits, which gives us 2 ^ 5 = 32 datacenters.
• Machine ID: 5 bits, which gives us 2 ^ 5 = 32 machines per datacenter.
• Sequence number: 12 bits. For every ID generated on that machine/process, the sequence number is incremented by 1. The number is reset to 0 every millisecond.
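The layout above maps directly onto bit shifts. A minimal Python sketch, assuming the field widths and epoch listed above; `compose_id` and `decompose_id` are hypothetical helper names:

```python
EPOCH_MS = 1288834974657  # Twitter snowflake default epoch (from the text)

# Field widths from the layout above; 1 + 41 + 5 + 5 + 12 = 64 bits.
TIMESTAMP_BITS, DATACENTER_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 5, 5, 12

MACHINE_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS        # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22


def _mask(bits: int) -> int:
    return (1 << bits) - 1


def compose_id(timestamp_ms: int, datacenter_id: int, machine_id: int, sequence: int) -> int:
    """Pack the four fields into one integer; the sign bit stays 0."""
    elapsed = timestamp_ms - EPOCH_MS
    for name, value, bits in (("timestamp", elapsed, TIMESTAMP_BITS),
                              ("datacenter_id", datacenter_id, DATACENTER_BITS),
                              ("machine_id", machine_id, MACHINE_BITS),
                              ("sequence", sequence, SEQUENCE_BITS)):
        if not 0 <= value <= _mask(bits):
            raise ValueError(f"{name} does not fit in {bits} bits: {value}")
    return ((elapsed << TIMESTAMP_SHIFT)
            | (datacenter_id << DATACENTER_SHIFT)
            | (machine_id << MACHINE_SHIFT)
            | sequence)


def decompose_id(snowflake_id: int) -> tuple:
    """Inverse of compose_id: (timestamp_ms, datacenter_id, machine_id, sequence)."""
    return ((snowflake_id >> TIMESTAMP_SHIFT) + EPOCH_MS,
            (snowflake_id >> DATACENTER_SHIFT) & _mask(DATACENTER_BITS),
            (snowflake_id >> MACHINE_SHIFT) & _mask(MACHINE_BITS),
            snowflake_id & _mask(SEQUENCE_BITS))


sid = compose_id(EPOCH_MS + 1, datacenter_id=3, machine_id=7, sequence=5)
print(sid)                # 4616197
print(decompose_id(sid))  # (1288834974658, 3, 7, 5)
```

Because the timestamp occupies the most significant bits below the sign bit, comparing two IDs as integers compares their creation times first.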

Step 3 - Design deep dive

Datacenter IDs and machine IDs are chosen at startup and are generally fixed once the system is up and running. Any change to them requires careful review, since an accidental change can lead to ID conflicts. Timestamps and sequence numbers, by contrast, are generated while the ID generator is running.

Timestamp

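The most significant 41 bits after the sign bit make up the timestamp section. Because timestamps grow with time, IDs are sortable by time.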
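The maximum timestamp that can be represented in 41 bits is

$$2^{41} - 1 = 2199023255551 \text{ ms} \approx \frac{2199023255551}{1000 \times 3600 \times 24 \times 365} \text{ years} \approx 69 \text{ years}$$

So the ID generator works for about 69 years from the chosen epoch; a custom epoch close to today's date delays the overflow.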
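A quick check of the arithmetic, plus the moment the 41-bit field would overflow when counting from the Twitter default epoch given earlier. A minimal Python sketch using only the standard `datetime` module:

```python
from datetime import datetime, timedelta, timezone

EPOCH_MS = 1288834974657          # Twitter snowflake default epoch (from the text)
MAX_TIMESTAMP = (1 << 41) - 1     # largest value the 41-bit field can hold
MS_PER_YEAR = 1000 * 3600 * 24 * 365

years = MAX_TIMESTAMP / MS_PER_YEAR
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
epoch = unix_epoch + timedelta(milliseconds=EPOCH_MS)   # exact integer arithmetic
overflow = epoch + timedelta(milliseconds=MAX_TIMESTAMP)

print(MAX_TIMESTAMP)    # 2199023255551
print(round(years, 1))  # 69.7
print(epoch)            # 2010-11-04 01:42:54.657000+00:00
print(overflow)         # 2080-07-10 17:30:30.208000+00:00
```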

Sequence number

The sequence number is 12 bits, which gives us 2 ^ 12 = 4096 combinations. It is 0 unless more than one ID is generated within the same millisecond on the same server.
In theory, a machine can support a maximum of 4096 new IDs per millisecond.
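Putting the fields together, a per-machine generator resets the sequence at each new millisecond and waits for the next millisecond once all 4096 values are used. Below is a minimal single-process Python sketch, assuming a millisecond wall clock; `SnowflakeGenerator` and the injectable `clock_ms` parameter are hypothetical names, and raising an error when the clock moves backwards is one possible policy, not something the text specifies.

```python
import threading
import time
from typing import Callable

EPOCH_MS = 1288834974657                               # Twitter snowflake default epoch (from the text)
DATACENTER_BITS, MACHINE_BITS, SEQUENCE_BITS = 5, 5, 12
MACHINE_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS        # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1                # 4095


def wall_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Hypothetical per-machine generator; thread-safe within one process."""

    def __init__(self, datacenter_id: int, machine_id: int,
                 clock_ms: Callable[[], int] = wall_clock_ms) -> None:
        # Datacenter and machine IDs are fixed at startup, so validate them once here.
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError(f"datacenter_id out of range: {datacenter_id}")
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError(f"machine_id out of range: {machine_id}")
        self._datacenter_id = datacenter_id
        self._machine_id = machine_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # Clock went backwards: refuse rather than risk a duplicate ID.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence values used this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0  # first ID in a new millisecond
            self._last_ms = now
            return (((now - EPOCH_MS) << TIMESTAMP_SHIFT)
                    | (self._datacenter_id << DATACENTER_SHIFT)
                    | (self._machine_id << MACHINE_SHIFT)
                    | self._sequence)


gen = SnowflakeGenerator(datacenter_id=1, machine_id=2)
a, b = gen.next_id(), gen.next_id()
print(a < b)  # True: later IDs are larger
```

At 4096 IDs per millisecond, one machine could in theory issue about 4,096,000 IDs per second, far above the 10,000 IDs per second requirement.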

Step 4 - Wrap up

• Clock synchronization. In our design, we assume ID generation servers have the same clock. This assumption might not be true when a server is running on multiple cores, and the same challenge exists in multi-machine scenarios. Solutions to clock synchronization are out of the scope of this book; however, it is important to understand that the problem exists. Network Time Protocol (NTP) is the most popular solution to this problem.

• Section length tuning. For example, fewer sequence-number bits and more timestamp bits are effective for low-concurrency, long-term applications.

• High availability. Since an ID generator is a mission-critical system, it must be highly available.
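The section-length trade-off can be quantified: with the sign bit and the 10 datacenter/machine bits held fixed, every bit moved from the sequence field to the timestamp field doubles the lifetime and halves the per-millisecond capacity. A small Python calculation, assuming the same 365-day year as the timestamp estimate; the 43- and 45-bit layouts are illustrative, not recommendations from the text, and `layout_tradeoff` is a hypothetical helper.

```python
from typing import Tuple

MS_PER_YEAR = 1000 * 3600 * 24 * 365
ID_BITS, SIGN_BITS, NODE_BITS = 64, 1, 10  # NODE_BITS = 5 datacenter + 5 machine bits


def layout_tradeoff(timestamp_bits: int) -> Tuple[int, float, int]:
    """For a given timestamp width, return (sequence_bits, lifetime_years, ids_per_ms)."""
    sequence_bits = ID_BITS - SIGN_BITS - NODE_BITS - timestamp_bits
    lifetime_years = ((1 << timestamp_bits) - 1) / MS_PER_YEAR
    return sequence_bits, lifetime_years, 1 << sequence_bits


for ts_bits in (41, 43, 45):
    seq_bits, years, per_ms = layout_tradeoff(ts_bits)
    print(f"{ts_bits} timestamp / {seq_bits} sequence bits: "
          f"~{years:.1f} years, {per_ms} IDs per ms per machine")
# 41 timestamp / 12 sequence bits: ~69.7 years, 4096 IDs per ms per machine
# 43 timestamp / 10 sequence bits: ~278.9 years, 1024 IDs per ms per machine
# 45 timestamp / 8 sequence bits: ~1115.7 years, 256 IDs per ms per machine
```

Even the 8-bit sequence still allows 256,000 IDs per second per machine, so a low-concurrency deployment can trade capacity for a much longer lifetime.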