SnowFlake: A Distributed ID Generation Algorithm

Bpazy

Category: Java. Tags: SnowFlake, distributed ID

Article link: https://blog.csdn.net/hanziyuan08/article/details/84868801

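First, an introduction to SnowFlake:

All generated IDs trend upward over time (note that they are not strictly increasing);
No duplicate IDs are produced anywhere in the distributed system (because datacenterId and workerId distinguish the nodes).

The SnowFlake algorithm produces a 64-bit integer (a long in Java; at most 19 characters when converted to a String). Its structure is shown in the figure below: [Figure: algorithm schematic]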

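1 bit, unused. In binary, any value whose highest bit is 1 is negative, but the IDs we generate are normally positive integers, so this highest bit is fixed at 0.
41 bits, used to record the timestamp (in milliseconds).
- 41 bits can represent $2^{41}$ distinct values.
- When used only for non-negative integers (in computing this range includes 0), the representable range is 0 to $2^{41}-1$; the 1 is subtracted because the range starts at 0 rather than 1.
- In other words, 41 bits can hold up to $2^{41}-1$ milliseconds, which converted to years is $(2^{41}-1)/(1000 \times 60 \times 60 \times 24 \times 365) \approx 69$ years.
10 bits, used to record the worker machine ID.
- This allows deployment on $2^{10}=1024$ nodes, and consists of a 5-bit datacenterId and a 5-bit workerId.
- The largest positive integer that 5 bits can represent is $2^{5}-1=31$, so the 32 numbers 0, 1, 2, 3, …, 31 are available to identify different datacenterId or workerId values.
12 bits, a sequence number, used to distinguish IDs generated within the same millisecond.
- The largest positive integer that 12 bits can represent is $2^{12}-1=4095$, so the 4096 numbers 0, 1, 2, 3, …, 4095 are available as sequence numbers for the IDs generated on the same machine within the same timestamp (millisecond).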
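
The arithmetic in the list above can be checked with a few shifts. The following is a minimal sketch; the class and method names (SnowflakeLayout, maxValue, distinctValues, yearsCovered) are hypothetical and chosen only for illustration, and a year is assumed to be 365 days as in the formula above.

```java
/** Field widths of a SnowFlake ID as listed above, with the limits they imply. */
final class SnowflakeLayout {
    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_ID_BITS = 5;
    static final int WORKER_ID_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    /** Largest value that fits in the given number of bits, i.e. 2^bits - 1. */
    static long maxValue(int bits) {
        return (1L << bits) - 1;
    }

    /** Number of distinct values the given number of bits can hold, i.e. 2^bits. */
    static long distinctValues(int bits) {
        return 1L << bits;
    }

    /** Whole 365-day years covered by a millisecond counter of the given width. */
    static long yearsCovered(int bits) {
        long millisPerYear = 1000L * 60 * 60 * 24 * 365;
        return maxValue(bits) / millisPerYear;
    }

    public static void main(String[] args) {
        // 1 unused sign bit plus the four fields should fill a 64-bit long.
        int total = 1 + TIMESTAMP_BITS + DATACENTER_ID_BITS + WORKER_ID_BITS + SEQUENCE_BITS;
        System.out.println("total bits          = " + total);
        System.out.println("max timestamp (ms)  = " + maxValue(TIMESTAMP_BITS));
        System.out.println("years covered       = " + yearsCovered(TIMESTAMP_BITS));
        System.out.println("nodes               = " + distinctValues(DATACENTER_ID_BITS + WORKER_ID_BITS));
        System.out.println("max workerId        = " + maxValue(WORKER_ID_BITS));
        System.out.println("ids per millisecond = " + distinctValues(SEQUENCE_BITS));
    }
}
```

Note that the integer division in yearsCovered rounds down, which is why the result is 69 rather than 70.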

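The following code is quoted from the article "Understanding the Distributed ID Generation Algorithm SnowFlake" (理解分布式id生成算法SnowFlake), which translated it from Twitter's Scala version: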

public class IdWorker{

    private long workerId;
    private long datacenterId;
    private long sequence;

    public IdWorker(long workerId, long datacenterId, long sequence){
        // sanity check for workerId
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0",maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0",maxDatacenterId));
        }
        // %n ends the line so this startup message does not run into the first printed id
        System.out.printf("worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d%n",
                timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId);

        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }

    private long twepoch = 1288834974657L;

    private long workerIdBits = 5L;
    private long datacenterIdBits = 5L;
    private long maxWorkerId = -1L ^ (-1L << workerIdBits);
    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    private long sequenceBits = 12L;

    private long workerIdShift = sequenceBits;
    private long datacenterIdShift = sequenceBits + workerIdBits;
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private long sequenceMask = -1L ^ (-1L << sequenceBits);

    private long lastTimestamp = -1L;

    public long getWorkerId(){
        return workerId;
    }

    public long getDatacenterId(){
        return datacenterId;
    }

    public long getTimestamp(){
        return System.currentTimeMillis();
    }

    public synchronized long nextId() {
        long timestamp = timeGen();

        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards.  Rejecting requests until %d.%n", lastTimestamp);
            throw new RuntimeException(String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds",
                    lastTimestamp - timestamp));
        }

        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }

        lastTimestamp = timestamp;
        return ((timestamp - twepoch) << timestampLeftShift) |
                (datacenterId << datacenterIdShift) |
                (workerId << workerIdShift) |
                sequence;
    }

    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen(){
        return System.currentTimeMillis();
    }
}

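For the principles behind the bit operations above, see "Understanding the Distributed ID Generation Algorithm SnowFlake" or the Id project; this article does not cover them in depth.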
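
As a brief illustration only, the packing done at the end of nextId() can be reversed with shifts and masks. The sketch below assumes the same layout and twepoch as the IdWorker class above (12 sequence bits, 5 worker bits, 5 datacenter bits); the class SnowflakeDecoder and its methods are hypothetical names, not part of the quoted code.

```java
/** Packs and unpacks IDs using the same layout as the IdWorker class above. */
final class SnowflakeDecoder {
    static final long TWEPOCH = 1288834974657L;
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_ID_BITS = 5;
    static final int DATACENTER_ID_BITS = 5;

    static final int WORKER_ID_SHIFT = SEQUENCE_BITS;                        // 12
    static final int DATACENTER_ID_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS;   // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_ID_SHIFT + DATACENTER_ID_BITS; // 22

    /** Same expression as the return statement of nextId(). */
    static long compose(long timestampMillis, long datacenterId, long workerId, long sequence) {
        return ((timestampMillis - TWEPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_ID_SHIFT)
                | (workerId << WORKER_ID_SHIFT)
                | sequence;
    }

    /** Milliseconds since the Unix epoch at which the id was generated. */
    static long timestampOf(long id) {
        return (id >>> TIMESTAMP_SHIFT) + TWEPOCH;
    }

    static long datacenterIdOf(long id) {
        return (id >>> DATACENTER_ID_SHIFT) & ((1L << DATACENTER_ID_BITS) - 1);
    }

    static long workerIdOf(long id) {
        return (id >>> WORKER_ID_SHIFT) & ((1L << WORKER_ID_BITS) - 1);
    }

    static long sequenceOf(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }

    public static void main(String[] args) {
        long id = compose(System.currentTimeMillis(), 3, 7, 42);
        System.out.println("id         = " + id);
        System.out.println("timestamp  = " + timestampOf(id));
        System.out.println("datacenter = " + datacenterIdOf(id));
        System.out.println("worker     = " + workerIdOf(id));
        System.out.println("sequence   = " + sequenceOf(id));
    }
}
```

Because the timestamp occupies the highest bits, comparing two IDs compares their timestamps first, which is where the time-trend ordering comes from.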

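As the code shows, the worker machine ID takes 10 bits, with Data Center Id and Worker Id occupying 5 bits each. 5 bits can distinguish $2^5=32$ machines (IDs 0 to 31), but spending another 5 bits on Data Center Id is wasteful; like most companies, I do not have that many data centers.
So in my open-source project Id, I removed Data Center Id and reserved all 10 bits for Worker Id, so the total number of worker nodes can reach $2^{10}=1024$.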
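
To show what such a layout might look like, here is a minimal sketch of a generator with a single 10-bit worker field. It is not the actual code of the Id project: the class name TenBitWorkerIdGenerator is hypothetical, and the epoch and sequence width are simply carried over from the IdWorker class above as assumptions.

```java
/** Sketch of a SnowFlake variant: 41-bit timestamp, 10-bit worker id, 12-bit sequence. */
final class TenBitWorkerIdGenerator {
    private static final long EPOCH = 1288834974657L; // assumed: same twepoch as IdWorker
    private static final int WORKER_ID_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_ID_BITS) - 1;  // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;   // 4095
    private static final int WORKER_ID_SHIFT = SEQUENCE_BITS;
    private static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS;

    private final long workerId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    TenBitWorkerIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException(
                    "workerId must be between 0 and " + MAX_WORKER_ID);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long timestamp = System.currentTimeMillis();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards by "
                    + (lastTimestamp - timestamp) + " ms");
        }
        if (timestamp == lastTimestamp) {
            // Same millisecond: advance the sequence, wrapping at 4096.
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted: spin until the clock reaches the next millisecond.
                while (timestamp <= lastTimestamp) {
                    timestamp = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return ((timestamp - EPOCH) << TIMESTAMP_SHIFT)
                | (workerId << WORKER_ID_SHIFT)
                | sequence;
    }

    /** Extracts the 10-bit worker id from a generated id. */
    static long workerIdOf(long id) {
        return (id >>> WORKER_ID_SHIFT) & MAX_WORKER_ID;
    }
}
```

The total width is unchanged at 64 bits; only the split of the middle 10 bits differs from the original layout.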

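The new question is then how to give different machines different Worker Ids. One solution is to use the IP address:
if the last 10 bits of the binary IP addresses of the production machines are all distinct, this scheme can be used.
For example, a machine with IP 192.168.1.100 has the binary representation 11000000 10101000 00000001 01100100;
taking the last 10 bits, 01 01100100, and converting them to decimal gives 356, so workerId is set to 356.
The code is: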

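// inetAddress: the machine's IPv4 address as a java.net.InetAddress
byte[] addr = inetAddress.getAddress();
long ip = ((addr[0] & 0xFFL) << (3 * 8))
        + ((addr[1] & 0xFFL) << (2 * 8))
        + ((addr[2] & 0xFFL) << (1 * 8))
        + (addr[3] & 0xFFL);
// keep only the lowest 10 bits; without this mask the value would be the full 32-bit IP
long workerId = ip & 0x3FFL;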

workerId is the result.

Test:
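
The IP-based scheme described above can also be sketched as a small self-contained class that keeps only the low 10 bits of an IPv4 address. The class name IpWorkerId and the method fromAddress are hypothetical, and the sketch assumes a 4-byte IPv4 address; IPv6 addresses are rejected.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Derives a 10-bit worker id from the last 10 bits of an IPv4 address. */
final class IpWorkerId {
    private static final long WORKER_ID_MASK = 0x3FFL; // lowest 10 bits

    static long fromAddress(byte[] addr) {
        if (addr.length != 4) {
            throw new IllegalArgumentException("expected a 4-byte IPv4 address");
        }
        // Bytes are signed in Java, so mask each one with 0xFF before shifting.
        long ip = ((addr[0] & 0xFFL) << 24)
                | ((addr[1] & 0xFFL) << 16)
                | ((addr[2] & 0xFFL) << 8)
                | (addr[3] & 0xFFL);
        return ip & WORKER_ID_MASK;
    }

    public static void main(String[] args) throws UnknownHostException {
        // A literal IP is parsed directly; no DNS lookup is performed.
        InetAddress inetAddress = InetAddress.getByName("192.168.1.100");
        System.out.println(fromAddress(inetAddress.getAddress())); // prints 356
    }
}
```

Two machines whose addresses share the same last 10 bits, for example 192.168.1.100 and 192.168.5.100, would receive the same worker id, so the scheme only works when the address plan rules that out.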

public static void main(String[] args) {
    IdWorker worker = new IdWorker(1, 1, 1);
    for (int i = 0; i < 30; i++) {
        System.out.println(worker.nextId());
    }
}