Distributed globally unique IDs, part 1: DB step size & MyBatis-Plus snowflake algorithm optimization

拽着尾巴的鱼儿


Article link: https://blog.csdn.net/l123lgx/article/details/130004191

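Preface: Development work usually needs a globally unique id to identify each record, so that the data can be tracked and aggregated. Numeric ids are preferred because a numeric primary key performs better in the primary-key index. An id is generated either by the database or by the application itself; this article looks at both approaches.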

Strategies for generating a globally unique id:

1 Generating ids with the database:

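1.1 Use MySQL auto-increment ids, giving each database a different auto-increment offset and step so that the ids they generate differ:
The step and the offset have to be changed in the my.cnf configuration file of every MySQL server, because both the step (auto_increment_increment) and the offset (auto_increment_offset) default to 1. They can also be changed with MySQL statements, either globally or for one session: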

-- Global level
SET GLOBAL auto_increment_increment=50;
SET GLOBAL auto_increment_offset = 10;
-- Session level
SET SESSION auto_increment_increment=50;
SET SESSION auto_increment_offset = 10;

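However, after MySQL restarts these settings revert to the default value of 1, because values set by statement are held only in memory. To make them permanent, edit the MySQL server configuration file and add the step and offset settings: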

# Set the step to 100
auto-increment-increment=100
# Set the auto-increment offset to 6
auto-increment-offset=6

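With these settings, every table auto-increments its id from an initial value of 6 with a step of 100. This also avoids the problem that setting AUTO_INCREMENT=10 at the table level does not actually make the ids start from 10.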

1.2 Create the table
Taking a user table as an example:

CREATE TABLE `applet_user1` (
  `id` BIGINT (20) UNSIGNED NOT NULL AUTO_INCREMENT  COMMENT 'Primary key',
  `user_name` varchar(64) DEFAULT NULL COMMENT 'User name',
  `secret` varchar(64) DEFAULT NULL COMMENT 'User password',
  `status` int(11) DEFAULT '1' COMMENT 'Status, default 1',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8mb4 COMMENT='User table';

The id column is an unsigned BIGINT, which is 64 bits wide; the unsigned range is 0 to 2^64 - 1.
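As a quick check of that range: an unsigned 64-bit column tops out at 2^64 - 1, which is larger than Java's Long.MAX_VALUE (2^63 - 1). The sketch below uses only the JDK; the class and method names are made up for this illustration.

```java
import java.math.BigInteger;

class UnsignedBigintRange {
    // Largest value a 64-bit unsigned column can hold: 2^64 - 1.
    static BigInteger maxUnsignedBigint() {
        return BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println("BIGINT UNSIGNED max = " + maxUnsignedBigint());
        System.out.println("Java long max       = " + Long.MAX_VALUE);
        // The all-ones bit pattern, read as an unsigned number, is the same 2^64 - 1.
        System.out.println("unsigned(-1L)       = " + Long.toUnsignedString(-1L));
    }
}
```

Ids that stay within 63 bits, such as the snowflake ids discussed later, fit in a signed Java long without any unsigned handling.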

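Note: the step and auto-increment offset provided by MySQL cannot be set for a single table or a single database. They are system variables that apply to the whole server (or to one session). To use the offset-and-step approach across several databases, deploy a separate MySQL instance for each database and configure the step and offset on each instance.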
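To see why distinct offsets keep the instances apart, here is a minimal model of the sequence each instance produces: offset, offset + step, offset + 2 * step, and so on. It assumes empty tables and an offset no larger than the step (MySQL ignores an offset that is greater than the step). The class name, method names and the choice of three instances are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AutoIncrementPlan {
    // First `count` ids for one instance: offset, offset + increment, offset + 2 * increment, ...
    static List<Long> ids(long offset, long increment, int count) {
        List<Long> result = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            result.add(offset + n * increment);
        }
        return result;
    }

    // True when no id appears in more than one instance's list.
    static boolean disjoint(List<List<Long>> perInstance) {
        Set<Long> seen = new HashSet<>();
        for (List<Long> ids : perInstance) {
            for (Long id : ids) {
                if (!seen.add(id)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long increment = 3; // hypothetical: three MySQL instances, so the step is 3
        List<List<Long>> all = new ArrayList<>();
        for (long offset = 1; offset <= increment; offset++) {
            List<Long> ids = ids(offset, increment, 5);
            all.add(ids);
            System.out.println("instance with offset " + offset + " -> " + ids);
        }
        System.out.println("disjoint = " + disjoint(all));
    }
}
```

The step has to be at least the number of instances; with a smaller step, two instances would eventually land on the same value.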

2 Generating ids in the application:

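Application-side id generation is more flexible. Besides the well-known UUID, the common choices are the snowflake algorithm and database segment ids. The following first introduces the ids produced by the MyBatis-Plus snowflake algorithm, and then improves the way the workId is generated.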

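2.1 Introduction to the MyBatis-Plus snowflake id algorithm:
A MyBatis-Plus snowflake id is built from a 41-bit millisecond timestamp + 10 machine bits (5-bit worker id + 5-bit datacenter id) + a 12-bit sequence number. When the ids requested within the same millisecond exhaust the 12-bit sequence, the generator blocks until the next millisecond and then continues.
The snowflake id generation algorithm:
By default MyBatis-Plus generates ids through the nextId method of the DefaultIdentifierGenerator class: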

package com.baomidou.mybatisplus.core.incrementer;

import com.baomidou.mybatisplus.core.toolkit.Sequence;

public class DefaultIdentifierGenerator implements IdentifierGenerator {
    private final Sequence sequence;

    public DefaultIdentifierGenerator() {
        // Build the id generator (used by default)
        this.sequence = new Sequence();
    }

    public DefaultIdentifierGenerator(long workerId, long dataCenterId) {
        // Build the id generator with a custom worker id and datacenter id
        this.sequence = new Sequence(workerId, dataCenterId);
    }

    public DefaultIdentifierGenerator(Sequence sequence) {
        this.sequence = sequence;
    }

    public Long nextId(Object entity) {
        // Generate the id
        return this.sequence.nextId();
    }
}

The id generation itself is delegated to Sequence:

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package com.baomidou.mybatisplus.core.toolkit;

import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.ibatis.logging.Log;
import org.apache.ibatis.logging.LogFactory;

public class Sequence {
    private static final Log logger = LogFactory.getLog(Sequence.class);
    private final long twepoch = 1288834974657L;
    private final long workerIdBits = 5L;
    private final long datacenterIdBits = 5L;
    private final long maxWorkerId = 31L;
    private final long maxDatacenterId = 31L;
    private final long sequenceBits = 12L;
    private final long workerIdShift = 12L;
    private final long datacenterIdShift = 17L;
    private final long timestampLeftShift = 22L;
    private final long sequenceMask = 4095L;
    private final long workerId;
    private final long datacenterId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    public Sequence() {
        this.datacenterId = getDatacenterId(31L);
        this.workerId = getMaxWorkerId(this.datacenterId, 31L);
    }

    public Sequence(long workerId, long datacenterId) {
        Assert.isFalse(workerId > 31L || workerId < 0L, String.format("worker Id can't be greater than %d or less than 0", 31L), new Object[0]);
        Assert.isFalse(datacenterId > 31L || datacenterId < 0L, String.format("datacenter Id can't be greater than %d or less than 0", 31L), new Object[0]);
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    protected static long getMaxWorkerId(long datacenterId, long maxWorkerId) {
        StringBuilder mpid = new StringBuilder();
        mpid.append(datacenterId);
        String name = ManagementFactory.getRuntimeMXBean().getName();
        if (StringUtils.isNotBlank(name)) {
            mpid.append(name.split("@")[0]);
        }

        return (long)(mpid.toString().hashCode() & '\uffff') % (maxWorkerId + 1L);
    }

    protected static long getDatacenterId(long maxDatacenterId) {
        long id = 0L;

        try {
            InetAddress ip = InetAddress.getLocalHost();
            NetworkInterface network = NetworkInterface.getByInetAddress(ip);
            if (network == null) {
                id = 1L;
            } else {
                byte[] mac = network.getHardwareAddress();
                if (null != mac) {
                    id = (255L & (long)mac[mac.length - 2] | 65280L & (long)mac[mac.length - 1] << 8) >> 6;
                    id %= maxDatacenterId + 1L;
                }
            }
        } catch (Exception var7) {
            logger.warn(" getDatacenterId: " + var7.getMessage());
        }

        return id;
    }

    public synchronized long nextId() {
        // Read the current system timestamp
        long timestamp = this.timeGen();
        if (timestamp < this.lastTimestamp) {
            // The current timestamp is earlier than the one used for the last id,
            // so the clock has moved backwards (the system time was set to an earlier time)
            long offset = this.lastTimestamp - timestamp;
            if (offset > 5L) {
                // A rollback of more than 5 ms is rejected with an exception
                throw new RuntimeException(String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", offset));
            }
            // For a small rollback, wait offset * 2 milliseconds and then read the time again
            try {
                this.wait(offset << 1);
                timestamp = this.timeGen();
                if (timestamp < this.lastTimestamp) {
                    throw new RuntimeException(String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", offset));
                }
            } catch (Exception var6) {
                throw new RuntimeException(var6);
            }
        }
        // The id is requested within the same millisecond as the previous one
        if (this.lastTimestamp == timestamp) {
            // Take the next value of the 12-bit sequence
            this.sequence = this.sequence + 1L & 4095L;
            if (this.sequence == 0L) {
                // The 12-bit sequence is used up for this millisecond, so spin until the next millisecond
                timestamp = this.tilNextMillis(this.lastTimestamp);
            }
        } else {
            // A new millisecond: start the sequence at a random value of 1 or 2 (the upper bound 3 is exclusive)
            this.sequence = ThreadLocalRandom.current().nextLong(1L, 3L);
        }
        // Record the timestamp
        this.lastTimestamp = timestamp;
        // Combine the fields bitwise into the 64-bit id
        return timestamp - 1288834974657L << 22 | this.datacenterId << 17 | this.workerId << 12 | this.sequence;
    }

    protected long tilNextMillis(long lastTimestamp) {
        long timestamp;
        for(timestamp = this.timeGen(); timestamp <= lastTimestamp; timestamp = this.timeGen()) {
        }

        return timestamp;
    }

    protected long timeGen() {
        return SystemClock.now();
    }
}
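The bit layout used in the return statement of nextId can be inverted to inspect an existing id. The sketch below reuses the constants from Sequence (epoch 1288834974657, shifts 22 / 17 / 12, masks 31 and 4095); the class name, the method names and the sample values in main are hypothetical and not part of MyBatis-Plus.

```java
class SnowflakeLayout {
    static final long TWEPOCH = 1288834974657L; // same value as Sequence.twepoch

    // Pack the four fields the same way the return statement of nextId does.
    static long compose(long timestampMillis, long datacenterId, long workerId, long sequence) {
        return (timestampMillis - TWEPOCH) << 22 | datacenterId << 17 | workerId << 12 | sequence;
    }

    // Bits 22 and up: milliseconds since TWEPOCH.
    static long timestampOf(long id) {
        return (id >> 22) + TWEPOCH;
    }

    // Bits 17-21: datacenter id (5 bits).
    static long datacenterOf(long id) {
        return (id >> 17) & 31L;
    }

    // Bits 12-16: worker id (5 bits).
    static long workerOf(long id) {
        return (id >> 12) & 31L;
    }

    // Bits 0-11: per-millisecond sequence (12 bits).
    static long sequenceOf(long id) {
        return id & 4095L;
    }

    public static void main(String[] args) {
        long id = compose(System.currentTimeMillis(), 6L, 8L, 1L);
        System.out.println("id         = " + id);
        System.out.println("timestamp  = " + timestampOf(id));
        System.out.println("datacenter = " + datacenterOf(id));
        System.out.println("worker     = " + workerOf(id));
        System.out.println("sequence   = " + sequenceOf(id));
    }
}
```

Two ids can only be equal when all four fields match, which is why the 10 machine bits matter once several instances generate ids in the same millisecond.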

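2.2 The workId problem in the MyBatis-Plus snowflake algorithm:
The code shows that a single service instance never produces the same id twice. Today, however, services are usually deployed as clusters of Docker containers. If one service, such as an order service, is deployed as 5 instances on identically configured machines, the 10 machine bits used for id generation may come out the same in several instances. If those instances then insert at the same moment, the ids collide and the inserts fail.
Since the duplicate ids are caused by identical 10-bit machine ids, the fix is to make sure that each instance obtains a different workId.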
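To make the collision concrete, here is a small sketch that repeats the arithmetic of getMaxWorkerId with the process id passed in as a parameter instead of being read from the runtime. The inputs are hypothetical: it assumes two containers that both resolve datacenterId 1 and both run the JVM as PID 1, which is common when the application is the container's main process. The class and method names are made up for this illustration.

```java
class DefaultWorkerIdCollision {
    // Same arithmetic as Sequence.getMaxWorkerId: hash of (datacenterId + pid),
    // reduced to the range 0..maxWorkerId.
    static long workerId(long datacenterId, String pid, long maxWorkerId) {
        String mpid = datacenterId + pid;
        return (mpid.hashCode() & 0xffff) % (maxWorkerId + 1L);
    }

    public static void main(String[] args) {
        // Hypothetical inputs for two containers of the same service.
        long containerA = workerId(1L, "1", 31L);
        long containerB = workerId(1L, "1", 31L);
        System.out.println("container A workerId = " + containerA);
        System.out.println("container B workerId = " + containerB);
        System.out.println("same machine bits    = " + (containerA == containerB));
    }
}
```

Because the result depends only on the datacenter id and the PID, instances that share both values always share the machine bits. Even with different inputs there are only 32 possible worker ids, so hash collisions are likely as well.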

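Optimization approach: give each service a range of machine numbers and hand the numbers out round-robin within that range. The worker id and datacenter id derived from the machine number override the MyBatis-Plus defaults, so every instance gets different machine bits and multiple instances can insert into the same table at the same time.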



import com.baomidou.mybatisplus.core.incrementer.DefaultIdentifierGenerator;
import org.apache.commons.lang3.StringUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.stereotype.Component;

import java.util.Collections;

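// Spring reads the configuration and registers the resulting bean in the container
@Component
public class IdWorkerConfig {
    // The name of each Spring service can be configured in bootstrap.yml:
    // spring:
    //   application:
    //     # application name
    //     name: xxxx
    @Value("${spring.application.name}")
    private String applicationName;
    // Each service configures its own range of machine numbers. The size of the range
    // is the maximum number of instances the service can run, and both values must
    // lie within 0-1023 to fit the 10 machine bits, e.g.:
    // snowid:
    //   start: 200
    //   end: 249
    @Value("${snowid.start}")
    private Integer start;
    @Value("${snowid.end}")
    private Integer end;
    // Redis template bean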
    @Autowired
    public RedisTemplate redisTemplate;


    /**
     * Custom workerId, so that ids generated by this application do not repeat.
     *
     * @return the new id generator
     */
    @Bean
    public DefaultIdentifierGenerator defaultIdentifierGenerator() {
        String MAX_ID = applicationName + "-worker-id";
        // Obtain the machine number
        Long maxId = this.getWorkerId(MAX_ID);
        String maxIdStr = Long.toBinaryString(maxId);
        // Left-pad the binary string to 10 digits
        maxIdStr = StringUtils.leftPad(maxIdStr, 10, "0");

        // Split it in the middle
        String datacenterStr = maxIdStr.substring(0, 5);
        String workerStr = maxIdStr.substring(5, 10);

        // Convert the two halves into dataCenterId and workerId
        long dataCenterId = Integer.parseInt(datacenterStr, 2);
        long workerId = Integer.parseInt(workerStr, 2);
        // Override the default MyBatis-Plus worker id and datacenter id
        return new DefaultIdentifierGenerator(workerId, dataCenterId);
    }

    /**
     * Obtains the machine number through a Lua script. The script reads, advances and stores
     * the counter in one atomic step, so instances that start at the same time get different values.
     *
     * @param key Redis key built from the name of the current microservice
     * @return the machine number, within [start, end]
     */
    private Long getWorkerId(String key) {
        // Size of the configured range
        int length = end - start + 1;
        // start and length are plain integers, so they are inlined into the script text
        String luaStr = "local current = tonumber(redis.call('get', KEYS[1]))\n" +
                "local workerId = " + start + "\n" +
                "if current then\n" +
                "    -- move to the next number, wrapping back to start after end\n" +
                "    workerId = " + start + " + (current + 1 - " + start + ") % " + length + "\n" +
                "end\n" +
                "redis.call('set', KEYS[1], workerId)\n" +
                "return workerId";
        DefaultRedisScript<Long> redisScript = new DefaultRedisScript<>();
        redisScript.setScriptText(luaStr);
        // Alternatively, load the script from a classpath resource:
        //redisScript.setScriptSource(new ResourceScriptSource(new ClassPathResource("redis/redis_worker_id.lua")));
        redisScript.setResultType(Long.class);
        return (Long) redisTemplate.execute(redisScript, Collections.singletonList(key));
    }


}
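The string-based split in defaultIdentifierGenerator (pad to 10 binary digits, then cut into two halves of 5) is equivalent to a shift and a mask. A minimal sketch of that equivalence follows; the class and method names are hypothetical, and the sample values 200 and 249 are taken from the snowid range shown in the configuration comment.

```java
class MachineNumberSplit {
    // Upper 5 of the 10 machine bits: the datacenter id.
    static long datacenterId(long machineNumber) {
        return (machineNumber >> 5) & 31L;
    }

    // Lower 5 of the 10 machine bits: the worker id.
    static long workerId(long machineNumber) {
        return machineNumber & 31L;
    }

    public static void main(String[] args) {
        for (long machineNumber : new long[] {200L, 249L}) {
            System.out.println(machineNumber
                    + " -> datacenterId=" + datacenterId(machineNumber)
                    + ", workerId=" + workerId(machineNumber));
        }
    }
}
```

Either way, the machine number has to stay within 0-1023. A larger value does not fit in 10 bits, and the string version would then cut the padded binary string in the wrong place.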

Then, in the corresponding entity, change the id so that it is generated by the snowflake algorithm:

@TableId(value = "id", type = IdType.ASSIGN_ID )
	private String id;

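The optimized scheme generates globally unique ids and lets several service instances insert into the same table at the same time. Note, however, that machine numbers are handed out round-robin within the configured range. Suppose the range is 0-49 and the service runs 5 instances. In the extreme case where one instance is never restarted, the other 4 instances can be restarted at most (49 - 5) / 4 = 11 times. After more than 11 restarts, a newly assigned machine number repeats the one still held by the instance that never restarted, and global ids may then collide. To avoid this, every release should redeploy all instances of the service, so that each one obtains a new machine number when it restarts.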
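The figure of 11 restarts can be checked with a small simulation. It is a simplified model, not the Redis code: machine numbers are handed out one after another and wrap back to the start of the range, the first instance to start keeps its number forever, and the remaining instances are all restarted together in each round. All names here are hypothetical.

```java
class WorkerIdWrapSimulation {
    // Next machine number in [start, end], wrapping back to start after end.
    static int next(int current, int start, int end) {
        int length = end - start + 1;
        return start + Math.floorMod(current + 1 - start, length);
    }

    // Number of complete restart rounds of the other instances before one of them
    // is handed the machine number still held by the instance that never restarts.
    static int safeRestartRounds(int start, int end, int instances) {
        int length = end - start + 1;
        if (instances < 2 || instances > length) {
            throw new IllegalArgumentException("instances must be between 2 and the range size");
        }
        int pinned = start;   // number held by the instance that never restarts
        int current = pinned;
        for (int i = 1; i < instances; i++) {
            current = next(current, start, end);   // initial start of the other instances
        }
        int rounds = 0;
        while (true) {
            for (int i = 1; i < instances; i++) {
                current = next(current, start, end);
                if (current == pinned) {
                    return rounds;   // this round would reuse the pinned number
                }
            }
            rounds++;
        }
    }

    public static void main(String[] args) {
        System.out.println("safe restart rounds = " + safeRestartRounds(0, 49, 5));
    }
}
```

A wider range, or redeploying every instance on each release, pushes the wrap-around further away.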

