A Simple Design for Scan-Based Compensation with a RabbitMQ Local Message Table

Latest recommended article published on 2024-07-11 20:31:52

手握抽象


Article tags: rabbitmq, distributed systems

Article link: https://blog.csdn.net/qq_43479493/article/details/137338118


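1. Overall logic

2. Issues to consider

When implementing this feature, keep the following issues in mind:

Retry strategy: how do we design a reasonable retry mechanism that prevents service overload, and how do we handle messages that reach the maximum number of retries?
Concurrency control: how do we avoid sending duplicate messages when multiple instances of the scheduled job run at the same time?
Message dependencies and ordering: if messages depend on each other or must stay in order, how are those requirements met during retries?
Dead-letter handling: how do we handle messages that fail to send many times, for example by moving them to a dead-letter queue?
Time management: how do we handle message timeliness, including time differences across time zones?
Resource adjustment: how should the scheduled job adjust its resource usage as system load changes?
Performance and optimization: how do we optimize database query performance and reduce pressure on the database?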

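3. Approaches to these issues

3.1 Retry strategy

How do we design a reasonable retry mechanism that prevents service overload, and how do we handle messages that reach the maximum number of retries?

(1) Exponential backoff

Define an initial retry interval (e.g. 1 second). After each failed retry, double the interval (1 s, 2 s, 4 s, 8 s, ...). This is called exponential backoff because the retry interval grows exponentially.

(2) Cap the maximum retry interval

In general, we set a maximum retry interval (e.g. 60 seconds). This keeps the interval from growing so long that the system becomes slow to respond.

(3) Cap the maximum number of retries

Likewise, cap the number of retries so that a message that keeps failing is not retried forever. The message table in section 4.1 defaults max_retry_times to 5, and the scan query in section 4.4 skips messages that have reached that limit.

(4) Add jitter

Add a relatively small random amount to each computed retry interval. This keeps many failed requests from retrying at the same moment and lowers the risk of overload.

Common jitter implementations include "full jitter" and "decorrelated jitter".

"Full jitter" sets each retry interval to a random value in the range [0, exponential backoff value].

"Decorrelated jitter" picks a random value from a range around the exponential backoff value.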
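
The four points above can be combined in a few lines. The following is a minimal sketch, not the article's implementation: the class name BackoffSketch, its method names, and the millisecond units are all assumptions made for illustration. The jitterAround method follows the description of the second jitter variant given above (a random value in a range around the backoff value); other formulations of that variant exist.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: capped exponential backoff with two jitter variants. */
public class BackoffSketch {

    /** Capped exponential backoff: min(capMillis, initMillis * factor^attempt). */
    public static long cappedBackoff(long initMillis, int factor, int attempt, long capMillis) {
        double raw = initMillis * Math.pow(factor, attempt);
        // Math.pow can grow very large for big attempt numbers; min() keeps the result bounded.
        return (long) Math.min(raw, (double) capMillis);
    }

    /** Full jitter: a uniformly random delay in [0, cappedBackoff]. */
    public static long fullJitter(long initMillis, int factor, int attempt, long capMillis) {
        long ceiling = cappedBackoff(initMillis, factor, attempt, capMillis);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    /** Jitter around the backoff value: random in [b * (1 - ratio), b * (1 + ratio)], then capped. */
    public static long jitterAround(long initMillis, int factor, int attempt, long capMillis, double ratio) {
        long backoff = cappedBackoff(initMillis, factor, attempt, capMillis);
        long low = (long) (backoff * (1 - ratio));
        long high = (long) (backoff * (1 + ratio));
        long delay = ThreadLocalRandom.current().nextLong(low, high + 1);
        return Math.min(capMillis, delay);
    }

    public static void main(String[] args) {
        // 1 s initial interval, doubling, capped at 60 s, as in the examples in the text.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("attempt=%d capped=%d ms full=%d ms around=%d ms%n",
                    attempt,
                    cappedBackoff(1000, 2, attempt, 60_000),
                    fullJitter(1000, 2, attempt, 60_000),
                    jitterAround(1000, 2, attempt, 60_000, 0.2));
        }
    }
}
```

The maximum retry count is deliberately left to the caller here: in the design of this article it is stored per message (max_retry_times) rather than hard-coded in the backoff function.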

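3.2 Concurrency control

How do we avoid sending duplicate messages when multiple instances of the scheduled job run at the same time?

For multi-instance concurrency problems, the first thing that comes to mind is a distributed lock.

(1) Database-based lock

Use database row locks or table locks to implement a distributed lock.
The lock logic can be implemented by inserting or updating a specific record in a table.
Drawback: database performance bottlenecks can limit the performance of the lock.

(2) Redis-based lock

With an in-memory database such as Redis, a distributed lock can be implemented with the SETNX command or with RedLock (a distributed locking algorithm described in the official Redis documentation).
An expiry time is normally set on the lock so that a crashed holder cannot cause a deadlock.
Redisson is a Redis client implemented in Java that provides a convenient distributed-lock API.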
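
To make the "set if absent, with expiry" idea concrete, here is a minimal in-memory sketch of the semantics. It is not a Redis client and does not talk to Redis: the class ExpiringLockSketch, its methods, and the explicit nowMillis parameter are assumptions introduced purely for illustration. It also shows one common refinement: the lock value is a token unique to the holder, so that only the holder can release the lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model of a "set if absent, with expiry" lock; not a Redis client. */
public class ExpiringLockSketch {

    private static final class Entry {
        final String token;
        final long expiresAtMillis;

        Entry(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    /** Acquire only if the key is absent or the previous holder's lease has expired. */
    public boolean tryLock(String key, String token, long ttlMillis, long nowMillis) {
        Entry fresh = new Entry(token, nowMillis + ttlMillis);
        // compute() runs atomically per key, which stands in for the atomicity of SETNX.
        Entry result = store.compute(key, (k, current) ->
                (current == null || current.expiresAtMillis <= nowMillis) ? fresh : current);
        return result == fresh;
    }

    /** Release only if the caller still owns the lock, i.e. the stored token matches. */
    public boolean unlock(String key, String token) {
        boolean[] released = {false};
        store.computeIfPresent(key, (k, current) -> {
            if (current.token.equals(token)) {
                released[0] = true;
                return null; // returning null removes the entry
            }
            return current;
        });
        return released[0];
    }
}
```

The token check matters when a lease expires while its holder is still working: without it, the slow holder's unlock would delete a lock that another instance has since acquired.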

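(3) ZooKeeper-based lock

ZooKeeper can implement a distributed lock with its ephemeral sequential nodes.
When a process tries to acquire the lock, it creates an ephemeral sequential node under a specific lock node path.
The process then looks for the node with the smallest sequence number. If that node is the one it created, it holds the lock; otherwise it waits for the deletion event of the node immediately ahead of its own.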
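
The ordering rule in this recipe can be modelled without a ZooKeeper server. The sketch below is an in-memory illustration only: SequentialNodeLockSketch and its methods are hypothetical names, and real ephemeral nodes, sessions, and watches are replaced by a sorted map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory model of the sequential-node lock recipe; not a ZooKeeper client. */
public class SequentialNodeLockSketch {

    private final AtomicLong nextSequence = new AtomicLong();
    private final ConcurrentSkipListMap<Long, String> nodes = new ConcurrentSkipListMap<>();

    /** Models creating an ephemeral sequential node; returns its sequence number. */
    public long createNode(String owner) {
        long sequence = nextSequence.getAndIncrement();
        nodes.put(sequence, owner);
        return sequence;
    }

    /** The node with the smallest sequence number holds the lock. */
    public boolean holdsLock(long sequence) {
        Map.Entry<Long, String> first = nodes.firstEntry();
        return first != null && first.getKey() == sequence;
    }

    /** The node a waiter should watch: the one just ahead of its own, or null if it is first. */
    public Long predecessorOf(long sequence) {
        return nodes.lowerKey(sequence);
    }

    /** Models node deletion: an explicit unlock, or session loss for an ephemeral node. */
    public void deleteNode(long sequence) {
        nodes.remove(sequence);
    }
}
```

Each waiter watches only its immediate predecessor rather than the lock holder, so a release wakes up a single waiter instead of all of them.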

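(4) etcd-based lock

etcd is a distributed key-value store, similar to ZooKeeper.
It offers a more modern API and consistency guarantees on which a distributed lock can be built.
The lock logic is implemented with its compare-and-swap feature.

(5) Consul-based lock

Consul also provides distributed locking, through its session mechanism.
Create a session and associate it with a key; the lock is then implemented by watching that key.

(6) Locks based on Chubby, or on client libraries such as Apache Curator

Chubby is a distributed lock service developed by Google.
Apache Curator is a ZooKeeper client library that provides a higher-level API, including distributed lock implementations.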

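3.3 Message dependencies and ordering

Not yet written; to be added in a later update.

3.4 Dead-letter handling

Not yet written; to be added in a later update.

3.5 Time management

Not yet written; to be added in a later update.

3.6 Resource adjustment

Not yet written; to be added in a later update.

3.7 Performance and optimization

Not yet written; to be added in a later update.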

4. Code

4.1 Creating the message table

CREATE TABLE `transactional_message`
(
    id                  BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    create_time         DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP,
    edit_time           DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    deleted             TINYINT         NOT NULL DEFAULT 0,
    current_retry_times TINYINT         NOT NULL DEFAULT 0 COMMENT 'current retry count',
    max_retry_times     TINYINT         NOT NULL DEFAULT 5 COMMENT 'maximum retry count',
    next_schedule_time  DATETIME        NOT NULL COMMENT 'next scheduled time',
    message_status      TINYINT         NOT NULL DEFAULT 0 COMMENT 'message status',
    init_backoff        BIGINT UNSIGNED NOT NULL DEFAULT 10 COMMENT 'initial backoff value, in seconds',
    backoff_factor      TINYINT         NOT NULL DEFAULT 2 COMMENT 'backoff factor (the base of the exponential growth)',
    tx_no               VARCHAR(64)     NOT NULL COMMENT 'unique transaction number',
    tx_content          TEXT            NOT NULL COMMENT 'message content',
    ...
    INDEX idx_queue_name (queue_name),
    INDEX idx_create_time (create_time),
    INDEX idx_next_schedule_time (next_schedule_time),
    INDEX idx_business_key (business_key)
) COMMENT 'transactional message table';

This is the basic transactional message table; extension fields can be added on top of it.

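4.2 Scheduled scan job

import java.util.concurrent.TimeUnit;
import javax.annotation.Resource; // jakarta.annotation.Resource on Spring Boot 3+
import lombok.extern.slf4j.Slf4j;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Slf4j
@Configuration
@EnableScheduling
public class ScheduleJobAutoConfiguration {

    @Resource
    RedissonClient redissonClient;

    @Resource
    TransactionalMessageManagementService messageManagementService;

    @Scheduled(fixedDelay = 10000)
    public void transactionalMessageCompensationTask() throws Exception {
        RLock lock = redissonClient.getLock("transactionalMessageCompensationTask");
        // Wait up to 5 s for the lock and hold it for at most 300 s (the expected upper bound
        // on run time); tune both values to the actual workload
        boolean tryLock = lock.tryLock(5, 300, TimeUnit.SECONDS);
        if (tryLock) {
            try {
                long start = System.currentTimeMillis();
                log.info("Starting the transactional message compensation job...");
                messageManagementService.processPendingCompensationRecords();
                long end = System.currentTimeMillis();
                long delta = end - start;
                // Hold the lock for at least 5 s so it is not released too early
                if (delta < 5000) {
                    Thread.sleep(5000 - delta);
                }
                log.info("Transactional message compensation job finished, elapsed: {} ms...", delta);
            } finally {
                // The 300 s lease may already have expired; unlocking a lock that this
                // thread no longer holds would throw
                if (lock.isHeldByCurrentThread()) {
                    lock.unlock();
                }
            }
        }
    }

}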

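4.3 Transactional message service class

import java.time.LocalDateTime;
import java.util.Map;
import java.util.StringJoiner;
import java.util.stream.Collectors;
import javax.annotation.Resource; // jakarta.annotation.Resource on Spring Boot 3+
import lombok.extern.slf4j.Slf4j;
import org.springframework.amqp.rabbit.connection.CorrelationData;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Service;
// TransactionalMessage, TransactionalMessageContent, TxMessageStatus, ConfirmService and the
// two mappers come from the example project.

@Slf4j
@Service
public class TransactionalMessageManagementService {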
    @Resource
    RabbitTemplate rabbitTemplate;
    @Resource
    TransactionalMessageMapper messageMapper;
    @Resource
    TransactionalMessageContentMapper messageContentMapper;

    @Resource
    ConfirmService confirmService;

    private static final LocalDateTime END = LocalDateTime.of(2999, 1, 1, 0, 0, 0);
    private static final long DEFAULT_INIT_BACKOFF = 10L;
    private static final int DEFAULT_BACKOFF_FACTOR = 2;
    private static final int DEFAULT_MAX_RETRY_TIMES = 5;
    private static final int LIMIT = 100;


    public void saveTransactionalMessageRecord(TransactionalMessage record, String content) {
        record.setMessageStatus(TxMessageStatus.PENDING.getStatus());
        record.setNextScheduleTime(calculateNextScheduleTime(LocalDateTime.now(), DEFAULT_INIT_BACKOFF, DEFAULT_BACKOFF_FACTOR, 0));
        record.setCurrentRetryTimes(0);
        record.setInitBackoff(DEFAULT_INIT_BACKOFF);
        record.setBackoffFactor(DEFAULT_BACKOFF_FACTOR);
        record.setMaxRetryTimes(DEFAULT_MAX_RETRY_TIMES);
        messageMapper.insert(record);

        TransactionalMessageContent messageContent = new TransactionalMessageContent();
        messageContent.setContent(content);
        messageContent.setMessageId(record.getId());
        messageContentMapper.insert(messageContent);

    }

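    /**
     * Computes the next schedule time: base + initBackoff * backoffFactor^round seconds.
     *
     * @param base          base time
     * @param initBackoff   initial backoff value, in seconds
     * @param backoffFactor backoff factor (the base of the exponential growth)
     * @param round         retry round
     * @return LocalDateTime
     */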
    private LocalDateTime calculateNextScheduleTime(LocalDateTime base, long initBackoff, long backoffFactor, long round) {
        double delta = initBackoff * Math.pow(backoffFactor, round);
        return base.plusSeconds((long) delta);
    }

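    public void sendMessageSync(TransactionalMessage record, String content) {
        try {
            // Re-registering the same callback instance on every call is redundant;
            // it could be done once at startup instead
            rabbitTemplate.setConfirmCallback(confirmService);
            rabbitTemplate.setReturnsCallback(confirmService);

            rabbitTemplate.convertAndSend(record.getExchangeName(), record.getRoutingKey(), content,
                    new CorrelationData(record.getTxNo()));
            // Mark as successful. Note: returning without an exception only means the message was
            // handed to the client; the broker's ack/nack arrives later through the confirm callback
            markSuccess(record);
        } catch (Exception e) {
            // Mark as failed
            markFail(record, e);
        }
    }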

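    private void markSuccess(TransactionalMessage record) {
        log.info("Message sent successfully, target queue: {}", record.getQueueName());
        // Set the next schedule time to the sentinel maximum so the scan no longer picks it up
        record.setNextScheduleTime(END);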
        record.setCurrentRetryTimes(record.getCurrentRetryTimes().compareTo(record.getMaxRetryTimes()) >= 0 ?
                record.getMaxRetryTimes() : record.getCurrentRetryTimes() + 1);
        record.setMessageStatus(TxMessageStatus.SUCCESS.getStatus());
        record.setEditTime(LocalDateTime.now());
        messageMapper.updateStatusSelective(record);
    }

    private void markFail(TransactionalMessage record, Exception e) {
        log.error("Failed to send message, target queue: {}", record.getQueueName(), e);
        record.setCurrentRetryTimes(record.getCurrentRetryTimes().compareTo(record.getMaxRetryTimes()) >= 0 ?
                record.getMaxRetryTimes() : record.getCurrentRetryTimes() + 1);
        // Compute the next schedule time
        LocalDateTime nextScheduleTime = calculateNextScheduleTime(
                record.getNextScheduleTime(),
                record.getInitBackoff(),
                record.getBackoffFactor(),
                record.getCurrentRetryTimes()
        );
        record.setNextScheduleTime(nextScheduleTime);
        record.setMessageStatus(TxMessageStatus.FAIL.getStatus());
        record.setEditTime(LocalDateTime.now());
        messageMapper.updateStatusSelective(record);
    }

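    public void processPendingCompensationRecords() {
        // Skip anything scheduled within the last DEFAULT_INIT_BACKOFF seconds,
        // so that freshly saved messages are not pushed as well
        LocalDateTime max = LocalDateTime.now().minusSeconds(DEFAULT_INIT_BACKOFF);
        LocalDateTime min = max.minusHours(1);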
        Map<Long, TransactionalMessage> collect = messageMapper.queryPendingCompensationRecords(min, max, TxMessageStatus.SUCCESS.getStatus(), LIMIT)
                .stream()
                .collect(Collectors.toMap(TransactionalMessage::getId, x -> x));
        if (!collect.isEmpty()) {
            StringJoiner joiner = new StringJoiner(",", "(", ")");
            collect.keySet().forEach(x -> joiner.add(x.toString()));
            messageContentMapper.queryByMessageIds(joiner.toString())
                    .forEach(item -> {
                        TransactionalMessage message = collect.get(item.getMessageId());
                        sendMessageSync(message, item.getContent());
                    });
        }
    }
}
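
As a worked example of the scheduling arithmetic in this class, the sketch below replays saveTransactionalMessageRecord and markFail with the default values (init_backoff = 10, backoff_factor = 2, max_retry_times = 5), assuming every attempt made by the scanner fails. RetryTimelineSketch and its methods are hypothetical names introduced for illustration; offsets are in seconds from the moment the message is saved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that replays the scheduling arithmetic of the service class. */
public class RetryTimelineSketch {

    /** Same formula as calculateNextScheduleTime, expressed in seconds. */
    static long backoffSeconds(long initBackoff, long backoffFactor, long round) {
        return (long) (initBackoff * Math.pow(backoffFactor, round));
    }

    /** Offsets (seconds after save) of each next_schedule_time the scanner can still pick up. */
    public static List<Long> scheduledOffsets(long initBackoff, long backoffFactor, int maxRetryTimes) {
        List<Long> offsets = new ArrayList<>();
        // saveTransactionalMessageRecord: the first schedule uses round 0
        long offset = backoffSeconds(initBackoff, backoffFactor, 0);
        int retries = 0;
        // The scan query only returns rows with current_retry_times < max_retry_times
        while (retries < maxRetryTimes) {
            offsets.add(offset);
            // markFail: increment the counter (capped), then add the backoff for that
            // round to the previous schedule time
            retries = Math.min(retries + 1, maxRetryTimes);
            offset += backoffSeconds(initBackoff, backoffFactor, retries);
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(scheduledOffsets(10, 2, 5)); // prints [10, 30, 70, 150, 310]
    }
}
```

These are scheduled times, not actual send times: processPendingCompensationRecords only selects rows whose next_schedule_time is at least DEFAULT_INIT_BACKOFF seconds in the past, and the job itself runs with a 10-second fixed delay, so each attempt happens somewhat later than its offset. Note also that this schedule has no upper bound on the interval, unlike point (2) of section 3.1.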

4.4 Two accompanying SQL statements

<update id="updateStatusSelective" parameterType="entity.TransactionalMessage">
    UPDATE transactional_message
    <set>
        <if test="currentRetryTimes != null">
            current_retry_times = #{currentRetryTimes},
        </if>
        <if test="nextScheduleTime != null">
            next_schedule_time = #{nextScheduleTime},
        </if>
        <if test="messageStatus != null">
            message_status = #{messageStatus},
        </if>
    </set>
    WHERE id = #{id}
</update>
<select id="queryPendingCompensationRecords" resultType="entity.TransactionalMessage">
    SELECT
        <include refid="Base_Column_List"/>
    FROM transactional_message
    WHERE next_schedule_time &gt;= #{minScheduleTime}
      AND next_schedule_time &lt;= #{maxScheduleTime}
      AND message_status != #{status}
      AND current_retry_times &lt; max_retry_times
    LIMIT #{limit}
</select>

5. Example source code

GitHub: rabbitmq-local-message-table

This post is not yet complete; it will continue to be updated, so please stay tuned.
