RocketMQ Source Code Analysis: rebalance

0xZzzz

Original article: https://blog.csdn.net/heroqiang/article/details/104909583

Before you read

Methods marked with a /* */ comment in the code are analyzed in depth later in the article.

Main text

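rebalance is one of the most important steps in RocketMQ message consumption, and the name alone gives a rough idea of what it does. When we analyzed the Consumer startup flow, its last step triggered the rebalance service once. We use that call as the entry point:
MQClientInstance: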

public void rebalanceImmediately() {
    this.rebalanceService.wakeup();
}

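The rebalanceService is created when MQClientInstance is constructed and is started in MQClientInstance's start method. RebalanceService extends ServiceThread; wakeup releases the latch that the rebalanceService thread is waiting on, so the thread continues in its run method:
RebalanceService: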

public void run() {
    log.info(this.getServiceName() + " service started");
    while (!this.isStopped()) {
        this.waitForRunning(waitInterval);
        /* rebalance */
        this.mqClientFactory.doRebalance();
    }
    log.info(this.getServiceName() + " service end");
}
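
The wait/wakeup interplay above can be sketched in isolation. The class below is a hypothetical stand-in, not RocketMQ's ServiceThread: it uses a plain monitor (synchronized, wait, notifyAll) instead of a latch, and the name WakeableWaiter and the boolean return value are our own. It only illustrates why a wakeup takes effect at once while the loop otherwise runs once per wait interval.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the wait/wakeup part of a service thread.
class WakeableWaiter {
    private boolean notified = false;

    // Called from any thread: ends the current (or the next) wait early.
    synchronized void wakeup() {
        notified = true;
        notifyAll();
    }

    // Called by the service thread: blocks until wakeup() or until the interval elapses.
    // Returns true if a wakeup ended the wait, false on timeout or interruption.
    synchronized boolean waitForRunning(long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        try {
            while (!notified) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        notified = false; // consume the wakeup so the next wait blocks again
        return true;
    }

    public static void main(String[] args) {
        WakeableWaiter waiter = new WakeableWaiter();
        waiter.wakeup();
        System.out.println(waiter.waitForRunning(20_000)); // true: a wakeup was already pending
        System.out.println(waiter.waitForRunning(50));     // false: nobody woke us, the wait timed out
    }
}
```

In both cases the loop body runs afterwards; the wakeup only shortens the wait.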

MQClientInstance:

public void doRebalance() {
    for (Map.Entry<String, MQConsumerInner> entry : this.consumerTable.entrySet()) {
        MQConsumerInner impl = entry.getValue();
        if (impl != null) {
            try {
                /* iterate over the consumers and rebalance each one */
                impl.doRebalance();
            } catch (Throwable e) {
                log.error("doRebalance exception", e);
            }
        }
    }
}

Also during consumer startup, the consumer being started is registered in consumerTable, keyed by its consumerGroup.
DefaultMQPushConsumerImpl:

public void doRebalance() {
    if (!this.pause) {
        this.rebalanceImpl.doRebalance(this.isConsumeOrderly());
    }
}

RebalanceImpl:

public void doRebalance(final boolean isOrder) {
    Map<String, SubscriptionData> subTable = this.getSubscriptionInner();
    if (subTable != null) {
        for (final Map.Entry<String, SubscriptionData> entry : subTable.entrySet()) {
            final String topic = entry.getKey();
            try {
                /* iterate over the subscriptions and rebalance topic by topic */
                this.rebalanceByTopic(topic, isOrder);
            } catch (Throwable e) {
                if (!topic.startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
                    log.warn("rebalanceByTopic Exception", e);
                }
            }
        }
    }
    this.truncateMessageQueueNotMyTopic();
}

RebalanceImpl:

private void rebalanceByTopic(final String topic, final boolean isOrder) {
    switch (messageModel) {
        // broadcasting mode
        case BROADCASTING: {
            Set<MessageQueue> mqSet = this.topicSubscribeInfoTable.get(topic);
            if (mqSet != null) {
                /* update the table of queues being processed */
                boolean changed = this.updateProcessQueueTableInRebalance(topic, mqSet, isOrder);
                if (changed) {
                    // when the rebalance result changes, update the subscription version to notify the broker
                    this.messageQueueChanged(topic, mqSet, mqSet);
                    log.info("messageQueueChanged {} {} {} {}",
                        consumerGroup,
                        topic,
                        mqSet,
                        mqSet);
                }
            } else {
                log.warn("doRebalance, {}, but the topic[{}] not exist.", consumerGroup, topic);
            }
            break;
        }
        // clustering mode
        case CLUSTERING: {
            Set<MessageQueue> mqSet = this.topicSubscribeInfoTable.get(topic);
            // get the consumerId list by topic and consumerGroup
            List<String> cidAll = this.mQClientFactory.findConsumerIdList(topic, consumerGroup);
            if (null == mqSet) {
                if (!topic.startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
                    log.warn("doRebalance, {}, but the topic[{}] not exist.", consumerGroup, topic);
                }
            }
            if (null == cidAll) {
                log.warn("doRebalance, {} {}, get consumer id list failed", consumerGroup, topic);
            }
            if (mqSet != null && cidAll != null) {
                List<MessageQueue> mqAll = new ArrayList<MessageQueue>();
                mqAll.addAll(mqSet);
                Collections.sort(mqAll);
                Collections.sort(cidAll);
                // strategy for allocating queues among consumers, AllocateMessageQueueAveragely by default
                AllocateMessageQueueStrategy strategy = this.allocateMessageQueueStrategy;
                List<MessageQueue> allocateResult = null;
                try {
                    /* run the allocation algorithm */
                    allocateResult = strategy.allocate(
                        this.consumerGroup,
                        this.mQClientFactory.getClientId(),
                        mqAll,
                        cidAll);
                } catch (Throwable e) {
                    log.error("AllocateMessageQueueStrategy.allocate Exception. allocateMessageQueueStrategyName={}", strategy.getName(),
                        e);
                    return;
                }
                Set<MessageQueue> allocateResultSet = new HashSet<MessageQueue>();
                if (allocateResult != null) {
                    allocateResultSet.addAll(allocateResult);
                }
                /* update the table of queues being processed */
                boolean changed = this.updateProcessQueueTableInRebalance(topic, allocateResultSet, isOrder);
                if (changed) {
                    log.info(
                        "rebalanced result changed. allocateMessageQueueStrategyName={}, group={}, topic={}, clientId={}, mqAllSize={}, cidAllSize={}, rebalanceResultSize={}, rebalanceResultSet={}",
                        strategy.getName(), consumerGroup, topic, this.mQClientFactory.getClientId(), mqSet.size(), cidAll.size(),
                        allocateResultSet.size(), allocateResultSet);
                    // when the rebalance result changes, update the subscription version to notify the broker
                    this.messageQueueChanged(topic, mqSet, allocateResultSet);
                }
            }
            break;
        }
        default:
            break;
    }
}

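The two modes are handled differently. In broadcasting mode every consumer consumes all queues of the topic, so the set assigned to a consumer is the full queue set. In clustering mode the client takes all queues of the topic, fetches from a broker the IDs of all consumers in the consumerGroup (the topic is only used to locate a broker), and then lets the allocation strategy compute which queues this consumer should own. In other words, in clustering mode each consumer's queue list is decided by the allocation strategy.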
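
A small sketch may help show why both lists are sorted before allocation in clustering mode: every consumer runs the computation on its own, so all of them need the same ordering for their shares to be disjoint and complete. This is our reading rather than something the source states, and everything in the block is hypothetical: queues and consumer IDs are plain strings, and the round-robin split is only a placeholder, not AllocateMessageQueueAveragely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: queues and consumer IDs are plain strings.
class AssignmentModes {
    enum Mode { BROADCASTING, CLUSTERING }

    static List<String> assign(Mode mode, List<String> queues, List<String> consumerIds, String self) {
        List<String> sortedQueues = new ArrayList<>(queues);
        Collections.sort(sortedQueues);
        if (mode == Mode.BROADCASTING) {
            return sortedQueues; // every consumer handles every queue
        }
        List<String> sortedIds = new ArrayList<>(consumerIds);
        Collections.sort(sortedIds);
        List<String> mine = new ArrayList<>();
        int index = sortedIds.indexOf(self);
        if (index < 0) {
            return mine; // this consumer is not in the group's ID list
        }
        // Placeholder strategy (round-robin); the real strategy is pluggable.
        for (int i = index; i < sortedQueues.size(); i += sortedIds.size()) {
            mine.add(sortedQueues.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        // The two consumers receive the same data in different orders.
        List<String> seenByC1 = Arrays.asList("q2", "q0", "q3", "q1");
        List<String> seenByC2 = Arrays.asList("q1", "q3", "q0", "q2");
        List<String> ids = Arrays.asList("c2", "c1");
        System.out.println(assign(Mode.CLUSTERING, seenByC1, ids, "c1"));   // [q0, q2]
        System.out.println(assign(Mode.CLUSTERING, seenByC2, ids, "c2"));   // [q1, q3]
        System.out.println(assign(Mode.BROADCASTING, seenByC1, ids, "c1")); // [q0, q1, q2, q3]
    }
}
```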
RebalanceImpl:

private boolean updateProcessQueueTableInRebalance(final String topic, final Set<MessageQueue> mqSet,
    final boolean isOrder) {
    boolean changed = false;
    Iterator<Entry<MessageQueue, ProcessQueue>> it = this.processQueueTable.entrySet().iterator();
    while (it.hasNext()) {
        Entry<MessageQueue, ProcessQueue> next = it.next();
        MessageQueue mq = next.getKey();
        ProcessQueue pq = next.getValue();
        if (mq.getTopic().equals(topic)) {
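            // find queues that no longer belong to this consumer instance, or whose pull has expired
            if (!mqSet.contains(mq)) {
                pq.setDropped(true);
                // subclasses implement the removal of the unneeded message queue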
                if (this.removeUnnecessaryMessageQueue(mq, pq)) {
                    it.remove();
                    changed = true;
                    log.info("doRebalance, {}, remove unnecessary mq, {}", consumerGroup, mq);
                }
            } else if (pq.isPullExpired()) {
                switch (this.consumeType()) {
                    case CONSUME_ACTIVELY:
                        break;
                    case CONSUME_PASSIVELY:
                        pq.setDropped(true);
                        // subclasses implement the removal of the unneeded message queue
                        if (this.removeUnnecessaryMessageQueue(mq, pq)) {
                            it.remove();
                            changed = true;
                            log.error("[BUG]doRebalance, {}, remove unnecessary mq, {}, because pull is pause, so try to fixed it",
                                consumerGroup, mq);
                        }
                        break;
                    default:
                        break;
                }
            }
        }
    }
    List<PullRequest> pullRequestList = new ArrayList<PullRequest>();
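    // handle the queues newly assigned by this rebalance
    for (MessageQueue mq : mqSet) {
        // a queue that is not yet in processQueueTable is newly assigned
        if (!this.processQueueTable.containsKey(mq)) {
            // for ordered messages, try to lock the queue first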
            if (isOrder && !this.lock(mq)) {
                log.warn("doRebalance, {}, add a new mq failed, {}, because lock failed", consumerGroup, mq);
                continue;
            }
            this.removeDirtyOffset(mq);
            ProcessQueue pq = new ProcessQueue();
            /* compute the offset to start pulling from */
            long nextOffset = this.computePullFromWhere(mq);
            if (nextOffset >= 0) {
                ProcessQueue pre = this.processQueueTable.putIfAbsent(mq, pq);
                if (pre != null) {
                    log.info("doRebalance, {}, mq already exists, {}", consumerGroup, mq);
                } else {
                    log.info("doRebalance, {}, add a new mq, {}", consumerGroup, mq);
                    PullRequest pullRequest = new PullRequest();
                    pullRequest.setConsumerGroup(consumerGroup);
                    pullRequest.setNextOffset(nextOffset);
                    pullRequest.setMessageQueue(mq);
                    pullRequest.setProcessQueue(pq);
                    pullRequestList.add(pullRequest);
                    changed = true;
                }
            } else {
                log.warn("doRebalance, {}, add new mq failed, {}", consumerGroup, mq);
            }
        }
    }
    /* dispatch the pull requests */
    this.dispatchPullRequest(pullRequestList);
    return changed;
}
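
Stripped of its details, the method above computes two set differences for the topic: queues held but no longer assigned are dropped, and queues assigned but not yet held get a ProcessQueue and a PullRequest. The following is a minimal sketch of just that core, with queues as plain strings and hypothetical names (QueueDiff, toRemove, toAdd). It leaves out the expired-pull check, the lock for ordered messages and the offset computation.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the set arithmetic behind the method above.
class QueueDiff {
    // Queues currently held that the new assignment no longer contains.
    static Set<String> toRemove(Set<String> held, Set<String> assigned) {
        Set<String> result = new TreeSet<>(held);
        result.removeAll(assigned);
        return result;
    }

    // Queues in the new assignment that are not held yet.
    static Set<String> toAdd(Set<String> held, Set<String> assigned) {
        Set<String> result = new TreeSet<>(assigned);
        result.removeAll(held);
        return result;
    }

    public static void main(String[] args) {
        Set<String> held = new TreeSet<>(Arrays.asList("q0", "q1", "q2"));
        Set<String> assigned = new TreeSet<>(Arrays.asList("q1", "q2", "q3"));
        System.out.println(toRemove(held, assigned)); // [q0]
        System.out.println(toAdd(held, assigned));    // [q3]
    }
}
```

The changed flag of the real method corresponds to either set being non-empty, provided the removal or the addition actually succeeds.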

RebalancePushImpl:

public long computePullFromWhere(MessageQueue mq) {
    long result = -1;
    // defaults to CONSUME_FROM_LAST_OFFSET: start consuming from the last offset
    final ConsumeFromWhere consumeFromWhere = this.defaultMQPushConsumerImpl.getDefaultMQPushConsumer().getConsumeFromWhere();
    final OffsetStore offsetStore = this.defaultMQPushConsumerImpl.getOffsetStore();
    switch (consumeFromWhere) {
        case CONSUME_FROM_LAST_OFFSET_AND_FROM_MIN_WHEN_BOOT_FIRST:
        case CONSUME_FROM_MIN_OFFSET:
        case CONSUME_FROM_MAX_OFFSET:
        case CONSUME_FROM_LAST_OFFSET: {
            long lastOffset = offsetStore.readOffset(mq, ReadOffsetType.READ_FROM_STORE);
            if (lastOffset >= 0) {
                result = lastOffset;
            }
            // first pull: no offset has been stored yet
            else if (-1 == lastOffset) {
                // retry topic
                if (mq.getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
                    result = 0L;
                } else {
                    try {
                        // send a GET_MAX_OFFSET command to fetch the queue's max offset remotely; the processor is AdminBrokerProcessor
                        result = this.mQClientFactory.getMQAdminImpl().maxOffset(mq);
                    } catch (MQClientException e) {
                        result = -1;
                    }
                }
            } else {
                result = -1;
            }
            break;
        }
        case CONSUME_FROM_FIRST_OFFSET: {
            long lastOffset = offsetStore.readOffset(mq, ReadOffsetType.READ_FROM_STORE);
            if (lastOffset >= 0) {
                result = lastOffset;
            } else if (-1 == lastOffset) {
                result = 0L;
            } else {
                result = -1;
            }
            break;
        }
        case CONSUME_FROM_TIMESTAMP: {
            long lastOffset = offsetStore.readOffset(mq, ReadOffsetType.READ_FROM_STORE);
            if (lastOffset >= 0) {
                result = lastOffset;
            } else if (-1 == lastOffset) {
                if (mq.getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
                    try {
                        result = this.mQClientFactory.getMQAdminImpl().maxOffset(mq);
                    } catch (MQClientException e) {
                        result = -1;
                    }
                } else {
                    try {
                        long timestamp = UtilAll.parseDate(this.defaultMQPushConsumerImpl.getDefaultMQPushConsumer().getConsumeTimestamp(),
                            UtilAll.YYYYMMDDHHMMSS).getTime();
                        result = this.mQClientFactory.getMQAdminImpl().searchOffset(mq, timestamp);
                    } catch (MQClientException e) {
                        result = -1;
                    }
                }
            } else {
                result = -1;
            }
            break;
        }
        default:
            break;
    }
    return result;
}
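
The branches above boil down to a small decision table: a stored offset always wins, and the configured policy only matters on the first pull (stored offset of -1). Here is a condensed sketch with hypothetical names (PullStartOffset, Policy, compute); the two LongSupplier parameters stand in for the remote maxOffset and searchOffset calls. It does not model the MQClientException paths, which yield -1 in the original.

```java
import java.util.function.LongSupplier;

// Hypothetical condensation of the decision logic in computePullFromWhere.
class PullStartOffset {
    enum Policy { LAST_OFFSET, FIRST_OFFSET, TIMESTAMP }

    static long compute(Policy policy, long storedOffset, boolean retryTopic,
                        LongSupplier maxOffset, LongSupplier offsetAtTimestamp) {
        if (storedOffset >= 0) {
            return storedOffset; // resume where this group left off
        }
        if (storedOffset != -1) {
            return -1; // reading the stored offset failed
        }
        // First pull: nothing stored yet, so the policy decides.
        switch (policy) {
            case LAST_OFFSET:
                return retryTopic ? 0L : maxOffset.getAsLong();
            case FIRST_OFFSET:
                return 0L;
            case TIMESTAMP:
                return retryTopic ? maxOffset.getAsLong() : offsetAtTimestamp.getAsLong();
            default:
                return -1;
        }
    }

    public static void main(String[] args) {
        LongSupplier max = () -> 100L;   // pretend the queue's max offset is 100
        LongSupplier atTime = () -> 7L;  // pretend the timestamp maps to offset 7
        System.out.println(compute(Policy.LAST_OFFSET, 42, false, max, atTime));  // 42
        System.out.println(compute(Policy.LAST_OFFSET, -1, false, max, atTime));  // 100
        System.out.println(compute(Policy.LAST_OFFSET, -1, true, max, atTime));   // 0
        System.out.println(compute(Policy.TIMESTAMP, -1, false, max, atTime));    // 7
    }
}
```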

RebalancePushImpl:

public void dispatchPullRequest(List<PullRequest> pullRequestList) {
    for (PullRequest pullRequest : pullRequestList) {
        this.defaultMQPushConsumerImpl.executePullRequestImmediately(pullRequest);
        log.info("doRebalance, {}, add a new pull request {}", consumerGroup, pullRequest);
    }
}

DefaultMQPushConsumerImpl:

public void executePullRequestImmediately(final PullRequest pullRequest) {
    this.mQClientFactory.getPullMessageService().executePullRequestImmediately(pullRequest);
}

PullMessageService:

public void executePullRequestImmediately(final PullRequest pullRequest) {
    try {
        this.pullRequestQueue.put(pullRequest);
    } catch (InterruptedException e) {
        log.error("executePullRequestImmediately pullRequestQueue.put", e);
    }
}
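
The hand-off in the last method is a plain producer/consumer queue. The sketch below shows the pattern with pull requests reduced to strings. The class and method names are hypothetical, and the consuming side is an assumption, since the excerpts here only show the put.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of handing pull requests to another thread through a queue.
class PullRequestHandOff {
    private final BlockingQueue<String> pullRequestQueue = new LinkedBlockingQueue<>();

    // Producer side, mirroring the method above: enqueue and return.
    void executePullRequestImmediately(String pullRequest) {
        try {
            pullRequestQueue.put(pullRequest);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Consumer side (assumed): wait up to timeoutMillis for the next request, null if none arrives.
    String nextRequest(long timeoutMillis) {
        try {
            return pullRequestQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        PullRequestHandOff handOff = new PullRequestHandOff();
        handOff.executePullRequestImmediately("TopicA-queue0@offset=100");
        handOff.executePullRequestImmediately("TopicA-queue1@offset=0");
        System.out.println(handOff.nextRequest(100)); // TopicA-queue0@offset=100
        System.out.println(handOff.nextRequest(100)); // TopicA-queue1@offset=0
        System.out.println(handOff.nextRequest(100)); // null
    }
}
```

The rebalance thread therefore never pulls messages itself; it only decides which queues to pull from and where to start.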

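As we can see, the pullRequest object is ultimately put into a queue; the message pulling flow itself will be analyzed in a later article. Next we look at the allocation strategy, using the default "average hashing queue" algorithm, AllocateMessageQueueAveragely, as the example: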

public List<MessageQueue> allocate(String consumerGroup, String currentCID, List<MessageQueue> mqAll,
    List<String> cidAll) {
    if (currentCID == null || currentCID.length() < 1) {
        throw new IllegalArgumentException("currentCID is empty");
    }
    if (mqAll == null || mqAll.isEmpty()) {
        throw new IllegalArgumentException("mqAll is null or mqAll empty");
    }
    if (cidAll == null || cidAll.isEmpty()) {
        throw new IllegalArgumentException("cidAll is null or cidAll empty");
    }
    List<MessageQueue> result = new ArrayList<MessageQueue>();
    if (!cidAll.contains(currentCID)) {
        log.info("[BUG] ConsumerGroup: {} The consumerId: {} not in cidAll: {}",
            consumerGroup,
            currentCID,
            cidAll);
        return result;
    }
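    // index of the current consumerId in the list of all consumers
    int index = cidAll.indexOf(currentCID);
    // number of queues left over after dividing the queues evenly among the consumers
    int mod = mqAll.size() % cidAll.size();
    // number of queues this consumer should get:
    // if there are no more queues than consumers, each consumer gets at most one queue and some may get none;
    // otherwise the queues are split evenly, and when they do not divide evenly (mod > 0) the consumers with index < mod get one extra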
    int averageSize =
        mqAll.size() <= cidAll.size() ? 1 : (mod > 0 && index < mod ? mqAll.size() / cidAll.size()
            + 1 : mqAll.size() / cidAll.size());
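    // start position of this consumer's queues; the positions before it are left to consumers with smaller indexes:
    // if this consumer gets an extra queue (index < mod), so does every consumer before it, and the start is index * averageSize;
    // otherwise the first mod consumers each took one extra queue, so the start is index * averageSize + mod
    int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
    // number of queues actually taken: if fewer than averageSize remain, only the remainder is assigned
    // (zero or negative when there are fewer queues than consumers, in which case this consumer gets none)
    int range = Math.min(averageSize, mqAll.size() - startIndex);
    // collect the assigned queues by position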
    for (int i = 0; i < range; i++) {
        result.add(mqAll.get((startIndex + i) % mqAll.size()));
    }
    return result;
}
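
To make the arithmetic concrete, the sketch below applies the same formulas to queue positions only (the class name AverageAllocationDemo and the int-based signature are ours) and prints the result for 8 queues shared by 3 consumers, then for 2 queues shared by 3 consumers.

```java
import java.util.ArrayList;
import java.util.List;

// Worked example: the same arithmetic as allocate above, on queue positions only.
class AverageAllocationDemo {
    static List<Integer> allocate(int queueCount, int consumerCount, int index) {
        int mod = queueCount % consumerCount;
        int averageSize = queueCount <= consumerCount ? 1
            : (mod > 0 && index < mod ? queueCount / consumerCount + 1 : queueCount / consumerCount);
        int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
        int range = Math.min(averageSize, queueCount - startIndex);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < range; i++) {
            result.add((startIndex + i) % queueCount);
        }
        return result;
    }

    public static void main(String[] args) {
        // 8 queues, 3 consumers: mod = 2, so consumers 0 and 1 get 3 queues and consumer 2 gets 2.
        System.out.println(allocate(8, 3, 0)); // [0, 1, 2]
        System.out.println(allocate(8, 3, 1)); // [3, 4, 5]
        System.out.println(allocate(8, 3, 2)); // [6, 7]
        // 2 queues, 3 consumers: the last consumer's range is negative, so it gets nothing.
        System.out.println(allocate(2, 3, 0)); // [0]
        System.out.println(allocate(2, 3, 1)); // [1]
        System.out.println(allocate(2, 3, 2)); // []
    }
}
```

In the second case the third consumer stays idle, so running more consumers than a topic has queues adds no consumption capacity under this strategy.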