我们直接Consumer启动的源码开始看起,先看Consumer的启动方法start()
public synchronized void start() throws MQClientException {
switch (this.serviceState) {
//省略代码
//srart方法
mQClientFactory.start();
log.info("the consumer [{}] start OK.", this.defaultMQPushConsumer.getConsumerGroup());
this.serviceState = ServiceState.RUNNING;
break;
case RUNNING:
case START_FAILED:
case SHUTDOWN_ALREADY:
throw new MQClientException("The PushConsumer service state not OK, maybe started once, "
+ this.serviceState
+ FAQUrl.suggestTodo(FAQUrl.CLIENT_SERVICE_NOT_OK),
null);
default:
break;
}
this.updateTopicSubscribeInfoWhenSubscriptionChanged();
this.mQClientFactory.checkClientInBroker();
this.mQClientFactory.sendHeartbeatToAllBrokerWithLock();
//立即调用reblance方法
this.mQClientFactory.rebalanceImmediately();
}
Start方法挺复杂,我们的目的是Consumer的负载均衡,所以这里主要两个点针对两个地方:
- mQClientFactory.start()
- this.mQClientFactory.rebalanceImmediately();
先看mQClientFactory.start()方法
public void start() throws MQClientException {
synchronized (this) {
switch (this.serviceState) {
case CREATE_JUST:
this.serviceState = ServiceState.START_FAILED;
// If not specified,looking address from name server
if (null == this.clientConfig.getNamesrvAddr()) {
this.mQClientAPIImpl.fetchNameServerAddr();
}
// Start request-response channel
this.mQClientAPIImpl.start();
// Start various schedule tasks
this.startScheduledTask();
// Start pull service
this.pullMessageService.start();
// Start rebalance service
this.rebalanceService.start();
// Start push service
this.defaultMQProducer.getDefaultMQProducerImpl().start(false);
log.info("the client factory [{}] start OK", this.clientId);
this.serviceState = ServiceState.RUNNING;
break;
case RUNNING:
break;
case SHUTDOWN_ALREADY:
break;
case START_FAILED:
throw new MQClientException("The Factory object[" + this.getClientId() + "] has been created before, and failed.", null);
default:
break;
}
}
}
我们注意到其中的关键点Start rebalance service
// Start rebalance service
this.rebalanceService.start();
先看下他的父类ServiceThread
public void start() {
this.thread.start();
}
最后来看看RebalanceService.run()
@Override
public void run() {
log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
this.waitForRunning(waitInterval);
this.mqClientFactory.doRebalance();
}
log.info(this.getServiceName() + " service end");
}
可以看到,这里有一个线程无限循环调用this.mqClientFactory.doRebalance();方法,调用之前,this.waitForRunning(waitInterval);方法先等待20s,也就是说没每20s进行一次重新负载均衡,以便适应集群汇总Consumer的新增或者删除
上面还提到第二个点, this.mQClientFactory.rebalanceImmediately();
这段代码负责Consumer启动的时候就先进行一次负载均衡至此doRebalance方法的的调用时机已经很清晰了
接下来追进doRebalance方法是怎么做负载均衡的
public void doRebalance() {
for (Map.Entry<String, MQConsumerInner> entry : this.consumerTable.entrySet()) {
MQConsumerInner impl = entry.getValue();
if (impl != null) {
try {
impl.doRebalance();
} catch (Throwable e) {
log.error("doRebalance exception", e);
}
}
}
}
@Override
public void doRebalance() {
if (this.rebalanceImpl != null) {
this.rebalanceImpl.doRebalance(false);
}
}
继续调用doRebalance()方法
public void doRebalance(final boolean isOrder) {
Map<String, SubscriptionData> subTable = this.getSubscriptionInner();
if (subTable != null) {
for (final Map.Entry<String, SubscriptionData> entry : subTable.entrySet()) {
final String topic = entry.getKey();
try {
this.rebalanceByTopic(topic, isOrder);
} catch (Throwable e) {
if (!topic.startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
log.warn("rebalanceByTopic Exception", e);
}
}
}
}
this.truncateMessageQueueNotMyTopic();
}
这里也很清晰,需要继续查看rebalanceByTopic方法
private void rebalanceByTopic(final String topic, final boolean isOrder) {
switch (messageModel) {
case BROADCASTING: {
Set<MessageQueue> mqSet = this.topicSubscribeInfoTable.get(topic);
if (mqSet != null) {
boolean changed = this.updateProcessQueueTableInRebalance(topic, mqSet, isOrder);
if (changed) {
this.messageQueueChanged(topic, mqSet, mqSet);
log.info("messageQueueChanged {} {} {} {}",
consumerGroup,
topic,
mqSet,
mqSet);
}
} else {
log.warn("doRebalance, {}, but the topic[{}] not exist.", consumerGroup, topic);
}
break;
}
case CLUSTERING: {
Set<MessageQueue> mqSet = this.topicSubscribeInfoTable.get(topic);
List<String> cidAll = this.mQClientFactory.findConsumerIdList(topic, consumerGroup);
if (null == mqSet) {
if (!topic.startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
log.warn("doRebalance, {}, but the topic[{}] not exist.", consumerGroup, topic);
}
}
if (null == cidAll) {
log.warn("doRebalance, {} {}, get consumer id list failed", consumerGroup, topic);
}
if (mqSet != null && cidAll != null) {
List<MessageQueue> mqAll = new ArrayList<MessageQueue>();
mqAll.addAll(mqSet);
Collections.sort(mqAll);
Collections.sort(cidAll);
AllocateMessageQueueStrategy strategy = this.allocateMessageQueueStrategy;
List<MessageQueue> allocateResult = null;
try {
allocateResult = strategy.allocate(
this.consumerGroup,
this.mQClientFactory.getClientId(),
mqAll,
cidAll);
} catch (Throwable e) {
log.error("AllocateMessageQueueStrategy.allocate Exception. allocateMessageQueueStrategyName={}", strategy.getName(),
e);
return;
}
Set<MessageQueue> allocateResultSet = new HashSet<MessageQueue>();
if (allocateResult != null) {
allocateResultSet.addAll(allocateResult);
}
boolean changed = this.updateProcessQueueTableInRebalance(topic, allocateResultSet, isOrder);
if (changed) {
log.info(
"rebalanced result changed. allocateMessageQueueStrategyName={}, group={}, topic={}, clientId={}, mqAllSize={}, cidAllSize={}, rebalanceResultSize={}, rebalanceResultSet={}",
strategy.getName(), consumerGroup, topic, this.mQClientFactory.getClientId(), mqSet.size(), cidAll.size(),
allocateResultSet.size(), allocateResultSet);
this.messageQueueChanged(topic, mqSet, allocateResultSet);
}
}
break;
}
default:
break;
}
}
我们直接看集群的时候
- Set mqSet = this.topicSubscribeInfoTable.get(topic);
根据topic获得这个topic的messageQueue集合 - List cidAll = this.mQClientFactory.findConsumerIdList(topic, consumerGroup);
根据topic和group或者所有的Consumer实例
Collections.sort(mqAll); Collections.sort(cidAll);
排序消息队列和消费者数组,因为是在进行分配队列,排序后,各Client的顺序才能保持一致AllocateMessageQueueStrategy strategy = this.allocateMessageQueueStrategy;
默认选择的是org.apache.rocketmq.client.consumer.rebalance.AllocateMessageQueueAveragelyallocateResult = strategy.allocate( this.consumerGroup, this.mQClientFactory.getClientId(), mqAll, cidAll);
进行分配
那么就看看allocate是怎么使用策略进行负载均衡的
int index = cidAll.indexOf(currentCID);
//最简单的算法,取模
//比如mqAll.size()是4,代表4个queue。cidAll.size()是5,代表一个consumer,那么mod就是4
int mod = mqAll.size() % cidAll.size();
// 平均分配
// 4 <= 5 ? 1 : (4 > 0 && 1 < 4 ? 4 / 5 + 1 : 4 / 5)
int averageSize =
mqAll.size() <= cidAll.size() ? 1 : (mod > 0 && index < mod ? mqAll.size() / cidAll.size()
+ 1 : mqAll.size() / cidAll.size());
// 有余数的情况下,[0, mod) 平分余数,即每consumer多分配一个节点;第index开始,跳过前mod余数。
int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
// 分配队列数量。之所以要Math.min()的原因是,mqAll.size() <= cidAll.size(),部分consumer分配不到消息队列。
int range = Math.min(averageSize, mqAll.size() - startIndex);
for (int i = 0; i < range; i++) {
result.add(mqAll.get((startIndex + i) % mqAll.size()));
}
到这里,我们得知一个结论:
queue个数大于Consumer个数, 那么Consumer会平均分配queue
。queue个数小于Consumer个数,那么会有Consumer闲置,就是浪费掉了,其余Consumer平均分配到queue上
。一个queue只会被同一个groop下的一个consumer消费
queue选择算法也就是负载均衡算法有很多种可选择:
- AllocateMessageQueueAveragely:是前面讲的默认方式
- AllocateMessageQueueAveragelyByCircle:每个消费者依次消费一个partition,环状。
- AllocateMessageQueueConsistentHash:一致性hash算法
- AllocateMachineRoomNearby:就近元则,离的近的消费
- AllocateMessageQueueByConfig:是通过配置的方式