RocketMQ Series: How Ordered Messages Work (Part 8)

sharedCode

Last modified: 2022-05-31 13:49:19

First published: 2020-11-10 09:07:25

Article link: https://blog.csdn.net/u012394095/article/details/109591733

Preface

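This article is part of a series. Much of the source code was already covered in earlier articles, so the walkthrough often does not start from the entry point. The focus here is the client-side thread model for consuming ordered messages, using push mode as the reference.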

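RocketMQ's ordered messages rely on two mechanisms to ensure that a queue's messages are processed by only one thread at a time:

The broker maintains a global queue lock.
The client maintains a local queue lock, which guarantees that within one JVM application a queue is consumed by only one thread.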

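Let's recap the general principle behind RocketMQ's ordered messages. Messages inside a single queue are ordered by their offset, so as long as a queue is consumed by exactly one consumer, and by exactly one thread inside that consumer, the messages are consumed in order.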

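With that in mind, let's read the source with a few questions:

How is the broker's global lock maintained, and what is it for?
When consumers are added or removed, how is ordered consumption preserved?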

Broker global lock

When ConsumeMessageOrderlyService is initialized, it creates a scheduled executor:

public ConsumeMessageOrderlyService(DefaultMQPushConsumerImpl defaultMQPushConsumerImpl,
       // ... (code omitted)
        this.scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(new ThreadFactoryImpl("ConsumeMessageScheduledThread_"));
    }
                                    
// Start
 public void start() {
        if (MessageModel.CLUSTERING.equals(ConsumeMessageOrderlyService.
                                           this.defaultMQPushConsumerImpl.messageModel())) {
            this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    ConsumeMessageOrderlyService.this.lockMQPeriodically();
                }
            }, 1000 * 1, ProcessQueue.REBALANCE_LOCK_INTERVAL, TimeUnit.MILLISECONDS);
        }
    }

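The code above creates a single-threaded scheduled executor. When ConsumeMessageOrderlyService starts in clustering mode, it schedules a task that first runs after 1 second and then repeats every ProcessQueue.REBALANCE_LOCK_INTERVAL milliseconds (20 seconds by default in the client source, not once per second). The lockMQPeriodically method asks the broker to lock every queue the current client is consuming. This is where the global lock is acquired and renewed.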
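To make the timing concrete, here is a minimal stand-alone sketch of a task scheduled with an initial delay and a separate period. The class `LockRenewalDemo`, the method `runRenewals`, and the shortened intervals are illustrative assumptions and not RocketMQ code; the lambda merely stands in for lockMQPeriodically.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of RocketMQ.
class LockRenewalDemo {

    // Schedules a renewal task: first run after initialDelayMs, then one run
    // every periodMs. Blocks until the task has run `runs` times.
    static int runRenewals(long initialDelayMs, long periodMs, int runs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger renewals = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                // Stand-in for lockMQPeriodically(); runs past the target are ignored.
                if (done.getCount() > 0) {
                    renewals.incrementAndGet();
                    done.countDown();
                }
            }, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return renewals.get();
    }

    public static void main(String[] args) {
        // Shortened intervals so the demo finishes quickly.
        System.out.println("renewals: " + runRenewals(100, 200, 3));
    }
}
```

The second argument of scheduleAtFixedRate is only the delay before the first run; the third argument is what determines how often the lock is renewed afterwards.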

lockMQPeriodically

public synchronized void lockMQPeriodically() {
        if (!this.stopped) {
            this.defaultMQPushConsumerImpl.getRebalanceImpl().lockAll();
        }
    }

lockAll

public void lockAll() {
        // Group the queues this client holds by broker name
        HashMap<String, Set<MessageQueue>> brokerMqs = this.buildProcessQueueTableByBrokerName();

        Iterator<Entry<String, Set<MessageQueue>>> it = brokerMqs.entrySet().iterator();
        while (it.hasNext()) {
            Entry<String, Set<MessageQueue>> entry = it.next();
            final String brokerName = entry.getKey();
            final Set<MessageQueue> mqs = entry.getValue();

            if (mqs.isEmpty())
                continue;
            // Look up the master broker address
            FindBrokerResult findBrokerResult = this.mQClientFactory.
              findBrokerAddressInSubscribe(brokerName, MixAll.MASTER_ID, true);
            if (findBrokerResult != null) {
                LockBatchRequestBody requestBody = new LockBatchRequestBody();
                requestBody.setConsumerGroup(this.consumerGroup);
                requestBody.setClientId(this.mQClientFactory.getClientId());
                requestBody.setMqSet(mqs);

                try {
                    // Request the global lock from the broker
                    Set<MessageQueue> lockOKMQSet =
                      this.mQClientFactory.getMQClientAPIImpl().lockBatchMQ(
                      findBrokerResult.getBrokerAddr(), requestBody, 1000);
                    // Queues locked successfully: set locked = true on their ProcessQueue
                  for (MessageQueue mq : lockOKMQSet) {
                        ProcessQueue processQueue = this.processQueueTable.get(mq);
                        if (processQueue != null) {
                            if (!processQueue.isLocked()) {
                                log.info("the message queue locked OK, Group: {} {}", this.consumerGroup, mq);
                            }
                            processQueue.setLocked(true);
                            processQueue.setLastLockTimestamp(System.currentTimeMillis());
                        }
                    }
                    // Queues that failed to lock: set locked = false on their ProcessQueue
                    for (MessageQueue mq : mqs) {
                        if (!lockOKMQSet.contains(mq)) {
                            ProcessQueue processQueue = this.processQueueTable.get(mq);
                            if (processQueue != null) {
                                processQueue.setLocked(false);
                                log.warn("the message queue locked Failed, Group: {} {}", this.consumerGroup, mq);
                            }
                        }
                    }
                } catch (Exception e) {
                    log.error("lockBatchMQ exception, " + mqs, e);
                }
            }
        }
    }

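Send the queue lock request to the broker.
For each queue locked successfully, set locked = true on its ProcessQueue.
For each queue that failed to lock, set locked = false on its ProcessQueue.

The locked flag on ProcessQueue is checked frequently during ordered consumption.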
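The bookkeeping above can be sketched in isolation. This is a simplified model under stated assumptions: `ClientLockFlags`, `QueueState`, `applyLockResult`, and `canConsume` are hypothetical names, queues are plain strings, and the freshness window passed to the constructor (30 seconds in the usage below) is an assumed value rather than something taken from the code shown in this article.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simplification of the client-side lock bookkeeping.
class ClientLockFlags {
    static final class QueueState {
        boolean locked;
        long lastLockTimestamp;
    }

    private final Map<String, QueueState> table = new HashMap<>();
    private final long maxLiveMs;

    ClientLockFlags(long maxLiveMs, String... queues) {
        this.maxLiveMs = maxLiveMs;
        for (String q : queues) {
            table.put(q, new QueueState());
        }
    }

    // Apply the broker's answer: queues in lockOk are marked locked and their
    // timestamp is refreshed; every other requested queue is marked unlocked.
    void applyLockResult(Set<String> requested, Set<String> lockOk, long now) {
        for (String q : requested) {
            QueueState s = table.get(q);
            if (s == null) {
                continue;
            }
            if (lockOk.contains(q)) {
                s.locked = true;
                s.lastLockTimestamp = now;
            } else {
                s.locked = false;
            }
        }
    }

    // A queue may be consumed only while its flag is set and still fresh.
    boolean canConsume(String queue, long now) {
        QueueState s = table.get(queue);
        return s != null && s.locked && now - s.lastLockTimestamp <= maxLiveMs;
    }

    public static void main(String[] args) {
        ClientLockFlags flags = new ClientLockFlags(30_000, "q0", "q1");
        Set<String> requested = new HashSet<>(Arrays.asList("q0", "q1"));
        flags.applyLockResult(requested, Collections.singleton("q0"), 1_000);
        System.out.println(flags.canConsume("q0", 2_000));
        System.out.println(flags.canConsume("q1", 2_000));
    }
}
```

Time is passed in as a parameter so the behaviour is easy to check without sleeping.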

Locking on the broker

AdminBrokerProcessor is the entry point on the broker side. Find the RequestCode.LOCK_BATCH_MQ case in its switch statement, which leads to the lockBatchMQ method:

private RemotingCommand lockBatchMQ(ChannelHandlerContext ctx,
        RemotingCommand request) throws RemotingCommandException {
        final RemotingCommand response = RemotingCommand.createResponseCommand(null);
        LockBatchRequestBody requestBody = LockBatchRequestBody.decode(request.getBody(), LockBatchRequestBody.class);
        // Ask the global lock manager to lock the queues; it returns those locked successfully
        Set<MessageQueue> lockOKMQSet = this.brokerController.getRebalanceLockManager()
          .tryLockBatch(
            requestBody.getConsumerGroup(),
            requestBody.getMqSet(),
            requestBody.getClientId());
        // Return the successfully locked queues to the client
        LockBatchResponseBody responseBody = new LockBatchResponseBody();
        responseBody.setLockOKMQSet(lockOKMQSet);

        response.setBody(responseBody.encode());
        response.setCode(ResponseCode.SUCCESS);
        response.setRemark(null);
        return response;
    }

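Steps:

Call the global lock manager to lock the queues and collect the ones locked successfully.
Return the successfully locked queues to the client.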

RebalanceLockManager

public Set<MessageQueue> tryLockBatch(final String group, final Set<MessageQueue> mqs,
        final String clientId) {
        // Queues already locked by this client
        Set<MessageQueue> lockedMqs = new HashSet<MessageQueue>(mqs.size());
        // Queues not yet locked by this client
        Set<MessageQueue> notLockedMqs = new HashSet<MessageQueue>(mqs.size());

        for (MessageQueue mq : mqs) {
            // Is this queue already held by the current client? If so, isLocked also refreshes the hold time
            if (this.isLocked(group, mq, clientId)) {
                lockedMqs.add(mq);
            } else {
                notLockedMqs.add(mq);
            }
        }
        // There are queues still to be locked
        if (!notLockedMqs.isEmpty()) {
            try {
                // Serialize updates to the lock table
                this.lock.lockInterruptibly();
                try {
                    ConcurrentHashMap<MessageQueue, LockEntry> groupValue = this.
                      mqLockTable.get(group);
                    if (null == groupValue) {
                        groupValue = new ConcurrentHashMap<>(32);
                        this.mqLockTable.put(group, groupValue);
                    }
                    for (MessageQueue mq : notLockedMqs) {
                        // Look up the lock entry for this queue
                        LockEntry lockEntry = groupValue.get(mq);
                        if (null == lockEntry) {
                            // No entry means nobody holds the lock, so the current client takes it
                            lockEntry = new LockEntry();
                            lockEntry.setClientId(clientId);
                            groupValue.put(mq, lockEntry);
                        }

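                        // Held by the current client: refresh the last update time.
                        // (Ownership is checked again here because of the lock taken above: while this
                        // thread was waiting, an earlier thread may already have acquired the queue lock.)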
                        if (lockEntry.isLocked(clientId)) {
                            lockEntry.setLastUpdateTimestamp(System.currentTimeMillis());
                            lockedMqs.add(mq);
                            continue;
                        }

                        // Previous owner's client ID
                        String oldClientId = lockEntry.getClientId();

                        // Has the lock expired? It has if (now - last update time) exceeds 60 s,
                        // i.e. a lock that is not renewed within 60 s is released automatically
                        if (lockEntry.isExpired()) {
                            lockEntry.setClientId(clientId);
                            lockEntry.setLastUpdateTimestamp(System.currentTimeMillis());
                            log.warn(
                                "tryLockBatch, message queue lock expired, I got it. Group: {} OldClientId: {} NewClientId: {} {}",
                                group,
                                oldClientId,
                                clientId,
                                mq);
                            lockedMqs.add(mq);
                            continue;
                        }

                        // Failed to acquire the lock
                        log.warn(
                            "tryLockBatch, message queue locked by other client. Group: {} OtherClientId: {} NewClientId: {} {}",group,oldClientId, clientId,mq);
                    }
                } finally {
                    this.lock.unlock();
                }
            } catch (InterruptedException e) {
                log.error("putMessage exception", e);
            }
        }

        return lockedMqs;
    }

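Steps:

First, loop over the requested queues and check whether each one is already held by the current client. If it is, refresh its hold time. Queues the current client does not hold are collected in notLockedMqs.
A ConcurrentHashMap stores the locks. It is a two-level map keyed first by consumer group and then by message queue.
Loop over notLockedMqs and try to acquire the global lock for each queue with the following checks:
1. Look up the queue's lock entry in the global table mqLockTable. If there is none, the queue is not locked and the current client acquires the lock.
2. If the lock is held by the current client, refresh its last update time. (Ownership is checked again here because of the lock taken above: while this thread was waiting, an earlier thread may already have acquired the queue lock.)
3. Check whether the lock has expired, i.e. whether the current time minus the last update time exceeds 60 seconds. A lock that is not renewed within 60 seconds is released automatically and can be taken over.
4. If none of the above applies, the lock for this queue cannot be acquired.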
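The four checks can be condensed into a small stand-alone model. This is a sketch under stated assumptions, not the broker's implementation: `SimpleLockTable` and its `Entry` class are hypothetical, queues and groups are plain strings, the current time is passed in explicitly, and the lock-free fast path of the original is folded into the locked section.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of a broker-side queue lock table.
class SimpleLockTable {
    private static final class Entry {
        String clientId;
        long lastUpdate;
    }

    private final long maxLiveMs;
    private final ReentrantLock lock = new ReentrantLock();
    // consumer group -> (queue -> lock entry)
    private final Map<String, Map<String, Entry>> table = new ConcurrentHashMap<>();

    SimpleLockTable(long maxLiveMs) {
        this.maxLiveMs = maxLiveMs;
    }

    // Returns the subset of `queues` that `clientId` holds after this call.
    Set<String> tryLockBatch(String group, Set<String> queues, String clientId, long now) {
        Set<String> locked = new HashSet<>();
        lock.lock();
        try {
            Map<String, Entry> groupValue =
                table.computeIfAbsent(group, g -> new ConcurrentHashMap<>());
            for (String q : queues) {
                Entry e = groupValue.get(q);
                if (e == null) {
                    // 1. Nobody holds the lock: the caller takes it.
                    e = new Entry();
                    e.clientId = clientId;
                    e.lastUpdate = now;
                    groupValue.put(q, e);
                    locked.add(q);
                } else if (now - e.lastUpdate > maxLiveMs) {
                    // 3. Not renewed in time: the caller takes the lock over.
                    e.clientId = clientId;
                    e.lastUpdate = now;
                    locked.add(q);
                } else if (e.clientId.equals(clientId)) {
                    // 2. Still held by the caller: renew it.
                    e.lastUpdate = now;
                    locked.add(q);
                }
                // 4. Otherwise a live lock belongs to another client: not acquired.
            }
        } finally {
            lock.unlock();
        }
        return locked;
    }

    public static void main(String[] args) {
        SimpleLockTable table = new SimpleLockTable(60_000);
        Set<String> queues = new HashSet<>(Arrays.asList("q0", "q1"));
        System.out.println(table.tryLockBatch("group", queues, "A", 0));
        System.out.println(table.tryLockBatch("group", queues, "B", 30_000));
    }
}
```

The expiry branch is checked before the ownership branch here only for brevity; an expired lock is reassigned to whoever asks, including its previous owner, so the outcome matches the order used in the steps.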

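That covers how the global lock works in principle. Next, let's look at the client to see what the global lock is used for.

Recap of earlier articles

An earlier article in this series covered message handling in RocketMQ's push mode.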

public boolean putMessage(final List<MessageExt> msgs) {
        boolean dispatchToConsume = false;
        try {
            this.lockTreeMap.writeLock().lockInterruptibly();
            try {
                int validMsgCnt = 0;
                for (MessageExt msg : msgs) {
                     MessageExt old = msgTreeMap.put(msg.getQueueOffset(), msg);
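                    // TreeMap.put returns null if the key was absent. A non-null return means the
                    // key already existed: the new value replaces the old one and the old value is returned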
                    if (null == old) {
                        validMsgCnt++;
                        this.queueOffsetMax = msg.getQueueOffset();
                        msgSize.addAndGet(msg.getBody().length);
                    }
                }
                msgCount.addAndGet(validMsgCnt);
                // If the TreeMap is not empty and the queue is not being consumed, set dispatchToConsume to true
                if (!msgTreeMap.isEmpty() && !this.consuming) {
                    dispatchToConsume = true;
                    this.consuming = true;
                }

                if (!msgs.isEmpty()) {
                    MessageExt messageExt = msgs.get(msgs.size() - 1);
                    String property = messageExt.getProperty(MessageConst.PROPERTY_MAX_OFFSET);
                    if (property != null) {
                        long accTotal = Long.parseLong(property) - messageExt.getQueueOffset();
                        if (accTotal > 0) {
                            this.msgAccCnt = accTotal;
                        }
                    }
                }
            } finally {
                this.lockTreeMap.writeLock().unlock();
            }
        } catch (InterruptedException e) {
            log.error("putMessage exception", e);
        }

        return dispatchToConsume;
    }

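Steps:

1. The messages just pulled from the broker are put into the TreeMap inside ProcessQueue, with the queue offset as the key and the message as the value. This stores the messages in order, which is why messages in a single queue are consumed in order: everything is driven by the offset, and smaller offsets come first in the TreeMap.

2. If the map is not empty and the queue is not currently being consumed (consuming = false), dispatchToConsume is set to true. This flag exists mainly for ordered messages, to ensure that only one thread consumes a given queue.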
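The interplay between the sorted map and the consuming flag can be shown with a small model. This is a sketch under stated assumptions: `MiniProcessQueue`, `put`, and `take` are hypothetical names, message bodies are plain strings, and a `synchronized` method replaces the read-write lock used in the original.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of the per-queue message buffer.
class MiniProcessQueue {
    private final TreeMap<Long, String> msgTreeMap = new TreeMap<>();
    private boolean consuming = false;

    // Stores a message by offset. Returns true only when no consume task is
    // active, telling the caller to submit exactly one new task.
    synchronized boolean put(long offset, String body) {
        msgTreeMap.put(offset, body);
        if (!consuming) {
            consuming = true;
            return true;
        }
        return false;
    }

    // Removes and returns the message with the smallest offset, or null when
    // the buffer is empty, in which case the consuming flag is cleared.
    synchronized String take() {
        Map.Entry<Long, String> first = msgTreeMap.pollFirstEntry();
        if (first == null) {
            consuming = false;
            return null;
        }
        return first.getValue();
    }

    public static void main(String[] args) {
        MiniProcessQueue pq = new MiniProcessQueue();
        System.out.println(pq.put(5L, "e")); // first put asks for a consume task
        System.out.println(pq.put(3L, "c")); // a task is already pending
        System.out.println(pq.take());       // smallest offset comes out first
    }
}
```

Messages may arrive in any order, but they always leave in offset order, and only the put that finds the queue idle requests a consume task.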

Next, let's look at the consumption model for ordered messages.

ConsumeMessageOrderlyService

@Override
    public void submitConsumeRequest(
        final List<MessageExt> msgs,
        final ProcessQueue processQueue,
        final MessageQueue messageQueue,
        final boolean dispathToConsume) {
        if (dispathToConsume) {
            ConsumeRequest consumeRequest = new ConsumeRequest(processQueue, messageQueue);
            this.consumeExecutor.submit(consumeRequest);
        }
    }

A ConsumeRequest, i.e. a consume task, is built and submitted to the thread pool only when dispathToConsume is true.

Now let's look at the run method of ConsumeRequest.

 @Override
        public void run() {
            if (this.processQueue.isDropped()) {
                log.warn("run, the message queue not be able to consume, because it's dropped. {}", this.messageQueue);
                return;
            }
            // Fetch the local lock object for this messageQueue; execution continues only once the lock is held
            final Object objLock = messageQueueLock.fetchLockObject(this.messageQueue);
            synchronized (objLock) {
                // Broadcasting mode, or (the global queue lock is held && the lock has not expired)
                if (MessageModel.BROADCASTING.equals(ConsumeMessageOrderlyService.this.defaultMQPushConsumerImpl.messageModel())
                    || (this.processQueue.isLocked() && !this.processQueue.isLockExpired())) {
                    final long beginTime = System.currentTimeMillis();
                    for (boolean continueConsume = true; continueConsume; ) {
                        if (this.processQueue.isDropped()) {
                            break;
                        }
                        // The global lock is not held
                        if (MessageModel.CLUSTERING.equals(ConsumeMessageOrderlyService.this.defaultMQPushConsumerImpl.messageModel())
                            && !this.processQueue.isLocked()) {
                            ConsumeMessageOrderlyService.this.tryLockLaterAndReconsume(this.messageQueue, this.processQueue, 10);
                            break;
                        }
                        // The global lock has expired
                        if (MessageModel.CLUSTERING.equals(ConsumeMessageOrderlyService.this.defaultMQPushConsumerImpl.messageModel())
                            && this.processQueue.isLockExpired()) {
                            log.warn("the message queue lock expired, so consume later, {}", this.messageQueue);
                            ConsumeMessageOrderlyService.this.tryLockLaterAndReconsume(this.messageQueue, this.processQueue, 10);
                            break;
                        }
                        // Maximum duration of a single run of the consume task. Configurable with
                        // -Drocketmq.client.maxTimeConsumeContinuously; the default is 60 s.
                        long interval = System.currentTimeMillis() - beginTime;
                        if (interval > MAX_TIME_CONSUME_CONTINUOUSLY) {
                            ConsumeMessageOrderlyService.this.submitConsumeRequestLater(processQueue, messageQueue, 10);
                            break;
                        }
                        // Consume batch size. The default is 1, i.e. one message at a time; it can be raised in production
                        final int consumeBatchSize =
                            ConsumeMessageOrderlyService.this.defaultMQPushConsumer.getConsumeMessageBatchMaxSize();

                        // Take up to that many messages from the TreeMap, in offset order
                        List<MessageExt> msgs = this.processQueue.takeMessags(consumeBatchSize);
                        // Fix up topics for messages that arrived via the retry topic, so that retried
                        // messages are handed to the listener correctly
                        defaultMQPushConsumerImpl.resetRetryAndNamespace(msgs, defaultMQPushConsumer.getConsumerGroup());
                        if (!msgs.isEmpty()) {
                            final ConsumeOrderlyContext context = new ConsumeOrderlyContext(this.messageQueue);

                            ConsumeOrderlyStatus status = null;

                            ConsumeMessageContext consumeMessageContext = null;
                            if (ConsumeMessageOrderlyService.this.defaultMQPushConsumerImpl.hasHook()) {
                                consumeMessageContext = new ConsumeMessageContext();
                                consumeMessageContext
                                    .setConsumerGroup(ConsumeMessageOrderlyService.this.defaultMQPushConsumer.getConsumerGroup());
                                consumeMessageContext.setNamespace(defaultMQPushConsumer.getNamespace());
                                consumeMessageContext.setMq(messageQueue);
                                consumeMessageContext.setMsgList(msgs);
                                consumeMessageContext.setSuccess(false);
                                // init the consume context type
                                consumeMessageContext.setProps(new HashMap<String, String>());
                                ConsumeMessageOrderlyService.this.defaultMQPushConsumerImpl.executeHookBefore(consumeMessageContext);
                            }
                            // Record the start time
                            long beginTimestamp = System.currentTimeMillis();
                            ConsumeReturnType returnType = ConsumeReturnType.SUCCESS;
                            // Tracks whether the listener threw an exception
                            boolean hasException = false;
                            try {
                                // Consume lock on the queue: prevents ordered messages from being consumed twice
                                this.processQueue.getLockConsume().lock();
                                if (this.processQueue.isDropped()) {
                                    log.warn("consumeMessage, the message queue not be able to consume, because it's dropped. {}",
                                        this.messageQueue);
                                    break;
                                }
                                // Consume the messages
                                status = messageListener.consumeMessage(Collections.unmodifiableList(msgs), context);
                            } catch (Throwable e) {
                                log.warn("consumeMessage exception: {} Group: {} Msgs: {} MQ: {}",
                                    RemotingHelper.exceptionSimpleDesc(e),
                                    ConsumeMessageOrderlyService.this.consumerGroup,
                                    msgs,
                                    messageQueue);
                                hasException = true;
                            } finally {
                                // Release the consume lock
                                this.processQueue.getLockConsume().unlock();
                            }
                            // Log a warning when the status is null or one of the failure statuses
                            if (null == status
                                || ConsumeOrderlyStatus.ROLLBACK == status
                                || ConsumeOrderlyStatus.SUSPEND_CURRENT_QUEUE_A_MOMENT == status) {
                                log.warn("consumeMessage Orderly return not OK, Group: {} Msgs: {} MQ: {}",
                                    ConsumeMessageOrderlyService.this.consumerGroup,
                                    msgs,
                                    messageQueue);
                            }
                            // Time spent in the listener
                            long consumeRT = System.currentTimeMillis() - beginTimestamp;
                            if (null == status) {
                                // Was an exception thrown?
                                if (hasException) {
                                    returnType = ConsumeReturnType.EXCEPTION;
                                } else {
                                    returnType = ConsumeReturnType.RETURNNULL;
                                }
                            } else if (consumeRT >= defaultMQPushConsumer.getConsumeTimeout() * 60 * 1000) {
                                // The listener took longer than the consume timeout
                                returnType = ConsumeReturnType.TIME_OUT;
                            } else if (ConsumeOrderlyStatus.SUSPEND_CURRENT_QUEUE_A_MOMENT == status) {
                                // Consume later, which means this attempt failed
                                returnType = ConsumeReturnType.FAILED;
                            } else if (ConsumeOrderlyStatus.SUCCESS == status) {
                                // Consumed successfully
                                returnType = ConsumeReturnType.SUCCESS;
                            }

                            if (ConsumeMessageOrderlyService.this.defaultMQPushConsumerImpl.hasHook()) {
                                consumeMessageContext.getProps().put(MixAll.CONSUME_CONTEXT_TYPE, returnType.name());
                            }

                            if (null == status) {
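                                // A null status means the batch must be consumed again later. Ordered
                                // consumption works this way: the current batch must finish before moving on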
                                status = ConsumeOrderlyStatus.SUSPEND_CURRENT_QUEUE_A_MOMENT;
                            }

                            if (ConsumeMessageOrderlyService.this.defaultMQPushConsumerImpl.hasHook()) {
                                consumeMessageContext.setStatus(status.toString());
                                consumeMessageContext
                                    .setSuccess(ConsumeOrderlyStatus.SUCCESS == status || ConsumeOrderlyStatus.COMMIT == status);
                                ConsumeMessageOrderlyService.this.defaultMQPushConsumerImpl.executeHookAfter(consumeMessageContext);
                            }
                            // Collect statistics
                            ConsumeMessageOrderlyService.this.getConsumerStatsManager()
                                .incConsumeRT(ConsumeMessageOrderlyService.this.consumerGroup, messageQueue.getTopic(), consumeRT);
                            // Handle the consume result, which also decides whether the loop continues
                            continueConsume = ConsumeMessageOrderlyService.this.processConsumeResult(msgs, status, context, this);
                        } else {
                            continueConsume = false;
                        }
                    }
                } else {
                    if (this.processQueue.isDropped()) {
                        log.warn("the message queue not be able to consume, because it's dropped. {}", this.messageQueue);
                        return;
                    }
                    // The queue is not locked: try to lock it later. The rescheduled task eventually re-enters the branch above
                    ConsumeMessageOrderlyService.this.tryLockLaterAndReconsume(this.messageQueue, this.processQueue, 100);
                }
            }
        }

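The code above is long, but the actual steps are fairly simple to summarize.

Steps:

First acquire the local queue lock by synchronizing on the per-queue lock object. A thread that cannot get it waits until the current holder leaves the synchronized block.
The long stretch of code that follows checks whether the global lock was acquired and whether it has expired. Processing continues only while the global lock is held and valid. Otherwise tryLockLaterAndReconsume is called, which puts a delayed task on a scheduled thread pool to lock the queue and try again later.
Call the listener to consume the messages, then evaluate the consume status. If an exception is thrown, status is null, and the batch is also retried.
Handle the result. When the status is SUSPEND_CURRENT_QUEUE_A_MOMENT, the batch is retried.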
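The local queue lock from the first step can be sketched as one lock object per queue, fetched from a concurrent map and used with `synchronized`. This is a minimal model under stated assumptions: `QueueLocks` and `consumeConcurrently` are hypothetical names, queues are identified by strings, and incrementing a plain counter stands in for consuming a message.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a per-queue local lock table.
class QueueLocks {
    private final ConcurrentMap<String, Object> lockTable = new ConcurrentHashMap<>();

    // Always returns the same lock object for the same queue.
    Object fetchLockObject(String queue) {
        return lockTable.computeIfAbsent(queue, q -> new Object());
    }

    // Several threads "consume" from one queue. The counter is a plain int,
    // so the total is exact only because the per-queue lock serializes access.
    static int consumeConcurrently(QueueLocks locks, String queue, int threads, int perThread) {
        int[] consumed = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    synchronized (locks.fetchLockObject(queue)) {
                        consumed[0]++;
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return consumed[0];
    }

    public static void main(String[] args) {
        QueueLocks locks = new QueueLocks();
        System.out.println(consumeConcurrently(locks, "q0", 4, 1000));
    }
}
```

Because different queues get different lock objects, threads working on different queues do not block each other; only work on the same queue is serialized.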

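Answering the opening question:

1. When consumers are added or removed, how is ordered consumption preserved?

A change in the number of consumers triggers rebalancing, and each client recalculates which queues it consumes. At that point the client releases the global lock on queues it no longer consumes and asks the broker to lock the queues newly assigned to it. If locking fails, messages in that queue cannot be consumed. A queue is consumed only after its lock has been acquired.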