RocketMQ consume-failure retry: a source code analysis

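I spent two days chewing through the source for this feature. It touches several modules, but I finally worked through the whole flow, and afterwards everything clicked.

First, one thing to keep in mind: message retry is built on top of the delayed-message feature. This article therefore also weaves in some source analysis of delayed message delivery.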

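① When the consumer (client) starts, it calls copySubscription(). Besides copying the user's own subscriptions, in CLUSTERING mode this method also subscribes the consumer to its group's retry topic, %RETRY%+consumerGroup, with the expression "*". This is how retried messages later find their way back to the group.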
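To make the effect of copySubscription() concrete, here is a minimal sketch that models subscriptions as a plain topic-to-expression map instead of RocketMQ's SubscriptionData. The class and method signatures (RetrySubscriptionSketch, copySubscription with these parameters) are hypothetical; only the "%RETRY%" and "%DLQ%" prefixes are taken from RocketMQ's MixAll naming.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of copySubscription(): a subscription is reduced to
// "topic -> subscription expression" instead of RocketMQ's SubscriptionData.
class RetrySubscriptionSketch {
    enum MessageModel { BROADCASTING, CLUSTERING }

    // Same prefixes as RocketMQ's retry / dead-letter topic names.
    static final String RETRY_PREFIX = "%RETRY%";
    static final String DLQ_PREFIX = "%DLQ%";

    static String retryTopic(String consumerGroup) {
        return RETRY_PREFIX + consumerGroup;
    }

    static String dlqTopic(String consumerGroup) {
        return DLQ_PREFIX + consumerGroup;
    }

    // Copies the user's subscriptions; clustering consumers additionally
    // subscribe to every message ("*") on the group's retry topic.
    static Map<String, String> copySubscription(Map<String, String> userSubs,
                                                String consumerGroup,
                                                MessageModel model) {
        Map<String, String> inner = new LinkedHashMap<>(userSubs);
        if (model == MessageModel.CLUSTERING) {
            inner.put(retryTopic(consumerGroup), "*");
        }
        return inner;
    }

    public static void main(String[] args) {
        Map<String, String> subs = new LinkedHashMap<>();
        subs.put("OrderTopic", "TagA || TagB");
        // prints {OrderTopic=TagA || TagB, %RETRY%order_group=*}
        System.out.println(copySubscription(subs, "order_group", MessageModel.CLUSTERING));
    }
}
```

Broadcasting consumers get no retry subscription in this model, which matches the fact that failed broadcast messages are never sent back (see processConsumeResult below).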

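② After pulling messages (pullBatchSize, 32 by default, per pull), the consumer hands the pulled list to the ConsumeMessageService, here ConsumeMessageConcurrentlyService. It splits the list into ConsumeRequest tasks of at most consumeMessageBatchMaxSize messages each (1 by default) and submits them to the consume thread pool.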
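The splitting step can be sketched as follows, assuming plain strings in place of MessageExt; split() is a hypothetical helper, not the RocketMQ API. With the default consumeMessageBatchMaxSize of 1, every ConsumeRequest holds a single message, which is why a failure usually affects just one message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a pulled batch is cut into ConsumeRequest-sized
// chunks; each inner list would become one ConsumeRequest (one run() call).
class ConsumeBatchSplitSketch {
    static <T> List<List<T>> split(List<T> pulled, int consumeMessageBatchMaxSize) {
        if (consumeMessageBatchMaxSize <= 0) {
            throw new IllegalArgumentException("batch size must be positive");
        }
        List<List<T>> requests = new ArrayList<>();
        for (int from = 0; from < pulled.size(); from += consumeMessageBatchMaxSize) {
            int to = Math.min(from + consumeMessageBatchMaxSize, pulled.size());
            requests.add(new ArrayList<>(pulled.subList(from, to)));
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> pulled = List.of("m0", "m1", "m2", "m3", "m4");
        System.out.println(split(pulled, 1)); // [[m0], [m1], [m2], [m3], [m4]]
        System.out.println(split(pulled, 2)); // [[m0, m1], [m2, m3], [m4]]
    }
}
```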

Now let's look at ConsumeRequest.

This class implements the Runnable interface, so reading its run() method is enough to see how the consumer consumes messages.

  public void run() {
            // If this queue's ProcessQueue lost its consumption rights during rebalancing, isDropped() is true: return and give up consuming.
            if (this.processQueue.isDropped()) {
                log.info("the message queue not be able to consume, because it's dropped. group={} {}", ConsumeMessageConcurrentlyService.this.consumerGroup, this.messageQueue);
                return;
            }
            // The listener registered when the consumer was created; what happens to the messages depends on it.
            MessageListenerConcurrently listener = ConsumeMessageConcurrentlyService.this.messageListener;
            ConsumeConcurrentlyContext context = new ConsumeConcurrentlyContext(messageQueue);
            ConsumeConcurrentlyStatus status = null;

            ConsumeMessageContext consumeMessageContext = null;
            if (ConsumeMessageConcurrentlyService.this.defaultMQPushConsumerImpl.hasHook()) {
                consumeMessageContext = new ConsumeMessageContext();
                consumeMessageContext.setConsumerGroup(defaultMQPushConsumer.getConsumerGroup());
                consumeMessageContext.setProps(new HashMap<String, String>());
                consumeMessageContext.setMq(messageQueue);
                consumeMessageContext.setMsgList(msgs);
                consumeMessageContext.setSuccess(false);
                ConsumeMessageConcurrentlyService.this.defaultMQPushConsumerImpl.executeHookBefore(consumeMessageContext);
            }

            long beginTimestamp = System.currentTimeMillis();
            boolean hasException = false;
            ConsumeReturnType returnType = ConsumeReturnType.SUCCESS;
            try {
                // Key point: if a message's topic is %RETRY%+consumerGroup, reset it to the original topic.
                // The real topic is kept in the message properties under the key RETRY_TOPIC; it is read back and assigned to the message.
                ConsumeMessageConcurrentlyService.this.resetRetryTopic(msgs);
                if (msgs != null && !msgs.isEmpty()) {
                    for (MessageExt msg : msgs) {
                        MessageAccessor.setConsumeStartTimeStamp(msg, String.valueOf(System.currentTimeMillis()));
                    }
                }
                // Invoke the listener's message-handling logic.
                status = listener.consumeMessage(Collections.unmodifiableList(msgs), context);
            } catch (Throwable e) {
                log.warn("consumeMessage exception: {} Group: {} Msgs: {} MQ: {}",
                    RemotingHelper.exceptionSimpleDesc(e),
                    ConsumeMessageConcurrentlyService.this.consumerGroup,
                    msgs,
                    messageQueue);
                hasException = true;
            }
            long consumeRT = System.currentTimeMillis() - beginTimestamp;
            if (null == status) {
                if (hasException) {
                    returnType = ConsumeReturnType.EXCEPTION;
                } else {
                    returnType = ConsumeReturnType.RETURNNULL;
                }
            } else if (consumeRT >= defaultMQPushConsumer.getConsumeTimeout() * 60 * 1000) {
                returnType = ConsumeReturnType.TIME_OUT;
            } else if (ConsumeConcurrentlyStatus.RECONSUME_LATER == status) {
                returnType = ConsumeReturnType.FAILED;
            } else if (ConsumeConcurrentlyStatus.CONSUME_SUCCESS == status) {
                returnType = ConsumeReturnType.SUCCESS;
            }

            if (ConsumeMessageConcurrentlyService.this.defaultMQPushConsumerImpl.hasHook()) {
                consumeMessageContext.getProps().put(MixAll.CONSUME_CONTEXT_TYPE, returnType.name());
            }

            if (null == status) {
                log.warn("consumeMessage return null, Group: {} Msgs: {} MQ: {}",
                    ConsumeMessageConcurrentlyService.this.consumerGroup,
                    msgs,
                    messageQueue);
                status = ConsumeConcurrentlyStatus.RECONSUME_LATER;
            }

            if (ConsumeMessageConcurrentlyService.this.defaultMQPushConsumerImpl.hasHook()) {
                consumeMessageContext.setStatus(status.toString());
                consumeMessageContext.setSuccess(ConsumeConcurrentlyStatus.CONSUME_SUCCESS == status);
                ConsumeMessageConcurrentlyService.this.defaultMQPushConsumerImpl.executeHookAfter(consumeMessageContext);
            }

            ConsumeMessageConcurrentlyService.this.getConsumerStatsManager()
                .incConsumeRT(ConsumeMessageConcurrentlyService.this.consumerGroup, messageQueue.getTopic(), consumeRT);

            if (!processQueue.isDropped()) {
                // Whether the listener succeeded or failed, the result is always handled by this method.
                ConsumeMessageConcurrentlyService.this.processConsumeResult(status, context, this);
            } else {
                log.warn("processQueue is dropped without process consume result. messageQueue={}, msgs={}", messageQueue, msgs);
            }
        }
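Two details of run() matter for retries: resetRetryTopic(), and the fact that a listener which throws or returns null ends up as RECONSUME_LATER. The sketch below models both with a hypothetical Msg class standing in for MessageExt; the property key "RETRY_TOPIC" matches MessageConst.PROPERTY_RETRY_TOPIC, everything else is simplified.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class RetryTopicResetSketch {
    enum Status { CONSUME_SUCCESS, RECONSUME_LATER }

    // Same key as MessageConst.PROPERTY_RETRY_TOPIC.
    static final String PROPERTY_RETRY_TOPIC = "RETRY_TOPIC";

    // Hypothetical stand-in for MessageExt: just a topic and its properties.
    static final class Msg {
        String topic;
        final Map<String, String> properties = new HashMap<>();

        Msg(String topic) {
            this.topic = topic;
        }
    }

    // Like resetRetryTopic(): only messages on this group's retry topic that
    // carry RETRY_TOPIC get their original topic back.
    static void resetRetryTopic(List<Msg> msgs, String consumerGroup) {
        String groupRetryTopic = "%RETRY%" + consumerGroup;
        for (Msg msg : msgs) {
            String original = msg.properties.get(PROPERTY_RETRY_TOPIC);
            if (original != null && groupRetryTopic.equals(msg.topic)) {
                msg.topic = original;
            }
        }
    }

    // Like run(): an exception only sets a flag and leaves status null,
    // and a null status is then turned into RECONSUME_LATER.
    static Status invokeListener(Supplier<Status> listener) {
        Status status = null;
        try {
            status = listener.get();
        } catch (Throwable t) {
            // run() logs the exception here and carries on
        }
        return status == null ? Status.RECONSUME_LATER : status;
    }

    public static void main(String[] args) {
        Msg msg = new Msg("%RETRY%order_group");
        msg.properties.put(PROPERTY_RETRY_TOPIC, "OrderTopic");
        resetRetryTopic(List.of(msg), "order_group");
        System.out.println(msg.topic); // OrderTopic

        System.out.println(invokeListener(() -> {
            throw new IllegalStateException("database unavailable");
        })); // RECONSUME_LATER
    }
}
```

In other words, from the retry mechanism's point of view, throwing from the listener is equivalent to returning RECONSUME_LATER.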

Next, let's look at the source of processConsumeResult.

public void processConsumeResult(
        final ConsumeConcurrentlyStatus status,
        final ConsumeConcurrentlyContext context,
        final ConsumeRequest consumeRequest
    ) {
        int ackIndex = context.getAckIndex();

        if (consumeRequest.getMsgs().isEmpty())
            return;
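        // ACK handling: ConsumeConcurrentlyContext.ackIndex defaults to Integer.MAX_VALUE. On CONSUME_SUCCESS it is clamped to
        // (message count - 1), i.e. every message is acked; on RECONSUME_LATER it is set to -1. The retry logic below uses this value to decide which messages to send back.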
        switch (status) {
            case CONSUME_SUCCESS:
                if (ackIndex >= consumeRequest.getMsgs().size()) {
                    ackIndex = consumeRequest.getMsgs().size() - 1;
                }
                int ok = ackIndex + 1;
                int failed = consumeRequest.getMsgs().size() - ok;
                this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
                this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
                break;
            case RECONSUME_LATER:
                ackIndex = -1;
                this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
                    consumeRequest.getMsgs().size());
                break;
            default:
                break;
        }

        switch (this.defaultMQPushConsumer.getMessageModel()) {
            case BROADCASTING:
                for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                    MessageExt msg = consumeRequest.getMsgs().get(i);
                    log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
                }
                break;
            case CLUSTERING:
                // msgBackFailed collects messages that failed to consume AND whose sendMessageBack call also failed.
                List<MessageExt> msgBackFailed = new ArrayList<MessageExt>(consumeRequest.getMsgs().size());
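                // Iterate from ackIndex + 1. On success ackIndex = size - 1, so the loop body never runs.
                // On failure ackIndex = -1, so i starts at 0 and every message in consumeRequest is sent back via sendMessageBack; messages whose send-back fails go into msgBackFailed.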
                for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                    MessageExt msg = consumeRequest.getMsgs().get(i);
                    boolean result = this.sendMessageBack(msg, context);
                    if (!result) {
                        msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
                        msgBackFailed.add(msg);
                    }
                }
                // A non-empty msgBackFailed means some (or all) sendMessageBack calls failed. Those messages are removed from consumeRequest,
                // re-wrapped, and resubmitted to the consume thread pool after 5 s to be consumed again locally.
                if (!msgBackFailed.isEmpty()) {
                    consumeRequest.getMsgs().removeAll(msgBackFailed);

                    this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
                }
                break;
            default:
                break;
        }
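        // Remove the messages in consumeRequest from the ProcessQueue. Messages pulled from the broker are first stored in a TreeMap inside
        // ProcessQueue (key: queue offset, value: message). removeMessage also returns the smallest offset still left in the TreeMap after the
        // removal (or max offset + 1 if it is now empty). The smallest offset of the batch just removed may not be the TreeMap's smallest key,
        // so the local consume progress is updated with the TreeMap's smallest offset and never moves past a message still in flight.
        // Most important point for retries: a failed message that was sent back successfully via sendMessageBack is also removed from the
        // TreeMap, and progress moves past it. The retry is then consumed by one node of the consumer group; which node depends on who
        // rebalancing assigns the retry queue of %RETRY%+consumerGroup (1 queue by default).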
        long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
        if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
            // Update the local consume progress.
            this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
        }
    }
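The interplay between ackIndex and the ProcessQueue TreeMap is easiest to see in a small model. This is a hypothetical sketch, not RocketMQ code: messages are identified only by their queue offsets, sendMessageBack is assumed to succeed (so the 5 s local resubmit never happens), and BROADCASTING mode, where failed messages are just logged and dropped, is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of processConsumeResult (CLUSTERING) + ProcessQueue.removeMessage.
class ConsumeResultSketch {
    enum Status { CONSUME_SUCCESS, RECONSUME_LATER }

    // In-flight messages pulled from the broker: queue offset -> body.
    final TreeMap<Long, String> msgTreeMap = new TreeMap<>();
    long queueOffsetMax = -1;

    void pulled(long offset, String body) {
        msgTreeMap.put(offset, body);
        queueOffsetMax = Math.max(queueOffsetMax, offset);
    }

    // Offsets of the batch that would go through sendMessageBack.
    static List<Long> offsetsToSendBack(Status status, int ackIndex, List<Long> batch) {
        if (status == Status.RECONSUME_LATER) {
            ackIndex = -1;                 // nothing in the batch is acked
        } else if (ackIndex >= batch.size()) {
            ackIndex = batch.size() - 1;   // default Integer.MAX_VALUE -> ack all
        }
        int from = Math.max(ackIndex + 1, 0);
        return new ArrayList<>(batch.subList(from, batch.size()));
    }

    // Remove the finished batch, then report the smallest offset still in
    // flight, or max offset + 1 when nothing is left (-1 if already empty).
    long removeMessage(List<Long> batch) {
        if (msgTreeMap.isEmpty()) {
            return -1;
        }
        for (Long offset : batch) {
            msgTreeMap.remove(offset);
        }
        return msgTreeMap.isEmpty() ? queueOffsetMax + 1 : msgTreeMap.firstKey();
    }

    public static void main(String[] args) {
        ConsumeResultSketch pq = new ConsumeResultSketch();
        for (long off = 100; off <= 103; off++) {
            pq.pulled(off, "m" + off);
        }
        // The ConsumeRequest for offset 102 returns RECONSUME_LATER; its
        // send-back succeeds, so it is removed from the TreeMap as well.
        List<Long> batch = List.of(102L);
        System.out.println(offsetsToSendBack(Status.RECONSUME_LATER, Integer.MAX_VALUE, batch)); // [102]
        System.out.println(pq.removeMessage(batch)); // 100
    }
}
```

The second print shows why progress cannot jump past offsets 100 and 101 even though 102 is the request that just finished: the committed offset is the smallest key still in flight.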

Now let's see how the broker handles sendMessageBack, in its consumerSendMsgBack request handler:

private RemotingCommand consumerSendMsgBack(final ChannelHandlerContext ctx, final RemotingCommand request)
        throws RemotingCommandException {
        final RemotingCommand response = RemotingCommand.createResponseCommand(null);
        final ConsumerSendMsgBackRequestHeader requestHeader =
            (ConsumerSendMsgBackRequestHeader)request.decodeCommandCustomHeader(ConsumerSendMsgBackRequestHeader.class);

        if (this.hasConsumeMessageHook() && !UtilAll.isBlank(requestHeader.getOriginMsgId())) {

            ConsumeMessageContext context = new ConsumeMessageContext();
            context.setConsumerGroup(requestHeader.getGroup());
            context.setTopic(requestHeader.getOriginTopic());
            context.setCommercialRcvStats(BrokerStatsManager.StatsType.SEND_BACK);
            context.setCommercialRcvTimes(1);
            context.setCommercialOwner(request.getExtFields().get(BrokerStatsManager.COMMERCIAL_OWNER));

            this.executeConsumeMessageHookAfter(context);
        }
        SubscriptionGroupConfig subscriptionGroupConfig =
            this.brokerController.getSubscriptionGroupManager().findSubscriptionGroupConfig(requestHeader.getGroup());
        if (null == subscriptionGroupConfig) {
            response.setCode(ResponseCode.SUBSCRIPTION_GROUP_NOT_EXIST);
            response.setRemark("subscription group not exist, " + requestHeader.getGroup() + " "
                + FAQUrl.suggestTodo(FAQUrl.SUBSCRIPTION_GROUP_NOT_EXIST));
            return response;
        }

        if (!PermName.isWriteable(this.brokerController.getBrokerConfig().getBrokerPermission())) {
            response.setCode(ResponseCode.NO_PERMISSION);
            response.setRemark("the broker[" + this.brokerController.getBrokerConfig().getBrokerIP1() + "] sending message is forbidden");
            return response;
        }

        if (subscriptionGroupConfig.getRetryQueueNums() <= 0) {
            response.setCode(ResponseCode.SUCCESS);
            response.setRemark(null);
            return response;
        }

        // Set the message's new topic to %RETRY%+consumerGroup and pick a queue of that topic (RetryQueueNums is 1 by default, so the queue is 0).
        String newTopic = MixAll.getRetryTopic(requestHeader.getGroup());
        int queueIdInt = Math.abs(this.random.nextInt() % 99999999) % subscriptionGroupConfig.getRetryQueueNums();

        int topicSysFlag = 0;
        if (requestHeader.isUnitMode()) {
            topicSysFlag = TopicSysFlag.buildSysFlag(false, true);
        }
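        // Create the TopicConfig for the new topic. In practice it already exists: when the client starts, it sends heartbeats to the broker
        // that carry its subscriptions, including %RETRY%+consumerGroup. The broker calls this same method while handling the heartbeat,
        // so the retry topic is created at that point.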
        TopicConfig topicConfig = this.brokerController.getTopicConfigManager().createTopicInSendMessageBackMethod(
            newTopic,
            subscriptionGroupConfig.getRetryQueueNums(),
            PermName.PERM_WRITE | PermName.PERM_READ, topicSysFlag);
        if (null == topicConfig) {
            response.setCode(ResponseCode.SYSTEM_ERROR);
            response.setRemark("topic[" + newTopic + "] not exist");
            return response;
        }

        if (!PermName.isWriteable(topicConfig.getPerm())) {
            response.setCode(ResponseCode.NO_PERMISSION);
            response.setRemark(String.format("the topic[%s] sending message is forbidden", newTopic));
            return response;
        }
        // Look up the original message by its physical (commit log) offset.
        MessageExt msgExt = this.brokerController.getMessageStore().lookMessageByOffset(requestHeader.getOffset());
        if (null == msgExt) {
            response.setCode(ResponseCode.SYSTEM_ERROR);
            response.setRemark("look message by offset failed, " + requestHeader.getOffset());
            return response;
        }
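        // Add a property to the original message: key RETRY_TOPIC, value the message's actual topic (only if absent, so the first retry's value is kept).
        // This is crucial: it pairs with resetRetryTopic(msgs) on the consumer side.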
        final String retryTopic = msgExt.getProperty(MessageConst.PROPERTY_RETRY_TOPIC);
        if (null == retryTopic) {
            MessageAccessor.putProperty(msgExt, MessageConst.PROPERTY_RETRY_TOPIC, msgExt.getTopic());
        }
        msgExt.setWaitStoreMsgOK(false);
        // Get the delay level requested by the client; by default it is 0 at this point.
        int delayLevel = requestHeader.getDelayLevel();

        int maxReconsumeTimes = subscriptionGroupConfig.getRetryMaxTimes();
        if (request.getVersion() >= MQVersion.Version.V3_4_9.ordinal()) {
            maxReconsumeTimes = requestHeader.getMaxReconsumeTimes();
        }
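        // ReconsumeTimes grows by one on every failed consumption. Once it reaches maxReconsumeTimes (16 by default), or the client asked for
        // delayLevel < 0, the message goes to the dead-letter topic %DLQ%+consumerGroup. That topic is created write-only (PERM_WRITE), so without manual intervention the message is never consumed again.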
        if (msgExt.getReconsumeTimes() >= maxReconsumeTimes
            || delayLevel < 0) {
            newTopic = MixAll.getDLQTopic(requestHeader.getGroup());
            queueIdInt = Math.abs(this.random.nextInt() % 99999999) % DLQ_NUMS_PER_GROUP;

            topicConfig = this.brokerController.getTopicConfigManager().createTopicInSendMessageBackMethod(newTopic,
                DLQ_NUMS_PER_GROUP,
                PermName.PERM_WRITE, 0
            );
            if (null == topicConfig) {
                response.setCode(ResponseCode.SYSTEM_ERROR);
                response.setRemark("topic[" + newTopic + "] not exist");
                return response;
            }
        } else {
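            // With the default delayLevel 0, use level 3 + reconsumeTimes: the first retry is level 3 (10 s), and each further retry moves up one level. Retry consumption is built on delayed messages.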
            if (0 == delayLevel) {
                delayLevel = 3 + msgExt.getReconsumeTimes();
            }

            msgExt.setDelayTimeLevel(delayLevel);
        }
        // Build a new message whose topic is newTopic and whose queueId is queueIdInt.
        MessageExtBrokerInner msgInner = new MessageExtBrokerInner();
        msgInner.setTopic(newTopic);
        msgInner.setBody(msgExt.getBody());
        msgInner.setFlag(msgExt.getFlag());
        MessageAccessor.setProperties(msgInner, msgExt.getProperties());
        msgInner.setPropertiesString(MessageDecoder.messageProperties2String(msgExt.getProperties()));
        msgInner.setTagsCode(MessageExtBrokerInner.tagsString2tagsCode(null, msgExt.getTags()));

        msgInner.setQueueId(queueIdInt);
        msgInner.setSysFlag(msgExt.getSysFlag());
        msgInner.setBornTimestamp(msgExt.getBornTimestamp());
        msgInner.setBornHost(msgExt.getBornHost());
        msgInner.setStoreHost(this.getStoreHost());
        msgInner.setReconsumeTimes(msgExt.getReconsumeTimes() + 1);

        String originMsgId = MessageAccessor.getOriginMessageId(msgExt);
        MessageAccessor.setOriginMessageId(msgInner, UtilAll.isBlank(originMsgId) ? msgExt.getMsgId() : originMsgId);
        // Store the new message. During storage, if delayLevel > 0, the topic and queueId are converted once more: newTopic and queueIdInt
        // are saved in the properties REAL_TOPIC and REAL_QID, the topic becomes SCHEDULE_TOPIC_XXXX, and the queueId is derived from the delay
        // level (delayLevel - 1). This is exactly how delayed messages work; ScheduleMessageService takes over from there (see below).
        PutMessageResult putMessageResult = this.brokerController.getMessageStore().putMessage(msgInner);
        if (putMessageResult != null) {
            switch (putMessageResult.getPutMessageStatus()) {
                case PUT_OK:
                    String backTopic = msgExt.getTopic();
                    String correctTopic = msgExt.getProperty(MessageConst.PROPERTY_RETRY_TOPIC);
                    if (correctTopic != null) {
                        backTopic = correctTopic;
                    }

                    this.brokerController.getBrokerStatsManager().incSendBackNums(requestHeader.getGroup(), backTopic);

                    response.setCode(ResponseCode.SUCCESS);
                    response.setRemark(null);

                    return response;
                default:
                    break;
            }

            response.setCode(ResponseCode.SYSTEM_ERROR);
            response.setRemark(putMessageResult.getPutMessageStatus().name());
            return response;
        }

        response.setCode(ResponseCode.SYSTEM_ERROR);
        response.setRemark("putMessageResult is null");
        return response;
    }
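The routing decision in consumerSendMsgBack boils down to a small pure function. The sketch below is a hypothetical condensation: randomInt replaces this.random.nextInt() so results are deterministic, the request-header and subscription-config fields are passed as plain ints, and a null return stands for the early SUCCESS reply when retryQueueNums <= 0. DLQ_NUMS_PER_GROUP = 1 mirrors the broker constant of the same name.

```java
// Hypothetical condensation of the routing decision in consumerSendMsgBack.
class SendBackRoutingSketch {
    static final int DLQ_NUMS_PER_GROUP = 1; // same value as the broker constant

    static final class Route {
        final String topic;
        final int queueId;
        final int delayLevel;     // 0 means the sketch sets no delay level
        final int reconsumeTimes; // value written into the new message

        Route(String topic, int queueId, int delayLevel, int reconsumeTimes) {
            this.topic = topic;
            this.queueId = queueId;
            this.delayLevel = delayLevel;
            this.reconsumeTimes = reconsumeTimes;
        }

        @Override
        public String toString() {
            return topic + " q" + queueId + " delayLevel=" + delayLevel
                + " reconsumeTimes=" + reconsumeTimes;
        }
    }

    // randomInt stands in for this.random.nextInt(); null means the broker
    // answers SUCCESS without storing anything (retryQueueNums <= 0).
    static Route route(String group, int reconsumeTimes, int requestDelayLevel,
                       int maxReconsumeTimes, int retryQueueNums, int randomInt) {
        if (retryQueueNums <= 0) {
            return null;
        }
        if (reconsumeTimes >= maxReconsumeTimes || requestDelayLevel < 0) {
            // Dead letter: the broker creates this topic write-only (PERM_WRITE).
            int queueId = Math.abs(randomInt % 99999999) % DLQ_NUMS_PER_GROUP;
            return new Route("%DLQ%" + group, queueId, 0, reconsumeTimes + 1);
        }
        int queueId = Math.abs(randomInt % 99999999) % retryQueueNums;
        // delayLevel 0 (the client default) becomes 3 + reconsumeTimes.
        int delayLevel = requestDelayLevel == 0 ? 3 + reconsumeTimes : requestDelayLevel;
        return new Route("%RETRY%" + group, queueId, delayLevel, reconsumeTimes + 1);
    }

    public static void main(String[] args) {
        // %RETRY%order_group q0 delayLevel=3 reconsumeTimes=1
        System.out.println(route("order_group", 0, 0, 16, 1, 12345));
        // %DLQ%order_group q0 delayLevel=0 reconsumeTimes=17
        System.out.println(route("order_group", 16, 0, 16, 1, 12345));
    }
}
```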

That covers the core of consumerSendMsgBack; I left some of the checks and statistics uncommented. Note that if a message fails consumption and SendMsgBack succeeds, the broker changes its topic and queueId twice!

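The first change serves the retry logic, the second the delayed-delivery logic. For delayed delivery, the target topic and queueId of every delayed message are stashed in its properties; the topic is then set to "SCHEDULE_TOPIC_XXXX" and the queueId to the queue that corresponds to its DelayTimeLevel (queueId = delayLevel - 1).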
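The second conversion happens inside the store, not in consumerSendMsgBack. Here is a minimal sketch of that branch, assuming a simplified Msg class; the property names REAL_TOPIC and REAL_QID and the topic name SCHEDULE_TOPIC_XXXX are RocketMQ's, the rest is illustrative. The real store also clamps the level to the highest configured one and applies this only to non-transactional or committed transactional messages.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the delay branch in the commit log's putMessage.
class ScheduleSwapSketch {
    static final String SCHEDULE_TOPIC = "SCHEDULE_TOPIC_XXXX";
    static final String PROPERTY_REAL_TOPIC = "REAL_TOPIC";
    static final String PROPERTY_REAL_QUEUE_ID = "REAL_QID";

    static final class Msg {
        String topic;
        int queueId;
        int delayTimeLevel;
        final Map<String, String> properties = new HashMap<>();

        Msg(String topic, int queueId, int delayTimeLevel) {
            this.topic = topic;
            this.queueId = queueId;
            this.delayTimeLevel = delayTimeLevel;
        }
    }

    // Level N is stored in queue N - 1 of SCHEDULE_TOPIC_XXXX.
    static int delayLevel2QueueId(int delayLevel) {
        return delayLevel - 1;
    }

    static void applyDelay(Msg msg, int maxDelayLevel) {
        if (msg.delayTimeLevel <= 0) {
            return; // not a delayed message: stored as-is
        }
        if (msg.delayTimeLevel > maxDelayLevel) {
            msg.delayTimeLevel = maxDelayLevel; // clamp to the highest configured level
        }
        // Back up the "first conversion" target so the scheduler can restore it.
        msg.properties.put(PROPERTY_REAL_TOPIC, msg.topic);
        msg.properties.put(PROPERTY_REAL_QUEUE_ID, String.valueOf(msg.queueId));
        msg.topic = SCHEDULE_TOPIC;
        msg.queueId = delayLevel2QueueId(msg.delayTimeLevel);
    }

    public static void main(String[] args) {
        Msg retry = new Msg("%RETRY%order_group", 0, 3); // first retry, level 3
        applyDelay(retry, 18);
        System.out.println(retry.topic + " q" + retry.queueId);        // SCHEDULE_TOPIC_XXXX q2
        System.out.println(retry.properties.get(PROPERTY_REAL_TOPIC)); // %RETRY%order_group
    }
}
```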

I won't go into the delayed-message logic in depth here; here is a rough outline of how the source works.

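ScheduleMessageService is the component that handles delayed messages. delayLevelTable stores the delay time of each delay level, and the levels are configurable (messageDelayLevel).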

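By default there are 18 levels, "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h", which are parsed into milliseconds and stored as the table's values (keyed by level 1 to 18). offsetTable stores the consume progress of each level's queue. The service uses Java's built-in Timer: for each level's queue it reads forward from the saved progress, checks whether each message's delay has elapsed, and if so restores the message's original topic and queueId and stores it again for consumers. All messages in one level's queue share the same delay, so they fall due in store order; the scan stops at the first message that is not yet due and is rescheduled for later.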
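Putting the delay table and the broker's 3 + reconsumeTimes rule together gives the default retry schedule. This is a sketch assuming the default messageDelayLevel string and a consumer that leaves delayLevel at 0; parse() imitates the s/m/h/d unit handling of ScheduleMessageService but is not its actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DelayLevelTableSketch {
    // RocketMQ's default messageDelayLevel setting.
    static final String DEFAULT_LEVELS = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

    // Parses "1s 5s ..." into level (1-based) -> delay in milliseconds.
    static Map<Integer, Long> parse(String levels) {
        Map<String, Long> unitMs = Map.of("s", 1_000L, "m", 60_000L, "h", 3_600_000L, "d", 86_400_000L);
        Map<Integer, Long> table = new LinkedHashMap<>();
        String[] parts = levels.trim().split("\\s+");
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i];
            Long ms = unitMs.get(p.substring(p.length() - 1));
            if (ms == null) {
                throw new IllegalArgumentException("unknown time unit in " + p);
            }
            long amount = Long.parseLong(p.substring(0, p.length() - 1));
            table.put(i + 1, amount * ms);
        }
        return table;
    }

    // Delay before redelivery when the consumer left delayLevel at 0:
    // level = 3 + reconsumeTimes, clamped to the highest level.
    static long retryDelayMs(Map<Integer, Long> table, int reconsumeTimes) {
        int level = Math.min(3 + reconsumeTimes, table.size());
        return table.get(level);
    }

    public static void main(String[] args) {
        Map<Integer, Long> table = parse(DEFAULT_LEVELS);
        long total = 0;
        for (int times = 0; times < 16; times++) { // default maxReconsumeTimes = 16
            long delay = retryDelayMs(table, times);
            total += delay;
            System.out.println("retry #" + (times + 1) + " after " + delay / 1000 + " s");
        }
        System.out.println("total delay: " + total / 1000 + " s"); // total delay: 17140 s
    }
}
```

With the defaults, the 16 retries wait 10 s, 30 s, 1 min, 2 min and so on up to 2 h, about 4 h 46 min of accumulated delay before a message that keeps failing lands in the DLQ.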

 

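For a retry message the topic and queueId were converted twice, so when ScheduleMessageService forwards it, it restores the values from the first conversion: topic %RETRY%+consumerGroup and queueId 0 (by default). This topic is consumed by the consumerGroup group (which node consumes it is decided by rebalancing; I have read that part of the source in full too and may write it up some time). The node that rebalancing assigns this queue to pulls the message and consumes it, and during consumption ConsumeRequest.run() calls resetRetryTopic(msgs) again.

So we are back where we started: the topic %RETRY%+consumerGroup is replaced with the original topic stored in the RETRY_TOPIC property before the listener consumes the message.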
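The last leg of the round trip can be traced with a small model. The sketch is hypothetical: deliverAt stands in for the delivery timestamp the real scheduler keeps in the consume queue entry, each level's queue is a Deque, and rescheduling of the timer task is left out. It shows the scan stopping at the first message that is not yet due, the REAL_TOPIC/REAL_QID restore that lands the message on %RETRY%+consumerGroup queue 0, and the consumer-side reset through RETRY_TOPIC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DelayedRedeliverySketch {
    // Hypothetical stand-in for a stored message; deliverAt models the delivery
    // timestamp the real scheduler reads from the consume queue entry.
    static final class Msg {
        String topic;
        int queueId;
        long deliverAt;
        final Map<String, String> properties = new HashMap<>();
    }

    // A first-retry message as it sits in SCHEDULE_TOPIC_XXXX after both conversions.
    static Msg retryMsg(String originalTopic, String group, long deliverAt) {
        Msg m = new Msg();
        m.topic = "SCHEDULE_TOPIC_XXXX";
        m.queueId = 2;                                     // delay level 3 -> queue 2
        m.deliverAt = deliverAt;
        m.properties.put("RETRY_TOPIC", originalTopic);    // set by consumerSendMsgBack
        m.properties.put("REAL_TOPIC", "%RETRY%" + group); // set by the store
        m.properties.put("REAL_QID", "0");
        return m;
    }

    // One timer pass over a level queue: forward due messages to their
    // REAL_TOPIC / REAL_QID and stop at the first one that is not yet due.
    static List<Msg> deliverDue(Deque<Msg> levelQueue, long now) {
        List<Msg> forwarded = new ArrayList<>();
        while (!levelQueue.isEmpty() && levelQueue.peekFirst().deliverAt <= now) {
            Msg m = levelQueue.pollFirst();
            m.topic = m.properties.get("REAL_TOPIC");
            m.queueId = Integer.parseInt(m.properties.get("REAL_QID"));
            forwarded.add(m);
        }
        return forwarded;
    }

    // Consumer side, as in resetRetryTopic(): back to the original topic.
    static void resetRetryTopic(Msg m, String group) {
        String original = m.properties.get("RETRY_TOPIC");
        if (original != null && ("%RETRY%" + group).equals(m.topic)) {
            m.topic = original;
        }
    }

    public static void main(String[] args) {
        Deque<Msg> level3 = new ArrayDeque<>();
        level3.add(retryMsg("OrderTopic", "order_group", 10_000));
        level3.add(retryMsg("OrderTopic", "order_group", 40_000));

        List<Msg> out = deliverDue(level3, 15_000);
        Msg m = out.get(0);
        System.out.println(m.topic + " q" + m.queueId); // %RETRY%order_group q0
        resetRetryTopic(m, "order_group");
        System.out.println(m.topic);                    // OrderTopic
        System.out.println(level3.size());              // 1 (not due yet)
    }
}
```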

 

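That is the complete consume-retry flow, built on the delayed-message component. Once I understood the source, I found it surprisingly hard to explain: too many components are involved, so I could only walk through the main code paths. Apologies for anything I got wrong.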
