12. How the producer spreads messages evenly across brokers, and the fault-latency mechanism

Start with the DefaultMQProducerImpl.sendDefaultImpl() method:

private SendResult sendDefaultImpl(
        Message msg,
        final CommunicationMode communicationMode,
        final SendCallback sendCallback,
        final long timeout
    ) throws MQClientException, RemotingException, MQBrokerException, InterruptedException {
        this.makeSureStateOK();
        Validators.checkMessage(msg, this.defaultMQProducer);
        ...
        // Look up the topic's publish (routing) info
        TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
        if (topicPublishInfo != null && topicPublishInfo.ok()) {
            boolean callTimeout = false;
            MessageQueue mq = null;
            Exception exception = null;
            SendResult sendResult = null;
            // Total number of attempts: SYNC sends get 1 + retryTimesWhenSendFailed (default 2), other modes get 1
            int timesTotal = communicationMode == CommunicationMode.SYNC ? 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() : 1;
            int times = 0;
            String[] brokersSent = new String[timesTotal];
            for (; times < timesTotal; times++) {
                // Broker used by the previous attempt:
                // null on the first iteration, non-null on later iterations (retries)
                String lastBrokerName = null == mq ? null : mq.getBrokerName();
                // Select a MessageQueue
                MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
                if (mqSelected != null) {
                    mq = mqSelected;
                    brokersSent[times] = mq.getBrokerName();
                    ...
                }
                ...
            }
            ...
        }
        ...
    }
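To make the loop's bookkeeping concrete, here is a minimal standalone sketch (not RocketMQ code) of its two inputs to queue selection: how many attempts are made, and which lastBrokerName each attempt passes to selectOneMessageQueue. The class RetryCountSketch, its local CommunicationMode enum and the sample broker names are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch of sendDefaultImpl()'s retry bookkeeping (not RocketMQ code).
class RetryCountSketch {
    // Local stand-in for RocketMQ's CommunicationMode enum.
    enum CommunicationMode { SYNC, ASYNC, ONEWAY }

    // Same rule as the excerpt: only SYNC sends get extra attempts.
    static int timesTotal(CommunicationMode mode, int retryTimesWhenSendFailed) {
        return mode == CommunicationMode.SYNC ? 1 + retryTimesWhenSendFailed : 1;
    }

    // Given the broker picked on each attempt (hypothetical input), return the
    // lastBrokerName value the loop would pass to selectOneMessageQueue on each attempt.
    static List<String> lastBrokerNames(String[] brokerPerAttempt) {
        List<String> result = new ArrayList<>();
        String previous = null; // mq is still null before the first attempt
        for (String broker : brokerPerAttempt) {
            result.add(previous);
            previous = broker;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(timesTotal(CommunicationMode.SYNC, 2));   // 3
        System.out.println(timesTotal(CommunicationMode.ASYNC, 2));  // 1
        System.out.println(lastBrokerNames(new String[] {"broker-a", "broker-b", "broker-a"}));
        // [null, broker-a, broker-b]
    }
}
```

With the default retryTimesWhenSendFailed of 2, a synchronous send gets three attempts, and only the first one sees a null lastBrokerName.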

In sendDefaultImpl(), a MessageQueue is selected for each attempt, and its queueId is set on the request header when the request is built:

requestHeader.setQueueId(mq.getQueueId());

Now let's look at how the MessageQueue is selected:

 public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
        return this.mqFaultStrategy.selectOneMessageQueue(tpInfo, lastBrokerName);
    }

Here mqFaultStrategy is an MQFaultStrategy:

    public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
        // If the fault-latency mechanism is enabled
        // (it is disabled by default)
        if (this.sendLatencyFaultEnable) {
            ...
        }

        return tpInfo.selectOneMessageQueue(lastBrokerName);
    }

The fault-latency mechanism is off by default, so the call goes straight to tpInfo.selectOneMessageQueue(lastBrokerName):

    public MessageQueue selectOneMessageQueue(final String lastBrokerName) {
        // On the first attempt, lastBrokerName is null
        if (lastBrokerName == null) {
            return selectOneMessageQueue();
        } else {
            for (int i = 0; i < this.messageQueueList.size(); i++) {
                int index = this.sendWhichQueue.incrementAndGet();
                // Take the counter modulo messageQueueList.size() to get an index
                int pos = Math.abs(index) % this.messageQueueList.size();
                if (pos < 0)
                    pos = 0;
                MessageQueue mq = this.messageQueueList.get(pos);
                // Skip queues on the same broker as the previous attempt,
                // to avoid the broker the last send failed on
                if (!mq.getBrokerName().equals(lastBrokerName)) {
                    return mq;
                }
            }
            return selectOneMessageQueue();
        }
    }

On the first attempt in the outer loop, lastBrokerName is null, which leads to the no-argument selectOneMessageQueue():

    public MessageQueue selectOneMessageQueue() {
        int index = this.sendWhichQueue.incrementAndGet();
        int pos = Math.abs(index) % this.messageQueueList.size();
        if (pos < 0)
            pos = 0;
        return this.messageQueueList.get(pos);
    }

It increments this.sendWhichQueue, takes the result modulo the size of messageQueueList to get an index, and returns the MessageQueue at that index.

private volatile ThreadLocalIndex sendWhichQueue = new ThreadLocalIndex();

And sendWhichQueue is a ThreadLocalIndex:

public class ThreadLocalIndex {
    private final ThreadLocal<Integer> threadLocalIndex = new ThreadLocal<Integer>();
    private final Random random = new Random();

    public int incrementAndGet() {
        Integer index = this.threadLocalIndex.get();
        if (null == index) {
            index = Math.abs(random.nextInt());
            this.threadLocalIndex.set(index);
        }

        this.threadLocalIndex.set(++index);
        return Math.abs(index);
    }

    @Override
    public String toString() {
        return "ThreadLocalIndex{" +
            "threadLocalIndex=" + threadLocalIndex.get() +
            '}';
    }
}

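In other words, the sendWhichQueue counter lives in a ThreadLocal and belongs to a single thread: the first time a thread reads it, it is seeded with a random number, and from then on it is simply incremented.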
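A minimal sketch of the same idea, assuming queues can be modelled as plain strings: a ThreadLocal counter seeded randomly on first use, advanced by one per call and reduced modulo the list size. ThreadLocalRoundRobin is a hypothetical name; it combines ThreadLocalIndex with TopicPublishInfo.selectOneMessageQueue() rather than reproducing either one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch combining ThreadLocalIndex with round-robin queue selection.
class ThreadLocalRoundRobin {
    private final ThreadLocal<Integer> threadLocalIndex = new ThreadLocal<>();
    private final Random random = new Random();

    // First call in a thread: seed with a random value; afterwards just add 1.
    int incrementAndGet() {
        Integer index = threadLocalIndex.get();
        if (index == null) {
            index = Math.abs(random.nextInt());
        }
        index = index + 1;
        threadLocalIndex.set(index);
        return Math.abs(index);
    }

    // Counter modulo list size picks the queue. Math.abs(Integer.MIN_VALUE) is still
    // negative (reachable after overflow), so a negative position is clamped to 0,
    // as in the original code.
    <T> T select(List<T> queues) {
        int pos = incrementAndGet() % queues.size();
        if (pos < 0) {
            pos = 0;
        }
        return queues.get(pos);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalRoundRobin selector = new ThreadLocalRoundRobin();
        List<String> queues = Arrays.asList("broker-a:0", "broker-a:1", "broker-b:0", "broker-b:1");
        Runnable sender = () -> {
            StringBuilder line = new StringBuilder(Thread.currentThread().getName()).append(':');
            for (int i = 0; i < 4; i++) {
                line.append(' ').append(selector.select(queues));
            }
            System.out.println(line);
        };
        Thread t1 = new Thread(sender, "t1");
        Thread t2 = new Thread(sender, "t2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Each thread visits the four queues in order, starting from its own random offset. No counter is shared between threads, so there is no contention, and the random seed keeps threads from all starting at queue 0.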

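Back to the outer flow: on the first attempt lastBrokerName is null. If that attempt fails, the for loop sends again; for synchronous sends there are 3 attempts in total by default (1 + retryTimesWhenSendFailed). From the second attempt on, lastBrokerName is no longer null, and a new MessageQueue is selected: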

    public MessageQueue selectOneMessageQueue(final String lastBrokerName) {
        // On the first attempt, lastBrokerName is null
        if (lastBrokerName == null) {
            ...
        } else {
            // Retries take this branch
            for (int i = 0; i < this.messageQueueList.size(); i++) {
                int index = this.sendWhichQueue.incrementAndGet();
                // Take the counter modulo messageQueueList.size() to get an index
                int pos = Math.abs(index) % this.messageQueueList.size();
                if (pos < 0)
                    pos = 0;
                MessageQueue mq = this.messageQueueList.get(pos);
                // Skip queues on the same broker as the previous attempt,
                // to avoid the broker the last send failed on
                if (!mq.getBrokerName().equals(lastBrokerName)) {
                    return mq;
                }
            }
            return selectOneMessageQueue();
        }
    }

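Here sendWhichQueue is again incremented and taken modulo the list size to get an index, but with one extra check: if the selected queue is on the same broker as the previous attempt, it is skipped and the selection moves on to the next queue. Because this is a retry, the broker targeted last time is assumed to be unavailable, so the request is kept away from it. If every queue is on that broker, the loop finds nothing and falls back to plain round-robin.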

In short: round-robin within each thread.
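The retry-path selection can be sketched on its own as follows. This is a simplified, single-threaded model, assuming a plain int counter in place of the per-thread sendWhichQueue and a hypothetical Queue class in place of MessageQueue; the fixed seed only makes the output predictable.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical single-threaded model of TopicPublishInfo.selectOneMessageQueue(lastBrokerName).
class AvoidLastBrokerSketch {
    // Minimal stand-in for MessageQueue: broker name plus queue id.
    static final class Queue {
        final String brokerName;
        final int queueId;

        Queue(String brokerName, int queueId) {
            this.brokerName = brokerName;
            this.queueId = queueId;
        }

        @Override
        public String toString() {
            return brokerName + ":" + queueId;
        }
    }

    private int counter; // plays the role of sendWhichQueue (per thread in RocketMQ)

    AvoidLastBrokerSketch(int seed) {
        this.counter = seed;
    }

    private int nextPos(int size) {
        int pos = Math.abs(++counter) % size;
        return pos < 0 ? 0 : pos;
    }

    Queue select(List<Queue> queues, String lastBrokerName) {
        if (lastBrokerName != null) {
            // Retry: try up to size() candidates, skipping the broker that just failed.
            for (int i = 0; i < queues.size(); i++) {
                Queue q = queues.get(nextPos(queues.size()));
                if (!q.brokerName.equals(lastBrokerName)) {
                    return q;
                }
            }
        }
        // First attempt, or every queue is on lastBrokerName: plain round-robin.
        return queues.get(nextPos(queues.size()));
    }

    public static void main(String[] args) {
        List<Queue> queues = Arrays.asList(
                new Queue("broker-a", 0), new Queue("broker-a", 1),
                new Queue("broker-b", 0), new Queue("broker-b", 1));
        AvoidLastBrokerSketch selector = new AvoidLastBrokerSketch(0);
        System.out.println(selector.select(queues, null));        // broker-a:1
        System.out.println(selector.select(queues, "broker-a"));  // broker-b:0
        System.out.println(selector.select(queues, "broker-b"));  // broker-a:0
    }
}
```

The third call lands on broker-b:1 first, skips it because it is on the broker that just failed, and returns broker-a:0 instead.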

---------------------------------------------------------------------------------------------------------------------------------

RocketMQ also provides a fault-latency mechanism.

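When a producer sends a message, it must send it to a specific MessageQueue. If the send fails, the broker hosting that MessageQueue has most likely run into some problem, so the next message, or the retry, should avoid that broker as far as possible. In RocketMQ, MQFaultStrategy is responsible for this.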
————————————————
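Copyright notice: this is an original article by CSDN blogger "pumpkin_pk", licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.
Original link: https://blog.csdn.net/yuxiuzhiai/article/details/103740627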

MQFaultStrategy has a field:

private final LatencyFaultTolerance<String> latencyFaultTolerance = new LatencyFaultToleranceImpl();

And latencyFaultTolerance essentially maintains a map:

private final ConcurrentHashMap<String, FaultItem> faultItemTable = new ConcurrentHashMap<String, FaultItem>(16);

It holds the fault information for brokers, keyed by brokerName:

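class FaultItem implements Comparable<FaultItem> {
    // brokerName
    private final String name;
    // Latency recorded for the last send: the time until the send failed, or the send duration
    private volatile long currentLatency;
    // Time from which the broker is available again; before this timestamp it is treated as faulty
    private volatile long startTimestamp;
    ...
}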
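The availability rule implied by startTimestamp is a single time comparison. A minimal sketch, assuming the current time is passed in explicitly (the real code reads the system clock) and using the hypothetical class name FaultItemSketch:

```java
// Hypothetical simplified FaultItem holding only the fields described above (not RocketMQ's class).
class FaultItemSketch {
    final String name;             // brokerName
    volatile long currentLatency;  // latency of the last recorded send, ms
    volatile long startTimestamp;  // broker is treated as faulty until this time, ms

    FaultItemSketch(String name) {
        this.name = name;
    }

    // Record a send outcome: unavailable for notAvailableDuration ms starting at nowMillis.
    void record(long nowMillis, long latency, long notAvailableDuration) {
        this.currentLatency = latency;
        this.startTimestamp = nowMillis + notAvailableDuration;
    }

    // Available again once the clock has reached startTimestamp.
    boolean isAvailable(long nowMillis) {
        return nowMillis >= startTimestamp;
    }

    public static void main(String[] args) {
        FaultItemSketch item = new FaultItemSketch("broker-a");
        item.record(1_000L, 30_000L, 600_000L);         // e.g. a failed send at t = 1 s
        System.out.println(item.isAvailable(2_000L));   // false: still inside the window
        System.out.println(item.isAvailable(601_000L)); // true: window has passed
    }
}
```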

Now the main flow: selecting a MessageQueue when the mechanism is enabled.

        // If the fault-latency mechanism is enabled
        // (it is disabled by default)
        if (this.sendLatencyFaultEnable) {
            try {
                int index = tpInfo.getSendWhichQueue().incrementAndGet();
                for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
                    int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
                    if (pos < 0)
                        pos = 0;
                    MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
                    // Is the broker hosting this MessageQueue currently available?
                    if (latencyFaultTolerance.isAvailable(mq.getBrokerName()))
                        return mq;
                }
                // Reaching here means no queue sits on an available broker
                // Pick a broker from the first half of the sorted fault list (the least bad ones)
                final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
                // Number of write queues this topic has on notBestBroker, per the topic route info
                int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
                if (writeQueueNums > 0) {
                    // notBestBroker still has writable queues for this topic
                    // Take a MessageQueue object by plain round-robin...
                    final MessageQueue mq = tpInfo.selectOneMessageQueue();
                    // ...and retarget it: overwrite its brokerName with notBestBroker and
                    // pick a queueId round-robin among notBestBroker's write queues
                    if (notBestBroker != null) {
                        mq.setBrokerName(notBestBroker);
                        mq.setQueueId(tpInfo.getSendWhichQueue().incrementAndGet() % writeQueueNums);
                    }
                    return mq;
                }
                // writeQueueNums <= 0: tpInfo.getQueueIdByBroker(notBestBroker) returns -1
                // when notBestBroker is not found in this topic's route info,
                // i.e. the broker no longer serves this topic
                else {
                    // Remove notBestBroker from the fault table
                    latencyFaultTolerance.remove(notBestBroker);
                }
            } catch (Exception e) {
                log.error("Error occurred when selecting message queue", e);
            }
            // Fallback: plain round-robin
            return tpInfo.selectOneMessageQueue();
        }
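Leaving aside the pickOneAtLeast() branch, the core of this path is: walk the queues round-robin and return the first one whose broker is outside its penalty window. A minimal sketch under those assumptions; FaultAwareSelectSketch, the unavailableUntil map and the "broker:queueId" string encoding are hypothetical simplifications of latencyFaultTolerance and MessageQueue, and when no broker is available the sketch goes straight to the round-robin fallback instead of trying pickOneAtLeast().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the fault-aware loop in MQFaultStrategy.selectOneMessageQueue().
// Queues are "brokerName:queueId" strings; unavailableUntil stands in for latencyFaultTolerance.
class FaultAwareSelectSketch {
    final Map<String, Long> unavailableUntil = new HashMap<>(); // brokerName -> ms timestamp
    private int counter; // stands in for sendWhichQueue

    FaultAwareSelectSketch(int seed) {
        this.counter = seed;
    }

    boolean isAvailable(String brokerName, long nowMillis) {
        Long until = unavailableUntil.get(brokerName);
        return until == null || nowMillis >= until;
    }

    static String brokerOf(String queue) {
        return queue.substring(0, queue.indexOf(':'));
    }

    String select(List<String> queues, long nowMillis) {
        int index = ++counter;
        // Walk every queue once, starting at the round-robin position.
        for (int i = 0; i < queues.size(); i++) {
            int pos = Math.abs(index++) % queues.size();
            if (pos < 0) {
                pos = 0;
            }
            String q = queues.get(pos);
            if (isAvailable(brokerOf(q), nowMillis)) {
                return q;
            }
        }
        // No available broker. The real code tries pickOneAtLeast() first;
        // this sketch goes straight to the plain round-robin fallback.
        int pos = Math.abs(++counter) % queues.size();
        return queues.get(pos < 0 ? 0 : pos);
    }

    public static void main(String[] args) {
        List<String> queues = Arrays.asList("broker-a:0", "broker-a:1", "broker-b:0", "broker-b:1");
        FaultAwareSelectSketch early = new FaultAwareSelectSketch(0);
        early.unavailableUntil.put("broker-a", 30_000L);
        System.out.println(early.select(queues, 0L));       // broker-b:0 (broker-a:1 skipped)

        FaultAwareSelectSketch later = new FaultAwareSelectSketch(0);
        later.unavailableUntil.put("broker-a", 30_000L);
        System.out.println(later.select(queues, 30_000L));  // broker-a:1 (window expired)
    }
}
```

Unlike the retry path, this check does not depend on lastBrokerName: any broker still inside its window is skipped, even on the first attempt of a new message.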

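The code above consults the fault information when a queue is selected. So where is that information maintained? Back in the send path: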

    private SendResult sendDefaultImpl(
        Message msg,
        final CommunicationMode communicationMode,
        final SendCallback sendCallback,
        final long timeout
    ) throws MQClientException, RemotingException, MQBrokerException, InterruptedException {
        this.makeSureStateOK();
        Validators.checkMessage(msg, this.defaultMQProducer);
        final long invokeID = random.nextLong();
        long beginTimestampFirst = System.currentTimeMillis();
        long beginTimestampPrev = beginTimestampFirst;
        long endTimestamp = beginTimestampFirst;
        // Look up the topic's publish (routing) info
        TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
        if (topicPublishInfo != null && topicPublishInfo.ok()) {
            ...
            for (; times < timesTotal; times++) {
                // Broker used by the previous attempt:
                // null on the first iteration, non-null on later iterations (retries)
                String lastBrokerName = null == mq ? null : mq.getBrokerName();
                // Select a MessageQueue
                MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
                if (mqSelected != null) {
                    mq = mqSelected;
                    brokersSent[times] = mq.getBrokerName();
                    try {
                        beginTimestampPrev = System.currentTimeMillis();
                        ...
                        // Perform the send
                        sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
                        log.info("send result: {}", sendResult);
                        endTimestamp = System.currentTimeMillis();
                        this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);
                        ...
                    } catch (RemotingException e) {
                        endTimestamp = System.currentTimeMillis();
                        this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, true);
                        ...
                    } catch (MQClientException e) {
                        endTimestamp = System.currentTimeMillis();
                        this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, true);
                        ...
                    } catch (MQBrokerException e) {
                        log.info("send exception:", e);
                        endTimestamp = System.currentTimeMillis();
                        this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, true);
                        ...
                    } catch (InterruptedException e) {
                        endTimestamp = System.currentTimeMillis();
                        this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);
                        ...
                    }
                } else {
                    break;
                }
            }

            ...
        }

        validateNameServerSetting();

        throw new MQClientException("No route info of this topic: " + msg.getTopic() + FAQUrl.suggestTodo(FAQUrl.NO_TOPIC_ROUTE_INFO),
            null).setResponseCode(ClientErrorCode.NOT_FOUND_TOPIC_EXCEPTION);
    }

--

public void updateFaultItem(final String brokerName, final long currentLatency, boolean isolation) {
        this.mqFaultStrategy.updateFaultItem(brokerName, currentLatency, isolation);
    }

--

    public void updateFaultItem(final String brokerName, final long currentLatency, boolean isolation) {
        // isolation == true when the send failed;
        // isolation == false on success or on InterruptedException
        if (this.sendLatencyFaultEnable) {
            // Compute how long the broker should be treated as unavailable.
            // This runs whether or not the send actually failed:
            // on failure, 30000 is passed so the longest duration applies;
            // on success, the duration is derived from how long the send took
            long duration = computeNotAvailableDuration(isolation ? 30000 : currentLatency);
            // Update the fault table
            this.latencyFaultTolerance.updateFaultItem(brokerName, currentLatency, duration);
        }
    }

--

private long computeNotAvailableDuration(final long currentLatency) {
        for (int i = latencyMax.length - 1; i >= 0; i--) {
            if (currentLatency >= latencyMax[i])
                return this.notAvailableDuration[i];
        }

        return 0;
    }

--

    private long[] latencyMax = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    private long[] notAvailableDuration = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

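The arrays show that a very slow send triggers the mechanism too, even when it succeeds. computeNotAvailableDuration() scans latencyMax from the largest threshold down and returns the duration for the first threshold the latency reaches: a successful send that took 600 ms (at least 550 ms) marks the broker unavailable for 30000 ms, sends under 550 ms get 0, and a failed send, passed in as 30000 ms, maps to 600000 ms (10 minutes).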
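The bucket lookup is easy to check by hand. This small standalone sketch copies the two arrays from the excerpt above and prints the resulting unavailability window for a few sample send times. NotAvailableDurationSketch and durationFor() are hypothetical names; durationFor() folds in the `isolation ? 30000 : currentLatency` rule from updateFaultItem().

```java
// Hypothetical standalone sketch of MQFaultStrategy.computeNotAvailableDuration().
class NotAvailableDurationSketch {
    // Values copied from the excerpt above.
    static final long[] LATENCY_MAX = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    static final long[] NOT_AVAILABLE_DURATION = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

    // Scan from the largest threshold down; the first one reached decides the duration.
    static long computeNotAvailableDuration(long currentLatency) {
        for (int i = LATENCY_MAX.length - 1; i >= 0; i--) {
            if (currentLatency >= LATENCY_MAX[i]) {
                return NOT_AVAILABLE_DURATION[i];
            }
        }
        return 0;
    }

    // As in updateFaultItem(): a failed send (isolation == true) counts as 30000 ms latency.
    static long durationFor(long sendCostMillis, boolean isolation) {
        return computeNotAvailableDuration(isolation ? 30000 : sendCostMillis);
    }

    public static void main(String[] args) {
        long[] samples = {20, 80, 600, 1500, 2500, 5000, 20000};
        for (long cost : samples) {
            // 20 and 80 -> 0; 600 -> 30000; 1500 -> 60000; 2500 -> 120000; 5000 -> 180000; 20000 -> 600000
            System.out.println(cost + " ms -> " + durationFor(cost, false) + " ms unavailable");
        }
        System.out.println("failed send -> " + durationFor(10, true) + " ms unavailable"); // 600000
    }
}
```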

public void updateFaultItem(final String name, final long currentLatency, final long notAvailableDuration) {
        FaultItem old = this.faultItemTable.get(name);
        if (null == old) {
            // No entry for this broker yet
            final FaultItem faultItem = new FaultItem(name);
            faultItem.setCurrentLatency(currentLatency);
            faultItem.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);

            // Another thread may have inserted an entry concurrently; if so, update that one
            old = this.faultItemTable.putIfAbsent(name, faultItem);
            if (old != null) {
                old.setCurrentLatency(currentLatency);
                old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
            }
        } else {
            // An entry already exists for this broker: refresh it
            old.setCurrentLatency(currentLatency);
            old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
        }
    }
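The putIfAbsent handling above exists because two sending threads can report the same broker at the same time. Another way to get the same create-or-update behaviour is ConcurrentHashMap.compute(), which performs the update atomically per key. This is a sketch of that alternative design, not what RocketMQ does; FaultTableSketch and the explicit nowMillis parameter are assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical alternative to the putIfAbsent-based updateFaultItem(), using ConcurrentHashMap.compute().
// Not RocketMQ code; nowMillis is passed in explicitly to keep the example deterministic.
class FaultTableSketch {
    static final class FaultItem {
        final String name;
        volatile long currentLatency;
        volatile long startTimestamp;

        FaultItem(String name) {
            this.name = name;
        }
    }

    final ConcurrentHashMap<String, FaultItem> faultItemTable = new ConcurrentHashMap<>(16);

    // Create-or-update runs atomically per key inside compute(), so no separate
    // "another thread inserted first" branch is needed.
    void updateFaultItem(String name, long currentLatency, long notAvailableDuration, long nowMillis) {
        faultItemTable.compute(name, (key, existing) -> {
            FaultItem item = (existing == null) ? new FaultItem(key) : existing;
            item.currentLatency = currentLatency;
            item.startTimestamp = nowMillis + notAvailableDuration;
            return item;
        });
    }

    public static void main(String[] args) {
        FaultTableSketch table = new FaultTableSketch();
        table.updateFaultItem("broker-a", 30_000L, 600_000L, 0L); // failed send
        table.updateFaultItem("broker-b", 20L, 0L, 0L);           // fast successful send
        table.updateFaultItem("broker-a", 40L, 0L, 5_000L);       // later fast send refreshes the entry
        System.out.println(table.faultItemTable.size());                          // 2
        System.out.println(table.faultItemTable.get("broker-a").startTimestamp);  // 5000
    }
}
```

compute() may block other updates to the same map region while the function runs, so the function is kept short here. Note that, as in the original, every reported send (successful or not) creates or overwrites an entry, so the table tracks all brokers that have been used, not only the faulty ones.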
