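1. Background:
The load-test environment has two nameservers and two broker masters, 10.255.255.142 (broker-b) and 10.255.255.151 (broker-a). Monitoring shows that 151 has not received any messages since 2015-3-13.
The two masters in the test environment handle about 4000 TPS in total, with a message size of 2K.
2. Locating the problem:
1. Connecting to the load-test environment from Eclipse shows that messages are sent only to broker-b, never to broker-a.
2. Suspected that the producer was not connected to broker-a. Checking broker-a's connections with netstat shows that the producer is connected to broker-a.
3. Suspected that the producer did not get broker-a's message queues from the nameserver. Using a MessageQueueSelector shows that the nameserver does return broker-a's message queues.
4. Sending messages only to broker-a's message queues fails with the following error: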
com.alibaba.rocketmq.client.exception.MQBrokerException: CODE: 14 DESC: service not available now, maybe disk full, CL: 0.87 CQ: 0.87 INDEX: 0.87, maybe your broker machine memory too small.
For more information, please visit the url, https://github.com/alibaba/RocketMQ/issues/64
at com.alibaba.rocketmq.client.impl.MQClientAPIImpl.processSendResponse(MQClientAPIImpl.java:492)
at com.alibaba.rocketmq.client.impl.MQClientAPIImpl.sendMessageSync(MQClientAPIImpl.java:398)
at com.alibaba.rocketmq.client.impl.MQClientAPIImpl.sendMessage(MQClientAPIImpl.java:379)
at com.alibaba.rocketmq.client.impl.producer.DefaultMQProducerImpl.sendKernelImpl(DefaultMQProducerImpl.java:698)
at com.alibaba.rocketmq.client.impl.producer.DefaultMQProducerImpl.sendSelectImpl(DefaultMQProducerImpl.java:877)
at com.alibaba.rocketmq.client.impl.producer.DefaultMQProducerImpl.send(DefaultMQProducerImpl.java:851)
at com.alibaba.rocketmq.client.producer.DefaultMQProducer.send(DefaultMQProducer.java:163)
at com.ruishenh.rocketmq.example.Producer.main(Producer.java:78)
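The test in step 4 pins every message to broker-a. A minimal sketch of the selection logic follows, using only the JDK so it runs on its own. QueueRef is a hypothetical stand-in for the client's MessageQueue, and the topic name and queue layout are made up for illustration. In the real client the same filtering would sit inside MessageQueueSelector.select(List<MessageQueue> mqs, Message msg, Object arg).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the client's MessageQueue (topic, broker name, queue id).
final class QueueRef {
    final String topic;
    final String brokerName;
    final int queueId;

    QueueRef(String topic, String brokerName, int queueId) {
        this.topic = topic;
        this.brokerName = brokerName;
        this.queueId = queueId;
    }

    @Override
    public String toString() {
        return brokerName + "#" + queueId;
    }
}

final class BrokerPinnedSelector {
    // Keep only the queues hosted on the given broker.
    static List<QueueRef> queuesOn(List<QueueRef> all, String brokerName) {
        List<QueueRef> result = new ArrayList<QueueRef>();
        for (QueueRef q : all) {
            if (brokerName.equals(q.brokerName)) {
                result.add(q);
            }
        }
        return result;
    }

    // Round-robin over the target broker's queues using a caller-supplied counter.
    static QueueRef select(List<QueueRef> all, String brokerName, int counter) {
        List<QueueRef> candidates = queuesOn(all, brokerName);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no queue on " + brokerName);
        }
        return candidates.get(Math.floorMod(counter, candidates.size()));
    }

    public static void main(String[] args) {
        // Made-up route data: two queues per broker.
        List<QueueRef> route = Arrays.asList(
                new QueueRef("TopicTest", "broker-a", 0),
                new QueueRef("TopicTest", "broker-a", 1),
                new QueueRef("TopicTest", "broker-b", 0),
                new QueueRef("TopicTest", "broker-b", 1));
        for (int i = 0; i < 4; i++) {
            System.out.println(select(route, "broker-a", i));
        }
    }
}
```

If the filtered list is empty, the nameserver did not return any queue for that broker, which is the case step 3 ruled out.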
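5. The error points to insufficient disk space, but checking broker-a shows that the disk still has free space.
6. Reading the RocketMQ source shows where the error is raised:
In DefaultMessageStore, public PutMessageResult putMessage(MessageExtBrokerInner msg):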
    if (!this.runningFlags.isWriteable()) {
        long value = this.printTimes.getAndIncrement();
        if ((value % 50000) == 0) {
            log.warn("message store is not writeable, so putMessage is forbidden "
                    + this.runningFlags.getFlagBits());
        }
        return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
    }
    else {
        this.printTimes.set(0);
    }
The method in the RunningFlags class:
    public boolean isWriteable() {
        if ((this.flagBits & (NotWriteableBit | WriteLogicsQueueErrorBit | DiskFullBit | WriteIndexFileErrorBit)) == 0) {
            return true;
        }
        return false;
    }
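To make the bit check above easier to follow, here is a simplified, self-contained model of such a flag holder. The class name, the helper methods and the bit positions are assumptions for illustration and are not copied from the broker source. The point it shows is that any one of the four blocking bits, including the disk-full bit, is enough to make the store reject writes, while the not-readable bit does not affect writes.

```java
// Simplified model of a bit-flag state holder; not thread-safe.
// Bit positions below are assumed for illustration only.
final class RunningFlagsSketch {
    static final int NOT_READABLE_BIT = 1;
    static final int NOT_WRITEABLE_BIT = 1 << 1;
    static final int WRITE_LOGICS_QUEUE_ERROR_BIT = 1 << 2;
    static final int WRITE_INDEX_FILE_ERROR_BIT = 1 << 3;
    static final int DISK_FULL_BIT = 1 << 4;

    private int flagBits = 0;

    // Writeable only if none of the four blocking bits is set.
    boolean isWriteable() {
        int blocking = NOT_WRITEABLE_BIT | WRITE_LOGICS_QUEUE_ERROR_BIT
                | DISK_FULL_BIT | WRITE_INDEX_FILE_ERROR_BIT;
        return (flagBits & blocking) == 0;
    }

    void markNotReadable() {
        flagBits |= NOT_READABLE_BIT;
    }

    void markDiskFull() {
        flagBits |= DISK_FULL_BIT;
    }

    void clearDiskFull() {
        flagBits &= ~DISK_FULL_BIT;
    }

    int getFlagBits() {
        return flagBits;
    }

    public static void main(String[] args) {
        RunningFlagsSketch flags = new RunningFlagsSketch();
        System.out.println("initial writeable: " + flags.isWriteable());
        flags.markDiskFull();
        System.out.println("after disk full: " + flags.isWriteable()
                + ", bits=" + flags.getFlagBits());
        flags.clearDiskFull();
        System.out.println("after cleanup: " + flags.isWriteable());
    }
}
```

The flag value printed by the warning in putMessage can be decoded the same way to see which bit is blocking writes.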
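7. The preliminary conclusion is insufficient disk space. The testers were asked to free some disk space, and once free space exceeded 4G, broker-a worked normally again. When the problem occurred, free space was 2.5G.
Open questions:
1. Why can the broker not work normally when the disk still has 2.5G free?
2. How to configure the broker's policy for deleting files such as store/commitlog.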
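One possible starting point for question 1: the CL, CQ and INDEX values of 0.87 in the error message look like used-space ratios rather than absolute free space, so a disk can have gigabytes free and still be over a ratio-based limit. The sketch below computes such a ratio with the JDK's File API. The class name and the 0.85 threshold are hypothetical, chosen only to demonstrate the check, and are not taken from the broker's configuration.

```java
import java.io.File;

final class DiskUsageSketch {
    // Used-space ratio from total and free byte counts; -1 if the total is unknown.
    static double usedRatio(long totalBytes, long freeBytes) {
        if (totalBytes <= 0) {
            return -1.0;
        }
        return (totalBytes - freeBytes) / (double) totalBytes;
    }

    // Used-space ratio of the partition that holds the given path.
    static double usedRatio(File path) {
        return usedRatio(path.getTotalSpace(), path.getFreeSpace());
    }

    // True when the ratio exceeds the given limit.
    static boolean aboveThreshold(double ratio, double threshold) {
        return ratio > threshold;
    }

    public static void main(String[] args) {
        // Pass the store directory as the first argument; defaults to the current directory.
        File path = new File(args.length > 0 ? args[0] : ".");
        double threshold = 0.85; // hypothetical limit, for illustration only
        double ratio = usedRatio(path);
        System.out.printf("%s used ratio: %.2f, above %.2f: %b%n",
                path.getAbsolutePath(), ratio, threshold, aboveThreshold(ratio, threshold));
    }
}
```

As a purely hypothetical illustration, on a 20G partition 2.5G free gives a used ratio of 0.875 and 4G free gives 0.80, so the same disk can sit on either side of a ratio limit depending on how much was freed.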