文章目录
前言
最近在本地做了一个测试发现,当producer端每分钟发送一条数据时,consumer端正常消费的情况下rocketmq-client-consumer-pull-rt指标的值却很高,本篇文章就来分析下原因。
一、场景描述
在本地启动一个producer和一个consumer,producer发送消息是每分钟发送一条,consumer消费消息正常消费即可,此时在grafana上发现rocketmq_client_consumer_pull_rt指标的值很高,具体如下图所示,rocketmq_client_consumer_pull_rt指标的含义是consumer拉取消息的响应时间。
二、原因分析
按照前面的场景描述会发现该场景其实可以概括为在consumer端要拉取的消息还没有被写入broker端的情况下rocketmq_client_consumer_pull_rt的指标值会较高。出现该现象的原因是broker端针对这种情况不会立即返回客户端响应而是通过轮询机制暂时将该请求hold一段时间,在这段时间内如果consumer请求的消息发送过来则会将消息返回给consumer,如果在这段时间内如果consumer请求的消息没有发送过来并且该请求在broker端hold的时间超过BROKER_SUSPEND_MAX_TIME_MILLIS时会将响应返回给客户端。接下来就来分析下RocketMQ的轮询机制。
1.broker端PullMessageProcessor处理consumer的拉取消息请求
(1)根据consumequeue从commitlog获取消息
如下所示是broker根据consumequeue从commitlog获取消息的关键代码,由于此时consumer端获取的消息producer还没有发送过来,所以getMessage方法会返回GetMessageStatus.OFFSET_OVERFLOW_ONE状态,接着PullMessageProcessor会根据该状态将其转换为ResponseCode.PULL_NOT_FOUND的响应码。
final GetMessageResult getMessageResult =
this.brokerController.getMessageStore().getMessage(requestHeader.getConsumerGroup(), requestHeader.getTopic(),
requestHeader.getQueueId(), requestHeader.getQueueOffset(), requestHeader.getMaxMsgNums(), messageFilter);
public GetMessageResult getMessage(final String group, final String topic, final int queueId, final long offset,
final int maxMsgNums,
final MessageFilter messageFilter) {
if (this.shutdown) {
log.warn("message store has shutdown, so getMessage is forbidden");
return null;
}
if (!this.runningFlags.isReadable()) {
log.warn("message store is not readable, so getMessage is forbidden " + this.runningFlags.getFlagBits());
return null;
}
long beginTime = this.getSystemClock().now();
GetMessageStatus status = GetMessageStatus.NO_MESSAGE_IN_QUEUE;
long nextBeginOffset = offset;
long minOffset = 0;
long maxOffset = 0;
GetMessageResult getResult = new GetMessageResult();
final long maxOffsetPy = this.commitLog.getMaxOffset();
ConsumeQueue consumeQueue = findConsumeQueue(topic, queueId);
if (consumeQueue != null) {
minOffset = consumeQueue.getMinOffsetInQueue();
maxOffset = consumeQueue.getMaxOffsetInQueue();
if (maxOffset == 0) {
status = GetMessageStatus.NO_MESSAGE_IN_QUEUE;
nextBeginOffset = nextOffsetCorrection(offset, 0);
} else if (offset < minOffset) {
status = GetMessageStatus.OFFSET_TOO_SMALL;
nextBeginOffset = nextOffsetCorrection(offset, minOffset);
} else if (offset == maxOffset) {
status = GetMessageStatus.OFFSET_OVERFLOW_ONE;
nextBeginOffset = nextOffsetCorrection(offset, offset);
} else if (offset > maxOffset) {
status = GetMessageStatus.OFFSET_OVERFLOW_BADLY;
if (0 == minOffset) {
nextBeginOffset = nextOffsetCorrection(offset, minOffset);
} else {
nextBeginOffset = nextOffsetCorrection(offset, maxOffset);
}
} else {
SelectMappedBufferResult bufferConsumeQueue = consumeQueue.getIndexBuffer(offset);
if (bufferConsumeQueue != null) {
try {
status = GetMessageStatus.NO_MATCHED_MESSAGE;
long nextPhyFileStartOffset = Long.MIN_VALUE;
long maxPhyOffsetPulling = 0;
...
}
switch (getMessageResult.getStatus()) {
case FOUND:
response.setCode(ResponseCode.SUCCESS);
break;
case MESSAGE_WAS_REMOVING:
response.setCode(ResponseCode.PULL_RETRY_IMMEDIATELY);
break;
case NO_MATCHED_LOGIC_QUEUE:
case NO_MESSAGE_IN_QUEUE:
if (0 != requestHeader.getQueueOffset()) {
response.setCode(ResponseCode.PULL_OFFSET_MOVED);
// XXX: warn and notify me
log.info("the broker store no queue data, fix the request offset {} to {}, Topic: {} QueueId: {} Consumer Group: {}",
requestHeader.getQueueOffset(),
getMessageResult.getNextBeginOffset(),
requestHeader.getTopic(),
requestHeader.getQueueId(),
requestHeader.getConsumerGroup()
);
} else {
response.setCode(ResponseCode.PULL_NOT_FOUND);
}
break;
case NO_MATCHED_MESSAGE:
response.setCode(ResponseCode.PULL_RETRY_IMMEDIATELY);
break;
case OFFSET_FOUND_NULL:
response.setCode(ResponseCode.PULL_NOT_FOUND);
break;
case OFFSET_OVERFLOW_BADLY:
response.setCode(ResponseCode.PULL_OFFSET_MOVED);
// XXX: warn and notify me
log.info("the request offset: {} over flow badly, broker max offset: {}, consumer: {}",
requestHeader.getQueueOffset(), getMessageResult.getMaxOffset(), channel.remoteAddress());
break;
case OFFSET_OVERFLOW_ONE:
response.setCode(ResponseCode.PULL_NOT_FOUND);
break;
case OFFSET_TOO_SMALL:
response.setCode(ResponseCode.PULL_OFFSET_MOVED);
log.info("the request offset too small. group={}, topic={}, requestOffset={}, brokerMinOffset={}, clientIp={}",
requestHeader.getConsumerGroup(), requestHeader.getTopic(), requestHeader.getQueueOffset(),
getMessageResult.getMinOffset(), channel.remoteAddress());
break;
default:
assert false;
break;
}
接下来PullMessageProcessor会根据不同的响应码做不同的处理,这里先介绍两个参数:
- brokerAllowSuspend
当PullMessageProcessor第一次处理拉取消息请求时会将brokerAllowSuspend设置为true;
case ResponseCode.PULL_NOT_FOUND:
if (brokerAllowSuspend && hasSuspendFlag) {
long pollingTimeMills = suspendTimeoutMillisLong;
if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
}
String topic = requestHeader.getTopic();
long offset = requestHeader.getQueueOffset();
int queueId = requestHeader.getQueueId();
PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
this.brokerController.getMessageStore().now(), offset, subscriptionData, messageFilter);
this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
response = null;
break;
}
public RemotingCommand processRequest(final ChannelHandlerContext ctx,
RemotingCommand request) throws RemotingCommandException {
return this.processRequest(ctx.channel(), request, true);
}
- hasSuspendFlag
hasSuspendFlag是在consumer端拉取消息的请求中设置为true
-
suspendTimeoutMillisLong
suspendTimeoutMillisLong是consumer端拉取消息的请求中设置,其值为15秒
由于brokerAllowSuspend和hasSuspendFlag的值均为true,所以会进行以下操作: -
首先将pollingTimeMills设置为1000 * 15
-
判断broker端是否启用长轮询,如果没有则pollingTimeMills被设置为1000(默认情况下broker端是开启长轮询机制的,是通过broker配置项中的longPollingEnable设置的)
-
获取topic名称、queueId和queueoffset,然后构建请求pullRequest
-
调用PullRequestHoldService的suspendPullRequest方法将请求hold,具体是将请求存储在一个map结构pullRequestTable,key是topic@queueId,value是对应的请求列表
public void suspendPullRequest(final String topic, final int queueId, final PullRequest pullRequest) {
String key = this.buildKey(topic, queueId);
ManyPullRequest mpr = this.pullRequestTable.get(key);
if (null == mpr) {
mpr = new ManyPullRequest();
ManyPullRequest prev = this.pullRequestTable.putIfAbsent(key, mpr);
if (prev != null) {
mpr = prev;
}
}
mpr.addPullRequest(pullRequest);
}
- 将返回给consumer的response设置为null,由于response被设置为null,所以这里并没有触发consumer端的读事件,不会触发对响应结果的处理,具体见下图所示代码
2.PullRequestHoldService
接下来看看PullRequestHoldService的工作原理。
- 如果broker端开启长轮询,则会每5秒去遍历pullRequestTable,否则每1秒去遍历pullRequestTable;
- 调用checkHoldRequest方法遍历pullRequestTable,对每个请求PullRequestHoldService会先获取该topic在其queueId上的最大的queueoffset,然后比较queueoffset和请求中记录的待拉取的queueId的offset,如果后者大则会再次获取该queueId上的最大的queueoffset并进行比较,如果此时queueoffset大于待拉取的offset则说明有新消息到达,此时会对消息进行tag过滤,如果匹配则会调用executeRequestWhenWakeup方法进行消息拉取并返回给consumer;如果待拉取的offset大并且已经达到超时时间则会调用executeRequestWhenWakeup方法进行消息拉取并将响应结果返回给consumer;如果待拉取的offset大并且还没有到达超时时间则会重新将请求放入pullRequestTable中
注意:从notifyMessageArriving方法中不难看出当待拉取的offset大于queueoffset时会再次获取queueoffset进行比较;还有一点就是超时时间判断,每个请求中会记录其开始挂起的时间以及超时时间,如果broker端开启长轮询则该超时时间的值就是BROKER_SUSPEND_MAX_TIME_MILLIS否则该值为1秒;
public void run() {
log.info("{} service started", this.getServiceName());
while (!this.isStopped()) {
try {
if (this.brokerController.getBrokerConfig().isLongPollingEnable()) {
this.waitForRunning(5 * 1000);
} else {
this.waitForRunning(this.brokerController.getBrokerConfig().getShortPollingTimeMills());
}
long beginLockTimestamp = this.systemClock.now();
this.checkHoldRequest();
long costTime = this.systemClock.now() - beginLockTimestamp;
if (costTime > 5 * 1000) {
log.info("[NOTIFYME] check hold request cost {} ms.", costTime);
}
} catch (Throwable e) {
log.warn(this.getServiceName() + " service has exception. ", e);
}
}
log.info("{} service end", this.getServiceName());
}
public void notifyMessageArriving(final String topic, final int queueId, final long maxOffset, final Long tagsCode,
long msgStoreTime, byte[] filterBitMap, Map<String, String> properties) {
String key = this.buildKey(topic, queueId);
ManyPullRequest mpr = this.pullRequestTable.get(key);
if (mpr != null) {
List<PullRequest> requestList = mpr.cloneListAndClear();
if (requestList != null) {
List<PullRequest> replayList = new ArrayList<PullRequest>();
for (PullRequest request : requestList) {
long newestOffset = maxOffset;
if (newestOffset <= request.getPullFromThisOffset()) {
newestOffset = this.brokerController.getMessageStore().getMaxOffsetInQueue(topic, queueId);
}
if (newestOffset > request.getPullFromThisOffset()) {
boolean match = request.getMessageFilter().isMatchedByConsumeQueue(tagsCode,
new ConsumeQueueExt.CqExtUnit(tagsCode, msgStoreTime, filterBitMap));
// match by bit map, need eval again when properties is not null.
if (match && properties != null) {
match = request.getMessageFilter().isMatchedByCommitLog(null, properties);
}
if (match) {
try {
this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
request.getRequestCommand());
} catch (Throwable e) {
log.error("execute request when wakeup failed.", e);
}
continue;
}
}
if (System.currentTimeMillis() >= (request.getSuspendTimestamp() + request.getTimeoutMillis())) {
try {
this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
request.getRequestCommand());
} catch (Throwable e) {
log.error("execute request when wakeup failed.", e);
}
continue;
}
replayList.add(request);
}
if (!replayList.isEmpty()) {
mpr.addPullRequest(replayList);
}
}
}
}
最后还需要注意一点:executeRequestWhenWakeup方法在提交RequestTask给PullMessageProcessor时brokerAllowSuspend参数被设置为false,也就是这次不论是否能够拉取到消息都不会将该请求挂起。
3.broker端有新消息到达
当broker端有新消息到达则会由ReputMessageService构建对应consumequeue,然后这里有一个关键逻辑需要注意,代码如下,即如果当前broker是master节点、开启长轮询机制并且messageArrivingListener不为null的情况下(在broker启动过程中会构建messageArrivingListener,所以在里不为null),调用arriving方法,arriving方法会调用PullRequestHoldService的notifyMessageArriving来遍历pullRequestTable,看看新达到的消息是否是请求中待拉取的消息。
总结
最后笔者画了一幅图用于总结整个过程: