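1. Consume queue index
The consume queue index in RocketMQ (ConsumeQueue) is a key data structure on the consumption path. It is distinct from the message index file (IndexFile) but closely related to it. Its main characteristics are:
- Role of ConsumeQueue:
  - ConsumeQueue is a logical consume queue. Every Message Queue under every Topic has its own ConsumeQueue.
  - For each message in that Message Queue it records the physical offset in the CommitLog, the message size, and other metadata such as the hash of the message Tag.
  - Because RocketMQ appends all messages to the CommitLog, the ConsumeQueue provides "indirect addressing": it lets a consumer quickly locate a specific message inside the CommitLog.
- Storage structure:
  - ConsumeQueue does not store message bodies. It stores only index entries that point into the CommitLog, which keeps the index files small and fast to read.
  - Each ConsumeQueue corresponds to one Message Queue, and its entries are laid out in the order the messages were stored, so a consumer can preserve ordering by reading the ConsumeQueue sequentially from a single thread.
- Consumption:
  - When a consumer pulls messages, the Broker first reads the physical offset and length from the ConsumeQueue, then uses them to read the actual message content directly from the CommitLog.
- Index updates:
  - After a producer's message has been stored in the CommitLog, the Broker updates the corresponding ConsumeQueue asynchronously (through the dispatch thread described in section 3), so that the message becomes visible to consumers.
- Ordered consumption:
  - For ordered consumption, RocketMQ ensures that the messages of one Message Queue are consumed one after another by the same consumer thread. Combined with the sequential layout of the ConsumeQueue, this yields a strict ordering guarantee within a queue.
In short, the consume queue index is responsible for routing and locating messages in RocketMQ, and it is one of the building blocks that make message consumption efficient and stable.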
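The two-step lookup described above (read the index entry, then read the CommitLog) can be sketched with plain in-memory buffers. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, and the toy "commit log" stores raw message bodies, whereas a real CommitLog stores a full message frame.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Toy model of indirect addressing; not RocketMQ code. */
final class CqLookupSketch {
    static final int UNIT = 20; // 8-byte offset + 4-byte size + 8-byte tag hash

    /** Appends a body to the toy commit log and adds one index entry for it. */
    static void append(ByteBuffer consumeQueue, ByteBuffer commitLog, String body, long tagsCode) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        long phyOffset = commitLog.position(); // where the body starts in the log
        commitLog.put(bytes);
        consumeQueue.putLong(phyOffset).putInt(bytes.length).putLong(tagsCode);
    }

    /** Resolves logical index cqIndex to the stored bytes via the index entry. */
    static byte[] read(ByteBuffer consumeQueue, ByteBuffer commitLog, int cqIndex) {
        int pos = cqIndex * UNIT;                  // entries are fixed-size, so this is O(1)
        long phyOffset = consumeQueue.getLong(pos);
        int size = consumeQueue.getInt(pos + 8);
        byte[] out = new byte[size];
        ByteBuffer view = commitLog.duplicate();   // do not disturb the writer's position
        view.position((int) phyOffset);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer cq = ByteBuffer.allocate(UNIT * 4);
        ByteBuffer log = ByteBuffer.allocate(64);
        append(cq, log, "hello", 0L);
        append(cq, log, "world!", 0L);
        // prints "world!"
        System.out.println(new String(read(cq, log, 1), StandardCharsets.UTF_8));
    }
}
```

Because every entry has the same size, finding the n-th message of a queue is a multiplication plus one read, with no scan of the log.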
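2. Storage structure
The consume queue is an index structure, so the simpler its design the better. Just as the non-leaf nodes of MySQL's B+Tree hold no row data, a consume queue entry holds no message data either. It stores only three fields: the message's offset in the CommitLog, the message size, and the hash code of the message tag. Unlike a B+Tree, the index is laid out as a queue rather than a tree, because the usage pattern is different: entries are appended and read in order.
In the code, each index entry consists of just these three fields and is 20 bytes long (an 8-byte offset, a 4-byte size, and an 8-byte tag hash code). The layout is defined in org.apache.rocketmq.store.ConsumeQueue.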
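To make the 20-byte layout concrete, here is a minimal sketch that encodes and decodes one entry with java.nio.ByteBuffer. The class name CqEntrySketch and its methods are hypothetical, and using String.hashCode() for the tag hash is an assumption made for illustration.

```java
import java.nio.ByteBuffer;

/** Illustrative model of one ConsumeQueue entry; not the RocketMQ class. */
final class CqEntrySketch {
    static final int CQ_STORE_UNIT_SIZE = 20; // 8 + 4 + 8 bytes

    final long commitLogOffset; // physical offset of the message in the CommitLog
    final int msgSize;          // total size of the stored message
    final long tagsCode;        // hash code of the message tag

    CqEntrySketch(long commitLogOffset, int msgSize, long tagsCode) {
        this.commitLogOffset = commitLogOffset;
        this.msgSize = msgSize;
        this.tagsCode = tagsCode;
    }

    /** Serializes the three fields in the same order as the source excerpt below. */
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(CQ_STORE_UNIT_SIZE);
        buf.putLong(commitLogOffset);
        buf.putInt(msgSize);
        buf.putLong(tagsCode);
        return buf.array();
    }

    /** Reads the fields back; Java evaluates the arguments left to right. */
    static CqEntrySketch decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new CqEntrySketch(buf.getLong(), buf.getInt(), buf.getLong());
    }

    public static void main(String[] args) {
        // Assumption: the tag hash is simply String.hashCode()
        CqEntrySketch entry = new CqEntrySketch(1024L, 256, "TagA".hashCode());
        byte[] raw = entry.encode();
        CqEntrySketch decoded = decode(raw);
        System.out.println(raw.length + " bytes, offset=" + decoded.commitLogOffset
            + ", size=" + decoded.msgSize);
    }
}
```

Storing a hash instead of the tag string is what keeps every entry at a fixed size, at the cost of having to re-check the real tag after the message is read.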
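3. Write path
The previous article walked through how the commitLog is written, and nothing there touched the consumeQueue. That suggests the consumeQueue is built asynchronously. It is also backed by MappedFile, so writing takes two steps: first write into the buffer, then flush to disk.
3.1 Writing into the buffer
ConsumeQueue files do not use the TransientStorePool mentioned earlier. Entries are written directly into the MappedByteBuffer and flushed to disk periodically. At the code level, the entry point is the org.apache.rocketmq.store.DefaultMessageStore.ReputMessageService thread. It relies on a dispatch mechanism, implemented in org.apache.rocketmq.store.DefaultMessageStore in the style of the chain-of-responsibility pattern. The task is triggered once every 1 ms.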
// Whether the CommitLog holds data that has not been dispatched yet
public boolean isCommitLogAvailable() {
    return this.reputFromOffset < DefaultMessageStore.this.getConfirmOffset();
}

public void doReput() {
    // Sanity check: if reputFromOffset has fallen behind the smallest offset
    // still present in the CommitLog, jump forward to that offset
    if (this.reputFromOffset < DefaultMessageStore.this.commitLog.getMinOffset()) {
        LOGGER.warn("The reputFromOffset={} is smaller than minPyOffset={}, this usually indicate that the dispatch behind too much and the commitlog has expired.",
            this.reputFromOffset, DefaultMessageStore.this.commitLog.getMinOffset());
        this.reputFromOffset = DefaultMessageStore.this.commitLog.getMinOffset();
    }
    for (boolean doNext = true; this.isCommitLogAvailable() && doNext; ) {
        // Read from the last processed offset up to the current write position;
        // only the file that contains reputFromOffset is read
        SelectMappedBufferResult result = DefaultMessageStore.this.commitLog.getData(reputFromOffset);
        if (result == null) {
            break;
        }
        try {
            this.reputFromOffset = result.getStartOffset();
            for (int readSize = 0; readSize < result.getSize() && reputFromOffset < DefaultMessageStore.this.getConfirmOffset() && doNext; ) {
                // Parse the next message from the buffer. When the tail of a CommitLog file
                // was too small for the next message, it is marked with BLANK_MAGIC_CODE
                DispatchRequest dispatchRequest =
                    DefaultMessageStore.this.commitLog.checkMessageAndReturnSize(result.getByteBuffer(), false, false, false);
                int size = dispatchRequest.getBufferSize() == -1 ? dispatchRequest.getMsgSize() : dispatchRequest.getBufferSize();
                if (reputFromOffset + size > DefaultMessageStore.this.getConfirmOffset()) {
                    doNext = false;
                    break;
                }
                // Decoding succeeded, so the request can be dispatched
                if (dispatchRequest.isSuccess()) {
                    // The data is a valid message
                    if (size > 0) {
                        // Key step: run the dispatch
                        DefaultMessageStore.this.doDispatch(dispatchRequest);
                        if (!notifyMessageArriveInBatch) {
                            notifyMessageArriveIfNecessary(dispatchRequest);
                        }
                        this.reputFromOffset += size;
                        readSize += size;
                        if (!DefaultMessageStore.this.getMessageStoreConfig().isDuplicationEnable() &&
                            DefaultMessageStore.this.getMessageStoreConfig().getBrokerRole() == BrokerRole.SLAVE) {
                            DefaultMessageStore.this.storeStatsService
                                .getSinglePutMessageTopicTimesTotal(dispatchRequest.getTopic()).add(dispatchRequest.getBatchSize());
                            DefaultMessageStore.this.storeStatsService
                                .getSinglePutMessageTopicSizeTotal(dispatchRequest.getTopic())
                                .add(dispatchRequest.getMsgSize());
                        }
                    } else if (size == 0) {
                        // The BLANK_MAGIC_CODE case described above: the rest of this file
                        // is padding, so switch to the next file
                        this.reputFromOffset = DefaultMessageStore.this.commitLog.rollNextFile(this.reputFromOffset);
                        readSize = result.getSize();
                    }
                } else {
                    if (size > 0) {
                        LOGGER.error("[BUG]read total count not equals msg total size. reputFromOffset={}", reputFromOffset);
                        this.reputFromOffset += size;
                    } else {
                        doNext = false;
                        // If user open the dledger pattern or the broker is master node,
                        // it will not ignore the exception and fix the reputFromOffset variable
                        if (DefaultMessageStore.this.getMessageStoreConfig().isEnableDLegerCommitLog() ||
                            DefaultMessageStore.this.brokerConfig.getBrokerId() == MixAll.MASTER_ID) {
                            LOGGER.error("[BUG]dispatch message to consume queue error, COMMITLOG OFFSET: {}",
                                this.reputFromOffset);
                            this.reputFromOffset += result.getSize() - readSize;
                        }
                    }
                }
            }
        } catch (RocksDBException e) {
            ERROR_LOG.info("dispatch message to cq exception. reputFromOffset: {}", this.reputFromOffset, e);
            return;
        } finally {
            result.release();
        }
        finishCommitLogDispatch();
    }
}
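The dispatch step can be pictured as a list of handlers that each receive the same request in turn. The sketch below is a simplified model, not the RocketMQ implementation: DispatchChainSketch, Request, Dispatcher, and the two sample handlers are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of dispatching one parsed message to several index builders. */
final class DispatchChainSketch {
    /** Hypothetical stand-in for the parsed message metadata. */
    static final class Request {
        final String topic;
        final int queueId;
        final long commitLogOffset;
        final int msgSize;

        Request(String topic, int queueId, long commitLogOffset, int msgSize) {
            this.topic = topic;
            this.queueId = queueId;
            this.commitLogOffset = commitLogOffset;
            this.msgSize = msgSize;
        }
    }

    interface Dispatcher {
        void dispatch(Request request);
    }

    private final List<Dispatcher> dispatchers = new ArrayList<>();
    final List<String> trace = new ArrayList<>(); // records what each handler did

    void register(Dispatcher dispatcher) {
        dispatchers.add(dispatcher);
    }

    /** Every registered handler sees the request, in registration order. */
    void doDispatch(Request request) {
        for (Dispatcher dispatcher : dispatchers) {
            dispatcher.dispatch(request);
        }
    }

    /** Builds a chain with two sample handlers: one per kind of index. */
    static DispatchChainSketch withSampleHandlers() {
        DispatchChainSketch chain = new DispatchChainSketch();
        chain.register(r -> chain.trace.add(
            "consumeQueue:" + r.topic + "-" + r.queueId + "@" + r.commitLogOffset));
        chain.register(r -> chain.trace.add("indexFile:" + r.topic + "@" + r.commitLogOffset));
        return chain;
    }

    public static void main(String[] args) {
        DispatchChainSketch chain = withSampleHandlers();
        chain.doDispatch(new Request("TopicA", 0, 100L, 50));
        chain.trace.forEach(System.out::println);
    }
}
```

Keeping the handlers in a list means a new kind of index can be added by registering one more handler, without touching the loop that reads the CommitLog.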
The write itself happens in org.apache.rocketmq.store.DefaultMessageStore.CommitLogDispatcherBuildConsumeQueue#dispatch -> org.apache.rocketmq.store.ConsumeQueue#putMessagePositionInfoWrapper
// org.apache.rocketmq.store.ConsumeQueue#putMessagePositionInfoWrapper
@Override
public void putMessagePositionInfoWrapper(DispatchRequest request) {
    final int maxRetries = 30;
    // Check the running flags to see whether the consume queue is writable
    boolean canWrite = this.messageStore.getRunningFlags().isCQWriteable();
    // Retry loop, at most 30 attempts
    for (int i = 0; i < maxRetries && canWrite; i++) {
        long tagsCode = request.getTagsCode();
        if (isExtWriteEnable()) {
            ConsumeQueueExt.CqExtUnit cqExtUnit = new ConsumeQueueExt.CqExtUnit();
            cqExtUnit.setFilterBitMap(request.getBitMap());
            cqExtUnit.setMsgStoreTime(request.getStoreTimestamp());
            cqExtUnit.setTagsCode(request.getTagsCode());
            long extAddr = this.consumeQueueExt.put(cqExtUnit);
            if (isExtAddr(extAddr)) {
                tagsCode = extAddr;
            } else {
                log.warn("Save consume queue extend fail, So just save tagsCode! {}, topic:{}, queueId:{}, offset:{}", cqExtUnit,
                    topic, queueId, request.getCommitLogOffset());
            }
        }
        // This is where the consumeQueue entry is written
        boolean result = this.putMessagePositionInfo(request.getCommitLogOffset(),
            request.getMsgSize(), tagsCode, request.getConsumeQueueOffset());
        if (result) {
            if (this.messageStore.getMessageStoreConfig().getBrokerRole() == BrokerRole.SLAVE ||
                this.messageStore.getMessageStoreConfig().isEnableDLegerCommitLog()) {
                this.messageStore.getStoreCheckpoint().setPhysicMsgTimestamp(request.getStoreTimestamp());
            }
            this.messageStore.getStoreCheckpoint().setLogicsMsgTimestamp(request.getStoreTimestamp());
            if (MultiDispatchUtils.checkMultiDispatchQueue(this.messageStore.getMessageStoreConfig(), request)) {
                multiDispatchLmqQueue(request, maxRetries);
            }
            return;
        } else {
            // XXX: warn and notify me
            log.warn("[BUG]put commit log position info to " + topic + ":" + queueId + " " + request.getCommitLogOffset()
                + " failed, retry " + i + " times");
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                log.warn("", e);
            }
        }
    }
    // The rest of the method (handling the case where all retries fail) is omitted in this excerpt
}

private boolean putMessagePositionInfo(final long offset, final int size, final long tagsCode,
    final long cqOffset) {
    if (offset + size <= this.getMaxPhysicOffset()) {
        log.warn("Maybe try to build consume queue repeatedly maxPhysicOffset={} phyOffset={}", this.getMaxPhysicOffset(), offset);
        return true;
    }
    // byteBufferIndex is a fixed 20-byte buffer that holds one index entry
    this.byteBufferIndex.flip();
    this.byteBufferIndex.limit(CQ_STORE_UNIT_SIZE);
    this.byteBufferIndex.putLong(offset);
    this.byteBufferIndex.putInt(size);
    this.byteBufferIndex.putLong(tagsCode);
    // Expected logical byte offset of this entry: cqOffset (0, 1, 2, ...) times the 20-byte unit size
    final long expectLogicOffset = cqOffset * CQ_STORE_UNIT_SIZE;
    // Locate the file this entry belongs to
    MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile(expectLogicOffset);
    if (mappedFile != null) {
        // First file created in this queue while cqOffset is non-zero:
        // fill the space before the entry with blank entries
        if (mappedFile.isFirstCreateInQueue() && cqOffset != 0 && mappedFile.getWrotePosition() == 0) {
            this.minLogicOffset = expectLogicOffset;
            this.mappedFileQueue.setFlushedWhere(expectLogicOffset);
            this.mappedFileQueue.setCommittedWhere(expectLogicOffset);
            this.fillPreBlank(mappedFile, expectLogicOffset);
            log.info("fill pre blank space " + mappedFile.getFileName() + " " + expectLogicOffset + " "
                + mappedFile.getWrotePosition());
        }
        if (cqOffset != 0) {
            long currentLogicOffset = mappedFile.getWrotePosition() + mappedFile.getFileFromOffset();
            if (expectLogicOffset < currentLogicOffset) {
                log.warn("Build consume queue repeatedly, expectLogicOffset: {} currentLogicOffset: {} Topic: {} QID: {} Diff: {}",
                    expectLogicOffset, currentLogicOffset, this.topic, this.queueId, expectLogicOffset - currentLogicOffset);
                return true;
            }
            if (expectLogicOffset != currentLogicOffset) {
                LOG_ERROR.warn(
                    "[BUG]logic queue order maybe wrong, expectLogicOffset: {} currentLogicOffset: {} Topic: {} QID: {} Diff: {}",
                    expectLogicOffset,
                    currentLogicOffset,
                    this.topic,
                    this.queueId,
                    expectLogicOffset - currentLogicOffset
                );
            }
        }
        // Record the largest physical offset indexed so far
        this.setMaxPhysicOffset(offset + size);
        // Append the entry to the mapped buffer
        return mappedFile.appendMessage(this.byteBufferIndex.array());
    }
    return false;
}
At this point the index entry has been written into the buffer, but it has not yet been flushed to the file.
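The offset arithmetic in putMessagePositionInfo is easy to check by hand. The sketch below assumes the default ConsumeQueue file size of 300,000 entries per file and assumes that files are named after their starting offset, zero-padded to 20 digits. The class and method names are hypothetical.

```java
/** Worked example of mapping a queue offset to a file and a position inside it. */
final class CqOffsetMathSketch {
    static final int CQ_STORE_UNIT_SIZE = 20;
    // Assumption: default file size of 300,000 entries (6,000,000 bytes)
    static final long FILE_SIZE = 300_000L * CQ_STORE_UNIT_SIZE;

    /** Logical byte offset of the entry with the given queue offset. */
    static long expectLogicOffset(long cqOffset) {
        return cqOffset * CQ_STORE_UNIT_SIZE;
    }

    /** Starting offset of the file that contains the given logical offset. */
    static long fileFromOffset(long logicOffset) {
        return logicOffset - (logicOffset % FILE_SIZE);
    }

    /** Byte position of the entry inside its file. */
    static long positionInFile(long logicOffset) {
        return logicOffset % FILE_SIZE;
    }

    /** Assumption: the file name is the starting offset padded to 20 digits. */
    static String fileName(long fileFromOffset) {
        return String.format("%020d", fileFromOffset);
    }

    public static void main(String[] args) {
        long cqOffset = 300_001L; // the second entry of the second file
        long logic = expectLogicOffset(cqOffset);
        System.out.println("logicOffset=" + logic
            + " file=" + fileName(fileFromOffset(logic))
            + " position=" + positionInFile(logic));
    }
}
```

With these assumptions, queue offset 300,001 maps to logical offset 6,000,020, which lies 20 bytes into the file that starts at offset 6,000,000.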
3.2 Flushing to the file
Just as the commitLog is flushed by a dedicated thread, the consumeQueue is flushed by a dedicated thread too: org.apache.rocketmq.store.DefaultMessageStore.FlushConsumeQueueService
private void doFlush(int retryTimes) {
    // Minimum number of dirty pages required before a flush is performed
    int flushConsumeQueueLeastPages = DefaultMessageStore.this.getMessageStoreConfig().getFlushConsumeQueueLeastPages();
    // Shutdown path: flushConsumeQueueLeastPages = 0 forces the data to be flushed
    if (retryTimes == RETRY_TIMES_OVER) {
        flushConsumeQueueLeastPages = 0;
    }
    long logicsMsgTimestamp = 0;
    // Interval after which a thorough flush is forced
    int flushConsumeQueueThoroughInterval = DefaultMessageStore.this.getMessageStoreConfig().getFlushConsumeQueueThoroughInterval();
    long currentTimeMillis = System.currentTimeMillis();
    // Compare the time since the last thorough flush with the interval
    if (currentTimeMillis >= (this.lastFlushTimestamp + flushConsumeQueueThoroughInterval)) {
        this.lastFlushTimestamp = currentTimeMillis;
        flushConsumeQueueLeastPages = 0;
        logicsMsgTimestamp = DefaultMessageStore.this.getStoreCheckpoint().getLogicsMsgTimestamp();
    }
    // All current consume queues
    ConcurrentMap<String, ConcurrentMap<Integer, ConsumeQueueInterface>> tables = DefaultMessageStore.this.getConsumeQueueTable();
    for (ConcurrentMap<Integer, ConsumeQueueInterface> maps : tables.values()) {
        for (ConsumeQueueInterface cq : maps.values()) {
            boolean result = false;
            for (int i = 0; i < retryTimes && !result; i++) {
                // Flush each queue in turn
                result = DefaultMessageStore.this.consumeQueueStore.flush(cq, flushConsumeQueueLeastPages);
            }
        }
    }
    if (messageStoreConfig.isEnableCompaction()) {
        compactionStore.flush(flushConsumeQueueLeastPages);
    }
    // Checkpoint: the storeCheckPoint is flushed as well, but only on shutdown
    // or when the thorough-flush interval has elapsed
    if (0 == flushConsumeQueueLeastPages) {
        if (logicsMsgTimestamp > 0) {
            DefaultMessageStore.this.getStoreCheckpoint().setLogicsMsgTimestamp(logicsMsgTimestamp);
        }
        DefaultMessageStore.this.getStoreCheckpoint().flush();
    }
}
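The interplay between the "least pages" threshold and the thorough-flush interval can be reduced to two small functions. This is a sketch under assumptions: the 4 KB page size and the page-counting rule are modeled on how a mapped file typically decides whether it has enough dirty data to flush, and all names here are hypothetical.

```java
/** Simplified model of the flush decision; not the RocketMQ implementation. */
final class FlushPolicySketch {
    static final int OS_PAGE_SIZE = 4096; // assumed page size in bytes

    /** A threshold of 0 means "flush whatever is dirty". */
    static int effectiveLeastPages(int configuredLeastPages, long nowMs,
                                   long lastThoroughFlushMs, long thoroughIntervalMs) {
        boolean intervalElapsed = nowMs >= lastThoroughFlushMs + thoroughIntervalMs;
        return intervalElapsed ? 0 : configuredLeastPages;
    }

    /** Whether enough data has accumulated since the last flush. */
    static boolean isAbleToFlush(int writePosition, int flushedPosition, int leastPages) {
        if (leastPages > 0) {
            int dirtyPages = writePosition / OS_PAGE_SIZE - flushedPosition / OS_PAGE_SIZE;
            return dirtyPages >= leastPages;
        }
        return writePosition > flushedPosition;
    }

    public static void main(String[] args) {
        // One 20-byte entry written, threshold of 2 pages: not enough to flush yet
        System.out.println(isAbleToFlush(20, 0, 2));
        // Same data after the thorough interval has elapsed: the threshold drops to 0
        int leastPages = effectiveLeastPages(2, 61_000L, 0L, 60_000L);
        System.out.println(isAbleToFlush(20, 0, leastPages));
    }
}
```

The threshold avoids a disk sync for every 20-byte entry, while the interval bounds how long a small amount of unflushed index data can stay only in memory.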
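Conclusion:
As shown above, the overall logic for cq files is very similar to that for commitlog files, with only minor differences, which allows the code to be reused as much as possible. The consumeQueue is an index file over the commitlog. With it, consuming a message does not require scanning the commitlog, and the message can be located quickly. The next article will look at how this file is used during consumption.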