一、背景
主从模式部署的broker集群,主节点同步模式为:异步复制,刷盘方式为:异步刷盘。
在主节点宕机重启后,从节点的store.log里大量出现了如下日志,并且在不断的输出:
2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5
2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5
2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5
2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5
重启从节点后,依然无法恢复。
二、调查
-
首先搜索该警告对应的代码,
CommitLog.java
部分代码如下:public DispatchRequest checkMessageAndReturnSize(java.nio.ByteBuffer byteBuffer, final boolean checkCRC, final boolean readBody) { try { // 1 TOTAL SIZE int totalSize = byteBuffer.getInt(); // 2 MAGIC CODE int magicCode = byteBuffer.getInt(); switch (magicCode) { case MESSAGE_MAGIC_CODE: break; case BLANK_MAGIC_CODE: return new DispatchRequest(0, true /* success */); default: log.warn("found a illegal magic code 0x" + Integer.toHexString(magicCode)); return new DispatchRequest(-1, false /* success */); }
调用
checkMessageAndReturnSize
的为ReputMessageService.java
的如下代码:class ReputMessageService extends ServiceThread { public void run() { while (!this.isStopped()) { try { Thread.sleep(1); this.doReput(); } catch (Exception e) { ... ... } } } private void doReput() { for (boolean doNext = true; this.isCommitLogAvailable() && doNext; ) { SelectMappedBufferResult result = commitLog.getData(reputFromOffset); if (result != null) { try { this.reputFromOffset = result.getStartOffset(); for (int readSize = 0; readSize < result.getSize() && doNext; ) { DispatchRequest dispatchRequest = commitLog.checkMessageAndReturnSize(result.getByteBuffer(), false, false); int size = dispatchRequest.getBufferSize() == -1 ? dispatchRequest.getMsgSize() : dispatchRequest.getBufferSize(); if (dispatchRequest.isSuccess()) { ... ... } else if (!dispatchRequest.isSuccess()) { if (size > 0) { log.error("[BUG]read total count not equals msg total size. reputFromOffset={}", reputFromOffset); this.reputFromOffset += size; } else { doNext = false; // If user open the dledger pattern or the broker is master node, // it will not ignore the exception and fix the reputFromOffset variable if (DefaultMessageStore.this.getMessageStoreConfig().isEnableDLegerCommitLog() || DefaultMessageStore.this.brokerConfig.getBrokerId() == MixAll.MASTER_ID) { log.error("[BUG]dispatch message to consume queue error, COMMITLOG OFFSET: {}", this.reputFromOffset); this.reputFromOffset += result.getSize() - readSize; } } } } } finally { result.release(); } } else { doNext = false; } } } }
ReputMessageService.java
的作用是根据commitlog构建出consume queue和index文件,即参考如下broker存储的数据文件目录:data |-- commitlog |-- config |-- consumequeue |-- index
由此得知,由于
ReputMessageService
在循环执行构建任务时,调用checkMessageAndReturnSize
,返回的结果指示为不成功,导致doReput()
无限循环执行。 -
那么到底是哪个文件导致上面的无限循环呢?
经过走查当时主节点重启后的日志,发现如下关键日志:
2022-05-11 13:50:11 INFO main - recover from this mapped file /data/broker-h/data/commitlog/00000012532714569728 2022-05-11 13:50:11 INFO FlushIndexFileThread - flush index file elapsed time(ms) 117 2022-05-11 13:50:15 WARN main - found a illegal magic code 0x0 2022-05-11 13:50:15 INFO main - recover physics file end, /data/broker-h/data/commitlog/00000012532714569728 pos=995267206
即,broker在重启时,会根据最后一个commitlog恢复相应的consume queue,应该是防止commitlog损坏,进行修复,这里可以看到文件00000012532714569728出现问题。
-
commitlog数据到底哪里不一样?
首先主从的00000012532714569728文件md5值不一样,需要写代码详细分析,将主从节点的commitlog导出,编写如下代码程序进行测试:
public void testCommitLog() throws IOException { int fileSize = 1024 * 1024 * 1024; MappedFile mappedFile = new MappedFile("00000012532714569728", fileSize); mappedFile.setWrotePosition(fileSize); SelectMappedBufferResult result = mappedFile.selectMappedBuffer(0); ByteBuffer byteBuffer = result.getByteBuffer(); int curSize = 0; int msgNO = 0; while (byteBuffer.hasRemaining()) { int msgSize = byteBuffer.getInt(); int magicCode = byteBuffer.getInt(); ++msgNO; System.out.println("msgNO=" + msgNO + ",curSize=" + curSize + ",magicCode=" + magicCode + ",msgSize=" + curSize); curSize += msgSize; byteBuffer.position(curSize); if (magicCode == 0) { break; } } }
主节点commitlog的测试关键输出如下,部分省略:
msgNO=692026,curSize=995261690,magicCode=-626843481,msgSize=1498 msgNO=692027,curSize=995263188,magicCode=-626843481,msgSize=2953 msgNO=692028,curSize=995266141,magicCode=-626843481,msgSize=1057 msgNO=692029,curSize=995267198,magicCode=0,msgSize=0
即,主节点的commitlog到msgNO=692029条消息后,magicCode和msgSize都为0,即从此之后没有数据了。
从节点commitlog的测试关键输出如下,部分省略:
msgNO=692026,curSize=995261690,magicCode=-626843481,msgSize=1498 msgNO=692027,curSize=995263188,magicCode=-626843481,msgSize=2953 msgNO=692028,curSize=995266141,magicCode=-626843481,msgSize=1057 msgNO=692029,curSize=995267198,magicCode=-626843481,msgSize=3190 msgNO=692030,curSize=995270388,magicCode=-626843481,msgSize=1691 msgNO=692031,curSize=995272079,magicCode=-626843481,msgSize=1627 ... ... msgNO=692084,curSize=995363023,magicCode=-626843481,msgSize=655 msgNO=692085,curSize=995363678,magicCode=-626843481,msgSize=918 msgNO=692086,curSize=995364596,magicCode=-626843481,msgSize=2542 msgNO=692087,curSize=995367138,magicCode=-444028443,msgSize=-1930845766 java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:244) at org.apache.rocketmq.store.MappedFileTest.testCommitLog(MappedFileTest.java:79)
从节点的commitlog到msgNO=692029条消息后,仍然是合法的数据,一直到msgNO=692087后,magicCode和msgSize出现负值。
从这里可以直接得出答案,由于从节点的commitlog存在非法数据,导致
ReputMessageService
在构建相关索引文件时,出现无限循环的情况。 -
从节点的非法数据是如何写进去的?
先看下整个事件的顺序:
- 主节点宕机
- 从节点接管消费,此时日志正常
- 主节点机器恢复,启动主节点
- 从节点出现found a illegal magic code
- 重启从节点后依然无法恢复
根据上面的顺序,问题出现的时间点是第3步,即主节点启动后,从节点从主节点同步数据。
调查主节点的日志,发现如下日志:
2022-05-11 14:00:50 INFO AcceptSocketService - HAService receive new connection, /10.x.x.x:57510 2022-05-11 14:00:50 INFO ReadSocketService - ReadSocketService service started 2022-05-11 14:00:50 INFO WriteSocketService - WriteSocketService service started 2022-05-11 14:00:55 INFO ReadSocketService - slave[/10.x.x.x:57510] request offset 12533709936866 2022-05-11 14:00:55 INFO WriteSocketService - master transfer data from 12533709936866 to slave[/10.x.x.x:57510], and slave request 12533709936866
即从节点请求offset:12533709936866的数据。
而对应从节点的日志,如下:
2022-05-11 14:00:49 INFO BrokerControllerScheduledThread1 - update master address, OLD: 10.x.x.x:10912 NEW: 10.x.x.x:10912 2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5 2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5 2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5 2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5 2022-05-11 14:00:55 WARN ReputMessageService - found a illegal magic code 0xe588a9e5
即,主节点在2022-05-11 14:00:55把数据传输至从节点后,从节点就发生了
found a illegal magic code
。查看一下主节点数据同步
WriteSocketService.java
的部分代码:SelectMappedBufferResult selectResult = haService.getDefaultMessageStore().getCommitLogData(this.nextTransferFromWhere); int size = selectResult.getSize(); if (size > 1024 * 32) { size = 1024 * 32; } selectResult.getByteBuffer().limit(size);
即,最大传输的数据量为32K,也就是说,主节点传输的数据并不能保障是完整的消息数据。
看一下从节点接收数据
HAClient.java
的部分代码:while (true) { int diff = this.byteBufferRead.position() - this.dispatchPosition; if (diff >= msgHeaderSize) { if (diff >= (msgHeaderSize + bodySize)) { byte[] bodyData = byteBufferRead.array(); int dataStart = this.dispatchPosition + msgHeaderSize; HAService.this.defaultMessageStore.appendToCommitLog(masterPhyOffset, bodyData, dataStart, bodySize); } } }
即,只要数据包大于等于头数据+数据体大小时,就写入commitlog,从节点并没有对数据进行完整性校验。
想象一下如下情况:
- 主节点发送了一部分不完整的数据给从节点。
- 从节点接收到后发现数据不完整,等待主节点继续发送。
- 主节点宕机了。
此时,如果主节点commitlog已经完全持久化了,那么当主节点重启后,从节点还有可能获取到完整的数据。
看下本案例中,主从节点的commitlog:00000012532714569728的数据量:
- 主节点msgNO=692028
- 从节点msgNO=692086
从节点commitlog的消息量比主节点多692086-692028=58条!
也就是说,主节点重启后,从节点将永远无法从主节点拉取到需要的数据,因为从节点的数据多于主节点,主节点的部分数据丢失了!
由于主节点采用了异步刷盘,默认每500ms内凑够4页才会刷一次,所以存在数据丢失的可能。
三、解决方案
-
此问题的本质原因:主从同步过程中,由于数据不一致导致从节点无法正常同步数据。
-
此问题带来的影响:从节点无法构建consume queue,所以从节点的数据将无法查询,同时失去了主从高可用的意义。
-
解决方案:
由于从节点的commitlog数据跟主节点的不一致,而commitlog并不支持部分清除数据,故只能在主节点启动后,关闭从节点,删除其commitlog的最后一个数据文件,重启后,从主节点将自动拉取commitlog最后一个的数据文件。