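RocketMQ's storage and read/write paths are built on the NIO memory-mapping mechanism (MappedByteBuffer). When a message is stored, it is first appended to memory and then flushed to disk at a time determined by the configured flush policy.
1. Flush policy
1.1 Types of flush policy:
- Synchronous flush: once a message has been written to memory, it must be flushed to the disk file immediately.
- Asynchronous flush: the call returns as soon as the message has been written to memory, and the broker flushes data to disk periodically, so there is some risk of data loss.
RocketMQ runs the flush on a dedicated thread. The flush mode is set with flushDiskType in the broker.conf configuration file. The allowed values are ASYNC_FLUSH (asynchronous flush, the default) and SYNC_FLUSH (synchronous flush).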
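As a small illustration of the setting described above, a broker.conf fragment that switches the broker to synchronous flush could look like the following. Only the flushDiskType key and its two values come from the text; the rest of the file is omitted.

```properties
# broker.conf (fragment)
# ASYNC_FLUSH is the default; SYNC_FLUSH makes the flush result part of the send result
flushDiskType=SYNC_FLUSH
```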
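2. Flush flow
As the figure above shows, there are two execution paths:
- Write directly through the memory-mapped file, then flush it to disk.
- When asynchronous flush is used and the off-heap memory pool is enabled, first write to writeBuffer, then commit to the FileChannel, and finally flush to disk.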
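The two paths can be sketched with plain Java NIO. This is a minimal illustration, not RocketMQ's code: the class and method names (FlushPathsSketch, writeViaMappedBuffer, writeViaStagingBuffer) are hypothetical, and the direct buffer only stands in for the pooled writeBuffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FlushPathsSketch {

    // Path 1: write into the memory-mapped view, then force() the mapping to disk.
    static void writeViaMappedBuffer(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            mapped.put(data);   // the bytes now sit in the page cache
            mapped.force();     // "flush": ask the OS to write the mapped region to disk
        }
    }

    // Path 2: stage the bytes in an off-heap buffer, commit them to the
    // FileChannel, then force the channel to disk.
    static void writeViaStagingBuffer(Path file, byte[] data) throws IOException {
        ByteBuffer writeBuffer = ByteBuffer.allocateDirect(data.length); // stand-in for the pooled buffer
        writeBuffer.put(data);
        writeBuffer.flip();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            while (writeBuffer.hasRemaining()) {
                channel.write(writeBuffer);   // "commit": buffer -> FileChannel
            }
            channel.force(false);             // "flush": FileChannel -> disk (content only)
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello rocketmq".getBytes(StandardCharsets.UTF_8);
        Path mappedFile = Files.createTempFile("mapped", ".log");
        Path stagedFile = Files.createTempFile("staged", ".log");
        mappedFile.toFile().deleteOnExit();
        stagedFile.toFile().deleteOnExit();
        writeViaMappedBuffer(mappedFile, payload);
        writeViaStagingBuffer(stagedFile, payload);
        System.out.println(new String(Files.readAllBytes(mappedFile), StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(stagedFile), StandardCharsets.UTF_8));
    }
}
```

Both methods leave the same bytes in the file. The difference is where the data waits before the flush: in the page cache behind the mapping, or in a separate off-heap buffer that still has to be committed.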
The earlier chapter on the broker's message-receiving flow (writing to the CommitLog) briefly introduced CommitLog#asyncPutMessages(). After the broker receives a message from a producer, SendMessageProcessor handles it and the message is written to the CommitLog file.
Along the way it passes through the flush policy in CommitLog#submitFlushRequest():
public CompletableFuture<PutMessageStatus> submitFlushRequest(AppendMessageResult result, MessageExt messageExt) {
    // Synchronous flush
    if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
        // Get the GroupCommitService
        final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
        // Does the sender want to wait until the message is stored?
        if (messageExt.isWaitStoreMsgOK()) {
            // Build the GroupCommitRequest
            GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes(),
                    this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
            // Queue the request; a dedicated thread performs the flush
            service.putRequest(request);
            return request.future();
        } else {
            service.wakeup();
            return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
        }
    }
    // Asynchronous flush
    else {
        // The transient store pool is not in use
        if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
            // Wake the flush thread
            flushCommitLogService.wakeup();
        } else {
            // The transient store pool is in use: commitLogService first commits the data to the FileChannel, then it is flushed in one go
            commitLogService.wakeup();
        }
        // Return the result immediately
        return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
    }
}
3. Starting the flush thread
As mentioned at the beginning, RocketMQ runs the flush on a dedicated thread. When BrokerController starts, it starts the message store, which in turn calls CommitLog#start():
public class BrokerController {
    public void start() throws Exception {
        if (this.messageStore != null) {
            // Start the message store
            this.messageStore.start();
        }
        // ...
    }
}
public class DefaultMessageStore implements MessageStore {
    /**
     * @throws Exception
     */
    public void start() throws Exception {
        // ...
        this.flushConsumeQueueService.start();
        // Call CommitLog's start method
        this.commitLog.start();
        this.storeStatsService.start();
        // ...
    }
}
When a CommitLog object is instantiated, its constructor also creates the flush service (GroupCommitService for synchronous flush):
public class CommitLog {
    private final FlushCommitLogService flushCommitLogService; // flush service
    private final FlushCommitLogService commitLogService; // commit service, started only when the transient store pool is enabled
    ......
    public CommitLog(final DefaultMessageStore defaultMessageStore) {
        ......
        // Is synchronous flush configured?
        if (FlushDiskType.SYNC_FLUSH == defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
            this.flushCommitLogService = new GroupCommitService();
        } else {
            this.flushCommitLogService = new FlushRealTimeService();
        }
        ......
    }
    ......
    public void start() {
        // Start the flush thread
        this.flushCommitLogService.start();
        flushDiskWatcher.setDaemon(true);
        if (defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
            this.commitLogService.start();
        }
    }
}
- Synchronous flush is handled by GroupCommitService
- Asynchronous flush is handled by FlushRealTimeService
4. Synchronous flush
public CompletableFuture<PutMessageStatus> submitFlushRequest(AppendMessageResult result, MessageExt messageExt) {
    // Synchronous flush
    if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
        // Get the GroupCommitService
        final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
        // Does the sender want to wait until the message is stored?
        if (messageExt.isWaitStoreMsgOK()) {
            // Build the GroupCommitRequest
            GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes(),
                    this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
            // Queue the request; a dedicated thread performs the flush
            service.putRequest(request);
            return request.future();
        } else {
            service.wakeup();
            return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
        }
    }
    // Asynchronous flush
    else {
        ......
    }
}
When the code above performs a synchronous flush, it goes through these steps:
- Get the GroupCommitService
- Build a GroupCommitRequest
- Submit the request to GroupCommitService and return the request's future
4.1 GroupCommitRequest
- nextOffset: the write offset plus the number of bytes written, i.e. the flush offset that should be reached once this flush succeeds
- flushOKFuture: the flush result
- timeoutMillis: the flush timeout; a flush that has not completed within this time is treated as timed out
public static class GroupCommitRequest {
    // Target flush offset
    private final long nextOffset;
    // Flush status
    private CompletableFuture<PutMessageStatus> flushOKFuture = new CompletableFuture<>();
    private final long startTimestamp = System.currentTimeMillis();
    // Timeout
    private long timeoutMillis = Long.MAX_VALUE;
    public GroupCommitRequest(long nextOffset, long timeoutMillis) {
        this.nextOffset = nextOffset;
        this.timeoutMillis = timeoutMillis;
    }
    public void wakeupCustomer(final PutMessageStatus putMessageStatus) {
        // Called when the flush finishes; completes the future with the flush status
        this.flushOKFuture.complete(putMessageStatus);
    }
}
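To make the hand-off concrete, here is a minimal, self-contained sketch of a request object whose future is completed by a flush thread while the caller waits with a timeout. It is not RocketMQ's code: the names FlushRequestSketch, Request, Status and await are hypothetical, and mapping a caller-side timeout to FLUSH_DISK_TIMEOUT is an assumption made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FlushRequestSketch {

    enum Status { PUT_OK, FLUSH_DISK_TIMEOUT }

    // Simplified stand-in for GroupCommitRequest.
    static final class Request {
        final long nextOffset;      // offset that must be flushed for this request to succeed
        final long timeoutMillis;   // how long the caller is willing to wait
        final CompletableFuture<Status> future = new CompletableFuture<>();

        Request(long nextOffset, long timeoutMillis) {
            this.nextOffset = nextOffset;
            this.timeoutMillis = timeoutMillis;
        }

        // Called by the flush thread once it knows the outcome.
        void wakeupCustomer(Status status) {
            future.complete(status);
        }
    }

    // Caller side: block until the flush thread reports a result or the timeout expires.
    static Status await(Request request) {
        try {
            return request.future.get(request.timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return Status.FLUSH_DISK_TIMEOUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.FLUSH_DISK_TIMEOUT;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // A flush thread completes the first request promptly.
        Request flushed = new Request(1024, 1000);
        new Thread(() -> flushed.wakeupCustomer(Status.PUT_OK)).start();
        System.out.println(await(flushed));

        // Nobody completes the second request, so the caller times out.
        Request stalled = new Request(2048, 50);
        System.out.println(await(stalled));
    }
}
```

The point of the pattern is that the sending thread and the flush thread never share a lock; the future is the only thing that crosses between them.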
4.2 How GroupCommitService performs the flush
So far we have only seen the request being handed to GroupCommitService, not where the flush actually runs. GroupCommitService extends ServiceThread, so it runs on its own thread, which is started through CommitLog#start() when BrokerController starts. The thread body is GroupCommitService#run():
abstract class FlushCommitLogService extends ServiceThread {
    protected static final int RETRY_TIMES_OVER = 10;
}
class GroupCommitService extends FlushCommitLogService {
    public void run() {
        CommitLog.log.info(this.getServiceName() + " service started");
        // Loop until the service is stopped
        while (!this.isStopped()) {
            try {
                // Wait up to 10 ms, or until wakeup() is called
                this.waitForRunning(10);
                // Perform the flush
                this.doCommit();
            } catch (Exception e) {
                CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
            }
        }
        // On a normal shutdown, sleep briefly so that pending requests can arrive,
        // then flush them
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            CommitLog.log.warn("GroupCommitService Exception, ", e);
        }
        synchronized (this) {
            this.swapRequests();
        }
        this.doCommit();
        CommitLog.log.info(this.getServiceName() + " service end");
    }
}
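4.3 doCommit() performs the flush
After waking up, the thread calls GroupCommitService#doCommit() to perform the flush:
private void doCommit() {
    synchronized (this.requestsRead) {
        if (!this.requestsRead.isEmpty()) {
            // Iterate over the pending flush requests
            for (GroupCommitRequest req : this.requestsRead) {
                // There may be a message in the next file, so a maximum of
                // two times the flush
                // Has the flushed offset already reached the offset this request asks for?
                boolean flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();
                // The message may be in the next file, so flush at most twice: one attempt plus one retry
                for (int i = 0; i < 2 && !flushOK; i++) {
                    // Flush
                    CommitLog.this.mappedFileQueue.flush(0);
                    // A CommitLog file is 1 GB, so if the flushed offset is still below the requested offset
                    // after this pass, there is more to flush; otherwise flushOK is true and the loop ends
                    flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();
                }
                // Wake the sending thread and report the flush result
                req.wakeupCustomer(flushOK ? PutMessageStatus.PUT_OK : PutMessageStatus.FLUSH_DISK_TIMEOUT);
            }
            long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
            if (storeTimestamp > 0) {
                CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
            }
            // Clear the list once all requests are processed
            this.requestsRead.clear();
        } else {
            // Because of individual messages is set to not sync flush, it
            // will come to this process
            CommitLog.this.mappedFileQueue.flush(0);
        }
    }
}
The logic is as follows:
- Read the flush offset flushedWhere recorded by the CommitLog's mapped file queue and compare it with the request's target offset nextOffset. Normally flushedWhere is smaller than the offset reached after this write, so a flush is needed. If flushedWhere is already greater than or equal to nextOffset, the data has already been flushed and no flush is performed for this request.
- Otherwise, flush at most twice. A CommitLog file is 1 GB by default, and the message may have landed in the next file after the current one filled up. If the first flush already covers the requested offset, flushOK becomes true, the loop condition fails, and no second flush is performed.
- The request is then completed with PUT_OK if the requested offset was reached, or FLUSH_DISK_TIMEOUT otherwise.
- Once all requests have been processed, requestsRead is cleared.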
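The reason for the two-pass loop can be seen in a toy model. The sketch below assumes, as in MappedFileQueue#flush(), that one flush call only covers the file containing the current flush offset. The class TwoPassFlushSketch, its fields and the 1024-byte file size are hypothetical and chosen for readability; the real default file size is 1 GB.

```java
final class TwoPassFlushSketch {

    static final long FILE_SIZE = 1024; // toy value, not the real CommitLog file size

    long flushedWhere; // global offset up to which data is on disk
    long wroteWhere;   // global offset up to which data has been appended

    TwoPassFlushSketch(long flushedWhere, long wroteWhere) {
        this.flushedWhere = flushedWhere;
        this.wroteWhere = wroteWhere;
    }

    // One flush pass: only the file that contains flushedWhere is flushed,
    // so the offset can advance at most to the end of that file.
    void flushOnce() {
        long fileStart = (flushedWhere / FILE_SIZE) * FILE_SIZE;
        long fileEnd = fileStart + FILE_SIZE;
        flushedWhere = Math.min(wroteWhere, fileEnd);
    }

    // Same loop shape as doCommit(): returns how many passes were needed,
    // or -1 if the target offset was still not reached after two passes.
    static int passesNeeded(TwoPassFlushSketch queue, long nextOffset) {
        boolean flushOK = queue.flushedWhere >= nextOffset;
        int passes = 0;
        for (int i = 0; i < 2 && !flushOK; i++) {
            queue.flushOnce();
            passes++;
            flushOK = queue.flushedWhere >= nextOffset;
        }
        return flushOK ? passes : -1;
    }

    public static void main(String[] args) {
        // The message ends in the second file: pass 1 reaches 1024, pass 2 reaches 1500.
        System.out.println(passesNeeded(new TwoPassFlushSketch(900, 1500), 1500)); // prints 2
        // The message ends in the same file as the flush offset.
        System.out.println(passesNeeded(new TwoPassFlushSketch(900, 1000), 1000)); // prints 1
        // The target offset is already on disk, so nothing is flushed.
        System.out.println(passesNeeded(new TwoPassFlushSketch(1000, 1000), 1000)); // prints 0
    }
}
```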
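5. Asynchronous flush
public CompletableFuture<PutMessageStatus> submitFlushRequest(AppendMessageResult result, MessageExt messageExt) {
    // Synchronous flush
    if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
        ......
    }
    // Asynchronous flush
    else {
        // The transient store pool is not in use
        if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
            // Wake the flush thread
            flushCommitLogService.wakeup();
        } else {
            // The transient store pool is in use: commitLogService first commits the data to the FileChannel, then it is flushed in one go
            commitLogService.wakeup();
        }
        // Return the result immediately
        return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
    }
}
The code first checks whether the transient store pool is enabled. If it is not, it calls wakeup() on flushCommitLogService to wake the flush thread. Otherwise it wakes commitLogService, which first commits the data to the FileChannel before it is flushed in one go.
As the CommitLog constructor shows, asynchronous flush uses FlushRealTimeService as the implementation class. Both flush policies run the flush on a dedicated thread, so the next thing to look at is the thread body:
FlushRealTimeService#run()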
class FlushRealTimeService extends FlushCommitLogService {
    private long lastFlushTimestamp = 0;
    private long printTimes = 0;
    public void run() {
        CommitLog.log.info(this.getServiceName() + " service started");
        while (!this.isStopped()) {
            // Defaults to false: wait with waitForRunning(), which can be woken early; if true, wait with Thread.sleep()
            boolean flushCommitLogTimed = CommitLog.this.defaultMessageStore.getMessageStoreConfig().isFlushCommitLogTimed();
            // Interval between runs of the flush task
            int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushIntervalCommitLog();
            // Minimum number of pages a flush must cover; if less data than this is pending, the flush is skipped. Default: 4 pages
            int flushPhysicQueueLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogLeastPages();
            // Maximum interval between two thorough flushes. Default: 10 s
            int flushPhysicQueueThoroughInterval =
                    CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogThoroughInterval();
            boolean printFlushProgress = false;
            // Print flush progress
            // Once the thorough interval has elapsed, drop the page threshold to 0 so that everything pending is flushed
            long currentTimeMillis = System.currentTimeMillis();
            if (currentTimeMillis >= (this.lastFlushTimestamp + flushPhysicQueueThoroughInterval)) {
                this.lastFlushTimestamp = currentTimeMillis;
                flushPhysicQueueLeastPages = 0;
                printFlushProgress = (printTimes++ % 10) == 0;
            }
            try {
                // Wait for the configured interval before each flush
                if (flushCommitLogTimed) {
                    Thread.sleep(interval);
                } else {
                    // Wait until woken up or until the interval elapses
                    this.waitForRunning(interval);
                }
                // Print progress if requested
                if (printFlushProgress) {
                    this.printFlushProgress();
                }
                long begin = System.currentTimeMillis();
                CommitLog.this.mappedFileQueue.flush(flushPhysicQueueLeastPages);
                long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
                if (storeTimestamp > 0) {
                    CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
                }
                long past = System.currentTimeMillis() - begin;
                if (past > 500) {
                    log.info("Flush data to disk costs {} ms", past);
                }
            } catch (Throwable e) {
                CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
                this.printFlushProgress();
            }
        }
        // Normal shutdown, to ensure that all the flush before exit
        // When the service stops, make sure all data has been flushed
        boolean result = false;
        for (int i = 0; i < RETRY_TIMES_OVER && !result; i++) {
            // Perform the flush
            result = CommitLog.this.mappedFileQueue.flush(0);
            CommitLog.log.info(this.getServiceName() + " service shutdown, retry " + (i + 1) + " times " + (result ? "OK" : "Not OK"));
        }
        this.printFlushProgress();
        CommitLog.log.info(this.getServiceName() + " service end");
    }
}
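The interplay between the page threshold and the thorough-flush interval in run() can be isolated into one small function. This is only a sketch of that decision: FlushPolicySketch and effectiveLeastPages are hypothetical names, and the values 4 pages and 10 s are the defaults quoted in the comments above.

```java
final class FlushPolicySketch {

    // Returns the page threshold to use for the next flush: the configured
    // value normally, or 0 (flush whatever is pending) once the thorough
    // interval has elapsed since the last thorough flush.
    static int effectiveLeastPages(long nowMillis,
                                   long lastThoroughFlushMillis,
                                   long thoroughIntervalMillis,
                                   int configuredLeastPages) {
        boolean thoroughFlushDue = nowMillis >= lastThoroughFlushMillis + thoroughIntervalMillis;
        return thoroughFlushDue ? 0 : configuredLeastPages;
    }

    public static void main(String[] args) {
        int leastPages = 4;          // default quoted in the text
        long thoroughInterval = 10_000; // 10 s, default quoted in the text

        // 5 s after the last thorough flush: the 4-page threshold still applies.
        System.out.println(effectiveLeastPages(5_000, 0, thoroughInterval, leastPages));  // prints 4
        // 10 s after the last thorough flush: the threshold drops to 0.
        System.out.println(effectiveLeastPages(10_000, 0, thoroughInterval, leastPages)); // prints 0
    }
}
```

With a threshold of 0, even a few bytes of pending data are flushed, which bounds how long a lightly loaded broker can keep unflushed data in memory.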
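The flush itself happens in MappedFileQueue#flush():
public class MappedFileQueue {
    protected long flushedWhere = 0; // flush offset
    private long committedWhere = 0; // commit offset
    // Flush to disk
    public boolean flush(final int flushLeastPages) {
        boolean result = true;
        // Find the mapped file that contains the current flush offset
        MappedFile mappedFile = this.findMappedFileByOffset(this.flushedWhere, this.flushedWhere == 0);
        if (mappedFile != null) {
            // Get the store timestamp
            long tmpTimeStamp = mappedFile.getStoreTimestamp();
            // Call MappedFile#flush(), which returns the flushed position within the file
            int offset = mappedFile.flush(flushLeastPages);
            // Compute the new global flush offset
            long where = mappedFile.getFileFromOffset() + offset;
            // true means the flush offset did not advance, i.e. nothing was flushed
            result = where == this.flushedWhere;
            // Update the flush offset
            this.flushedWhere = where;
            if (0 == flushLeastPages) {
                this.storeTimestamp = tmpTimeStamp;
            }
        }
        // true if nothing new was flushed, false if the flush offset advanced
        return result;
    }
}
The logic is as follows:
- Look up the mapped file by the current flush offset
- Call the mapped file's flush method, which returns the flushed position within that file
- Compute the new flush offset
- Update flushedWhere to the new flush offset
- Return true if the flush offset did not move (nothing was left to flush), false otherwise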
Finally we arrive at MappedFile#flush():
public class MappedFile extends ReferenceResource {
    protected final AtomicInteger wrotePosition = new AtomicInteger(0);
    protected final AtomicInteger committedPosition = new AtomicInteger(0);
    private final AtomicInteger flushedPosition = new AtomicInteger(0);
    /**
     * Flush to disk and return the flushed position
     */
    public int flush(final int flushLeastPages) {
        // Is there enough data to flush?
        if (this.isAbleToFlush(flushLeastPages)) {
            if (this.hold()) {
                int value = getReadPosition();
                try {
                    // writeBuffer is in use, or data has been written through the FileChannel
                    if (writeBuffer != null || this.fileChannel.position() != 0) {
                        // Force the channel's data to disk
                        this.fileChannel.force(false);
                    } else {
                        this.mappedByteBuffer.force();
                    }
                } catch (Throwable e) {
                    log.error("Error occurred when force data to disk.", e);
                }
                // Record the flushed position
                this.flushedPosition.set(value);
                this.release();
            } else {
                log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
                this.flushedPosition.set(getReadPosition());
            }
        }
        // Return the flushed position
        return this.getFlushedPosition();
    }
    // Is there enough data to flush?
    private boolean isAbleToFlush(final int flushLeastPages) {
        // Position of the last flush
        int flush = this.flushedPosition.get();
        // Position up to which valid data has been written
        int write = getReadPosition();
        if (this.isFull()) {
            return true;
        }
        // If a minimum page count is set, check that this flush would cover enough pages
        if (flushLeastPages > 0) {
            // Pages to flush: write position / OS_PAGE_SIZE - last flush position / OS_PAGE_SIZE, compared with flushLeastPages
            return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= flushLeastPages;
        }
        // Otherwise flush whenever the write position is ahead of the flush position
        return write > flush;
    }
    // Is the file full?
    public boolean isFull() {
        // The file is full when the write position equals the file size
        return this.fileSize == this.wrotePosition.get();
    }
    /**
     * Return the position up to which data is valid
     */
    public int getReadPosition() {
        // Use the write position if writeBuffer is null, otherwise the committed position
        return this.writeBuffer == null ? this.wrotePosition.get() : this.committedPosition.get();
    }
}
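The page arithmetic in isAbleToFlush() is easier to follow with concrete numbers. The sketch below re-implements the same check as a standalone function. AbleToFlushSketch is a hypothetical name, the file-full flag is passed in as a parameter, and the 4 KB page size is an assumption about the value of OS_PAGE_SIZE.

```java
final class AbleToFlushSketch {

    static final int OS_PAGE_SIZE = 4 * 1024; // assumed page size of 4 KB

    // Same decision as isAbleToFlush(), with the file state passed in explicitly.
    static boolean isAbleToFlush(int flushedPosition, int readPosition, boolean fileFull, int flushLeastPages) {
        if (fileFull) {
            return true; // a full file is always flushed
        }
        if (flushLeastPages > 0) {
            // Integer division: both positions are rounded down to whole pages first.
            return (readPosition / OS_PAGE_SIZE) - (flushedPosition / OS_PAGE_SIZE) >= flushLeastPages;
        }
        return readPosition > flushedPosition;
    }

    public static void main(String[] args) {
        // One byte short of four pages: 3 - 0 = 3 pages, below the threshold of 4.
        System.out.println(isAbleToFlush(0, 4 * OS_PAGE_SIZE - 1, false, 4)); // prints false
        // Exactly four pages pending: 4 - 0 = 4.
        System.out.println(isAbleToFlush(0, 4 * OS_PAGE_SIZE, false, 4));     // prints true
        // Threshold 0: any pending byte is enough.
        System.out.println(isAbleToFlush(0, 1, false, 0));                    // prints true
        // A full file is flushed even when the threshold is not met.
        System.out.println(isAbleToFlush(0, 100, true, 4));                   // prints true
    }
}
```

Because both positions are rounded down to page boundaries, the check counts whole pages crossed rather than exact bytes pending.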
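6. Why asynchronous flush can lose messages
Finally, let's return to why asynchronous flush was said at the start to carry some risk of data loss, by going back to CommitLog#submitFlushRequest():
With synchronous flush, the method returns the CompletableFuture obtained from future(), which is completed only when the flush thread has finished its work, so the caller receives the actual result of the flush.
With asynchronous flush, the method wakes the flush thread and returns PUT_OK immediately. If the machine crashes or loses power before the flush thread has written the data to disk, messages that were already acknowledged but exist only in memory are lost.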