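RocketMQ implements transaction messages with a two-phase commit plus a periodic transaction status check-back, which together decide whether a message is finally committed or rolled back.
- After the application has persisted its business data inside a transaction, it synchronously calls the RocketMQ send API to send a message in the prepare state. Once the send succeeds, the RocketMQ producer client calls back the sender's transaction listener, which records the message's local transaction state. That record belongs to the same transaction as the local business operation, which keeps message sending and the local transaction atomic.
- When the broker receives a prepare message, it first backs up the message's original topic and original consume queue, then stores the message in the consume queue of the topic RMQ_SYS_TRANS_HALF_TOPIC.
- The broker runs a scheduled task that consumes RMQ_SYS_TRANS_HALF_TOPIC and sends a transaction status check-back request to the sender (the application). The application replies with the state it saved (commit, rollback, or unknown). On commit or rollback the broker commits or rolls back the message; on unknown it waits for the next check. RocketMQ lets you configure the check interval and the maximum number of checks for a message; if the state is still unknown after the maximum number of checks, the message is rolled back by default.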
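The flow above can be pictured as a small state machine for one half message. The following is only a sketch under stated assumptions: the class HalfMessageLifecycleSketch, its enums, and its methods are hypothetical names made up for illustration and are not RocketMQ classes; the real broker keeps this state in topics and message properties rather than in fields.

```java
import java.util.function.Supplier;

/** Hypothetical model of a single half (prepare) message as the broker sees it. */
final class HalfMessageLifecycleSketch {
    /** The producer's answer about its local transaction. */
    enum LocalState { COMMIT, ROLLBACK, UNKNOWN }
    /** What the broker has decided for the half message. */
    enum Outcome { PENDING, COMMITTED, ROLLED_BACK }

    private final int maxChecks;
    private int checksDone = 0;
    private Outcome outcome = Outcome.PENDING;

    HalfMessageLifecycleSketch(int maxChecks) {
        this.maxChecks = maxChecks;
    }

    /** Phase two: apply the state reported by the producer; UNKNOWN leaves the message pending. */
    Outcome end(LocalState state) {
        if (outcome == Outcome.PENDING) {
            if (state == LocalState.COMMIT) {
                outcome = Outcome.COMMITTED;
            } else if (state == LocalState.ROLLBACK) {
                outcome = Outcome.ROLLED_BACK;
            }
        }
        return outcome;
    }

    /** One scheduled check-back round: ask the producer, or give up once the limit is used up. */
    Outcome check(Supplier<LocalState> producer) {
        if (outcome != Outcome.PENDING) {
            return outcome;
        }
        if (checksDone >= maxChecks) {
            outcome = Outcome.ROLLED_BACK; // still unknown after maxChecks rounds: roll back
            return outcome;
        }
        checksDone++;
        return end(producer.get());
    }
}
```

In this model a message whose producer keeps answering UNKNOWN stays pending for maxChecks rounds and is rolled back on the next round, which matches the default behaviour described in the last bullet.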
Transaction message send flow
The sender of RocketMQ transaction messages is org.apache.rocketmq.client.producer.TransactionMQProducer.
TransactionMQProducer#sendMessageInTransaction
public TransactionSendResult sendMessageInTransaction(final Message msg,
final Object arg) throws MQClientException {
if (null == this.transactionListener) {
throw new MQClientException("TransactionListener is null", null);
}
msg.setTopic(NamespaceUtil.wrapNamespace(this.getNamespace(), msg.getTopic()));
return this.defaultMQProducerImpl.sendMessageInTransaction(msg, null, arg);
}
If the transaction listener is null, an exception is thrown immediately; otherwise the call ends up in DefaultMQProducerImpl's sendMessageInTransaction method.
public TransactionSendResult sendMessageInTransaction(final Message msg,
final LocalTransactionExecuter localTransactionExecuter, final Object arg)
throws MQClientException {
TransactionListener transactionListener = getCheckListener();
if (null == localTransactionExecuter && null == transactionListener) {
throw new MQClientException("tranExecutor is null", null);
}
// ignore DelayTimeLevel parameter
if (msg.getDelayTimeLevel() != 0) {
MessageAccessor.clearProperty(msg, MessageConst.PROPERTY_DELAY_TIME_LEVEL);
}
Validators.checkMessage(msg, this.defaultMQProducer);
SendResult sendResult = null;
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_TRANSACTION_PREPARED, "true");
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_PRODUCER_GROUP, this.defaultMQProducer.getProducerGroup());
try {
sendResult = this.send(msg);
} catch (Exception e) {
throw new MQClientException("send message Exception", e);
}
LocalTransactionState localTransactionState = LocalTransactionState.UNKNOW;
Throwable localException = null;
switch (sendResult.getSendStatus()) {
case SEND_OK: {
try {
if (sendResult.getTransactionId() != null) {
msg.putUserProperty("__transactionId__", sendResult.getTransactionId());
}
String transactionId = msg.getProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
if (null != transactionId && !"".equals(transactionId)) {
msg.setTransactionId(transactionId);
}
if (null != localTransactionExecuter) {
localTransactionState = localTransactionExecuter.executeLocalTransactionBranch(msg, arg);
} else if (transactionListener != null) {
log.debug("Used new transaction API");
localTransactionState = transactionListener.executeLocalTransaction(msg, arg);
}
if (null == localTransactionState) {
localTransactionState = LocalTransactionState.UNKNOW;
}
if (localTransactionState != LocalTransactionState.COMMIT_MESSAGE) {
log.info("executeLocalTransactionBranch return {}", localTransactionState);
log.info(msg.toString());
}
} catch (Throwable e) {
log.info("executeLocalTransactionBranch exception", e);
log.info(msg.toString());
localException = e;
}
}
break;
case FLUSH_DISK_TIMEOUT:
case FLUSH_SLAVE_TIMEOUT:
case SLAVE_NOT_AVAILABLE:
localTransactionState = LocalTransactionState.ROLLBACK_MESSAGE;
break;
default:
break;
}
try {
this.endTransaction(sendResult, localTransactionState, localException);
} catch (Exception e) {
log.warn("local transaction execute " + localTransactionState + ", but end broker transaction failed", e);
}
TransactionSendResult transactionSendResult = new TransactionSendResult();
transactionSendResult.setSendStatus(sendResult.getSendStatus());
transactionSendResult.setMessageQueue(sendResult.getMessageQueue());
transactionSendResult.setMsgId(sendResult.getMsgId());
transactionSendResult.setQueueOffset(sendResult.getQueueOffset());
transactionSendResult.setTransactionId(sendResult.getTransactionId());
transactionSendResult.setLocalTransactionState(localTransactionState);
return transactionSendResult;
}
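Step 1
First, two properties are added to the message, TRAN_MSG and PGROUP, which mark the message as a prepare message and record the producer group it belongs to. The producer group is set so that, when the broker later checks the local transaction state of a transaction message, it can pick any producer from that group at random. The message is then sent to RocketMQ with a synchronous call.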
SendResult sendResult = null;
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_TRANSACTION_PREPARED, "true");
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_PRODUCER_GROUP, this.defaultMQProducer.getProducerGroup());
try {
sendResult = this.send(msg);
} catch (Exception e) {
throw new MQClientException("send message Exception", e);
}
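Step 2
What happens next depends on the send result.
- If the send succeeds, TransactionListener#executeLocalTransaction is executed. Its job is to record the local transaction state of the transaction message, for example by storing the message's unique ID in a database. The method runs in the same transaction as the business code, so the two either succeed together or fail together. This is one of the key design ideas of transaction messages, and it provides the sole basis for the later transaction status check-back.
- If the send fails (FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, or SLAVE_NOT_AVAILABLE), the local transaction state is set to LocalTransactionState.ROLLBACK_MESSAGE.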
LocalTransactionState localTransactionState = LocalTransactionState.UNKNOW;
Throwable localException = null;
switch (sendResult.getSendStatus()) {
case SEND_OK: {
try {
if (sendResult.getTransactionId() != null) {
msg.putUserProperty("__transactionId__", sendResult.getTransactionId());
}
String transactionId = msg.getProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
if (null != transactionId && !"".equals(transactionId)) {
msg.setTransactionId(transactionId);
}
if (null != localTransactionExecuter) {
localTransactionState = localTransactionExecuter.executeLocalTransactionBranch(msg, arg);
} else if (transactionListener != null) {
log.debug("Used new transaction API");
localTransactionState = transactionListener.executeLocalTransaction(msg, arg);
}
if (null == localTransactionState) {
localTransactionState = LocalTransactionState.UNKNOW;
}
if (localTransactionState != LocalTransactionState.COMMIT_MESSAGE) {
log.info("executeLocalTransactionBranch return {}", localTransactionState);
log.info(msg.toString());
}
} catch (Throwable e) {
log.info("executeLocalTransactionBranch exception", e);
log.info(msg.toString());
localException = e;
}
}
break;
case FLUSH_DISK_TIMEOUT:
case FLUSH_SLAVE_TIMEOUT:
case SLAVE_NOT_AVAILABLE:
localTransactionState = LocalTransactionState.ROLLBACK_MESSAGE;
break;
default:
break;
}
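Stripped of logging and transaction-ID bookkeeping, the switch above reduces to a small mapping from send status to local transaction state. The sketch below is a simplified stand-in, not the client code: SendStatusMappingSketch and its resolve method are hypothetical, the enums only copy the constant names seen in the snippet, and the local transaction is passed in as a plain Supplier.

```java
import java.util.function.Supplier;

/** Hypothetical condensation of the step 2 switch; not part of the RocketMQ client. */
final class SendStatusMappingSketch {
    enum SendStatus { SEND_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }
    enum LocalState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

    /** Decides the state that would be handed to the end-transaction step. */
    static LocalState resolve(SendStatus status, Supplier<LocalState> localTransaction) {
        if (status != SendStatus.SEND_OK) {
            // The half message was not stored reliably, so the local transaction is not run.
            return LocalState.ROLLBACK_MESSAGE;
        }
        try {
            LocalState state = localTransaction.get();
            // A null return is treated as "unknown", as in the snippet above.
            return state == null ? LocalState.UNKNOW : state;
        } catch (RuntimeException e) {
            // An exception leaves the state at its initial value, UNKNOW.
            return LocalState.UNKNOW;
        }
    }
}
```

Note that an exception in the local transaction does not roll the message back directly; the state stays UNKNOW and the decision is left to the later check-back.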
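Step 3
End the transaction. Depending on the transaction state returned in step 2, the transaction is committed, rolled back, or left alone for now.
- LocalTransactionState.COMMIT_MESSAGE: commit the transaction.
- LocalTransactionState.ROLLBACK_MESSAGE: roll back the transaction.
- LocalTransactionState.UNKNOW: end the transaction without taking any action.
Because the business transaction has not yet been committed when this.endTransaction runs, TransactionListener#executeLocalTransaction should only record the transaction message state and then return LocalTransactionState.UNKNOW. Whether the transaction message is committed or rolled back is decided later, during the transaction status check-back.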
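One way to picture this advice is a listener that only writes a transaction record and answers the check-back from that record. The sketch below is an illustration under assumptions: LocalTransactionLogSketch and its methods are hypothetical, the in-memory sets stand in for a database table written in the same transaction as the business data, and the method names merely echo the TransactionListener callbacks without implementing that interface.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a listener backed by a transaction log table. */
final class LocalTransactionLogSketch {
    enum LocalState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

    // Rows written inside a business transaction that has not finished yet.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Rows that became visible because the business transaction committed.
    private final Set<String> committed = ConcurrentHashMap.newKeySet();

    /** Called after the half message is sent: record the ID and defer the decision. */
    LocalState executeLocalTransaction(String transactionId) {
        pending.add(transactionId);
        return LocalState.UNKNOW;
    }

    /** Simulates the business transaction committing together with its log row. */
    void commitBusinessTransaction(String transactionId) {
        if (pending.remove(transactionId)) {
            committed.add(transactionId);
        }
    }

    /** Simulates the business transaction rolling back; the log row disappears with it. */
    void rollbackBusinessTransaction(String transactionId) {
        pending.remove(transactionId);
    }

    /** Called on check-back: a visible row means commit; otherwise the state is still unknown. */
    LocalState checkLocalTransaction(String transactionId) {
        return committed.contains(transactionId) ? LocalState.COMMIT_MESSAGE : LocalState.UNKNOW;
    }
}
```

In this design a rolled-back business transaction never produces an explicit rollback answer; the message simply stays unknown until the broker's check limit discards it.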
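How the prepare message is sent
Before the message is sent, if it is a prepare message, its system flag is marked with the prepare message type so that the broker can correctly recognize it as a transaction message.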
DefaultMQProducerImpl#sendKernelImpl
final String tranMsg = msg.getProperty(MessageConst.PROPERTY_TRANSACTION_PREPARED);
if (tranMsg != null && Boolean.parseBoolean(tranMsg)) {
sysFlag |= MessageSysFlag.TRANSACTION_PREPARED_TYPE;
}
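The system flag is a bit field, so marking a message as prepared and later changing its transaction type are plain bit operations. The sketch below shows the idea with constants that are assumed to mirror MessageSysFlag (transaction type in bits 2 and 3); treat the concrete values and the class TransactionFlagSketch as assumptions for illustration rather than a copy of the library.

```java
/** Hypothetical re-creation of the transaction bits in the message system flag. */
final class TransactionFlagSketch {
    // Assumed layout: the transaction type occupies bits 2 and 3 of the flag.
    static final int TRANSACTION_NOT_TYPE = 0;
    static final int TRANSACTION_PREPARED_TYPE = 0x1 << 2;
    static final int TRANSACTION_COMMIT_TYPE = 0x2 << 2;
    static final int TRANSACTION_ROLLBACK_TYPE = 0x3 << 2;

    /** Extracts only the transaction bits; ROLLBACK_TYPE doubles as the two-bit mask. */
    static int getTransactionValue(int flag) {
        return flag & TRANSACTION_ROLLBACK_TYPE;
    }

    /** Clears the transaction bits and sets them to the given type, keeping all other bits. */
    static int resetTransactionValue(int flag, int type) {
        return (flag & ~TRANSACTION_ROLLBACK_TYPE) | type;
    }
}
```

For example, a flag of 1 (some unrelated low bit set) becomes 5 after `flag |= TRANSACTION_PREPARED_TYPE`, and resetting it to TRANSACTION_NOT_TYPE gives 1 again, which is how a half message can be stored without a transaction type while its other flag bits survive.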
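When the broker receives a message store request, it calls the transactional message service's prepare method (asyncPrepareMessage in the code that follows) if the message is a prepare message; otherwise the message goes through the normal store flow.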
SendMessageProcessor#asyncSendMessage
Map<String, String> origProps = MessageDecoder.string2messageProperties(requestHeader.getProperties());
String transFlag = origProps.get(MessageConst.PROPERTY_TRANSACTION_PREPARED);
if (transFlag != null && Boolean.parseBoolean(transFlag)) {
if (this.brokerController.getBrokerConfig().isRejectTransactionMessage()) {
response.setCode(ResponseCode.NO_PERMISSION);
response.setRemark(
"the broker[" + this.brokerController.getBrokerConfig().getBrokerIP1()
+ "] sending transaction message is forbidden");
return CompletableFuture.completedFuture(response);
}
putMessageResult = this.brokerController.getTransactionalMessageService().asyncPrepareMessage(msgInner);
} else {
putMessageResult = this.brokerController.getMessageStore().asyncPutMessage(msgInner);
}
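This is the main difference between sending a transaction message and sending an ordinary message. For a transaction message, the broker backs up the original topic and original consume queue, changes the topic to RMQ_SYS_TRANS_HALF_TOPIC and the queue ID to 0, and then stores the message in the commitlog file like an ordinary message, from where it is dispatched to the consume queue of RMQ_SYS_TRANS_HALF_TOPIC. In other words, an uncommitted transaction message is never stored under its original topic, so consumers cannot consume it. Because the topic has been changed, RocketMQ uses a scheduled task (a dedicated thread) to consume that topic and, when certain conditions are met, restores the message's original topic so that consumers can consume it.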
TransactionalMessageBridge#putHalfMessage
public PutMessageResult putHalfMessage(MessageExtBrokerInner messageInner) {
return store.putMessage(parseHalfMessageInner(messageInner));
}
private MessageExtBrokerInner parseHalfMessageInner(MessageExtBrokerInner msgInner) {
MessageAccessor.putProperty(msgInner, MessageConst.PROPERTY_REAL_TOPIC, msgInner.getTopic());
MessageAccessor.putProperty(msgInner, MessageConst.PROPERTY_REAL_QUEUE_ID,
String.valueOf(msgInner.getQueueId()));
msgInner.setSysFlag(
MessageSysFlag.resetTransactionValue(msgInner.getSysFlag(), MessageSysFlag.TRANSACTION_NOT_TYPE));
msgInner.setTopic(TransactionalMessageUtil.buildHalfTopic());
msgInner.setQueueId(0);
msgInner.setPropertiesString(MessageDecoder.messageProperties2String(msgInner.getProperties()));
return msgInner;
}
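The topic swap and its later reversal can be sketched with a plain map of properties. This is a simplified illustration, not broker code: HalfTopicSwapSketch, its Msg class, and the property keys REAL_TOPIC and REAL_QID are assumptions chosen for the example, and the restore step here also drops the backup properties, which may differ from what the broker does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of backing up and restoring the topic of a half message. */
final class HalfTopicSwapSketch {
    static final String HALF_TOPIC = "RMQ_SYS_TRANS_HALF_TOPIC";
    // Property keys assumed for this example.
    static final String REAL_TOPIC = "REAL_TOPIC";
    static final String REAL_QUEUE_ID = "REAL_QID";

    /** Minimal message: topic, queue ID, and string properties. */
    static final class Msg {
        final String topic;
        final int queueId;
        final Map<String, String> properties = new HashMap<>();

        Msg(String topic, int queueId) {
            this.topic = topic;
            this.queueId = queueId;
        }
    }

    /** Prepare: remember the real destination in properties, then point at the half topic, queue 0. */
    static Msg toHalf(Msg original) {
        Msg half = new Msg(HALF_TOPIC, 0);
        half.properties.putAll(original.properties);
        half.properties.put(REAL_TOPIC, original.topic);
        half.properties.put(REAL_QUEUE_ID, String.valueOf(original.queueId));
        return half;
    }

    /** Commit: rebuild a message addressed to the real topic and queue from the backup. */
    static Msg restore(Msg half) {
        Msg restored = new Msg(half.properties.get(REAL_TOPIC),
                Integer.parseInt(half.properties.get(REAL_QUEUE_ID)));
        restored.properties.putAll(half.properties);
        restored.properties.remove(REAL_TOPIC);
        restored.properties.remove(REAL_QUEUE_ID);
        return restored;
    }
}
```

Because the real destination travels inside the message itself, the commit path needs nothing but the stored half message to rebuild the message that consumers will see.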
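Committing or rolling back the transaction
The producer looks up the broker's IP and port from the message queue the message belongs to and sends an end-transaction command. The key point is that it sends a commit, rollback, or "no action" command according to the local transaction state. On the broker side, the processor that handles end-transaction requests is EndTransactionProcessor.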
DefaultMQProducerImpl#endTransaction
public void endTransaction(
final SendResult sendResult,
final LocalTransactionState localTransactionState,
final Throwable localException) throws RemotingException, MQBrokerException, InterruptedException, UnknownHostException {
final MessageId id;
if (sendResult.getOffsetMsgId() != null) {
id = MessageDecoder.decodeMessageId(sendResult.getOffsetMsgId());
} else {
id = MessageDecoder.decodeMessageId(sendResult.getMsgId());
}
String transactionId = sendResult.getTransactionId();
final String brokerAddr = this.mQClientFactory.findBrokerAddressInPublish(sendResult.getMessageQueue().getBrokerName());
EndTransactionRequestHeader requestHeader = new EndTransactionRequestHeader();
requestHeader.setTransactionId(transactionId);
requestHeader.setCommitLogOffset(id.getOffset());
switch (localTransactionState) {
case COMMIT_MESSAGE:
requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_COMMIT_TYPE);
break;
case ROLLBACK_MESSAGE:
requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_ROLLBACK_TYPE);
break;
case UNKNOW:
requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_NOT_TYPE);
break;
default:
break;
}
requestHeader.setProducerGroup(this.defaultMQProducer.getProducerGroup());
requestHeader.setTranStateTableOffset(sendResult.getQueueOffset());
requestHeader.setMsgId(sendResult.getMsgId());
String remark = localException != null ? ("executeLocalTransactionBranch exception: " + localException.toString()) : null;
this.mQClientFactory.getMQClientAPIImpl().endTransactionOneway(brokerAddr, requestHeader, remark,
this.defaultMQProducer.getSendMsgTimeout());
}
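If the end-transaction action is a commit, the commit logic runs. Its key steps are as follows.
- First, the physical offset of the message (commitLogOffset) is taken from the end-transaction request and used to look up the prepare message; this is implemented by TransactionalMessageService#commitMessage.
- Next, the message's topic and consume queue are restored and a new message object is built; this is done by EndTransactionProcessor#endMessageTransaction.
- The message is then stored in the commitlog file again. This time its topic is the one the business side originally sent to, so it is dispatched to the corresponding consume queue and becomes available to consumers; this is done by EndTransactionProcessor#sendFinalMessage.
- After the message is stored, the prepare message is deleted. This is not a real deletion: a record for the prepare message is stored in the topic RMQ_SYS_TRANS_OP_HALF_TOPIC to indicate that the transaction message (the message in the prepare state) has been processed (committed or rolled back), which lets the check-back task find the transactions that are still unprocessed.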
EndTransactionProcessor#processRequest
OperationResult result = new OperationResult();
if (MessageSysFlag.TRANSACTION_COMMIT_TYPE == requestHeader.getCommitOrRollback()) {
result = this.brokerController.getTransactionalMessageService().commitMessage(requestHeader);
if (result.getResponseCode() == ResponseCode.SUCCESS) {
RemotingCommand res = checkPrepareMessage(result.getPrepareMessage(), requestHeader);
if (res.getCode() == ResponseCode.SUCCESS) {
MessageExtBrokerInner msgInner = endMessageTransaction(result.getPrepareMessage());
msgInner.setSysFlag(MessageSysFlag.resetTransactionValue(msgInner.getSysFlag(), requestHeader.getCommitOrRollback()));
msgInner.setQueueOffset(requestHeader.getTranStateTableOffset());
msgInner.setPreparedTransactionOffset(requestHeader.getCommitLogOffset());
msgInner.setStoreTimestamp(result.getPrepareMessage().getStoreTimestamp());
MessageAccessor.clearProperty(msgInner, MessageConst.PROPERTY_TRANSACTION_PREPARED);
RemotingCommand sendResult = sendFinalMessage(msgInner);
if (sendResult.getCode() == ResponseCode.SUCCESS) {
this.brokerController.getTransactionalMessageService().deletePrepareMessage(result.getPrepareMessage());
}
return sendResult;
}
return res;
}
}
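The only difference between rolling back and committing is that a rollback does not need to restore the message to its original topic; it simply deletes the prepare message. As with a commit, the prepare message is stored in the topic RMQ_SYS_TRANS_OP_HALF_TOPIC to mark it as processed.
Checking back the transaction status of transaction messages
When a transaction message is stored on the broker, its topic is replaced with RMQ_SYS_TRANS_HALF_TOPIC. If the local transaction finishes and returns the state UNKNOW, ending the transaction does nothing; instead, the periodic transaction status check-back is expected to obtain a definite decision (commit or rollback) from the sender.
RocketMQ uses the TransactionalMessageCheckService thread to periodically scan the messages in RMQ_SYS_TRANS_HALF_TOPIC and check back their transaction status. TransactionalMessageCheckService runs once a minute by default; the default can be changed by setting transactionCheckInterval, in milliseconds, in the broker.conf file.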
TransactionalMessageCheckService#onWaitEnd
protected void onWaitEnd() {
long timeout = brokerController.getBrokerConfig().getTransactionTimeOut();
int checkMax = brokerController.getBrokerConfig().getTransactionCheckMax();
long begin = System.currentTimeMillis();
log.info("Begin to check prepare message, begin time:{}", begin);
this.brokerController.getTransactionalMessageService().check(timeout, checkMax, this.brokerController.getTransactionalMessageCheckListener());
log.info("End to check prepare message, consumed time:{}", System.currentTimeMillis() - begin);
}
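- transactionTimeOut: the transaction timeout. A message is checked back only once its store time plus this timeout is earlier than the current system time; otherwise the check is deferred to the next cycle.
- transactionCheckMax: the maximum number of check-backs. If the transaction status is still unknown after this many checks, RocketMQ stops checking the message and discards it, which is equivalent to rolling back the transaction.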
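The two parameters translate into two simple predicates: has the message been checked too often, and is it old enough to be checked at all. The sketch below is a guess at the shape of that logic, not the broker's implementation: CheckLimitSketch is hypothetical, the counter is kept in a plain map under the key TRANSACTION_CHECK_TIMES, and the timeout test is reduced to a single comparison.

```java
import java.util.Map;

/** Hypothetical helpers showing how a check limit and a timeout could be applied. */
final class CheckLimitSketch {
    static final String CHECK_TIMES = "TRANSACTION_CHECK_TIMES";

    /**
     * Returns true once the stored counter has reached maxChecks; otherwise
     * records one more check in the message properties and returns false.
     */
    static boolean needDiscard(Map<String, String> properties, int maxChecks) {
        String raw = properties.get(CHECK_TIMES);
        int checkTimes = 1; // first check if no counter is stored yet
        if (raw != null) {
            checkTimes = Integer.parseInt(raw);
            if (checkTimes >= maxChecks) {
                return true;
            }
            checkTimes++;
        }
        properties.put(CHECK_TIMES, String.valueOf(checkTimes));
        return false;
    }

    /** True once the message has existed for at least timeoutMillis. */
    static boolean pastTimeout(long bornTimestamp, long now, long timeoutMillis) {
        return now - bornTimestamp >= timeoutMillis;
    }
}
```

With maxChecks set to 3, the first three calls to needDiscard return false and leave the counter at 3; the fourth call returns true, at which point the message would be discarded.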
public void check(long transactionTimeout, int transactionCheckMax,
AbstractTransactionalMessageCheckListener listener) {
try {
String topic = TopicValidator.RMQ_SYS_TRANS_HALF_TOPIC;
Set<MessageQueue> msgQueues = transactionalMessageBridge.fetchMessageQueues(topic);
if (msgQueues == null || msgQueues.size() == 0) {
log.warn("The queue of topic is empty :" + topic);
return;
}
log.debug("Check topic={}, queues={}", topic, msgQueues);
for (MessageQueue messageQueue : msgQueues) {
long startTime = System.currentTimeMillis();
MessageQueue opQueue = getOpQueue(messageQueue);
long halfOffset = transactionalMessageBridge.fetchConsumeOffset(messageQueue);
long opOffset = transactionalMessageBridge.fetchConsumeOffset(opQueue);
log.info("Before check, the queue={} msgOffset={} opOffset={}", messageQueue, halfOffset, opOffset);
if (halfOffset < 0 || opOffset < 0) {
log.error("MessageQueue: {} illegal offset read: {}, op offset: {},skip this queue", messageQueue,
halfOffset, opOffset);
continue;
}
List<Long> doneOpOffset = new ArrayList<>();
HashMap<Long, Long> removeMap = new HashMap<>();
PullResult pullResult = fillOpRemoveMap(removeMap, opQueue, opOffset, halfOffset, doneOpOffset);
if (null == pullResult) {
log.error("The queue={} check msgOffset={} with opOffset={} failed, pullResult is null",
messageQueue, halfOffset, opOffset);
continue;
}
// single thread
int getMessageNullCount = 1;
long newOffset = halfOffset;
long i = halfOffset;
while (true) {
if (System.currentTimeMillis() - startTime > MAX_PROCESS_TIME_LIMIT) {
log.info("Queue={} process time reach max={}", messageQueue, MAX_PROCESS_TIME_LIMIT);
break;
}
if (removeMap.containsKey(i)) {
log.info("Half offset {} has been committed/rolled back", i);
Long removedOpOffset = removeMap.remove(i);
doneOpOffset.add(removedOpOffset);
} else {
GetResult getResult = getHalfMsg(messageQueue, i);
MessageExt msgExt = getResult.getMsg();
if (msgExt == null) {
if (getMessageNullCount++ > MAX_RETRY_COUNT_WHEN_HALF_NULL) {
break;
}
if (getResult.getPullResult().getPullStatus() == PullStatus.NO_NEW_MSG) {
log.debug("No new msg, the miss offset={} in={}, continue check={}, pull result={}", i,
messageQueue, getMessageNullCount, getResult.getPullResult());
break;
} else {
log.info("Illegal offset, the miss offset={} in={}, continue check={}, pull result={}",
i, messageQueue, getMessageNullCount, getResult.getPullResult());
i = getResult.getPullResult().getNextBeginOffset();
newOffset = i;
continue;
}
}
if (needDiscard(msgExt, transactionCheckMax) || needSkip(msgExt)) {
listener.resolveDiscardMsg(msgExt);
newOffset = i + 1;
i++;
continue;
}
if (msgExt.getStoreTimestamp() >= startTime) {
log.debug("Fresh stored. the miss offset={}, check it later, store={}", i,
new Date(msgExt.getStoreTimestamp()));
break;
}
long valueOfCurrentMinusBorn = System.currentTimeMillis() - msgExt.getBornTimestamp();
long checkImmunityTime = transactionTimeout;
String checkImmunityTimeStr = msgExt.getUserProperty(MessageConst.PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS);
if (null != checkImmunityTimeStr) {
checkImmunityTime = getImmunityTime(checkImmunityTimeStr, transactionTimeout);
if (valueOfCurrentMinusBorn < checkImmunityTime) {
if (checkPrepareQueueOffset(removeMap, doneOpOffset, msgExt)) {
newOffset = i + 1;
i++;
continue;
}
}
} else {
if ((0 <= valueOfCurrentMinusBorn) && (valueOfCurrentMinusBorn < checkImmunityTime)) {
log.debug("New arrived, the miss offset={}, check it later checkImmunity={}, born={}", i,
checkImmunityTime, new Date(msgExt.getBornTimestamp()));
break;
}
}
List<MessageExt> opMsg = pullResult.getMsgFoundList();
boolean isNeedCheck = (opMsg == null && valueOfCurrentMinusBorn > checkImmunityTime)
|| (opMsg != null && (opMsg.get(opMsg.size() - 1).getBornTimestamp() - startTime > transactionTimeout))
|| (valueOfCurrentMinusBorn <= -1);
if (isNeedCheck) {
if (!putBackHalfMsgQueue(msgExt, i)) {
continue;
}
listener.resolveHalfMsg(msgExt);
} else {
pullResult = fillOpRemoveMap(removeMap, opQueue, pullResult.getNextBeginOffset(), halfOffset, doneOpOffset);
log.debug("The miss offset:{} in messageQueue:{} need to get more opMsg, result is:{}", i,
messageQueue, pullResult);
continue;
}
}
newOffset = i + 1;
i++;
}
if (newOffset != halfOffset) {
transactionalMessageBridge.updateConsumeOffset(messageQueue, newOffset);
}
long newOpOffset = calculateOpOffset(doneOpOffset, opOffset);
if (newOpOffset != opOffset) {
transactionalMessageBridge.updateConsumeOffset(opQueue, newOpOffset);
}
}
} catch (Throwable e) {
log.error("Check error", e);
}
}
Fetch all message queues under the topic RMQ_SYS_TRANS_HALF_TOPIC and process them one by one.
TransactionalMessageServiceImpl#check
String topic = TopicValidator.RMQ_SYS_TRANS_HALF_TOPIC;
Set<MessageQueue> msgQueues = transactionalMessageBridge.fetchMessageQueues(topic);
if (msgQueues == null || msgQueues.size() == 0) {
log.warn("The queue of topic is empty :" + topic);
return;
}
For each transaction message queue, get the corresponding queue of processed messages, that is, the consume queue whose topic is RMQ_SYS_TRANS_OP_HALF_TOPIC.
long startTime = System.currentTimeMillis();
MessageQueue opQueue = getOpQueue(messageQueue);
long halfOffset = transactionalMessageBridge.fetchConsumeOffset(messageQueue);
long opOffset = transactionalMessageBridge.fetchConsumeOffset(opQueue);
log.info("Before check, the queue={} msgOffset={} opOffset={}", messageQueue, halfOffset, opOffset);
if (halfOffset < 0 || opOffset < 0) {
log.error("MessageQueue: {} illegal offset read: {}, op offset: {},skip this queue", messageQueue,
halfOffset, opOffset);
continue;
}
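The main job of fillOpRemoveMap is to pull messages from the processed (OP) queue, 32 at a time, starting from the current progress, so that the check can tell whether the message it is looking at has already been processed. If it has, no check-back request needs to be sent, which avoids duplicate check-back requests. Transaction message processing involves the following two topics.
- RMQ_SYS_TRANS_HALF_TOPIC: the topic for prepare messages; a transaction message goes into this topic first.
- RMQ_SYS_TRANS_OP_HALF_TOPIC: once the broker receives the commit or rollback request for a transaction message, it stores the message in this topic.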
List<Long> doneOpOffset = new ArrayList<>();
HashMap<Long, Long> removeMap = new HashMap<>();
PullResult pullResult = fillOpRemoveMap(removeMap, opQueue, opOffset, halfOffset, doneOpOffset);
if (null == pullResult) {
log.error("The queue={} check msgOffset={} with opOffset={} failed, pullResult is null",
messageQueue, halfOffset, opOffset);
continue;
}
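The role of removeMap and doneOpOffset is easier to see in isolation. The sketch below assumes that each OP message identifies the half-queue offset it marks as processed; OpRemoveMapSketch, OpMessage, and fill are hypothetical names, and pulling from the OP queue in batches is left out so that only the bookkeeping remains.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical reduction of the bookkeeping done while reading the OP queue. */
final class OpRemoveMapSketch {
    /** One OP-queue entry: its own offset and the half-queue offset it marks as processed. */
    static final class OpMessage {
        final long opOffset;
        final long halfOffset;

        OpMessage(long opOffset, long halfOffset) {
            this.opOffset = opOffset;
            this.halfOffset = halfOffset;
        }
    }

    /**
     * Splits pulled OP messages into two groups relative to the current half-queue progress:
     * entries behind the progress are already done, entries at or ahead of it go into
     * removeMap (half offset to OP offset) so the check loop can skip those half messages.
     */
    static void fill(List<OpMessage> pulled, long halfProgress,
                     Map<Long, Long> removeMap, List<Long> doneOpOffset) {
        for (OpMessage op : pulled) {
            if (op.halfOffset < halfProgress) {
                doneOpOffset.add(op.opOffset);
            } else {
                removeMap.put(op.halfOffset, op.opOffset);
            }
        }
    }
}
```

When the check loop later finds its current offset i in removeMap, it knows that half message was already committed or rolled back and moves the matching OP offset into doneOpOffset instead of sending a check-back.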
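Code @1: first, the meaning of a few local variables.
- getMessageNullCount: the number of times a null message has been fetched.
- newOffset: the latest progress for the RMQ_SYS_TRANS_HALF_TOPIC queue (queueId) currently being processed.
- i: the queue offset of the message currently being processed; its topic is still RMQ_SYS_TRANS_HALF_TOPIC.
Code @2: a common pattern in RocketMQ task processing is to give each task a fixed time budget per run; once the budget is used up, the task waits for the next scheduling round. RocketMQ spends at most 60 seconds per run on the check-back for each queue of RMQ_SYS_TRANS_HALF_TOPIC, and this value is currently not configurable.
Code @3: if the message has already been processed, move on to the next message.
Code @4: fetch the message from the consume queue by queue offset i.
Code @5: if no message could be pulled from the pending queue, act according to the allowed number of retries; the default is one retry and is currently not configurable.
Code @6: decide whether the message should be discarded (dropped and not processed) or skipped, on the following grounds.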
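- needDiscard: if the message has been checked back more than the maximum allowed number of times, it is discarded, meaning the transaction message fails to commit. Each check-back increments the message property TRANSACTION_CHECK_TIMES by 1. The maximum is set by transactionCheckMax and defaults to 15.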
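- needSkip: if the transaction message is older than the file retention time, 72 hours by default (see RocketMQ's expired-file handling for details), the message is skipped.
Code @7: handles the timeout-related logic. First, a few local variables.
- valueOfCurrentMinusBorn: how long the message has existed, computed as the current system time minus the message's born timestamp.
- checkImmunityTime: the window during which a transaction message is immune from check-back. The idea is that the application does not commit its transaction right after sending the transaction message; this value is the time the application is assumed to need to commit its transaction after the send succeeds. During this window RocketMQ assumes the transaction has not been committed yet, so it should not send a check-back request to the application.
- transactionTimeout: the timeout for transaction messages. It is compared with the gap between the timestamp of the last message pulled from the OP queue and the time the check method started; if that gap exceeds transactionTimeout, a check-back is sent even if the message is still younger than checkImmunityTime.
Code @8: if the message carries the immunity-time property (PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS), checkImmunityTime is derived from it; while the message is still inside that window, checkPrepareQueueOffset decides whether the loop moves on to the next message.
Code @9: otherwise, if the message is still inside the window (the application's assumed transaction completion time has not passed), the loop stops for this run and tries again next time.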
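Code @10: decide whether a check-back request needs to be sent. The conditions are as follows.
- The OP queue (RMQ_SYS_TRANS_OP_HALF_TOPIC) holds no processed messages and the message has outlived the application's transaction completion time, that is, the transactionTimeOut value.
- The OP queue is not empty and the timestamp of its last message is more than transactionTimeOut past the start of the check.
- valueOfCurrentMinusBorn is -1 or less, that is, the message's born timestamp is later than the broker's current time.
Code @11: if a check-back is needed, the message is first written to RMQ_SYS_TRANS_HALF_TOPIC again; putBackHalfMsgQueue returns true on success and false otherwise.
Code @12: if it cannot yet be decided whether to send a check-back, more processed messages are loaded and filtered.
Code @13: save the check-back progress of the prepare message queue.
Code @14: save the progress of the processed (OP) queue.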
int getMessageNullCount = 1;
long newOffset = halfOffset;
long i = halfOffset; //@1
while (true) {
if (System.currentTimeMillis() - startTime > MAX_PROCESS_TIME_LIMIT) {//@2
log.info("Queue={} process time reach max={}", messageQueue, MAX_PROCESS_TIME_LIMIT);
break;
}
if (removeMap.containsKey(i)) {//@3
log.info("Half offset {} has been committed/rolled back", i);
Long removedOpOffset = removeMap.remove(i);
doneOpOffset.add(removedOpOffset);
} else {
GetResult getResult = getHalfMsg(messageQueue, i);//@4
MessageExt msgExt = getResult.getMsg();
if (msgExt == null) {//@5
if (getMessageNullCount++ > MAX_RETRY_COUNT_WHEN_HALF_NULL) {
break;
}
if (getResult.getPullResult().getPullStatus() == PullStatus.NO_NEW_MSG) {
log.debug("No new msg, the miss offset={} in={}, continue check={}, pull result={}", i,
messageQueue, getMessageNullCount, getResult.getPullResult());
break;
} else {
log.info("Illegal offset, the miss offset={} in={}, continue check={}, pull result={}",
i, messageQueue, getMessageNullCount, getResult.getPullResult());
i = getResult.getPullResult().getNextBeginOffset();
newOffset = i;
continue;
}
}
if (needDiscard(msgExt, transactionCheckMax) || needSkip(msgExt)) {//@6
listener.resolveDiscardMsg(msgExt);
newOffset = i + 1;
i++;
continue;
}
if (msgExt.getStoreTimestamp() >= startTime) {
log.debug("Fresh stored. the miss offset={}, check it later, store={}", i,
new Date(msgExt.getStoreTimestamp()));
break;
}
long valueOfCurrentMinusBorn = System.currentTimeMillis() - msgExt.getBornTimestamp();//@7
long checkImmunityTime = transactionTimeout;
String checkImmunityTimeStr = msgExt.getUserProperty(MessageConst.PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS);
if (null != checkImmunityTimeStr) {//@8
checkImmunityTime = getImmunityTime(checkImmunityTimeStr, transactionTimeout);
if (valueOfCurrentMinusBorn < checkImmunityTime) {
if (checkPrepareQueueOffset(removeMap, doneOpOffset, msgExt)) {
newOffset = i + 1;
i++;
continue;
}
}
} else {//@9
if ((0 <= valueOfCurrentMinusBorn) && (valueOfCurrentMinusBorn < checkImmunityTime)) {
log.debug("New arrived, the miss offset={}, check it later checkImmunity={}, born={}", i,
checkImmunityTime, new Date(msgExt.getBornTimestamp()));
break;
}
}
List<MessageExt> opMsg = pullResult.getMsgFoundList();
boolean isNeedCheck = (opMsg == null && valueOfCurrentMinusBorn > checkImmunityTime)
|| (opMsg != null && (opMsg.get(opMsg.size() - 1).getBornTimestamp() - startTime > transactionTimeout))
|| (valueOfCurrentMinusBorn <= -1);
//@10
if (isNeedCheck) {
if (!putBackHalfMsgQueue(msgExt, i)) {
//@11
continue;
}
listener.resolveHalfMsg(msgExt);
} else {
pullResult = fillOpRemoveMap(removeMap, opQueue, pullResult.getNextBeginOffset(), halfOffset, doneOpOffset);//@12
log.debug("The miss offset:{} in messageQueue:{} need to get more opMsg, result is:{}", i,
messageQueue, pullResult);
continue;
}
}
newOffset = i + 1;
i++;
}
if (newOffset != halfOffset) {//@13
transactionalMessageBridge.updateConsumeOffset(messageQueue, newOffset);
}
long newOpOffset = calculateOpOffset(doneOpOffset, opOffset);
if (newOpOffset != opOffset) {//@14
transactionalMessageBridge.updateConsumeOffset(opQueue, newOpOffset);
}
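The last step advances the OP queue progress only across offsets that are known to be done without gaps. A minimal sketch of that idea follows; OpOffsetSketch is a hypothetical class, and its calculateOpOffset is written from the behaviour implied by the call above rather than copied from the broker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical version of the OP-queue progress calculation. */
final class OpOffsetSketch {
    /**
     * Starting at oldOffset, moves forward one step for every consecutive offset
     * present in doneOpOffset and stops at the first gap.
     */
    static long calculateOpOffset(List<Long> doneOpOffset, long oldOffset) {
        List<Long> sorted = new ArrayList<>(doneOpOffset); // leave the caller's list untouched
        Collections.sort(sorted);
        long newOffset = oldOffset;
        for (long done : sorted) {
            if (done == newOffset) {
                newOffset++;
            } else {
                break;
            }
        }
        return newOffset;
    }
}
```

With done offsets 2, 0, 1, 4 and an old offset of 0, the result is 3: offsets 0 to 2 are contiguous, while 4 has to wait until 3 is done. Stopping at the first gap keeps an unprocessed OP entry from being skipped on the next run.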
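Summary
RocketMQ transaction messages are built on two-phase commit and a transaction status check-back mechanism. Two-phase commit means that a prepare message is sent first, and a commit or rollback command is sent once the transaction commits or rolls back. On top of that, RocketMQ runs a dedicated thread at a fixed frequency to process the prepare messages on the broker, asking the sender for the state of each transaction message to decide whether to commit or roll it back.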