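The previous post, Fescar-TM, covered how the TM registers and roughly how it works. Next, let's study the RM.
1. Introduction
The RM component controls branch transactions. It is also responsible for branch registration and status reporting, and it receives instructions from the transaction coordinator (TC) to drive the commit and rollback of branch (local) transactions.
2. Source code walkthrough
Like the TM, the RM is initialized when the bean proxy is created.
2.1 RM initialization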
GlobalTransactionScanner.java
private void initClient() {
if (LOGGER.isInfoEnabled()) {
LOGGER.info("Initializing Global Transaction Clients ... ");
}
if (StringUtils.isEmpty(applicationId) || StringUtils.isEmpty(txServiceGroup)) {
throw new IllegalArgumentException(
"applicationId: " + applicationId + ", txServiceGroup: " + txServiceGroup);
}
TMClient.init(applicationId, txServiceGroup);
if (LOGGER.isInfoEnabled()) {
LOGGER.info(
"Transaction Manager Client is initialized. applicationId[" + applicationId + "] txServiceGroup["
+ txServiceGroup + "]");
}
if ((AT_MODE & mode) > 0) {
RMClientAT.init(applicationId, txServiceGroup);
if (LOGGER.isInfoEnabled()) {
LOGGER.info(
"Resource Manager for AT Client is initialized. applicationId[" + applicationId
+ "] txServiceGroup["
+ txServiceGroup + "]");
}
}
if ((MT_MODE & mode) > 0) {
throw new NotSupportYetException();
}
if (LOGGER.isInfoEnabled()) {
LOGGER.info("Global Transaction Clients are initialized. ");
}
}
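RMClientAT.init(applicationId, txServiceGroup): as the name suggests, this only applies to AT mode, and MT mode is not supported yet. I will come back to the AT and MT modes when I have time; for now we skip over them.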
public class RMClientAT {
public static void init(String applicationId, String transactionServiceGroup) {
RmRpcClient rmRpcClient = RmRpcClient.getInstance(applicationId, transactionServiceGroup);
AsyncWorker asyncWorker = new AsyncWorker();
asyncWorker.init();
DataSourceManager.init(asyncWorker);
rmRpcClient.setResourceManager(DataSourceManager.get());
rmRpcClient.setClientMessageListener(new RmMessageListener(new RMHandlerAT()));
rmRpcClient.init();
}
}
public synchronized void init() {
LOGGER.info("Async Commit Buffer Limit: " + ASYNC_COMMIT_BUFFER_LIMIT);
timerExecutor = new ScheduledThreadPoolExecutor(1,
new NamedThreadFactory("AsyncWorker", 1, true));
timerExecutor.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
try {
doBranchCommits();
} catch (Throwable e) {
LOGGER.info("Failed at async committing ... " + e.getMessage());
}
}
}, 10, 1000 * 1, TimeUnit.MILLISECONDS);
}
private void doBranchCommits() {
if (ASYNC_COMMIT_BUFFER.size() == 0) {
return;
}
Map<String, List<Phase2Context>> mappedContexts = new HashMap<>();
Iterator<Phase2Context> iterator = ASYNC_COMMIT_BUFFER.iterator();
while (iterator.hasNext()) {
Phase2Context commitContext = iterator.next();
List<Phase2Context> contextsGroupedByResourceId = mappedContexts.get(commitContext.resourceId);
if (contextsGroupedByResourceId == null) {
contextsGroupedByResourceId = new ArrayList<>();
mappedContexts.put(commitContext.resourceId, contextsGroupedByResourceId);
}
contextsGroupedByResourceId.add(commitContext);
iterator.remove();
}
for (String resourceId : mappedContexts.keySet()) {
Connection conn = null;
try {
try {
DataSourceProxy dataSourceProxy = DataSourceManager.get().get(resourceId);
conn = dataSourceProxy.getPlainConnection();
} catch (SQLException sqle) {
LOGGER.warn("Failed to get connection for async committing on " + resourceId, sqle);
continue;
}
List<Phase2Context> contextsGroupedByResourceId = mappedContexts.get(resourceId);
for (Phase2Context commitContext : contextsGroupedByResourceId) {
try {
UndoLogManager.deleteUndoLog(commitContext.xid, commitContext.branchId, conn);
} catch (Exception ex) {
LOGGER.warn("Failed to delete undo log [" + commitContext.branchId + "/" + commitContext.xid + "]", ex);
}
}
} finally {
if (conn != null) {
try {
conn.close();
} catch (SQLException closeEx) {
LOGGER.warn("Failed to close JDBC resource while deleting undo_log ", closeEx);
}
}
}
}
}
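asyncWorker.init() creates a scheduled thread pool that processes the pending commits in the ASYNC_COMMIT_BUFFER queue once every second. rmRpcClient.setClientMessageListener(new RmMessageListener(new RMHandlerAT())) registers a listener that receives the instructions sent by the TC, and the RM commits or rolls back according to the instruction. rmRpcClient.init() establishes the connection to the TC and maintains it every 5 seconds.
2.2 Branch transactions
To control branch transactions, Fescar first has to proxy the data source so that it can add its own logic, so it is worth looking at how the data source proxy is structured.
As mentioned earlier, Fescar also uses a two-phase design. The local transaction is committed in the first phase, but the global transaction lock must be acquired before that commit. SQL is ultimately executed by a statement, so Fescar provides StatementProxy and PreparedStatementProxy to proxy Statement and PreparedStatement respectively. Both hand off to ExecuteTemplate, which uses druid to parse the SQL and then decides which executor to create based on the action the parsed SQL performs.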
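The timer-driven draining described above can be sketched in isolation. This is a minimal illustration, not Fescar code: AsyncCommitSketch, Pending, submit, drainGrouped and start are hypothetical names, and the "delete undo log" step is replaced by a print statement.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative only: every name here is hypothetical, not a Fescar class. */
class AsyncCommitSketch {
    /** One pending phase-two commit. */
    static final class Pending {
        final String xid;
        final long branchId;
        final String resourceId;

        Pending(String xid, long branchId, String resourceId) {
            this.xid = xid;
            this.branchId = branchId;
            this.resourceId = resourceId;
        }
    }

    private final BlockingQueue<Pending> buffer = new LinkedBlockingQueue<>();

    /** Queues a commit for later, asynchronous processing. */
    void submit(Pending pending) {
        buffer.add(pending);
    }

    /** Empties the buffer and groups what was drained by resource id. */
    Map<String, List<Pending>> drainGrouped() {
        List<Pending> drained = new ArrayList<>();
        buffer.drainTo(drained);
        Map<String, List<Pending>> grouped = new LinkedHashMap<>();
        for (Pending p : drained) {
            grouped.computeIfAbsent(p.resourceId, k -> new ArrayList<>()).add(p);
        }
        return grouped;
    }

    /** Starts a single timer thread that drains once per second; the caller shuts it down. */
    ScheduledExecutorService start() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                // Stand-in for "open one connection per resource and delete its undo logs".
                drainGrouped().forEach((resourceId, items) ->
                        System.out.println(resourceId + ": " + items.size() + " undo log(s) to delete"));
            } catch (Throwable t) {
                // Swallow so that one failed run does not cancel the periodic task.
                System.err.println("Failed at async committing: " + t.getMessage());
            }
        }, 10, 1000, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

Grouping by resource id means one connection per data source can serve all of that source's pending branches in a single run. Catching Throwable inside the task matters because scheduleAtFixedRate suppresses later runs once a run throws.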
ExecuteTemplate.java
public static <T, S extends Statement> T execute(SQLRecognizer sqlRecognizer,
StatementProxy<S> statementProxy,
StatementCallback<T, S> statementCallback,
Object... args) throws SQLException {
if (!RootContext.inGlobalTransaction()) {
// Just work as original statement
return statementCallback.execute(statementProxy.getTargetStatement(), args);
}
if (sqlRecognizer == null) {
// Parse the SQL
sqlRecognizer = SQLVisitorFactory.get(
statementProxy.getTargetSQL(),
statementProxy.getConnectionProxy().getDbType());
}
Executor<T> executor = null;
if (sqlRecognizer == null) {
executor = new PlainExecutor<T, S>(statementProxy, statementCallback);
} else {
// Pick the executor for this SQL type
switch (sqlRecognizer.getSQLType()) {
case INSERT:
executor = new InsertExecutor<T, S>(statementProxy, statementCallback, sqlRecognizer);
break;
case UPDATE:
executor = new UpdateExecutor<T, S>(statementProxy, statementCallback, sqlRecognizer);
break;
case DELETE:
executor = new DeleteExecutor<T, S>(statementProxy, statementCallback, sqlRecognizer);
break;
case SELECT_FOR_UPDATE:
executor = new SelectForUpdateExecutor(statementProxy, statementCallback, sqlRecognizer);
break;
default:
executor = new PlainExecutor<T, S>(statementProxy, statementCallback);
break;
}
}
T rs = null;
try {
// The executor performs the actual operation
rs = executor.execute(args);
} catch (Throwable ex) {
if (ex instanceof SQLException) {
throw (SQLException) ex;
} else {
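// Turn everything into SQLException
// (the source as quoted omits `throw`, which silently drops the exception; `throw` is added here)
throw new SQLException(ex);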
}
}
return rs;
}
BaseTransactionalExecutor.java
@Override
public Object execute(Object... args) throws Throwable {
// Get the current XID
String xid = RootContext.getXID();
// Bind it to the proxied connection
statementProxy.getConnectionProxy().bind(xid);
// Run the actual statement logic
return doExecute(args);
}
AbstractDMLBaseExecutor.java
@Override
public T doExecute(Object... args) throws Throwable {
AbstractConnectionProxy connectionProxy = statementProxy.getConnectionProxy();
// Branch on the connection's original auto-commit flag
if (connectionProxy.getAutoCommit()) {
return executeAutoCommitTrue(args);
} else {
return executeAutoCommitFalse(args);
}
}
// Auto-commit is false
protected T executeAutoCommitFalse(Object[] args) throws Throwable {
// Build the before image
TableRecords beforeImage = beforeImage();
// Execute the SQL
T result = statementCallback.execute(statementProxy.getTargetStatement(), args);
// Build the after image
TableRecords afterImage = afterImage(beforeImage);
// Process the before and after images and store them in the ConnectionContext
statementProxy.getConnectionProxy().prepareUndoLog(sqlRecognizer.getSQLType(), sqlRecognizer.getTableName(), beforeImage, afterImage);
return result;
}
// Auto-commit is true: this still ends up calling the auto-commit-false method; the rest only manages the auto-commit flag
protected T executeAutoCommitTrue(Object[] args) throws Throwable {
T result = null;
AbstractConnectionProxy connectionProxy = statementProxy.getConnectionProxy();
LockRetryController lockRetryController = new LockRetryController();
try {
connectionProxy.setAutoCommit(false);
while (true) {
try {
// Call the auto-commit-false method
result = executeAutoCommitFalse(args);
connectionProxy.commit();
break;
} catch (LockConflictException lockConflict) {
lockRetryController.sleep(lockConflict);
}
}
} catch (Exception e) {
// when exception occur in finally,this exception will lost, so just print it here
LOGGER.error("exception occur", e);
throw e;
}
finally {
connectionProxy.setAutoCommit(true);
}
return result;
}
ConnectionProxy.java
public void prepareUndoLog(SQLType sqlType, String tableName, TableRecords beforeImage, TableRecords afterImage) throws SQLException {
if(beforeImage.getRows().size() == 0 && afterImage.getRows().size() == 0) {
return;
}
TableRecords lockKeyRecords = afterImage;
if (sqlType == SQLType.DELETE) {
lockKeyRecords = beforeImage;
}
// Build the lock keys
String lockKeys = buildLockKey(lockKeyRecords);
context.appendLockKey(lockKeys);
// Build the undo log item
SQLUndoLog sqlUndoLog = buildUndoItem(sqlType, tableName, beforeImage, afterImage);
context.appendUndoItem(sqlUndoLog);
}
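One point to note in executeAutoCommitTrue is the while loop that retries. Each time the before image is generated, the rows are read with a select ... for update query, so if another transaction already holds them, this transaction has to wait for the exclusive lock. When a LockConflictException is caught, the loop waits 10 ms (default) and tries again, up to 30 times (default). If it still fails after that, an exception is thrown and the local transaction is rolled back. If it succeeds, the local transaction lock has been acquired and is held.
After execute() has run, the next step is to commit the local transaction.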
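The bounded retry described above can be sketched on its own. This is a minimal illustration under stated assumptions: LockRetrySketch, LockConflict, Attempt and withRetry are hypothetical names, and the 10 ms / 30 retries used in the usage note are simply the defaults quoted in the text, passed in as parameters.

```java
/** Illustrative only: a bounded sleep-and-retry loop, not Fescar's LockRetryController. */
class LockRetrySketch {
    /** Hypothetical stand-in for a lock conflict signalled by the attempt. */
    static class LockConflict extends RuntimeException {
        LockConflict(String message) {
            super(message);
        }
    }

    /** The unit of work that may hit a lock conflict. */
    interface Attempt<T> {
        T run();
    }

    /**
     * Runs the attempt; on LockConflict sleeps and tries again, at most
     * maxRetries more times, then rethrows the last conflict.
     */
    static <T> T withRetry(Attempt<T> attempt, int maxRetries, long sleepMillis) {
        int remaining = maxRetries;
        while (true) {
            try {
                return attempt.run();
            } catch (LockConflict conflict) {
                if (--remaining < 0) {
                    // Retry budget exhausted: let the caller roll back.
                    throw conflict;
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for lock", interrupted);
                }
            }
        }
    }
}
```

A call such as `LockRetrySketch.withRetry(() -> doWork(), 30, 10L)` (with doWork being whatever the caller wants retried) would make one initial attempt plus at most 30 retries, sleeping 10 ms between them. Only the conflict exception is retried; any other exception leaves the loop immediately.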
ConnectionProxy.java
@Override
public void commit() throws SQLException {
if (context.inGlobalTransaction()) {
try {
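// Register the branch transaction with the TC and check the global lock; the lock structure is covered when we study the TC
register();
} catch (TransactionException e) {
// If the global transaction lock could not be obtained, an exception is thrown here
recognizeLockKeyConflictException(e);
}
try {
// Persist the previously prepared undo log to the database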
if (context.hasUndoLog()) {
UndoLogManager.flushUndoLogs(this);
}
targetConnection.commit();
} catch (Throwable ex) {
report(false);
if (ex instanceof SQLException) {
throw (SQLException) ex;
} else {
throw new SQLException(ex);
}
}
report(true);
context.reset();
} else {
targetConnection.commit();
}
}
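Once the global transaction lock has been obtained, the commit itself may succeed or fail; either way, report() is then called to report the result to the TC.
2.3 Committing a branch transaction
The TC sends the commit instruction to the RM. Remember the RmMessageListener that was registered during initialization?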
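The ordering inside commit() can be restated as a small sketch: register first, then write the undo log and commit locally, then report the outcome. The interfaces and names below (BranchCommitSketch, Coordinator, LocalTransaction) are hypothetical stand-ins for illustration, not Fescar's API.

```java
/** Illustrative only: the phase-one ordering, with hypothetical collaborators. */
class BranchCommitSketch {
    /** Hypothetical view of the TC as seen from the RM. */
    interface Coordinator {
        /** Registers the branch and checks the global lock; throws if the lock is held elsewhere. */
        long registerBranch(String lockKeys);

        /** Reports whether phase one succeeded for the given branch. */
        void report(long branchId, boolean committed);
    }

    /** Hypothetical view of the local database transaction. */
    interface LocalTransaction {
        void flushUndoLog(long branchId);

        void commit();
    }

    static void commit(Coordinator tc, LocalTransaction tx, String lockKeys) {
        // Step 1: if registration fails, nothing has been written locally yet.
        long branchId = tc.registerBranch(lockKeys);
        try {
            // Step 2: the undo log and the business data go in the same local commit.
            tx.flushUndoLog(branchId);
            tx.commit();
        } catch (RuntimeException ex) {
            // Step 3a: tell the TC that phase one failed, then propagate.
            tc.report(branchId, false);
            throw ex;
        }
        // Step 3b: tell the TC that phase one succeeded.
        tc.report(branchId, true);
    }
}
```

Writing the undo log inside the same local transaction as the business data means the two either both persist or both disappear, which is what makes a later rollback from the undo log possible.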
RmMessageListener.java
@Override
public void onMessage(long msgId, String serverAddress, Object msg, ClientMessageSender sender) {
if (LOGGER.isInfoEnabled()) {
LOGGER.info("onMessage:" + msg);
}
if (msg instanceof BranchCommitRequest) {
handleBranchCommit(msgId, serverAddress, (BranchCommitRequest)msg, sender);
} else if (msg instanceof BranchRollbackRequest) {
handleBranchRollback(msgId, serverAddress, (BranchRollbackRequest)msg, sender);
}
}
handleBranchCommit passes the commit instruction to the doBranchCommit method of AbstractRMHandlerAT, which ultimately has asyncWorker add it to the queue of pending work.
AsyncWorker.java
@Override
public BranchStatus branchCommit(String xid, long branchId, String resourceId, String applicationData) throws TransactionException {
if (ASYNC_COMMIT_BUFFER.size() < ASYNC_COMMIT_BUFFER_LIMIT) {
ASYNC_COMMIT_BUFFER.add(new Phase2Context(xid, branchId, resourceId, applicationData));
} else {
LOGGER.warn("Async commit buffer is FULL. Rejected branch [" + branchId + "/" + xid + "] will be handled by housekeeping later.");
}
return BranchStatus.PhaseTwo_Committed;
}
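As mentioned above, asyncWorker creates a scheduled thread pool during RM initialization to process commit instructions asynchronously. The undo log data is finally deleted in UndoLogManager.deleteUndoLog(commitContext.xid, commitContext.branchId, conn).
2.4 Rolling back a branch transaction
The onMessage method routes to handleBranchRollback, and the rollback itself is ultimately performed by the branchRollback method of DataSourceManager.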
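The branchCommit listing checks the buffer size and then adds to it as two separate steps. As a design aside, a capacity-bounded queue is one way to fold the check and the insert into a single call. This is an alternative sketch with hypothetical names (BoundedCommitBufferSketch, accept, pending), not what Fescar does.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative only: a bounded buffer that rejects instead of blocking when full. */
class BoundedCommitBufferSketch {
    private final BlockingQueue<String> buffer;

    BoundedCommitBufferSketch(int limit) {
        // The capacity is enforced by the queue itself.
        this.buffer = new LinkedBlockingQueue<>(limit);
    }

    /** Returns true if the branch was queued, false if the buffer was full. */
    boolean accept(String xid, long branchId) {
        // offer() never blocks and never throws on a full queue; it just returns false.
        return buffer.offer(branchId + "/" + xid);
    }

    /** Number of branches waiting to be processed. */
    int pending() {
        return buffer.size();
    }
}
```

A rejected branch is not lost from the protocol's point of view: in the listing, the warning message says it will be handled by housekeeping later, and the method still returns PhaseTwo_Committed.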
@Override
public BranchStatus branchRollback(String xid, long branchId, String resourceId, String applicationData) throws TransactionException {
DataSourceProxy dataSourceProxy = get(resourceId);
if (dataSourceProxy == null) {
throw new ShouldNeverHappenException();
}
try {
UndoLogManager.undo(dataSourceProxy, xid, branchId);
} catch (TransactionException te) {
if (te.getCode() == TransactionExceptionCode.BranchRollbackFailed_Unretriable) {
return BranchStatus.PhaseTwo_RollbackFailed_Unretryable;
} else {
return BranchStatus.PhaseTwo_RollbackFailed_Retryable;
}
}
return BranchStatus.PhaseTwo_Rollbacked;
}
The undo method performs the actual rollback:
public static void undo(DataSourceProxy dataSourceProxy, String xid, long branchId) throws TransactionException {
assertDbSupport(dataSourceProxy.getTargetDataSource().getDbType());
Connection conn = null;
ResultSet rs = null;
PreparedStatement selectPST = null;
try {
conn = dataSourceProxy.getPlainConnection();
// The entire undo process should run in a local transaction.
conn.setAutoCommit(false);
// Find UNDO LOG
selectPST = conn.prepareStatement(SELECT_UNDO_LOG_SQL);
selectPST.setLong(1, branchId);
selectPST.setString(2, xid);
rs = selectPST.executeQuery();
while (rs.next()) {
Blob b = rs.getBlob("rollback_info");
String rollbackInfo = StringUtils.blob2string(b);
BranchUndoLog branchUndoLog = UndoLogParserFactory.getInstance().decode(rollbackInfo);
for (SQLUndoLog sqlUndoLog : branchUndoLog.getSqlUndoLogs()) {
TableMeta tableMeta = TableMetaCache.getTableMeta(dataSourceProxy, sqlUndoLog.getTableName());
sqlUndoLog.setTableMeta(tableMeta);
AbstractUndoExecutor undoExecutor = UndoExecutorFactory.getUndoExecutor(dataSourceProxy.getDbType(), sqlUndoLog);
undoExecutor.executeOn(conn);
}
}
deleteUndoLog(xid, branchId, conn);
conn.commit();
} catch (Throwable e) {
if (conn != null) {
try {
conn.rollback();
} catch (SQLException rollbackEx) {
LOGGER.warn("Failed to close JDBC resource while undo ... ", rollbackEx);
}
}
throw new TransactionException(BranchRollbackFailed_Retriable, String.format("%s/%s", branchId, xid), e);
} finally {
try {
if (rs != null) {
rs.close();
}
if (selectPST != null) {
selectPST.close();
}
if (conn != null) {
conn.close();
}
} catch (SQLException closeEx) {
LOGGER.warn("Failed to close JDBC resource while undo ... ", closeEx);
}
}
}
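The method looks up the undo log by XID and branchId, overwrites the data, and then deletes the undo log. Recall that the undo log was created with both a before image and an after image, yet only the before image is used here to restore the data. According to the documentation, the current data should first be compared with the after image. If they match, no other transaction has modified the data, and the before image is then used to restore it. If they differ, another transaction has modified the data, and restoring from the before image would be wrong and would leave dirty data behind. Fescar is still iterating quickly, so this check may well appear within the next few versions.
Summary:
1. During initialization, the RM creates the scheduled thread pool used to process later instructions.
2. Before data is committed, the RM must request the global transaction lock, obtain a branch id, and generate the before and after data images.
3. Before committing, the image information is stored in the undo log, and only then is the commit performed.
4. After committing, the RM reports the result to the TC.
5. The RM receives instructions from the TC and commits or rolls back the transaction accordingly.
6. If the TC sends a commit instruction, it is handed to AsyncWorker, which processes the undo log data asynchronously.
7. If the TC sends a rollback instruction, the RM looks up the undo log by XID and branchId and (per the documentation, after comparing the data) performs the rollback.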
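The documented check can be illustrated with a toy model in which a row is just a map from column name to value. This is only a sketch of the idea described above, with hypothetical names (UndoCheckSketch, Outcome, undoRow); it is not taken from Fescar.

```java
import java.util.Map;

/** Illustrative only: compare-then-restore on a single in-memory "row". */
class UndoCheckSketch {
    enum Outcome { RESTORED, DIRTY }

    /**
     * Restores the before image only if the current row still equals the
     * after image, i.e. nobody else has changed it since this branch wrote it.
     */
    static Outcome undoRow(Map<String, Object> current,
                           Map<String, Object> beforeImage,
                           Map<String, Object> afterImage) {
        if (!current.equals(afterImage)) {
            // Someone else modified the row; overwriting would destroy their change.
            return Outcome.DIRTY;
        }
        current.clear();
        current.putAll(beforeImage);
        return Outcome.RESTORED;
    }
}
```

For example, with a before image of stock=10 and an after image of stock=9, a current row of stock=9 is restored to 10, while a current row of stock=5 is reported as DIRTY and left untouched. What a real implementation should do in the dirty case (retry, alert, manual repair) is outside the scope of this sketch.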
Reference: https://github.com/alibaba/fescar/wiki/%E6%A6%82%E8%A7%88