mysql5.7以后的版本支持了xa prepare事务的持久化,这使得基于mysql
xa的分布式事务方案变的可行。但mysql目前的XA实现在极端故障场景下是会出现事务丢失或innodb数据与binlog不一致的情况。如下是几个这种场景下的官方确认的bug:
1. TDSQL报告的bug
TDSQL报告bug后也给了修复方案,但应该是修的有问题,下一篇博客里会进行分析。
2.
我和我的同事之前报告的bug:
我们对其中部分的问题提供了修复patch,但那不够,后续的blog(MySQL的XA事务丢数据问题修复)里我们会提供完成的修复方案并报告给官方。
本文将基于mysql5.7.25版本代码(当期最新5.7.29,依然有一样的问题),分别分析描述下问题的现象和原因。
问题现象描述
在开启mysql双1配置的情况下,如下mysql的xa 语句:
1. xa
prepare 'xid'
2. xa
commit 'xid' one phase
3. xa
commit 'xid'
4. xa
rollback 'xid'
上述操作的执行在binlog刷盘后(ordered_commit第2阶段后)引擎层操作前(ordered_commit第3阶段前)mysql挂掉的话,那么mysql将会出现事务丢失或数据不一致:
1. xa prepare 丢了,但binlog里有这个xa prepare的binlog记录
2. xa commit one phase丢了,事务依然处于prepare状态,但binlog里有这个xa commit
one phase的binlog记录
3. xa commit 丢了,事务依然处于prepare状态,但binlog里有这个xa commit one
phase的binlog记录
4. xa
rollback丢了,如果事务已经处于prepare状态了,那么recover后依然处于prepare状态,但binlog里有这个xa
rollback的binlog记录
上述问题很容易导致xa相关的主从出现数据不一致,极端情况下出现客户已提交事务丢失。
问题原理分析
本文我们将从代码和原理层面分析下这个问题的原因。我们以xa prepare操作为例进行分析,prepare问题是最大的,xa
commit one phase和它类似;而xa commit和xa rollback的问题比较简单,在文章的最后介绍。
xa prepare的提交流程
因为xa
prepare是会单独记录binlog的,我们可以从某种程度上认为是一个事务完结(或阶段性完成)的操作,所以和普通的commit操作一样,它是会调用ordered_commit进行提交和持久化的。
ordered_commit可以分为3个阶段:
1. flush stage
2. sync stage
3. commit stage
第一阶段flush stage会将当期的redo日志刷盘,然后将binlog的event
buffer到文件系统缓存。如下是redo日志刷盘的堆栈:
#0 innobase_flush_logs (hton=0x2f8a920, binlog_group_flush=true) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/handler/ha_innodb.cc:4218
#1 0x0000000000fbff18 in flush_handlerton (thd=0x0,
plugin=0x7fffd815efe0, arg=0x7fffd815f0a4) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/handler.cc:2473
#2 0x000000000166cd44 in plugin_foreach_with_mask (thd=0x0,
funcs=0x7fffd815f070, type=1, state_mask=4294967287,
arg=0x7fffd815f0a4) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_plugin.cc:2517
#3 0x000000000166ce0f in plugin_foreach_with_mask (thd=0x0,
func=0xfbfeba , type=1, state_mask=8,
arg=0x7fffd815f0a4)
at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_plugin.cc:2532
#4 0x0000000000fbff74 in ha_flush_logs (db_type=0x0,
binlog_group_flush=true) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/handler.cc:2483
#5 0x000000000190abd6 in MYSQL_BIN_LOG::process_flush_stage_queue
(this=0x2edde40 , total_bytes_var=0x7fffd815f180,
rotate_var=0x7fffd815f172,
out_queue_var=0x7fffd815f188)
at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/binlog.cc:8951
#6 0x000000000190c256 in MYSQL_BIN_LOG::ordered_commit (this=0x2edde40
, thd=0x7fff94000b70, all=true, skip_commit=true) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/binlog.cc:9595
#7 0x000000000190a854 in MYSQL_BIN_LOG::commit (this=0x2edde40 ,
thd=0x7fff94000b70, all=true) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/binlog.cc:8851
#8 0x00000000018f8f92 in binlog_prepare (hton=0x3ad63c0,
thd=0x7fff94000b70, all=true) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/binlog.cc:1855
#9 0x0000000000fbde1a in ha_prepare (thd=0x7fff94000b70) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/handler.cc:1469
#10 0x00000000017581c2 in
Sql_cmd_xa_prepare::trans_xa_prepare (this=0x7fff94006110,
thd=0x7fff94000b70) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/xa.cc:820
#11 0x00000000017583a5 in
Sql_cmd_xa_prepare::execute (this=0x7fff94006110,
thd=0x7fff94000b70) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/xa.cc:861
#12 0x000000000163ef11 in
mysql_execute_command (thd=0x7fff94000b70, first_level=true) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_parse.cc:4835
#13 0x0000000001640fce in
mysql_parse (thd=0x7fff94000b70, parser_state=0x7fffd8161550) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_parse.cc:5570
#14 0x0000000001635f52 in
dispatch_command (thd=0x7fff94000b70, com_data=0x7fffd8161e00,
command=COM_QUERY) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_parse.cc:1484
#15 0x0000000001634df0 in
do_command (thd=0x7fff94000b70) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_parse.cc:1025
第2阶段将binlog刷盘后,第3阶段就进行数据提交/prepare。
先刷binlog在进行引擎层提交/prepare是因为binlog是事务协调日志,后续recover的时候,一个待恢复的事务是应该commit还是rollback就是以binlog为准,如果binlog里有就commit,否则就rollback。对应的mysql代码位于xa.cc的xarecover_handlerton函数中。
同样是在ordered_commit的第2阶段后第3阶段前crash,普通事务可以recover,而为什么xa事务不能正常recover呢。这就涉及到xa事务在ordered_commit之后的innodb引擎层的ht->prepare的提交/prepare阶段的操作了,如下是xa
prepare在innodb引擎层的堆栈:
#0 trx_prepare (trx=0x7fffed5cfd20) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/trx/trx0trx.cc:2812
#1 0x0000000001c230a8 in trx_prepare_for_mysql (trx=0x7fffed5cfd20) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/trx/trx0trx.cc:2905
#2 0x00000000019f10c3 in innobase_xa_prepare (hton=0x2f8a920,
thd=0x7fff94000b70, prepare_trx=true) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/handler/ha_innodb.cc:17145
#3 0x0000000000fbde1a in ha_prepare (thd=0x7fff94000b70) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/handler.cc:1469
#4 0x00000000017581c2 in Sql_cmd_xa_prepare::trans_xa_prepare
(this=0x7fff94006110, thd=0x7fff94000b70) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/xa.cc:820
#5 0x00000000017583a5 in Sql_cmd_xa_prepare::execute
(this=0x7fff94006110, thd=0x7fff94000b70) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/xa.cc:861
#6 0x000000000163ef11 in mysql_execute_command (thd=0x7fff94000b70,
first_level=true) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_parse.cc:4835
#7 0x0000000001640fce in mysql_parse (thd=0x7fff94000b70,
parser_state=0x7fffd8161550) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_parse.cc:5570
#8 0x0000000001635f52 in dispatch_command (thd=0x7fff94000b70,
com_data=0x7fffd8161e00, command=COM_QUERY) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_parse.cc:1484
#9 0x0000000001634df0 in do_command (thd=0x7fff94000b70) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_parse.cc:1025
#10 0x00000000017769f9 in
handle_connection (arg=0x3ba00f0) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/conn_handler/connection_handler_per_thread.cc:306
#11 0x0000000001de5090 in
pfs_spawn_thread (arg=0x3bca680) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/perfschema/pfs.cc:2190
#12 0x00007ffff71706ba in
start_thread (arg=0x7fffd8162700) at
pthread_create.c:333
#13 0x00007ffff660541d in clone
() at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
innobase_xa_prepare调用trx_prepare_for_mysql->trx_prepare进行prepare:
1.
首先检查系统undo表空间中是否有变更的回滚段(trx->rsegs.m_redo),有的话对其调用lsn =
trx_prepare_low(trx, &trx->rsegs.m_redo, false);
2.
检查临时表空间中是否有回滚段(trx->rsegs.m_noredo,crash情况下不需要recover),有的话对其调用trx_prepare_low(trx,
&trx->rsegs.m_noredo, true)
trx_prepare_low
对insert和update的undo回滚段调用trx_undo_set_state_at_prepare(storage/innobase/trx/trx0undo.cc:1891)将其state设置为TRX_UNDO_PREPARED状态并写入xid到undo_page,将其写入undo
page并关联mtr记录mtr_log,最后通过
mtr_t::commit->mtr_t::Command::execute提交mtr并写redo日志
可以看到,在prepare阶段,innodb将TRX_UNDO_PREPARED状态和xid信息写入到了undo_page里,而这些信息是xa
prepare的recover流程中必须的信息。
我们接下来看下innodb的xa 事务的recover流程,启动recover流程。
Innodb xa事务的启动recover流程
mysql启动后会先调用引擎层的innodb的recover,如下堆栈:
#0 trx_resurrect_insert (undo=0x3d4b7f8, rseg=0x3d4b4b8) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/trx/trx0trx.cc:814
#1 0x0000000001c1e703 in trx_lists_init_at_db_start () at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/trx/trx0trx.cc:1030
#2 0x0000000001c17faf in trx_sys_init_at_db_start () at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/trx/trx0sys.cc:493
#3 0x0000000001bce7b6 in innobase_start_or_create_for_mysql () at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/srv/srv0start.cc:2250
#4 0x00000000019d6bc3 in innobase_init (p=0x2f8a920) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/storage/innobase/handler/ha_innodb.cc:4075
#5 0x0000000000fbd14b in ha_initialize_handlerton (plugin=0x30de490)
at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/handler.cc:840
#6 0x00000000016696e5 in plugin_initialize (plugin=0x30de490) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_plugin.cc:1226
#7 0x000000000166a3ca in plugin_register_builtin_and_init_core_se
(argc=0x2ead6d0 , argv=0x2f89560) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/sql_plugin.cc:1589
#8 0x0000000000f2f031 in init_server_components () at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/mysqld.cc:4082
#9 0x0000000000f306b7 in mysqld_main (argc=14, argv=0x2f89560) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/mysqld.cc:4776
#10 0x0000000000f27e96 in main
(argc=2, argv=0x7fffffffdd38) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/main.cc:25
启动时innodb引擎判定需要进行recover(recv_recovery_from_checkpoint_start
中判断checkpoint_lsn !=
flush_lsn),调用recv_init_crash_recovery_spaces进入crash recover流程(Check
if all tablespaces were found for crash recovery)。
然后调用trx_lists_init_at_db_start基于undo日志分别对insert操作和update操作做Resurrect,
1.
对于insert调用的是trx_resurrect_insert,基于undo日志准备用于recover的trx对象,对于prepare状态的事务,undo段状态为undo->state
== TRX_UNDO_PREPARED (xa prepare的时候在trx_prepare_low中调用
trx_undo_set_state_at_prepare设置的),那么生成的trx_t会被标记为state
= TRX_STATE_PREPARED。
trx->id =
undo->trx_id;
2. 基于undo发现的所有需要recover的事务都会放入 trx_sys->rw_trx_set
3. 并调用trx_lists_init_at_db_start 恢复事务所需要持有的锁。
4.
对于update调用trx_resurrect_update,这里会先确认对应的undo事务是不是在trx_resurrect_insert中已经生成,生产的话就复用。
5. 基于undo发现的所有需要recover的事务都会放入 trx_sys->rw_trx_set
6.
把trx_sys->rw_trx_set中所有处于TRX_STATE_ACTIVE和TRX_STATE_PREPARED状态的trx放入trx_sys->rw_trx_ids。
7.
而后在trx_rollback_or_clean_recovered中,遍历trx_sys->rw_trx_list调用trx_rollback_resurrected
(只有处于TRX_STATE_COMMITTED_IN_MEMORY和TRX_STATE_ACTIVE状态的事务才会被rollback(trx_rollback_active),对于TRX_STATE_PREPARED的不处理,交给server层处理)
所以可以看到,只有已经处于PREPARED状态的事务才会被交到server层进行recover,其他的事务都会在innodb层直接就回滚了。而判定一个recover的事务是否为PREPARED状态,取决于它的undo日志里undo->state是否为TRX_UNDO_PREPARED,而这个状态的设置是在在ordered_commit的第3阶段设置的,所以如果第3阶段前mysql
crash了,那么这个事务就无法被recover了。
其实整个mysql的xa事务的recover流程都基本是只针对xa
prepare成功的事务,完全没有考虑xa prepare事务需要recover的事务。
xa commit和xa rollback的问题原因
与xa
prepare不同,xa commit和xa
rollback的问题原因是因为server层的基于binlog的recover逻辑里没有考虑xa commit和xa
rollback的recover需求,具体而言,mysql启动的时候会去检查最后一个binlog,注意只检查最后一个binlog,将所有提交的事务的xid记录到commit_list里。如下堆栈所示,binlog.cc的open_binlog函数如果发现binlog没有正常关闭打印Recovering
after a crash using
gaoxiaoxin-Latitude-5591-bin),那么调用MYSQL_BIN_LOG::recover进行recover
(打印Starting crash recovery...)
MYSQL_BIN_LOG::recover函数中会遍历最新的一个binlog文件,将其中发现的xid事务的xid号记录到xids中,作为参数commit_list传递给ha_recover。这个commit_list就是从binlog中找到的已经完成的事务列表。
(gdb) bt
#0 ha_recover
(commit_list=0x7fffffffcb40) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/xa.cc:140
#1 0x000000000190ceaf in MYSQL_BIN_LOG::recover (this=0x2edde40 ,
log=0x7fffffffccb0, fdle=0x30c99c0,
valid_pos=0x7fffffffcc00)
at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/binlog.cc:9915
#2 0x000000000190961a in MYSQL_BIN_LOG::open_binlog (this=0x2edde40 ,
opt_name=0x3090c70 "gaoxiaoxin-Latitude-5591-bin") at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/binlog.cc:8483
#3 0x00000000019128ab in MYSQL_BIN_LOG::open (this=0x2edde40 ,
opt_name=0x3090c70 "gaoxiaoxin-Latitude-5591-bin") at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/binlog.h:655
#4 0x0000000000f2f6ca in init_server_components () at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/mysqld.cc:4249
#5 0x0000000000f306b7 in mysqld_main (argc=14, argv=0x2f89560) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/mysqld.cc:4776
#6 0x0000000000f27e96 in main (argc=2, argv=0x7fffffffdd38) at
/home/gaoxiaoxin/mysql-repo/mysql-5.7.25/sql/main.cc:25
注意,这里没有去检查binlog里的xa
commit或xa rollback信息:
if (ev->get_type_code() ==
binary_log::QUERY_EVENT &&
!strcmp(((Query_log_event*)ev)->query, "BEGIN"))
in_transaction=
TRUE;
if (ev->get_type_code() ==
binary_log::QUERY_EVENT &&
!strcmp(((Query_log_event*)ev)->query, "COMMIT"))
{
DBUG_ASSERT(in_transaction ==
TRUE);
in_transaction=
FALSE;
}
else if (ev->get_type_code() ==
binary_log::XID_EVENT)
{
DBUG_ASSERT(in_transaction ==
TRUE);
in_transaction=
FALSE;
Xid_log_event
*xev=(Xid_log_event *)ev;
uchar *x= (uchar *)
memdup_root(&mem_root, (uchar*) &xev->xid,
sizeof(xev->xid));
if (!x ||
my_hash_insert(&xids, x))
goto
err2;
}
基于上面的代码,可以看到,没有匹配xa commit和xa
rollback的情况,所以也不会记录xa commit或xa rollback的xid
到commit_list里,自然也不会去recover。
同时,mysql在xa.cc的xarecover_handlerton中基于commit_list对从引擎层获取的待recover的事务信息进行recover处理,对于非外部xa事务的待recover事务,如果在commit_list中存在就提交,否则就rollback;但对于外部xa
事务而言,mysql只是将事务记录到transaction_cache中(可以被xa
recover命令看到),没有任何其他的commit或rollback的操作:
my_xid x=
info->list[i].get_my_xid();
if (!x) //
not "mine" - that is generated by external TM
{
#ifndef DBUG_OFF
char buf[XIDDATASIZE * 4 + 6]; // see
xid_to_str
XID *xid= info->list +
i;
sql_print_information("ignore xid %s",
xid->xid_to_str(buf));
#endif
transaction_cache_insert_recovery(info->list
+ i);
info->found_foreign_xids++;
continue;
}
所以对于待recover的xa commit或xa
rollback的事务,就全部不处理,丢了,innodb引擎层的数据与binlog不一致。binlog协调日志的作用失效了。
转载请注明转自高孝鑫的博客!