使用场景
今天要节介绍的是关于 errant transaction 的问题,这个怎么翻译,额。。。异常事务?
这个问题出现的场景是:在slave上进行了DML操作,这时候slave就多出来一个或多个其他slave节点和master节点没有的事务。当发生failover 的时候,这个slave被提升为主,那么这些事务将会污染整个复制环境。我们通常的做法是注入一个空事务来解决。
关于这个问题的描述官方是这么说的:
An errant transaction is a transaction that exists on a slave but not on all of the slaves connected to the master. An errant transaction has a GTID associated with the UUID of the slave to which it was committed. These type of transactions can result from write operations performed on the slave while binary logging is enabled. By nature, these transactions should not be replicated.
传统解决方案
PS:环境是1*master+4*slave的多实例复制环境,
master:6up
slave: 7up 8up 9up
我们模拟在7up这个slave节点进行更新操作之后的处理过程。
操作之前先查看主和从的状态:
[root@msnode1 src]# 6up -e "show master status"
+------------------+----------+--------------+------------------+----------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+----------------------------------------------+
| mysql-bin.000020 | 52320535 | | | b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60874 |
+------------
[root@msnode1 ~]# 7up -e "show slave status\G"|grep "Gtid_Se"
Retrieved_Gtid_Set: b8ef801e-3ae0-11e7-981f-000c29a1cc19:39163-608
Executed_Gtid_Set: b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60874
可以看到,目前状态ok。接下来在7up上进行DML操作。
[root@msnode1 ~]# 7up -e "create database gtiddemo";
[root@msnode1 ~]# 7up -e "show slave status\G"|grep "Gtid_Se"
Retrieved_Gtid_Set: b8ef801e-3ae0-11e7-981f-000c29a1cc19:39163-60874
Executed_Gtid_Set: b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60874,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1#7up这个从库已经多出一个事务
通过 subtract函数也可以找出多出的那个事务,
[root@msnode1 ~]# 7up -e "select gtid_subtract('b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60874,b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1','b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60874') result";
+----------------------------------------+
| result |
+----------------------------------------+
| b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1 |
+----------------------------------------+
下面是空事务的注入处理,
在6up操作:
mysql> set gtid_next='b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1';
Query OK, 0 rows affected (0.02 sec)
mysql> begin ;
Query OK, 0 rows affected (0.01 sec)
mysql> commit
-> ;
Query OK, 0 rows affected (0.01 sec)
mysql> set gtid_next='automatic';
Query OK, 0 rows affected (0.00 sec)
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000020
Position: 52320720
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60874,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1 #7up上生成的这个事务已经注入到master上了
处理完毕。
mysqlslavetrx优雅处理方式
下面我们看看 mysqlslavetrx 是怎么优雅的解决这个问题的。
简单讲,mysqlslavetrx 可以在基于GTID复制环境中,在多个slave上同时一步跳过多个事务,即将多个空事务的注入绑定为一个操作,并且在多个slave上同时进行注入。
用法:
mysqlslavetrx
--dryrun
运行在 dry-run 模式下,只会显示要跳过的事务,并不实际执行。
--gtid-set=gtid-set
指定要跳过的空事务
--slaves=slaves_connections
指定要操作的slave,可以多个,用逗号分隔,格式: user[:passwd]@host[:port][:socket]
--verbose, -v
-v = verbose, -vv = more verbose, -vvv = debug
现在开看看mysqlslavetrx具体使用,
PS:因为下面的实验是后来写的,实验环境已经写了很多数据,为了演示方便,我将无关输出屏蔽掉了。
首先看下现在的状态:
[root@msnode1 ~]# 6up -e "show master status\G"
*************************** 1. row ***************************
File: mysql-bin.000021
Position: 3949
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
[root@msnode1 ~]#
[root@msnode1 ~]#
[root@msnode1 ~]# 7up -e "show slave status\G"|tail -8
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:39163-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Executed_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Auto_Position: 1
[root@msnode1 ~]#
[root@msnode1 ~]#
[root@msnode1 ~]# 8up -e "show slave status\G"|tail -8
Retrieved_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:39198-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Executed_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1,
Auto_Position: 1
[root@msnode1 ~]#
[root@msnode1 ~]#
[root@msnode1 ~]# 9up -e "show slave status\G"|tail -8
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: b8ef801e-3ae0-11e7-981f-000c29a1cc19:39198-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Executed_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Auto_Position: 1
目前四个节点状态一致。
接下来在7up上进行DML操作:
[root@msnode1 ~]# 7up -e "drop database gtiddemo";
[root@msnode1 ~]# 6up -e "show master status\G"
*************************** 1. row ***************************
File: mysql-bin.000021
Position: 3949
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
[root@msnode1 ~]#
[root@msnode1 ~]# 7up -e "show slave status\G"|tail -8
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:39163-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Executed_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1-2 #这个多了一个事务
Auto_Position: 1
[root@msnode1 ~]#
[root@msnode1 ~]# 8up -e "show slave status\G"|tail -8
Retrieved_Gtid_Set: b8ef801e-3ae0-11e7-981f-000c29a1cc19:39198-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Executed_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1,
Auto_Position: 1
[root@msnode1 ~]#
[root@msnode1 ~]# 9up -e "show slave status\G"|tail -8
Retrieved_Gtid_Set: b8ef801e-3ae0-11e7-981f-000c29a1cc19:39198-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Executed_Gtid_Set:
b8ef801e-3ae0-11e7-981f-000c29a1cc19:1-60883,
b8ef81ca-3ae0-11e7-981f-000c29a1cc19:1
Auto_Position: 1
可以看到7up已经多出来一个事务 b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2
接下来我们用mysqlslavetrx的dryrun模式分析一下,
[root@msnode1 ~]# mysqlslavetrx --gtid-set="b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2" --slaves=root:123456@localhost:3306:/tmp/mysql3306.sock,\
root:123456@localhost:3308:/tmp/mysql3308.sock,root:123456@localhost:3309:/tmp/mysql3309.sock ,\
root:123456@localhost:3307:/tmp/mysql3307.sock --dryrun -vvv
WARNING: Using a password on the command line interface can be insecure.
#
# WARNING: Executing utility in dry run mode (read only).
#
# GTID set to be skipped for each server:
# - localhost@3306: b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2
# - localhost@3308: b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2
# - localhost@3309: b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2
#
# (dry run) Injecting empty transactions for 'localhost:3306'...
# - b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2
# (dry run) Injecting empty transactions for 'localhost:3308'...
# - b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2
# (dry run) Injecting empty transactions for 'localhost:3309'...
# - b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2
#
#...done.
可以看到程序提示要在 localhost:3306,localhost:3308,localhost:3309三个节点注入b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2,接下来我们就可以一次性将这个事务注入所有节点,只要将上面命令去掉 -dryrun选项即可。
[root@msnode1 ~]# mysqlslavetrx --gtid-set="b8ef81ca-3ae0-11e7-981f-000c29a1cc19:2" --verbose --slaves=root:123456@localhost:3306:/tmp/mysql3306.sock,\
root:123456@localhost:3308:/tmp/mysql3308.sock,root:123456@localhost:3309:/tmp/mysql3309.sock , root:123456@localhost:3307:/tmp/mysql3307.sock
WARNING: Using a password on the command line interface can be insecure.
#
# GTID set to be skipped for each server:
# - localhost@3306: None
# - localhost@3308: None
# - localhost@3309: None
#
# No transaction to skip.
#
#...done.
#
ok,完成,手工。更多关于mysqlslavetrx介绍,请参考官方资料 https://dev.mysql.com/doc/mysql-utilities/1.6/en/mysqlslavetrx.html