照理说现在都8102年了,sql_slave_skip_counter应该不再是一个热门的讨论点了,但作为一个经典的容易误导的知识点,不去纸上谈兵,用实验总结一下还是很有必要的。
以下实验是在MySQL 5.6的基于行的一主一从复制下展开。
主库如下建表,并插入两行记录:
mysql> CREATE TABLE `rpl_test1` (`id` int(11) NOT NULL,PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
mysql> insert into rpl_test1 select 1;
mysql> insert into rpl_test1 select 2;
除此之外,从库另外插入一条记录:
mysql> insert into rpl_test1 select 4;
此时,主从已经数据不一致。
#主库:
mysql> select * from rpl_test1;
+----+
| id |
+----+
| 1 |
| 2 |
+----+
2 rows in set (0.00 sec)
#从库:
mysql> select * from rpl_test1;
+----+
| id |
+----+
| 1 |
| 2 |
| 4 |
+----+
3 rows in set (0.00 sec)
现在,在主库上执行两个事务:
TR1:插入3、4、5、6,并commit
TR2:插入7、8、9,并commit
在binlog中,对应的event信息如下
+------------+------+-------------+-----------+-------------+---------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------+------+-------------+-----------+-------------+---------------------------------------+
| bin.000018 | 321 | Query | 237303306 | 396 | BEGIN |
| bin.000018 | 396 | Table_map | 237303306 | 451 | table_id: 73 (testdb1.rpl_test1) |
| bin.000018 | 451 | Write_rows | 237303306 | 491 | table_id: 73 flags: STMT_END_F |
| bin.000018 | 491 | Table_map | 237303306 | 546 | table_id: 73 (testdb1.rpl_test1) |
| bin.000018 | 546 | Write_rows | 237303306 | 586 | table_id: 73 flags: STMT_END_F |
| bin.000018 | 586 | Table_map | 237303306 | 641 | table_id: 73 (testdb1.rpl_test1) |
| bin.000018 | 641 | Write_rows | 237303306 | 681 | table_id: 73 flags: STMT_END_F |
| bin.000018 | 681 | Table_map | 237303306 | 736 | table_id: 73 (testdb1.rpl_test1) |
| bin.000018 | 736 | Write_rows | 237303306 | 776 | table_id: 73 flags: STMT_END_F |
| bin.000018 | 776 | Xid | 237303306 | 807 | COMMIT /* xid=176 */ |
| bin.000018 | 807 | Query | 237303306 | 882 | BEGIN |
| bin.000018 | 882 | Table_map | 237303306 | 937 | table_id: 73 (testdb1.rpl_test1) |
| bin.000018 | 937 | Write_rows | 237303306 | 977 | table_id: 73 flags: STMT_END_F |
| bin.000018 | 977 | Table_map | 237303306 | 1032 | table_id: 73 (testdb1.rpl_test1) |
| bin.000018 | 1032 | Write_rows | 237303306 | 1072 | table_id: 73 flags: STMT_END_F |
| bin.000018 | 1072 | Table_map | 237303306 | 1127 | table_id: 73 (testdb1.rpl_test1) |
| bin.000018 | 1127 | Write_rows | 237303306 | 1167 | table_id: 73 flags: STMT_END_F |
| bin.000018 | 1167 | Xid | 237303306 | 1198 | COMMIT /* xid=182 */ |
从库势必在执行到插入4的时候,SQL进程会报错。
Last_SQL_Error: Could not execute Write_rows event on table testdb1.rpl_test1; Duplicate entry '4' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log bin.000018, end_log_pos 586
后面分别讨论 sql_slave_skip_counter 在设置不同的N值来跳过错误时,最后的从库数据情况。
一、sql_slave_skip_counter <= 10
将sql_slave_skip_counter设置为1~10,均能跳过1062错误,最终从库数据呈现如下:第一个事务的操作将会被跳过,第二个事务的操作被保留。
mysql> select * from rpl_test1;
+----+
| id |
+----+
| 1 |
| 2 |
| 4 |
| 7 |
| 8 |
| 9 |
+----+
6 rows in set (0.00 sec)
二、sql_slave_skip_counter = 11
将操作过程罗列如下,以便分析:
主库如下执行两个事务
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into rpl_test1 select 3;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into rpl_test1 select 4;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into rpl_test1 select 5;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into rpl_test1 select 6;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into rpl_test1 select 7;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into rpl_test1 select 8;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into rpl_test1 select 9;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> commit;
Query OK, 0 rows affected (0.01 sec)
对应的binlog event如下:
+------------+-----+-------------+-----------+-------------+---------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------+-----+-------------+-----------+-------------+---------------------------------------+
| bin.000028 | 4 | Format_desc | 237303306 | 120 | Server ver: 5.6.36-log, Binlog ver: 4 |
| bin.000028 | 120 | Query | 237303306 | 195 | BEGIN |
| bin.000028 | 195 | Table_map | 237303306 | 250 | table_id: 72 (testdb1.rpl_test1) |
| bin.000028 | 250 | Write_rows | 237303306 | 290 | table_id: 72 flags: STMT_END_F |
| bin.000028 | 290 | Table_map | 237303306 | 345 | table_id: 72 (testdb1.rpl_test1) |
| bin.000028 | 345 | Write_rows | 237303306 | 385 | table_id: 72 flags: STMT_END_F |
| bin.000028 | 385 | Table_map | 237303306 | 440 | table_id: 72 (testdb1.rpl_test1) |
| bin.000028 | 440 | Write_rows | 237303306 | 480 | table_id: 72 flags: STMT_END_F |
| bin.000028 | 480 | Table_map | 237303306 | 535 | table_id: 72 (testdb1.rpl_test1) |
| bin.000028 | 535 | Write_rows | 237303306 | 575 | table_id: 72 flags: STMT_END_F |
| bin.000028 | 575 | Xid | 237303306 | 606 | COMMIT /* xid=2218 */ |
| bin.000028 | 606 | Query | 237303306 | 681 | BEGIN |
| bin.000028 | 681 | Table_map | 237303306 | 736 | table_id: 72 (testdb1.rpl_test1) |
| bin.000028 | 736 | Write_rows | 237303306 | 776 | table_id: 72 flags: STMT_END_F |
| bin.000028 | 776 | Table_map | 237303306 | 831 | table_id: 72 (testdb1.rpl_test1) |
| bin.000028 | 831 | Write_rows | 237303306 | 871 | table_id: 72 flags: STMT_END_F |
| bin.000028 | 871 | Table_map | 237303306 | 926 | table_id: 72 (testdb1.rpl_test1) |
| bin.000028 | 926 | Write_rows | 237303306 | 966 | table_id: 72 flags: STMT_END_F |
| bin.000028 | 966 | Xid | 237303306 | 997 | COMMIT /* xid=2224 */ |
+------------+-----+-------------+-----------+-------------+---------------------------------------+
19 rows in set (0.00 sec)
从库show slave status部分输出结果:
Master_Log_File: bin.000028
Read_Master_Log_Pos: 997
Relay_Log_File: relay.000065
Relay_Log_Pos: 277
Relay_Master_Log_File: bin.000028
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table testdb1.rpl_test1; Duplicate entry '4' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log bin.000028, end_log_pos 385
Exec_Master_Log_Pos: 120
Relay_Log_Space: 861155
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows event on table testdb1.rpl_test1; Duplicate entry '4' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log bin.000028, end_log_pos 385
Last_SQL_Error_Timestamp: 180814 16:36:51
skip的过程
mysql> set global sql_slave_skip_counter=11;
Query OK, 0 rows affected (0.00 sec)
mysql> start slave sql_thread;
Query OK, 0 rows affected (0.01 sec)
最终从库的数据呈现如下:两个事务都被跳过,从库数据不发生变化。
mysql> select * from rpl_test1;
+----+
| id |
+----+
| 1 |
| 2 |
| 4 |
+----+
3 rows in set (0.00 sec)
小结:
TX1在binlog中对应了10个event,TX2则对应8个event。
根据官方手册的理解,sql_slave_skip_counter=N,N指的是跳过前一次正常运行的事务之后的N个event
This statement skips the next N events from the master
当sql_slave_skip_counter设置在10以内时,由于最后第N个event还在第一个事务中,所以将会连带跳过event group中剩余event,效果就是第一个事务整个被跳过;当sql_slave_skip_counter超过10之后,跳过的最后第N个event已经落到第二个事务中,同样需要连带跳过第二个event group中剩余event,即第二个事务也会被跳过。
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.
注:
1、对于事务表,一个事务对应一个event group,一个event group中存在多个event;
对于非事务表,一条SQL对应一个event group。并且,当binlog_format=STATEMENT时,一个event group中只有一个event;当binlog_format=ROW时,一条SQL对应最少4个event(即使是非事务表),分别是:Query(begin),Table_map,Write_rows/Update_rows/Delete_rows,Xid(commit)
2、基于GTID的复制无法使用sql_slave_skip_counter跳过错误,可以利用替换空事务的方法来处理,操作方法见参考文档。
参考文档:
Using GTIDs for Failover and Scaleout: Injecting empty transactions