写在前面:

    最近一段时间都在做传统主从复制相关的测试,思考了很多上线主从复制架构后,可能会发生的问题,然后针对性设置了这些故障,再然后思考如何在保证业务可用,或者对业务冲击尽可能的小的前提下,进行故障的恢复,也算是小有所得,现在呢,笔者就主从复制故障时,需要跳过故障点时所用到的 sql_slave_skip_counte 变量进行记录描述


sql_slave_skip_counter 介绍:

摘自MySQL官方的解释(强烈建议阅读英文原文。中文版,是笔者自己的理解,只能说仁者见仁)

SET GLOBAL sql_slave_skip_counter Syntax:
        SET GLOBAL sql_slave_skip_counter = N
This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.
        跳过N个events。注意:以event为单位,而不是以事务为单位,只有在由单条语句组成的事务时,两者才等价。
        如:一个事务由多个EVENT组成,BEGIN;INSERT;UPDATE;DELETE;COMMOIT; 这种情况下,两者绝不相等
This statement is valid only when the slave threads are not running. Otherwise, it produces an error.

When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.

For transactional tables, an event group corresponds to a transaction.
        对于事务表,一个event group对应一个事务
or nontransactional tables, an event group corresponds to a single SQL statement.
        对于非事务表,一个event group对应一条SQL
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group
        当你跳过event的时候,如果N的值,处于event group之中,那么slave会继续跳过event,直至跳过这个event group,从下一个event group开始


对于事务表使用sql_slave_skip_counter的情况:


1、跳过1032复制错误(update/delete error)

跳过由单条SQL组成的事务:


在Slave主机上人为的删除两条数据:

DELETE FROM `edusoho_e`.`t1` WHERE `id` = '9';
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '11';


而Master在变更上述两条记录的时候会报错,导致复制中断:

INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('孙权', '吴国', '妹妹');
UPDATE `edusoho_e`.`t1` SET xname='游戏' WHERE id=7;
UPDATE `edusoho_e`.`t1` SET age=40 WHERE id=11;    #报错
DELETE FROM `edusoho_e`.`t1` WHERE age=40;            #报错
INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('曹丕', '魏国', '甄姬');
DELETE FROM `edusoho_e`.`t1` WHERE id=1;
UPDATE `edusoho_e`.`t1` SET hobby='Games' WHERE id=3; 


在Slave查看主从复制状态时,就会发现报错信息:

mysql> show slave status\G;
*************************** 1. row ***************************
Read_Master_Log_Pos: 2176
Exec_Master_Log_Pos: 874
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 1127
Slave_IO_Running: Yes
Slave_SQL_Running: No


在Master主机上查看position做了什么操作:

mysql> show binlog events in 'mysql-bin.000002' from 874;
+------------------+------+-------------+-----------+-------------+---------------------------------+