主从复制报错非常规解决方法

介绍

主从复制搭建过程中报错一般有2种:

  • 认证阶段错误(用户名,密码,ip等这些认证相关的报错)
  • 主从数据冲突引起的报错(主要原因都是从库被写入过)

第一种报错对应的是IO线程不yes,第二种报错对应的是SQL线程不yes,这里主要介绍从库被写入报错冲突的时候怎么跳过冲突 。这种情况 属于特殊情况,操作者必须明白确定一定以及肯定,这些错误跳过之后不会导致数据不一致,如果没有把握,这些奇技淫巧建议不要用,直接重做主从是最安全的方式。

基于日志位点的同步方式

如果主从同步的方式是基于日志位点,跳过错误的方式有2种:

  • sql_slave_skip_counter

  • slave_skip_errors

环境

IP角色操作系统数据库版本
192.168.240.20CentOS7.6MySQL5.7.26
192.168.240.21CentOS7.6MySQL5.7.26
  1. 安装MySQL5.7
  2. 修改配置文件并启动
  3. 主库建立test数据库

在主库查看Log位点

mysql> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000002 |      154 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

可以看到写到了binlog.000002日志,位点为 154

CREATE DATABASE test;
  1. 从库也手动建立test数据库
CREATE DATABASE test;
  1. 基于这个状态建立主从

主库执行

CREATE USER rpl_user@'%' IDENTIFIED BY 'Hal@123';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';

在从库执行

CHANGE MASTER TO 
MASTER_HOST='192.168.240.20', 
MASTER_USER='rpl_user', 
MASTER_PASSWORD='Hal@123', 
MASTER_LOG_FILE='binlog.000002', 
MASTER_LOG_POS=154;

在从库启动复制

start slave ;

查看slave状态信息

mysql>  SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.240.20
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000002
          Read_Master_Log_Pos: 762
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 317
        Relay_Master_Log_File: binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1007
                   Last_Error: Error 'Can't create database 'test'; database exists' on query. Default database: 'test'. Query: 'CREATE DATABASE test'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 1136
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1007
               Last_SQL_Error: Error 'Can't create database 'test'; database exists' on query. Default database: 'test'. Query: 'CREATE DATABASE test'
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 20
                  Master_UUID: 72ad46d3-96f6-11e9-8c4b-000c29360f4e
             Master_Info_File: /datadir/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 190624 23:10:29
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)


可以看到报错了,这里报错信息显示,从库已经有test库了,所以sql线程执行失败,解决方法如下:

手动跳过错误

解决具体操作如下:

stop slave;
set global  sql_slave_skip_counter = 1;
start slave;

查看主从状态

mysql>  SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.240.20
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000002
          Read_Master_Log_Pos: 762
               Relay_Log_File: localhost-relay-bin.000003
                Relay_Log_Pos: 317
        Relay_Master_Log_File: binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 762
              Relay_Log_Space: 1299
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 20
                  Master_UUID: 72ad46d3-96f6-11e9-8c4b-000c29360f4e
             Master_Info_File: /datadir/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

错误就没有了,因为我们跳过了这个事件,sql_slave_skip_counter这个参数官方网站的解释如下:

This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.

该语句跳过 主服务器的下N个事件。这对于从语句引起的复制停止中恢复非常有用。

This statement is valid only when the slave threads are not running. Otherwise, it produces an error.

仅当从属线程未运行时,此语句才有效。否则,会产生错误。

When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.

使用此语句时,重要的是要了解二进制日志实际上被组织为一组称为事件组的组。每个事件组由一系列事件组成

When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.

当您使用SET GLOBAL sql_slave_skip_counter跳过事件并且结果位于组的中间时,从站会继续跳过事件,直到它到达组的末尾。然后执行从下一个事件组开始。

跳过错误码

现在需要再创造个环境,在主库的test库下建立表t1

use test;
create table t1(id int primary key);

在从库插入表t1一行id=1数据

insert test.t1 values(1);

然后在主库也执行一次

insert test.t1 values(1);

这时候在从库查看状态

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.240.20
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000002
          Read_Master_Log_Pos: 1532
               Relay_Log_File: localhost-relay-bin.000003
                Relay_Log_Pos: 834
        Relay_Master_Log_File: binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table test.t1; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000002, end_log_pos 1501
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1279
              Relay_Log_Space: 2069
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows event on table test.t1; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000002, end_log_pos 1501
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 20
                  Master_UUID: 72ad46d3-96f6-11e9-8c4b-000c29360f4e
             Master_Info_File: /datadir/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 190624 23:17:35
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

这时候可以看到错误码为1062,报错信息是唯一键冲突

这时候也可以用第二种方式设置slave_skip_errors参数

从库在配置文件/etc/my.cnf中添加一行

slave_skip_errors='1032,1062' 
# 1032:删除数据时找不到行
# 1062:插入数据时唯一键冲突

然后重启数据库

mysqladmin  shutdown
mysqld --daemonize

再次查看slave状态

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.240.20
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000002
          Read_Master_Log_Pos: 1532
               Relay_Log_File: localhost-relay-bin.000006
                Relay_Log_Pos: 317
        Relay_Master_Log_File: binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1532
              Relay_Log_Space: 528
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 20
                  Master_UUID: 72ad46d3-96f6-11e9-8c4b-000c29360f4e
             Master_Info_File: /datadir/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

这里也可以看到同步状态没有报错了。

基于GTID的主从复制

由于GTID必须是连续的,正常情况下同一个服务器产生的GTID是不会存在空缺的,所以不能简单skip掉一个事务,只能通过注入空事务的方法替换掉一个实际操作事务

环境

主库从库都修改配置文件/etc/my.cnf,添加2行配置

gtid_mode=ON  # 启用GTID类型,否则就是普通的复制架构
enforce-gtid-consistency=true # 强制GTID的一致性

重启主从数据库

mysqladmin  shutdown
mysqld --daemonize

从库设置主从同步

mysql> STOP SLAVE;
mysql> RESET SLAVE ALL;
mysql> CHANGE MASTER TO 
MASTER_HOST='192.168.240.20', 
MASTER_USER='rpl_user', 
MASTER_PASSWORD='Hal@123', 
MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

查看主从状态信息

mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.240.20
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000006
          Read_Master_Log_Pos: 154
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 361
        Relay_Master_Log_File: binlog.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 572
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 20
                  Master_UUID: 72ad46d3-96f6-11e9-8c4b-000c29360f4e
             Master_Info_File: /datadir/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

然后我们现在创造出数据不一致的场景,在从库执行

insert into test.t1 values(2);

在主库执行

insert into test.t1 values(2);

在从库查看状态

mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.240.20
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000006
          Read_Master_Log_Pos: 403
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 361
        Relay_Master_Log_File: binlog.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table test.t1; Duplicate entry '2' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000006, end_log_pos 372
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 821
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows event on table test.t1; Duplicate entry '2' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000006, end_log_pos 372
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 20
                  Master_UUID: 72ad46d3-96f6-11e9-8c4b-000c29360f4e
             Master_Info_File: /datadir/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 190624 23:35:45
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 72ad46d3-96f6-11e9-8c4b-000c29360f4e:1
            Executed_Gtid_Set: 95cc1a55-96f6-11e9-8fad-000c291dfe89:1
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

注入空事务

gtid时的报错解决思路,在从库注入空事务

STOP SLAVE;
SET gtid_next='72ad46d3-96f6-11e9-8c4b-000c29360f4e:1';
BEGIN;COMMIT;
SET gtid_next='AUTOMATIC';
START SLAVE;

查看主从状态

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.240.20
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000006
          Read_Master_Log_Pos: 403
               Relay_Log_File: localhost-relay-bin.000004
                Relay_Log_Pos: 445
        Relay_Master_Log_File: binlog.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 403
              Relay_Log_Space: 947
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 20
                  Master_UUID: 72ad46d3-96f6-11e9-8c4b-000c29360f4e
             Master_Info_File: /datadir/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 72ad46d3-96f6-11e9-8c4b-000c29360f4e:1
            Executed_Gtid_Set: 72ad46d3-96f6-11e9-8c4b-000c29360f4e:1,
95cc1a55-96f6-11e9-8fad-000c291dfe89:1
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

总结

通过上面的实验,实际上是一些绕过去错误的方式,并没有真正解决数据上不一致的问题,如果报告数据冲突最好还是重做主从复制,除非是有百分之1w的把握,确定这样做能保证数据还是一致。

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值