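MySQL Replication Interrupted: Recovering by Skipping a Specific Transaction

Preface

When MySQL source-replica replication breaks, in the vast majority of cases the SQL thread has hit an error while applying an event. Sometimes we choose to skip the offending transaction to get replication running again, and analyze the cause of the interruption afterwards.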

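Common situations in which a specific transaction is skipped:

  1. the object already exists on the replica;
  2. the object no longer exists on the replica;
  3. a duplicate primary key;
  4. others.

The direct cause of all of these is that the data on the source and the replica is inconsistent. Skipping is only a temporary fix; the safest approach is to rebuild replication.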

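Duplicate primary keys need particular care and must not be skipped lightly. First confirm:

  • whether the row on the source and the row on the replica share only the primary key while the other columns are not identical;
  • whether the transaction contains multiple operations.

If either is the case, replication cannot be recovered by skipping the transaction, because skipping discards the whole transaction, including any changes the replica still needs. A workable alternative is to delete the conflicting row on the replica and then restart replication.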

Here is a case study:

1 Check the replication status

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.99
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000068
          Read_Master_Log_Pos: 586
               Relay_Log_File: mysql002-relay-bin.000010
                Relay_Log_Pos: 672
        Relay_Master_Log_File: binlog.000061
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1410
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'bd4b724b-ab29-11ee-826f-000c294bd026:424255' at source log binlog.000061, end_log_pos 2765. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1766
              Relay_Log_Space: 9989536
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1410
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'bd4b724b-ab29-11ee-826f-000c294bd026:424255' at source log binlog.000061, end_log_pos 2765. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: bd4b724b-ab29-11ee-826f-000c294bd026
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 240302 23:30:53
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: bd4b724b-ab29-11ee-826f-000c294bd026:424253-426568
            Executed_Gtid_Set: 2218063c-aef7-11ee-9e40-000c29f059d3:1-6,
bd4b724b-ab29-11ee-826f-000c294bd026:1-424254
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 1
            Network_Namespace:
1 row in set, 1 warning (0.00 sec)

2 Error analysis

In the replication status, Slave_SQL_Running: No shows that the SQL thread has stopped. The error message is:

Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'bd4b724b-ab29-11ee-826f-000c294bd026:424255' at source log binlog.000061, end_log_pos 2765. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.

The message says that Worker 1 failed to execute a transaction. Following its hint, query the performance_schema table:

mysql> select * from performance_schema.replication_applier_status_by_worker where LAST_ERROR_MESSAGE <> ''\G
*************************** 1. row ***************************
                                           CHANNEL_NAME:
                                              WORKER_ID: 1
                                              THREAD_ID: NULL
                                          SERVICE_STATE: OFF
                                      LAST_ERROR_NUMBER: 1410
                                     LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'bd4b724b-ab29-11ee-826f-000c294bd026:424255' at source log binlog.000061, end_log_pos 2765; Error 'You are not allowed to create a user with GRANT' on query. Default database: ''. Query: 'GRANT SELECT, RELOAD, PROCESS, REPLICATION CLIENT, BACKUP_ADMIN ON *.* TO 'pmm'@'192.168.131.99''
                                   LAST_ERROR_TIMESTAMP: 2024-03-02 23:30:53.719870
                               LAST_APPLIED_TRANSACTION: bd4b724b-ab29-11ee-826f-000c294bd026:424254
     LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2024-01-29 23:25:54.899869
    LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2024-01-29 23:25:54.899869
         LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 2024-03-02 23:30:53.712714
           LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 2024-03-02 23:30:53.717294
                                   APPLYING_TRANSACTION: bd4b724b-ab29-11ee-826f-000c294bd026:424255
         APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2024-01-29 23:26:22.316756
        APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2024-01-29 23:26:22.316756
             APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2024-03-02 23:30:53.717343
                 LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
   LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
  LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                     APPLYING_TRANSACTION_RETRIES_COUNT: 0
       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
      APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
    APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
1 row in set (0.00 sec)

The key information is: Error 'You are not allowed to create a user with GRANT', which shows that the failing statement is a privilege grant.

The SQL statement that failed on the replica is:

GRANT SELECT, RELOAD, PROCESS, REPLICATION CLIENT, BACKUP_ADMIN ON *.* TO 'pmm'@'192.168.131.99'
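When this kind of failure has to be triaged often, the GTID, the binlog coordinates and the underlying error can be pulled out of the message text with a small helper. The following is only a sketch: the function name parse_applier_error is hypothetical, and the pattern assumes the message wording shown in this article, which may differ in other MySQL versions.

```python
import re

# Pattern modelled on the LAST_ERROR_MESSAGE text shown in this article.
# Assumption: other MySQL versions may word the message differently.
_ERROR_RE = re.compile(
    r"failed executing transaction '(?P<gtid>[0-9a-fA-F-]{36}:\d+)' "
    r"at source log (?P<log_file>\S+), end_log_pos (?P<end_pos>\d+)"
    r"(?:; Error '(?P<error>.*?)' on query\.)?"
)


def parse_applier_error(message: str) -> dict:
    """Extract the GTID, binlog file, end position and error text.

    The 'error' entry is None when the message carries no "Error '...'"
    part, as is the case for the coordinator message in Last_SQL_Error.
    """
    match = _ERROR_RE.search(message)
    if match is None:
        raise ValueError("unrecognized applier error message")
    info = match.groupdict()
    info["end_pos"] = int(info["end_pos"])
    return info
```

Passing the LAST_ERROR_MESSAGE value from the worker table to this helper yields the GTID that the later steps need, without copying it by hand.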

Check the privileges of 'pmm'@'192.168.131.99' on the replica:

mysql> show grants for 'pmm'@'192.168.131.99';
+------------------------------------------------------------------------------------+
| Grants for pmm@192.168.131.99                                                      |
+------------------------------------------------------------------------------------+
| GRANT SELECT, RELOAD, PROCESS, REPLICATION CLIENT ON *.* TO `pmm`@`192.168.131.99` |
+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Check the privileges of 'pmm'@'192.168.131.99' on the source:

mysql> show grants for 'pmm'@'192.168.131.99';
+------------------------------------------------------------------------------------+
| Grants for pmm@192.168.131.99                                                      |
+------------------------------------------------------------------------------------+
| GRANT SELECT, RELOAD, PROCESS, REPLICATION CLIENT ON *.* TO `pmm`@`192.168.131.99` |
| GRANT BACKUP_ADMIN ON *.* TO `pmm`@`192.168.131.99`                                |
+------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

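The replica already holds four of the five privileges named in the failing statement (SELECT, RELOAD, PROCESS, REPLICATION CLIENT); only BACKUP_ADMIN is missing. In other words, the statement is almost entirely in effect on the replica already.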

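3 Handling the error

From the information gathered above, the transaction that the replica's SQL thread failed to execute is a single GRANT statement, and almost all of its privileges are already present on the replica. This transaction can therefore simply be skipped.

The steps are as follows:

1) Stop replication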

mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

Confirm that it has stopped:

mysql> show slave status\G
*************************** 1. row ***************************
...
             Slave_IO_Running: No
            Slave_SQL_Running: No
...

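2) Identify the transaction to skip

The transaction to skip is the highest sequence number executed for the source's UUID in Executed_Gtid_Set, plus 1. It should match the GTID reported in the error message.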

mysql> show slave status\G
*************************** 1. row ***************************
...
           Retrieved_Gtid_Set: bd4b724b-ab29-11ee-826f-000c294bd026:424253-426568
            Executed_Gtid_Set: 2218063c-aef7-11ee-9e40-000c29f059d3:1-6,
bd4b724b-ab29-11ee-826f-000c294bd026:1-424254
...

The transaction to skip is: bd4b724b-ab29-11ee-826f-000c294bd026:424255
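The "highest executed value plus 1" rule can be cross-checked by subtracting Executed_Gtid_Set from Retrieved_Gtid_Set for the source's UUID. The sketch below does this in plain Python; the function names are hypothetical. It assumes well-formed GTID set strings without tags, and it assumes the failing transaction is the first unexecuted one, which may not hold if a multi-threaded applier has left gaps. On the server itself, MySQL's GTID_SUBTRACT() function computes the same set difference.

```python
def parse_gtid_set(text: str) -> dict:
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(start, end), ...]} (inclusive)."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as '7' is the one-element interval 7-7.
            ranges.append((int(start), int(end or start)))
    return result


def first_unexecuted_gtid(retrieved: str, executed: str, source_uuid: str):
    """Return the lowest retrieved GTID of source_uuid that was not executed.

    Returns None when everything retrieved has already been executed.
    """
    uuid = source_uuid.lower()
    executed_ranges = parse_gtid_set(executed).get(uuid, [])
    for start, end in sorted(parse_gtid_set(retrieved).get(uuid, [])):
        seq = start
        while seq <= end:
            covering = [e for s, e in executed_ranges if s <= seq <= e]
            if not covering:
                return f"{uuid}:{seq}"
            seq = max(covering) + 1  # jump past the executed interval
    return None
```

Fed with the two sets and the Master_UUID from the status output above, the helper returns the same GTID that the error message names. If the two ever disagree, the error message is the one to trust.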

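3) Skip the transaction

Skipping a transaction really means committing an empty transaction under its GTID; the key step is set session gtid_next.

After the transaction has been skipped, gtid_next must be set back to automatic.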

mysql> set session gtid_next='bd4b724b-ab29-11ee-826f-000c294bd026:424255';
Query OK, 0 rows affected (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

mysql> set session gtid_next='automatic';
Query OK, 0 rows affected (0.00 sec)

4) Restart replication

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.05 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.99
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000068
          Read_Master_Log_Pos: 586
               Relay_Log_File: mysql002-relay-bin.000022
                Relay_Log_Pos: 451
        Relay_Master_Log_File: binlog.000068
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 586
              Relay_Log_Space: 1303
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: bd4b724b-ab29-11ee-826f-000c294bd026
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: bd4b724b-ab29-11ee-826f-000c294bd026:424253-426568
            Executed_Gtid_Set: 2218063c-aef7-11ee-9e40-000c29f059d3:1-6,
bd4b724b-ab29-11ee-826f-000c294bd026:1-426568
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 1
            Network_Namespace:
1 row in set, 1 warning (0.00 sec)

Both threads show Yes, so replication is back to normal.
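The "both Yes" check is easy to automate for monitoring. The sketch below parses the vertical (\G) output into a dictionary and tests the two thread states plus the two error numbers. The function names are hypothetical, and the field names assume the show slave status output shown in this article; show replica status uses different names.

```python
def parse_vertical_output(text: str) -> dict:
    """Turn 'Key: value' lines of \\G output into a dict.

    Wrapped continuation lines (such as the second line of
    Executed_Gtid_Set) are not reassembled; this sketch only needs
    single-line fields.
    """
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def replication_healthy(status: dict) -> bool:
    """True when both threads run and no I/O or SQL error is recorded."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and str(status.get("Last_IO_Errno", "0")) == "0"
        and str(status.get("Last_SQL_Errno", "0")) == "0"
    )
```

For the first status output in this article the check would fail on Slave_SQL_Running and Last_SQL_Errno; for the output above it passes.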

4 Verify that source and replica are consistent

Check the data on the replica:

mysql> show grants for 'pmm'@'192.168.131.99';
+------------------------------------------------------------------------------------+
| Grants for pmm@192.168.131.99                                                      |
+------------------------------------------------------------------------------------+
| GRANT SELECT, RELOAD, PROCESS, REPLICATION CLIENT ON *.* TO `pmm`@`192.168.131.99` |
+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Check the data on the source:

mysql> show grants for 'pmm'@'192.168.131.99';
+------------------------------------------------------------------------------------+
| Grants for pmm@192.168.131.99                                                      |
+------------------------------------------------------------------------------------+
| GRANT SELECT, RELOAD, PROCESS, REPLICATION CLIENT ON *.* TO `pmm`@`192.168.131.99` |
| GRANT BACKUP_ADMIN ON *.* TO `pmm`@`192.168.131.99`                                |
+------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

The pmm@192.168.131.99 user on the replica is missing the BACKUP_ADMIN privilege.

Add the missing privilege:

# Run on the source:
mysql> GRANT BACKUP_ADMIN ON *.* TO `pmm`@`192.168.131.99`;
Query OK, 0 rows affected (0.00 sec)
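
The grant comparison in this section was done by eye; with many accounts it may be easier to diff the show grants rows programmatically. The following sketch uses hypothetical function names and a deliberately simple pattern: it assumes global or schema-level grants like the ones shown here and does not handle column-level privileges or quoted names that contain spaces.

```python
import re

# Matches rows such as: GRANT SELECT, RELOAD ON *.* TO `pmm`@`192.168.131.99`
_GRANT_RE = re.compile(r"^GRANT (?P<privs>.+?) ON (?P<scope>\S+) TO ", re.IGNORECASE)


def privileges(grant_rows) -> set:
    """Collect (privilege, scope) pairs from SHOW GRANTS rows."""
    found = set()
    for row in grant_rows:
        match = _GRANT_RE.match(row.strip())
        if match is None:
            continue  # e.g. role grants; not understood by this simple pattern
        for priv in match.group("privs").split(","):
            found.add((priv.strip().upper(), match.group("scope")))
    return found


def missing_on_replica(source_rows, replica_rows) -> list:
    """Privileges present on the source but absent on the replica."""
    return sorted(privileges(source_rows) - privileges(replica_rows))
```

Applied to the two outputs above, the only difference reported is BACKUP_ADMIN on *.*, which is the privilege granted in the final step.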