虽然学习并实验过 mysql GTID 主从复制,但是,实际运行起来,还是会出现各种问题,有些是学习时遇到过的,有些是没有遇到过的,比方说这次,之前的错误处理记录中就没有,只好网上各种搜索。
主服务器版本
$ mysql -V
mysql Ver 8.0.33-0ubuntu0.20.04.2 for Linux on x86_64 ((Ubuntu))
从服务器版本
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.34 |
+-----------+
1 row in set (0.00 sec)
本次完全依照以下文章解决
MySQL主从复制中的问题(Coordinator stopped beacause there were errors in the workers…)
错误 1032
mysql> show replica status\G;
...
Relay_Log_Pos: 367020
Relay_Source_Log_File: binlog.000001
Replica_IO_Running: Yes
Replica_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'be41e172-f145-11ec-9b3c-00155d5aa636:30' at source log binlog.000001, end_log_pos 367097. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
...
定位错误 1032
mysql> select * from performance_schema.replication_applier_status_by_worker\G
mysql> select * from performance_schema.replication_applier_status_by_worker\G
...
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 1032
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'be41e172-f145-11ec-9b3c-00155d5aa636:30' at source log binlog.000001, end_log_pos 367097; Could not execute Delete_rows event on table XXX.sys_locker_sms; Can't find record in 'sys_locker_sms', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's source log binlog.000001, end_log_pos 367097
LAST_ERROR_TIMESTAMP: 2023-07-26 21:13:37.673641
...
重点:
Could not execute Delete_rows event on table XXX.sys_locker_sms; Can’t find record in ‘sys_locker_sms’, Error_code: 1032;
在 master 上删除一条记录,在 replica 上找不到
尝试解决 1032
- 按照参考文章中的第一个办法
set global sql_slave_skip_counter=1;
mysql> set global sql_slave_skip_counter=1;
Query OK, 0 rows affected, 2 warnings (0.00 sec)
跳过一条记录,sql 没有报错,但是,也没有解决以上问题!
-
按照参考文章中的第二个办法, OK!
mysql> STOP REPLICA; Query OK, 0 rows affected (0.02 sec) mysql> SET @@SESSION.GTID_NEXT='be41e172-f145-11ec-9b3c-00155d5aa636:30'; Query OK, 0 rows affected (0.00 sec) mysql> BEGIN; Query OK, 0 rows affected (0.00 sec) mysql> COMMIT; Query OK, 0 rows affected (0.03 sec) mysql> SET SESSION GTID_NEXT=AUTOMATIC; Query OK, 0 rows affected (0.00 sec) mysql> START REPLICA; Query OK, 0 rows affected (0.29 sec) mysql> show replica status \G; *************************** 1. row *************************** Replica_IO_State: Waiting for source to send event Source_Host: 192.168.0.201 Source_User: root Source_Port: 3306 Connect_Retry: 60 Source_Log_File: binlog.000002 Read_Source_Log_Pos: 8682708 Relay_Log_File: ubuntu2004-195-relay-bin.000002 Relay_Log_Pos: 2015574 Relay_Source_Log_File: binlog.000001 Replica_IO_Running: Yes Replica_SQL_Running: Yes ...
大概理解一下 : 用一条空事务代替这一条 delete 操作,跳过了这个错误地点
1). SET @@SESSION.GTID_NEXT 到错误提示中报告的 GTID :'be41e172-f145-11ec-9b3c-00155d5aa636:30'
2) . BEGIN;
COMMIT;
提交一条空事务
3). SET SESSION GTID_NEXT=AUTOMATIC;
恢复GTID , 继续
错误 1062
mysql> show replica status \G;
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
...
Last_Errno: 1062
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'be41e172-f145-11ec-9b3c-00155d5aa636:247' at source log binlog.000002, end_log_pos 50007. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
...
定位错误 1062
mysql> select * from performance_schema.replication_applier_status_by_worker\G
*************************** 1. row ***************************
...
LAST_ERROR_NUMBER: 1062
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'be41e172-f145-11ec-9b3c-00155d5aa636:247' at source log binlog.000002, end_log_pos 50007; Could not execute Write_rows event on table lyd-test.zt_debuglog; Duplicate entry '18666' for key 'zt_debuglog.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's source log binlog.000002, end_log_pos 50007
LAST_ERROR_TIMESTAMP: 2023-07-27 08:43:30.727645
...
重点 :
Duplicate entry ‘18666’ for key ‘zt_debuglog.PRIMARY’,
key 重复了?
在从服务器上已经有该key,现在又插入了同一条记录,冲突了!
尝试解决 1062
按照以上 1032 的解决办法,直接继续!
mysql> STOP REPLICA;
Query OK, 0 rows affected (0.06 sec)
mysql> SET @@SESSION.GTID_NEXT='be41e172-f145-11ec-9b3c-00155d5aa636:247';
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN; COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.06 sec)
mysql> SET SESSION GTID_NEXT=AUTOMATIC;
Query OK, 0 rows affected (0.01 sec)
mysql> START REPLICA;
Query OK, 0 rows affected (0.19 sec)
接下来,居然连续出现好多次 Duplicate,照方抓药!
后记
既然直接空事务跳过去,那么还不如直接让 replica 忽略这个错误
$ sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf
...
# 跳过指定error no类型的错误
# slave-skip-errors=1062,1053,1146
replica-skip-errors=1032,1062,1053,1146
...
$ sudo service mysql restart
再次查看就 OK 了!