mysql外建导致的复制中断,从库mysqldump会导致复制中断

最新推荐文章于 2021-03-23 18:25:38 发布

Darren zzz

最新推荐文章于 2021-03-23 18:25:38 发布

阅读量267

点赞数

文章标签： mysql外建导致的复制中断

作者：张政俊

就职于中欧基金，知数堂粉丝，数据库爱好者，熟悉RDBMS、nosql、new sql等各类数据库。

背景环境数据库版本：5.6.24-log

主从架构：一主两从搭配mha

操作步骤开发要求truncate一张表，在删除之前需要备份下数据；

在从库上使用mysqldump命令做逻辑备份，备份sql：

mysqldump -uXXX -p osdc osdc_XXX > /tmp/osdc_info.sql；这台从库与主库的同步出现中断，报错为：

Slave_IO_Running: YesSlave_SQL_Running: NoLast_Errno: 1756Last_Error: … The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details).ERROR NO 是1756，而且只是 Slave_SQL_Running 停了。

处理方式停止同步：

stop slave;打开同步：

start slave;重新启停同步服务后，恢复正常。

分析之前形形色色的主从问题遇到过很多，但是 1756 这个错误还是第一次遇到，所以事后还是很有兴趣去深入分析下这个问题。

1. 详细报错查看数据库的 error log ，这个报错的具体输出如下：

2020-08-18 22:04:19 208840 [Warning] Slave SQL: Worker 2 failed executing transaction ‘fe0ed3ca-cacc-11e9-8d49-6c92bfcf73b6:1377629720’ at master log mysql-bin.003536, end_log_pos 861741231; Could not execute Update_rows event on table osdc.osdc_XXX; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event’smaster log mysql-bin.003536, end_log_pos 861741231, Error_code: 12052020-08-18 22:04:19 208840 [Note] Worker 2 is exiting: killed 0, error 1, running_status 12020-08-18 22:04:19 208840 [ERROR] Slave SQL: … The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756错误日志里有个 Lock wait timeout exceeded ，这个我们先记下，稍后再说。

先找到 master 事务的 gtid，直接去主库的 binlog 里找下这个 gtid 所对应的 sql：

SET @@SESSION.GTID_NEXT= ‘fe0ed3ca-cacc-11e9-8d49-6c92bfcf73b6:1377629720’/*!*/;# at 861740635#200818 21:58:42 server id 31 end_log_pos 861740715 CRC32 0xa9ab0dd2 Query thread_id=9403182 exec_time=0 error_code=0SET TIMESTAMP=1597759122/*!*/;BEGIN/*!*/;# at 861740715#200818 21:58:42 server id 31 end_log_pos 861740840 CRC32 0xbd8ba613 Rows_query# update osdc_XXX# set modify_time=now(),status=’04’# where id=9298XXXXX and status=’01’可以看到，这时候主库对 osdc_XXX 表进行了 update 操作。

2. 问题分析首先，mysqldump 不加任何参数去执行，会对备份的表加表级锁。

这时候从库需要执行从主库同步过来的 update 语句，因为 mysqldump 表锁的存在，该语句会处理等待状态。

update 等待超时，这个错误我们在上面的 error log 里可以看到(Lock wait timeout exceeded)。

因为在5.6.26版本下，在使用 MTS 的情况下，slave_transaction_retries 是不生效的。所以当 update 语句超时后，它没有重试机制，导致整个 SQL threads 都停掉了(这也就是为什么 Slave_IO_Running 显示YES，而 Slave_SQL_Running 显示NO)。

在 rpl_slave.cc 里可以看到：

—-/* MTS technical limitation no support of trans retry */if (mi->rli->opt_slave_parallel_workers != 0 && slave_trans_retries != 0—-这个 bug 很早就由 Yoshinori Matsunobu 提在 https://bugs.mysql.com/bug.phpid=68465 了，官方也采纳了，并于 MySQL 5.7.5 修复掉了该问题。

Yoshinori Matsunobu 就是大名鼎鼎的 MHA 的作者！

3. 问题复现相关参数slave_parallel_workers：3

[root@localhost][bug]> select * from repl_test;+——+——-+| id | info |+——+——-+| 1 | tom || 2 | jerry |+——+——-+从库加锁操作从库：

[root@localhost][bug]> begin;update repl_test set info=’all’;Query OK, 0 rows affected (0.00 sec)Query OK, 2 rows affected (0.01 sec)Rows matched: 2 Changed: 2 Warnings: 0显性开启事务，让它持续持有 InnoDB lock

主库更新操作主库：

[root@localhost][bug]> update repl_test set info=’all’ where id=1;commit;Query OK, 1 row affected (0.00 sec)Rows matched: 1 Changed: 1 Warnings: 0Query OK, 0 rows affected (0.00 sec)手工更新 repl_test 表的一条数据，此时从库的复制线程会等待这条sql的应用，并处于 system lock 状态：

从库：

[root@localhost][(none)]> show processlist;+—-+—————–+———–+——+———+——+—————————————————————————–+——————————————–+| Id | User | Host | db | Command | Time | State | Info |+—-+—————–+———–+——+———+——+—————————————————————————–+——————————————–+| 6 | event_scheduler | localhost | NULL | Daemon | 545 | Waiting on empty queue | NULL || 9 | root | localhost | bug | Sleep | 10 | | NULL || 15 | system user | | NULL | Connect | 44 | Waiting for master to send event | NULL || 16 | system user | | NULL | Connect | 5 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL || 17 | system user | | NULL | Connect | 44 | Waiting for an event from Coordinator | NULL || 18 | system user | | NULL | Connect | 44 | Waiting for an event from Coordinator | NULL || 19 | system user | | NULL | Connect | 5 | System lock | update repl_test set info=’all’ where id=1 || 20 | root | localhost | NULL | Query | 0 | init | show processlist |+—-+—————–+———–+——+———+——+—————————————————————————–+——————————————–+8 rows in set (0.04 sec)errorlog错误日志开始打印错误信息：

2020-08-20 11:01:00 22391 [Warning] Slave SQL: Worker 2 failed executing transaction ” at master log mysql-bin.000272, end_log_pos 654587717; Could not execute Update_rows event on table bug.repl_test; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event’s master log FIRST, end_log_pos 654587717, Error_code: 12052020-08-20 11:01:00 22391 [ERROR] Slave SQL: … The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 17562020-08-20 11:01:00 22391 [Note] Error reading relay log event: slave SQL thread was killed报的错与当时的内容一致。

4. 总结mysqldump 即使在从库中执行，也必须加上 –single-transcation 等参数，直接执行会上表锁，成本大大。

部分mysql工具，如 navicat 直接使用它自带的导出功能，也会锁住全表。所以尽量不要使用工具去处理导出工作。

全文完。

Enjoy MySQL:)

叶老师的「MySQL核心优化」大课已升级到MySQL 8.0，扫码开启MySQL 8.0修行之旅吧

死锁案例七来源：公众号yangyidba一、前言死锁，其实是一个很有意思也很有挑战的技术问题，大概每个 DBA 和部分开发同学都会在工作过程中遇见。关于死锁我会持续写一个系列的案例分析，希望能够对想了解死锁的…...

Server 层混杂信息字典表 | 全方位认识 information_schema(中)上期《Server 层混杂信息字典表 | 全方位认识 information_schema(上)》为大家介绍了部分关于Server层混杂信息字典表的知识，本期“Server层混杂信息字典表(中)”将继…...

死锁案例五来源：公众号yangyidba一、前言死锁其实是一个很有意思也很有挑战的技术问题，大概每个 DBA 和部分开发朋友都会在工作过程中遇见。关于死锁我会持续写一个系列的案例分析，希望能够对想了解死锁的朋友…...

免费好用的MySQL高可用方案徐晨亮，MySQL DBA，知数堂学员。热衷于数据库优化，自动化运维及数据库周边工具开发，对MySQL源码有一定的兴趣。环境说明地址端口角色说明127.0.0.13306master-127.0.0.…...

Darren zzz

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
mysql外建导致的复制中断,从库mysqldump会导致复制中断

作者：张政俊就职于中欧基金，知数堂粉丝，数据库爱好者，熟悉RDBMS、nosql、new sql等各类数据库。背景环境数据库版本：5.6.24-log主从架构：一主两从搭配mha操作步骤开发要求truncate一张表，在删除之前需要备份下数据；在从库上使用mysqldump命令做逻辑备份，备份sql：mysqldump -uXXX -p osdc osdc_XXX > /tmp/osdc_i...
复制链接

扫一扫