昨天线上发现一个数据库诡异问题,在这里记录下问题的发现以及处理的过程。
研发的同事反馈一个数据库的数据从19:44就没有更新了(该从库线上服务不适用,只是线下查询以及统计数据使用)。但是登录到从库查看发现主从同步正常,负责主从同步的两个进程为yes,主从延迟为0,Slave_SQL_Running_State为Waiting for Slave Workers to free pending events
Waiting for Slave Workers to free pending events
This waiting action occurs when the total size of events being processed by Workers exceeds the size of theslave_pending_jobs_size_max system variable. The Coordinator resumes scheduling when the size drops below this limit. This state occurs only when slave_parallel_workers is set greater than 0.
翻译如下:
这个等待状态会在什么时候发生呢?这个状态只会在
查看错误日志信息
2019-09-02T20:45:40.060392+08:00 6 [ERROR] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
show variables like '%slave_pending_jobs_size_max%'; 默认为16M
show variables like 'max_allowed_packet'; 为512M
解决:
先修改slave_pending_jobs_size_max的大小为128M
停止slave
stop slave; 命令发出后一直在等待
只能用kill -9 杀掉进程;(因为我这个是从库,停掉后不影响业务,具体情况具体处理)
将slave_pending_jobs_size_max=128M 写进配置文件。
启动MySQL后状态变为show slave status\G
Slave_SQL_Running_State:变为 Waiting for dependent transaction to commit
Relay_Master_Log_File: Master-mysql-bin.001676
Exec_Master_Log_Pos: 798682256 一直没有改变,说明从库没有回放主库的日志,