1)操作系统
cat /etc/issue
CentOS release 6.9 (Final)
Kernel \r on an \m
cat /proc/version
Linux version 2.6.32-696.el6.x86_64 (mockbuild@c1bm.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Tue Mar 21 19:29:05 UTC 2017
2)mysql数据库版本
mysql --version
mysql Ver 14.14 Distrib 5.6.26, for linux-glibc2.5 (x86_64) using EditLine wrapper
2. 问题描述
2.1 发现问题
数据库监控报某个从库 sql 线程停掉,查看报错如下:
[Slave_IO_State:Waiting for master to send event]
[Master_Host:xxx.xxx.xxx.xxx]
[Master_User:replnew]
[Master_Port:3306]
[Connect_Retry:60]
[Master_Log_File:logs.001426]
[Read_Master_Log_Pos:572692288]
[Relay_Log_File:service-db02-relay-bin.003830]
[Relay_Log_Pos:119533555]
[Relay_Master_Log_File:logs.001426]
[Slave_IO_Running:Yes]
[Slave_SQL_Running:No]
[Replicate_Do_DB:]
[Replicate_Ignore_DB:]
[Replicate_Do_Table:]
[Replicate_Ignore_Table:]
[Replicate_Wild_Do_Table:]
[Replicate_Wild_Ignore_Table:]
[Last_Errno:1197]
[Last_Error:Could not execute Update_rows event on table xxxx.xxxx; Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again, Error_code: 1197; handler error HA_ERR_RBR_LOGGING_FAILED; the event's master log logs.001426, end_log_pos 455315297]
[Skip_Counter:0]
[Exec_Master_Log_Pos:119533397]
[Relay_Log_Space:572692668]
[Until_Condition:None]
[Until_Log_File:]
[Until_Log_Pos:0]
[Master_SSL_Allowed:No]
[Master_SSL_CA_File:]
[Master_SSL_CA_Path:]
[Master_SSL_Cert:]
[Master_SSL_Cipher:]
[Master_SSL_Key:]
[Seconds_Behind_Master:null]
[Master_SSL_Verify_Server_Cert:No]
[Last_IO_Errno:0]
[Last_IO_Error:]
[Last_SQL_Errno:1197]
[Last_SQL_Error:Could not execute Update_rows event on table xxxx.xxxx; Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again, Error_code: 1197; handler error HA_ERR_RBR_LOGGING_FAILED; the event's master log logs.001426, end_log_pos 455315297]
[Replicate_Ignore_Server_Ids:]
[Master_Server_Id:50]
[Master_UUID:e90c3ccf-fcbe-11e6-82f9-0cc47abad838]
[Master_Info_File:/data/mysql/data/master.info]
[SQL_Delay:0]
[SQL_Remaining_Delay:null]
[Slave_SQL_Running_State:]
[Master_Retry_Count:86400]
[Master_Bind:]
[Last_IO_Error_Timestamp:]
[Last_SQL_Error_Timestamp:181218 09:23:30]
[Master_SSL_Crl:]
[Master_SSL_Crlpath:]
[Retrieved_Gtid_Set:]
[Executed_Gtid_Set:]
[Auto_Position:0]
##报错原因很简单,单个事务需要的max_binlog_cache_size 大于该从库当前配置的值,
mysql> show variables like 'max_binlog_cache_size';
+-----------------------+-----------+
| Variable_name | Value |
+-----------------------+-----------+
| max_binlog_cache_size | 335544320 |
+-----------------------+-----------+
1 row in set (0.00 sec)
# 320M
#从库该参数配置为320M,小于当前事务需要的值,所以sql 线程报错
2.2 问题原因
[root@service-db01 mysql]# mysqlbinlog -vvv logs.001426 --start-position=119533397 --stop-datetime="2018-12-18 09:28:37" | grep 'thread_id=' | more
#181218 9:23:00 server id 59158 end_log_pos 119533460 CRC32 0x52df933f Query thread_id=10900970 exec_time=16 error_code=0
SET @@session.pseudo_thread_id=10900970/*!*/;
#181218 9:23:15 server id 59158 end_log_pos 572411873 CRC32 0x56968c97 Query thread_id=10903464 exec_time=11 error_code=0
# 通过查看主库binlog中,两个相邻事务的的 end_log_pos 差值估算该事务产生的binog 大小为(572411873-119533460)/1024/1024=431.8985M,所以从库 max_binlog_cache_size 要设置大于该值。(--start-position="119533397" 为从库最后一个执行成功的事务的最后一个end_log_pos位置)
3. 问题解决
该错误的解决方法很简单,调大从库的 max_binlog_cache_size 重启 sql 线程即可
mysql> set global max_binlog_cache_size=3355443200;
Query OK, 0 rows affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
##但是我们应该同研发沟通,尽可能避免在主库执行大事务(大事务产生过多的锁还有undolog)