生产排查 | MySQL主从同步时报错1864之slave_pending_jobs_size_max设置生产排查

大家都知道,MySQL 的各参数很多,设置不恰当,会引起各种异常;

本文根据一则实际生产异常场景,即从库的同步异常报错Last_Errno: 1864,来介绍slave_pending_jobs_size_max参数的调整。

问题现象描述

生产环境上从库出现SQL进程停止的异常,错误信息如下:

mysql> show slave status \G;                                                                                                                                                                 *************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.11
                  Master_User: replica_01
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.004783
          Read_Master_Log_Pos: 3681586
               Relay_Log_File: DB2-relay-bin.000007
                Relay_Log_Pos: 671091671
        Relay_Master_Log_File: mysql-bin.004782
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1864
                   Last_Error: Cannot schedule event Update_rows, relay-log name ./DB2-relay-bin.000007, position 671094876 to Worker thread because its size 50450016 exceeds 16777216 of slave_pending_jobs_size_max.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 671091458
              Relay_Log_Space: 1077425236
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1864
               Last_SQL_Error: Cannot schedule event Update_rows, relay-log name ./DB2-relay-bin.000007, position 671094876 to Worker thread because its size 50450016 exceeds 16777216 of slave_pending_jobs_size_max.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 17860
                  Master_UUID: 938bda2c-41ed-11e6-96c1-0050569c39f0
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 210608 15:02:19
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

ERROR: 
No query specified

根据报错信息,可以很明显看出从库的Slave_SQL_Running: No线程异常,原因为Pending的事件所占用的内存为50450016(约48MB)大于slave_pending_jobs_size_max参数的默认设置。
Last_Error: Cannot schedule event Update_rows, relay-log name ./DB2-relay-bin.000007, position 671094876 to Worker thread because its size 50450016 exceeds 16777216 of slave_pending_jobs_size_max.

–根据错误提示,确认下当前slave_pending_jobs_size_max参数的默认设置为16777216(16MB)
mysql> show variables like ‘%pending%’;
±----------------------------±---------+
| Variable_name | Value |
±----------------------------±---------+
| slave_pending_jobs_size_max | 16777216 |
±----------------------------±---------+
1 row in set (0.00 sec)

–原因分析

MySQL 5.7开启了多线程复制,与多线程复制相关的有一个参数slave_pending_jobs_size_max,错误日志中提示这个参数设置的小。

此参数在MySQL 5.6以后引入,默认单位是字节。如果没有开启多线程复制,则此参数无用。参数的最小值是1024字节,默认值是16MB,最大值是18446744073709551615。
注意,从库中的此参数的值需等于或大于主库的max_allowed_packet参数值,否则从库的工作队列可能会变满。

官方建议
On a multi-threaded replica (with slave_parallel_workers > 0), ensure that the slave_pending_jobs_size_max system variable is set to a value equal to or greater than the setting for the max_allowed_packet system variable on the source.

The replica actually accepts packets up to the limit set by its slave_max_allowed_packet setting, which defaults to the maximum setting of 1GB, to prevent a replication failure due to a large packet. However, the value of slave_pending_jobs_size_max controls the memory that is made available on the replica to hold incoming packets. The specified memory is shared among all the replica worker queues.

在多线程复制时,在队列中Pending的事件所占用的最大内存,默认为16M,如果内存充足(本案例中服务器内存为64GB),或者延迟较大时,可适当调大,最大可设置为1GB;

再看有啥建议:
The value of slave_pending_jobs_size_max is a soft limit, and if an unusually large event (consisting of one or multiple packets) exceeds this size, the transaction is held until all the replica workers have empty queues, and then processed. All subsequent transactions are held until the large transaction has been completed. So although unusual events larger than slave_pending_jobs_size_max can be processed, the delay to clear the queues of all the replica workers and the wait to queue subsequent transactions can cause lag on the replica and decreased concurrency of the replica workers. slave_pending_jobs_size_max should therefore be set high enough to accommodate most expected event sizes.

简翻译下:
slave_pending_jobs_size_max的值是一软限制,如果一个异常大的事件(包含一个或多个数据包)超过这个大小,事务将被保留,直到所有的replica workers都有空队列,然后再进行处理。所有后续事务都被保存,直到大事务完成。尽管可以处理大于slave_pending_jobs_size_max的异常事件,但清除所有replica worker队列的延迟和后续事务排队的等待可能会导致replica延迟,并降低replica workers的并发性。因此,应该将Slave_pending_jobs_size_max设置得足够高,以适应大多数预期的事件大小。

–从库查看两个并行度参数(并行度为64)
mysql> show variables like ‘%parallel%’;
±-----------------------±--------------+
| Variable_name | Value |
±-----------------------±--------------+
| slave_parallel_type | LOGICAL_CLOCK |
| slave_parallel_workers | 64 |
±-----------------------±--------------+
2 rows in set (0.00 sec)

查看主库的max_allowed_packet参数值为16MB
mysql> show variables like ‘%allowed_packet%’;
±-------------------------±-----------+
| Variable_name | Value |
±-------------------------±-----------+
| max_allowed_packet | 16777216 |
| slave_max_allowed_packet | 1073741824 |
±-------------------------±-----------+
2 rows in set (0.01 sec)

–问题解决
设置slave_pending_jobs_size_max的默认值16777216(16MB),此参数单位为MB,大于50450016(48MB)即可。这里设置为100MB(104857600=10010241024字节)
set global slave_pending_jobs_size_max=104857600;

为了是从库重启后slave_pending_jobs_size_max=104857600依旧生效,在初始参数文件/etc/my.cnf中添加如下参数

[mysqld]
global slave_pending_jobs_size_max=104857600

注意:需重启mysql进程才能生效

从库中修改global slave_pending_jobs_size_max为104857600(100MB)的操作如下:

–执行步骤为
mysql> stop slave;
Query OK, 0 rows affected (0.04 sec)

mysql> set global slave_pending_jobs_size_max=104857600;
Query OK, 0 rows affected (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.61 sec)

mysql> show slave status \G;

经查看,同步恢复正常。

【参考】
https://dev.mysql.com/doc/refman/8.0/en/replication-features-max-allowed-packet.html
**
文章至此。**
以下为个人公众号,欢迎扫码关注:
image.png

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值