碰到 mysql 主从复制报 ‘max_allowed_packet error’ bug 一例

    前几天在生产环境大批量迁移数据时,碰到一个mysql的主从同步中断的问题,在使用load data local infile  " " into table   导入数据时,导的数据确实也比较大,大概是1.2G一个文件,导了5个这样大小的文件,在导入的过程中,一个从库发生一个io复制线程中断问题,如下:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mysql_bin.214495' at 3161501542, the last event read from '/data/logs01/mysql_bin.214495' at 123, the last byte read from '/data/logs01/mysql_bin.214495' at 3161501561.'

详细信息如下:

Master_Log_File: mysql_bin.214495
Read_Master_Log_Pos: 3161501542
Relay_Log_File: relaylog.075545
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql_bin.214495
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql,test
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table: mysql.%,test.%
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 3161501542
Relay_Log_Space: 1120
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mysql_bin.214495' at 3161501542, the last event read from '/data/logs01/mysql_bin.214495' at 123, the last byte read from '/data/logs01/mysql_bin.214495' at 3161501561.'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 4136
Master_UUID: 
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 220323 20:18:54
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)

1、在从库尝试修改max_allowed_packet参数为1G,开启复制,依然报上面的错。

2、然后,检查分析主库binlog,使用mysqlbinlog -vv 解析mysql 发生错误时的binlog,找到错误pos位置3161501542,发现这已经到这个binlog的最下面了,此binlog 文件mysql_bin.214495 最大的pos应该就是3161501542。而报错的3161501561在mysql_bin.214495 中是找不到的。

当时的解决方案:由于binlog中,pos位点3161501542已经是最后一个事务的位点了, 恢复同步的方案,直接在从库指向下一个binlog文件的起始位置,开启复制同步就好了。

获取下个binlog文件的起始位置的命令:

show binlog events in 'mysql_bin.214496';
root@(none) 06:10:27>show binlog events in 'mysql_bin.001736' limit 10;
+------------------+-------+----------------+-----------+-------------+--------------------------------------------------------------------------------------------------+
| Log_name         | Pos   | Event_type     | Server_id | End_log_pos | Info                                                                                             |
+------------------+-------+----------------+-----------+-------------+--------------------------------------------------------------------------------------------------+
| mysql_bin.001736 |     4 | Format_desc    |      7104 |         123 | Server ver: 5.7.30-log, Binlog ver: 4                                                            |
| mysql_bin.001736 |   123 | Previous_gtids |      7104 |         154 |                                                                                                  |
| mysql_bin.001736 |   154 | Anonymous_Gtid |     12132 |         219 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'                                                             |
| mysql_bin.001736 |   219 | Query          |     12132 |         282 | BEGIN                                                                                            |
| mysql_bin.001736 |   282 | Table_map      |     12132 |         421 | table_id: 300 (fan.test01) |
| mysql_bin.001736 |   421 | Write_rows     |     12132 |        8618 | table_id: 300                                                                                    |
| mysql_bin.001736 |  8618 | Write_rows     |     12132 |       16815 | table_id: 300                                                                                    |
| mysql_bin.001736 | 16815 | Write_rows     |     12132 |       25012 | table_id: 300                                                                                    |
| mysql_bin.001736 | 25012 | Write_rows     |     12132 |       33209 | table_id: 300                                                                                    |
| mysql_bin.001736 | 33209 | Write_rows     |     12132 |       41406 | table_id: 300                                                                                    |
+------------------+-------+----------------+-----------+-------------+--------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)

那binlog开始位点就是'mysql_bin.001736' 的 154。

也就是 binlog文件的第二个events 的 End_log_pos。

在一个新建的主从集群 验证下:

root@sysbench 06:18:47>show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql_bin.000001 |      154 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+




root@sysbench 06:18:53>show binlog events in 'mysql_bin.000001' limit 10;
+------------------+-----+----------------+-----------+-------------+---------------------------------------+
| Log_name         | Pos | Event_type     | Server_id | End_log_pos | Info                                  |
+------------------+-----+----------------+-----------+-------------+---------------------------------------+
| mysql_bin.000001 |   4 | Format_desc    |    199281 |         123 | Server ver: 5.7.31-log, Binlog ver: 4 |
| mysql_bin.000001 | 123 | Previous_gtids |    199281 |         154 |                                       |
+------------------+-----+----------------+-----------+-------------+---------------------------------------+

总结经验:

每个binlog的开始位点:binlog文件的第二个events 的 End_log_pos。

事后分析:

在binlog中找不到binlog复制的pos位点,觉得有些奇怪,就在mysql官方社区 bug 系统搜索,发现MySQL Bugs: #101795::

MySQL Bugs: #101795: slave io_thread error,Relay log write failure: could not queue event from master

查看对比mysql版本,刚好主库版本是mysql 5.7.28,一个从库版本是mysql 5.7.28;另外一个出问题的从库版本是mysql 5.7.30。5.7.28 版本的没有主从同步中断,而 mysql 5.7.30 却报主从同步中断,对应bug的版本,看来确实是这个bug了。

但是官方在bug回复中,却并没有给出很好的解释和修复。只是说是因为binlog pos大于4G导致的。但是实际binlog并没有大于4G。

建议:尽量减少使用大事务,在使用load data local infile 尽量切分为小文件。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值