中午趁着吃饭时间去拔牙,刚做好准备,老大打来电话,说有台 db server 意外断电,简单沟通下,把机器先起起来,等回来一看,发现问题没那么简单。
断电的是一台 master,看 slave,发现 slave 已经停止正常同步。报错如下:
Slave_IO_State:
Master_Host: 10.207.0.83
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.014089
Read_Master_Log_Pos: 52299
Relay_Log_File: relay-log.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin.014089
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 1
Exec_Master_Log_Pos: 52299
Relay_Log_Space: 106
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position'
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
190411 13:12:13 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( serv
er_errno=1236)
190411 13:12:13 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to sta
rt replication from impossible position', Error_code: 1236
190411 13:12:13 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.014089', position 52299
常规执行 change master to master_log_file ='mysql-bin.014089' ,master_log_pos = 52299;发现报错并没解除,同步依旧故障。按理说可以的鸭,后来发现,master 没有双1设置。导致意外断电后 binlog 并没有即使完全刷入磁盘。
对应两个参数
sync_binlog = 0
innodb_flush_log_at_trx_commit = 2
在这种情况下,会使得 mysql 找不到正确合适的pos,需要自行查找。
解析 mysql-bin.014089 binglog 找到 ROLLBACK 的位置(也是这个文件最后的位置)。
mysqlbinlog mysql-bin.014089 --base64-output=DECODE-ROWS -vv|less
# at 3851
#190411 12:11:59 server id 11243327 end_log_pos 3910 Query thread_id=71054553 exec_time=4294967295 error_code=0
SET TIMESTAMP=1554955919/*!*/;
BEGIN
/*!*/;
# at 3910
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
紧接着是后面的文件,mysql-bin.014090 ,ok,pos 就是 4 了。
mysqlbinlog mysql-bin.014090 --base64-output=DECODE-ROWS -vv|less
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#190411 13:11:13 server id 613306 end_log_pos 106 Start: binlog v 4, server v 5.1.60-log created 190411 13:11:13 at startup
ROLLBACK/*!*/;
# at 106
#190411 13:11:36 server id 613306 end_log_pos 178 Query thread_id=119 exec_time=0 error_code=0
SET TIMESTAMP=1554959496/*!*/;
SET @@session.pseudo_thread_id=119/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
change master to master_log_file ='mysql-bin.014090' ,master_log_pos = 4;
然后再看下同步状态,已经双YES,到此为止,问题解决。
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
Master_Host: 10.207.0.83
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.014090
Read_Master_Log_Pos: 733059341
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 10554324
Relay_Master_Log_File: mysql-bin.014090
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 10554179
Relay_Log_Space: 733059635
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 12447
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
紧接着把master设置成双1,避免问题再次发生。
set global innodb_flush_log_at_trx_commit=1 ;
set global sync_binlog=1 ;