备库挂了:slave io/sql thread error
0.关键点:
1.show slave status\G; 查看挂了的原因
2.一般处理完后要tcp校验一下
3.如果是宕机或者实例重启,在主库执行start slave前要在备库上看看有没有回滚进程ps aux | grep magiclog ,如果有 magiclog 进程在执行说明在回滚。回滚过程中不能执行start slave,不然会报各种slave err1.报错:文件中位点>文件大小
Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size; the first event 'mysql-bin.000020' at 307263712, the last event read from '/u01/my3308/log/mysql-bin.000020' at 4, the last byte read from '/u01/my3308/log/mysql-bin.000020' at 4.'
解决:重指位点 //MASTER_LOG_FILE为当前备份Relay_Master_Log_File的下一个,MASTER_LOG_POS=4
stop slave ;
CHANGE MASTER TO MASTER_LOG_FILE=’mysql-bin.016729’, MASTER_LOG_POS=4;
start slave;
select sleep(0.1);
show slave status\G;2.没有报错信息,只是显示 Slave_IO_Running: No Slave_SQL_Running: No
用start slave;启动时报错如下
1.Could not initialize master info structure; more error messages can be found in the MySQL error log
看/u02/my3308/log/alert.log日志,发现错误如下
[ERROR] Error looking for file after /u01/my3308/log/relaylog.000064.
仔细看位点会发现很奇怪的事情
Read_Master_Log_Pos: 95926035
Exec_Master_Log_Pos: 95969857
执行位点比读master的位点还大了解决:没办法,只能reset然后重指位点了 记录好Relay_Master_Log_File: 和 Exec_Master_Log_Pos:
reset slave;
CHANGE MASTER TO MASTER_LOG_FILE=’mysql-bin.001458’, MASTER_LOG_POS=266387470;
start slave;
select sleep(0.1);
show slave status\G;3.Slave failed to initialize relay log info structure from the repository
解决:同样的处理方法4.如下错误
Worker 15 failed executing transaction '' at master log mysql-bin.000012, end_log_pos 468767971; Could not execute Write_rows_v1 event on table cbu_nebulabody_00.short_url_0011; Duplicate entry '5730576' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 468767971
或者
Last_SQL_Errno: 1032
Could not execute Update_rows event on table cbu_vaseq7.vas_eq_enquiries_0102; Can't find record in 'vas_eq_enquiries_0102', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log
mysql-bin.000417, end_log_pos 44444276, Error_code: 1032
或者
Cannot execute the current event group in the parallel mode. Encountered event Xid, relay-log name /u01/my3318/log/relaylog.000030, position 372 which prevents execution of this event group in parallel mode. Reason: the event is a part of a group that is unsupported in the parallel execution mode.
解决:直接跳过:sql_slave_skip_counter=1是指跳过一个事务的意思
stop slave;set global sql_slave_skip_counter=1;start slave;select sleep(0.1);show slave status\G;
5.其他报错,可以选择跳过错误
show slave status\G;
show variables like ‘%slave_parallel_workers’;
set global slave_parallel_workers = 0;
set global slave_exec_mode = smart;
start slave;
select sleep(0.1);
show slave status\G;
stop slave;
set global slave_exec_mode = strict;
set global slave_parallel_workers = 16;
start slave;
select sleep(0.1);
show slave status\G;6.GTID报错