mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 1.1.1.1
Master_User: repl_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000036
Read_Master_Log_Pos: 1072421316
Relay_Log_File: xx_DB_0_127-relay-bin.000107
Relay_Log_Pos: 733952079
Relay_Master_Log_File: mysql-bin.000036
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB: xx
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '45489-1' for key 'pk_tbl_UserDeviceProfile'' on query. Default database: 'xx'. Query: 'insert tbl_GroupProfileVer(Uin,type,Ver,CreateDate,UpdateDate) value(45489,1,1,1386571046,1386571046)'
Skip_Counter: 0
Exec_Master_Log_Pos: 733951933
Relay_Log_Space: 1072421668
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '45489-1' for key 'pk_tbl_UserDeviceProfile'' on query. Default database: 'xx'. Query: 'insert tbl_GroupProfileVer(Uin,type,Ver,CreateDate,UpdateDate) value(45489,1,1,1386571046,1386571046)'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
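Production broke again today, and I'm worn out...
The cause is in the application logic: when a lookup finds no row, the code inserts one directly. That is fine when it runs against the master, but when it runs against the slave, the locally written row collides with the same insert once it arrives from the master, and replication stops with a duplicate-key error.
Recovery: delete the duplicate row on the slave, then restart the slave. After that, just wait for Seconds_Behind_Master to drop to 0.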
mysql> delete from tbl_GroupProfileVer where Uin=45489 and type=1;
mysql> stop slave;
mysql> start slave;
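Waiting for Seconds_Behind_Master to reach 0 can also be scripted. Below is a minimal sketch, not a definitive tool: the function names get_lag and wait_until_caught_up and the 10-second default interval are made up for illustration, and it assumes the same passwordless "mysql -uroot" access used elsewhere in this post.

```shell
#!/bin/bash
# Print the Seconds_Behind_Master value found in "show slave status\G"
# output read from stdin.
get_lag() {
    awk -F': ' '/Seconds_Behind_Master:/ {print $2}'
}

# Poll the slave until it reports a lag of 0.
# Returns 1 if the lag is NULL or missing, which means replication is not
# running and waiting would never finish.
# $1 = seconds between polls (hypothetical default: 10)
wait_until_caught_up() {
    local interval="${1:-10}" lag
    while true; do
        lag=$(mysql -uroot -e "show slave status\G" | get_lag)
        if [ "$lag" = "0" ]; then
            echo "slave has caught up"
            return 0
        elif [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
            echo "replication is not running (lag=${lag:-unknown})"
            return 1
        fi
        echo "still ${lag}s behind master"
        sleep "$interval"
    done
}
```

After "start slave", calling wait_until_caught_up 30 would poll every 30 seconds. A NULL lag, as in the status output above, makes it give up immediately instead of looping forever.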
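This incident also exposed the fact that MySQL replication status is not monitored at all. I have long meant to set that up, and it never got past "meaning to"... Murphy's law, confirmed once again.
So, write a monitoring script and drop it into crontab (a few minutes of work that I still managed to put off until something broke):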
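#!/bin/bash
# Collect the state of the IO and SQL threads, in that order.
# The pattern matches only these two fields, not others containing "Running".
array=($(mysql -uroot -e "show slave status\G" | grep -E "Slave_(IO|SQL)_Running:" | awk '{print $2}'))
if [ "${array[0]}" == "Yes" ] && [ "${array[1]}" == "Yes" ]
then
    echo "slave is OK"
else
    echo "MySQL Slave Error!!!"
    exit 1    # non-zero status so a wrapper can raise an alert
fi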
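A variant of the check that also says why replication stopped may be more useful in an alert. The sketch below is only one possible shape: check_slave is a hypothetical function name, and it reads the "show slave status\G" text from stdin so it can be tried against saved output without a live server.

```shell
#!/bin/bash
# Read "show slave status\G" output on stdin and print a one-line verdict.
# Returns 0 when both replication threads are running, 1 otherwise.
check_slave() {
    awk -F': ' '
        { sub(/^[ \t]+/, "", $1) }             # drop the left padding mysql adds
        $1 == "Slave_IO_Running"  { io = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        $1 == "Last_SQL_Errno"    { errno = $2 }
        END {
            if (io == "Yes" && sql == "Yes") { print "slave is OK"; exit 0 }
            printf "MySQL Slave Error: IO=%s SQL=%s Last_SQL_Errno=%s\n", io, sql, errno
            exit 1
        }'
}
```

It would be called as: mysql -uroot -e "show slave status\G" | check_slave. For the status shown at the top of this post it prints "MySQL Slave Error: IO=Yes SQL=No Last_SQL_Errno=1062". From crontab, an entry such as "* * * * * /path/to/check_slave.sh" (the path is a placeholder) runs the check every minute.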