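The data below is simulated. It walks through a master/slave replication failure and how it was resolved.
The master server rebooted unexpectedly, after which the slave reported the following errors.
Errors shown by show slave status: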
[root@slave_server~]# mysql -e "show slave status\G"
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.10.81.148
Master_User: repl_65
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: log-bin.000005
Read_Master_Log_Pos: 352289681
Relay_Log_File: relay-bin.000010
Relay_Log_Pos: 123992642
Relay_Master_Log_File: log-bin.000005
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '825126' for key 'PRIMARY'' on query. Default database: 'db_test'. Query: 'insert into GAME_LOGIN_LOG(userId,identityId,gamePath,loginTime,ip,serverId,platform)VALUES('xxxxxxxxxxxxx','9418812','AA',1372529757894,'204.101.230.214',1,'BB')'
Skip_Counter: 0
Exec_Master_Log_Pos: 352289681
Relay_Log_Space: 351207991
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position'
Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '825126' for key 'PRIMARY'' on query. Default database: 'db_test'. Query: 'insert into GAME_LOGIN_LOG(userId,identityId,gamePath,loginTime,ip,serverId,platform)VALUES('tgsbuga13701297682959418812','9418812','AA',1372529757894,'204.101.230.214',1,'BB')'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
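The two thread flags and the error numbers are the fields that matter in this output. As a rough sketch only (the helper names slave_field and replication_healthy are made up for illustration and are not part of MySQL), they could be pulled out of the show slave status\G text like this:

```shell
#!/usr/bin/env bash
# slave_field NAME: read `show slave status\G` output on stdin and print
# the value of the field called NAME (hypothetical helper).
slave_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            idx = index(line, ":")            # split at the first colon only
            if (idx == 0) next
            if (substr(line, 1, idx - 1) == key) {
                val = substr(line, idx + 1)
                sub(/^[ \t]+/, "", val)
                print val
                exit
            }
        }'
}

# replication_healthy: succeed only if both replication threads say "Yes".
replication_healthy() {
    local status io sql
    status=$(cat)
    io=$(printf '%s\n' "$status" | slave_field Slave_IO_Running)
    sql=$(printf '%s\n' "$status" | slave_field Slave_SQL_Running)
    [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]
}
```

Usage would be along the lines of mysql -e "show slave status\G" | replication_healthy || echo "replication broken", assuming the client can log in without extra options as in the session above.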
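The control host pulls a backup from the master for the slave
Set a marker: create an arbitrary database on the master to serve as the sync point later. Remember this statement, and check the master's status position right away.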
(root:hostname:Mon Jul 1 14:06:13 2013)[(none)]>create database zstest ; show master status\G
Query OK, 1 row affected (0.01 sec)
*************************** 1. row ***************************
File: log-bin.000006
Position: 25619895
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
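The File and Position values above are the coordinates the slave is pointed at later. A minimal sketch for capturing them from the show master status\G text, assuming a hypothetical helper named master_coords:

```shell
#!/usr/bin/env bash
# master_coords: read `show master status\G` output on stdin and print
# "<binlog file> <position>" on one line (hypothetical helper).
master_coords() {
    awk '
        $1 == "File:"     { file = $2 }
        $1 == "Position:" { pos = $2 }
        END {
            # Fail if either value is missing, e.g. binary logging is off.
            if (file == "" || pos == "") exit 1
            print file, pos
        }'
}
```

For example, read -r binlog_file binlog_pos < <(mysql -e "show master status\G" | master_coords) would store both values in shell variables for later use.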
Backup
[root@master_server mysql]# date
Mon Jul 1 14:00:13 PDT 2013
[root@master_server mysql]# /bin/bash /home/databak/dbbak.sh db_test
rm: cannot remove `/home/databak/db_test/db_test-data-dump-2013051505.sql.gz': Operation not permitted
rm: cannot remove `/home/databak/db_test/db_test-data-dump-2013051505.sql.gz.md5': Operation not permitted
[root@master_server mysql]# ll -ltr /home/databak/db_test/ | tail -4
-rw-r--r-- 1 root root 4827 Jul 1 14:00 db_test-strut-dump-2013070114.sql.gz
-rw-r--r-- 1 root root 31310299 Jul 1 14:00 db_test-data-dump-2013070114.sql.gz
-rw-r--r-- 1 root root 38 Jul 1 14:00 db_test-strut-dump-2013070114.sql.gz.md5
-rw-r--r-- 1 root root 42 Jul 1 14:00 db_test-data-dump-2013070114.sql.gz.md5
[@zhongkong_36 ~]# scp 10.10.81.148:/home/databak/db_test/db_test-strut-dump-2013070114* /tmp/
db_test-strut-dump-2013070114.sql.gz 100% 4827 4.7KB/s 00:00
db_test-strut-dump-2013070114.sql.gz.md5 100% 38 0.0KB/s 00:00
[@zhongkong_36 ~]# scp 10.10.81.148:/home/databak/db_test/db_test-data-dump-2013070114* /tmp/
db_test-data-dump-2013070114.sql.gz 100% 30MB 10.0MB/s 00:03
db_test-data-dump-2013070114.sql.gz.md5 100% 42 0.0KB/s 00:00
Copy to the slave
[@zhongkong_36 ~]# scp /tmp/db_test-* 10.10.81.65:/tmp
db_test-data-dump-2013070114.sql.gz 100% 30MB 10.0MB/s 00:03
db_test-data-dump-2013070114.sql.gz.md5 100% 42 0.0KB/s 00:00
db_test-strut-dump-2013070114.sql.gz 100% 4827 4.7KB/s 00:00
db_test-strut-dump-2013070114.sql.gz.md5 100% 38 0.0KB/s 00:00
[@zhongkong_36 ~]#
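The backup script ships a .md5 file next to each dump, so the copies can be checked before they are decompressed. The layout of those .md5 files is not shown here, so the sketch below only assumes that each one contains the 32-character hex digest somewhere in its text; verify_md5 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# verify_md5 FILE: compare the MD5 of FILE with the digest stored in FILE.md5.
# Assumption: FILE.md5 contains a 32-hex-digit digest somewhere in its text.
verify_md5() {
    local file="$1" expected actual
    expected=$(grep -oE '[0-9a-fA-F]{32}' "$file.md5" 2>/dev/null | head -n 1 | tr 'A-F' 'a-f')
    actual=$(md5sum "$file" | awk '{ print $1 }')
    # An empty expected value (missing or unreadable .md5) counts as a failure.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

On the slave this could be run as verify_md5 /tmp/db_test-data-dump-2013070114.sql.gz || echo "checksum mismatch" before gunzip.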
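Find this statement in the master's binlog. Starting replication from roughly just after it is good enough and will not lose much data (although if the database really takes tens of thousands of rows per second, rows inserted at the same moment as the create database statement may be lost).
Dump the binlog file that holds the current position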
[root@master_server tmp]# mysqlbinlog log-bin.000006 > /tmp/mysqlbinlog006.log
[root@master_server tmp]# vim mysqlbinlog006.log
# at 25619808
#130701 14:06:22 server id 1 end_log_pos 25619895 Query thread_id=942 exec_time=0 error_code=0
SET TIMESTAMP=1372712782/*!*/;
SET @@session.sql_mode=0/*!*/;
create database zstest
/*!*/;
# at 25619895
#130701 14:06:23 server id 1 end_log_pos 25619969 Query thread_id=893 exec_time=0 error_code=0
SET TIMESTAMP=1372712783/*!*/;
SET @@session.sql_mode=2097152/*!*/;
BEGIN
/*!*/;
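In the dump above, the end_log_pos of the create database zstest event (25619895) is the position where the next event starts, and it matches the Position reported by show master status. Instead of searching in vim, the lookup could be sketched with awk as below; pos_after_marker is a hypothetical helper, and it assumes the marker statement appears on a line of its own exactly as typed.

```shell
#!/usr/bin/env bash
# pos_after_marker MARKER: read mysqlbinlog text output on stdin and print the
# end_log_pos of the first event whose statement line equals MARKER.
pos_after_marker() {
    awk -v marker="$1" '
        # Event header, e.g. "#130701 14:06:22 server id 1 end_log_pos 25619895 Query ..."
        /^#[0-9]+ .*end_log_pos / {
            for (i = 1; i < NF; i++)
                if ($i == "end_log_pos") end_pos = $(i + 1)
        }
        $0 == marker { print end_pos; found = 1; exit }
        END { if (!found) exit 1 }'
}
```

A call such as mysqlbinlog log-bin.000006 | pos_after_marker 'create database zstest' would then print the position to use for master_log_pos.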
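First clean out the slave's old replication state:
1) Log in to mysql and stop the slave
stop slave;
2) Delete the old slave files, such as the following
relay-bin.000002
relay-bin.000003
relay-bin.index
relay-log.info
3) In mysql, drop the database that is going to be resynchronized
drop database db_test;
4) Reset the slave
reset slave;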
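The SQL side of this cleanup can be collected into one small generator. This is only a sketch under stated assumptions: reset_slave_sql is a hypothetical helper, it prints the statements rather than running them, and it leaves out the manual file deletion because RESET SLAVE itself removes the relay log files together with master.info and relay-log.info.

```shell
#!/usr/bin/env bash
# reset_slave_sql DB: print the statements that stop replication, drop
# database DB and reset the slave. Nothing is executed here.
reset_slave_sql() {
    local db="$1"
    # Accept only plain identifiers so the name can be quoted safely.
    case "$db" in
        ''|*[!A-Za-z0-9_]*) echo "invalid database name: $db" >&2; return 1 ;;
    esac
    printf 'STOP SLAVE;\n'
    printf 'DROP DATABASE IF EXISTS `%s`;\n' "$db"
    printf 'RESET SLAVE;\n'
}
```

After reviewing the printed statements, they could be piped into the client on the slave, for example reset_slave_sql db_test | mysql.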
[root@slave_server tmp]# gunzip *.gz
[root@slave_server tmp]# mysql < db_test-strut-dump-2013070114.sql && echo "ok"
[root@slave_server tmp]# mysql < db_test-data-dump-2013070114.sql && echo "ok"
change master to master_user='repl_65',master_password='dzwMbsHp,VIhx}',master_log_file='log-bin.000006',master_log_pos=25619895;
(root:hostname:Mon Jul 1 14:35:00 2013)[(none)]> change master to master_user='repl_65',master_password='dzwMbsHp,VIhx}',master_log_file='log-bin.000006',master_log_pos=25619895;
Query OK, 0 rows affected (0.21 sec)
(root:hostname:Mon Jul 1 14:35:00 2013)[(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)
(root:hostname:Mon Jul 1 14:35:05 2013)[(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.10.81.148
Master_User: repl_65
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: log-bin.000006
Read_Master_Log_Pos: 25851768
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 251
Relay_Master_Log_File: log-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '887816' for key 'PRIMARY'' on query. Default database: 'db_test'. Query: 'in
Skip_Counter: 0
Exec_Master_Log_Pos: 25619895
Relay_Log_Space: 232274
1 row in set (0.00 sec)
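The SQL thread is still in the No state because of a primary key conflict. The master backup was taken later than the create database zstest marker, and many rows were inserted during that window (how many depends on how long the backup takes), so the slave stops immediately at the first duplicate key. Had there been no primary key conflict, things would be harder to handle, because the same rows could be inserted into a table twice (if the backup took a long time, say 10 minutes, the slave could end up with 10 extra minutes of data). Fortunately this database has a fairly simple design, and the frequently inserted tables are all constrained by an id.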
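The duplicate key in the error message is the first id that the restored dump already contains and the binlog is about to insert again, so it is the cutoff used for the cleanup. A small sketch for extracting it from the error text, assuming a hypothetical helper named dup_key_from_error:

```shell
#!/usr/bin/env bash
# dup_key_from_error: read slave status text on stdin and print the key value
# from the first "Duplicate entry '<value>' for key" message found.
dup_key_from_error() {
    sed -n "s/.*Duplicate entry '\([^']*\)' for key.*/\1/p" | head -n 1
}
```

Something like cutoff=$(mysql -e "show slave status\G" | dup_key_from_error) would capture the value; rows with an id below it are the ones to keep.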
(root:hostname:Mon Jul 1 14:35:56 2013)[db_test]> stop slave;
(root:hostname:Mon Jul 1 14:37:06 2013)[db_test]> select * from GAME_LOGIN_LOG where id < 887816 into outfile '/tmp/887816.log';
Query OK, 887815 rows affected (1.79 sec)
(root:hostname:Mon Jul 1 14:38:15 2013)[db_test]> truncate table GAME_LOGIN_LOG;
Query OK, 0 rows affected (0.36 sec)
(root:hostname:Mon Jul 1 14:38:46 2013)[db_test]> LOAD DATA INFILE '/tmp/887816.log' INTO TABLE GAME_LOGIN_LOG ;
Query OK, 887815 rows affected (31.02 sec)
Records: 887815 Deleted: 0 Skipped: 0 Warnings: 0
(root:hostname:Mon Jul 1 14:39:54 2013)[db_test]> start slave;
Query OK, 0 rows affected (0.00 sec)
(root:hostname:Mon Jul 1 14:40:46 2013)[db_test]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.10.81.148
Master_User: repl_65
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: log-bin.000006
Read_Master_Log_Pos: 25900473
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 96023
Relay_Master_Log_File: log-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
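Check that the master and the slave hold the same amount of data, preferably on the tables that change the most. The counts match, so this issue is resolved.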
(root:hostname:Mon Jul 1 14:41:21 2013)[db_test]> select count(*) from GAME_LOGIN_LOG;
+----------+
| count(*) |
+----------+
| 888558 |
+----------+
1 row in set (0.19 sec)
(root:hostname:Mon Jul 1 14:42:39 2013)[db_test]> select count(*) from GAME_LOGIN_LOG;
+----------+
| count(*) |
+----------+
| 888558 |
+----------+
1 row in set (0.27 sec)
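The same comparison can be scripted instead of being run by hand on each server. The sketch below assumes the mysql client can reach both hosts without extra credentials (for instance through an option file); table_count and counts_match are hypothetical helper names. Because the master keeps taking writes, a brief mismatch on a busy table does not by itself mean replication is broken.

```shell
#!/usr/bin/env bash
# table_count HOST DB TABLE: print COUNT(*) for DB.TABLE on HOST.
# -N suppresses the column header, -B gives plain tab-separated output.
table_count() {
    mysql -h "$1" -N -B -e "SELECT COUNT(*) FROM \`$2\`.\`$3\`"
}

# counts_match MASTER SLAVE DB TABLE: print both counts and succeed only
# when they are equal. Returns 2 if either query fails.
counts_match() {
    local m s
    m=$(table_count "$1" "$3" "$4") || return 2
    s=$(table_count "$2" "$3" "$4") || return 2
    echo "master=$m slave=$s"
    [ "$m" = "$s" ]
}
```

With the hosts from this walkthrough the call would be counts_match 10.10.81.148 10.10.81.65 db_test GAME_LOGIN_LOG.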
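Summary: the method above is suitable when the primary keys auto-increment, the backup does not take long, and the database is not very large.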