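Background

With PXC gradually going into production, replication on our production databases has been moving from STATEMENT format to ROW format. The change of format has caused some replication problems.

Test goals

To resolve, at least to some extent, master-slave replication problems under ROW format, and to serve as the runbook for manual repair when a PXC cluster goes down.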


Test environment

masterold02:7301

masterold03:7302

skavetest178:7303

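Master setup

Edit my.cnf and add the following line:

binlog_format=ROW    # write the binlog in ROW format

On each master, grant privileges to the replication user that the slave connects as: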

grant all on *.* to okooo_rep@'192.168.%.%' identified by 'Bjfcmlc@Mhxzkhl';

flush privileges;



Tests

Basic replication test

Make test178 a slave of old02:

CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000001',MASTER_LOG_POS=4;

start slave;

Check the slave status. test178 quickly catches up with old02.

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.8.72

Master_User: okooo_rep

Master_Port: 7301

Connect_Retry: 60

Master_Log_File: logbin.000006

Read_Master_Log_Pos: 332

Relay_Log_File: relay.000007

Relay_Log_Pos: 475

Relay_Master_Log_File: logbin.000006

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Now make test178 a slave of old03 instead. test178 quickly catches up with old03 as well.

stop slave;


CHANGE MASTER TO MASTER_HOST='192.168.8.73',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7302,MASTER_LOG_FILE='logbin.000001',MASTER_LOG_POS=4;



start slave;


mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.8.73

Master_User: okooo_rep

Master_Port: 7302

Connect_Retry: 60

Master_Log_File: logbin.000005

Read_Master_Log_Pos: 332

Relay_Log_File: relay.000006

Relay_Log_Pos: 475

Relay_Master_Log_File: logbin.000005

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Summary: the basic replication test passed. When the databases are freshly built and hold identical data, test178 can replicate normally from either master, old02 or old03.
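Re-pointing the slave at a different master, as done above, always uses the same three statements: stop the slave, change the master coordinates, start the slave. A minimal Python sketch that only builds the SQL text follows. The function names `change_master_sql` and `repoint_slave_sql` are hypothetical helpers invented for this illustration, and nothing here connects to a server.

```python
def change_master_sql(host, port, user, password, log_file, log_pos):
    """Return a CHANGE MASTER TO statement for file/position based replication."""

    def quote(value):
        # Escape backslashes and single quotes for a MySQL string literal.
        text = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return "'" + text + "'"

    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, "
        f"MASTER_PORT={int(port)}, "
        f"MASTER_USER={quote(user)}, "
        f"MASTER_PASSWORD={quote(password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, "
        f"MASTER_LOG_POS={int(log_pos)};"
    )


def repoint_slave_sql(host, port, user, password, log_file, log_pos):
    """Return the statements, in order, that re-point a slave at a master."""
    return [
        "STOP SLAVE;",
        change_master_sql(host, port, user, password, log_file, log_pos),
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # Placeholder password; substitute the real replication credentials.
    for stmt in repoint_slave_sql(
        "192.168.8.73", 7302, "okooo_rep", "********", "logbin.000001", 4
    ):
        print(stmt)
```

Generating the statements rather than typing them each time mainly guards against forgetting the stop/start pair or mistyping the log position.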


Insert test

Create a new database and table on old02 and on old03:

create database row_slave;

use row_slave;


CREATE TABLE `row_test` (

`id` int(10) unsigned NOT NULL,

`hostname` varchar(20) NOT NULL default '',

`create_time` datetime NOT NULL default '0000-00-00 00:00:00',

`update_time` datetime NOT NULL default '0000-00-00 00:00:00',

PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 ;

Insert data on old02:

insert into row_test values(1,'old02','2013-12-11 00:00:00','2013-12-11 00:00:00');

insert into row_test values(2,'old02','2013-12-11 00:00:00','2013-12-11 00:00:00');

insert into row_test values(3,'old03','2013-12-11 01:00:00','2013-12-11 01:00:00');

insert into row_test values(4,'old03','2013-12-11 01:00:00','2013-12-11 01:00:00');

Query old02, old03, and test178. All three return the rows:

mysql> select * from row_test;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

+----+----------+---------------------+---------------------+

Insert data on old03. At this point test178 (slave) is replicating from old03 (master):

insert into row_test values(5,'old03','2013-12-11 02:00:00','2013-12-11 02:00:00');

insert into row_test values(6,'old03','2013-12-11 02:00:00','2013-12-11 02:00:00');

Query old03 and test178. Both return the new rows. test178 and old02 now differ: the slave has two more rows than old02, id=5 and id=6.

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 5 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

+----+----------+---------------------+---------------------+

Insert data on old02. The slave test178 is still replicating from old03 and has no connection to old02:

insert into row_test values(7,'old02','2013-12-11 03:00:00','2013-12-11 03:00:00');

insert into row_test values(8,'old02','2013-12-11 03:00:00','2013-12-11 03:00:00');

Inspect old02's binlog to find the position where id=7 and id=8 were inserted:

cd /home/okooo/apps/tmp_slave01/logs

../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000007

# at 1399

#131211 11:36:42 server id 1287301 end_log_pos 1472 Query thread_id=5 exec_time=0 error_code=0

SET TIMESTAMP=1386733002/*!*/;

BEGIN

/*!*/;

# at 1472

# at 1529

#131211 11:36:42 server id 1287301 end_log_pos 1529 Table_map: `row_slave`.`row_test` mapped to number 33

#131211 11:36:42 server id 1287301 end_log_pos 1585 Write_rows: table id 33 flags: STMT_END_F

### INSERT INTO row_slave.row_test

### SET

### @1=7 /* INT meta=0 nullable=0 is_null=0 */

### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

# at 1585

#131211 11:36:42 server id 1287301 end_log_pos 1612 Xid = 40

COMMIT/*!*/;

# at 1612

#131211 11:36:43 server id 1287301 end_log_pos 1685 Query thread_id=5 exec_time=0 error_code=0

SET TIMESTAMP=1386733003/*!*/;

BEGIN

/*!*/;

# at 1685

# at 1742

#131211 11:36:43 server id 1287301 end_log_pos 1742 Table_map: `row_slave`.`row_test` mapped to number 33

#131211 11:36:43 server id 1287301 end_log_pos 1798 Write_rows: table id 33 flags: STMT_END_F

### INSERT INTO row_slave.row_test

### SET

### @1=8 /* INT meta=0 nullable=0 is_null=0 */

### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

# at 1798

#131211 11:36:43 server id 1287301 end_log_pos 1825 Xid = 41

COMMIT/*!*/;

DELIMITER ;

# End of log file
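Reading the position off the dump by eye works for a short log. For a longer one, the lookup can be sketched in Python as below. This is a rough sketch, not a full binlog parser: it assumes the text output of `mysqlbinlog --base64-output=decode-rows -v` in the layout shown above, that the primary key is column `@1`, and that each transaction opens with a `BEGIN` query event. `find_transaction_start` is a hypothetical name.

```python
import re

# "# at 1399" marks the byte offset of the next event in the binlog.
AT_RE = re.compile(r"^# at (\d+)$")
# "### @1=7 /* INT meta=0 ... */" is one decoded column of a row image.
COL_RE = re.compile(r"^### +@(\d+)=(.*?)(?: /\*.*\*/)?$")


def find_transaction_start(dump_text, pk_value, pk_column=1):
    """Return the offset of the BEGIN event of the first transaction that
    touches a row whose column @pk_column equals pk_value, or None."""
    current_at = None   # offset announced by the latest "# at N" line
    txn_start = None    # offset of the BEGIN event of the open transaction
    for raw in dump_text.splitlines():
        line = raw.strip()
        match = AT_RE.match(line)
        if match:
            current_at = int(match.group(1))
            continue
        if line == "BEGIN":
            # BEGIN belongs to the event announced by the latest "# at".
            txn_start = current_at
            continue
        if line.startswith("COMMIT"):
            txn_start = None
            continue
        match = COL_RE.match(line)
        if (
            match
            and int(match.group(1)) == pk_column
            and match.group(2).strip() == str(pk_value)
        ):
            return txn_start
    return None
```

Fed the dump above, a call with `pk_value=7` returns 1399, the position used in the CHANGE MASTER statement that follows.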

Re-point test178 at old02, starting from that position:

stop slave;


CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000007',MASTER_LOG_POS=1399;



start slave;


show slave status\G

After the change, the slave picked up old02's new rows, so test178 (slave) now holds every row: id in (1,2,3,4) exist on all three servers, id in (5,6) came only from old03, and id in (7,8) came only from old02.

mysql> select * from row_test;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 5 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

| 7 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |

| 8 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |

+----+----------+---------------------+---------------------+


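Summary: a slave table that does not match the master table row for row does not prevent newly inserted rows from replicating.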



Update test

On old02, update a row that exists on both old02 and test178. test178 is replicating from old02 at this point, and replication keeps running:

update row_test set update_time =now() ,hostname ='old021' where id=7;

On old03, update a row that exists on both old03 and test178. test178 is replicating from old02, not old03, so this change on old03 only sets up the next step:

update row_test set update_time =now() ,hostname ='old031' where id=5;

Inspect old03's binlog to find the position to replicate from:

../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000006

# at 1825

#131211 15:20:16 server id 1807302 end_log_pos 1906 Query thread_id=4 exec_time=0 error_code=0

SET TIMESTAMP=1386746416/*!*/;

SET @@session.time_zone='SYSTEM'/*!*/;

BEGIN

/*!*/;

# at 1906

# at 1963

#131211 15:20:16 server id 1807302 end_log_pos 1963 Table_map: `row_slave`.`row_test` mapped to number 33

#131211 15:20:16 server id 1807302 end_log_pos 2048 Update_rows: table id 33 flags: STMT_END_F

### UPDATE row_slave.row_test

### WHERE

### @1=5 /* INT meta=0 nullable=0 is_null=0 */

### @2='old03' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### SET

### @1=5 /* INT meta=0 nullable=0 is_null=0 */

### @2='old031' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 15:20:16 /* DATETIME meta=0 nullable=0 is_null=0 */

# at 2048

#131211 15:20:16 server id 1807302 end_log_pos 2075 Xid = 32

COMMIT/*!*/;

DELIMITER ;

# End of log file

Re-point test178 at old03, starting from that position:

stop slave;



CHANGE MASTER TO MASTER_HOST='192.168.8.73',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7302,MASTER_LOG_FILE='logbin.000006',MASTER_LOG_POS=1825;



start slave;


show slave status\G

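Check the data on test178: the update was applied. (This confirms that when different rows are modified, replicating from several masters in turn does not cause interference. More fundamentally, replication does not verify that table data or row data match between master and slave. The following steps test this further.)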

mysql> select * from row_test;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 5 | old031 | 2013-12-11 02:00:00 | 2013-12-11 15:20:16 |

| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

| 7 | old021 | 2013-12-11 03:00:00 | 2013-12-11 15:15:34 |

| 8 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |

+----+----------+---------------------+---------------------+

Now modify a row that exists on all three servers. First update id=1 on old03:

update row_test set update_time =now() ,hostname ='old032' where id=1;

After the change replicates (test178 is replicating from old03), the row on test178 reads:

mysql> select * from row_test where id=1;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old032 | 2013-12-11 00:00:00 | 2013-12-11 15:49:53 |

+----+----------+---------------------+---------------------+

Update the same row on old02:

update row_test set update_time =now() ,hostname ='old022' where id=1;

Inspect old02's binlog:

../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000007

# at 2075

#131211 15:51:15 server id 1287301 end_log_pos 2156 Query thread_id=9 exec_time=0 error_code=0

SET TIMESTAMP=1386748275/*!*/;

BEGIN

/*!*/;

# at 2156

# at 2213

#131211 15:51:15 server id 1287301 end_log_pos 2213 Table_map: `row_slave`.`row_test` mapped to number 33

#131211 15:51:15 server id 1287301 end_log_pos 2298 Update_rows: table id 33 flags: STMT_END_F

### UPDATE row_slave.row_test

### WHERE

### @1=1 /* INT meta=0 nullable=0 is_null=0 */

### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### SET

### @1=1 /* INT meta=0 nullable=0 is_null=0 */

### @2='old022' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 15:51:15 /* DATETIME meta=0 nullable=0 is_null=0 */

# at 2298

#131211 15:51:15 server id 1287301 end_log_pos 2325 Xid = 73

COMMIT/*!*/;

DELIMITER ;

# End of log file

ROLLBACK /* added by mysqlbinlog */;

Re-point test178 at old02, starting from that position:

stop slave;


CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000007',MASTER_LOG_POS=2075;



start slave;


show slave status\G

The change replicates, and old02's data overwrites old03's data. (The very first binlog dump already showed why: a ROW-format update carries the whole row.)

mysql> select * from row_test where id=1;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old022 | 2013-12-11 00:00:00 | 2013-12-11 15:51:15 |

+----+----------+---------------------+---------------------+

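Summary: replicating from several masters in turn does not cause interference. Replication does not verify that table data or row data match between master and slave. A ROW-format update writes the whole row, and as this test shows, the slave applied it blindly, without comparing the row's current contents against the old values recorded in the binlog.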


Delete test

Delete the row with id=1 on test178:

delete from row_test where id=1;

Update the row with id=1 on old02 (test178 is replicating from old02 at this point):

update row_test set update_time =now() ,hostname ='old023' where id=1;


mysql> select * from row_test where id=1;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old023 | 2013-12-11 00:00:00 | 2013-12-11 16:09:12 |

+----+----------+---------------------+---------------------+

Check the replication status on test178:

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.8.72

Master_User: okooo_rep

Master_Port: 7301

Connect_Retry: 60

Master_Log_File: logbin.000007

Read_Master_Log_Pos: 3078

Relay_Log_File: relay.000002

Relay_Log_Pos: 500

Relay_Master_Log_File: logbin.000007

Slave_IO_Running: Yes

Slave_SQL_Running: No

Replicate_Do_DB:

Replicate_Ignore_DB: mysql

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 1032

Last_Error: Could not execute Update_rows event on table row_slave.row_test; Can't find record in 'row_test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log logbin.000007, end_log_pos 2549

Skip_Counter: 0

Exec_Master_Log_Pos: 2325

Relay_Log_Space: 1399

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 1032

Last_SQL_Error: Could not execute Update_rows event on table row_slave.row_test; Can't find record in 'row_test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log logbin.000007, end_log_pos 2549

Replicate_Ignore_Server_Ids:

Master_Server_Id: 1287301
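When many slaves need checking, the same inspection can be scripted. The Python sketch below parses the text of `show slave status\G` as printed above and reports a stopped thread. It assumes the output has already been captured as a string; `parse_slave_status` and `replication_problem` are hypothetical names chosen for this illustration.

```python
def parse_slave_status(text):
    """Turn the vertical 'Key: value' output of SHOW SLAVE STATUS\\G into a dict."""
    status = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "**** 1. row ****" separator.
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only; error messages contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def replication_problem(status):
    """Return a short description if a replication thread is not running,
    or None when both threads report Yes."""
    if status.get("Slave_IO_Running") != "Yes":
        return "IO thread stopped: " + status.get("Last_IO_Error", "")
    if status.get("Slave_SQL_Running") != "Yes":
        return "SQL thread stopped (errno {}): {}".format(
            status.get("Last_SQL_Errno", "?"),
            status.get("Last_SQL_Error", ""),
        )
    return None
```

For the output above, `replication_problem` reports the SQL thread as stopped with errno 1032, which is the condition the rest of this section repairs.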

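Error explanation: the table data on the master and the slave do not match. In the experiments above, this error appeared only after a row had been deleted on the slave.

At this point we have reproduced the replication failure seen on the backup database after the schedule PXC cluster went down.

Summary: when a row does not exist on the slave, an update of that row coming from the master cannot be applied.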


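Test summary: when the slave's table data differs from the master's, insert events can still be applied. An update event overwrites the slave's row with the most recently replicated values. If the row targeted by an update or delete does not exist on the slave, the event fails and replication stops.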
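These behaviours can be condensed into a toy model. The Python sketch below is not MySQL code. It is a simplified stand-in for how the slave applied row events in these tests, assuming a table whose first column is the primary key. The class and exception names are made up for illustration, and the duplicate-key case for inserts was not part of the tests above; it is included only so that the model does not silently overwrite an existing row.

```python
class RowNotFound(Exception):
    """Stands in for MySQL error 1032 (HA_ERR_KEY_NOT_FOUND)."""


class DuplicateKey(Exception):
    """Stands in for a duplicate-key failure on insert (not tested above)."""


class ToySlaveTable:
    """Rows are tuples; column 0 is the primary key."""

    def __init__(self, rows=()):
        self.rows = {row[0]: tuple(row) for row in rows}

    def write_row(self, after):
        # Insert event: only the new row matters, other rows are ignored.
        if after[0] in self.rows:
            raise DuplicateKey(after[0])
        self.rows[after[0]] = tuple(after)

    def update_row(self, before, after):
        # Update event: the row is located by primary key only. The other
        # columns of the before-image are NOT compared with the stored row.
        if before[0] not in self.rows:
            raise RowNotFound(before[0])
        del self.rows[before[0]]
        self.rows[after[0]] = tuple(after)

    def delete_row(self, before):
        # Delete event: fails the same way when the row is missing.
        if before[0] not in self.rows:
            raise RowNotFound(before[0])
        del self.rows[before[0]]
```

In this model, replaying old02's update of id=1 on a slave that holds old03's version succeeds and overwrites it, while replaying it after the row has been deleted raises `RowNotFound`, matching the two outcomes observed above.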


Repair methods

1. The brute-force method, suitable only when the affected data is not important:


stop slave;


SET GLOBAL sql_slave_skip_counter=1;  -- skip one replication event on the slave


start slave;


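2. A better method for a small amount of data: fix the slave's data by hand. As shown above, ROW format does not verify data consistency; it simply overwrites the row. So we only need to put the missing row back.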

insert into row_test values(1,'new_row',now(),now());


mysql> select * from row_test where id=1;  -- the row we just inserted, with a made-up hostname='new_row'

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | new_row | 2013-12-11 08:49:37 | 2013-12-11 08:49:37 |

+----+----------+---------------------+---------------------+


stop slave;


start slave;


mysql> select * from row_test where id=1;  -- the row now holds the replicated values

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old023 | 2013-12-11 00:00:00 | 2013-12-11 16:09:12 |

+----+----------+---------------------+---------------------+
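Method 2 used made-up values because the pending update overwrites them anyway. If the original values are wanted instead, they are available in the `### WHERE` part of the failing event in the master's binlog dump. The Python sketch below builds an INSERT from that part. It is only a sketch under assumptions: the input is the `-v` text layout shown earlier, columns are emitted in table order, unquoted DATETIME/DATE/TIME values are wrapped in quotes, and values that themselves contain ` /* ` are not handled. `before_image_insert` is a hypothetical name.

```python
import re

# One decoded column: "### @3=2013-12-11 00:00:00 /* DATETIME meta=0 ... */"
COL_RE = re.compile(r"^### +@(\d+)=(.*?)(?: /\* *(\w+).*\*/)?$")
# Types that older mysqlbinlog versions print without quotes.
UNQUOTED_TEMPORAL = {"DATETIME", "DATE", "TIME"}


def before_image_insert(event_text, table):
    """Build an INSERT statement from the first '### WHERE' block (the row as
    it was before the change). Returns None if no such block is found."""
    values = []
    in_where = False
    for raw in event_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "### WHERE":
            if values:
                break  # a second row image starts; keep only the first
            in_where = True
            continue
        match = COL_RE.match(line)
        if in_where and match:
            value, col_type = match.group(2).strip(), match.group(3)
            if col_type in UNQUOTED_TEMPORAL and not value.startswith("'"):
                value = "'" + value + "'"
            values.append(value)
        elif in_where and values:
            break  # reached "### SET" or the next event
    if not values:
        return None
    return "INSERT INTO {} VALUES ({});".format(table, ", ".join(values))
```

The generated statement should be reviewed before it is run on the slave; after that, `stop slave; start slave;` lets the failed update be retried as in the steps above.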


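3. The safest method, and the one to use when the amount of data is large: find the binlog position on the master at the time the row with id=1 was written, and have the slave replay from that point. (This takes longer; it is essentially a point-in-time incremental recovery.)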