工具:
percona-toolkit的pt-table-checksum和pt-table-sync(version 3.0.13)
模拟场景:
从库插入数据,模拟数据不同步
mysql> insert into test.test values(118,118,118);
Query OK, 1 row affected (0.09 sec)
pt-table-checksum校验数据的一致性:
[root@rhel6 ~]# pt-table-checksum --no-check-binlog-format --replicate=test.checksum --databases=test -uroot -pxxx --recursion-method=hosts
Checking if all tables can be checksummed ...
Starting checksum ...
TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE
07-30T01:39:01 0 0 3 0 1 0 0.187 test.emp1
07-30T01:39:01 0 0 1 0 1 0 0.057 test.heartbeat
07-30T01:39:01 0 0 6 0 1 0 0.036 test.t
07-30T01:39:01 0 0 0 0 1 0 0.037 test.tab_dept2
07-30T01:39:01 0 0 0 0 1 0 0.033 test.tab_dept4
07-30T01:39:01 0 1 27 1 1 0 0.033 test.test
07-30T01:39:01 0 0 0 0 1 0 0.035 test.test0701
07-30T01:39:01 0 0 2 0 1 0 0.034 test.test0705
07-30T01:39:01 0 0 1 0 1 0 0.033 test.test0706
07-30T01:39:01 0 0 6 0 1 0 0.033 test.test_1
执行完也可以通过从数据库的test.checksum表查询结果(由pt-table-checksum指定)
mysql> select * from test.checksum;
+------+-----------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db | tbl | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts |
+------+-----------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| test | emp1 | 1 | 0.033599 | NULL | NULL | NULL | 7afe7c2c | 3 | 7afe7c2c | 3 | 2019-07-30 04:47:25 |
| test | heartbeat | 1 | 0.018314 | NULL | NULL | NULL | ac5c0af5 | 1 | ac5c0af5 | 1 | 2019-07-30 04:47:25 |
| test | t | 1 | 0.004928 | NULL | NULL | NULL | 5341fe78 | 6 | 5341fe78 | 6 | 2019-07-30 04:47:25 |
| test | tab_dept2 | 1 | 0.00501 | NULL | NULL | NULL | 0 | 0 | 0 | 0 | 2019-07-30 04:47:25 |
| test | tab_dept4 | 1 | 0.004651 | NULL | NULL | NULL | 0 | 0 | 0 | 0 | 2019-07-30 04:47:25 |
| test | test | 1 | 0.00477 | NULL | NULL | NULL | a7cc6b0e | 28 | 699c19b5 | 27 | 2019-07-30 04:47:25 |
| test | test0701 | 1 | 0.005383 | NULL | NULL | NULL | 0 | 0 | 0 | 0 | 2019-07-30 04:47:25 |
| test | test0705 | 1 | 0.005006 | NULL | NULL | NULL | 622a65b2 | 2 | 622a65b2 | 2 | 2019-07-30 04:47:25 |
| test | test0706 | 1 | 0.005425 | NULL | NULL | NULL | 7ce7d88c | 1 | 7ce7d88c | 1 | 2019-07-30 04:47:25 |
| test | test_1 | 1 | 0.004818 | NULL | NULL | NULL | 2748e8ae | 6 | 2748e8ae | 6 | 2019-07-30 04:47:25 |
+------+-----------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
10 rows in set (0.05 sec)
pt-table-sync修复不一致数据:
首先使用--print参数打印出需要修改的数据(下面列举了两种查询方式):
[root@rhel6 ~]# pt-table-sync --replicate=test.checksum h=192.169.10.241,u=root,p=xxx --print --recursion-method=hosts
DELETE FROM `test`.`test` WHERE `id`='118' LIMIT 1 /*percona-toolkit src_db:test src_tbl:test src_dsn:h=192.169.10.241,p=...,u=root dst_db:test dst_tbl:test dst_dsn:P=3307,h=192.169.10.241,p=...,u=root lock:1 transaction:1 changing_src:test.checksum replicate:test.checksum bidirectional:0 pid:14556 user:root host:rhel6*/;
[root@rhel6 ~]# pt-table-sync --replicate=test.checksum --sync-to-master h=192.169.10.241,u=root,p=xxx,P=3307 --print
DELETE FROM `test`.`test` WHERE `id`='118' LIMIT 1 /*percona-toolkit src_db:test src_tbl:test src_dsn:P=3306,h=192.169.10.241,p=...,u=root dst_db:test dst_tbl:test dst_dsn:P=3307,h=192.169.10.241,p=...,u=root lock:1 transaction:1 changing_src:test.checksum replicate:test.checksum bidirectional:0 pid:14558 user:root host:rhel6*/;
将--print改成--execute可以执行修改,但是需要足够的权限:
[root@rhel6 ~]# pt-table-sync --replicate=test.checksum --sync-to-master h=192.169.10.241,u=root,p=xxx,P=3307 --execute
DELETE command denied to user 'root'@'192.169.10.241' for table 'test' [for Statement "DELETE FROM `test`.`test` WHERE `id`='118' LIMIT 1 /*percona-toolkit src_db:test src_tbl:test src_dsn:P=3306,h=192.169.10.241,p=...,u=root dst_db:test dst_tbl:test dst_dsn:P=3307,h=192.169.10.241,p=...,u=root lock:1 transaction:1 changing_src:test.checksum replicate:test.checksum bidirectional:0 pid:16086 user:root host:rhel6*/"] at line 10916 while doing test.test on 192.169.10.241
pt-table-sync的语法为:
pt-table-sync [OPTIONS] DSN [DSN]
if DSN has a t part, sync only that table:
if 1 DSN:
if --sync-to-master:
The DSN is a slave. Connect to its master and sync.
if more than 1 DSN:
The first DSN is the source. Sync each DSN in turn.
else if --replicate:
if --sync-to-master:
The DSN is a slave. Connect to its master, find records
of differences, and fix.
else:
The DSN is the master. Find slaves and connect to each,
find records of differences, and fix.
else:
if only 1 DSN and --sync-to-master:
The DSN is a slave. Connect to its master, find tables and
filter with --databases etc, and sync each table to the master.
else:
find tables, filtering with --databases etc, and sync each
DSN to the first.
(参考:https://www.percona.com/doc/percona-toolkit/3.0/pt-table-sync.html)
pt-table-checksum遇到的一个报错:
[root@rhel6 ~]# pt-table-checksum --no-check-binlog-format --replicate=test.checksum --databases=test -uroot -pxxx
Checking if all tables can be checksummed ...
Starting checksum ...
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE
07-30T04:47:25 0 0 3 0 1 0 0.052 test.emp1
07-30T04:47:25 0 0 1 0 1 0 0.034 test.heartbeat
07-30T04:47:25 0 0 6 0 1 0 0.021 test.t
07-30T04:47:25 0 0 0 0 1 0 0.023 test.tab_dept2
07-30T04:47:25 0 0 0 0 1 0 0.020 test.tab_dept4
07-30T04:47:25 0 0 27 0 1 0 0.023 test.test
07-30T04:47:25 0 0 0 0 1 0 0.021 test.test0701
07-30T04:47:25 0 0 2 0 1 0 0.023 test.test0705
07-30T04:47:25 0 0 1 0 1 0 0.023 test.test0706
07-30T04:47:25 0 0 6 0 1 0 0.022 test.test_1
Diffs cannot be detected because no slaves were found.
这个报错的意思是master没有找到slave
遇到这个报错时可以通过指定--recursion-method
recursion-method:
1)processlist(默认),是通过在主库上show processlist找到slave host的值(这个例子中我的主从是在一台服务器上,可能是因为这个原因show processlist的host定位不到从库)
mysql> show processlist;
+-----+------+----------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+------+----------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
| 15 | root | localhost | test | Sleep | 0 | | NULL |
| 42 | repl | 192.169.10.241:65236 | NULL | Binlog Dump GTID | 42840 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 121 | root | localhost | NULL | Query | 0 | starting | show processlist |
+-----+------+----------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
3 rows in set (0.02 sec)
)
2)hosts,是通过在主库上show slave hosts找到slave host的值,前提是从库配置文件里面配置了自己的地址和端口
report_host = xxx
report_port = xxx
mysql> show slave hosts;
+-----------+----------------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+----------------+------+-----------+--------------------------------------+
| 10242 | 192.169.10.241 | 3307 | 10241 | d812b5b4-b13c-11e9-a0a8-000c296ee978 |
+-----------+----------------+------+-----------+--------------------------------------+
1 row in set (0.02 sec)