pt-table-checksum and pt-table-sync in Practice

Article link: https://blog.csdn.net/tnndwdl/article/details/78017014

MySQL version:

mysql> select version();
+------------+
| version()  |
+------------+
| 5.6.37-log |
+------------+
1 row in set (0.00 sec)

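Run the following statement on both the master and the slave:

mysql> GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE, CREATE, DELETE, INSERT, UPDATE ON *.* TO 'USER'@'MASTER_HOST' IDENTIFIED BY 'PASSWORD';
Note: this creates the user. All of these privileges are required; without them the later commands fail. If you do not want to grant this many privileges, you have to do the work each privilege covers yourself beforehand, or specify it on the command line. For example, without the CREATE privilege you must specify the database and table yourself.

select	read the tables in every database; add the --explain option to see how the tool works
process	show processlist
super	set binlog_format='statement'
replication slave	show slave hosts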

First, test the case where the table has no index:

Master:
mysql> show master status;
+-----------------+----------+--------------+------------------+-------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-----------------+----------+--------------+------------------+-------------------+
| mybinlog.000001 |      120 | AAA          |                  |                   |
+-----------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> select * from helloa;
+------+------+
| id   | name |
+------+------+
|    1 | jaja |
|    2 | haja |
+------+------+
2 rows in set (0.00 sec)

mysql> show create table helloa \G;
*************************** 1. row ***************************
       Table: helloa
Create Table: CREATE TABLE `helloa` (
  `id` int(11) DEFAULT NULL,
  `name` char(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Slave:
mysql> select * from helloa;
+------+------+
| id   | name |
+------+------+
|    1 | jaja |
|    2 | haja |
|    3 | wowo |
+------+------+
3 rows in set (0.00 sec)

mysql> show create table helloa \G
*************************** 1. row ***************************
       Table: helloa
Create Table: CREATE TABLE `helloa` (
  `id` int(11) DEFAULT NULL,
  `name` char(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Set up replication:

mysql> change master to
    -> master_host='192.168.18.50',
    -> master_user='repl',
    -> master_password='repl4slave',
    -> master_port=3307,
    -> master_log_file='mybinlog.000001',
    -> master_log_pos=120;
Query OK, 0 rows affected, 2 warnings (0.11 sec)

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.12 sec)

mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.18.50
                  Master_User: repl
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: mybinlog.000001
          Read_Master_Log_Pos: 120
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 282
        Relay_Master_Log_File: mybinlog.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: AAA.%
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 455
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 503307
                  Master_UUID: 12aaa305-9750-11e7-b2e0-080027eb4c97
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

After inserting a few rows, compare the data on the master and the slave:

Master:
mysql> select * from helloa;
+------+------+
| id   | name |
+------+------+
|    1 | jaja |
|    2 | haja |
|    3 | name |
+------+------+
3 rows in set (0.00 sec)
Slave:
mysql> select * from helloa;
+------+------+
| id   | name |
+------+------+
|    1 | yuyu |
|    2 | haja |
|    3 | wowo |
|    3 | name |
+------+------+
4 rows in set (0.00 sec)
Run pt-table-checksum:
[root@test ~]# pt-table-checksum --no-check-binlog-format --no-check-replication-filters --recursion-method='processlist' --create-replicate-table --replicate=AAA.checksum --databases=AAA -h 192.168.18.50 -P 3307 -u checksum -p 123456
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-18T03:50:29      0      1        3       1       0   0.919 AAA.helloa
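
The DIFFS column in the report above is what flags an inconsistent table. For scripted checks, a minimal sketch like the following can pull out the affected table names. It assumes the eight-column report layout shown above (newer releases may print extra columns), and `diff_tables` is a hypothetical helper name, not part of Percona Toolkit.

```shell
# diff_tables: read a pt-table-checksum report on stdin and print the
# names of tables whose DIFFS column (field 3) is greater than zero.
# Assumption: the report has exactly these 8 columns:
#   TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
diff_tables() {
  awk '$1 != "TS" && NF == 8 && $3 + 0 > 0 { print $8 }'
}

# Usage (connection options as in the command above):
#   pt-table-checksum ... --databases=AAA -h 192.168.18.50 -P 3307 | diff_tables
```

Filtering on the field count skips blank lines and any messages that do not look like report rows.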

Running pt-table-sync fails with an error:
[root@test ~]# pt-table-sync --replicate=AAA.checksum --recursion-method='processlist' --database=AAA --port=3307 h=192.168.18.50,u=checksum,p=123456 --print
Can't make changes on the master because no unique index exists at /usr/bin/pt-table-sync line 10663.  while doing AAA.helloa on 192.168.18.60

Add a primary key to helloa

First delete the rows with duplicate id values from helloa on the slave, then add a primary key to helloa on the master:
mysql> alter table helloa add primary key (id);
Query OK, 3 rows affected (0.24 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> show index from helloa\G
*************************** 1. row ***************************
        Table: helloa
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 2
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
1 row in set (0.00 sec)
Slave:
mysql> show index from helloa \G
*************************** 1. row ***************************
        Table: helloa
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 2
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
1 row in set (0.00 sec)

Run the checksum and the sync again (it is best to back up the table before syncing)

[root@test ~]# pt-table-checksum --no-check-binlog-format --no-check-replication-filters --recursion-method='processlist' --create-replicate-table --replicate=AAA.checksum --databases=AAA -h 192.168.18.50 -P 3307 -u checksum -p 123456
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-18T05:42:19      0      1        3       1       0   0.438 AAA.helloa
[root@test ~]# pt-table-sync --replicate=AAA.checksum --recursion-method='processlist' --database=AAA --port=3307 h=192.168.18.50,u=checksum,p=123456 --print
REPLACE INTO `AAA`.`helloa`(`id`, `name`) VALUES ('1', 'jaja') /*percona-toolkit src_db:AAA src_tbl:helloa src_dsn:P=3307,h=192.168.18.50,p=...,u=checksum dst_db:AAA dst_tbl:helloa dst_dsn:P=3307,h=192.168.18.60,p=...,u=checksum lock:1 transaction:1 changing_src:AAA.checksum replicate:AAA.checksum bidirectional:0 pid:9609 user:root host:test*/;
[root@test ~]# pt-table-sync --replicate=AAA.checksum --recursion-method='processlist' --database=AAA --port=3307 h=192.168.18.50,u=checksum,p=123456 --exec

Check the helloa table on the master and the slave: the data is now consistent.

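DSN configuration for non-default ports

If the master uses a non-default port, --recursion-method defaults to hosts. In that case, if the slave does not set report_host (note that this parameter has drawbacks of its own), pt-table-checksum cannot discover the slave automatically.
If the master uses the default port, --recursion-method defaults to processlist. In that case pt-table-checksum can only connect to slaves listening on port 3306 and cannot reach slaves on a non-default port.
So if either the master or a slave uses a non-default port, it is best to specify the slaves through a DSN table.

Create the DSN table on the master and insert the slave information: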
CREATE TABLE percona.`dsns` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`dsn` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
insert into percona.dsns select 1,1,'h=172.172.178.76,u=checksum,p=root,P=3306';
insert into percona.dsns select 2,2,'h=172.172.178.77,u=checksum,p=root,P=3307';
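
With more than a couple of slaves, the DSN rows can be generated instead of typed by hand. The sketch below is only an illustration: `make_dsn` and `dsn_insert_sql` are hypothetical helper names, and the values are not escaped, so it assumes hosts, users, and passwords contain no quotes or commas.

```shell
# make_dsn HOST PORT USER PASSWORD
# Print a Percona Toolkit DSN string in the form used in the dsns table.
make_dsn() {
  printf 'h=%s,u=%s,p=%s,P=%s' "$1" "$3" "$4" "$2"
}

# dsn_insert_sql TABLE HOST PORT USER PASSWORD
# Print an INSERT statement for the DSN table. Only the dsn column is
# set: id is AUTO_INCREMENT and parent_id is nullable in the table above.
dsn_insert_sql() {
  dsn_table=$1
  shift
  printf "INSERT INTO %s (dsn) VALUES ('%s');\n" "$dsn_table" "$(make_dsn "$@")"
}

# Usage: review the statement, then pipe it to the mysql client on the master.
#   dsn_insert_sql percona.dsns 172.172.178.77 3307 checksum root
```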
 
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=h=127.0.0.1,D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --quiet
 
           TS ERRORS  DIFFS     ROWS CHUNKS SKIPPED    TIME TABLE
11-18T10:47:25      0     1        3       1      0   0.340 test1.test_concat
## The result above only shows that some table differs between master and slave; it does not tell which slave differs from the master. After pt-table-checksum finishes, run it again with --replicate-check-only to display the details, for example:

pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --replicate-check-only

Differences on shao76
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test1.test_concat 1 0 1
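
The same per-slave information can be read straight from the checksum table on each slave. The sketch below only prints the query; `diff_chunks_sql` is a hypothetical helper name, and the column names are assumed to match the standard checksums table layout used by --replicate.

```shell
# diff_chunks_sql [TABLE]
# Print a query that lists the chunks whose row count or checksum on this
# server differs from the master's values. TABLE defaults to percona.checksums.
diff_chunks_sql() {
  checksum_table=${1:-percona.checksums}
  cat <<EOF
SELECT db, tbl, chunk, this_cnt, master_cnt, this_crc, master_crc
FROM $checksum_table
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc);
EOF
}

# Usage: run on each slave (host, port, and user here are placeholders).
#   diff_chunks_sql percona.checksums | mysql -h 172.172.178.77 -P 3307 -u checksum -p
```

Running the query on every slave shows which one holds the differing chunk, at the cost of one connection per slave.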
 
1. Check only the specified databases or tables
1) Check all tables in the specified databases
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --databases=test1,test5 --quiet

2) Check only the specified tables
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --databases=test1 --tables=test_concat --quiet
Or use --tables=database.table
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --tables=test1.test_concat --quiet
Or use --tables-regex to match tables by regular expression
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --databases=test1 --tables-regex=test_*

2. Ignore the specified databases or tables during the check
--ignore-databases, --ignore-databases-regex, --ignore-tables, --ignore-tables-regex
1) Ignore the specified databases
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --ignore-databases=test5,mysql --quiet
## The percona database is ignored by default

2) Ignore the specified tables
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --ignore-tables-regex=test_*
## Note: this ignores the test_* tables in every database. There is no way to ignore the test_* tables in one database only while still checking the test_* tables in the others

3. Check only certain columns of the specified table
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --databases=test1 --tables=test_concat --columns=id,name --quiet

4. Ignore certain columns during the check
pt-table-checksum --nocheck-binlog-format --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --set-vars innodb_lock_wait_timeout=120 -uroot -proot -h127.0.0.1 -P3306 --databases=test1 --ignore-columns=id --quiet

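Parameters:

1. Parameters for connecting to the master and slaves:
--host      --socket      --user    --password    --pid     --port
2. Parameters that determine what is compared
(1) Databases
--databases   /   --ignore-databases                       databases to compare   /   databases to skip
--databases-regex   /   --ignore-databases-regex           same as above, but matched by regular expression
(2) Tables
--tables   /   --ignore-tables                             tables to compare   /   tables to skip
--tables-regex   /   --ignore-tables-regex                 same as above, but matched by regular expression
(3) Columns
--columns   /   --ignore-columns                           columns to compare   /   columns to skip
(4) Row range within a table
--where                                                    restricts the comparison to the rows matching this WHERE clause
(5) Tables by storage engine
 --engines   /   --ignore-engines                          compare only tables using these engines   /   skip tables using these engines
3. Parameters that control behavior after an interruption
--resume     if the consistency check is interrupted, this option resumes it from the point where it stopped, in the last table that was being checked
--retries    for non-fatal errors during the check, such as the query being killed or slave lag, the tool retries automatically this many times
4. Parameters that deserve special attention
(1)  --[no]check-binlog-format
      By default the tool checks binlog_format and exits with an error if it is not STATEMENT; pass --no-check-binlog-format to skip the check
(2)  --recursion-method
    There are four values: processlist/hosts/dsn=DSN/no. The default is processlist,hosts, but it is better to set it explicitly. --recursion-method=processlist is recommended, and no is rarely used
    Before using the dsn=DSN method, create a table in the database, for example a dsns table in the percona database

    The CREATE TABLE statement is:
    CREATE TABLE `dsns` (`id` int(11) NOT NULL AUTO_INCREMENT,`parent_id` int(11) DEFAULT NULL,`dsn` varchar(255) NOT NULL,PRIMARY KEY (`id`));
    Then insert the connection information for each slave, for example: INSERT INTO dsns (dsn) VALUES ('h=slave_host,u=repl_user,p=repl_password,P=port');
    After that the DSN method can be used: --recursion-method dsn=D=percona,t=dsns
(3)  --replicate
    Names the table that stores the results. The default is percona.checksums; the tool creates the percona database and the checksums table automatically and writes the checksum results there. If you name your own table with this option, it must have this structure: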


CREATE TABLE checksums (
   db             char(64)     NOT NULL,
   tbl            char(64)     NOT NULL,
   chunk          int          NOT NULL,
   chunk_time     float            NULL,
   chunk_index    varchar(200)     NULL,
   lower_boundary text             NULL,
   upper_boundary text             NULL,
   this_crc       char(40)     NOT NULL,
   this_cnt       int          NOT NULL,
   master_crc     char(40)         NULL,
   master_cnt     int              NULL,
   ts             timestamp    NOT NULL,
   PRIMARY KEY (db, tbl, chunk),
   INDEX ts_db_tbl (ts, db, tbl)
) ENGINE=InnoDB; 

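Pay attention to the storage engine: if the tables being checked are InnoDB, make the checksums table InnoDB as well. If the checked tables and the checksums table use different engines, for example MyISAM and InnoDB, replication breaks with the error "different error on master and slave".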
 
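5. Other parameters in detail:
(1)  --[no]check-replication-filters
     By default, if the tool detects replication filters (the ..ignore.. options) that exclude tables from replication, it aborts and exits; pass --no-check-replication-filters to skip this check
(2)  --chunk-index (type: string)
     By default the tool picks the most suitable index for chunking and uses EXPLAIN to determine the chunk size. To use a different index, name it with this option; the tool adds it to its queries as FORCE INDEX
(3)  --chunk-index-columns (type: int)
     Uses only the first N columns of a composite index for chunking
(4)  --chunk-size
     Sets the chunk size directly; the default is 1000 rows. Not recommended; use --chunk-time instead
(5)  --chunk-time
     The default is 0.5 seconds. The tool works out how many rows (the chunk) it can process within this time given how busy the system currently is, which makes it more flexible
(6) --[no]empty-replicate-table
     Enabled by default: before checking each table, the tool deletes that table's existing rows from the checksums table so that the new results can be inserted
(7) --float-precision (type: int)
     Sets the rounding precision for floating-point numbers, to avoid false differences when master and slave round floats differently, for example across versions. If you specify a value of 2, for example, then the values 1.008 and 1.009 will be rounded to 1.01, and will checksum as equal
(8) --function
     The function used to compute checksums. The default is CRC32; others include FNV1A_64, MURMUR_HASH, SHA1, and MD5
(9)  --max-lag
    The default is 1s. This is the maximum replication lag: when a slave lags by more than this, the tool pauses and waits for it to catch up. The lag is read from Seconds_Behind_Master
(10) --progress
    Prints progress information to STDERR during the run, such as waits caused by slave lag and how long they last. It takes two comma-separated values, "time,30" by default: the first is percentage, time, or iterations; the second is the percentage, the number of seconds, or the number of iterations between reports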
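
To make the --chunk-time behavior described in item (5) concrete, here is a simplified sketch of the arithmetic: measure how fast the last chunk was processed and size the next one so it should take the target time. This is only an illustration. The real tool smooths the rate with a weighted average over several chunks, and `next_chunk_size` is a hypothetical name.

```shell
# next_chunk_size ROWS SECONDS [TARGET_SECONDS]
# Given that the last chunk of ROWS rows took SECONDS to checksum,
# print a chunk size expected to take TARGET_SECONDS (default 0.5,
# the --chunk-time default). Simplification: uses only the last chunk.
next_chunk_size() {
  awk -v rows="$1" -v secs="$2" -v target="${3:-0.5}" \
    'BEGIN { printf "%d\n", rows / secs * target }'
}

# Usage:
#   next_chunk_size 1000 0.25   # fast server: the next chunk grows
#   next_chunk_size 1000 2      # busy server: the next chunk shrinks
```

This is why --chunk-time adapts to load while a fixed --chunk-size does not: the same target time yields larger chunks on an idle server and smaller ones on a busy server.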

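Common problems:
1. Diffs cannot be detected because no slaves were found
The slaves could not be found automatically; check that the processlist, hosts, or dsn method is set up correctly.
2. Cannot connect to h=slave1.*.com,p=...,u=percona_user
Prefix the pt-table-checksum command with PTDEBUG=1 to see the detailed execution and spot errors in the port, user name, or privileges.
3. Waiting for the --replicate table to replicate to XXX
The percona.checksums table does not exist on the slave because it has not replicated from the master yet, so check whether the slave is lagging badly.
4. Pausing because Threads_running=25
Messages like this one are printed repeatedly. The number of running threads exceeds the default limit of 25, so pt-table-checksum pauses the check to reduce load on the server. Wait for the load to drop, or stop the tool with Ctrl+C and continue later with --resume, or raise the limit with --max-load.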