Pt-table-checksum原理浅析

本文永久地址:
http://fuxkdb.com/2017/11/24/Pt-table-checksum%E5%8E%9F%E7%90%86%E6%B5%85%E6%9E%90/

Pt-table-checksum原理浅析

主库建一个表

node1> create table fan(id int) engine=innodb;
Query OK, 0 rows affected (0.12 sec)

node1> insert into fan values(1);
Query OK, 1 row affected (0.07 sec)

从库制造不一致

node2> select * from fan;
+------+
| id  |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

node2> update fan set id=2;

两边打开general log,然后

[root@node1 ~]# pt-table-checksum --nocheck-replication-filters --no-check-binlog-format h=172.16.83.17,u=root,p=mysql,P=3307 --replicate=percona.checksums --recursion-method=dsn=h=172.16.83.21,D=percona,t=dsns --databases=sysbench --tables=fan
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
11-06T05:12:58      0      1        1      1      0  0.344 sysbench.fan

先看两边的checksums表

node1> select * from  `percona`.`checksums` where tbl='fan';
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db      | tbl | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts                  |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| sysbench | fan |    1 |  0.019798 | NULL        | NULL          | NULL          | 42981178 |        1 | 42981178  |          1 | 2017-11-06 05:12:58 |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+

node2> select * from  `percona`.`checksums` where tbl='fan';
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db      | tbl | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts                  |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| sysbench | fan |    1 |  0.019798 | NULL        | NULL          | NULL          | 40deaf21 |        1 | 42981178  |          1 | 2017-11-06 05:12:58 |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
1 row in set (0.04 sec)


node1的checksums表的this_crc,this_cnt和master_crc,master_cnt肯定是一样的
因为.这边是replace into select 先计算this_crc,this_cnt .

REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sysbench', 'fan', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, CONCAT(ISNULL(`id`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sysbench`.`fan` /*checksum table*/
SHOW WARNINGS

然后再获取this_crc, this_cnt的值

SELECT this_crc, this_cnt FROM `percona`.`checksums` WHERE db = 'sysbench' AND tbl = 'fan' AND chunk = '1'

最后把master_crc,master_cnt更新成这个值

UPDATE `percona`.`checksums` SET chunk_time = '0.019798', master_crc = '42981178', master_cnt = '1' WHERE db = 'sysbench' AND tbl = 'fan' AND chunk = '1'

而到了从库这里.由于主库设置了binlog_format=statement 所以replace into select复制过去到从库还是语句,而不是值.此时从库执行这个语句,在自己的checksums表中根据自己的实际值算出了this_crc,this_cnt

REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sysbench', 'fan', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, CONCAT(ISNULL(`id`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sysbench`.`fan` /*checksum table*/
COMMIT /* implicit, from Xid_log_event */

然后拿主库传过来的主库的this_crc, this_cnt的值更新自己的master_crc,master_cnt

BEGIN
UPDATE `percona`.`checksums` SET chunk_time = '0.019798', master_crc = '42981178', master_cnt = '1' WHERE db = 'sysbench' AND tbl = 'fan' AND chunk = '1'
COMMIT /* implicit, from Xid_log_event */

这样,从库的checksums表中就存储了自己和主库的hash值. 最后查一把,把结果返回输出到屏幕

SELECT MAX(chunk) FROM `percona`.`checksums` WHERE db='sysbench' AND tbl='fan' AND master_crc IS NOT NULL
SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `percona`.`checksums` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc))  AND (db='sysbench' AND tbl='fan')

这个结果为手动查询的. pt-table-checksum根据这个结果自己整理好格式输出到屏幕
+--------------+-------+-------------+----------------+----------------+----------+----------+----------+------------+----------+------------+
| table        | chunk | chunk_index | lower_boundary | upper_boundary | cnt_diff | crc_diff | this_cnt | master_cnt | this_crc | master_crc |
+--------------+-------+-------------+----------------+----------------+----------+----------+----------+------------+----------+------------+
| sysbench.fan |    1 | NULL        | NULL          | NULL          |        0 |        1 |        1 |          1 | 40deaf21 | 42981178  |
+--------------+-------+-------------+----------------+----------------+----------+----------+----------+------------+----------+------------+

关于两边查询一致性的保证

pt-table-checksum 会设置

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ

这样避免幻读,通过gap lock阻塞其他insert.保证肯定是replace into select先commit;这样肯定是replace into select先复制到从库(binlog是谁先提交记录谁)

node1> select * from fan;
+----+------+
| id | name |
+----+------+
|  1 | fan  |
|  5 | fan  |
|  7 | fan  |
|  9 | fan  |
| 10 | fan  |
| 11 | fan  |
| 12 | fan  |
| 13 | fan  |
| 14 | fan  |
| 15 | fan  |
+----+------+
10 rows in set (0.00 sec)

node1> select * from fan2;
+----+------+
| id | name |
+----+------+
|  1 | duzi |
|  5 | duzi |
|  7 | duzi |
|  9 | duzi |
| 10 | duzi |
| 11 | duzi |
| 12 | duzi |
| 13 | duzi |
| 14 | duzi |
| 15 | duzi |
+----+------+
10 rows in set (0.00 sec)

node1> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Query OK, 0 rows affected (0.00 sec)

node1> begin;
Query OK, 0 rows affected (0.00 sec)

node1> replace into fan2 select * from fan where id<7;
Query OK, 4 rows affected, 1 warning (0.01 sec)
Records: 2  Duplicates: 2  Warnings: 1

node2被锁

node2> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Query OK, 0 rows affected (0.00 sec)

node2> begin;
Query OK, 0 rows affected (0.00 sec)

node2> insert into fan values(4,'ceshi');
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

show engine innodb status

------------
TRANSACTIONS
------------
Trx id counter 163F4
Purge done for trx's n:o < 163F0 undo n:o < 0
History list length 662
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 46, OS thread handle 0x7f42f81bf700, query id 62676 localhost root
show engine innodb status
---TRANSACTION 163F3, ACTIVE 5 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 45, OS thread handle 0x7f42b44cd700, query id 62675 localhost root update
insert into fan values(4,'ceshi')
------- TRX HAS BEEN WAITING 5 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 49 page no 3 n bits 80 index `PRIMARY` of table `sysbench`.`fan` trx id 163F3 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000005; asc    ;;
1: len 6; hex 0000000163de; asc    c ;;
2: len 7; hex ac00000cc70110; asc        ;;
3: len 3; hex 66616e; asc fan;;
mysql> select * from INNODB_TRX  ;
+--------+-----------+---------------------+-----------------------+---------------------+------------+---------------------+-----------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+
| trx_id | trx_state | trx_started        | trx_requested_lock_id | trx_wait_started    | trx_weight | trx_mysql_thread_id | trx_query                        | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_lock_structs | trx_lock_memory_bytes | trx_rows_locked | trx_rows_modified | trx_concurrency_tickets | trx_isolation_level | trx_unique_checks | trx_foreign_key_checks | trx_last_foreign_key_error | trx_adaptive_hash_latched | trx_adaptive_hash_timeout |
+--------+-----------+---------------------+-----------------------+---------------------+------------+---------------------+-----------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+
| 163F3  | LOCK WAIT | 2017-11-06 05:56:40 | 163F3:49:3:3          | 2017-11-06 05:56:40 |          2 |                  45 | insert into fan values(4,'ceshi') | inserting           |                1 |                1 |                2 |                  376 |              1 |                0 |                      0 | REPEATABLE READ    |                1 |                      1 | NULL                      |                        0 |                    10000 |
| 163F2  | RUNNING  | 2017-11-06 05:56:29 | NULL                  | NULL                |          6 |                  47 | NULL                              | NULL                |                0 |                0 |                4 |                  1248 |              5 |                2 |                      0 | REPEATABLE READ    |                1 |                      1 | NULL                      |                        0 |                    10000 |
+--------+-----------+---------------------+-----------------------+---------------------+------------+---------------------+-----------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值