gh-ost Study Notes
![gh-ost logo](https://github.com/WQMysqlDBA/gh-ost/blob/master/doc/images/gh-ost-logo-light-160.png?raw=true)
"开始吧"
gh-ost github开源的一个online ddl工具
A colleague of mine, not especially well known in the industry but quite well known in my WeChat circle, once gave a classic rule of thumb for when to use gh-ost:
"Under a million rows, plain online DDL will do;
over a million, use gh-ost."
(...that's all of a few words, and he still managed to slip in a typo.)
"了解一下mysql的ddl的几种方式"
1. MySQL 5.6 InnoDB native online DDL:
Advantages:
Adding or dropping columns and indexes, adding a primary key, and renaming a column do not lock the table.
Limitations:
Changing a column's data type, dropping the primary key, changing the character set, and adding a FULLTEXT index still lock the table.
An online DDL replicates as one big transaction (replica lag can get severe), and cancelling it triggers a rollback that hurts database performance. (A quick way to check whether a given ALTER is truly online is sketched below.)
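As a sanity check (a minimal sketch, assuming a sysbench-style sbtest1 table in a test schema), you can request an in-place, lock-free ALTER explicitly; if the operation cannot be done online, MySQL refuses with an error instead of silently locking the table:

mysql test -e "ALTER TABLE sbtest1 ADD INDEX idx_k (k), ALGORITHM=INPLACE, LOCK=NONE;"   # secondary index: runs online
mysql test -e "ALTER TABLE sbtest1 MODIFY c BIGINT, ALGORITHM=INPLACE, LOCK=NONE;"       # type change: fails fast, needs a table copy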
2. pt-online-schema-change:
1) The table must not already have triggers, otherwise it errors out.
2) Its triggers compete for locks and can noticeably hurt MySQL performance, in bad cases dragging the whole database down.
3) The rename step can go wrong in various ways: blocking business traffic on the original table for too long, losing data, or the switch failing outright.
4) Copying a large table can push hot data out of memory.
5) It cannot pause: when load on the primary climbs, you may want to pause an unfinished schema change, but a trigger-based tool cannot do that; it can only cancel, drop the temporary table, and drop the triggers.
6) As data volume and business pressure grow, you run into more and more problems, and some operations can only be risked during off-peak hours. (A comparison invocation is sketched below.)
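For contrast, a typical trigger-based pt-online-schema-change run looks roughly like this (a sketch; the table and thresholds are placeholders, and --dry-run only creates and alters the shadow table):

pt-online-schema-change \
  --alter "ADD KEY idx_k (k)" \
  --max-load Threads_running=25 \
  --critical-load Threads_running=200 \
  --chunk-size 1000 \
  D=test,t=sbtest1 \
  --dry-run        # switch to --execute for the real run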
3. gh-ost's characteristics:
1) Lightweight and triggerless: the original table is only read during the row copy, so the impact is small.
2) Testable: you can run the whole migration against a replica first.
3) Pausable: throttle whenever the workload gets busy (just mind your binlog retention window).
4) Poses as a replica and streams the changed rows from the binlog, applying them asynchronously to the ghost table, exactly like ordinary replication.
Benefits
gh-ost takes on work that other tools leave to the database itself: no triggers.
gh-ost has much finer control over the migration process;
it can truly pause;
it can truly decouple the migration's write load from the primary's own workload.
On top of that it offers many operational conveniences that make it safer, more trustworthy, and easier to use.
(Asynchronous all the way: as long as the binlog is complete, you can stop and resume applying it as you please.)
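Since gh-ost connects like a replica, the server needs row-based binlogs and the tool user needs replication privileges. A setup sketch (user, host, and schema are placeholders; check the gh-ost docs for the authoritative grant list):

mysql -e "SET GLOBAL binlog_format = 'ROW';"   # ROW format is required; STATEMENT is rejected
mysql -e "CREATE USER 'gh-ost'@'%' IDENTIFIED BY 'changeme';"
mysql -e "GRANT ALTER, CREATE, DELETE, DROP, INDEX, INSERT, LOCK TABLES, SELECT, TRIGGER, UPDATE ON test.* TO 'gh-ost'@'%';"
mysql -e "GRANT SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'gh-ost'@'%';"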
gh-ost workflow
![gh-ost general flow](https://github.com/WQMysqlDBA/gh-ost/blob/master/doc/images/gh-ost-general-flow.png?raw=true)
![gh-ost operation modes](https://github.com/WQMysqlDBA/gh-ost/blob/master/doc/images/gh-ost-operation-modes.png?raw=true)
(Dark shading marks the Primary node, light shading the Secondary nodes.)
gh-ost has three operating modes:
1. Connect to a replica, migrate on the primary (b in the diagram)
2. Connect to the primary, migrate on the primary (a)
3. Test and migrate on a replica (c)
-
1. Connect to a replica, migrate on the primary (default)
This is gh-ost's default mode.
1. gh-ost inspects the replica, discovers the primary of the topology, connects to it, and proceeds with the migration.
2. On the primary it creates _xxx_gho (same structure as the original table) and _xxx_ghc (the changelog table), then alters _xxx_gho. For example:
   _sbtest1_ghc  changelog table, i.e. the journal of binlog application
   _sbtest1_gho  ghost table
   sbtest1       original table
3. Business traffic continues as usual: rows are copied on the primary, while the binlog is read from the replica and the changes are applied to the ghost table on the primary.
4. Table layout, columns & indexes, row counts, etc. are collected on the replica.
5. Internal changelog events (such as heartbeats) are read from the replica.
6. The tables are swapped on the primary.

./gh-ost --max-load=Threads_running=20 \
  --critical-load=Threads_running=200 \
  --critical-load-interval-millis=5000 \
  --chunk-size=10000 \
  --user="root" \
  --password="xxxx" \
  --host='xxxxxxxxxxxx' --port=3306 \
  --database="test" --table="sbtest1" \
  --verbose --debug --stack \
  --alter="add key idx_datetime(datetime)" \
  --initially-drop-ghost-table \
  --initially-drop-old-table \
  --execute 2>&1 | tee rebuild_t1.log
-
2. Connect to the primary, migrate on the primary
Requires the --allow-on-master option:
- create _xxx_gho and _xxx_ghc on the primary, and alter _xxx_gho's structure;
- read rows from the source table on the primary and write them into _xxx_gho;
- read binlog events from the primary and apply the changes to _xxx_gho on the primary;
- swap the tables on the primary.
If you have no replica, or don't want to involve one, you can operate directly on the primary: gh-ost performs everything there. Keep a close eye on replication lag throughout. The primary's binlog must be in ROW format, and in this mode you must pass --allow-on-master.

./gh-ost --max-load=Threads_running=20 \
  --critical-load=Threads_running=200 \
  --critical-load-interval-millis=5000 \
  --chunk-size=10000 \
  --user="root" --password="taotao" \
  --host=127.0.0.1 --port=3306 \
  --database="test" --table="sbtest1" \
  --verbose --debug --stack --allow-on-master \
  --alter="add key idx_createtime(createtime)" \
  --initially-drop-ghost-table --initially-drop-old-table \
  --execute 2>&1 | tee rebuild_t1.log
-
3. Test and migrate on a replica
This mode makes all its changes on a replica. gh-ost still connects to the primary, but every operation happens on the replica, with no impact on the primary at all. Throughout the run, gh-ost throttles itself periodically so the replica can keep its data current.
--migrate-on-replica tells gh-ost to perform the migration directly on the replica; the final cut-over happens while replication is still running normally.
--test-on-replica means the migration is only a test. Before the final cut-over, replication is stopped; the original and ghost tables are swapped and then swapped back, so the original table ends up exactly as it was. Both tables are kept and replication stays stopped, so you can compare the two tables and verify data consistency at leisure. (A sample invocation is sketched below.)
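A sketch of a --test-on-replica run (host and credentials are placeholders): replication stops before the final swap, the tables are swapped and swapped back, and you are left free to diff sbtest1 against _sbtest1_gho:

./gh-ost \
  --host=replica.example.com --port=3306 \
  --user="root" --password="xxxx" \
  --database="test" --table="sbtest1" \
  --alter="add key idx_k(k)" \
  --test-on-replica \
  --verbose \
  --execute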
gh-ost's restrictions
gh-ost errors out if any of these are violated, which makes the tool fairly safe:
1. During the copy it intermittently share-locks batches of rows (batch size set by --chunk-size), because the row copy uses INSERT ... SELECT ... WHERE pk >= ? AND pk <= ? LOCK IN SHARE MODE (illustrated below).
2. Master-master setup is only supported in active-passive setup. Active-active (where the table is being written to on both masters concurrently) is unsupported; it may be supported in the future. Presumably this is because gh-ost takes explicit locks.
3. The two before & after tables must share a PRIMARY KEY or other UNIQUE KEY.
4. Foreign keys are not supported.
5. Triggers are not supported.
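The share-locked chunk copy in point 1 has roughly this shape (an illustration of the pattern on a sysbench-style table, not the exact SQL gh-ost generates):

mysql test -e "
  INSERT /* gh-ost */ IGNORE INTO _sbtest1_gho (id, k, c, pad)
  SELECT id, k, c, pad FROM sbtest1 FORCE INDEX (PRIMARY)
  WHERE id >= 1 AND id <= 10000
  LOCK IN SHARE MODE;"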
Throttling:
# pause
echo throttle | socat - /tmp/gh-ost.test.t1.sock
# resume
echo no-throttle | socat - /tmp/gh-ost.test.t1.sock
Change throttling parameters on the fly:
echo chunk-size=100 | socat - /tmp/gh-ost.t1.sock
echo max-lag-millis=200 | socat - /tmp/gh-ost.t1.sock
echo max-load=Threads_running=3 | socat - /tmp/gh-ost.t1.sock
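Beyond throttle/no-throttle, the socket accepts other runtime commands (a sketch; the socket path follows the gh-ost.<db>.<table>.sock naming seen above):

echo status | socat - /tmp/gh-ost.test.t1.sock        # progress, lag, current state
echo unpostpone | socat - /tmp/gh-ost.test.t1.sock    # let a postponed cut-over proceed
echo panic | socat - /tmp/gh-ost.test.t1.sock         # abort the migration immediately
touch /tmp/gh-ost.throttle.test.t1                    # throttling can also be driven by a flag file (--throttle-additional-flag-file)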
A sample run with --verbose --debug, adding a column to `qdump`.`sbtest2`:
2022-01-07 11:53:33 INFO starting gh-ost 1.1.2
2022-01-07 11:53:33 INFO Migrating `qdump`.`sbtest2`
2022-01-07 11:53:33 INFO inspector connection validated on 10.244.0.43:3306
2022-01-07 11:53:33 INFO User has ALL privileges
2022-01-07 11:53:33 INFO binary logs validated on 10.244.0.43:3306
2022-01-07 11:53:33 INFO Inspector initiated on gh-ost-f4d1c00-0:3306, version 5.7.26-log
2022-01-07 11:53:33 INFO Table found. Engine=InnoDB
2022-01-07 11:53:33 DEBUG Estimated number of rows via STATUS: 1000
2022-01-07 11:53:33 DEBUG Validated no foreign keys exist on table
2022-01-07 11:53:33 DEBUG Validated no triggers exist on table
2022-01-07 11:53:33 INFO Estimated number of rows via EXPLAIN: 1000
2022-01-07 11:53:33 DEBUG Potential unique keys in sbtest2: [PRIMARY (auto_increment): [id]; has nullable: false]
2022-01-07 11:53:33 INFO Recursively searching for replication master
2022-01-07 11:53:33 DEBUG Looking for master on 10.244.0.43:3306
2022-01-07 11:53:33 INFO Master found to be gh-ost-f4d1c00-0:3306
2022-01-07 11:53:33 INFO log_slave_updates validated on 10.244.0.43:3306
2022-01-07 11:53:33 INFO streamer connection validated on 10.244.0.43:3306
2022-01-07 11:53:33 DEBUG Streamer binlog coordinates: mysql-bin.000030:486171970
2022-01-07 11:53:33 INFO Connecting binlog streamer at mysql-bin.000030:486171970
[2022/01/07 11:53:33] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql 10.244.0.43 3306 gh-ost false false <nil> false UTC true 0 0s 0s 0 false}
[2022/01/07 11:53:33] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.000030, 486171970)
[2022/01/07 11:53:33] [info] binlogsyncer.go:203 register slave for master server 10.244.0.43:3306
2022-01-07 11:53:33 DEBUG Beginning streaming
2022-01-07 11:53:33 INFO rotate to next log from mysql-bin.000030:0 to mysql-bin.000030
[2022/01/07 11:53:33] [info] binlogsyncer.go:723 rotate to (mysql-bin.000030, 486171970)
2022-01-07 11:53:33 INFO applier connection validated on 10.244.0.43:3306
2022-01-07 11:53:33 INFO applier connection validated on 10.244.0.43:3306
2022-01-07 11:53:33 INFO will use time_zone='+08:00' on applier
2022-01-07 11:53:33 INFO Examining table structure on applier
2022-01-07 11:53:33 INFO Applier initiated on gh-ost-f4d1c00-0:3306, version 5.7.26-log
2022-01-07 11:53:33 INFO Dropping table `qdump`.`_sbtest2_gho`
2022-01-07 11:53:33 INFO Table dropped
2022-01-07 11:53:33 INFO Dropping table `qdump`.`_sbtest2_del`
2022-01-07 11:53:33 INFO Table dropped
2022-01-07 11:53:33 INFO Dropping table `qdump`.`_sbtest2_ghc`
2022-01-07 11:53:33 INFO Table dropped
2022-01-07 11:53:33 INFO Creating changelog table `qdump`.`_sbtest2_ghc`
2022-01-07 11:53:33 INFO Changelog table created
2022-01-07 11:53:33 INFO Creating ghost table `qdump`.`_sbtest2_gho`
2022-01-07 11:53:33 INFO Ghost table created
2022-01-07 11:53:33 INFO Altering ghost table `qdump`.`_sbtest2_gho`
2022-01-07 11:53:33 DEBUG ALTER statement: alter /* gh-ost */ table `qdump`.`_sbtest2_gho` ADD COLUMN o2 varchar(10)
2022-01-07 11:53:33 INFO Ghost table altered
2022-01-07 11:53:33 INFO Altering ghost table AUTO_INCREMENT value `qdump`.`_sbtest2_gho`
2022-01-07 11:53:33 DEBUG AUTO_INCREMENT ALTER statement: alter /* gh-ost */ table `qdump`.`_sbtest2_gho` AUTO_INCREMENT=1000001
2022-01-07 11:53:34 INFO Ghost table AUTO_INCREMENT altered
2022-01-07 11:53:34 INFO Intercepted changelog state GhostTableMigrated
2022-01-07 11:53:34 INFO Waiting for ghost table to be migrated. Current lag is 0s
2022-01-07 11:53:34 DEBUG ghost table migrated
2022-01-07 11:53:34 INFO Handled changelog state GhostTableMigrated
2022-01-07 11:53:34 DEBUG Potential unique keys in _sbtest2_gho: [PRIMARY (auto_increment): [id]; has nullable: false]
2022-01-07 11:53:34 INFO Chosen shared unique key is PRIMARY
2022-01-07 11:53:34 INFO Shared columns are id,k,c,pad
2022-01-07 11:53:34 INFO Listening on unix socket file: /tmp/gh-ost.qdump.sbtest1.sock
2022-01-07 11:53:34 DEBUG Reading migration range according to key: PRIMARY
2022-01-07 11:53:34 INFO Migration min values: [1]
2022-01-07 11:53:34 DEBUG Reading migration range according to key: PRIMARY
2022-01-07 11:53:34 INFO Migration max values: [1000]
2022-01-07 11:53:34 INFO Waiting for first throttle metrics to be collected
2022-01-07 11:53:34 INFO First throttle metrics collected
2022-01-07 11:53:34 DEBUG Operating until row copy is complete
2022-01-07 11:53:34 DEBUG Getting nothing in the write queue. Sleeping...
# Migrating `qdump`.`sbtest2`; Ghost table is `qdump`.`_sbtest2_gho`
# Migrating gh-ost-f4d1c00-0:3306; inspecting gh-ost-f4d1c00-0:3306; executing on 10-10-30-17
# Migration started at Fri Jan 07 11:53:33 +0800 2022
# chunk-size: 10000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle.qdump.sbtest1
# panic-flag-file: /tmp/gh-ost.panic.qdump.sbtest1.flag
# Serving on unix socket: /tmp/gh-ost.qdump.sbtest1.sock
Copy: 0/1000 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.000030:486175394; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: N/A
Copy: 0/1000 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.000030:486176939; Lag: 0.01s, HeartbeatLag: 0.06s, State: migrating; ETA: N/A
2022-01-07 11:53:35 DEBUG Issued INSERT on range: [1]..[1000]; iteration: 0; chunk-size: 10000
2022-01-07 11:53:35 DEBUG Iteration complete: no further range to iterate
2022-01-07 11:53:35 DEBUG Getting nothing in the write queue. Sleeping...
2022-01-07 11:53:35 INFO Row copy complete
Copy: 1000/1000 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.000030:486211234; Lag: 0.01s, HeartbeatLag: 0.08s, State: migrating; ETA: due
2022-01-07 11:53:35 DEBUG checking for cut-over postpone
2022-01-07 11:53:35 DEBUG checking for cut-over postpone: complete
2022-01-07 11:53:35 INFO Grabbing voluntary lock: gh-ost.77807.lock
2022-01-07 11:53:35 INFO Setting LOCK timeout as 2 seconds
2022-01-07 11:53:35 INFO Looking for magic cut-over table
2022-01-07 11:53:35 INFO Creating magic cut-over table `qdump`.`_sbtest2_del`
2022-01-07 11:53:35 INFO Magic cut-over table created
2022-01-07 11:53:35 INFO Locking `qdump`.`sbtest2`, `qdump`.`_sbtest2_del`
2022-01-07 11:53:35 INFO Tables locked
2022-01-07 11:53:35 INFO Session locking original & magic tables is 77807 ## locks the original table and the magic (sentry) table
2022-01-07 11:53:35 INFO Writing changelog state: AllEventsUpToLockProcessed:1641527615099148655
2022-01-07 11:53:35 INFO Intercepted changelog state AllEventsUpToLockProcessed
2022-01-07 11:53:35 INFO Handled changelog state AllEventsUpToLockProcessed
2022-01-07 11:53:35 INFO Waiting for events up to lock
Copy: 1000/1000 100.0%; Applied: 0; Backlog: 1/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.000030:486256890; Lag: 0.01s, HeartbeatLag: 0.06s, State: migrating; ETA: due
2022-01-07 11:53:36 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1641527615099148655
2022-01-07 11:53:36 INFO Done waiting for events up to lock; duration=988.095699ms
# Migrating `qdump`.`sbtest2`; Ghost table is `qdump`.`_sbtest2_gho`
# Migrating gh-ost-f4d1c00-0:3306; inspecting gh-ost-f4d1c00-0:3306; executing on 10-10-30-17
# Migration started at Fri Jan 07 11:53:33 +0800 2022
2022-01-07 11:53:36 DEBUG Getting nothing in the write queue. Sleeping...
# chunk-size: 10000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle.qdump.sbtest1
# panic-flag-file: /tmp/gh-ost.panic.qdump.sbtest1.flag
# Serving on unix socket: /tmp/gh-ost.qdump.sbtest1.sock
Copy: 1000/1000 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.000030:486257800; Lag: 0.01s, HeartbeatLag: 0.08s, State: migrating; ETA: due
2022-01-07 11:53:36 INFO Setting RENAME timeout as 1 seconds
2022-01-07 11:53:36 INFO Session renaming tables is 77812
2022-01-07 11:53:36 INFO Issuing and expecting this to block: rename /* gh-ost */ table `qdump`.`sbtest2` to `qdump`.`_sbtest2_del`, `qdump`.`_sbtest2_gho` to `qdump`.`sbtest2`
2022-01-07 11:53:36 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2022-01-07 11:53:36 INFO Checking session lock: gh-ost.77807.lock
2022-01-07 11:53:36 INFO Connection holding lock on original table still exists
2022-01-07 11:53:36 INFO Will now proceed to drop magic table and unlock tables
2022-01-07 11:53:36 INFO Dropping magic cut-over table
2022-01-07 11:53:36 INFO Releasing lock from `qdump`.`sbtest2`, `qdump`.`_sbtest2_del`
2022-01-07 11:53:36 INFO Tables unlocked
2022-01-07 11:53:36 INFO Tables renamed
2022-01-07 11:53:36 INFO Lock & rename duration: 1.022541989s. During this time, queries on `sbtest2` were blocked
[2022/01/07 11:53:36] [info] binlogsyncer.go:164 syncer is closing...
[2022/01/07 11:53:36] [error] binlogsyncer.go:631 connection was bad
[2022/01/07 11:53:36] [error] binlogstreamer.go:77 close sync with err: Sync was closed
2022-01-07 11:53:36 DEBUG Done streaming
2022-01-07 11:53:36 INFO Closed streamer connection. err=<nil>
2022-01-07 11:53:36 INFO Dropping table `qdump`.`_sbtest2_ghc`
[2022/01/07 11:53:36] [info] binlogsyncer.go:179 syncer is closed
2022-01-07 11:53:36 INFO Table dropped
2022-01-07 11:53:36 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2022-01-07 11:53:36 INFO -- drop table `qdump`.`_sbtest2_del`
2022-01-07 11:53:36 INFO Done migrating `qdump`.`sbtest2`
2022-01-07 11:53:36 INFO Removing socket file: /tmp/gh-ost.qdump.sbtest1.sock
2022-01-07 11:53:36 INFO Tearing down inspector
2022-01-07 11:53:36 INFO Tearing down applier
2022-01-07 11:53:36 DEBUG Tearing down...
2022-01-07 11:53:36 INFO Tearing down streamer
2022-01-07 11:53:36 INFO Tearing down throttler
2022-01-07 11:53:36 DEBUG Tearing down...
The MySQL general log from a second, similar run (this one adding column o1) shows the same cut-over mechanics from the server's point of view:
2022-01-07T11:55:33.764366+08:00 77850 Query select @@global.version
2022-01-07T11:55:33.764812+08:00 77850 Query select @@global.port
2022-01-07T11:55:33.765143+08:00 77847 Query select @@global.time_zone
2022-01-07T11:55:33.765459+08:00 77847 Query select @@global.hostname, @@global.port
2022-01-07T11:55:33.765754+08:00 77847 Query show columns from `qdump`.`sbtest2`
2022-01-07T11:55:33.766630+08:00 77847 Query drop /* gh-ost */ table if exists `qdump`.`_sbtest2_gho`
2022-01-07T11:55:33.768356+08:00 77847 Query show /* gh-ost */ table status from `qdump` like '_sbtest2_gho'
2022-01-07T11:55:33.769046+08:00 77847 Query drop /* gh-ost */ table if exists `qdump`.`_sbtest2_del`
2022-01-07T11:55:33.774425+08:00 77847 Query show /* gh-ost */ table status from `qdump` like '_sbtest2_del'
2022-01-07T11:55:33.775069+08:00 77847 Query drop /* gh-ost */ table if exists `qdump`.`_sbtest2_ghc`
2022-01-07T11:55:33.777390+08:00 77847 Query create /* gh-ost */ table `qdump`.`_sbtest2_ghc` (
id bigint auto_increment,
last_update timestamp not null DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
hint varchar(64) charset ascii not null,
value varchar(4096) charset ascii not null,
primary key(id),
unique key hint_uidx(hint)
) auto_increment=256
2022-01-07T11:55:33.785513+08:00 77847 Query create /* gh-ost */ table `qdump`.`_sbtest2_gho` like `qdump`.`sbtest2`
2022-01-07T11:55:33.793596+08:00 77847 Query alter /* gh-ost */ table `qdump`.`_sbtest2_gho` ADD COLUMN o1 varchar(10)
2022-01-07T11:55:33.806418+08:00 77847 Query alter /* gh-ost */ table `qdump`.`_sbtest2_gho` AUTO_INCREMENT=1000001
2022-01-07T11:55:33.810057+08:00 77847 Query insert /* gh-ost */ into `qdump`.`_sbtest2_ghc`
################### main cut-over flow begins ###################
2022-01-07T11:55:34.965562+08:00 77847 Query START TRANSACTION
2022-01-07T11:55:34.965982+08:00 77847 Query select connection_id()
2022-01-07T11:55:34.966593+08:00 77847 Query select get_lock('gh-ost.77847.lock', 0) ## grab the advisory lock
2022-01-07T11:55:34.967119+08:00 77847 Query set session lock_wait_timeout:=2 ## lock hold window = 2 x the rename timeout
2022-01-07T11:55:34.967650+08:00 77852 Query show /* gh-ost */ table status from `qdump` like '_sbtest2_del'
2022-01-07T11:55:34.968462+08:00 77852 Query create /* gh-ost */ table `qdump`.`_sbtest2_del` (
id int auto_increment primary key
) engine=InnoDB comment='ghost-cut-over-sentry'
2022-01-07T11:55:34.975440+08:00 77847 Query lock /* gh-ost */ tables `qdump`.`sbtest2` write, `qdump`.`_sbtest2_del` write ------ the lock guarantees the original table produces no more events; any DML against it now blocks, for at most 2 x the rename timeout (by design in the code)
(about 1 second passes)
2022-01-07T11:55:35.935563+08:00 77853 Connect gh-ost@10.244.0.32 on qdump using TCP/IP
2022-01-07T11:55:35.935869+08:00 77853 Query SET autocommit=true
2022-01-07T11:55:35.936087+08:00 77853 Query SET NAMES utf8mb4
2022-01-07T11:55:35.936404+08:00 77853 Query select id
2022-01-07T11:55:35.933865+08:00 77852 Query START TRANSACTION
2022-01-07T11:55:35.934154+08:00 77852 Query select connection_id()
2022-01-07T11:55:35.934490+08:00 77852 Query set session lock_wait_timeout:=1
2022-01-07T11:55:35.934848+08:00 77852 Query rename /* gh-ost */ table `qdump`.`sbtest2` to `qdump`.`_sbtest2_del`, `qdump`.`_sbtest2_gho` to `qdump`.`sbtest2` ------ an atomic operation, unaffected by the transaction; at this point it is blocked
2022-01-07T11:55:35.937625+08:00 77853 Query select is_used_lock('gh-ost.77847.lock')
## session 77853 acts as the go-between:
## it checks right away, and as soon as the rename shows up in the processlist, the lock is released
2022-01-07T11:55:35.938004+08:00 77847 Query drop /* gh-ost */ table if exists `qdump`.`_sbtest2_del`
2022-01-07T11:55:35.951323+08:00 77847 Query unlock tables
at this point the rename completes
2022-01-07T11:55:35.951627+08:00 77847 Query ROLLBACK --
2022-01-07T11:55:35.975045+08:00 77852 Query ROLLBACK --
2022-01-07T11:55:36.076006+08:00 77853 Query drop /* gh-ost */ table if exists `qdump`.`_sbtest2_ghc`
################### main cut-over flow ends ###################
cut-over
Three sessions cooperate:
session 1 creates the magic (sentry) table;
session 1 locks the original table and the magic table. Once the LOCK TABLES succeeds and the magic table exists, session 1's main job is done; the rename itself is carried out by the other sessions in concert.
session 2 confirms that session 1 still holds the lock; if so, it watches the processlist for the RENAME to appear.
A blocked RENAME always takes priority over blocked INSERT/UPDATE/DELETE statements, no matter who arrived first.
As soon as the RENAME appears, session 1's lock is released,
letting session 3's RENAME complete: original table --> _del, ghost table --> original table.
(Under the hood these are two goroutines; goroutine 2 calls AtomicCutoverRename, which opens session 3.)
Success --> everyone is happy.
Failure --> the original table is completely unaffected; just try again.
Question 1: After session 1 locks the table, DML arrives and queues for the lock before the RENAME issues its own, later lock request. Who gets the lock first?
The RENAME always gets the lock before the waiting DML; this follows from how MySQL grants locks. A sketch of the experiment follows.
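A minimal bash sketch of Question 1 (assumes a test.t1 table with columns id and k; timings are illustrative). Session 1 holds the table lock, session 2's UPDATE queues first, session 3's RENAME queues later, yet when session 1 unlocks, the RENAME is granted first:

mysql test -e "LOCK TABLES t1 WRITE; SELECT SLEEP(10); UNLOCK TABLES;" &   # session 1: holds the lock
sleep 1
mysql test -e "UPDATE t1 SET k = k + 1 WHERE id = 1;" &                    # session 2: DML, queues first
sleep 1
mysql test -e "RENAME TABLE t1 TO _t1_del;" &                              # session 3: RENAME, queues last
wait
# The RENAME completes first; the UPDATE then fails because t1 no longer exists.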
Code:
Migrate executes the complete migration logic; this is the main gh-ost function.
migrator := logic.NewMigrator(migrationContext)
migrator.Migrate()
cutover
Two cut-over methods are provided: atomic | two-step
if this.migrationContext.CutOverType == base.CutOverAtomic {
// Atomic solution: we use low timeout and multiple attempts. But for
// each failed attempt, we throttle until replication lag is back to normal
err := this.atomicCutOver()
this.handleCutOverResult(err)
return err
}
if this.migrationContext.CutOverType == base.CutOverTwoStep {
err := this.cutOverTwoStep()
this.handleCutOverResult(err)
return err
}
return this.migrationContext.Log.Fatalf("Unknown cut-over type: %d; should never get here!", this.migrationContext.CutOverType)
atomic
// atomicCutOver
func (this *Migrator) atomicCutOver() (err error) {
atomic.StoreInt64(&this.migrationContext.InCutOverCriticalSectionFlag, 1)
defer atomic.StoreInt64(&this.migrationContext.InCutOverCriticalSectionFlag, 0)
okToUnlockTable := make(chan bool, 4)
var dropCutOverSentryTableOnce sync.Once
// sync.Once runs a function exactly once, similar in purpose to an init function,
// except that init runs once when the package is first loaded,
// while sync.Once runs on demand at runtime, and still only once.
defer func() {
okToUnlockTable <- true
dropCutOverSentryTableOnce.Do(func() {
this.applier.DropAtomicCutOverSentryTableIfExists()
})
}()
atomic.StoreInt64(&this.migrationContext.AllEventsUpToLockProcessedInjectedFlag, 0)
lockOriginalSessionIdChan := make(chan int64, 2)
tableLocked := make(chan error, 2)
tableUnlocked := make(chan error, 2)
// Start a goroutine; call it session 1. It is responsible for:
// set session lock_wait_timeout = migrationContext.CutOverLockTimeoutSeconds * 2
// creating the magic table via CreateAtomicCutOverSentryTable()
// lock /* gh-ost */ tables <original> write, <magic table> write (is this a single atomic statement?)
// After that its main work is done; it waits for the okToUnlockTable signal telling it to drop the magic table and release the lock:
// <-okToUnlockTable (okToUnlockTable as a <-chan bool, a receive-only channel)
go func() {
if err := this.applier.AtomicCutOverMagicLock(lockOriginalSessionIdChan, tableLocked, okToUnlockTable, tableUnlocked, &dropCutOverSentryTableOnce); err != nil {
this.migrationContext.Log.Errore(err)
}
}()
if err := <-tableLocked; err != nil {
return this.migrationContext.Log.Errore(err)
}
lockOriginalSessionId := <-lockOriginalSessionIdChan
this.migrationContext.Log.Infof("Session locking original & magic tables is %+v", lockOriginalSessionId)
// At this point we know the original table is locked.
// We know any newly incoming DML on original table is blocked.
if err := this.waitForEventsUpToLock(); err != nil {
return this.migrationContext.Log.Errore(err)
}
// Step 2
// We now attempt an atomic RENAME on original & ghost tables, and expect it to block.
this.migrationContext.RenameTablesStartTime = time.Now()
var tableRenameKnownToHaveFailed int64
renameSessionIdChan := make(chan int64, 2)
tablesRenamed := make(chan error, 2)
// Start another goroutine (session 3, via AtomicCutoverRename):
// on error it simply unlocks the tables, leaving the original table unaffected;
// on success the two goroutines have communicated: renameSessionIdChan carries the id of the renaming session, and tablesRenamed receives nil
go func() {
if err := this.applier.AtomicCutoverRename(renameSessionIdChan, tablesRenamed); err != nil {
// Abort! Release the lock
atomic.StoreInt64(&tableRenameKnownToHaveFailed, 1)
okToUnlockTable <- true
}
}()
renameSessionId := <-renameSessionIdChan
this.migrationContext.Log.Infof("Session renaming tables is %+v", renameSessionId)
waitForRename := func() error {
if atomic.LoadInt64(&tableRenameKnownToHaveFailed) == 1 {
// We return `nil` here so as to avoid the `retry`. The RENAME has failed,
// it won't show up in PROCESSLIST, no point in waiting.
// If it failed, it failed: returning nil lets the code below notice the failure and signal okToUnlockTable instead of retrying.
// (Remember: a blocked RENAME always beats blocked INSERT/UPDATE/DELETE, whoever arrived first.)
return nil
}
// normal path: confirm the rename session shows up in the processlist, blocked on a metadata lock
return this.applier.ExpectProcess(renameSessionId, "metadata lock", "rename")
}
// Wait for the RENAME to appear in PROCESSLIST; as soon as it does, the lock is released
if err := this.retryOperation(waitForRename, true); err != nil {
// Abort! Release the lock
okToUnlockTable <- true
return err
}
// the RENAME did not fail: it is blocking as expected
if atomic.LoadInt64(&tableRenameKnownToHaveFailed) == 0 {
this.migrationContext.Log.Infof("Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)")
}
if err := this.applier.ExpectUsedLock(lockOriginalSessionId); err != nil {
// Abort operation. Just make sure to drop the magic table.
return this.migrationContext.Log.Errore(err)
}
this.migrationContext.Log.Infof("Connection holding lock on original table still exists")
// Now that we've found the RENAME blocking, AND the locking connection still alive,
// we know it is safe to proceed to release the lock
okToUnlockTable <- true
// BAM! magic table dropped, original table lock is released
// -> RENAME released -> queries on original are unblocked.
if err := <-tableUnlocked; err != nil {
return this.migrationContext.Log.Errore(err)
}
if err := <-tablesRenamed; err != nil {
return this.migrationContext.Log.Errore(err)
}
this.migrationContext.RenameTablesEndTime = time.Now()
// ooh nice! We're actually truly and thankfully done
lockAndRenameDuration := this.migrationContext.RenameTablesEndTime.Sub(this.migrationContext.LockTablesStartTime)
this.migrationContext.Log.Infof("Lock & rename duration: %s. During this time, queries on %s were blocked", lockAndRenameDuration, sql.EscapeName(this.migrationContext.OriginalTableName))
return nil
}
cutOverTwoStep
// cutOverTwoStep will lock down the original table, execute
// what's left of last DML entries, and **non-atomically** swap original->old, then new->original.
// There is a point in time where the "original" table does not exist and queries are non-blocked
// and failing.
func (this *Migrator) cutOverTwoStep() (err error) {
atomic.StoreInt64(&this.migrationContext.InCutOverCriticalSectionFlag, 1)
defer atomic.StoreInt64(&this.migrationContext.InCutOverCriticalSectionFlag, 0)
atomic.StoreInt64(&this.migrationContext.AllEventsUpToLockProcessedInjectedFlag, 0)
if err := this.retryOperation(this.applier.LockOriginalTable); err != nil {
return err
}
if err := this.retryOperation(this.waitForEventsUpToLock); err != nil {
return err
}
if err := this.retryOperation(this.applier.SwapTablesQuickAndBumpy); err != nil {
return err
}
if err := this.retryOperation(this.applier.UnlockTables); err != nil {
return err
}
lockAndRenameDuration := this.migrationContext.RenameTablesEndTime.Sub(this.migrationContext.LockTablesStartTime)
renameDuration := this.migrationContext.RenameTablesEndTime.Sub(this.migrationContext.RenameTablesStartTime)
this.migrationContext.Log.Debugf("Lock & rename duration: %s (rename only: %s). During this time, queries on %s were locked or failing", lockAndRenameDuration, renameDuration, sql.EscapeName(this.migrationContext.OriginalTableName))
return nil
}
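Both paths are selectable from the command line. A sketch (atomic is the default; two-step accepts the brief window where the original table does not exist; host and credentials are placeholders):

./gh-ost \
  --host=127.0.0.1 --port=3306 \
  --user="root" --password="xxxx" \
  --database="test" --table="sbtest1" \
  --alter="add key idx_k(k)" \
  --allow-on-master \
  --cut-over=two-step \
  --cut-over-lock-timeout-seconds=3 \
  --execute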
References
https://www.cnblogs.com/zhoujinyi/p/9187421.html
https://github.com/github/gh-ost/issues/82
https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html