Problem summary
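When the DDL reached its commit phase, ddl_manager observed a large number of MDL (metadata lock) waits and chose to kill the DDL. However, a DDL that has entered the commit phase cannot be killed.
Because of how MySQL's KILL works, the executor's socket connection was closed immediately anyway, so the executor wrongly concluded that the DDL had failed and exited.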
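The core mistake is that the executor equates "my connection was dropped" with "the DDL failed". The sketch below shows the distinction, assuming the executor can read the MySQL error number it received. The function classify_ddl_error and the "unknown" state are hypothetical and are not ddl_manager's actual code. Error numbers 2006 (CR_SERVER_GONE_ERROR) and 2013 (CR_SERVER_LOST) are standard MySQL client errors for a lost connection.

```python
# Hypothetical sketch, not ddl_manager's actual executor code.
# Standard MySQL client-side error numbers for a dropped connection.
CR_SERVER_GONE_ERROR = 2006  # "MySQL server has gone away"
CR_SERVER_LOST = 2013        # "Lost connection to MySQL server during query"

CONNECTION_LOST_ERRORS = frozenset({CR_SERVER_GONE_ERROR, CR_SERVER_LOST})


def classify_ddl_error(errno: int) -> str:
    """Map the error number the executor received to a DDL outcome.

    A KILL closes the client socket right away, but a DDL that is already
    in its commit phase keeps running on the server. A lost connection
    therefore means "outcome unknown", not "failed".
    """
    if errno in CONNECTION_LOST_ERRORS:
        # Must be resolved later by checking the thread and the table structure.
        return "unknown"
    # In this sketch, any error the server actually returned
    # (e.g. 1317 ER_QUERY_INTERRUPTED, 1061 ER_DUP_KEYNAME) counts as a failure.
    return "failed"
```

Returning a third state keeps the executor from reporting a verdict it cannot know; resolving "unknown" is left to the monitor.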
ddl_manager log:
[2018-08-28 10:46:39.982] [manager] [info] task info: 192.168.30.129,13307,ashe,ashe,alter table ashe add index(name)
[2018-08-28 10:46:39.982] [manager] [info] security check list: drop,DROP,rename,RENAME,CONSTRAINT,constraint
[2018-08-28 10:46:40.000] [manager] [info] start explaner
[2018-08-28 10:46:40.000] [explainer] [info] start to explain ddl: alter table ashe add index(name)
[2018-08-28 10:46:40.004] [explainer] [info] table engine type: InnoDB
[2018-08-28 10:46:40.005] [explainer] [info] table structure:
CREATE TABLE `ashe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4
[2018-08-28 10:46:40.048] [manager] [info] explain ddl successfully
[2018-08-28 10:46:40.048] [manager] [info] mysqld internal method: inplace_no_rebuild,lock type: lock_none_after_prepare,suggestion: direct
[2018-08-28 10:46:40.054] [monitor] [info] monitor ready
[2018-08-28 10:46:40.055] [monitor] [info] waiting for executor start to execute ddl query....
[2018-08-28 10:46:40.061] [executor] [info] ddl thread id is 16
[2018-08-28 10:46:40.061] [executor] [info] start to execute ddl: alter table ashe add index(name)
[2018-08-28 10:47:10.151] [executor] [warning] ddl is killed
[2018-08-28 10:47:10.151] [monitor] [warning] number of kill ddl 1 time[s]
[2018-08-28 10:47:10.268] [monitor] [warning] number of kill ddl 2 time[s]
[2018-08-28 10:47:10.378] [monitor] [warning] number of kill ddl 3 time[s]
[2018-08-28 10:47:10.493] [monitor] [warning] number of kill ddl 4 time[s]
[2018-08-28 10:47:10.608] [monitor] [warning] number of kill ddl 5 time[s]
[2018-08-28 10:47:10.722] [monitor] [warning] number of kill ddl 6 time[s]
[2018-08-28 10:47:10.835] [monitor] [warning] number of kill ddl 7 time[s]
[2018-08-28 10:47:10.944] [monitor] [warning] number of kill ddl 8 time[s]
[2018-08-28 10:47:11.060] [monitor] [warning] number of kill ddl 9 time[s]
[2018-08-28 10:47:11.174] [monitor] [warning] number of kill ddl 10 time[s]
[2018-08-28 10:47:11.286] [monitor] [warning] number of kill ddl 11 time[s]
[2018-08-28 10:47:11.401] [monitor] [warning] number of kill ddl 12 time[s]
[2018-08-28 10:47:11.516] [monitor] [warning] number of kill ddl 13 time[s]
[2018-08-28 10:47:11.627] [monitor] [warning] number of kill ddl 14 time[s]
[2018-08-28 10:47:11.738] [monitor] [warning] number of kill ddl 15 time[s]
[2018-08-28 10:47:11.854] [monitor] [warning] number of kill ddl 16 time[s]
[2018-08-28 10:47:11.968] [monitor] [warning] number of kill ddl 17 time[s]
[2018-08-28 10:47:12.079] [monitor] [warning] number of kill ddl 18 time[s]
[2018-08-28 10:47:12.156] [executor] [info] ddl is killed by monitor, max_killed_times_by_monitor: 1,current_killed_times_by_monitor: 18
[2018-08-28 10:47:12.180] [monitor] [warning] executor failed
[2018-08-28 10:47:12.180] [manager] [error] ddl failed
This issue has been reproduced offline (outside production).
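Fix
After the monitor issues the kill, it checks whether the DDL thread has actually exited, and then verifies whether the DDL completed by inspecting the table structure.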
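A minimal sketch of this fix, assuming the monitor can run two probes on its own connection: whether the DDL thread (16 in the log above) is still present, for example via SELECT COUNT(*) FROM information_schema.PROCESSLIST WHERE ID = 16, and the current SHOW CREATE TABLE output. The helper names resolve_killed_ddl and has_index_on, the timeout, and the polling interval are all hypothetical. The probes are passed in as callables so the logic can be exercised without a server. The table check only covers the add index(name) case from the log and assumes MySQL names an unnamed index after its first column, producing a line like KEY `name` (`name`).

```python
import re
import time
from typing import Callable


def has_index_on(create_table_sql: str, column: str) -> bool:
    """Hypothetical check: SHOW CREATE TABLE output has a secondary index on exactly `column`."""
    pattern = (
        r"^\s*(?:UNIQUE |FULLTEXT |SPATIAL )?KEY `[^`]+` \(`"
        + re.escape(column)
        + r"`(?:\(\d+\))?\)"  # optional prefix length, e.g. (`name`(5))
    )
    return re.search(pattern, create_table_sql, re.MULTILINE) is not None


def resolve_killed_ddl(
    thread_alive: Callable[[], bool],
    show_create_table: Callable[[], str],
    ddl_applied: Callable[[str], bool],
    timeout: float = 600.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Decide the fate of a DDL after the monitor has killed its connection.

    Returns "succeeded", "failed", or "unknown" (thread still running at timeout).
    """
    deadline = clock() + timeout
    # 1. A DDL past its commit point ignores the kill and keeps running,
    #    so wait until the server thread is really gone.
    while thread_alive():
        if clock() >= deadline:
            return "unknown"
        sleep(poll_interval)
    # 2. The thread has exited: the table structure shows whether the DDL committed.
    return "succeeded" if ddl_applied(show_create_table()) else "failed"


# Demo with the table from the log above (structures hard-coded, no server needed).
BEFORE = (
    "CREATE TABLE `ashe` (\n"
    "  `id` int(11) NOT NULL AUTO_INCREMENT,\n"
    "  `name` varchar(10) DEFAULT NULL,\n"
    "  PRIMARY KEY (`id`)\n"
    ") ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4"
)
AFTER = BEFORE.replace(
    "  PRIMARY KEY (`id`)\n", "  PRIMARY KEY (`id`),\n  KEY `name` (`name`)\n"
)

if __name__ == "__main__":
    states = iter([True, True, False])  # thread seen twice, then gone
    print(resolve_killed_ddl(
        thread_alive=lambda: next(states),
        show_create_table=lambda: AFTER,
        ddl_applied=lambda sql: has_index_on(sql, "name"),
        poll_interval=0.01,
    ))  # prints "succeeded"
```

Checking the thread before the table means the structure check only runs once the server has finished with the ALTER, so its result is final. A real ddl_applied would have to be derived from each DDL statement; the index check here is only the single case from this incident.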
Verifying the table structure