一 前言
死锁其实是一个很有意思也很有挑战的技术问题,大概每个DBA和部分开发朋友都会在工作过程中遇见。关于死锁我会持续写一个系列的案例分析,希望能够对想了解死锁的朋友有所帮助。(以前案例都基于Mysql 5.7.12 事务级别RR)
二 背景知识
2.1 insert 锁机制 在分析死锁案例之前,我们先学习一下背景知识 insert 语句的加锁策略。我们先来看看官方定义:
"An insert intention lock is a type of gap lock set by INSERT operations prior to row insertion. This lock signals the intent to insert in such a way that multiple transactions inserting into the same index gap need not wait for each other if they are not inserting at the same position within the gap. Suppose that there are index records with values of 4 and 7. Separate transactions that attempt to insert values of 5 and 6, respectively, each lock the gap between 4 and 7 with insert intention locks prior to obtaining the exclusive lock on the inserted row, but do not block each other because the rows are nonconflicting."
相信大部分的DBA同行都知道在事务执行insert的时候会申请一把插入意向锁(Insert Intention Lock)。在多事务并发写入不同数据记录至同一索引间隙的时候,并不需要等待其他事务完成,不会发生锁等待。 假设有一个索引记录包含键值4和7,不同的事务分别插入5和6(这里指没有锁冲突的情况下,两个开启两个事务分别插入5和6不会阻塞),每个事务都会产生一个加在4-7之间的插入意向锁,获取在插入行上的排它锁,但是不会被互相锁住,因为数据行并不冲突。 但是如果遇到唯一键呢?
"If a duplicate-key error occurs, a shared lock on the duplicate index record is set."
对于insert操作来说,若发生唯一约束冲突,则需要对冲突的唯一索引加上S Next-key Lock。从这里会发现,即使是RC事务隔离级别,也同样会存在Next-Key Lock锁,从而阻塞并发。然而,文档没有说明的是,对于检测到冲突的唯一索引,等待线程在获得S Lock之后,还需要对下一个记录进行加锁,在源码中由函数row_ins_scan_sec_index_for_duplicate进行判断。via (MySQL REPLACE死锁问题深入剖析 )。我们可以通过如下例子进行验证。
2.2 验证
准备环境 默认事务隔离级别为RR模式(Mysql 5.7.12)。
CREATE TABLE t8 (
a int AUTO_INCREMENT PRIMARY KEY,
b int,
c int,
unique key ub(b)
) engine=InnoDB;
insert into t8 values (NULL,1,2)
sess1 | sess2 | |
T1 | begin;delete from t8 where b = 1; | |
T2 | begin;insert into t8 values (NULL,1,1); | |
T3 | commit; | |
T4 | UPDATE t8 SET c =13 WHERE b =1; |
2.3 过程分析
在每次执行一条语句之后都执行show innodb engine status查看事务的状态, 执行完 delete 语句,事务相关日志显示如下:
---TRANSACTION 488466, ACTIVE 9 sec
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 28, OS thread handle 9636, query id 3601 localhost ::1 root
TABLE LOCK table `test`.`t8` trx id 488466 lock mode IX
RECORD LOCKS space id 1085 page no 4 n bits 72 index ub of table `test`.`t8` trx id 488466 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000001; asc ;;
1: len 4; hex 80000004; asc ;;
RECORD LOCKS space id 1085 page no 3 n bits 72 index PRIMARY of table `test`.`t8` trx id 488466 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
0: len 4; hex 80000004; asc ;;
1: len 6; hex 000000077412; asc t ;;
2: len 7; hex 5d000001b407f1; asc ] ;;
3: len 4; hex 80000001; asc ;;
4: len 4; hex 80000002; asc ;;
从日志中我们可以看到 delete语句获取了唯一索引ub和主键两个行级锁(lock_mode X locks rec but not gap) 。 执行完 insert 之后 再查看innodb engine status,事务2相关日志显示如下:
---TRANSACTION 488467, ACTIVE 12 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 24, OS thread handle 5388, query id 3609 localhost ::1 root update
insert into t8 values (NULL,1,1)
------- TRX HAS BEEN WAITING 12 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1085 page no 4 n bits 72 index ub of table `test`.`t8` trx id 488467 lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000001; asc ;;
1: len 4; hex 80000004; asc ;;
根据官方的介绍,并结合日志,我们可以看到事务2的insert into t8 values (NULL,1,1)在申请一把S Next-key-Lock (最小值,1] , 显示lock mode S waiting。这里想给大家说明的是在innodb 日志中如果提示 lock mode S /lock mode X ,其实都是gap锁,如果是行记录锁会提示but not gap ,请读者朋友们在自己分析死锁日志的时候注意。 sess1 delete语句提交之后,sess2的insert 不要提交,不要提交,不要提交。再次查看innodb engine status,事务相关日志显示如下:
---TRANSACTION 488467, ACTIVE 459 sec
3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 24, OS thread handle 5388, query id 3616 localhost ::1 root
TABLE LOCK table `test`.`t8` trx id 488467 lock mode IX
RECORD LOCKS space id 1085 page no 4 n bits 72 index ub of table `test`.`t8` trx id 488467 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
RECORD LOCKS space id 1085 page no 4 n bits 72 index ub of table `test`.`t8` trx id 488467 lock mode S locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000001; asc ;;
1: len 4; hex 80000005; asc ;;
sess1中的事务因为提交已经结束。innodb中的事务列表中只剩下sess2 中的insert 的事务了。从获取锁的状态上看insert获取一把S Next-key Lock 锁 (1,supremum](为啥变成了supremum的临键锁,而不是1的临键锁(本来是申请1的临键锁读锁,事务1释放后,申请到主键supremum的临键锁读锁和1的间隙锁读锁):唯一索引冲突,为什么主键的 Supremum 记录会加 Next-Key 锁?)和插入行之前的S GAP锁。看到这里大家是否有疑惑,官方文档说:
INSERT sets an exclusive lock on the inserted row. This lock is an index-record lock, not a next-key lock (that is, there is no gap lock) and does not prevent other sessions from inserting into the gap before the inserted row.
会对insert成功的记录加上一把X 行锁,为什么看不见呢?我们再在sess1 中执行update t8 set c=13 where b=1; (私以为这种验证方式不正确,因为update会加锁,分不清多的锁是update所加还是insert所加,这里应该改成 select * from t8 where b=1 lock in share mode,这个不会加X锁,这两个语句的事务日志输出是一样的,所以不另做验证)并查看事务日志
---TRANSACTION 488467, ACTIVE 593 sec
5 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 2
MySQL thread id 24, OS thread handle 5388, query id 3648 localhost ::1 root
TABLE LOCK table `test`.`t8` trx id 488467 lock mode IX
RECORD LOCKS space id 1085 page no 4 n bits 72 index ub of table `test`.`t8` trx id 488467 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
RECORD LOCKS space id 1085 page no 4 n bits 72 index ub of table `test`.`t8` trx id 488467 lock mode S locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000001; asc ;;
1: len 4; hex 80000005; asc ;;
RECORD LOCKS space id 1085 page no 4 n bits 72 index ub of table `test`.`t8` trx id 488467 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000001; asc ;;
1: len 4; hex 80000005; asc ;;
RECORD LOCKS space id 1085 page no 3 n bits 72 index PRIMARY of table `test`.`t8` trx id 488467 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 000000077413; asc t ;;
2: len 7; hex 5e000001d90980; asc ^ ;;
3: len 4; hex 80000001; asc ;;
4: len 4; hex 8000000d; asc
从日志中可以看到 sess2的事务持有的锁多了两把 lock_mode X locks rec but not gap(主键索引和唯一索引各一把,原文写只多了一把写锁,日志缺少唯一索引的记录锁,我换RC级别测试也没这个问题,不知道是不是数据库版本问题),也即是 sess2 对 insert 成功的记录加上的X 行锁。 分析至此,对于并发insert造成唯一键冲突的时候 insert的加锁策略是:
- 第一阶段 唯一性约束检查,先申请LOCK_S + LOCK_ORDINARY
- 第二阶段 获取阶段一的锁并且insert成功之后
- 插入的位置有Gap锁:LOCK_INSERT_INTENTION,为了防止其他insert 唯一键冲突。
- 新数据插入:LOCK_X + LOCK_REC_NOT_GAP (新插入成功还没提交事务的insert会加记录锁写锁)
三 案例分析
本案例是两个事务并发insert 唯一键冲突 和gap锁一起导致的死锁案例。
3.1 环境
create table t7(
id int not null primary key auto_increment,
a int not null ,
unique key ua(a)
) engine=innodb;
insert into t7(id,a) values(1,1),(5,4),(20,20),(25,12);
3.2 测试用例
事务1 | 事务2 | |
T1 | begin; | begin; |
T2 | insert into t7(id,a) values(26,10); | |
T3 | insert into t7(id,a) values(30,10); | |
T4 | insert into t7(id,a) values(40,9); |
3.3 死锁日志
LATEST DETECTED DEADLOCK
------------------------
2024-03-13 21:40:36 0x150c
*** (1) TRANSACTION:
TRANSACTION 488491, ACTIVE 16 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 28, OS thread handle 9636, query id 3675 localhost ::1 root update
INSERT INTO t7(id,a) VALUES(30,10)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1086 page no 4 n bits 72 index ua of table `test`.`t7` trx id 488491 lock mode S waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000000a; asc ;;
1: len 4; hex 8000001a; asc ;;
*** (2) TRANSACTION:
TRANSACTION 488490, ACTIVE 20 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 24, OS thread handle 5388, query id 3680 localhost ::1 root update
insert into t7(id,a) values(40,9)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1086 page no 4 n bits 72 index ua of table `test`.`t7` trx id 488490 lock_mode X locks rec but not gap
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000000a; asc ;;
1: len 4; hex 8000001a; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1086 page no 4 n bits 72 index ua of table `test`.`t7` trx id 488490 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000000a; asc ;;
1: len 4; hex 8000001a; asc ;;
*** WE ROLL BACK TRANSACTION (1)
日志分析 我们从时间线维度分析:
- 事务T2 insert into t7(id,a) values(26,10)语句insert 成功,持有a=10 的X 行锁(X locks rec but not gap)
- 事务T1 insert into t7(id,a) values(30,10),因为T2 的第一条insert已经插入a=10的记录,事务T1的 insert a=10 则发生唯一约束冲突,需要申请对冲突的唯一索引加上S Next-key Lock (也即是 lock mode S waiting ) 这是一个临键锁锁会申请锁住(4,10]的区域,临键锁与记录锁冲突,故事务1阻塞。从这里会发现,即使是RC事务隔离级别,也同样会存在Next-Key Lock锁,从而阻塞并发(我案例是RR,所以没验证这点)。
- 事务T2 insert into t7(id,a) values(40,9) 该语句插入的a=9 的值在事务T1申请的gap锁(4,10)之间,故需事务T2的第二条insert语句要等待事务T1的S-Next-key Lock锁释放,在日志中显示lock_mode X locks gap before rec insert intention waiting(意向插入锁与间隙锁读锁冲突)。
四 总结
描述下死锁场景:事务1和事务2先后插入一条数据库中没有的相同的唯一索引值的数据,假设事务1先插入,事务2后插入,然后事务1再插入一条相同的唯一索引值的数据,会发生死锁(这三条insert的唯一索引所在列的值都相同,且在数据库中还不存在)。
死锁过程:
- 事务1先插入数据行成功,假设唯一索引列是C=N,则事务1会对C=N索引加记录X锁(案例2.2验证了插入成功会加记录锁写锁)
- 然后事务2执行insert发现有冲突,会申请对C=N加临键锁读锁,由于临键锁读锁和记录锁写锁冲突,事务2阻塞等待事务1释放C=N的写锁;
- 最后事务1执行insert时,申请意向插入锁,事务1的意向插入锁与事务2的临键锁读锁冲突而阻塞,互相等待死锁。
本文案例和知识点一方面从官方文档获取,另一方面是根据何登成和姜承尧两位MySQL技术大牛的技术分享整理,算是站在巨人的肩膀上的学习总结。在研究分析死锁案例的过程中,insert 的意向锁 和 gap 锁这种类型的锁是比较难分析的,相信通过上面的分析总结大家能够学习到 insert的锁机制 ,如何加锁,如何进行 insert 方面死锁分析。