Mysql数据库 Innodb在 read commit 级别下发生的死锁
现象及原因
今天在公司调试代码过程中,无意发现了一个for update 引起的死锁问题。
死锁问题通常是ABBA问题,但问题比较奇怪,回家后我本地建立了一张kv表。
create table if not exists kv
(
k bigint not null,
value varchar(50) null,
r int null,
constraint kv_key_uindex
unique (k)
);
alter table kv
add primary key (k);
线程A 执行
begin
select k from kv where value=4 for update; #步骤1
select k from kv where value=5 for update; #步骤3
commit;
线程B执行
select k from kv where value=2 for update; #步骤2
当按 “#” 后标注顺序执行过程中,数据库发生了死锁。
[40001][1213] Deadlock found when trying to get lock; try restarting transaction
执行命令 show engine innodb status;
*** (1) TRANSACTION:
TRANSACTION 13323, ACTIVE 4 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 9, OS thread handle 16216, query id 43 localhost 127.0.0.1 root executing
/* ApplicationName=IntelliJ IDEA 2019.3.1 */ select k from kv where value=2 for update
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 15 page no 4 n bits 88 index PRIMARY of table `fast_stock`.`kv` trx id 13323 lock_mode X locks rec but not gap
Record lock, heap no 20 PHYSICAL RECORD: n_fields 5; compact format; info bits 128
0: len 8; hex 8000000000000003; asc ;;
1: len 6; hex 000000003026; asc 0&;;
2: len 7; hex 010000011908a1; asc ;;
3: len 1; hex 32; asc 2;;
4: len 4; hex 80000004; asc ;;
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 15 page no 4 n bits 88 index PRIMARY of table `fast_stock`.`kv` trx id 13323 lock_mode X locks rec but not gap waiting
Record lock, heap no 15 PHYSICAL RECORD: n_fields 5; compact format; info bits 128
0: len 8; hex 8000000000000004; asc ;;
1: len 6; hex 000000003018; asc 0 ;;
2: len 7; hex 01000001070b41; asc A;;
3: len 1; hex 34; asc 4;;
4: len 4; hex 80000004; asc ;;
*** (2) TRANSACTION:
TRANSACTION 13322, ACTIVE 7 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 8, OS thread handle 12864, query id 57 localhost 127.0.0.1 root executing
/* ApplicationName=IntelliJ IDEA 2019.3.1 */ select k from kv where value=5 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 15 page no 4 n bits 88 index PRIMARY of table `fast_stock`.`kv` trx id 13322 lock_mode X locks rec but not gap
Record lock, heap no 15 PHYSICAL RECORD: n_fields 5; compact format; info bits 128
0: len 8; hex 8000000000000004; asc ;;
1: len 6; hex 000000003018; asc 0 ;;
2: len 7; hex 01000001070b41; asc A;;
3: len 1; hex 34; asc 4;;
4: len 4; hex 80000004; asc ;;
确实发生了死锁。
在普遍的认知里, 对于非索引键的查询,通常上的是表锁,但真实情况,线程A只锁了1行。
再次模拟执行步骤1和步骤2,查询performance_schema.data_lock_waits 表
ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INNODB | 2638249558176:1072:2638212656120 | 13325 | 50 | 68 | fast_stock | kv | NULL | NULL | NULL | 2638212656120 | TABLE | IX | GRANTED | NULL |
INNODB | 2638249558176:15:4:20:2638212653336 | 13325 | 50 | 68 | fast_stock | kv | NULL | NULL | PRIMARY | 2638212653336 | RECORD | X,REC_NOT_GAP | GRANTED | 3 |
INNODB | 2638249558176:15:4:15:2638212653680 | 13325 | 50 | 68 | fast_stock | kv | NULL | NULL | PRIMARY | 2638212653680 | RECORD | X,REC_NOT_GAP | WAITING | 4 |
INNODB | 2638249557344:1072:2638212651144 | 13324 | 49 | 72 | fast_stock | kv | NULL | NULL | NULL | 2638212651144 | TABLE | IX | GRANTED | NULL |
INNODB | 2638249557344:15:4:15:2638212648360 | 13324 | 49 | 72 | fast_stock | kv | NULL | NULL | PRIMARY | 2638212648360 | RECORD | X,REC_NOT_GAP | GRANTED | 4 |
可以发现,线程A 获取了主键4的行锁,线程B 获取了主键3的行锁,并请求主键4的行锁。
再查询隔离级别,发现 select @@transaction_isolation;
数据库的隔离级别是提交读,提交读在执行更新(SELECT …… FOR UPADTE)过程中,对于非索引列,会全表扫描,但会在扫描后释放不符合查询条件的列。而在扫描过程中,发现已经被锁定的列则会继续等待当前锁的释放。
因此,死锁的原因是线程A无法获取主键为3的行锁,线程B无法获取主键为4的行锁。
解决方案
- 隔离级别 升级为 Repeat Read 后可以修复该问题。
ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INNODB | 1471039987872:1072:1473149191064 | 13838 | 50 | 54 | fast_stock | kv | NULL | NULL | NULL | 1473149191064 | TABLE | IX | GRANTED | NULL |
INNODB | 1471039987872:15:4:20:1473149188280 | 13838 | 50 | 54 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149188280 | RECORD | X | WAITING | 3 |
INNODB | 1471039988704:1072:1473149196040 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | NULL | 1473149196040 | TABLE | IX | GRANTED | NULL |
INNODB | 1471039988704:15:4:1:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | supremum pseudo-record |
INNODB | 1471039988704:15:4:8:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 7 |
INNODB | 1471039988704:15:4:9:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 8 |
INNODB | 1471039988704:15:4:10:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 9 |
INNODB | 1471039988704:15:4:11:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 10 |
INNODB | 1471039988704:15:4:12:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 11 |
INNODB | 1471039988704:15:4:13:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 12 |
INNODB | 1471039988704:15:4:14:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 13 |
INNODB | 1471039988704:15:4:15:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 4 |
INNODB | 1471039988704:15:4:16:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 5 |
INNODB | 1471039988704:15:4:17:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 6 |
INNODB | 1471039988704:15:4:18:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 15 |
INNODB | 1471039988704:15:4:19:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 14 |
INNODB | 1471039988704:15:4:20:1473149193256 | 13837 | 51 | 22 | fast_stock | kv | NULL | NULL | PRIMARY | 1473149193256 | RECORD | X | GRANTED | 3 |
升级后,可以发现,执行步骤1中,INNODB对 所有行都加了行锁,依次执行步骤1、步骤2、步骤3,不会发生死锁。
- 对该列添加索引,执行查询中,INNODB 只会直接通过索引找到当前列,避免锁定一些不必要的列。
之前一直以为INNODB的加锁是一个事务的过程,SELECT …… FOR UPDATE 只有加锁成功和加锁失败两个过程,但实际上,INNODB的加锁过程不是原子操作,而是根据主键顺序加锁,当然,因为是顺序加锁的缘故,所以在Repeated Read 下导致死锁,而在 Read Commit级别下,则会因为锁的释放导致死锁,通常在生产环境下,这样的死锁问题也难以排查 ,所以最好的选择还是 Repeated Read。