Select for update就安全了么？间隙锁导致死锁

思想者0001

已于 2022-05-17 17:52:59 修改

阅读量6.2k

点赞数 5

分类专栏：锁数据库文章标签： mysql 数据库

于 2020-09-25 10:22:10 首次发布

本文链接：https://blog.csdn.net/zhouwenjun0820/article/details/108790922

版权

数据库同时被 2 个专栏收录

19 篇文章 0 订阅

订阅专栏

锁

7 篇文章 0 订阅

订阅专栏

背景

有一个短信表:

CREATE TABLE `t_msg` (
	`ID` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
	`Mobile` VARCHAR(20) NOT NULL COMMENT '手机号',
	`Content` VARCHAR(1000) NOT NULL COMMENT '短信内容',
	`Status` SMALLINT(6) NOT NULL COMMENT '短信状态；1：待发送；2：发送中；3：成功；4：失败',
	PRIMARY KEY (`ID`),
	INDEX `Status` (`Status`)
)
ENGINE=InnoDB
;

有服务往该表中插入待发送的短信，然后通过服务A来发送，服务A是多实例部署，其发送逻辑如下：

1、在同一RR事务中执行：

begin;
select id from t_msg where status = 1 or status = 4 limit 10  for update;
-- for each id:
update t_msg set status = 2 where id = #{id};
commit;

2、并发执行：

// http调用短信运营商发送短信
if(发送成功){
	update t_msg set status = 3 where id = #{id};
}else{
	update t_msg set status = 4 where id = #{id};
}

可以开出已考虑到了多线程和多实例部署可能带来的线程安全问题，如果不通过select for update来加锁，可能多个实例会同时发送同一条短信，而通过select for update可以使得后查询的实例被阻塞；在将状态更新为2事务提交，则释放锁，然后其他实例会查询到新的记录，如果发送失败的则重试。

这样看似乎没啥问题，但是运行一段时间后，MySQL发现了死锁：

LATEST DETECTED DEADLOCK
------------------------
2020-09-07 14:54:37 0x7f059c0f5700
*** (1) TRANSACTION:
TRANSACTION 23429369, ACTIVE 35 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 922, OS thread handle 139662114862848, query id 9408791 localhost root Sending data
select * from t_msg where  status =1 or status=4  limit 1 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 108669 page no 4 n bits 80 index Status of table `sms`.`t_msg` trx id 23429369 lock_mode X waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 2; hex 8004; asc   ;;
 1: len 8; hex 0000000000000067; asc        g;;

*** (2) TRANSACTION:
TRANSACTION 23429368, ACTIVE 44 sec updating or deleting
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 1
MySQL thread id 928, OS thread handle 139662069815040, query id 9408798 localhost root updating
update t_msg set Status=2
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 108669 page no 4 n bits 80 index Status of table `sms`.`t_msg` trx id 23429368 lock_mode X
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 2; hex 8004; asc   ;;
 1: len 8; hex 0000000000000067; asc        g;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 108669 page no 4 n bits 80 index Status of table `sms`.`t_msg` trx id 23429368 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 2; hex 8003; asc   ;;
 1: len 8; hex 0000000000000065; asc        e;;

*** WE ROLL BACK TRANSACTION (1)

日志解读

事务1

在执行【select * from t_msg where status =1 or status=4 limit 1 for update】时等待索引Status上的锁：

 0: len 2; hex 8004; asc   ;;
 1: len 8; hex 0000000000000067; asc        g;;

即Status=4，id=103的行锁，这里的hex 0000000000000067 是16进制，可在linux系统中执行如下命令转换为10进制：

echo $((0x67))
103

事务2

在执行【update t_msg set status = 2 where id = 103】时，持有索引Status上的行锁：

 0: len 2; hex 8004; asc   ;;
 1: len 8; hex 0000000000000067; asc        g;;

即Status=4，id=103的行锁，同时等待索引Status上的间隙锁（gap before rec insert intention waiting）：

 0: len 2; hex 8003; asc   ;;
 1: len 8; hex 0000000000000066; asc        f;;

即Status=3,id=102前的间隙锁。

让人不解的两个SQL均不涉及状态3，为什么事务2会等待状态3前的间隙锁呢？如果您能解释这个问题，那么您应该已经知道死锁发生的原因了。

场景复现

实验环境：

mysql> show variables like "%iso%";
+---------------+-----------------+
| Variable_name | Value           |
+---------------+-----------------+
| tx_isolation  | REPEATABLE-READ |
+---------------+-----------------+
mysql> desc test;
+-------+---------+------+-----+---------+----------------+
| Field | Type    | Null | Key | Default | Extra          |
+-------+---------+------+-----+---------+----------------+
| id    | int(11) | NO   | PRI | NULL    | auto_increment |
| value | int(11) | NO   | MUL | 0       |                |
+-------+---------+------+-----+---------+----------------+
mysql> select * from test;      
+----+-------+
| id | value |
+----+-------+
|  1 |     3 |
|  2 |     3 |
|  3 |     3 |
|  4 |     4 |
+----+-------+

开启两个Session。

会话1执行：

mysql> begin;select * from test where value=1 or value=4 for update;  
Query OK, 0 rows affected (0.00 sec)

+----+-------+
| id | value |
+----+-------+
|  4 |     4 |
+----+-------+

会话2执行：

mysql> select * from test where value=1 or value=4 for update; 
Query OK, 0 rows affected (0.00 sec)

可以发现会话2的事务被阻塞；

这时会话1执行：

mysql> update test set value=2 where id=4;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

然后发现会话2打印出现内容：

mysql> select * from test where value=1 or value=4 for update;      
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

这时执行show engine innodb status发现死锁日志：

LATEST DETECTED DEADLOCK
------------------------
2020-09-16 20:53:02 0x7f059c609700
*** (1) TRANSACTION:
TRANSACTION 23471780, ACTIVE 72 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 3 row lock(s)
MySQL thread id 2056, OS thread handle 139662076204800, query id 10165403 localhost root Sending data
select * from test where value=1 or value=4 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 108664 page no 4 n bits 80 index value of table `test`.`test` trx id 23471780 lock_mode X waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000004; asc     ;;
 1: len 4; hex 80000004; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 23471779, ACTIVE 128 sec updating or deleting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 1
MySQL thread id 2057, OS thread handle 139662075139840, query id 10165405 localhost root updating
update test set value=2 where id=4
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 108664 page no 4 n bits 80 index value of table `test`.`test` trx id 23471779 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000004; asc     ;;
 1: len 4; hex 80000004; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 108664 page no 4 n bits 80 index value of table `test`.`test` trx id 23471779 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 4; hex 80000001; asc     ;;

*** WE ROLL BACK TRANSACTION (1)

可以发现和生产告警是一样的死锁！

原理解析

总结来说是：间隙锁不互斥导致死锁！

会话1在执行 select * from test where value=1 or value=4 for update时，首先执行**select * from test where value = 1 ，这时因为value=1本身不存在，所以首先定位到 value = 3 的叶节点，此时对（无穷小， 3）加间隙锁，然后发现该节点不等于1则停止向右遍历，然后再执行select * from test where value = 4 **，这时定位到并返回 value=4 的叶节点。

会话2也执行上述select查询语句，同样也能在（无穷小， 3）加间隙锁（不互斥），然后会等待 value = 4 这个叶节点上的行锁。

然后会话1 执行update status = 2 where id = 4，则需要在（无穷小， 3）插入value=2的叶节点，所以等待会话2持有的间隙锁！

可以看到，会话1和会话2 互相等待！

备注：INNDB加锁首先定位到等于或者第一个大于目标值的叶子节点！

在这里插入图片描述

解决方案

select和update语句的事务隔离级别改为RC隔离级别。

实际上根本不需要RR隔离级别，因为我们想要的是锁住当前所有value=1和value=4的行，我们不需要在事务执行期间禁止其他事务插入value=1或value=4的行。

经验总结

之所以出现上述问题，还是因为在写SQL时未考虑需要的隔离级别，通常来说RC更符合人类思维习惯，只有真的需要防止其他事务插入时才需要RR。

问题思考

假设短信发送肯定不会失败，即不会出现status=4的行，还会发生死锁么？

参考答案：不会！

如果当前存在status=1的行，则加锁情况如下：

在这里插入图片描述

如果事务1先执行，则占有上面全部的锁，

然后事务2将只能占有（无穷小，1）间隙锁，等到1上的行锁，

因此事务1执行update status = 2，将向（1,2）或者（2,3）之间插入，不会遇到阻碍！

如果当前不存在status=1等行，则压根不会执行update。

思想者0001

关注

5
点赞
踩
14

收藏

觉得还不错? 一键收藏
2
评论
Select for update就安全了么？间隙锁导致死锁

Select for update就安全了么？背景有一个短信表:CREATE TABLE `t_msg` ( `ID` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT, `Mobile` VARCHAR(20) NOT NULL COMMENT '手机号', `Content` VARCHAR(1000) NOT NULL COMMENT '短信内容', `Status` SMALLINT(6) NOT NULL COMMENT '短信状态；1：待发送；
复制链接

扫一扫