刚接手离职同事业务没几天, 线上就出现了死锁现象。 其业务是接收商品(sku_id)促销消息mq,促销状态有新增、删除、过期等状态,同一sku_id,这些状态可能在同一时间袭来
一、直击死锁现场
发现死锁,第一时间查看日志现场,死锁现场日志如下:
发现如下死锁关键词
com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
死锁语句如下:
insert into student_sku (sku_id, sku_state, prom_id, prom_price, prom_state, start_time, end_time, create_time, update_time ) values (?, ?, ?, ?, ?, ?, ?, ?, ? ) ON DUPLICATE KEY UPDATE prom_id = ?, prom_price = ?, prom_state = ?, start_time = ?, end_time = ?, update_time = ?
表结构如下,其中有两个唯一索引uniq_promotion_id,uniq_sku_id
CREATE TABLE `student_sku` (
`id` BIGINT ( 20 ) NOT NULL AUTO_INCREMENT COMMENT '主键',
`sku_id` BIGINT ( 20 ) NOT NULL DEFAULT '0' COMMENT '商品id',
`sku_state` TINYINT ( 5 ) NOT NULL DEFAULT '-1' COMMENT '商品上下柜状态,1:上架(可搜索,可购买),0:下架(可通过skuid搜索,不可购买),2:可上架(可通过skuid搜索,不可购买),10:pop 删除(不可搜索,不可购买))',
`prom_id` BIGINT ( 20 ) NOT NULL DEFAULT '0' COMMENT '促销id',
`prom_price` DECIMAL ( 15, 2 ) NOT NULL DEFAULT '0.00' COMMENT '促销价格',
`prom_state` TINYINT ( 5 ) NOT NULL DEFAULT '0' COMMENT '促销状态,0默认值,1促销有效,2无效(包含删除自然结束等)',
`start_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '促销开始时间',
`end_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '促销结束时间',
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '记录创建时间',
`update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '记录更新时间',
PRIMARY KEY ( `id` ),
UNIQUE KEY `uniq_promotion_id` ( `prom_id` ) USING BTREE,
UNIQUE KEY `uniq_sku_id` ( `sku_id` ) USING BTREE,
KEY `idx_update_time` ( `update_time` ) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 2797210 DEFAULT CHARSET = utf8 COMMENT = '学生商品表'
二、死锁分析
现象很明显,当执行更新的时候,发现了要更新的数据被别的事务锁了,拿不到了锁,于是就死锁了。
我们再来看SQL语句,student_sku表中两个唯一索引uniq_promotion_id,uniq_sku_id,经过实验,有多个唯一索引,在ON DUPLICATE KEY UPDATE 更新时,最后一个唯一索引作为更新条件,这里uniq_sku_id是更新条件
insert into student_sku (sku_id, sku_state, prom_id, prom_price, prom_state, start_time, end_time, create_time, update_time ) values (?, ?, ?, ?, ?, ?, ?, ?, ? ) ON DUPLICATE KEY UPDATE prom_id = ?, prom_price = ?, prom_state = ?, start_time = ?, end_time = ?, update_time = ?
由于没有开启锁日志记录,只能模拟本地再现
1、开启MySQL锁记录
set GLOBAL innodb_status_output_locks=ON;
set GLOBAL innodb_status_output=ON;
2、模拟再现
事务1
mysql> begin;
Query OK, 0 rows affected (0.02 sec)mysql> insert into student_sku (sku_id, sku_state, prom_id,prom_price, prom_state, start_time, end_time, create_time, update_time ) values (1112, 2, 1231, 2, 2, now(), now(), now(), now() ) ON DUPLICATE KEY UPDATE prom_id = 1231, prom_price = 2, prom_state = 2, start_time = now(), end_time = now(), update_time = now();
Query OK, 2 rows affected (0.03 sec)mysql>
事务2
mysql> begin;
Query OK, 0 rows affected (0.02 sec)mysql> insert into student_sku (sku_id, sku_state, prom_id,prom_price, prom_state, start_time, end_time, create_time, update_time ) values (1114, 2, 1232, 2, 2, now(), now(), now(), now() ) ON DUPLICATE KEY UPDATE prom_id = 1232, prom_price = 2, prom_state = 2, start_time = now(), end_time = now(), update_time = now();
Query OK, 1 row affected (0.02 sec)mysql>
事务3
mysql> begin;
Query OK, 0 rows affected (0.02 sec)mysql> insert into student_sku (sku_id, sku_state, prom_id,prom_price, prom_state, start_time, end_time, create_time, update_time ) values (1115, 2, 1233, 2, 2, now(), now(), now(), now() ) ON DUPLICATE KEY UPDATE prom_id = 1233, prom_price = 2, prom_state = 2, start_time = now(), end_time = now(), update_time = now();
1205 - Lock wait timeout exceeded; try restarting transaction
mysql>
查看此时锁日志:show engine innodb status
------------
TRANSACTIONS
------------
Trx id counter 225695
Purge done for trx's n:o < 225693 undo n:o < 0 state: running but idle
History list length 39
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421367021271328, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421367021266768, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 225694, ACTIVE 16 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 6211, OS thread handle 139891716085504, query id 714060 100.124.26.230 dev update
insert into student_sku (sku_id, sku_state, prom_id,prom_price, prom_state, start_time, end_time, create_time, update_time ) values (1115, 2, 1233, 2, 2, now(), now(), now(), now() ) ON DUPLICATE KEY UPDATE prom_id = 1233, prom_price = 2, prom_state = 2, start_time = now(), end_time = now(), update_time = now()
------- TRX HAS BEEN WAITING 16 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 108 page no 4 n bits 72 index uniq_promotion_id of table `commoditypools`.`student_sku` trx id 225694 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;------------------
TABLE LOCK table `commoditypools`.`student_sku` trx id 225694 lock mode IX
RECORD LOCKS space id 108 page no 4 n bits 72 index uniq_promotion_id of table `commoditypools`.`student_sku` trx id 225694 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;RECORD LOCKS space id 108 page no 4 n bits 72 index uniq_promotion_id of table `commoditypools`.`student_sku` trx id 225694 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;---TRANSACTION 225693, ACTIVE 2630 sec
5 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 1
MySQL thread id 6210, OS thread handle 139891166349056, query id 714053 100.124.26.230 dev
TABLE LOCK table `commoditypools`.`student_sku` trx id 225693 lock mode IX
RECORD LOCKS space id 108 page no 4 n bits 72 index uniq_promotion_id of table `commoditypools`.`student_sku` trx id 225693 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;RECORD LOCKS space id 108 page no 4 n bits 72 index uniq_promotion_id of table `commoditypools`.`student_sku` trx id 225693 lock_mode X locks gap before rec
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 80000000000004d0; asc ;;
1: len 8; hex 800000000000000b; asc ;;RECORD LOCKS space id 108 page no 5 n bits 72 index uniq_sku_id of table `commoditypools`.`student_sku` trx id 225693 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;RECORD LOCKS space id 108 page no 5 n bits 72 index uniq_sku_id of table `commoditypools`.`student_sku` trx id 225693 lock_mode X locks gap before rec
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 800000000000045a; asc Z;;
1: len 8; hex 800000000000000b; asc ;;---TRANSACTION 225688, ACTIVE 2691 sec
4 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 6208, OS thread handle 139891587360512, query id 714049 100.124.26.230 dev
TABLE LOCK table `commoditypools`.`student_sku` trx id 225688 lock mode IX
RECORD LOCKS space id 108 page no 4 n bits 72 index uniq_promotion_id of table `commoditypools`.`student_sku` trx id 225688 lock_mode X
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 80000000000004cf; asc ;;
1: len 8; hex 8000000000000008; asc ;;RECORD LOCKS space id 108 page no 5 n bits 72 index uniq_sku_id of table `commoditypools`.`student_sku` trx id 225688 lock_mode X
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 8000000000000458; asc X;;
1: len 8; hex 8000000000000008; asc ;;RECORD LOCKS space id 108 page no 3 n bits 72 index PRIMARY of table `commoditypools`.`student_sku` trx id 225688 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
0: len 8; hex 8000000000000008; asc ;;
1: len 6; hex 000000037198; asc q ;;
2: len 7; hex 38000002610f7d; asc 8 a };;
3: len 8; hex 8000000000000458; asc X;;
4: len 1; hex 82; asc ;;
5: len 8; hex 80000000000004cf; asc ;;
6: len 7; hex 80000000000200; asc ;;
7: len 1; hex 82; asc ;;
8: len 5; hex 99a9720264; asc r d;;
9: len 5; hex 99a9720264; asc r d;;
10: len 5; hex 99a9717ea5; asc q~ ;;
11: len 5; hex 99a9720264; asc r d;;
3、加锁分析
表中已有数据
Transaction1
insert into student_sku (sku_id, sku_state, prom_id,prom_price, prom_state, start_time, end_time, create_time, update_time ) values (1112, 2, 1231, 2, 2, now(), now(), now(), now() )
ON DUPLICATE KEY UPDATE prom_id = 1231, prom_price = 2, prom_state = 2, start_time = now(), end_time = now(), update_time = now();
Transaction2
insert into student_sku (sku_id, sku_state, prom_id,prom_price, prom_state, start_time, end_time, create_time, update_time ) values (1114, 2, 1232, 2, 2, now(), now(), now(), now() )
ON DUPLICATE KEY UPDATE prom_id = 1232, prom_price = 2, prom_state = 2, start_time = now(), end_time = now(), update_time = now();
Query OK, 1 row affected (0.02 sec)
Transaction3
insert into student_sku (sku_id, sku_state, prom_id,prom_price, prom_state, start_time, end_time, create_time, update_time ) values (1115, 2, 1233, 2, 2, now(), now(), now(), now() )
ON DUPLICATE KEY UPDATE prom_id = 1233, prom_price = 2, prom_state = 2, start_time = now(), end_time = now(), update_time = now();
number | Transaction1 | Transaction2 | Transaction3 |
1 | uniq_promotion_id加锁 | uniq_promotion_id加锁 | uniq_promotion_id等待加锁,最终超时 |
2 | uniq_sku_id加锁 | uniq_sku_id加锁 | |
3 | primary加锁 |
事物1首先到来,事物1的两个唯一索引uniq_promotion_id和uniq_sku_id插入的列值分别为1112,1231,首先对uniq_promotion_id加锁,然后对uniq_sku_id加锁,由于1112,1231,已经在表中已经存在,故此需要更新该行,更新时候需要根据主键更新,便对primary加锁。事务1对接下来的死锁没有什么影响,这里可以忽略。
事务2到来,事务2的唯一索引uniq_promotion_id插入的列值1232,在表中不存在,因此事务2执行插入操作,接着便对uniq_promotion_id加锁,这里的锁是间隙锁,加锁区间是(1231,+supernum]。
事务3到来,事务3的uniq_promotion_id唯一索引插入的列值1233,在表中也不存在,因此事务3执行插入操作,开始对uniq_promotion_id加锁,而此时发现要插入的值1233,已经被事务2锁定,即事务2占有uniq_promotion_id的锁,事务2没有提交,没有被释放该锁,于是事务3等待锁超时了。此时,若事务2在执行一条插入语句,就死锁了
三、解决方案
(1)方案一
先根据条件查询,再判断是插入或者根据主键更新。
此方案,先根据已知条件查询,若数据存在,再根据主键更新数据,从缩小锁粒度,减少并发等待时间方面考虑解决死锁问题。
但是先查询,再插入或更新,在并发情况下,并不能保证原子性。比如,有同一个sku_id有两条促销消息,且促销状态不同,同一时刻到来并执行,这个时候查询表中并不存在该sku_id,两条消息都被插入表中,这显然不是我们想要的。
因此,该方案适合,同一条业务数据,不会同一时刻出现多条的情况,即多个相同的数据不存在并发现象
(2)方案二
增加分布式锁,保证有序执行。此方案,从去除事务并发方面考虑解决死锁问题
由于方案一,在并发执行时候,同一个sku_id,可能会insert进表中多条数据,并不能保证其原子性。因此,无需修改SQL,增加分布式锁(Redis、Zookeeper等)即可,保证其有序性,从而解决事务并发问题
但是,加锁,多少会降低系统的处理速度,但是这也是分布式系统必须考虑的问题
参考:https://blog.csdn.net/li563868273/article/details/105213266/