一次线上数据库死锁问题的排查

最新推荐文章于 2024-08-11 13:30:00 发布

bohu83

最新推荐文章于 2024-08-11 13:30:00 发布

阅读量853

点赞数

分类专栏：数据库 MYSQL 文章标签： mysql 死锁分析加锁过程索引

本文链接：https://blog.csdn.net/bohu83/article/details/97652487

版权

数据库同时被 2 个专栏收录

87 篇文章 13 订阅

订阅专栏

MYSQL

84 篇文章 23 订阅

订阅专栏

一背景：

数据库阿里云RDS mysql 5.6 隔离级别: RC

听小伙伴说系统最近在报死锁。奇怪的是update 还执行了。

二相关准备

看表，主要是看表的所以。数据脱敏处理下，我们称为log表。

发现有索引userid,orderid. 两个单独索引。

再看下数据量 count下。真大已经1亿多条了。

看日志，SHOW ENGINE INNODB STATUS ，没权限，找dba执行完了导出来给一份。

相关知识：首先关于innodb的锁，推荐官网：

其次，加锁过程，推荐大神何博士的文章：http://hedengcheng.com/?p=771

贴一下结论：结论：若id列上有非唯一索引，那么对应的所有满足SQL查询条件的记录，都会被加锁。同时，这些记录在主键索引上的记录，也会被加锁。网上找个示意图

这里，我们有个概念，就是表锁跟行锁的区别，对于innodb的行锁不同情况不一样，今天要分析的case就是锁住了锁索引。

一些数据结构：

以下引自阿里数据库团队（阿里云数据库内核团队真心厉害，因为你懂原理是一会事，看懂源码是另一会事，能改内核源码做优化是顶级的）

在 Innodb 内部用一个 unsiged long 类型数据表示锁的类型, 如图所示，最低的 4 个 bit 表示 lock_mode, 5-8 bit 表示 lock_type, 剩下的高位 bit 表示行锁的类型。

record_lock type

lock_type

lock_mode

lock_mode 描述了锁的基本类型，在代码中的定义如下：

/* Basic lock modes */
enum lock_mode {
	LOCK_IS = 0,	/* intention shared */
	LOCK_IX,	/* intention exclusive */
	LOCK_S,		/* shared */
	LOCK_X,		/* exclusive */
	LOCK_AUTO_INC,	/* locks the auto-inc counter of a table
			in an exclusive mode */
	LOCK_NONE,	/* this is used elsewhere to note consistent read */
	LOCK_NUM = LOCK_NONE, /* number of lock modes */
	LOCK_NONE_UNSET = 255
};
#define LOCK_MODE_MASK 0xFUL /* mask used to extact lock type from the
						type_mode field in a lock*/

lock_type 占用 5-8 bit 位，目前只用了 5 和 6 位，大小为 16 和 32 ，表示 LOCK_TABLE 和 LOCK_REC，使用宏定义 #define LOCK_TYPE_MASK 0xF0UL 来获取值。

record_lock_type 对于 LOCK_TABLE 类型来说都是空的，对于 LOCK_REC 目前值有：

#define LOCK_WAIT   256		/* 表示正在等待锁 */
#define LOCK_ORDINARY 0 	/* 表示 next-key lock ，锁住记录本身和记录之前的 gap*/
#define LOCK_GAP    512		/* 表示锁住记录之前 gap（不锁记录本身） */
#define LOCK_REC_NOT_GAP 1024	/* 表示锁住记录本身，不锁记录前面的 gap */
#define LOCK_INSERT_INTENTION 2048	/* 插入意向锁 */
#define LOCK_CONV_BY_OTHER 4096		/* 表示锁是由其它事务创建的(比如隐式锁转换) */

使用位操作来设置和判断是否设置了对应的值。

静态数据结构

对于每个锁对象，有两个存在的纬度：一个是事务纬度，每个事务都可以获得锁结构和等待某些锁。另一个是全局纬度，所有的锁都保存在 Lock_sys->hash 哈希表中。无论是表锁还是行锁，都是用结构 lock_t 来描述：

/** Lock struct; protected by lock_sys->mutex */
struct lock_t {
    trx_t*      trx;        /*!< transaction owning the
                    lock */
    UT_LIST_NODE_T(lock_t)
            trx_locks;  /*!< list of the locks of the
                    transaction */
    ulint       type_mode;  /*!< lock type, mode, LOCK_GAP or
                    LOCK_REC_NOT_GAP,
                    LOCK_INSERT_INTENTION,
                    wait flag, ORed */
    hash_node_t hash;       /*!< hash chain node for a record
                    lock */
    dict_index_t*   index;      /*!< index for a record lock */
    union {
        lock_table_t    tab_lock;/*!< table lock */
        lock_rec_t  rec_lock;/*!< record lock */
    } un_member;            /*!< lock details */
};

对于每个变量的意义注释已经说的比较清楚了，其中 type_mode 就是第一小节中 lock_type | type_mode，两个锁是否冲突就是使用它们各自的 type_mode 根据锁兼容矩阵来判断的，后面会详细说。

变量 hash 是 Inodb 中构造 Hash 表需要，当锁插入到 Lock_sys->hash 中，Hash 值相同就形成链表，使用变量 hash 相连。

un_member 表示 lock_t 不是表锁就是行锁，看下行锁的结构：

/** Record lock for a page */
struct lock_rec_t {
    ulint   space;          /*!< space id */
    ulint   page_no;        /*!< page number */
    ulint   n_bits;         /*!< number of bits in the lock
                    bitmap; NOTE: the lock bitmap is
                    placed immediately after the
                    lock struct */
};

[space, page_no] 可以确定锁对应哪个页，参考下上个月月报最后两个小节，页上每行数据紧接着存放，内部使用一个 heap_no 来表示是第几行数据。因此[space, page_no, heap_no]可以唯一确定一行。Innodb 使用位图来表示锁具体锁住了那几行，在函数 lock_rec_create 中为 lock_t 分配内存空间的时候，会在对象地址后分配一段内存空间(当前行数 + 64)用来保存位图。n_bits 表示位图大小。

锁创建完成后首先会插入到全局 Hash 表中，然后放到对应的事务的锁链表中。相同(space,page_no)的锁会被 Hash 到同一个桶里，使用 lock_t->hash 串成链表。

imag

实际产生的锁是在事务中，对应每个行锁或者表锁，有一个锁结构lock_t。有两种途径查询锁：第一种方式通过trx_t对象的trx_locks链表进行遍历可得到某个事务所持有或者正在等待的锁信息；另一种方式中全局变量lock_sys本质是一个哈希表，根据记录所在页进行哈希查询得到lock_rec_t，再扫描lock位图判断，最终得到该行记录是否有锁。

三分析

先看下死锁日志：

------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-07-28 16:55:55 7fa4396ba700
*** (1) TRANSACTION:
TRANSACTION 77124411358, ACTIVE 0.015 sec fetching rows
mysql tables in use 3, locked 3
LOCK WAIT 520 lock struct(s), heap size 63016, 3 row lock(s)
LOCK BLOCKING MySQL thread id: 4034245 block 4034802
MySQL thread id 4034802, OS thread handle 0x7fa4105e7700, query id 113785089726 10.10.10.10 djsycrm Searching rows for update
UPDATE log SET flush=2  WHERE  uid=457861679367545 and orderid=6840
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1335 page no 1049191 n bits 576 index `idx_uid` of table `XXXX`.`log` trx id 77124411358 lock_mode X locks rec but not gap waiting
Record lock, heap no 462 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 8001a06c3ac14579; asc    l: Ey;;
 1: len 8; hex 0000000006452bee; asc      E+ ;;

*** (2) TRANSACTION:
TRANSACTION 77124411351, ACTIVE 0.022 sec fetching rows
mysql tables in use 3, locked 3
531 lock struct(s), heap size 63016, 8 row lock(s)
MySQL thread id 4034245, OS thread handle 0x7fa4396ba700, query id 113785089680 10.10.10.10 djsycrm Searching rows for update
UPDATE log SET  flush=2  WHERE  uid=457861679367545 and orderid=5392 
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1335 page no 1049191 n bits 576 index `idx_uid` of table `XXXXX`.`log` trx id 77124411351 lock_mode X locks rec but not gap
Record lock, heap no 462 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 8001a06c3ac14579; asc    l: Ey;;
 1: len 8; hex 0000000006452bee; asc      E+ ;;

Record lock, heap no 466 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 8001a06c3ac14579; asc    l: Ey;;
 1: len 8; hex 00000000064532bd; asc      E2 ;;

Record lock, heap no 474 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 8001a06c3ac14579; asc    l: Ey;;
 1: len 8; hex 0000000006454275; asc      EBu;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1335 page no 1256651 n bits 232 index `PRIMARY` of table `XXXXX`.`log` trx id 77124411351 lock_mode X locks rec but not gap waiting
Record lock, heap no 11 PHYSICAL RECORD: n_fields 15; compact format; info bits 0
 0: len 8; hex 0000000006454275; asc      EBu;;
 1: len 6; hex 0011f4d34089; asc     @ ;;
 2: len 7; hex eb0002c0320110; asc     2  ;;
 3: len 8; hex 8001a06c3ac14579; asc    l: Ey;;
 4: len 8; hex 82a5c7d505c7d108; asc         ;;
 5: len 4; hex 5d3d4fda; asc ]=O ;;
 6: len 4; hex 5d3d4fda; asc ]=O ;;
 7: len 4; hex 80000000; asc     ;;
 8: len 4; hex 80000001; asc     ;;
 9: len 4; hex 80000000; asc     ;;
 10: len 2; hex 7b7d; asc {};;
 11: len 4; hex 80000001; asc     ;;
 12: len 4; hex 80000000; asc     ;;
 13: len 0; hex ; asc ;;
 14: len 1; hex 30; asc 0;;

*** WE ROLL BACK TRANSACTION (1)

数据本身被处理过，但是对于理解问题还是有帮助的，能定位主要信息有 1.只显示最近两条事务的信息 2.只显示事务最近执行的一条语句。如文中推荐的做法，配合 general log 和 binlog 进行排查。

先看下日志信息：

有两个事务，对应的SQL，索引信息，锁模式，页面信息及对应的行号（你能找到记录），等待的锁：

UPDATE log SET flush=2 WHERE uid=457861679367545 and orderid=6840

UPDATE log SET flush=2 WHERE uid=457861679367545 and orderid=5392

where条件的 uid 相同，orderid不同。都是索引。

我们看下执行计划：explain

id: 1
select_type: SIMPLE
table: log
type: index_merge
possible_keys: idx_orderid,idx_uid
key: idx_orderid,idx_uid
key_len: 8,8
ref: NULL
rows: 1
Extra: Using intersect(idx_orderid,idx_uid); Using where

是两个索引的交集。后来我查库发现：通过索引复合条件有多条，就是说update锁住的记录数也就不是一条。

那么在并发的情况下，会产生这种情况。

事务1 针对orderid的索引加锁成功，等待uid对应的数据再加锁。

事务2针对uid 索引加锁成功，等待orderid。加锁顺序不同导致的互相等待产生死锁。

上面的死锁日志，可以看出：事务1 等待Record lock, heap no 462 PHYSICAL RECORD:

恰恰是事务2 持有的锁。

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1335 page no 1049191 n bits 576 index `idx_uid` of table `XXXXX`.`log` trx id 77124411351 lock_mode X locks rec but not gap
Record lock, heap no 462 PHYSICAL RECORD: n_fields 2; compact format; info bits 0

至于事务2 所期待的锁：这里日志信息还是不足。应该结合binlog 来排查。

最后的日志是mysql 进行了牺牲，放弃了事务1 *** WE ROLL BACK TRANSACTION (1)

根据事务1 的SQL去查一下，的确没执行，所以说小伙伴的描述不一定准确，要实地验证才行。

**********************************************************************************

这里谈论MySQL的死锁检测，目前仅讨论InnoDB的处理。

当InnoDB事务尝试获取(请求)加一个锁，并且需要等待时，InnoDB会进行死锁检测.

正常的流程如下:

1.InnoDB的初始化一个事务，当事务尝试获取（请求）加一个锁，并且需要等待时(wait_lock)，innodb会开始进行死锁检测(deadlock_mark)

2.进入到lock_deadlock_check_and_resolve ，名字很明显了，要检测死锁和解决死锁

3.检测死锁过程中，也是有计数器来进行限制的

4.死锁检测的逻辑之一是等待图的处理过程，如果通过锁的信息和事务等待链构造出一个图，如果图中出现回路，就认为发生了死锁。

5.死锁的回滚，内部代码的处理逻辑之一是比较undo的数量

阿里云有个英文版的，过程介绍更加详细：

https://topic.alibabacloud.com/a/talk-about-mysql-deadlock-one_1_41_30042817.html

我补充一点，整理自mysql 5.6.45版本。主要是lock_rec_lock 函数调用到死锁检测函数lock_deadlock_check_and_resolve。对于行数据的加锁是由函数 lock_rec_lock 完成，简单点来看，主要的参数是 mode(锁类型)，block(包含该行的 buffer 数据页)，heap_no(具体哪一行)。就可以确定加什么样的锁，以及在哪一行加。前面的何博士哪一篇介绍清楚了。

源码如下：

/*********************************************************************//**
Tries to lock the specified record in the mode requested. If not immediately
possible, enqueues a waiting lock request. This is a low-level function
which does NOT look at implicit locks! Checks lock compatibility within
explicit locks. This function sets a normal next-key lock, or in the case
of a page supremum record, a gap type lock.
@return	DB_SUCCESS, DB_SUCCESS_LOCKED_REC, DB_LOCK_WAIT, DB_DEADLOCK,
or DB_QUE_THR_SUSPENDED */
static
dberr_t
lock_rec_lock(
/*==========*/
	ibool			impl,	/*!< in: if TRUE, no lock is set
					if no wait is necessary: we
					assume that the caller will
					set an implicit lock */
	ulint			mode,	/*!< in: lock mode: LOCK_X or
					LOCK_S possibly ORed to either
					LOCK_GAP or LOCK_REC_NOT_GAP */
	const buf_block_t*	block,	/*!< in: buffer block containing
					the record */
	ulint			heap_no,/*!< in: heap number of record */
	dict_index_t*		index,	/*!< in: index of record */
	que_thr_t*		thr)	/*!< in: query thread */
{
	ut_ad(lock_mutex_own());
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_S
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IS));
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_X
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IX));
	ut_ad((LOCK_MODE_MASK & mode) == LOCK_S
	      || (LOCK_MODE_MASK & mode) == LOCK_X);
	ut_ad(mode - (LOCK_MODE_MASK & mode) == LOCK_GAP
	      || mode - (LOCK_MODE_MASK & mode) == LOCK_REC_NOT_GAP
	      || mode - (LOCK_MODE_MASK & mode) == 0);
	ut_ad(dict_index_is_clust(index) || !dict_index_is_online_ddl(index));

	/* We try a simplified and faster subroutine for the most
	common cases */
	switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {
	case LOCK_REC_SUCCESS:
		return(DB_SUCCESS);
	case LOCK_REC_SUCCESS_CREATED:
		return(DB_SUCCESS_LOCKED_REC);
	case LOCK_REC_FAIL:
		return(lock_rec_lock_slow(impl, mode, block,
					  heap_no, index, thr));
	}

	ut_error;
	return(DB_ERROR);
}
/*********************************************************************//**
This is a fast routine for locking a record in the most common cases:
there are no explicit locks on the page, or there is just one lock, owned
by this transaction, and of the right type_mode. This is a low-level function
which does NOT look at implicit locks! Checks lock compatibility within
explicit locks. This function sets a normal next-key lock, or in the case of
a page supremum record, a gap type lock.
@return whether the locking succeeded */
UNIV_INLINE
enum lock_rec_req_status
lock_rec_lock_fast(
/*===============*/
	ibool			impl,	/*!< in: if TRUE, no lock is set
					if no wait is necessary: we
					assume that the caller will
					set an implicit lock */
	ulint			mode,	/*!< in: lock mode: LOCK_X or
					LOCK_S possibly ORed to either
					LOCK_GAP or LOCK_REC_NOT_GAP */
	const buf_block_t*	block,	/*!< in: buffer block containing
					the record */
	ulint			heap_no,/*!< in: heap number of record */
	dict_index_t*		index,	/*!< in: index of record */
	que_thr_t*		thr)	/*!< in: query thread */
{
	lock_t*			lock;
	trx_t*			trx;
	enum lock_rec_req_status status = LOCK_REC_SUCCESS;

	ut_ad(lock_mutex_own());
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_S
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IS));
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_X
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IX));
	ut_ad((LOCK_MODE_MASK & mode) == LOCK_S
	      || (LOCK_MODE_MASK & mode) == LOCK_X);
	ut_ad(mode - (LOCK_MODE_MASK & mode) == LOCK_GAP
	      || mode - (LOCK_MODE_MASK & mode) == 0
	      || mode - (LOCK_MODE_MASK & mode) == LOCK_REC_NOT_GAP);
	ut_ad(dict_index_is_clust(index) || !dict_index_is_online_ddl(index));

	DBUG_EXECUTE_IF("innodb_report_deadlock", return(LOCK_REC_FAIL););

	lock = lock_rec_get_first_on_page(block);

	trx = thr_get_trx(thr);

	if (lock == NULL) {// 对应的没有锁
		if (!impl) {//需要加锁
			/* Note that we don't own the trx mutex. */
			 /*创建一个锁*/
			lock = lock_rec_create(
				mode, block, heap_no, index, trx, FALSE);

		}
		//返回成功
		status = LOCK_REC_SUCCESS_CREATED;
	} else {
		trx_mutex_enter(trx);

		if (lock_rec_get_next_on_page(lock) /*页上只有一个锁*/
		     || lock->trx != trx  /*拥有锁的事务不是当前事务*/
		     || lock->type_mode != (mode | LOCK_REC) /* 已有锁和要加的锁模式是否相同 */
		     || lock_rec_get_n_bits(lock) <= heap_no) { /* 已有锁的 n_bits 是否满足 heap_no */

			status = LOCK_REC_FAIL;
		} else if (!impl) {
			/* If the nth bit of the record lock is already set
			then we do not set a new lock bit, otherwise we do
			set  如果之前没有设置，那么只需要设置一下 bitmap 就可以了*/
			if (!lock_rec_get_nth_bit(lock, heap_no)) {
				lock_rec_set_nth_bit(lock, heap_no);
				status = LOCK_REC_SUCCESS_CREATED;
			}
		}

		trx_mutex_exit(trx);
	}

	return(status);
}

/*********************************************************************//**
This is the general, and slower, routine for locking a record. This is a
low-level function which does NOT look at implicit locks! Checks lock
compatibility within explicit locks. This function sets a normal next-key
lock, or in the case of a page supremum record, a gap type lock.
@return	DB_SUCCESS, DB_SUCCESS_LOCKED_REC, DB_LOCK_WAIT, DB_DEADLOCK,
or DB_QUE_THR_SUSPENDED */
static
dberr_t
lock_rec_lock_slow(
/*===============*/
	ibool			impl,	/*!< in: if TRUE, no lock is set
					if no wait is necessary: we
					assume that the caller will
					set an implicit lock */
	ulint			mode,	/*!< in: lock mode: LOCK_X or
					LOCK_S possibly ORed to either
					LOCK_GAP or LOCK_REC_NOT_GAP */
	const buf_block_t*	block,	/*!< in: buffer block containing
					the record */
	ulint			heap_no,/*!< in: heap number of record */
	dict_index_t*		index,	/*!< in: index of record */
	que_thr_t*		thr)	/*!< in: query thread */
{
	trx_t*			trx;
	dberr_t			err = DB_SUCCESS;

	ut_ad(lock_mutex_own());
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_S
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IS));
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_X
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IX));
	ut_ad((LOCK_MODE_MASK & mode) == LOCK_S
	      || (LOCK_MODE_MASK & mode) == LOCK_X);
	ut_ad(mode - (LOCK_MODE_MASK & mode) == LOCK_GAP
	      || mode - (LOCK_MODE_MASK & mode) == 0
	      || mode - (LOCK_MODE_MASK & mode) == LOCK_REC_NOT_GAP);
	ut_ad(dict_index_is_clust(index) || !dict_index_is_online_ddl(index));

	DBUG_EXECUTE_IF("innodb_report_deadlock", return(DB_DEADLOCK););

	trx = thr_get_trx(thr);
	trx_mutex_enter(trx);

	if (lock_rec_has_expl(mode, block, heap_no, trx)) {

		/* The trx already has a strong enough lock on rec: do
		nothing */

	} else if (lock_rec_other_has_conflicting(
			static_cast<enum lock_mode>(mode),
			block, heap_no, trx)) {

		/* If another transaction has a non-gap conflicting
		request in the queue, as this transaction does not
		have a lock strong enough already granted on the
		record, we have to wait. */

		err = lock_rec_enqueue_waiting(
			mode, block, heap_no, index, thr);

	} else if (!impl) {
		/* Set the requested lock on the record, note that
		we already own the transaction mutex. */

		lock_rec_add_to_queue(
			LOCK_REC | mode, block, heap_no, index, trx, TRUE);

		err = DB_SUCCESS_LOCKED_REC;
	}

	trx_mutex_exit(trx);

	return(err);
}

关于死锁检测的代码：

/********************************************************************//**
Checks if a joining lock request results in a deadlock. If a deadlock is
found this function will resolve the dadlock by choosing a victim transaction
and rolling it back. It will attempt to resolve all deadlocks. The returned
transaction id will be the joining transaction id or 0 if some other
transaction was chosen as a victim and rolled back or no deadlock found.

@return id of transaction chosen as victim or 0 */
static
trx_id_t
lock_deadlock_check_and_resolve(
/*============================*/
	const lock_t*	lock,	/*!< in: lock the transaction is requesting */
	const trx_t*	trx)	/*!< in: transaction */
{
	trx_id_t	victim_trx_id;

	ut_ad(trx != NULL);
	ut_ad(lock != NULL);
	ut_ad(lock_mutex_own());
	assert_trx_in_list(trx);

	/* Try and resolve as many deadlocks as possible. */
	do {
		lock_deadlock_ctx_t	ctx;

		/* Reset the context. */
		ctx.cost = 0;
		ctx.depth = 0;
		ctx.start = trx;
		ctx.too_deep = FALSE;
		ctx.wait_lock = lock;
		ctx.mark_start = lock_mark_counter;
		//死锁检测
		victim_trx_id = lock_deadlock_search(&ctx);

		/* Search too deep, we rollback the joining transaction. 如果死锁检测的深度太深，需要回滚当前的事务 */
		if (ctx.too_deep) {

			ut_a(trx == ctx.start);
			ut_a(victim_trx_id == trx->id);

			if (!srv_read_only_mode) {
				lock_deadlock_joining_trx_print(trx, lock);
			}

			MONITOR_INC(MONITOR_DEADLOCK);
	// 如果需要回滚的是其他事务，那么调用lock_deadlock_trx_rollback进行回
		} else if (victim_trx_id != 0 && victim_trx_id != trx->id) {

			ut_ad(victim_trx_id == ctx.wait_lock->trx->id);
			lock_deadlock_trx_rollback(&ctx);

			lock_deadlock_found = TRUE;

			MONITOR_INC(MONITOR_DEADLOCK);
		}

	} while (victim_trx_id != 0 && victim_trx_id != trx->id);

	/* If the joining transaction was selected as the victim.   */
	if (victim_trx_id != 0) {
		ut_a(victim_trx_id == trx->id);
		//打印回滚事务
		lock_deadlock_fputs("*** WE ROLL BACK TRANSACTION (2)\n");

		lock_deadlock_found = TRUE;
	}

	return(victim_trx_id);
}

四方案

死锁的发生与加锁顺序关联密切。而加锁顺序又跟使用的索引密切相关。 innodb在RR级别下，容易产生gap锁的问题。

针对本文分析的索引导致并发死锁问题。

两种方案：1.程序加锁。外面拦住并发。key:uid+orderid.

2 ,调整索引结构，从两个独立索引改为联合索引，保证一致性。

还有个相关的事，尽量让dba归档数据。

每次看mysql都觉得博大精深。如果能让你有所收获，就值得码字。

这篇分析的也很好：

https://colinback.github.io/szyblogs/database/2018/05/20/innodb-kernel-8/

mysql 官网源码下载地址：

https://dev.mysql.com/downloads/mysql/5.6.html#downloads

bohu83

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
一次线上数据库死锁问题的排查

一背景：数据库阿里云RDS mysql 5.6 隔离级别: RC 听小伙伴说系统最近在报死锁。奇怪的是update 还执行了。二相关准备看表，主要是看表的所以。数据脱敏处理下，我们称为log表。发现有索引userid,orderid. 两个单独索引。再看下数据量 count下。真大已经1亿多条了。看日志，SHOW ENGINE INNOD...
复制链接

扫一扫