1. Problem description
An ordinary UPDATE statement fails with:
ERROR 1205 (HY000): Lock wait timeout exceeded
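The wait limit behind this error is controlled by the InnoDB variable innodb_lock_wait_timeout (50 seconds by default). As a quick first check, the current value can be read as sketched below. The session-level value of 120 is only an illustrative assumption, not a recommendation.

```sql
# Show how long a statement waits for a row lock before ERROR 1205 is raised
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

# Illustrative only: raise the limit for the current session (120 is an assumed value)
SET SESSION innodb_lock_wait_timeout = 120;
```

Raising the timeout only hides the symptom. The blocking transaction still has to be found.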
2. Troubleshooting steps
Check whether any session is being blocked:
# Uses information_schema.INNODB_LOCKS and INNODB_LOCK_WAITS, which were removed in MySQL 8.0
SELECT
    p2.`HOST` BlockedHost,
    p2.`USER` BlockedUser,
    r.trx_id BlockedTrxId,
    r.trx_mysql_thread_id BlockedThreadId,
    TIMESTAMPDIFF(
        SECOND,
        r.trx_wait_started,
        CURRENT_TIMESTAMP
    ) WaitTime,
    r.trx_query BlockedQuery,
    l.lock_table BlockedTable,
    m.`lock_mode` BlockedLockMode,
    m.`lock_type` BlockedLockType,
    m.`lock_index` BlockedLockIndex,
    m.`lock_space` BlockedLockSpace,
    m.lock_page BlockedLockPage,
    m.lock_rec BlockedLockRec,
    m.lock_data BlockedLockData,
    p.`HOST` blocking_host,
    p.`USER` blocking_user,
    b.trx_id BlockingTrxid,
    b.trx_mysql_thread_id BlockingThreadId,
    b.trx_query BlockingQuery,
    l.`lock_mode` BlockingLockMode,
    l.`lock_type` BlockingLockType,
    l.`lock_index` BlockingLockIndex,
    l.`lock_space` BlockingLockSpace,
    l.lock_page BlockingLockPage,
    l.lock_rec BlockingLockRec,
    l.lock_data BlockingLockData,
    IF(p.COMMAND = 'Sleep', CONCAT(p.TIME, ' seconds'), 0) idle_in_trx
FROM
    information_schema.INNODB_LOCK_WAITS w
    INNER JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
    INNER JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
    INNER JOIN information_schema.INNODB_LOCKS l ON w.blocking_lock_id = l.lock_id AND l.`lock_trx_id` = b.`trx_id`
    INNER JOIN information_schema.INNODB_LOCKS m ON m.`lock_id` = w.`requested_lock_id` AND m.`lock_trx_id` = r.`trx_id`
    INNER JOIN information_schema.PROCESSLIST p ON p.ID = b.trx_mysql_thread_id
    INNER JOIN information_schema.PROCESSLIST p2 ON p2.ID = r.trx_mysql_thread_id
ORDER BY
    WaitTime DESC;
If the query returns rows, blocking is occurring, and each row shows which session is blocking which.
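On MySQL 8.0 the INNODB_LOCKS and INNODB_LOCK_WAITS tables no longer exist. A rough equivalent, assuming the sys schema is installed (it ships with MySQL 5.7 and later), is the sys.innodb_lock_waits view. This is a sketch of the same check, not a column-for-column replacement of the query above.

```sql
# Assumes MySQL 5.7+ / 8.0 with the sys schema installed
SELECT
    waiting_pid,
    waiting_trx_id,
    waiting_query,
    wait_age_secs,
    locked_table,
    locked_index,
    blocking_pid,
    blocking_trx_id,
    blocking_query,
    sql_kill_blocking_connection
FROM sys.innodb_lock_waits
ORDER BY wait_age_secs DESC;
```

The sql_kill_blocking_connection column contains a ready-made KILL statement for the blocking session.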
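To restore service urgently, log in to the database and run KILL with the blocking session's thread id. This terminates the blocking connection and rolls back its open transaction.
Example:
If the blocking thread id is 9, kill it like this:
KILL 9;
If similar statements keep blocking other statements after the kill, contact the responsible developers. Ask them to check whether the business logic uses transactions and whether a failing step is preventing a transaction from committing.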
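A transaction that the application never commits usually shows up as a connection in the Sleep state that still owns an InnoDB transaction. A minimal sketch for listing such sessions follows. The idle_seconds alias is illustrative, and the query assumes information_schema.PROCESSLIST is available.

```sql
# Sessions that are idle (Sleep) but still hold an open InnoDB transaction
SELECT
    t.trx_id,
    t.trx_started,
    t.trx_mysql_thread_id,
    p.`USER`,
    p.`HOST`,
    p.DB,
    p.TIME AS idle_seconds
FROM information_schema.INNODB_TRX t
INNER JOIN information_schema.PROCESSLIST p ON p.ID = t.trx_mysql_thread_id
WHERE p.COMMAND = 'Sleep'
ORDER BY p.TIME DESC;
```

The USER and HOST columns help identify which application instance left the transaction open.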
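# Check for metadata locks
select * from information_schema.PROCESSLIST where state='Waiting for table metadata lock';
If this returns rows, those sessions are waiting on a metadata lock. A metadata lock has a wide impact and needs urgent handling.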
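The PROCESSLIST query shows only the waiting sessions, not the session holding the lock. On servers that provide performance_schema.metadata_locks (MySQL 5.7 and later), holders can be listed as sketched below. This assumes the performance schema is on and the wait/lock/metadata/sql/mdl instrument is enabled, which is not the default on every version.

```sql
# Assumes performance_schema.metadata_locks exists and the mdl instrument is enabled
SELECT
    ml.OBJECT_SCHEMA,
    ml.OBJECT_NAME,
    ml.LOCK_TYPE,
    ml.LOCK_STATUS,
    th.PROCESSLIST_ID,
    th.PROCESSLIST_USER,
    th.PROCESSLIST_HOST,
    th.PROCESSLIST_TIME
FROM performance_schema.metadata_locks ml
INNER JOIN performance_schema.threads th ON th.THREAD_ID = ml.OWNER_THREAD_ID
WHERE ml.OBJECT_TYPE = 'TABLE'
ORDER BY ml.OBJECT_SCHEMA, ml.OBJECT_NAME, ml.LOCK_STATUS;
```

For a given table, rows with LOCK_STATUS = 'GRANTED' are the holders and rows with 'PENDING' are the waiters. PROCESSLIST_ID is the value to pass to KILL.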
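# List current transactions
SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query FROM information_schema.innodb_trx;
Normally a transaction finishes within seconds, so focus on transactions that have been running for minutes or hours. trx_started is the time the transaction started.
Note the trx_mysql_thread_id of each long-running transaction. Running KILL with that value terminates the session.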
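To avoid comparing timestamps by eye, the age can be computed in the query itself. The sketch below lists only transactions older than 60 seconds. That threshold and the trx_age_secs alias are assumptions to adjust for the workload.

```sql
# Transactions open for more than 60 seconds (assumed threshold), oldest first
SELECT
    trx_id,
    trx_state,
    trx_started,
    TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS trx_age_secs,
    trx_rows_locked,
    trx_rows_modified,
    trx_mysql_thread_id,
    trx_query
FROM information_schema.INNODB_TRX
WHERE trx_started < NOW() - INTERVAL 60 SECOND
ORDER BY trx_started;
```

A large trx_rows_modified value means that killing the session triggers a correspondingly long rollback.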
# List the SQL statements currently running
SELECT * FROM information_schema.processlist WHERE info IS NOT NULL ORDER BY TIME DESC;
If the error turns out to be caused by insufficient disk space, purge binlogs to free disk space.
/*
If the mariadb server runs out of disk space, the mariadb error log (/var/lib/mysql/error.log) also contains a corresponding error:
Disk is full writing './mysql-bin.000177'(Errcode: 28). Waiting for someone to free space..
*/
Purge the binlogs. Example:
# List the existing binlogs
SHOW BINARY LOGS;
# Purge all binlogs older than slave-bin.000395 (that file itself is kept)
PURGE BINARY LOGS TO 'slave-bin.000395';
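Purging a binlog that a replica has not yet read breaks replication, so it is worth checking replica positions first. The sketch below also shows purging by age rather than by file name. The 7-day retention is an assumed value, and SHOW SLAVE STATUS and expire_logs_days assume MariaDB or MySQL 5.7 (newer MySQL versions use different names).

```sql
# Run on each replica: Relay_Master_Log_File is the oldest master binlog still needed
SHOW SLAVE STATUS;

# Purge by age instead of by file name; 7 days is an assumed retention period
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);

# Check the automatic retention setting (0 means binlogs are never expired automatically)
SHOW VARIABLES LIKE 'expire_logs_days';
```

Setting a non-zero retention lets the server remove old binlogs on its own, so the disk does not fill up again.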