一、问题描述:

     同事反馈线上一个表有其中一条数据无法删除,其他都正常,我拿到删数据的sql,尝试执行,报错如下:

mysql> delete from facebook_posts where id = 7048962;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

二、问题处理

从报错信息看,应该是关于这条数据有事物未提交,锁等待超时了,下面我们就开始验证并解决问题

1、在sql执行期间,通过information_schema.innodb_trx表找到这个sql的事物ID(5316933097

mysql> select trx_id,trx_started,trx_requested_lock_id,trx_mysql_thread_id,trx_query from information_schema.innodb_trx  where trx_query='delete from facebook_posts where id = 7048962';
+------------+---------------------+------------------------+---------------------+-----------------------------------------------+
| trx_id     | trx_started         | trx_requested_lock_id  | trx_mysql_thread_id | trx_query                                     |
+------------+---------------------+------------------------+---------------------+-----------------------------------------------+
| 5316933097 | 2017-08-15 07:31:57 | 5316933097:923:24693:6 |          1798850878 | delete from facebook_posts where id = 7048962 |
+------------+---------------------+------------------------+---------------------+-----------------------------------------------+
1 row in set (0.00 sec)

关于innodb_trx表字段含义的解释:

mysql> desc information_schema.innodb_trx;       
+----------------------------+---------------------+------+-----+---------------------+-------+
| Field                      | Type                | Null | Key | Default             | Extra |
+----------------------------+---------------------+------+-----+---------------------+-------+
| trx_id                     | varchar(18)         | NO   |     |                     |       |#事务ID
| trx_state                  | varchar(13)         | NO   |     |                     |       |#事物状态
| trx_started                | datetime            | NO   |     | 0000-00-00 00:00:00 |       |#事物开始时间
| trx_requested_lock_id      | varchar(81)         | YES  |     | NULL                |       |#事物请求锁ID
| trx_wait_started           | datetime            | YES  |     | NULL                |       |#事物开始等待时间
| trx_weight                 | bigint(21) unsigned | NO   |     | 0                   |       |#
| trx_mysql_thread_id        | bigint(21) unsigned | NO   |     | 0                   |       |#事物线程ID,即show processlist看到ID
| trx_query                  | varchar(1024)       | YES  |     | NULL                |       |#具体SQL
| trx_operation_state        | varchar(64)         | YES  |     | NULL                |       |#事物当前操作状态
| trx_tables_in_use          | bigint(21) unsigned | NO   |     | 0                   |       |#事物中有多少个表被使用
| trx_tables_locked          | bigint(21) unsigned | NO   |     | 0                   |       |#使用拥有多少个锁
| trx_lock_structs           | bigint(21) unsigned | NO   |     | 0                   |       |#
| trx_lock_memory_bytes      | bigint(21) unsigned | NO   |     | 0                   |       |#事物锁住的内存大小
| trx_rows_locked            | bigint(21) unsigned | NO   |     | 0                   |       |#事物锁住的行数
| trx_rows_modified          | bigint(21) unsigned | NO   |     | 0                   |       |#使用修改的行数
| trx_concurrency_tickets    | bigint(21) unsigned | NO   |     | 0                   |       |#事物并发票数
| trx_isolation_level        | varchar(16)         | NO   |     |                     |       |#事物隔离级别
| trx_unique_checks          | int(1)              | NO   |     | 0                   |       |#是否唯一性检查
| trx_foreign_key_checks     | int(1)              | NO   |     | 0                   |       |#是否外键检查
| trx_last_foreign_key_error | varchar(256)        | YES  |     | NULL                |       |#最后的外键错误
| trx_adaptive_hash_latched  | int(1)              | NO   |     | 0                   |       |#
| trx_adaptive_hash_timeout  | bigint(21) unsigned | NO   |     | 0                   |       |#
| trx_is_read_only           | int(1)              | NO   |     | 0                   |       |#
| trx_autocommit_non_locking | int(1)              | NO   |     | 0                   |       |#
+----------------------------+---------------------+------+-----+---------------------+-------+
24 rows in set (0.00 sec)

2、通过上面步骤1找到的事物ID,找到占有锁的事物ID(5316888834  )

mysql> select * from information_schema.innodb_lock_waits  where requesting_trx_id=5316933097;
+-------------------+------------------------+-----------------+------------------------+
| requesting_trx_id | requested_lock_id      | blocking_trx_id | blocking_lock_id       |
+-------------------+------------------------+-----------------+------------------------+
| 5316933097        | 5316933097:923:24693:6 | 5316888834      | 5316888834:923:24693:6 |
+-------------------+------------------------+-----------------+------------------------+
1 row in set (0.00 sec)

关于innodb_lock_waits 表的字段含义的解释:

mysql> desc information_schema.innodb_lock_waits;
+-------------------+-------------+------+-----+---------+-------+
| Field             | Type        | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+-------+
| requesting_trx_id | varchar(18) | NO   |     |         |       |#请求锁的事物ID
| requested_lock_id | varchar(81) | NO   |     |         |       |#请求锁的锁ID
| blocking_trx_id   | varchar(18) | NO   |     |         |       |#当前拥有锁的事物ID
| blocking_lock_id  | varchar(81) | NO   |     |         |       |#当前拥有锁的锁ID
+-------------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

3、通过步骤2找到的占有锁的事物ID,找到占有锁的事物线程ID(1790259884

mysql> select * from  information_schema.innodb_trx where trx_id=5316888834  \G
*************************** 1. row ***************************
                    trx_id: 5316888834
                 trx_state: RUNNING
               trx_started: 2017-08-15 06:00:21
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 6
       trx_mysql_thread_id: 1790259884
                 trx_query: NULL
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 0
          trx_lock_structs: 6
     trx_lock_memory_bytes: 1184
           trx_rows_locked: 10
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 10000
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)

4、通过步骤3找的事物ID,可以查看下这个事物发起的账号和主机信息,提供给开发人员查找异常的真正原因,并kill这个事物ID,这条数据就可以正常删除了

#查看下这个事物发起的账号和主机信息
mysql> select * from  information_schema.processlist where ID=1790259884;
+------------+----------+---------------------+--------+---------+------+-------+------+
| ID         | USER     | HOST                | DB     | COMMAND | TIME | STATE | INFO |
+------------+----------+---------------------+--------+---------+------+-------+------+
| 1790259884 | spider_w | 172.31.11.143:46120 | db_mta | Sleep   | 1319 |       | NULL |
+------------+----------+---------------------+--------+---------+------+-------+------+
1 row in set (0.01 sec)
#kill 这个未提交的事物线程ID
mysql> CALL mysql.rds_kill(1790259884);
Query OK, 0 rows affected (0.00 sec)
#删除数据
mysql> delete from facebook_posts where id = 7041232;
Query OK, 1 row affected (0.02 sec)