作者:高鹏
文章末尾有他著作的《深入理解 MySQL 主从原理 32 讲》,深入透彻理解 MySQL 主从,GTID 相关技术知识。源码版本:5.7.22 水平有限,如有误请谅解
一、问题来源
这是一个朋友问我的典型案例。整个故障现象表现为,MySQL 数据库频繁的出现大量的请求不能响应。下面是一些他提供的证据:
1. show processlist从状态信息来看出现如下情况:
- insert 操作:状态为 update
- update/delete 操作:状态为 updating
- select 操作:状态为 sending data
因此可以推断应该是语句执行期间出现了问题,由于篇幅原因只给出一部分,并且我将语句部分也做了相应截断:
show processlist----------------------------
......
11827639 root dbmis Execute 9 updating UPDATE
17224594 root dbmis Execute 8 Sending data SELECT sum(exchange_coin) as exchange_coin FROM
17224595 root dbmis Execute 8 update INSERT INTO
17224596 root dg Execute 8 update INSERT INTO
17224597 root dbmis Execute 8 update INSERT INTO
17224598 root dbmis Execute 7 update INSERT INTO
17224599 root dbmis Execute 7 Sending data SELECT COUNT(*) AS tp_count FROM
17224600 root dg Execute 7 update INSERT INTO
17224601 root dbmis Execute 6 update INSERT INTO
17224602 root dbmis Execute 6 Sending data SELECT sum(exchange_coin) as exchange_coin FROM
17224606 root dbmis Execute 5 update INSERT INTO
17224619 root dbmis Execute 2 update INSERT INTO
17224620 root dbmis Execute 2 update INSERT INTO
17224621 root dbmis Execute 2 Sending data SELECT sum(exchange_coin) as exchange_coin
17224622 root dg Execute 2 update INSERT INTO
17224623 root dbmis Execute 1 update INSERT INTO
17224624 root dbmis Execute 1 update INSERT INTO
17224625 root dg Execute 1 update INSERT INTO
17224626 root dbmis Execute 0 update INSERT INTO
2. 系统 IO/CPU
从 vmstat 来看,CPU 使用不大,而 IO 也在可以接受的范围内(vmstat wa% 不高且 b 列为 0 )如下:
vmstat--------------------------------------
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 927300 3057100 0 53487316 0 0 5 192 0 0 3 1 96 0 0
iostat--------------------------------------
Linux 3.10.0-693.el7.x86_64 (fang-data1) 09/23/2019 _x86_64_ (32 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.72 0.00 0.52 0.45 0.00 96.31
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 9.73 11.28 3.93 264.54 415.23 2624.20 22.64 0.25 0.93 3.25 0.90 0.80 21.61
sda 10.13 11.59 6.34 264.22 450.68 2624.20 22.73 0.01 0.05 2.55 1.00 0.93 25.19
sdc 11.60 11.36 5.03 263.12 453.02 2592.44 22.71 0.17 0.62 5.08 0.53 0.81 21.60
sde 0.01 0.10 0.11 160.45 6.69 920.23 11.55 0.16 1.01 1.80 1.01 0.83 13.32
sdd 11.26 11.30 2.23 263.18 412.90 2592.44 22.65 0.17 0.65 10.37 0.56 0.82 21.78
md126 0.00 0.00 11.30 468.80 164.79 5216.64 22.42 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.11 58.80 6.69 920.23 31.47 0.15 2.56 1.96 2.56 2.16 12.74
dm-1 0.00 0.00 0.06 0.08 0.24 0.31 8.00 0.01 41.80 1.20 72.78 0.83 0.01
dm-2 0.00 0.00 11.24 408.66 164.55 5216.33 25.63 0.14 0.32 1.02 0.30 0.46 19.29
这就比较奇怪了,一般来说数据库不能及时响应请求很大可能是由于系统负载过高。如果说 DML 还可能是 Innodb 锁造成的堵塞,但是大量 sending data 状态下的 select 操作一般可能都和系统负载过高有联系,但是这里系统负载还在可以接受的范围内。
二、pstack 分析
借助 pstack 查看线程的栈帧,查看 pstack 发现如下(由于篇幅限制只给出部分说明问题的部分):
1. insert 线程:
Thread 85 (Thread 0x7fbb0d42b700 (LWP 20174)):
#0 0x00007fbfae164c73 in select () from /lib64/libc.so.6
#1 0x0000000000987c0f in os_thread_sleep (tm=<optimized out>) at /home/install/lnmp1.5/src/mysql-5.6.40/storage/innobase/os/os0thread.cc:287
#2 0x00000000009e4dea in srv_conc_enter_innodb_with_atomics (trx=trx@entry=0x7fba4802f9c8) at /home/install/lnmp1.5/src/mysql-5.6.40/storage/innobase/srv/srv0conc.cc:276
#3 srv_conc_enter_innodb (trx=trx@entry=0x7fba4802f9c8) at /home/install/lnmp1.5/src/mysql-5.6.40/storage/innobase/srv/srv0conc.cc:511
#4 0x000000000093b948 in innobase_srv_conc_enter_innodb (trx=0x7fba4802f9c8) at /home/install/lnmp1.5/src/mysql-5.6.40/storage/innobase/handler/ha_innodb.cc:1280
#5 ha_innobase::write_row (this=0x7fb8440ab260, record=0x7fb8440ab650 "") at /home/install/lnmp1.5/src/mysql-5.6.40/storage/innobase/handler/ha_innodb.cc:6793
#6 0x00000000005b440f in handler::ha_write_row (this=0x7fb8440ab260, buf=0x7fb8440ab650 "") at /home/install/lnmp1.5/src/mysql-5.6.40/sql/handler.cc:7351
#7 0x00000000006dd3a8 in write_record (thd=thd@entry=0x1d396c90, table=table@entry=0x7fb8440aa970, info=info@entry=0x7fbb0d429400, update=update@entry=0x7fbb0d429480) at /home/install/lnmp1.5/src/mysql-5.6.40/sql/sql_insert.cc:1667
#8 0x00000000006e2541 in mysql_insert (thd=thd@entry=0x1d396c90, table_list=<optimized out>, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_REPLACE, ignore=false) at /home/install/lnmp1.5/src/mysql-5.6.40/sql/sql_insert.cc:1072
#9 0x00000000006fa90a in mysql_execute_command (thd=thd@entry=0x1d396c90) at /home/install/lnmp1.5/src/mysql-5.6.40/sql/sql_parse.cc:3500
2. update 线程
Thread 81 (Thread 0x7fbb24b67700 (LWP 27490)):
#0 0x00007fbfae164c73 in select () from /lib64/libc.so.6
#1 0x0000000000987c0f in os_thread_sleep (tm=<optimized out>) at /home/install/lnmp1.5/src/mysql-5.6.40/storage/innobase/os/