系统不能使用,数据库上99%的会话等待是Waiting FOR TABLE FLUSH,找到等待时间最长的会话kill,系统恢复正常。是什么原因造成的呢?
日志分析:
Time: 2018-12-15T10:45:50.116723+08:00
User@Host: gg[gg] @ [10.92.221.131] Id: 9435656
Query_time: 58475.741298 Lock_time: 0.000561 Rows_sent: 0 Rows_examined: 32858598152
SET TIMESTAMP=1544841950;
CALL PRO_COUNT_XXXXX
();
Time: 2018-12-15T10:45:50.128296+08:00
User@Host: root[root] @ [10.92.221.125] Id: 9440022
Query_time: 31548.800341 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
SET TIMESTAMP=1544841950;
FLUSH /*!40101 LOCAL */ TABLES;
从日志上分析猜测是flush tables跟慢的SQL相互影响导致的问题。因为flush table会关闭所有的表,包括正在使用的表,并且会flush query cache,正在执行的SQL是正在打开表,所以flush tables被堵塞,它进而堵塞了后面的查询。实验验证:
session1:
INSERT INTO test1 SELECT a.* FROM test a,test b; #test表是一张几十万数据的表
session2:
FLUSH TABLES;
session3:
SELECT * FROM test;
session4:
SELECT * FROM test;
session5:
SHOW PROCESSLIST
Id USER HOST db Command TIME State Info
1 root localhost:48463 test QUERY 48 Waiting FOR TABLE FLUSH FLUSH TABLES
2 root localhost:48480 test SLEEP 124 (NULL)
3 root localhost:48490 test QUERY 73 Sending DATA INSERT INTO test1 SELECT a.* FROM test a,test b
4 root localhost:48491 test QUERY 37 Waiting FOR TABLE FLUSH SELECT * FROM test
6 root localhost:48496 test QUERY 0 init SHOW PROCESSLIST
8 root localhost:48504 test QUERY 8 Waiting FOR TABLE FLUSH SELECT * FROM test