作为一名全职运维,随时会碰到各种问题,今天晚上收到紧报警,一台数据库服务器磁盘空间使用快速从80%使用率到90%。我们的数据库都是>2T的磁盘,意识到这肯定是比较严重问题马上上线处理。

状况如下:

[root@mysql-node1 tmp]# ls
#sql_8cc3_0.MYD #sql_8cc3_0.MYI  #sql_8cc3_10.MYD   #sql_8cc3_10.MYI  #sql_8cc3_5.MYD  #sql_8cc3_5.MYI
[root@mysql-node1 tmp]# du -sh *
36Khsperfdata_root
346G#sql_8cc3_0.MYD
4.0K#sql_8cc3_0.MYI
336G#sql_8cc3_10.MYD
4.0K#sql_8cc3_10.MYI
340G#sql_8cc3_5.MYD
4.0K#sql_8cc3_5.MYI

根据尝试判定这是mysql生成的文件,查看了数据库:

mysql> show processlist;
+------------+-----------------+----------------------+-----------+-------------+----------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id         | User            | Host                 | db        | Command     | Time     | State                                                                 | Info                                                                                                 |
+------------+-----------------+----------------------+-----------+-------------+----------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
|          1 | event_scheduler | localhost            | NULL      | Daemon      | 54024745 | Waiting on empty queue                                                | NULL                                                                                                 |
| 2912394659 | nginxs_rw       | 172.17.11.99:12936  | nginxs    | Execute     |    12508 | Sending data                                                          | select 
  month(a.blog_date), 
  count(distinct b.usname) as android, 
  count(distinct c.usname) |
| 2912395083 | nginxs_rw       | 172.17.11.99:34020  | nginxs    | Execute     |    12051 | Sending data                                                          | select 
  month(a.blog_date), 
  count(distinct b.usname) as android, 
  count(distinct c.usname) |
| 2912402122 | root            | localhost            | nginxs    | Query       |        0 | init                                                                  | show processlist                                                                                     |
+------------+-----------------+----------------------+-----------+-------------+----------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)



解决方法:

根据经验快速判定上面两条sql join执行有问题,立即联系相关人员确定sql可以杀掉,

mysql> kill 2912394659;
Query OK, 0 rows affected (0.00 sec)
mysql> kill 2912395083;
Query OK, 0 rows affected (0.00 sec)


杀掉这两个sql以后,数据库立即开始释放临时文件,磁盘空间恢复正常。


总结:

   在线上数据库使用时,尽量给一些临时文件限制上限,下面是几个常见的参数

tmp_table_size          = 256M
max_heap_table_size     = 256M
thread_cache_size       = 64
myisam_sort_buffer_size         = 32M
myisam_max_sort_file_size       = 10G
max_join_size              = 268435456
innodb_online_alter_log_max_size = 134217728
innodb_sort_buffer_size   = 1048576
max_allowed_packet = 128M
max_binlog_size         = 256M



在线修改方法:

mysql> set max_join_size=268435456;
Query OK, 0 rows affected (0.00 sec)