mysql 负载过高的解决方法及验证

(一)

发现问题
top命令 查看服务器负载,发现 mysql竟然经常百分之九十几的cpu,引起Mysql 负载这么高的原因,估计主要应该是索引问题和某些变态SQL语句.

排查思路
1. 确定高负载的类型,top命令看负载高是CPU还是IO。 (CPU占用时间长,IO查询交互较多)
2. 检查慢查询日志,如果前端上了新代码,可能是慢查询引起负载高。
3. 检查硬件问题,是否磁盘故障问题造成的。
4. 检查监控平台,对比此机器不同时间的负载。

记录慢查询
编辑Mysql 配置文件(my.cnf),在[mysqld]字段添加以下几行:
log_slow_queries = /usr/local/mysql/var/slow_queries.log #慢查询日志路径
long_query_time = 5 #记录SQL查询超过5s的语句
log-queries-not-using-indexes = 1 #记录没有使用索引的sql

查看慢查询日志

  1. tail /usr/local/mysql/var/slow_queries.log

# Time: 130308 7:56:39
# User@Host: user[xxxx] @ xxxx[]
# Query_time: 0.001118 Lock_time: 0.000301 Rows_sent: 26 Rows_examined: 52
SET timestamp=1362711399;
SELECT * FROM com WHERE iSUSEd = 1 AND category_id IN ('1') ORDER BY vieworder ASC;
4个参数
Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 54
分别意思为:查询时间 锁定时间查询结果行数 扫描行数
主要看扫描行数多的语句,然后去数据库加上对应的索引,再优化下变态的sql 语句。

极端情况kill sql进程
找出占用cpu时间过长的sql,在mysql 下执行如下命令:

  1. show processlist;

确定后一条sql处于Query状态,且Time时间过长,锁定它的ID,执行如下命令:

  1. kill QUERY 269815764;

注意:杀死 sql进程,可能导致数据丢失,所以执行前要衡量数据的重要性

验证:

MariaDB [(none)]> show processlist;
+----+------+-----------+------+---------+------+------------+------------------+----------+
| Id | User | Host      | db   | Command | Time | State      | Info             | Progress |
+----+------+-----------+------+---------+------+------------+------------------+----------+
|  3 | root | localhost | NULL | Query   |    0 | NULL       | show processlist |    0.000 |
|  5 | root | localhost | NULL | Query   |    3 | User sleep | select sleep(50) |    0.000 |
+----+------+-----------+------+---------+------+------------+------------------+----------+
2 rows in set (0.00 sec)

MariaDB [(none)]> kill query 5;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> select sleep(50);
+-----------+
| sleep(50) |
+-----------+
|         1 |
+-----------+
1 row in set (11.02 sec)
(二)

思路:
1、确定高负载的类型 htop,dstat命令看负载高是CPU还是IO
2、监控具体的sql语句,是insert update 还是 delete导致高负载
3、检查mysql日志
4、检查硬件问题

dstat

可以看到具体是哪个用户哪个进程占用了相关系统资源,当前CPU、内存谁在使用

[root@cc ~]# dstat -l -m -r -c  --top-io --top-mem --top-cpu

--io/total- ------memory-usage----- --most-expensive- ----most-expensive---- -most-expensive-
 read  writ| used  buff  cach  free|  memory process |     i/o process      |  cpu process   
1.90   267 |3399M  178M 3892M  400M|php-fpm: poo 372M|init       1682k  647k|flush-202:0 0.1 0 72.0 |3399M 178M 3892M 400M|php-fpm: poo 372M|php-fpm: po 10k 143k|php-fpm: pool2.0 0 8.00 |3399M 178M 3892M 399M|php-fpm: poo 372M|nginx: work 228k 229k|php-fpm: pool0.5 0 88.0 |3399M 178M 3892M 399M|php-fpm: poo 372M|nginx: work 102k 166k|php-fpm: pool 11 0 38.0 |3399M 178M 3892M 399M|php-fpm: poo 372M|php-fpm: po 787k 650B|php-fpm: pool4.8 0 0 |3399M 178M 3892M 399M|php-fpm: poo 372M|php-fpm: po 788k 723B|php-fpm: pool1.8 0 140 |3400M 178M 3892M 399M|php-fpm: poo 372M|nginx: work 38k 154k|php-fpm: pool1.2 0 12.0 |3400M 178M 3892M 399M|php-fpm: poo 372M|nginx: work 178k 364k|php-fpm: pool1.5 0 0 |3400M 178M 3892M 399M|php-fpm: poo 372M|nginx: work 758k 639k|php-fpm: pool1.5 0 12.0 |3400M 178M 3892M 399M|php-fpm: poo 372M|nginx: work 773k 616k|php-fpm: pool2.0 6.00 0 |3401M 178M 3892M 398M|php-fpm: poo 372M|nginx: work 994k 688k|nginx: worker1.5 0 272 |3401M 178M 3892M 398M|php-fpm: poo 372M|nginx: work 388k 422k|php-fpm: pool1.5 0 0 |3400M 178M 3893M 398M|php-fpm: poo 372M|nginx: work 483k 548k|php-fpm: pool1.8 0 4.00 |3400M 178M 3893M 398M|php-fpm: poo 372M|php-fpm: po 787k 650B|php-fpm: pool1.5 0 12.0 |3400M 178M 3893M 398M|php-fpm: poo 372M|nginx: work 223k 323k|php-fpm: pool1.5 0 0 |3400M 178M 3893M 398M|php-fpm: poo 372M|nginx: work 371k 474k|php-fpm: pool7.8

htop

htop是top的增强版,更直观

[root@cc ~]# htop

  1  [|||||||||||                                                                    12.4%]
  2  [|||||||||                                                                       9.5%]
  3  [|                                                                               1.0%]
  4  [||                                                                              1.9%]
  Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||3394/7869MB]
  Swp[||||||||||||||                                                              75/478MB]
        
Tasks: 71, 12 thr; 2 running
Load average: 0.39 0.39 0.31 Uptime: 526 days(!), 17:36:38 PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 1 root 20 0 19232 396 248 S 0.0 0.0 0:01.86 /sbin/init 30752 root 20 0 52532 72 56 S 0.0 0.0 0:00.16 ├─ /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf 24301 root 20 0 193M 3268 1600 S 0.0 0.0 1:41.43 ├─ /usr/sbin/snmpd -LS0-6d -Lf /dev/null -p /var/run/snmpd.pid 21361 root 20 0 902M 6500 1308 S 0.0 0.1 0:07.16 ├─ php-fpm: master process (/etc/php-fpm.conf) 28627 www 20 0 962M 202M 138M S 0.0 2.6 0:34.46 │ ├─ php-fpm: pool www-c 27537 www 20 0 965M 236M 171M R 1.4 3.0 1:19.64 │ ├─ php-fpm: pool www-c 27449 www 20 0 961M 251M 189M S 0.0 3.2 1:35.54 │ ├─ php-fpm: pool www-a 26442 www 20 0 962M 280M 217M S 0.0 3.6 2:29.71 │ ├─ php-fpm: pool www-a 26310 www 20 0 917M 251M 234M S 1.9 3.2 2:46.45 │ ├─ php-fpm: pool www-a 26162 www 20 0 962M 297M 233M S 0.0 3.8 2:37.50 │ ├─ php-fpm: pool www-b 26147 www 20 0 924M 258M 233M S 0.0 3.3 2:38.37 │ ├─ php-fpm: pool www-c 25717 www 20 0 965M 302M 238M S 0.0 3.8 2:54.50 │ ├─ php-fpm: pool www-c 24585 www 20 0 964M 324M 260M S 0.0 4.1 4:15.20 │ ├─ php-fpm: pool www-b 

tcpdump

抓取mysql包分析,一般抓3306端口的数据

[root@cc ~]# tcpdump -i eth0 -A -s 3000 port 3306 > ~/sql.log

然后使用awk,sort,wc 等命令进行分析

或者

[root@cc ~]# tcpdump -i eth0 -s 0 -l -w - dst port 3306 | strings | perl -e '
while(<>) { chomp; next if /^[^ ]+[ ]*$/; if(/^(SELECT|UPDATE|DELETE|INSERT|SET|COMMIT|ROLLBACK|CREATE|DROP|ALTER)/i) { if (defined $q) { print "$qn"; } $q=$_; } else { $_ =~ s/^[ t]+//; $q.=" $_"; } }'

就可以看出最繁忙的sql语句了

strace

查看系统调用是否有问题,进程是否堵塞,是否有Broken pipe

[root@cc ~]# strace -p 26578

pt-query-digest

分析mysql慢日志,查看哪些sql语句最耗时

[root@cc ~]#  pt-query-digest slow.logs

# 390ms USER TIME, 10ms system TIME, 15.67M rss, 105.84M vsz  
# CURRENT DATE: Thu DEC 29 13:22:42 2014  
# Hostname: test # Files: slow.log # Overall: 776 total, 11 UNIQUE, 0.00 QPS, 0.00x concurrency _____________ # TIME range: 2011-09-10 04:03:19 TO 2011-12-29 05:02:51 # Attribute total MIN MAX avg 95% stddev median # ============ ======= ======= ======= ======= ======= ======= ======= # EXEC TIME 5657s 2s 33s 7s 23s 6s 5s # LOCK TIME 33s 0 19s 43ms 98us 715ms 38us # ROWS sent 323.38k 0 107.36k 426.73 0.99 6.35k 0 # ROWS examine 323.39k 0 107.36k 426.74 0 6.35k 0 # Query SIZE 217.95k 38 562 287.61 420.77 81.78 284.79 

show processlist

查看系统到底在干什么


mysql> show full processlist;
+-----------+---------------+---------------------+---------------------+---------+------+---------------+---------------------------+
| Id        | User          | Host                | db                  | Command | Time | State         | Info                      |
+-----------+---------------+---------------------+---------------------+---------+------+---------------+---------------------------+
| 184498848 | testdb_rr1356 | 10.11.211.120:61343 | testdb_rr1356_db121 | Sleep | 1384 | | NULL | | 184508740 | testdb_rr1356 | 10.11.211.120:11809 | testdb_rr1356_db121 | Sleep | 87 | | NULL | | 184509415 | testdb_rr1356 | 10.11.211.120:12760 | testdb_rr1356_db121 | Query | 0 | NULL | show full processlist | | 184509451 | testdb_rr1356 | 10.11.211.120:12804 | testdb_rr1356_db121 | Sleep | 10 | | NULL | | 184509528 | testdb_rr1356 | 10.11.211.120:12919 | testdb_rr1356_db121 | Query | 0 | freeing items | DESCRIBE test_channel |

检查mysql配置参数是否有问题,引起大量的IO或者高CPU操作

innodb_flush_log_at_trx_commit 、innodb_buffer_pool_size 、key_buffer_size 等重要参数

mysql> show variables like '%innodb%';
+---------------------------------+----------------------------+
| Variable_name                   | Value                      |
+---------------------------------+----------------------------+
| have_innodb                     | YES                        |
| ignore_builtin_innodb           | ON                         |
| innodb_adaptive_flushing        | ON                         |
| innodb_adaptive_hash_index      | ON                         |
| innodb_additional_mem_pool_size | 2097152                    |
| innodb_autoextend_increment     | 8 | | innodb_autoinc_lock_mode | 1 | | innodb_buffer_pool_size | 2013265920 | | innodb_change_buffering | inserts | | innodb_checksums | ON |

通过show engine innodb status查看当前事务,内存使用

mysql> show engine innodb status  \G

LATEST DETECTED DEADLOCK
------------------------
150731 10:36:50
*** (1) TRANSACTION: TRANSACTION EBFBBEC, ACTIVE 0 sec, process no 20691, OS thread id 47345217033984 inserting mysql tables in use 1, locked 1 LOCK WAIT 5 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 2 MySQL thread id 143249904, query id 1286731854 10.135.21.120 tybuser2014 update #此处具体sql省略 ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 2058485760; in additional pool allocated 0 Dictionary memory allocated 819282 Buffer pool size 122879 Free buffers 97599 Database pages 24313 Old database pages 8954 Modified db pages 7 Pending reads 0 Pending writes: LRU 0, flush list 0, single page 0 Pages made young 6, not young 0 0.00 youngs/s, 0.00 non-youngs/s Pages read 1049, created 41853, written 30401604 0.00 reads/s, 0.00 creates/s, 1.75 writes/s Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000 Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s LRU len: 24313, unzip_LRU len: 0 I/O sum[45]:cur[0], unzip sum[0]:cur[0]

最后通过zabbix或者cacti等监控来查看IO、CPU、MEMORY、磁盘等是否有异常

这样基本上就可以把问题找出来了

仅供参考,当然,不同的DBA肯定还有其他更好的方式


评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值