异常现象
大量的慢SQL导致8核CPU全部打满,解决前后CPU如图
image.png
实例基础信息:
数据库版本:MariaDB10.0.27
硬件信息:8核cpu+16G内存+500GSSD
数据库中数据量> innodb buffer pool配置
表结构
CREATE TABLE `tablename` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
`gid` varchar(36) NOT NULL COMMENT '全局标识',
`create_time` int(11) NOT NULL COMMENT '创建时间',
`update_time` int(11) NOT NULL COMMENT '更新时间',
`biz_type` varchar(16) DEFAULT NULL COMMENT '业务号',
`biz_chnl` int(11) NOT NULL DEFAULT '-1' COMMENT '业务渠道编号',
`user_gid` char(36) NOT NULL COMMENT '用户全局唯一标识',
`trans_gid` char(36) NOT NULL COMMENT '交易唯一标识,消息类型与gid关系为:0:额度变更记录gid, 1:借款gid, 2:还款gid',
`msg_type` int(11) NOT NULL COMMENT '消息类型:0:用户额度变更,1:借款状态变更,2:还款状态变更',
`msg_status` int(11) NOT NULL COMMENT '消息状态:0:处理中,1:处理成功,2:处理失败',
`msg_content` varchar(4096) NOT NULL COMMENT '消息内容JSON格式',
`return_code` varchar(16) DEFAULT NULL COMMENT '返回状态码',
`return_desc` varchar(4000) DEFAULT NULL COMMENT '状态描述',
`trans_return_code` varchar(16) DEFAULT NULL COMMENT '交易状态码',
`trans_return_desc` varchar(255) DEFAULT NULL COMMENT '交易结果描述',
`retry_next_time` int(11) NOT NULL DEFAULT '0' COMMENT '下次重试时间',
`retry_num` int(11) NOT NULL DEFAULT '0' COMMENT '重试次数',
`status` int(11) NOT NULL DEFAULT '0' COMMENT '发送状态:0 准备就绪,1 发送成功, 2 发送失败,3 结果未知',
`is_valid` bit(1) NOT NULL DEFAULT b'1' COMMENT '是否有效',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_gid` (`gid`),
KEY `idx_create_time` (`create_time`),
KEY `idx_update_time` (`update_time`),
KEY `idx_status` (`status`),
KEY `idx_bizt_chnl` (`biz_type`,`biz_chnl`),
KEY `idx_return_trans_code` (`return_code`,`trans_return_code`),
KEY `idx_trans_gid` (`trans_gid`) USING BTREE,
KEY `idx_user_gid` (`user_gid`)
) ENGINE=InnoDB AUTO_INCREMENT=9630777 DEFAULT CHARSET=utf8 COMMENT='x x x x'
表数据量
MariaDB [sailfish]> select count(*) from tablename;
+----------+
| count(*) |
+----------+
| 9630917 |
+----------+
修改SQL
-开发原始SQL
MariaDB [sailfish]> SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 ) order by id limit 500;
Empty set (15.40 sec)
MariaDB [sailfish]> explain SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 ) order by id limit 500;
+------+-------------+--------------------+-------+--------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+-------+--------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | tablename | index | idx_status,idx_bizt_chnl | PRIMARY | 4 | NULL | 6578 | Using where |
+------+-------------+--------------------+-------+--------------------------+---------+---------+------+------+-------------+
-修改SQL去掉limit 500
MariaDB [sailfish]> SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 ) order by id;
Empty set (1.86 sec)
MariaDB [sailfish]> explain SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 ) order by id;
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+----------------------------------------------------+
| 1 | SIMPLE | tablename | ref | idx_status,idx_bizt_chnl | idx_bizt_chnl | 51 | const | 708040 | Using index condition; Using where; Using filesort |
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+----------------------------------------------------+
修改SQL去掉order by limit 500(第一次解决方案,刚开始有效果,一周后查询时间又逐渐增长,cpu又打满,和之前一样)
MariaDB [sailfish]> SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 );
Empty set (2.18 sec)
MariaDB [sailfish]> explain SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 );
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+------------------------------------+
| 1 | SIMPLE | tablename | ref | idx_status,idx_bizt_chnl | idx_bizt_chnl | 51 | const | 708040 | Using index condition; Using where |
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+------------------------------------+
索引更改
根据where条件增加组合索引前
MariaDB [test]> SELECT id, gid, create_time, update_time, biz_type, biz_chnl, user_gid, trans_gid, msg_type, msg_status, msg_content, return_code, return_desc, trans_return_code, trans_return_desc, retry_next_time, retry_num, status, is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511483534 );
Empty set (13.34 sec)
MariaDB [test]> explain SELECT id, gid, create_time, update_time, biz_type, biz_chnl, user_gid, trans_gid, msg_type, msg_status, msg_content, return_code, return_desc, trans_return_code, trans_return_desc, retry_next_time, retry_num, status, is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511483534 );
+------+-------------+--------------------+------+----------------------------------------------+---------------+---------+-------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+----------------------------------------------+---------------+---------+-------+--------+------------------------------------+
| 1 | SIMPLE | tablename | ref | idx_status,idx_bizt_chnl,idx_retry_next_time | idx_bizt_chnl | 51 | const | 770126 | Using index condition; Using where |
+------+-------------+--------------------+------+----------------------------------------------+---------------+---------+-------+--------+------------------------------------+
1 row in set (0.06 sec)
根据where条件增加组合索引后
pt-online-schema-change -uroot -pxxxxxx --charset=utf8 --alter="add index idx_multi(`status`,`biz_type`)" --dry-run --nocheck-replication-filters --recursion-method=none --print D=sailfish,t= tablename
MariaDB [test]> SELECT id, gid, create_time, update_time, biz_type, biz_chnl, user_gid, trans_gid, msg_type, msg_status, msg_content, return_code, return_desc, trans_return_code, trans_return_desc, retry_next_time, retry_num, status, is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511483534 );
Empty set (0.06 sec)
MariaDB [test]> explain SELECT id, gid, create_time, update_time, biz_type, biz_chnl, user_gid, trans_gid, msg_type, msg_status, msg_content, return_code, return_desc, trans_return_code, trans_return_desc, retry_next_time, retry_num, status, is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511483534 );
+------+-------------+--------------------+------+--------------------------------------------------------+-----------+---------+-------------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+--------------------------------------------------------+-----------+---------+-------------+------+------------------------------------+
| 1 | SIMPLE | tablename | ref | idx_status,idx_bizt_chnl,idx_retry_next_time,idx_multi | idx_multi | 55 | const,const | 1 | Using index condition; Using where |
+------+-------------+--------------------+------+--------------------------------------------------------+-----------+---------+-------------+------+------------------------------------+
1 row in set (0.01 sec)
增加索引时遇到的问题
使用pt-online-schema-change更改表结构时,因为当时有40多条慢查询跑着,pt-online-schema-change被阻塞,等待2分钟效果很差,中建表只有几百条数据,停止pt-online-schema-change并删除中间表、触发器
与开发沟通这40多条慢SQL可以临时注释,结合开发、运维注释掉该功能
使用pt-online-schema-change更改表结构,用时10分钟,重新恢复业务,cpu趋于稳定,如第一张图
定位问题使用的工具
pmm监控慢SQL、用户cpu、mysql user statics
mariadb server audit审计日志
elk显示每秒慢SQL查询量
linux:top
mysql 信息统计:select * from information_schema.processlist where command='query' order by time\G