阿里云RDS MySQL迁移到同地域同规格的RDS MySQL之后,性能明显不如迁移前的性能。最开始业务只是用户侧感觉系统卡顿,但服务器方面性能没有任何异常,故把问题指向数据库。以下是对这个问题的分析及处理。
排查步骤
- 查看内存、CPU使用情况
发现cpu使用率极高,经常出现尖刺
- 排查慢SQL
- 对业务sql进行分析,使用explain 查看sql执行计划
explain select id,is_delete from tmp where id = '17';
-- 输出结果
+--+------------+------+------------+------+---------------+------+---------+------+-----+----------+------+
|id| select_type| table| partitions | type | possible_keys | key | key_len | ref | rows| filtered | Extra|
+--+------------+------+------------+------+-------+------+-------+------+-------+----------+--------------+
|1 | SIMPLE | temp | NULL | ALL | NULL | NULL | NULL | NULL |3642858| 1 | Using where |
+--+------------+------+------------+------+-------+------+-------+------+-------+----------+--------------+
可以看到 key这一列是空,但是从理论上这个值不应该为null,因为该表的ID列存在一个索引
- 使用show index from tablename 查看这张表索引情况
show index from tmp\G
*************************** 1. row ***************************
Table: tmp
Non_unique: 1
Key_name: idx_id
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 0
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
可以发现索引的Cardinality值为0,这代表这这个索引是无效的。也就是说执行器在执行的时候不会使用这个索引。这个值越趋近1代表选择性越高,那么在执行语句时,使用该索引的可能性将变高。
- 使用 analyze 重新计算该值
analyze table temp;
+-----------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.temp | analyze | status | OK |
+-----------+---------+----------+----------+
-- 此时再查看索引信息
show index from tmp\G;
*************************** 1. row ***************************
Table: tmp
Non_unique: 1
Key_name: idx_id
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 1
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
-- 此时再查看解释计划,发现已经使用索引了
explain select id,is_delete from tmp where id = '17';
+---+------------+------+-----------+-----+--------------+---------+--------+------+------+--------+-------+
| id| select_type| table| partitions| type| possible_keys| key | key_len| ref | rows | filtered| Extra|
+---+------------+------+-----------+-----+--------------+---------+--------+------+------+--------+-------+
| 1 | SIMPLE | temp | NULL | ref | indx_id | index_id | 2683 | sonst| 1 | 1 | null |
+---+------------+------+-----------+-----+--------------+---------+--------+------+------+--------+-------+
- 不确定库中还有其他表统计信息缺失,使用以下语句查询缺少统计信息的表,,然后再统一执行 analyze
select * from mysql.innodb_index_stats where index_name = 'PRIMARY' and stat_name = 'n_diff_pfx01' and stat_value = 0
更新统计信息后,业务sql恢复正常。使用analyze不会同步到备库,故需要重搭备库。造成该现象原因:在进行dts操作过程中,源库进行了HA切换,导致某些表的统计信息还未进行持久化,姑迁到新库的索引统计信息为0,进而导致业务sql不走索引,最后导致cpu增高。
- CPU使用率还是比较高,时常出现“尖刺”
- 查看业务中的慢sql对慢sql进行分析
set profiling = 1 -- 开启qury profiling
select * from temp -- 执行慢sql
show profile -- 查看诊断id Duration:执行时间
+----------+------------+--------------------------------+
| Query_ID | Duration | Query |
+----------+------------+--------------------------------+
| 1 | 0.00028600 | select * from a where id = '1' |
| 2 | 0.00005300 | show profiling |
+----------+------------+--------------------------------+
show profile [cpu, block io] for query 1[Query_ID]; -- 查看对应的sql诊断结果,根据具体的信息进行分析
+----------------------+----------+----------+------------+--------------+---------------+
| Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+----------------------+----------+----------+------------+--------------+---------------+
| starting | 0.000074 | 0.000063 | 0.000000 | 0 | 0 |
| checking permissions | 0.000007 | 0.000007 | 0.000000 | 0 | 0 |
| Opening tables | 0.000017 | 0.000017 | 0.000000 | 0 | 0 |
| init | 0.000025 | 0.000024 | 0.000000 | 0 | 0 |
| System lock | 0.000006 | 0.000006 | 0.000000 | 0 | 0 |
| optimizing | 0.000009 | 0.000008 | 0.000000 | 0 | 0 |
| statistics | 0.000051 | 0.000051 | 0.000000 | 0 | 0 |
| preparing | 0.000009 | 0.000009 | 0.000000 | 0 | 0 |
| executing | 0.000003 | 0.000003 | 0.000000 | 0 | 0 |
| Sending data | 0.000037 | 0.000036 | 0.000000 | 0 | 0 |
| end | 0.000004 | 0.000004 | 0.000000 | 0 | 0 |
| query end | 0.000006 | 0.000006 | 0.000000 | 0 | 0 |
| closing tables | 0.000006 | 0.000005 | 0.000000 | 0 | 0 |
| freeing items | 0.000022 | 0.000022 | 0.000000 | 0 | 0 |
| cleaning up | 0.000014 | 0.000014 | 0.000000 | 0 | 0 |
+----------------------+----------+----------+------------+--------------+---------------+
同时查看一下实例的cpu及物理读写(和内存相关),若读写较高,则调整 innodb_buffer_pool_size 参数,建议为内存的3/4
- cpu依旧偶尔存在“尖刺”
- 查看表碎片率
select table_name,(data_free/(data_length + index_length + data_free))*100 free from information_schema.tables where table_name = 'a' limit 1;
+------------+---------+
| table_name | free |
+------------+---------+
| a | 0.0000 |
+------------+---------+
若业务表碎片率较高的话,则使用optimize table 优化一下表
optimize table a;
+--------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+--------+----------+----------+-------------------------------------------------------------------+
| test.a | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| test.a | optimize | status | OK |
+--------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.05 sec)
-- 报错可忽略,只要status 返回 ok 即可
欢迎各位伙伴在评论、留言指出不足之处。
联系方式:mr_xuansu@163.com
更多内容请关注微信公众号:萱蘓的运维日常