码字不易,转载请附原链,搬砖繁忙回复不及时见谅,技术交流请加QQ群:909211071
抛出问题
今天运营同学说我们mis后台的一个列表页打不开了,经过排查每一条 sql 语句,几条都是一秒多,sql 大概都一样,拿出来其中一条:
select count(*) as count from t_price where day_time >= '2020-02-08' and status = 2;
+--------+
| count |
+--------+
| 780711 |
+--------+
1 row in set (2.21 sec)
count(*) 看一下表里数据:
select count(*) from t_price;
+----------+
| count(*) |
+----------+
| 1815023 |
+----------+
1 row in set (0.26 sec)
才一百多万条数据,也不多啊,估计大概率是索引失效了,看一下联合索引的顺序:
CREATE TABLE `t_price` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
......
`created_time` int(11) NOT NULL DEFAULT '0' COMMENT '创建时间',
`updated_time` int(11) NOT NULL DEFAULT '0' COMMENT '更新时间',
PRIMARY KEY (`id`),
KEY `product` (`day_time`,`status`,`product_id`) USING BTREE,
KEY `market` (`day_time`,`status`,`market_id`) USING BTREE,
KEY `day_time` (`day_time`) USING BTREE,
KEY `status` (`status`) USING BTREE,
KEY `province_id` (`province_id`) USING BTREE,
KEY `customer_id` (`customer_id`) USING BTREE,
KEY `dn_market` (`day_time`,`status`,`dn_market_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1799748 DEFAULT CHARSET=utf8 COMMENT='行情价格' |
没有任何问题啊,用 explain 看一下执行计划:
explain select count(*) as count from t_price where day_time >= '2020-02-08' and status = 2;
+----+-------------+---------+------+------------------------------------------+--------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+------------------------------------------+--------+---------+-------+--------+-------------+
| 1 | SIMPLE | t_price | ref | product,market,day_time,status,dn_market | status | 1 | const | 839884 | Using where |
+----+-------------+---------+------+------------------------------------------+--------+---------+-------+--------+-------------+
1 row in set (0.01 sec)
发现问题了,实际使用的索引是 status,而我们的语句希望使用的是 product ,可是为什么它明明能够满足联合索引 product 的最左原则,却使用单列索引 status 呢?
原理分析
突然想到 《高性能MySQL》中的一句话:优化器会根据需要扫描的行数选择合适的索引,当行数相近时,优化器会选择开销最小的索引。那么何为开销最小呢?
首先记住一个公式:Cost = CPU Cost + IO Cost
- I/O成本:从磁盘到内存这个加载的过程损耗的时间称之为I/O成本
- CPU成本:读取以及检测记录是否满足对应到搜索条件、对结果进行排序等操作损耗的时间称之为CPU成本
页是磁盘和内存之间交互等基本单位,读取一个页花费的成本默认是1.0,读取及检测一条记录是否符合搜索条件的成本默认是0.2。
优化器的优化有两种,一种是基于成本的优化,另一种是基于索引统计数据的成本计算。
基于成本的优化
SQL执行的代价分为server层和engine层。server层主要是cpu的代价,而engine层主要是io的代价。5.7 引入了两个系统表mysql.server_cost和mysql.engine_cost来分别配置这两个层的代价,具体的值可以通过这两张表查询。
一条查询经历以下4个步骤:
- 根据搜索条件,找出所有可能使用的索引
- 计算全表扫描的代价
- 计算使用不同索引执行查询的代价
- 对比各种执行方案的代价,找出成本最低的那个
show table status like 't_price'\G;
*************************** 1. row ***************************
Name: t_price
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1679769
Avg_row_length: 631
Data_length: 1061158912
Max_data_length: 0
Index_length: 290586624
Data_free: 7340032
Auto_increment: 1827554
Create_time: 2020-06-18 15:25:19
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment: 行情价格
1 row in set (0.00 sec)
计算全表扫描的代价
Data_length = 聚簇索引的页面数量 * 每个页面的大小
聚簇索引的页面数量 = 1061158912 / 16 / 1024 = 64768
记录行数:1690802
Cost = (64768 * 1.0 + 1.1) + (1690802 * 0.2 + 1.0)
使用二级索引的成本分析
- where 查询:读取一个索引范围区间的 I/O 成本和读取一个页面是相同的
- in区间查询:in列表大小 * 1.0
Cost = (索引范围区间数 * 1.0 + 1.0) + (读取二级索引记录数 * 0.2 + 回表检测的行数 * 0.2)
使用覆盖索引的成本分析
Cost = 区间数 * 1.0 + (读取二级索引记录数 * 0.2)
引出一个问题:当 in 查询条件很多的时候怎么办,比如3000个,那明显会无限拉大 I/O 开销,这种情况下需要用下面的计算方法
基于索引统计数据的成本计算
一个值的重复次数 ≈ 记录行数(rows) / 使用的索引基数(Cardinality)
需要回表的记录数:in条件个数 * 一个值的重复次数
Cost = I/O开销(和基于成本的计算方式相同) + 需要回表的记录数 * 0.2
计算方式选择方式
show variables like '%dive%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 10 |
+---------------------------+-------+
1 row in set (0.01 sec)
当 in 中的条件个数大于该配置值时,则采取基于索引统计数据计算 CPU 开销,否则用基于成本的计算。
回归问题并解决
回到我们的问题中来,status 的索引长度远小于 product 索引长度,所以优化器认为 status 索引 I/O 开销比 product 索引要小,而最终返回的记录数都是相同,所以综合起来,status 索引开销更低,优化器分析的结果如下:
"considered_execution_plans": [
{
"plan_prefix": [
],
"table": "`t_price` FORCE INDEX (`product`)",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "range",
"rows": 826097,
"cost": 334857,
"chosen": true
}
]
},
"cost_for_plan": 334857,
"rows_for_plan": 826097,
"chosen": true
}
explain select count(*) as count from t_price force index (product) where day_time >= '2020-02-08' and status = 2;
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | t_price | range | product | product | 36 | NULL | 839884 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)
explain select count(*) as count from t_price force index (product) where day_time >= '2020-02-08' and status = 2;
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | t_price | range | product | product | 36 | NULL | 839884 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
我们可以看到 status 索引的查询 type 为 ref,cost 为 335949;product 索引的查询 type 为 range,cost 为340441,所以优化器选择了 status 索引。
优化很简单,加 hint 即可:
select count(*) as count from t_price force index (product) where day_time >= '2020-02-08' and status = 2;
+--------+
| count |
+--------+
| 780711 |
+--------+
1 row in set (0.50 sec)
多场景实验
优化完之后回想起之前也在网上查过索引失效的问题,不过文章之间都有相互矛盾的地方,本着“实验出真知”的原则,自己特意针对不同情况做了一些实验,将结果做个总结。
准备数据
CREATE TABLE `test` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL DEFAULT '',
`age` int(11) NOT NULL DEFAULT 0,
`nick` varchar(200) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT 0 COMMENT '状态',
PRIMARY KEY (`id`),
KEY `name` (`name`) USING BTREE,
KEY `age` (`age`) USING BTREE,
KEY `nick` (`nick`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
其中nick字段有索引,但是nick字段允许为NULL(用于验证判断null是否走索引的情况)
for ($i = 1; $i <= 30000; $i++){
$tmp = [
'name' => 'abc' . rand(0, 10000),
'age' => rand(0, 10000),
'status' => $i % 2,
];
if ($i % 5 != 0){
$tmp['nick'] = 'xyz' . rand(0, 10000);
}
$table->insert($tmp);
}
当字段允许为NULL时,is null和is not null
首先设置数据分布,这个数据比例一共测试了2次,这两次当查询 * 时都不会走索引,而只查询nick字段,也就是是满足覆盖索引条件时,可以走索引:
- 当 null 行数为5136时,查询 nick is not null 查询不走索引
- 当 null 行数为5148时,查询 nick is not null 查询不走索引
update test set nick = null where id >= 35150;
update test set nick = 'abc' where id < 35150;
mysql> select count(*) from test where nick is null;
+----------+
| count(*) |
+----------+
| 24852 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from test where nick is not null;
+----------+
| count(*) |
+----------+
| 5148 |
+----------+
1 row in set (0.03 sec)
查询所有字段时,索引失效:
mysql> explain select * from test where nick is not null;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | test | ALL | nick | NULL | NULL | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)
改成只查询 nick 字段,试一下满足覆盖索引的情况:
mysql> explain select nick from test where nick is not null;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | test | range | nick | nick | 603 | NULL | 5146 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.02 sec)
我们再改变临界的一行,继续 select * 发现可以走索引了,结果如下:
update test set nick = null where id >= 35149;
update test set nick = 'abc' where id < 35149;
mysql> explain select * from test where nick is not null;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | test | ALL | nick | NULL | NULL | NULL | 29905 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.02 sec)
is null 在网上争议不大,所以我们不改变数据分布,直接测试 select * ... is null 的情况
mysql> explain select nick from test where nick is null;
+----+-------------+-------+------+---------------+------+---------+-------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+-------+--------------------------+
| 1 | SIMPLE | test | ref | nick | nick | 603 | const | 14952 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+-------+--------------------------+
1 row in set (0.01 sec)
所以结论:当满足普通索引时,is null 可以走索引,is not null 在对比cost后,也大概率会走索引,并不是网上大多数文章说的那样不走索引;而满足覆盖索引条件时,也会走索引。
当使用!=条件时(这里需要注意了啊,网上很多文章一口咬定!=不会走索引)
实验过程和上面一样,也是测试了2次,下面直接贴其中一个过程和结果:
update test set name = '' where id >= 35150;
update test set name = 'abc' where id < 35150;
mysql> select count(*) from test where name != '';
+----------+
| count(*) |
+----------+
| 5148 |
+----------+
1 row in set (0.04 sec)
mysql> select count(*) from test where name = '';
+----------+
| count(*) |
+----------+
| 24852 |
+----------+
1 row in set (0.01 sec)
mysql> explain select * from test where name != '';
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | test | ALL | name | NULL | NULL | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.02 sec)
mysql> explain select name from test where name != '';
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | test | range | name | name | 152 | NULL | 5148 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.01 sec)
结论:!= 可以走索引,!= 在对比cost后,也大概率会走索引,但并不是网上大多数文章说的那样不走索引;而满足覆盖索引条件时,也会走索引。
使用or的情况
当我们status字段没有索引的时候,发现是全表扫描
explain select * from test where age = 2644 or status = 1;
我们为status字段加上索引之后,发现用到了age和status两个索引
explain select * from test where age = 2644 or status = 1;
结论:or是否走索引取决于or前后的两个字段是否都建立了索引。
not in情况
not in 我们分2种情况测试,分别是使用主键索引和使用普通索引,每一个里边又分别测试了查询所有字段、不满足覆盖索引的字段,以及满足索引的字段。
mysql> explain select * from test where age not in (9837);
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | test | ALL | age | NULL | NULL | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.03 sec)
mysql> explain select name from test where age not in (9837);
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | test | ALL | age | NULL | NULL | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)
mysql> explain select age from test where age not in (9837);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| 1 | SIMPLE | test | range | age | age | 4 | NULL | 15389 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.08 sec)
mysql> explain select * from test where id not in (9837);
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| 1 | SIMPLE | test | range | PRIMARY | PRIMARY | 4 | NULL | 14920 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.01 sec)
mysql> explain select name from test where id not in (9837);
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| 1 | SIMPLE | test | range | PRIMARY | PRIMARY | 4 | NULL | 14920 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.01 sec)
mysql> explain select age from test where id not in (9837);
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| 1 | SIMPLE | test | range | PRIMARY | PRIMARY | 4 | NULL | 14920 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.01 sec)
结论:使用主键索引查询任何字段都可以走索引,满足覆盖索引时可以走索引,这两种功能情况都是因为无需二次回表。
having情况
having 我们同样也分两种情况测试,一种使用主键索引,一种是普通索引。
使用普通索引
mysql> explain select name from test where age HAVING (2644);
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | test | ALL | NULL | NULL | NULL | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)
mysql> explain select age from test where age HAVING (2644);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| 1 | SIMPLE | test | index | NULL | age | 4 | NULL | 29839 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.01 sec)
使用主键索引
mysql> explain select status from test where id HAVING (12345);
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | test | ALL | NULL | NULL | NULL | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)
mysql> explain select name from test where id HAVING (12345);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| 1 | SIMPLE | test | index | NULL | name | 152 | NULL | 29839 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.01 sec)
mysql> explain select age from test where id HAVING (12345);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| 1 | SIMPLE | test | index | NULL | age | 4 | NULL | 29839 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.01 sec)
mysql> explain select id from test where id HAVING (12345);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| 1 | SIMPLE | test | index | NULL | age | 4 | NULL | 29839 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.01 sec)
结论:使用having时,当使用普通索引时,只有满足覆盖索引时才会走索引,否则不会走索引;当使用主键索引时,只有返回建立索引的字段时才会走索引,否则不会走索引。
其他通用情况
- 联合索引使用时没有遵循最左匹配原则索引会断开(使用一部分)或失效
- 当走索引返回的行数大于全表的80%(通用情况,不绝对)时,优化器是选择不走索引,具体还有看实际计算的cost
- 当用like左通配符时,索引失效,原因是违反最左原则
- 联合索引使用时没有遵循最左匹配原则索引会断开(使用一部分)或失效,但是会有特殊情况
结论
其实索引失效无非就是三种原因:
- 不满足最左原则断开
- 查询优化器分析 cost 后认为使用索引比全表扫描开销还要大
- 查询优化器错误了选择它认为开销更低的索引
我们在日常使用索引的时候遵循以下几点,基本上不会导致索引失效:
- 遵循最左原则
- 索引覆盖范围尽量大,避免优化器计算全表 cost 低于索引 cost
- 传入条件字段时保持和字段类型一致
- 不在字段上进行计算
纸上得来终觉浅,绝知此事要躬行,如果看完了对您有帮助,动动小手点个赞,关注一下,会持续更新技术文章!