MySQL查询开销计算原理和索引失效原因

码字不易,转载请附原链,搬砖繁忙回复不及时见谅,技术交流请加QQ群:909211071

抛出问题

今天运营同学说我们mis后台的一个列表页打不开了,经过排查每一条 sql 语句,几条都是一秒多,sql 大概都一样,拿出来其中一条:

select count(*) as count from t_price where day_time >= '2020-02-08' and status = 2;
+--------+
| count  |
+--------+
| 780711 |
+--------+
1 row in set (2.21 sec)

count(*) 看一下表里数据:

select count(*) from t_price;
+----------+
| count(*) |
+----------+
|  1815023 |
+----------+
1 row in set (0.26 sec)

才一百多万条数据,也不多啊,估计大概率是索引失效了,看一下联合索引的顺序:

CREATE TABLE `t_price` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  ......
  `created_time` int(11) NOT NULL DEFAULT '0' COMMENT '创建时间',
  `updated_time` int(11) NOT NULL DEFAULT '0' COMMENT '更新时间',
  PRIMARY KEY (`id`),
  KEY `product` (`day_time`,`status`,`product_id`) USING BTREE,
  KEY `market` (`day_time`,`status`,`market_id`) USING BTREE,
  KEY `day_time` (`day_time`) USING BTREE,
  KEY `status` (`status`) USING BTREE,
  KEY `province_id` (`province_id`) USING BTREE,
  KEY `customer_id` (`customer_id`) USING BTREE,
  KEY `dn_market` (`day_time`,`status`,`dn_market_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1799748 DEFAULT CHARSET=utf8 COMMENT='行情价格' |

没有任何问题啊,用 explain 看一下执行计划:

explain select count(*) as count from t_price where day_time >= '2020-02-08' and status = 2;
+----+-------------+---------+------+------------------------------------------+--------+---------+-------+--------+-------------+
| id | select_type | table   | type | possible_keys                            | key    | key_len | ref   | rows   | Extra       |
+----+-------------+---------+------+------------------------------------------+--------+---------+-------+--------+-------------+
|  1 | SIMPLE      | t_price | ref  | product,market,day_time,status,dn_market | status | 1       | const | 839884 | Using where |
+----+-------------+---------+------+------------------------------------------+--------+---------+-------+--------+-------------+
1 row in set (0.01 sec)

发现问题了,实际使用的索引是 status,而我们的语句希望使用的是 product ,可是为什么它明明能够满足联合索引 product 的最左原则,却使用单列索引 status 呢?

原理分析


突然想到 《高性能MySQL》中的一句话:优化器会根据需要扫描的行数选择合适的索引,当行数相近时,优化器会选择开销最小的索引。那么何为开销最小呢?

首先记住一个公式:Cost = CPU Cost + IO Cost

  • I/O成本:从磁盘到内存这个加载的过程损耗的时间称之为I/O成本
  • CPU成本:读取以及检测记录是否满足对应到搜索条件、对结果进行排序等操作损耗的时间称之为CPU成本

页是磁盘和内存之间交互等基本单位,读取一个页花费的成本默认是1.0,读取及检测一条记录是否符合搜索条件的成本默认是0.2。

优化器的优化有两种,一种是基于成本的优化,另一种是基于索引统计数据的成本计算。

基于成本的优化

SQL执行的代价分为server层和engine层。server层主要是cpu的代价,而engine层主要是io的代价。5.7 引入了两个系统表mysql.server_cost和mysql.engine_cost来分别配置这两个层的代价,具体的值可以通过这两张表查询。

一条查询经历以下4个步骤:

  1. 根据搜索条件,找出所有可能使用的索引
  2. 计算全表扫描的代价
  3. 计算使用不同索引执行查询的代价
  4. 对比各种执行方案的代价,找出成本最低的那个
show table status like 't_price'\G;
*************************** 1. row ***************************
           Name: t_price
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1679769
 Avg_row_length: 631
    Data_length: 1061158912
Max_data_length: 0
   Index_length: 290586624
      Data_free: 7340032
 Auto_increment: 1827554
    Create_time: 2020-06-18 15:25:19
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment: 行情价格
1 row in set (0.00 sec)

计算全表扫描的代价

Data_length = 聚簇索引的页面数量 * 每个页面的大小

聚簇索引的页面数量 = 1061158912 / 16 / 1024 = 64768

记录行数:1690802

Cost = (64768 * 1.0 + 1.1) + (1690802 * 0.2 + 1.0)

使用二级索引的成本分析

  • where 查询:读取一个索引范围区间的  I/O 成本和读取一个页面是相同的
  • in区间查询:in列表大小 * 1.0

Cost = (索引范围区间数 * 1.0 + 1.0) + (读取二级索引记录数 * 0.2 + 回表检测的行数 * 0.2) 

使用覆盖索引的成本分析

Cost = 区间数 * 1.0 + (读取二级索引记录数 * 0.2)

引出一个问题:当 in 查询条件很多的时候怎么办,比如3000个,那明显会无限拉大 I/O 开销,这种情况下需要用下面的计算方法

基于索引统计数据的成本计算

一个值的重复次数 ≈ 记录行数(rows) / 使用的索引基数(Cardinality)

需要回表的记录数:in条件个数 * 一个值的重复次数

Cost = I/O开销(和基于成本的计算方式相同) + 需要回表的记录数 * 0.2

计算方式选择方式

show variables like '%dive%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 10    |
+---------------------------+-------+
1 row in set (0.01 sec)

当 in 中的条件个数大于该配置值时,则采取基于索引统计数据计算 CPU 开销,否则用基于成本的计算。

 

回归问题并解决

回到我们的问题中来,status 的索引长度远小于 product 索引长度,所以优化器认为 status 索引 I/O 开销比 product 索引要小,而最终返回的记录数都是相同,所以综合起来,status 索引开销更低,优化器分析的结果如下:

              "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`t_price` FORCE INDEX (`product`)",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "range",
                      "rows": 826097,
                      "cost": 334857,
                      "chosen": true
                    }
                  ]
                },
                "cost_for_plan": 334857,
                "rows_for_plan": 826097,
                "chosen": true
              }
explain select count(*) as count from t_price force index (product) where day_time >= '2020-02-08' and status = 2;
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | t_price | range | product       | product | 36      | NULL | 839884 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

explain select count(*) as count from t_price force index (product) where day_time >= '2020-02-08' and status = 2;
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | t_price | range | product       | product | 36      | NULL | 839884 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+

我们可以看到 status 索引的查询 type 为 ref,cost 为 335949;product 索引的查询 type 为 range,cost 为340441,所以优化器选择了 status 索引。

优化很简单,加 hint 即可:

select count(*) as count from t_price force index (product) where day_time >= '2020-02-08' and status = 2;
+--------+
| count  |
+--------+
| 780711 |
+--------+
1 row in set (0.50 sec)

多场景实验

优化完之后回想起之前也在网上查过索引失效的问题,不过文章之间都有相互矛盾的地方,本着“实验出真知”的原则,自己特意针对不同情况做了一些实验,将结果做个总结。

准备数据

CREATE TABLE `test` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL DEFAULT '',
  `age` int(11) NOT NULL DEFAULT 0,
  `nick` varchar(200) DEFAULT NULL,
  `status` int(11) NOT NULL DEFAULT 0 COMMENT '状态',
  PRIMARY KEY (`id`),
  KEY `name` (`name`) USING BTREE,
  KEY `age` (`age`) USING BTREE,
  KEY `nick` (`nick`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8

其中nick字段有索引,但是nick字段允许为NULL(用于验证判断null是否走索引的情况)

        for ($i = 1; $i <= 30000; $i++){
            $tmp = [
                'name' => 'abc' . rand(0, 10000),
                'age' => rand(0, 10000),
                'status' => $i % 2,
            ];
            if ($i % 5 != 0){
                $tmp['nick'] = 'xyz' . rand(0, 10000);
            }
            $table->insert($tmp);
        }

 

当字段允许为NULL时,is null和is not null

首先设置数据分布,这个数据比例一共测试了2次,这两次当查询 * 时都不会走索引,而只查询nick字段,也就是是满足覆盖索引条件时,可以走索引:

  1. 当 null 行数为5136时,查询 nick is not null 查询不走索引
  2. 当 null 行数为5148时,查询 nick is not null 查询不走索引
update test set nick = null where id >= 35150;
update test set nick = 'abc' where id < 35150; 
mysql> select count(*) from test where nick is null;  
+----------+
| count(*) |
+----------+
|    24852 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from test where nick is not null;
+----------+
| count(*) |
+----------+
|     5148 |
+----------+
1 row in set (0.03 sec)

查询所有字段时,索引失效:

mysql> explain select * from test where nick is not null;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | ALL  | nick          | NULL | NULL    | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)

改成只查询 nick 字段,试一下满足覆盖索引的情况:

mysql> explain select nick from test where nick is not null;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | test  | range | nick          | nick | 603     | NULL | 5146 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.02 sec)

我们再改变临界的一行,继续 select * 发现可以走索引了,结果如下:

update test set nick = null where id >= 35149;
update test set nick = 'abc' where id < 35149;
mysql> explain select * from test where nick is not null;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | ALL  | nick          | NULL | NULL    | NULL | 29905 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.02 sec)

is null 在网上争议不大,所以我们不改变数据分布,直接测试 select *  ... is null 的情况

mysql> explain select nick from test where nick is null;
+----+-------------+-------+------+---------------+------+---------+-------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows  | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+-------+--------------------------+
|  1 | SIMPLE      | test  | ref  | nick          | nick | 603     | const | 14952 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+-------+--------------------------+
1 row in set (0.01 sec)

所以结论:当满足普通索引时,is null 可以走索引,is not null 在对比cost后,也大概率会走索引,并不是网上大多数文章说的那样不走索引;而满足覆盖索引条件时,也会走索引。

 

当使用!=条件时(这里需要注意了啊,网上很多文章一口咬定!=不会走索引)

实验过程和上面一样,也是测试了2次,下面直接贴其中一个过程和结果:

update test set name = '' where id >= 35150;
update test set name = 'abc' where id < 35150; 
mysql> select count(*) from test where name != ''; 
+----------+
| count(*) |
+----------+
|     5148 |
+----------+
1 row in set (0.04 sec)

mysql> select count(*) from test where name = '';
+----------+
| count(*) |
+----------+
|    24852 |
+----------+
1 row in set (0.01 sec)

 

mysql> explain select * from test where name != '';
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | ALL  | name          | NULL | NULL    | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.02 sec)
mysql> explain select name from test where name != '';
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | test  | range | name          | name | 152     | NULL | 5148 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.01 sec)

 

结论:!= 可以走索引,!= 在对比cost后,也大概率会走索引,但并不是网上大多数文章说的那样不走索引;而满足覆盖索引条件时,也会走索引。

 

使用or的情况

当我们status字段没有索引的时候,发现是全表扫描

explain select * from test where age = 2644 or status = 1;

我们为status字段加上索引之后,发现用到了age和status两个索引

explain select * from test where age = 2644 or status = 1;

结论:or是否走索引取决于or前后的两个字段是否都建立了索引。

 

not in情况

not in 我们分2种情况测试,分别是使用主键索引和使用普通索引,每一个里边又分别测试了查询所有字段、不满足覆盖索引的字段,以及满足索引的字段。

mysql> explain select * from test where age not in (9837);
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | ALL  | age           | NULL | NULL    | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.03 sec)

mysql> explain select name from test where age not in (9837);
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | ALL  | age           | NULL | NULL    | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)

mysql> explain select age from test where age not in (9837);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | test  | range | age           | age  | 4       | NULL | 15389 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.08 sec)

 

mysql> explain select * from test where id not in (9837);
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | range | PRIMARY       | PRIMARY | 4       | NULL | 14920 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.01 sec)

mysql> explain select name from test where id not in (9837);
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | range | PRIMARY       | PRIMARY | 4       | NULL | 14920 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.01 sec)

mysql> explain select age from test where id not in (9837);
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | range | PRIMARY       | PRIMARY | 4       | NULL | 14920 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.01 sec)

结论:使用主键索引查询任何字段都可以走索引,满足覆盖索引时可以走索引,这两种功能情况都是因为无需二次回表。

 

having情况

having 我们同样也分两种情况测试,一种使用主键索引,一种是普通索引。

使用普通索引

mysql> explain select name from test where age HAVING (2644);
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | ALL  | NULL          | NULL | NULL    | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)

mysql> explain select age from test where age HAVING (2644);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | test  | index | NULL          | age  | 4       | NULL | 29839 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.01 sec)

 

使用主键索引

mysql> explain select status from test where id HAVING (12345);
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | test  | ALL  | NULL          | NULL | NULL    | NULL | 29839 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)

mysql> explain select name from test where id HAVING (12345);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | test  | index | NULL          | name | 152     | NULL | 29839 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.01 sec)

mysql> explain select age from test where id HAVING (12345);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | test  | index | NULL          | age  | 4       | NULL | 29839 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.01 sec)

mysql> explain select id from test where id HAVING (12345);
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | test  | index | NULL          | age  | 4       | NULL | 29839 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
1 row in set (0.01 sec)

结论:使用having时,当使用普通索引时,只有满足覆盖索引时才会走索引,否则不会走索引;当使用主键索引时,只有返回建立索引的字段时才会走索引,否则不会走索引。

其他通用情况

  1. 联合索引使用时没有遵循最左匹配原则索引会断开(使用一部分)或失效
  2. 当走索引返回的行数大于全表的80%(通用情况,不绝对)时,优化器是选择不走索引,具体还有看实际计算的cost
  3. 当用like左通配符时,索引失效,原因是违反最左原则
  4. 联合索引使用时没有遵循最左匹配原则索引会断开(使用一部分)或失效,但是会有特殊情况

结论

其实索引失效无非就是三种原因:

  1. 不满足最左原则断开
  2. 查询优化器分析 cost 后认为使用索引比全表扫描开销还要大
  3. 查询优化器错误了选择它认为开销更低的索引

我们在日常使用索引的时候遵循以下几点,基本上不会导致索引失效:

  1. 遵循最左原则
  2. 索引覆盖范围尽量大,避免优化器计算全表 cost 低于索引 cost
  3. 传入条件字段时保持和字段类型一致
  4. 不在字段上进行计算

 

纸上得来终觉浅,绝知此事要躬行,如果看完了对您有帮助,动动小手点个赞,关注一下,会持续更新技术文章!

 

 

评论 6
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

AirGo.

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值