MySQL Covering Indexes

Latest recommended article published on 2024-06-05 12:04:15

Y仟仟

Tags: mysql, database

Article link: https://blog.csdn.net/weixin_38410177/article/details/121748487

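Secondary Indexes

A secondary index (sometimes called an auxiliary index) stores the primary key id in addition to the indexed columns. For an index on user_name, idx_user_name(user_name) is effectively equivalent to idx_user_name(user_name, id): MySQL (InnoDB) automatically appends the primary key to the end of every secondary index entry.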

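Back-to-Table Lookups

As described above, a secondary index contains the primary key id. When a query filters on the secondary index column, MySQL may still have to go back to the table for the remaining columns. For example, suppose the business needs to look up rows in the users table by user_name, and the API issues this SQL:

select user_id, user_name, phone from users where user_name = 'Laaa';

The index idx_user_name is effectively a small table, idx_user_name(user_name, id). If a query only reads columns that are in the index, scanning the index alone returns everything it needs, and no table lookup is required. Both of the following statements qualify:

SQL 1: select id, user_name from users where user_name = 'Laaa';

SQL 2: select id from users where user_name = 'Laaa';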

mysql> explain select id, user_name from users where user_name = 'Laaa';
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | idx_user_name | idx_user_name | 82      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
mysql> explain select id from users where user_name = 'Laaa';
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | idx_user_name | idx_user_name | 82      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+

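In the plans for SQL 1 and SQL 2, Extra = Using index means the query is answered by a covering index scan with no table lookup. Now look again at the business SQL above:

select user_id, user_name, phone from users where user_name = 'Laaa';

Here user_id and phone in the select list are not in idx_user_name, so MySQL has to use the primary key id to look them up in the table. Conceptually it processes the query in two steps:

Step 1: select id from users where user_name = 'Laaa';  -- returns id = 100101

Step 2: select user_id, user_name, phone from users where id = 100101;

Step 2 is the back-to-table lookup: using the primary key id stored in the secondary index to fetch the row from the base table.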
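The difference between a covered query and one that needs a table lookup can be reproduced in a few lines. The sketch below is only a stand-in, not MySQL: it assumes Python's built-in sqlite3 module, whose EXPLAIN QUERY PLAN reports "COVERING INDEX" where MySQL's EXPLAIN would show Using index. The helper names query_plan and build_users_db and the sample row are made up for illustration.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the 'detail' strings from SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the plan text


def build_users_db():
    """Create an in-memory users table with a secondary index on user_name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"  # primary key, carried by every index entry
        " user_id INTEGER, user_name TEXT, phone TEXT)"
    )
    conn.execute("CREATE INDEX idx_user_name ON users(user_name)")
    # Hypothetical sample row, reusing the id from the two-step example.
    conn.execute("INSERT INTO users VALUES (100101, 7, 'Laaa', '555-0100')")
    return conn


if __name__ == "__main__":
    conn = build_users_db()
    # Only id and user_name are needed: the index alone answers the query.
    covered = "SELECT id, user_name FROM users WHERE user_name = 'Laaa'"
    # user_id and phone are not in the index: a table lookup is required.
    lookup = "SELECT user_id, user_name, phone FROM users WHERE user_name = 'Laaa'"
    print(query_plan(conn, covered))
    print(query_plan(conn, lookup))
```

Running the script prints one plan per query; the first should mention a covering index and the second a plain index search. The exact wording differs from MySQL's EXPLAIN output, but the distinction it draws is the same.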

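Index Optimization Case Studies

Optimizing Paginated Queries

The business needs to query trade records by time range. The original SQL behind the API is:

select * from trade_info where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59' order by id desc limit 102120, 20;

The table trade_info has the index idx_status_create_time(status, create_time), which, as explained above, is equivalent to (status, create_time, id). With typical limit m, n pagination, the deeper the page, the slower the query: as m grows, MySQL has to read and discard more and more rows to reach position m, and with select * every row read on the way costs a table lookup, so the IO gets expensive. A covering scan of the secondary index avoids this. First fetch only the ids, which is a covering index scan with no table lookup, then join those ids back to trade_info so that only the final 20 rows are read from the table. The rewritten SQL: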

select a.* from trade_info a
join (
  -- this step is a covering index scan, no table lookup
  select id from trade_info where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59' order by id desc limit 102120, 20
) as b on a.id = b.id
order by a.id desc;

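Many developers know this form is faster without knowing why the rewrite works. Understanding how indexes are stored is essential for writing high-quality SQL.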
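As a sanity check that the deferred join returns the same page as the direct query, here is a small sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL, a made-up amount column, generated sample rows, and the hypothetical helper names build_trade_db, page_direct and page_deferred. It demonstrates equivalence of results only, not the performance difference.

```python
import sqlite3

# The same filter the article uses for both forms of the query.
WHERE = (
    "status = 0 AND create_time >= '2020-10-01 00:00:00'"
    " AND create_time <= '2020-10-07 23:59:59'"
)


def build_trade_db(n_rows=500):
    """Create an in-memory trade_info table filled with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trade_info ("
        " id INTEGER PRIMARY KEY, status INTEGER,"
        " create_time TEXT, amount INTEGER)"  # amount is an assumed column
    )
    conn.execute(
        "CREATE INDEX idx_status_create_time ON trade_info(status, create_time)"
    )
    rows = [
        (i, i % 3, "2020-10-%02d 12:00:00" % (1 + i % 10), i * 10)
        for i in range(1, n_rows + 1)
    ]
    conn.executemany("INSERT INTO trade_info VALUES (?, ?, ?, ?)", rows)
    return conn


def page_direct(conn, offset, size):
    """Original form: select * with a deep offset."""
    sql = f"SELECT * FROM trade_info WHERE {WHERE} ORDER BY id DESC LIMIT ? OFFSET ?"
    return conn.execute(sql, (size, offset)).fetchall()


def page_deferred(conn, offset, size):
    """Rewritten form: page over ids only, then join back for the full rows."""
    sql = (
        "SELECT a.* FROM trade_info a JOIN ("
        f" SELECT id FROM trade_info WHERE {WHERE}"
        " ORDER BY id DESC LIMIT ? OFFSET ?"
        ") b ON a.id = b.id ORDER BY a.id DESC"
    )
    return conn.execute(sql, (size, offset)).fetchall()


if __name__ == "__main__":
    conn = build_trade_db()
    print(page_direct(conn, 40, 20) == page_deferred(conn, 40, 20))
```

The outer ORDER BY matters: the join by itself does not promise any row order, so the page is sorted again after the ids are joined back.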

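Divide and Conquer Is Always a Good Idea

The marketing system has a batch of expired coupons to invalidate. The core SQL is:

-- about 5 million rows need to be updated
update coupons set status = 1 where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';

In Oracle, updating 5 million rows is fast because the statement can be spread across multiple CPU cores. MySQL needs more care: a single SQL statement runs on only one CPU core. If a statement is complex or slow, it blocks the requests queued behind it, active connections pile up, MySQL CPU hits 100%, and the affected APIs time out. Replication makes it worse. In a master-slave setup with read/write splitting, if updating 5 million rows takes 5 minutes on the master, the binlog event takes another 5 minutes to apply on the slave, so the slave lags by 5 minutes. During that window reads from the slave return stale data, which can cause problems such as duplicate orders.

Optimization approach: first get the minimum and maximum id of the rows matching the where condition, then update in batches of 1000 rows each. The update still finishes quickly, and master-slave replication does not fall behind.

The optimization is as follows.

First get the minimum and maximum id of the rows to update (the table has no physical deletes, so the ids are contiguous):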

mysql> explain select min(id) min_id, max(id) max_id from coupons where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';
+----+-------------+---------+------------+-------+------------------------+------------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys          | key                    | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+---------+------------+-------+------------------------+------------------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | coupons | NULL       | range | idx_status_create_time | idx_status_create_time | 6       | NULL | 180300 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+-------+------------------------+------------------------+---------+------+--------+----------+--------------------------+

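Extra = Using where; Using index shows that the query uses idx_status_create_time and that every column it needs is in the index, so no table lookup is required.

Then loop, updating and committing roughly 1000 rows at a time. The main code is: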

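current_id = min_id;
while current_id <= max_id do
  -- updating 1000 ids by primary key is fast; the range is half-open so batches never overlap,
  -- and the original filter is kept so non-matching rows inside the id range are left alone
  update coupons set status = 1 where id >= current_id and id < current_id + 1000 and status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';
  commit;
  current_id += 1000;
done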
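The loop can also be written as a runnable sketch. The version below assumes Python's built-in sqlite3 module in place of MySQL and a coupons table with just id, status and create_time; the function names id_batches and expire_in_batches are made up for illustration. It walks half-open id ranges so that no id is visited twice, and commits after every batch so each transaction stays short.

```python
import sqlite3


def id_batches(min_id, max_id, batch_size=1000):
    """Yield half-open (start, end) id ranges covering [min_id, max_id] once."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size, max_id + 1)
        yield start, end
        start = end


def expire_in_batches(conn, lo_time, hi_time, batch_size=1000):
    """Expire matching coupons one id range at a time; return rows updated."""
    min_id, max_id = conn.execute(
        "SELECT MIN(id), MAX(id) FROM coupons"
        " WHERE status = 0 AND create_time >= ? AND create_time <= ?",
        (lo_time, hi_time),
    ).fetchone()
    if min_id is None:  # no row matches the filter
        return 0
    updated = 0
    for start, end in id_batches(min_id, max_id, batch_size):
        cur = conn.execute(
            # Keep the original filter: ids inside the range may belong to
            # rows that do not match and must stay untouched.
            "UPDATE coupons SET status = 1"
            " WHERE id >= ? AND id < ? AND status = 0"
            " AND create_time >= ? AND create_time <= ?",
            (start, end, lo_time, hi_time),
        )
        conn.commit()  # one short transaction per batch
        updated += cur.rowcount
    return updated
```

For example, list(id_batches(1, 2500)) yields the ranges (1, 1001), (1001, 2001) and (2001, 2501). A production job would likely also pause between batches and watch replication lag; those parts are left out here.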

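These two cases show how to exploit the fact that a secondary index contains the primary key id: first get the ids through a covering index scan, with no table lookup, then join or update by id, which is efficient. Combined with divide and conquer, which suits the way MySQL executes statements, the operation finishes efficiently and avoids the inconsistent business data caused by master-slave replication lag.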

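Composite Indexes

When a single-column index does not filter the data well, you can combine it with other columns from the where clause into a composite index that filters better and reduces the number of IO scans. For example, the business needs to query trade records by time period with this SQL:

select * from trade_info where status = 1 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';

A developer applied the usual rule of thumb for composite indexes: put the column with the most distinct values, and therefore the best selectivity, first. By that rule idx_create_time_status looks efficient, because create_time has a distinct value every second and is highly selective, while status has only 6 discrete values. But the rule only holds for equality predicates, not for range predicates. idx_user_id_status(user_id, status) is fine, for example, but the rule does not apply to a composite index that includes a range on create_time. Let's compare the two column orders, idx_status_create_time and idx_create_time_status.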

-- create the two composite indexes with different column orders
mysql> create index idx_status_create_time on trade_info(status, create_time);
mysql> create index idx_create_time_status on trade_info(create_time, status);
-- check the execution plan
mysql> explain select * from trade_info where status = 1 and create_time >= '2021-10-01 00:00:00' and create_time <= '2021-10-07 23:59:59';
+----+-------------+------------+------------+-------+-----------------------------------------------+------------------------+---------+------+-------+----------+-----------------------+
| id | select_type | table      | partitions | type  | possible_keys                                 | key                    | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+------------+------------+-------+-----------------------------------------------+------------------------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | trade_info | NULL       | range | idx_status_create_time,idx_create_time_status | idx_status_create_time | 6       | NULL | 98518 |   100.00 | Using index condition |
+----+-------------+------------+------------+-------+-----------------------------------------------+------------------------+---------+------+-------+----------+-----------------------+

The plan shows that when both composite indexes exist, the MySQL optimizer picks idx_status_create_time. Why not idx_create_time_status? We can use optimizer_trace to follow the optimizer's decision.

-- enable optimizer_trace
mysql> set session optimizer_trace="enabled=on",end_markers_in_json=on;
-- run the SQL statement
mysql> select * from trade_info where status = 1 and create_time >= '2021-10-01 00:00:00' and create_time <= '2021-10-07 23:59:59';
-- view the trace
mysql> SELECT trace FROM information_schema.OPTIMIZER_TRACE\G

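Comparing the statistics for the two indexes:

Composite index	Type	Rows	Index columns used for filtering	Chosen	Cause
idx_status_create_time	Index Range Scan	98518	status AND create_time	True	Lower cost
idx_create_time_status	Index Range Scan	98518	create_time	False	Higher cost

The MySQL optimizer is cost-based. Cost consists mainly of IO_COST and CPU_COST, and the CBO (Cost-Based Optimizer) always picks the plan with the lowest cost. In the analysis above, the CBO chose idx_status_create_time because both status and create_time take part in filtering, so the cost is lower. With idx_create_time_status, only create_time takes part in filtering and status is ignored; the CBO effectively reduces it to a single-column index idx_create_time, which is less selective than the composite index idx_status_create_time.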

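Composite Index Design Principles

Put the column used in range queries last in the composite index, for example idx_status_create_time.
The more often a column is filtered on and the better its selectivity, the more it belongs at the front of the composite index. This applies to equality lookups, for example idx_user_id_status.

These two principles do not conflict; they complement each other.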
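To see why the range column belongs last, it can help to model a composite index as a sorted list of tuples. The sketch below is a toy model under stated assumptions, not how InnoDB is implemented: times are small integers, the rows are synthetic, and the function names scan_status_time and scan_time_status are hypothetical. It counts how many index entries each column order has to walk for the same query.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")  # sorts after any integer id or status


def scan_status_time(rows, status, t_lo, t_hi):
    """Index (status, create_time): both predicates bound the scanned range."""
    index = sorted((s, t, i) for i, s, t in rows)
    lo = bisect_left(index, (status, t_lo))
    hi = bisect_right(index, (status, t_hi, INF))
    scanned = index[lo:hi]
    return len(scanned), [i for _, _, i in scanned]


def scan_time_status(rows, status, t_lo, t_hi):
    """Index (create_time, status): only the time range bounds the scan.

    status has to be checked on every entry inside that range.
    """
    index = sorted((t, s, i) for i, s, t in rows)
    lo = bisect_left(index, (t_lo,))
    hi = bisect_right(index, (t_hi, INF))
    scanned = index[lo:hi]
    return len(scanned), [i for _, s, i in scanned if s == status]


if __name__ == "__main__":
    # Synthetic rows as (id, status, create_time): 6 statuses, 30 time slots.
    rows = [(i, i % 6, i % 30) for i in range(1, 601)]
    print(scan_status_time(rows, 1, 0, 6)[0])
    print(scan_time_status(rows, 1, 0, 6)[0])
```

Both functions return the same ids, but the (create_time, status) order walks every entry in the time range and filters status afterwards, while the (status, create_time) order jumps straight to the matching slice. That is the same effect the optimizer trace reports as only create_time taking part in filtering.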
