mysql5.7子查询优化_通过MySQL5.7子查询的坑联想到的MySQL查询优化规律与注意点

最新推荐文章于 2021-11-15 16:26:52 发布

电路和微电子考研

最新推荐文章于 2021-11-15 16:26:52 发布

阅读量281

点赞数

文章标签： mysql5.7子查询优化

本文链接：https://blog.csdn.net/weixin_36240219/article/details/113902780

版权

本文探讨了在MySQL 5.7中遇到的子查询性能问题，详细分析了一个导致数据库CPU升高的查询语句，并解释了MySQL 5.7对子查询的优化方式，包括半连接优化。文章通过实例展示了如何通过调整查询结构来提高效率，指出在某些情况下，使用DISTINCT和JOIN可能比子查询更优。

摘要由CSDN通过智能技术生成

本文基于MySQL 5.7

问题背景

最近，发现业务线程卡死在读取数据库(MySQL 5.7)，数据库CPU飙升到了100%，经过定位发现是下面这样一个SQL(由用户触发的查询，可以理解为查看这个群组的所有用户买过哪些商品)导致的：

select * from merchandise where id in (

select merchandise_id from indent where customer_id in (

select id from customer where name in("zhx","ypy")

)

);

虽然听说MySQL5.6开始已经对于子查询做了很多查询优化，但是看来还是不够啊

MySQL历史追溯

MySQL针对子查询的优化一直不够好，在5.5版本及以前，根据《高性能MySQL》这本书说的，MySQL针对子查询的优化是优化为 exists 的方式来执行，变成逐条记录进行遍历：

在5.7版本，优化并不是这样简单的优化了：而是转换为临时表加半连接优化。

举例分析与半连接优化的局限

假设我们有三张表：

用户表：

drop table if exists customer;

create table if not exists customer(

id varchar(64) primary key,

name varchar(64) unique key,

address text

);

商品表：

drop table if exists merchandise;

create table if not exists merchandise(

id varchar(64) primary key,

name varchar(64) unique key,

description text

);

订单表：

drop table if exists indent;

create table if not exists indent (

id varchar(64) primary key,

customer_id varchar(64),

merchandise_id varchar(64),

key (customer_id, merchandise_id)

);

用户和商品是一种多对多的关系。

假设他们的数据如下：

mysql> select * from customer;

+----+------+--------------------------------+

| id | name | address |

+----+------+--------------------------------+

| 1 | zhx | Xueyuannanlu, Haidian, Beijing |

| 2 | ypy | Zhichunlu, Haidian, Beijing |

| 3 | tt | Dog House, Haidian, Beijing |

+----+------+--------------------------------+

3 rows in set

mysql> select * from merchandise

;

+----+------------+-------------+

| id | name | description |

+----+------------+-------------+

| 1 | Computer | Lenovo |

| 2 | Perfume | Dior |

| 3 | Chocolate | Dove |

| 4 | YummyBones | Pedigree |

| 5 | IPhone | Apple |

+----+------------+-------------+

5 rows in set

mysql> select * from indent

;

+----+-------------+----------------+

| id | customer_id | merchandise_id |

+----+-------------+----------------+

| 1 | 1 | 1 |

| 5 | 1 | 2 |

| 2 | 2 | 2 |

| 3 | 2 | 3 |

| 4 | 3 | 4 |

+----+-------------+----------------+

5 rows in set

假设我们想获取用户"zhx"购买过的所有商品，通过子查询应该是这样实现：

select * from merchandise where id in (

select merchandise_id from indent where customer_id in (

select id from customer where name in ("zhx")

)

);

那这个的查询计划是什么样呢？

通过如下两个命令获取：

explain select distinct merchandise.id, merchandise.name, merchandise.description from merchandise

join indent on merchandise.id = indent.merchandise_id

join customer on indent.customer_id = customer.id

where customer.name in ("zhx");

show warnings;

结果是：

show warnings;拆解出来的执行SQL伪代码是：

/* select#1 */ select `test`.`merchandise`.`id` AS `id`,`test`.`merchandise`.`name` AS `name`,`test`.`merchandise`.`description` AS `description` from

`test`.`customer` semi join (`test`.`indent`) join `test`.`merchandise`

where (

(`test`.`merchandise`.`id` = `test`.`indent`.`merchandise_id`)

and (`test`.`indent`.`customer_id` = '1')

and ('zhx' = 'zhx')

)

可以看到MySQL5.7针对这个子查询确实做了足够的优化，这里我们也看到了，MySQL通过半连接(semi join)优化了这次子查询。同时，由于存在性检查，先拿到了’zhx’的id为1，并且原来的customer.name in ("zhx")其实相当于customer.name = "zhx"，所以条件可以改写为test.indent.customer_id = ‘1’。

我们再来看看出问题的子查询场景，针对某组用户查询购买过得商品：

select * from merchandise where id in (

select merchandise_id from indent where customer_id in (

select id from customer where name in ("zhx","ypy")

)

);

这个语句的查询计划是：

/* select#1 */ select `test`.`merchandise`.`id` AS `id`,`test`.`merchandise`.`name` AS `name`,`test`.`merchandise`.`description` AS `description` from

`test`.`merchandise` semi join (`test`.`customer` join `test`.`indent`)

where (

(``.`merchandise_id` = `test`.`merchandise`.`id`)

and (`test`.`indent`.`customer_id` = `test`.`customer`.`id`)

and (`test`.`customer`.`name` in ('zhx','ypy'))

)

这个居然是个针对merchandise表的全扫描！怪不得线上数据库扛不住。

但是为什么呢？我们参考下MySQL5.7官方文档的查询优化章节：

https://dev.mysql.com/doc/refman/5.5/en/statement-optimization.html

针对大部分IN的子查询语句，会被优化成半连接和中间表的执行机制。

什么是半连接，A semi join B可以理解为A inner join B但是结果只包含B的数据；

中间表比较好理解，就是中间结果保存成一个临时表。

还有A inner join B inner join C和A inner join (B inner join C)的结果应该是一样的。

那么我们可以联想到一个优化，就是如果where条件里面包含C=？这样的条件，那么用A inner join (B inner join C)替换A inner join B inner join C可以提升很大效率。

获取用户"zhx"购买过的所有商品就是用这个定律，从

`merchandise` semi join (`test`.`customer` join `test`.`indent`)

变成了

`test`.`customer` semi join (`test`.`indent`) join `test`.`merchandise`

customer semi join indent表之后还是保留indent表和merchandise表进行join，相当于没有中间表

获取用户组多个用户"zhx","ypy"购买过的所有商品

由于没有优化，所以保持原有的：

`merchandise` semi join (`test`.`customer` join `test`.`indent`)

MySQL的semi join和中间表的优化缺陷在这里就体现出来了：

由于是merchandise去semi join中间结果，但是限制条件里面没有明确的merchandise限制，而且是和中间表进行join，所以针对merchandise进行全扫描。

这个其实很奇怪，为啥优化来优化去，变成了一个效果更差的扫描(这应该是三方面引起，一是semi join，二是没有明确的semi join要保留的表的有索引字段的条件限制，三是semi join的是中间表)

结论

MySQL针对多重子查询的优化还有待提升，最好还是用distinct和join的效率更好。

电路和微电子考研

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
mysql5.7子查询优化_通过MySQL5.7子查询的坑联想到的MySQL查询优化规律与注意点

本文基于MySQL 5.7问题背景最近，发现业务线程卡死在读取数据库(MySQL 5.7)，数据库CPU飙升到了100%，经过定位发现是下面这样一个SQL(由用户触发的查询，可以理解为查看这个群组的所有用户买过哪些商品)导致的：select * from merchandise where id in (select merchandise_id from indent where custome...
复制链接

扫一扫