SQL优化（三）：子查询和IN，EXISTS用法和优化方法

最新推荐文章于 2024-08-09 23:23:34 发布

服务端开发

最新推荐文章于 2024-08-09 23:23:34 发布

阅读量6.2k

点赞数 3

分类专栏： MySQL

本文链接：https://blog.csdn.net/u010013573/article/details/89303796

版权

MySQL 专栏收录该内容

15 篇文章 13 订阅

订阅专栏

用法

1. 与IN结合使用

子查询与IN结合使用时，通常通过子查询查询出某个表单列的值，然后作为外层的SELECT的IN查询的数据源，如下，查询今天进行了购物的用户列表，首先通过子查询在订单表t_order查出所有今天进行了购物的用户的user_id，然后在外层SELECT中对于用户表t_user的每个用户都通过IN来判断自己的user_id是否在里面：

mysql> select * from t_user where id in (select user_id from t_order where buy_date=curdate());
+----+------+----------------------------------+------------+--------+
| id | name | password                         | email      | phone  |
+----+------+----------------------------------+------------+--------+
|  2 | xyz2 | 5838eec5e44b83f35c2763382b45e469 | 456@qq.com | 456789 |
+----+------+----------------------------------+------------+--------+

执行计划如下：

mysql> explain select * from t_user where id in (select user_id from t_order where buy_date=curdate());
+----+-------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+---------------------------------+
| id | select_type | table   | partitions | type | possible_keys                                 | key          | key_len | ref   | rows | filtered | Extra                           |
+----+-------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+---------------------------------+
|  1 | SIMPLE      | t_user  | NULL       | ALL  | PRIMARY                                       | NULL         | NULL    | NULL  |    1 |   100.00 | NULL                            |
|  1 | SIMPLE      | t_order | NULL       | ref  | idx_user_id,idx_user_id_buy_date,idx_buy_date | idx_buy_date | 3       | const |    1 |   100.00 | Using where; FirstMatch(t_user) |
+----+-------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+---------------------------------+
2 rows in set, 1 warning (0.00 sec)

性能分析：数据集合+O(N)线性时间复杂度

对于外层SELECT对应用户表t_user的每一行数据都要执行一次这个子查询，而这个子查询是需要返回一个数据集合而不是单条数据，然后再判断外层SELECT的当前数据行的该列的值是否在这个集合中，类似于O(N)的线性时间复杂度，如Java的集合的contains方法，所以性能是很低的，即MySQL需要返回的数据量大同时查询的时间复杂度高。

2. 与EXISTS结合使用

在外层SELECT中，除了可以通过IN来使用子查询的结果外，还可以通过EXISTS，如下：与IN不同的是，在子查询中外层的t_user表需要在子查询内使用，在子查询内部才是进行实际的查询，而EXISTS的作用是根据EXISTS返回的结果，即boolean类型的true或false来决定是否需要返回该外层SELECT当前遍历到的结果。

mysql> select * from t_user where exists (select * from t_order where t_user.id=t_order.user_id and t_order. buy_date=curdate());
+----+------+----------------------------------+------------+--------+
| id | name | password                         | email      | phone  |
+----+------+----------------------------------+------------+--------+
|  2 | xyz2 | 5838eec5e44b83f35c2763382b45e469 | 456@qq.com | 456789 |
+----+------+----------------------------------+------------+--------+
1 row in set (0.00 sec)

执行计划如下：

mysql> explain select * from t_user where exists (select * from t_order where t_user.id=t_order.user_id and t_order. buy_date=curdate());
+----+--------------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+-------------+
| id | select_type        | table   | partitions | type | possible_keys                                 | key          | key_len | ref   | rows | filtered | Extra       |
+----+--------------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+-------------+
|  1 | PRIMARY            | t_user  | NULL       | ALL  | NULL                                          | NULL         | NULL    | NULL  |    1 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t_order | NULL       | ref  | idx_user_id,idx_user_id_buy_date,idx_buy_date | idx_buy_date | 3       | const |    1 |    25.00 | Using where |
+----+--------------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

性能分析：True或False的boolean值 + O(1)常量时间复杂度

执行计划与IN差不多，外层SELECT的type都是ALL，即全表扫描，但是EXISTS的执行过程与IN不一致，对于EXISTS而言，外层SELECT对应的用户表t_user也参与到了子查询的SQL中，即 where t_user.id=t_order.user_id，故如果子查询的结果不为空，即存在数据，则外层SELECT对应的t_user表的当前数据行肯定是符合要求的，故该子查询实际上并不返回任何数据，而是返回值True或False，不需要与IN一样返回一个数据集合。
而对外层SELECT来说，通过EXISTS判断子查询返回的boolean值True或者False来判断当前数据行是否符合要求，故时间复杂度为常量级别O(1)。

优化方法

1. 使用EXISTS替代IN

由以上的分析可知，在使用子查询时，将EXISTS结合使用比与IN结合使用效率更高，所以可以使用EXISTS来替代IN。

2. 使用JOIN替代子查询

子查询不管是使用IN还是EXISTS，对外层SELECT对应的数据表均需要进行全表扫描，并且对于每行数据都需要执行一次子查询，所以如果该数据行表很大，则需要执行大量的子查询，，即可能出现“大表驱动小表”，从而产生性能问题。

对于JOIN而言，由于可以通过“小表驱动大表”，并且进行JOIN的列都加了索引，所以可以一定程度上优化子查询，如下：还是查询今天进行了购物的用户：

mysql> select * from t_user inner join t_order where t_user.id=t_order.user_id and t_order.buy_date=curdate();
+----+------+----------------------------------+------------+--------+----------+---------+------+------------+
| id | name | password                         | email      | phone  | order_id | user_id | cost | buy_date   |
+----+------+----------------------------------+------------+--------+----------+---------+------+------------+
|  2 | xyz2 | 5838eec5e44b83f35c2763382b45e469 | 456@qq.com | 456789 |        5 |       2 | 1000 | 2019-04-14 |
+----+------+----------------------------------+------------+--------+----------+---------+------+------------+
1 row in set (0.00 sec)

mysql> explain select * from t_user inner join t_order where t_user.id=t_order.user_id and t_order.buy_date=curdate();
+----+-------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys                                 | key          | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t_user  | NULL       | ALL  | PRIMARY                                       | NULL         | NULL    | NULL  |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | t_order | NULL       | ref  | idx_user_id,idx_user_id_buy_date,idx_buy_date | idx_buy_date | 3       | const |    1 |   100.00 | Using where |
+----+-------------+---------+------------+------+-----------------------------------------------+--------------+---------+-------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

3. 子查询和JOIN均不用，程序控制

如果数据表较大时，使用子查询和JOIN效率均不高，并且对于JOIN而言，会影响分库分表的使用，故可以对需要关联的表分别查询出来，然后在程序中进行数据拼接组装操作，从而减轻数据库的压力，因为数据库通常是应用的性能瓶颈，而应用程序通常可以进行分布式和集群部署来拓展。
除此之外，对于拼接的结果可以缓存到分布式缓存，如Redis中，从而可以重复使用。