MySQL EXPLAIN 独立子查询dependent subquery 优化示例

最新推荐文章于 2024-08-19 16:45:00 发布

薛定谔的DBA

最新推荐文章于 2024-08-19 16:45:00 发布

阅读量1w

点赞数 1

分类专栏： MYSQL MYSQL 调优优化

本文链接：https://blog.csdn.net/kk185800961/article/details/49340589

版权

MYSQL 同时被 2 个专栏收录

61 篇文章 2 订阅

订阅专栏

MYSQL 调优优化

6 篇文章 0 订阅

订阅专栏

（本例创建表脚本在文章底部）

对于mysql的出现的子查询语句，大部分都是不太好的，尤其 in() 的子查询语句，如下：

select * from test.tabname where id in(select id from test.tabname2 where name='love');

一般会认为mysql会先执行子查询，返回所有包含 test.tabname2 中所有符合条件的 id，外层查询则搜索内层的结果集。一般认为是这样执行的。

select group_concat(id) from test.tabname2 where name='love';
--内层查询结果：1,3,5,7,9,11,13,15,17,1

select * from test.tabname where id in(1,3,5,7,9,11,13,15,17,19);

很不幸，MySQL 不是这样做的。MySQL 会将相关的外层表压缩到子查询中，它认为这样可以更高效地查找到数据行。也就是说，MySQL 会将查询改写成下面的样子：

select * from test.tabname 
where exists(
	select * from test.tabname2 
	where tabname.id=tabname2.id 
	and tabname2.name='love');

这时，子查询先根据 id 来关联外部表 tabname ，因为需要 id 字段，所以mysql 无法先执行这个子查询，通过 explain 我们可以看到子查询是一个相关子查询（DEPENDENT SUBQUERY ）。

mysql> explain select * from test.tabname where id in(select id from test.tabname2 where name='love');
+----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+
| id | select_type        | table    | type            | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | tabname  | ALL             | NULL          | NULL    | NULL    | NULL |   30 | Using where | 
|  2 | DEPENDENT SUBQUERY | tabname2 | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using where | 
+----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+
2 rows in set (0.00 sec)

mysql> explain select * from test.tabname 
    -> where exists(select * from test.tabname2 where tabname.id=tabname2.id and tabname2.name='love');
+----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+
| id | select_type        | table    | type   | possible_keys | key     | key_len | ref             | rows | Extra       |
+----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+
|  1 | PRIMARY            | tabname  | ALL    | NULL          | NULL    | NULL    | NULL            |   30 | Using where | 
|  2 | DEPENDENT SUBQUERY | tabname2 | eq_ref | PRIMARY       | PRIMARY | 4       | test.tabname.id |    1 | Using where | 
+----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+
2 rows in set (0.00 sec)

根据explain 的输出我们可以看到，mysql 先选择对 tabname 表进行全表扫描 30 次，然后根据返回的 id 逐个执行子查询。如果是一个很小的表，这个查询糟糕的性能可能还不会引起注意。如果是一个非常大的表，这个查询的性能会非常糟糕。

解决方法：使用连接查询

select tabname.* from test.tabname,test.tabname2 where tabname.id=tabname2.id and tabname2.name='love';
select tabname. * from test.tabname inner join test.tabname2 using(id) where tabname2.name='love';

执行计划：

mysql> explain select tabname.* from test.tabname,test.tabname2 where tabname.id=tabname2.id and tabname2.name='love';
mysql> explain select tabname. * from test.tabname inner join test.tabname2 using(id) where tabname2.name='love';
+----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+
| id | select_type | table    | type   | possible_keys | key     | key_len | ref              | rows | Extra       |
+----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+
|  1 | SIMPLE      | tabname2 | ALL    | PRIMARY       | NULL    | NULL    | NULL             |   20 | Using where | 
|  1 | SIMPLE      | tabname  | eq_ref | PRIMARY       | PRIMARY | 4       | test.tabname2.id |    1 |             | 
+----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+
2 rows in set (0.00 sec)

这次可以看到，select_type 变为简单查询。首先访问的是 tabname2 ，因为表 tabname2 的记录比较少，只需该表全表扫描，再查询子查询，这省去了较多的IO。

环境脚本：

drop table if exists tabname,tabname2;

create table tabname (  
id int auto_increment not null primary key,  
name varchar(10) null,  
indate datetime null,  
tid int null,  
key(tid),  
key(indate)  
)engine=innodb;  
  
  
create table tabname2 (  
id int auto_increment not null primary key,  
name varchar(10) null,  
indate datetime null,  
tid int null,  
key(tid),  
key(indate)  
)engine=innodb;  
  


drop procedure if exists inserttab;

delimiter //
create procedure inserttab()
begin
declare i int default 1 ;
while i<= 10 do
insert into tabname(name,indate,tid) values('love',now(),2),('lucky',now(),3),('passion',now(),4);  
insert into tabname2(name,indate,tid) values('love',now(),2),('lucky',now(),3);
set i= i + 1;
end while;
end;//
delimiter ;

call inserttab;



select count(*) from test.tabname;
select count(*) from test.tabname2;

参考：《高性能 MySQL》