(本例创建表脚本在文章底部)
对于mysql的出现的子查询语句,大部分都是不太好的,尤其 in() 的子查询语句,如下:
select * from test.tabname where id in(select id from test.tabname2 where name='love');
一般会认为mysql会先执行子查询,返回所有包含 test.tabname2 中所有符合条件的 id,外层查询则搜索内层的结果集。一般认为是这样执行的。
select group_concat(id) from test.tabname2 where name='love';
--内层查询结果:1,3,5,7,9,11,13,15,17,1
select * from test.tabname where id in(1,3,5,7,9,11,13,15,17,19);
很不幸,MySQL 不是这样做的。MySQL 会将相关的外层表压缩到子查询中,它认为这样可以更高效地查找到数据行。也就是说,MySQL 会将查询改写成下面的样子:
select * from test.tabname
where exists(
select * from test.tabname2
where tabname.id=tabname2.id
and tabname2.name='love');
这时,子查询先根据 id 来关联外部表 tabname ,因为需要 id 字段,所以mysql 无法先执行这个子查询,通过 explain 我们可以看到子查询是一个相关子查询(DEPENDENT SUBQUERY )。
mysql> explain select * from test.tabname where id in(select id from test.tabname2 where name='love');
+----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+
| 1 | PRIMARY | tabname | ALL | NULL | NULL | NULL | NULL | 30 | Using where |
| 2 | DEPENDENT SUBQUERY | tabname2 | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+
2 rows in set (0.00 sec)
mysql> explain select * from test.tabname
-> where exists(select * from test.tabname2 where tabname.id=tabname2.id and tabname2.name='love');
+----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+
| 1 | PRIMARY | tabname | ALL | NULL | NULL | NULL | NULL | 30 | Using where |
| 2 | DEPENDENT SUBQUERY | tabname2 | eq_ref | PRIMARY | PRIMARY | 4 | test.tabname.id | 1 | Using where |
+----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+
2 rows in set (0.00 sec)
根据explain 的输出我们可以看到,mysql 先选择对 tabname 表进行全表扫描 30 次,然后根据返回的 id 逐个执行子查询。如果是一个很小的表,这个查询糟糕的性能可能还不会引起注意。如果是一个非常大的表,这个查询的性能会非常糟糕。
解决方法:使用连接查询
select tabname.* from test.tabname,test.tabname2 where tabname.id=tabname2.id and tabname2.name='love';
select tabname. * from test.tabname inner join test.tabname2 using(id) where tabname2.name='love';
执行计划:
mysql> explain select tabname.* from test.tabname,test.tabname2 where tabname.id=tabname2.id and tabname2.name='love';
mysql> explain select tabname. * from test.tabname inner join test.tabname2 using(id) where tabname2.name='love';
+----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+
| 1 | SIMPLE | tabname2 | ALL | PRIMARY | NULL | NULL | NULL | 20 | Using where |
| 1 | SIMPLE | tabname | eq_ref | PRIMARY | PRIMARY | 4 | test.tabname2.id | 1 | |
+----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+
2 rows in set (0.00 sec)
这次可以看到,select_type 变为 简单查询。首先访问的是 tabname2 ,因为表 tabname2 的记录比较少,只需该表全表扫描,再查询子查询,这省去了较多的IO。
环境脚本:
drop table if exists tabname,tabname2;
create table tabname (
id int auto_increment not null primary key,
name varchar(10) null,
indate datetime null,
tid int null,
key(tid),
key(indate)
)engine=innodb;
create table tabname2 (
id int auto_increment not null primary key,
name varchar(10) null,
indate datetime null,
tid int null,
key(tid),
key(indate)
)engine=innodb;
drop procedure if exists inserttab;
delimiter //
create procedure inserttab()
begin
declare i int default 1 ;
while i<= 10 do
insert into tabname(name,indate,tid) values('love',now(),2),('lucky',now(),3),('passion',now(),4);
insert into tabname2(name,indate,tid) values('love',now(),2),('lucky',now(),3);
set i= i + 1;
end while;
end;//
delimiter ;
call inserttab;
select count(*) from test.tabname;
select count(*) from test.tabname2;
参考:《高性能 MySQL》