下面的演示基于MySQL5.7.27版本
一、关于MySQL子查询的优化策略介绍:
子查询优化策略
对于不同类型的子查询,优化器会选择不同的策略。
1. 对于 IN、=ANY 子查询,优化器有如下策略选择:
semijoin
Materialization
exists
2. 对于 NOT IN、<>ALL 子查询,优化器有如下策略选择:
Materialization
exists
3. 对于 derived 派生表,优化器有如下策略选择:
derived_merge,将派生表合并到外部查询中(5.7 引入 );
将派生表物化为内部临时表,再用于外部查询。
注意:update 和 delete 语句中子查询不能使用 semijoin、materialization 优化策略
二、创建数据进行模拟演示
为了方便分析问题先建两张表并插入模拟数据:
CREATE TABLE `test02` (
`id` int(11) NOT NULL,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`)
) ENGINE=InnoDB;
drop procedure idata;
delimiter ;;
create procedure idata()
begin
declare i int;
set i=1;
while(i<=10000)do
insert into test02 values(i, i, i);
set i=i+1;
end while;
end;;
delimiter ;
call idata();
create table test01 like test02;
insert into test01 (select * from test02 where id<=1000)
三、举例分析SQL实例
子查询示例:
SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)
大部分人可定会简单的认为这个 SQL 会这样执行:
SELECT test02.b FROM test02 WHERE id < 10
结果:1,2,3,4,5,6,7,8,9
SELECT * FROM test01 WHERE test01.a IN (1,2,3,4,5,6,7,8,9);
但实际上 MySQL 并不是这样做的。MySQL 会将相关的外层表压到子查询中,优化器认为这样效率更高。也就是说,优化器会将上面的 SQL 改写成这样:
select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);
提示: 针对mysql5.5以及之前的版本
查看执行计划如下,发现这条SQL对表test01进行了全表扫描1000,效率低下:
root@localhost [dbtest01]>desc select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| 1 | PRIMARY | test01 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 10.00 | Using where |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)
但是此时实际执行下面的SQL,发现也不慢啊,这不是自相矛盾嘛,别急,咱们继续往下分析:
SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)
查看此条SQL的执行计划如下:
root@localhost [dbtest01]>desc SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| 1 | SIMPLE | | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | test01 | NULL | ref | a | a | 5 | .b | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using where |
+----+--------------+---------