mysql in优化_MySQL优化器和SemiJoin优化

提示:公众号展示代码会自动折行,建议横屏阅读

1. 背景

1.1. MySQL执行流程

MySQL的执行过程包括多个子阶段:语法分析、语义检查、逻辑优化、物理优化和执行。其中逻辑优化和物理优化统称为查询优化。一个查询优化器的输入是查询树,输出是查询执行计划。

0c08a89d66af72d377e41da4f17b649b.png

逻辑优化也称为基于规则的查询优化(Rule Based Optimization,简称RBO)。主要是对查询进行逻辑上的等价变换,目的是通过这些变换提高查询的性能。物理优化也称为基于代价的查询优化(Cost-based Optimization,简称CBO)。主要是通过一些模型,预测一个查询使用某种执行计划时的成本,并选择其中成本最小的一个。

1.2. SemiJoin定义

SemiJoin是表上的一个运算。假定R和S是两个表,R SemiJoin S可记为R ⋉ S,结果是在S中满足条件的元组。例如下面的例子是“雇员”和“部门”和它们的半连接的表:

雇员SemiJoin部门 where 雇员.DeptName=部门.DeptName

4ae677de22f72768a12ce651af492ea2.png

雇员Join部门 where 雇员.DeptName=部门.DeptName

d2a38d91917013196a3bd7d871984a00.png

从上面的结果可以得出如下结论:①半连接的结果是左表的子集。②增加右表一条重复记录,结果并不随之增多。

2. In子查询转SemiJoin的优化方法

2.1 优化方法

通用的转换格式如下:

  SELECT ...
FROM ot1 ... otN
WHERE (oe1, ... oeM) IN (SELECT ie1, ..., ieM
FROM it1 ... itK
[WHERE inner-cond])
[AND outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]

如上格式的子查询被转换为semijoin的格式如下(ot表示外表,it表示内表):

  SELECT ...
FROM (ot1 ... otN) SJ (it1 ... itK)
ON (oe1, ... oeM) = (ie1, ..., ieM)
[AND inner-cond]
[WHERE outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]

从上面的结构可以看到,转化是支持多列的。

2.2 为什么可以转化:

当执行如下SQL时,也是先遍历雇员表,然后在部门表里查找是否匹配。实际上和上面SemiJoin的语义是一致的,结果也一样。

select * from 雇员 where 雇员.DeptName in (select 部门.DeptName from 部门)

ff571276a38453ca650cebb5d6eb9fc7.png

具体Mysql将In转化成SemiJoin的代码逻辑实现在如下函数里:bool SELECT_LEX::flatten_subqueries()

2.3 一个子查询能够转换成SemiJoin还需满足如下条件:

1. 必须是In或any子查询

2. 不包含UNION按照扁平化规则后,UNION会导致结果不一致。比如:t1,t2和t3表定义和数据如下:

faf486ec3538f84ef11179a492365c6a.png

则执行如下SQL:select a from t1 where a in (select b from t2 UNION select c from t3);正确结果应该为:

e8ebdfa03c3fcc59ce26855e2a2d4a1a.png

但是SemiJoin转换后的SQL为:select a from t1 semijoin t2 where t1.a=t2.b UNION select a from t1 semijoin t3 where t1.a=t3.c;由于UNION的去重,导致结果为:

51a3fdb99f2b33a398969ea3cb29967d.png

另外即使是UNION ALL,如果t2与t3存在相同记录,结果也是不正确的。

3. 不包含GROUP BY 或 HAVING含有groupby的SQL,转换后(即扁平化后),导致与原来语义不一致了。比如原来是:select a from t1 where b in (select b from t2 group by b)按照规则转换成semijoin后为:select a from t1 semijoin (t2) where t1.b = t2.b group by t2.b;由于a1不在group by内,语法错误。

4. 不包含聚集函数。通过测试,确实没有走SemiJoin。

mysql> explain select * from t1 where a in (select count(*) from t2 where t1.b=t2.b group by a);
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; Using temporary |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
2 rows in set, 2 warnings (0.00 sec)

mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1276 | Field or reference 'ldy.t1.b' of SELECT #2 was resolved in SELECT #1 |
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` where (`ldy`.`t1`.`a`,(/* select#2 */ select 1 from `ldy`.`t2` where (`ldy`.`t1`.`b` = `ldy`.`t2`.`b`) group by `ldy`.`t2`.`a` having ((`ldy`.`t1`.`a`) = (count(0))))) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

5. 不包含LIMIT。通过测试,控制台直接报错。

mysql> select * from t1 where a in (select a from t2 limit 2);
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
mysql>

6. 子查询有ORDER BY时,转换后去掉ORDER BY

7.  外部查询没有STRAIGHT_JOIN(错)关键字STRAIGHT_JOIN表明,该查询的JOIN顺序不需要优化,按照原来SQL的顺序依次进行。MYSQL官方文档说有限制,经测试没有限制。比如,如下依然能转换为semijoin:

mysql> select * from t3;
mysql> explain select t1.*,t2.* from t2 STRAIGHT_JOIN t1 on t1.b=t2.b where t1.a in (select a from t3);
+----+--------------+-------------+------------+--------+---------------+------------+---------+----------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+--------+---------------+------------+---------+----------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | NULL |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | | NULL | eq_ref | | | 5 | ldy.t1.a | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL |
+----+--------------+-------------+------------+--------+---------------+------------+---------+----------+------+----------+----------------------------------------------------+
4 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b`,`ldy`.`t2`.`a` AS `a`,`ldy`.`t2`.`b` AS `b` from `ldy`.`t2` semi join (`ldy`.`t3`) straight_join `ldy`.`t1` where ((`ldy`.`t1`.`b` = `ldy`.`t2`.`b`) and (``.`a` = `ldy`.`t1`.`a`)) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

8. 是否相关子查询均支持。

注:测试所用的数据参考如下数据库名为ldy,表如下:

mysql> select * from t1;
+------+------+
| a | b |
+------+------+
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7 |
| 7 | 8 |
+------+------+
8 rows in set (0.00 sec)

mysql> select * from t2;
+------+------+
| a | b |
+------+------+
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
+------+------+
4 rows in set (0.00 sec)

mysql> select * from t3;
+------+------+
| a | b |
+------+------+
| 0 | 1 |
| 2 | 3 |
| 3 | 4 |
+------+------+
3 rows in set (0.00 sec)

2.4 举例

例子1: 二重In子查询嵌套SQL通过两次SemiJoin优化成单层SQL,参考如下:

mysql> explain select * from t1 where t1.a in (select a from t2 where b in (select b from t3));
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| 1 | SIMPLE | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Start temporary |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where; End temporary; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` semi join (`ldy`.`t3` join `ldy`.`t2`) where ((`ldy`.`t1`.`a` = `ldy`.`t2`.`a`) and (`ldy`.`t2`.`b` = `ldy`.`t3`.`b`)) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

例子2: 并列in查询,and条件下能够转成SemiJoin:

mysql> explain select * from t1 where t1.a in (select a from t2) and t1.b in (select b from t3);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
| 1 | SIMPLE | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Start temporary |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where; End temporary; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; FirstMatch(t1); Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` semi join (`ldy`.`t2`) semi join (`ldy`.`t3`) where ((`ldy`.`t1`.`b` = `ldy`.`t3`.`b`) and (`ldy`.`t2`.`a` = `ldy`.`t1`.`a`)) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

例子3,并列in查询,or条件下不能转成semijoin:

mysql> explain select * from t1 where t1.a in (select a from t2) or t1.b in (select b from t3);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using where |
| 3 | SUBQUERY | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL |
| 2 | SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` where ((`ldy`.`t1`.`a`,`ldy`.`t1`.`a` in ( (/* select#2 */ select `ldy`.`t2`.`a` from `ldy`.`t2` where 1 ), (`ldy`.`t1`.`a` in on where ((`ldy`.`t1`.`a` = `materialized-subquery`.`a`))))) or (`ldy`.`t1`.`b`,`ldy`.`t1`.`b` in ( (/* select#3 */ select `ldy`.`t3`.`b` from `ldy`.`t3` where 1 ), (`ldy`.`t1`.`b` in on where ((`ldy`.`t1`.`b` = `materialized-subquery`.`b`)))))) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

例子4: 支持多列SemiJoin:

mysql> explain select * from t1 where (a, b) in (select a, b from t2);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Start temporary |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where; End temporary; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` semi join (`ldy`.`t2`) where ((`ldy`.`t1`.`b` = `ldy`.`t2`.`b`) and (`ldy`.`t1`.`a` = `ldy`.`t2`.`a`)) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

3. SemiJoin的几种执行策略

转换成SemiJoin后,MySQL将要根据权重,采用如下执行策略(以下策略也能通过Mysql的修改系统变量optimizer_switch,将某些策略关闭或打开)。假设有两个表:国家表(Country)和城市表(City),查找存在某一城市人口大于该国人口33%的所有国家。查询语句如下:

select *
from Country
where
Country.code IN (select City.Country
from City
where
City.Population > 0.33 * Country.Population and
City.Population > 1*1000*1000);

1. Duplicate Weedout:将semi-join当做普通innerJoin,之后使用临时表将重复项去除。
举例:Country SemiJoin City的执行过程如下:

4d74cc5e4a6fe6fd9ad2a21a7d431d1f.png

Country与City内连接后有German有三条记录,通过建立临时表,由于临时表的主键是Country表的RowId,重复项将不能插入。具体可以看到,去重过程如下:

8f46b16a5f822fb838c1a50f6fd14753.png

2. FirstMatch:执行join只挑选第一次匹配的项,非常类似于In语义。
举例,Country InnerJoin City的嵌套循环流程如下,先遍历Country再遍历City:

db5ed28dcc549c62373d426dff946e2f.png

当是SemiJoin时,在City里找到一个,就不需要继续找了。

6dc4e87bb349948822a0a88370afc053.png

3. LooseScan:sj-inner表将重复项分组,选出一个即返回。
查询拥有人造卫星的国家,数据库有国家表(Country)和人造卫星表(Satellite)。查询SQL如下:

select * from Country
where
Country.code in (select country_code from Satellite)

假如Satellite.country_code是索引,通过按索引顺序遍历,发现已经是分组的了。

0cd281d670746f0051d8310623b02b58.png

利用索引很容易从一个组调到另一组,而一组内只需要判断第一条记录即可,具体如下:

82f5a34c3b9bb9eeeb27252206a5cb6d.png

4. Materialize:

假如不是相关子查询,可以先物化内表为一个临时表,由于该临时表在条件字段上采用了索引,保证了唯一性(即消除了重复字段),SemiJoin的结果就和innerJoin相同了。该临时表与外表执行JOIN。

0d388902cabbdbaa978daac8ba84cf99.png

根据临时表与外表Join过程,采用类似NestedLoop形式,根据先遍历临时表还是外表,进一步包括Lookup和Scan两种。Lookup为物化inner表(该表包含一个索引,该索引既可以删除重复元素,也可用于后面执行Join运算),先循环遍历outer表,并在物化表查找。

Scan为物化inner表(该表包含一个索引,该索引既可以删除重复元素,也可用于后面执行Join运算),先循环遍历物化表,并在outer表查找。

4. 参考资料

https://mariadb.com/kb/en/library/optimization-strategies/https://dev.mysql.com/doc/refman/8.0/en/semi-joins.html

29109ece119d8793f59ad5b7e0f1036a.png

腾讯数据库技术团队对内支持微信红包,彩票、数据银行等集团内部业务,对外为腾讯云提供各种数据库产品,如CDB、CTSDB、CKV、CMongo, 腾讯数据库技术团队专注于增强数据库内核功能,提升数据库性能,保证系统稳定性并解决用户在生产过程中遇到的问题,并对生产环境中遇到的问题及知识进行分享。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值