一 背景和架构
在《庖丁解牛-图解MySQL 8.0优化器查询解析篇》一文中我们重点介绍了MySQL最新版本8.0.25关于SQL基本元素表、列、函数、聚合、分组、排序等元素的解析、设置和转换过程,本篇我们继续来介绍更为复杂的子查询、分区表和JOIN的复杂转换过程,大纲如下:
Transformation
- remove_redundant_subquery_clause :Permanently remove redundant parts from the query if 1) This is a subquery 2) Not normalizing a view. Removal should take place when a query involving a view is optimized, not when the view is created.
- remove_base_options:Remove SELECT_DISTINCT options from a query block if can skip distinct
- resolve_subquery :Resolve predicate involving subquery, perform early unconditional subquery transformations
- Convert subquery predicate into semi-join, or
- Mark the subquery for execution using materialization, or
- Perform IN->EXISTS transformation, or
- Perform more/less ALL/ANY -> MIN/MAX rewrite
- Substitute trivial scalar-context subquery with its value
- transform_scalar_subqueries_to_join_with_derived:Transform eligible scalar subqueries to derived tables.
- flatten_subqueries :Convert semi-join subquery predicates into semi-join join nests. Convert candidate subquery predicates into semi-join join nests. This transformation is performed once in query lifetime and is irreversible.
- apply_local_transforms :
- delete_unused_merged_columns : If query block contains one or more merged derived tables/views, walk through lists of columns in select lists and remove unused columns.
- simplify_joins : Convert all outer joins to inner joins if possible.
- prune_partitions :Perform partition pruning for a given table and condition.
- push_conditions_to_derived_tables :Pushing conditions down to derived tables must be done after validity checks of grouped queries done by apply_local_transforms();
- Window::eliminate_unused_objects:Eliminate unused window definitions, redundant sorts etc.
二 详细转换过程
1 解析子查询(resolve_subquery)
解析条件中带有子查询的语句,做一些早期的无限制的子查询转换,包括:
- 标记subquery是否变成semi-join
转换判断条件
-
- 检查OPTIMIZER_SWITCH_SEMIJOIN和HINT没有限制
- 子查询是IN/=ANY和EXIST subquery的谓词
- 子查询是简单查询块而不是UNION
- 子查询无隐形和显性的GROUP BY
- 子查询没有HAVING、WINDOW函数
- Resolve的阶段是Query_block::RESOLVE_CONDITION和Query_block::RESOLVE_JOIN_NEST并且没有用到最新的Hyper optimizer优化器。
- 外查询块可以支持semijoins
- 至少要一个表,而不是类似"SELECT 1"
- 子查询的策略还没有指定Subquery_strategy::UNSPECIFIED
- 父查询也至少有一个表
- 父查询和子查询都不能有straight join
- 父查询块不禁止semijoin
- IN谓词返回值是否是确定的,不是RAND
- 根据子查询判断结果是否需要转成true还是false以及是否为NULL,判断是可以做antijoin还是semijoin
- Antijoin是可以支持的,或者是semijoin
- offset和limit对于semjoin是有效的,offset是从第一行开始,limit也不是0
设置
Subquery_strategy::CANDIDATE_FOR_SEMIJOIN并添加sj_candidates
- 标记subquery是否执行时采用materialization方案
- 如果不符合转换semijoin,尝试使用物化方式,转换判断条件
- Optimzier开关subquery_to_derived=on
- 子查询是IN/=ANY or EXISTS谓词
- 子查询是简单查询块而不是UNION
- 如果是[NOT] EXISTS,必须没有聚合
- Subquery谓词在WHERE子句(目前没有在ON子句实现),而且是ANDs or ORs的表达式tree
- 父查询块支持semijoins
- 子查询的策略还没有指定Subquery_strategy::UNSPECIFIED
- 父查询也至少有一个表,然后可以做LEFT JOIN
- 父查询块不禁止semijoin
- IN谓词返回值是否是确定的,不是RAND
- 根据子查询判断结果是否需要转成true还是false以及是否为NULL,判断是可以做antijoin还是semijoin
- 不支持左边参数不是multi-column子查询(WHERE (outer_subq) = ROW(derived.col1,derived.col2))
- 该子查询不支持转换为Derived table(m_subquery_to_derived_is_impossible)
- 设置Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE并添加sj_candidates
- 如果上面两个策略无法使用,根据类型选择transformer
- Item_singlerow_subselect::select_transformer
- 对于简单的标量子查询,在查询中直接用执行结果代替
select * from t1 where a = (select 1);
=>
select * from t1 where a = 1;
- Item_in_subselect/Item_allany_subselect::select_transformer->select_in_like_transformer
- select_in_like_transformer函数来处理 IN/ALL/ANY/SOME子查询转换transformation
- 处理"SELECT 1"(Item_in_optimizer)
- 如果目前还没有子查询的执行方式,也就是无法使用semijoin/antijoin执行的子查询,会做IN->EXISTS的转换,本质是在物化执行和迭代式循环执行中做选择。IN语法代表非相关子查询仅执行一次,将查询结果物化成临时表,之后需要结果时候就去物化表中查找;EXISTS代表对于外表的每一条记录,子查询都会执行一次,是迭代式循环执行。子查询策略设定为Subquery_strategy::CANDIDATE_FOR_IN2EXISTS_OR_MAT
- 重写single-column的IN/ALL/ANY子查询(single_value_transformer)
oe $cmp$ (SELECT ie FROM ... WHERE subq_where ... HAVING subq_having)
=>
- oe $cmp$ (SELECT MAX(...) ) // handled by Item_singlerow_subselect
- oe $cmp$ \<max\>(SELECT ...) // handled by Item_maxmin_subselect
fails=>Item_in_optimizer
- 对于已经是materialized方案,不转换
- 通过equi-join转换IN到EXISTS
- 如果是ALL/ANY单值subquery谓词,尝试用MIN/MAX子查询转换
SELECT * FROM t1 WHERE a < ANY (SELECT a FROM t1);
=>
SELECT * FROM t1 WHERE a < (SELECT MAX(a) FROM t1)
- 不满足上面,调用single_value_in_to_exists_transformer转换IN到EXISTS
- 转换将要将子查询设置为相关子查询,设置UNCACHEABLE_DEPENDENT标识
- 如果子查询包含聚合函数、窗口函数、GROUP语法、HAVING语法,将判断条件加入到HAVING子句中,另外通过ref_or_null_helper来区分NULL和False的结果,如需要处理NULL IN (SELECT ...)还需要封装到Item_func_trig_cond触发器中。
SELECT ... FROM t1 WHERE t1.b IN (SELECT <expr of SUM(t1.a)> FROM t2)
=>
SELECT ... FROM t1 WHERE t1.b IN (SELECT <expr of SUM(t1.a)> FROM t2
[trigcond] HAVING t1.b=ref-to-<expr of SUM(t1.a)>)
- 如果子查询不包含聚合函数、窗口函数、GROUP语法,会放在WHERE查询条件中,当然如果需要处理NULL情况还是要放入HAVING子句(Item_func_trig_cond+Item_is_not_null_test)。
不需要区分NULL和FALSE的子查询:
SELECT 1 FROM ... WHERE (oe $cmp$ ie) AND subq_where
需要区分的子查询:
SELECT 1 FROM ...
WHERE subq_where AND trigcond((oe $cmp$ ie) OR (ie IS NULL))
HAVING trigcond(@<is_not_null_test@>(ie))
- JOIN::optimize()会计算materialization和EXISTS转换的代价进行选择,设置m_subquery_to_derived_is_impossible = true
- ROW值转换,通过Item_in_optimizer,不支持ALL/ANY/SOME(