MySQL-5.7-8.2.2.1 Optimizing Subqueries, Derived Tables, and View References

8.2.2.1 Optimizing Subqueries, Derived Tables, and View References with Semijoin Transformations

使用半连接转换优化子查询、派生表和视图引用

A semijoin is a preparation-time transformation that enables multiple execution strategies such as table pullout, duplicate weedout, first match, loose scan, and materialization.

半连接是一种准备时间转换,它支持多种执行策略,如表提取、重复删除、首次匹配、松散扫描和物化。

The optimizer uses semijoin strategies to improve subquery execution, as described in this section.

优化器使用半连接策略来改进子查询的执行,如本节所述。

For an inner join between two tables, the join returns a row from one table as many times as there are matches in the other table.

对于两个表之间的内部连接,连接从一个表返回一行的次数与另一个表中匹配的次数相同。

But for some questions, the only information that matters is whether there is a match, not the number of matches.

但对于某些问题,唯一重要的信息是是否有匹配,而不是匹配的数量。

Suppose that there are tables named class and roster that list classes in a course curriculum and class rosters (students enrolled in each class), respectively.

假设有名为class和roster 的表,分别列出课程课程和班级名册(每个班级注册的学生)中的班级。

To list the classes that actually have students enrolled, you could use this join:

要列出有学生注册的课程,你可以使用这个join:

However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.

然而,该结果为每个注册学生列出了每个班级一次。对于所问的问题,这是不必要的重复信息。

Assuming that class_num is a primary key in the class table, duplicate suppression is possible by using SELECT DISTINCT, but it is inefficient to generate all matching rows first only to eliminate duplicates later.

假设class_num是类表中的主键,可以使用SELECT DISTINCT来抑制重复,但是先生成所有匹配的行然后再消除重复是低效的。

The same duplicate-free result can be obtained by using a subquery:

使用子查询可以获得相同的无重复结果:

Here, the optimizer can recognize that the IN clause requires the subquery to return only one instance of each class number from the roster table.

在这里,优化器可以识别出IN子句要求子查询只返回roster 表中每个类号的一个实例。

In this case, the query can use a semijoin; that is, an operation that returns only one instance of each row in class that is matched by rows in roster.

在本例中,查询可以使用半连接;也就是说,只返回与roster表中的行相匹配的class 表中每一行的一个实例的操作。

Outer join and inner join syntax is permitted in the outer query specification, and table references may be base tables, derived tables, or view references.

外部查询规范中允许外部连接和内部连接语法,表引用可以是基表、派生表或视图引用。

In MySQL, a subquery must satisfy these criteria to be handled as a semijoin:

在MySQL中,子查询必须满足以下条件才能作为半连接处理:

It must be an IN (or =ANY) subquery that appears at the top level of the WHERE or ON clause, possibly as a term in an AND expression. For example:

它必须是一个IN(或=ANY)子查询,出现在WHERE或ON子句的顶层,可能是AND表达式中的一个术语。例如

 Here, ot_i and it_i represent tables in the outer and inner parts of the query, and oe_i and ie_i represent expressions that refer to columns in the outer and inner tables.

这里,ot_i和it_i表示查询的外部和内部部分中的表,oe_i和ie_i表示引用外部和内部表中的列的表达式。

It must be a single SELECT without UNION constructs

它必须是一个没有UNION结构的SELECT

It must not contain a GROUP BY or HAVING clause.

它不能包含GROUP BY或HAVING子句。

It must not be implicitly grouped (it must contain no aggregate functions).

它不能被隐式分组(它不能包含聚合函数)。

It must not have ORDER BY with LIMIT.

它不能有ORDER BY with LIMIT。

The statement must not use the STRAIGHT_JOIN join type in the outer query.

语句不能在外部查询中使用STRAIGHT_JOIN连接类型。

The STRAIGHT_JOIN modifier must not be present.

必须不提供STRAIGHT_JOIN修饰符。

The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.

外部表和内部表的总数必须小于连接中允许的最大表数。

The subquery may be correlated or uncorrelated. DISTINCT is permitted, as is LIMIT unless ORDER BY is also used.

子查询可以是相关的,也可以是不相关的。允许DISTINCT和LIMIT,除非ORDER BY也被使用。

If a subquery meets the preceding criteria, MySQL converts it to a semijoin and makes a cost-based choice from these strategies:

如果一个子查询符合上述条件,MySQL就会将其转换为半连接,并根据以下策略做出基于成本的选择:

Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.

将子查询转换为连接,或者使用表拉出并将查询作为子查询表和外部表之间的内部连接运行。表拉出将表从子查询拉出到外部查询。

Duplicate Weedout: Run the semijoin as if it was a join and remove duplicate records using a temporary table.

Duplicate Weedout:像连接一样运行半连接,并使用临时表删除重复记录。

FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.

FirstMatch:当扫描内部表的行组合,并且给定值组有多个实例时,选择一个而不是全部返回。这种“快捷方式”扫描并消除了不必要的行。

LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.

LooseScan:使用允许从每个子查询的值组中选择单个值的索引扫描子查询表。

Materialize the subquery into an indexed temporary table that is used to perform a join, where the index is used to remove duplicates.

将子查询物化到索引临时表中,该临时表用于执行连接,其中索引用于删除重复项。

The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.

以后在连接临时表和外部表时,也可以使用索引进行查找;如果不是,则扫描表。

For more information about materialization, see Section 8.2.2.2, “Optimizing Subqueries with Materialization”.

Each of these strategies can be enabled or disabled using the following optimizer_switch system variable flags:

这些策略中的每一个都可以使用以下optimizer_switch系统变量标志启用或禁用:

  • The semijoin flag controls whether semijoins are used.

  • 半连接标志控制是否使用半连接。
  • If semijoin is enabled, the firstmatchloosescanduplicateweedout, and materialization flags enable finer control over the permitted semijoin strategies.

  • 如果启用了半连接,则firstmatch、loosescan、duplicateweed和materialization标志可以对允许的半连接策略进行更精细的控制。

  • If the duplicateweedout semijoin strategy is disabled, it is not used unless all other applicable strategies are also disabled.

  • 如果禁用了duplicateweed半连接策略,则不会使用它,除非也禁用了所有其他适用的策略。

  • If duplicateweedout is disabled, on occasion the optimizer may generate a query plan that is far from optimal. This occurs due to heuristic pruning during greedy search, which can be avoided by setting optimizer_prune_level=0.

  • 如果禁用duplicate - weedout,有时优化器可能生成一个远不是最优的查询计划。这是由于在贪婪搜索期间的启发式修剪造成的,可以通过设置optimizer_prune_level=0来避免。

These flags are enabled by default. See Section 8.9.2, “Switchable Optimizations”.

这些标志在默认情况下是启用的。

The optimizer minimizes differences in handling of views and derived tables.

优化器在处理视图和派生表时尽量减少差异。

This affects queries that use the STRAIGHT_JOIN modifier and a view with an IN subquery that can be converted to a semijoin.

这将影响使用了STRAIGHT_JOIN修饰符的查询和带有可以转换为半连接的IN子查询的视图。

The following query illustrates this because the change in processing causes a change in transformation, and thus a different execution strategy:

下面的查询说明了这一点,因为处理中的更改会导致转换中的更改,从而导致不同的执行策略:

The optimizer first looks at the view and converts the IN subquery to a semijoin, then checks whether it is possible to merge the view into the outer query.

优化器首先查看视图并将IN子查询转换为半连接,然后检查是否有可能将视图合并到外部查询中。

Because the STRAIGHT_JOIN modifier in the outer query prevents semijoin, the optimizer refuses the merge, causing derived table evaluation using a materialized table.

因为外部查询中的stradi_join修饰符禁止半连接,所以优化器拒绝合并,从而导致使用物化表求值派生表。

EXPLAIN output indicates the use of semijoin strategies as follows:

EXPLAIN输出表明使用半连接策略如下:

Semijoined tables show up in the outer select. For extended EXPLAIN output, the text displayed by a following SHOW WARNINGS shows the rewritten query, which displays the semijoin structure.

半连接表显示在外部选择中。对于扩展的EXPLAIN输出,下面的SHOW WARNINGS显示的文本显示了重写后的查询,该查询显示了半连接结构。

(See Section 8.8.3, “Extended EXPLAIN Output Format”.)

From this you can get an idea about which tables were pulled out of the semijoin. If a subquery was converted to a semijoin, you can see that the subquery predicate is gone and its tables and WHERE clause were merged into the outer query join list and WHERE clause.

由此,您可以了解哪些表被从半连接中拉出。如果子查询被转换为半连接,您可以看到子查询谓词消失了,它的表和WHERE子句被合并到外部查询连接列表和WHERE子句中。

Temporary table use for Duplicate Weedout is indicated by Start temporary and End temporary in the Extra column. Tables that were not pulled out and are in the range of EXPLAIN output rows covered by Start temporary and End temporary have their rowid in the temporary table.

“重复剔除”的临时表的使用由“额外”列中的“开始临时”和“结束临时”表示。未被拉出且位于Start临时表和End临时表所覆盖的EXPLAIN输出行范围内的表的行宽在临时表中。

FirstMatch(tbl_name) in the Extra column indicates join shortcutting.

Extra列中的FirstMatch(tbl_name)表示连接快捷方式。

LooseScan(m..n) in the Extra column indicates use of the LooseScan strategy. m and n are key part numbers.

Extra列中的LooseScan(m..n)表示使用了LooseScan策略。M和n是关键的零件号。

Temporary table use for materialization is indicated by rows with a select_type value of MATERIALIZED and rows with a table value of <subqueryN>.

临时表的物化使用由select_type值MATERIALIZED的行和表值<subqueryN>的行表示。


                
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值