MySQL-5.7-8.2.2.3 Optimizing Subqueries with the EXISTS Strategy

最新推荐文章于 2024-08-09 17:00:03 发布

KING BOMB

最新推荐文章于 2024-08-09 17:00:03 发布

阅读量111

点赞数

分类专栏： Mysql 文章标签： mysql html

原文链接：https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-exists.html

版权

Mysql 专栏收录该内容

142 篇文章 2 订阅

订阅专栏

本文探讨了MySQL如何优化使用IN或=ANY操作符的子查询，特别是涉及NULL值的情况。优化器通过下推等式限制子查询的行数，但在遇到NULL值时会变得更复杂。当外部表达式可能为NULL时，查询执行会变得缓慢，优化器通过触发条件函数处理这种情况。文章提供了帮助优化查询执行的建议，如声明列非NULL，避免在条件中使用NULL，以及使用EXISTS替代IN等。

摘要由CSDN通过智能技术生成

Certain optimizations are applicable to comparisons that use the IN (or =ANY) operator to test subquery results.

某些优化适用于使用IN(或=ANY)操作符测试子查询结果的比较。

This section discusses these optimizations, particularly with regard to the challenges that NULL values present.

本节讨论这些优化，特别是关于NULL值带来的挑战。

The last part of the discussion suggests how you can help the optimizer.

讨论的最后一部分建议如何帮助优化器。

Consider the following subquery comparison:

考虑下面的子查询比较:

MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.

MySQL从外部到内部评估查询。也就是说，它首先获取外部表达式outer_expr的值，然后运行子查询并捕获它生成的行。

A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr.

一个非常有用的优化是“通知”子查询，只对内部表达式inner_expr等于outer_expr的行感兴趣。

This is done by pushing down an appropriate equality into the subquery's WHERE clause to make it more restrictive. The converted comparison looks like this:

这是通过在子查询的WHERE子句中下推一个适当的等号来实现的，以使其更具限制性。转换后的比较如下所示:

After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.

转换之后，MySQL可以使用下推等式来限制计算子查询时必须检查的行数。

More generally, a comparison of N values to a subquery that returns N-value rows is subject to the same conversion.

更一般地说，将N个值与返回N个值行的子查询进行比较，将接受相同的转换。

If oe_i and ie_i represent corresponding outer and inner expression values, this subquery comparison:

如果oe_i和ie_i表示对应的外部和内部表达式值，则子查询比较:

For simplicity, the following discussion assumes a single pair of outer and inner expression values.

为了简单起见，下面的讨论假设有一对外部和内部表达式值。

The conversion just described has its limitations. It is valid only if we ignore possible NULL values. That is, the “pushdown” strategy works as long as both of these conditions are true:

刚才描述的转换有其局限性。只有当我们忽略可能的NULL值时，它才有效。也就是说，只要这两个条件都成立，“下推”策略就有效:

outer_expr and inner_expr cannot be NULL.
outer_expr和inner_expr不能为NULL。
You need not distinguish NULL from FALSE subquery results. If the subquery is a part of an OR or AND expression in the WHERE clause, MySQL assumes that you do not care. Another instance where the optimizer notices that NULL and FALSE subquery results need not be distinguished is this construct:
您不需要区分NULL和FALSE子查询结果。如果子查询是WHERE子句中的OR或AND表达式的一部分，MySQL会假设您不关心。另一个优化器注意到不需要区分NULL和FALSE子查询结果的实例是这样的构造:

In this case, the WHERE clause rejects the row whether IN (subquery) returns NULL or FALSE.

在这种情况下，WHERE子句无论In(子查询)返回NULL还是FALSE都将拒绝行。

When either or both of those conditions do not hold, optimization is more complex.

当其中一个或两个条件都不成立时，优化就更加复杂了。

Suppose that outer_expr is known to be a non-NULL value but the subquery does not produce a row such that outer_expr = inner_expr. Then outer_expr IN (SELECT ...) evaluates as follows:

假设已知outer_expr是一个非null值，但是子查询没有生成这样的行，即outer_expr = inner_expr。然后outer_expr IN (SELECT…)计算如下:

NULL, if the SELECT produces any row where inner_expr is NULL
NULL，如果SELECT生成任何inner_expr为NULL的行
FALSE, if the SELECT produces only non-NULL values or produces nothing
FALSE，如果SELECT只产生非null值或不产生任何值

In this situation, the approach of looking for rows with outer_expr = inner_expr is no longer valid.

在这种情况下，查找带有outer_expr = inner_expr的行的方法不再有效。

It is necessary to look for such rows, but if none are found, also look for rows where inner_expr is NULL. Roughly speaking, the subquery can be converted to something like this:

有必要查找这样的行，但是如果没有找到，也要查找inner_expr为NULL的行。粗略地说，子查询可以被转换成这样:

The need to evaluate the extra IS NULL condition is why MySQL has the ref_or_null access method:

需要评估额外的IS NULL条件是为什么MySQL有ref_or_null访问方法:

The unique_subquery and index_subquery subquery-specific access methods also have “or NULL” variants.

与unique_subquery和index_subquery子查询相关的访问方法也有“或NULL”变量。

The additional OR ... IS NULL condition makes query execution slightly more complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.

额外的OR…IS NULL条件使查询执行稍微复杂一些(并且子查询中的一些优化变得不适用)，但通常这是可以容忍的。

The situation is much worse when outer_expr can be NULL. According to the SQL interpretation of NULL as “unknown value,” NULL IN (SELECT inner_expr ...) should evaluate to:

当outer_expr可以为NULL时，情况会更糟。根据SQL对NULL作为“未知值”的解释，NULL IN (SELECT inner_expr…)应该计算为:

NULL, if the SELECT produces any rows
如果SELECT产生任何行，则为NULL
FALSE, if the SELECT produces no rows
如果SELECT不产生任何行，则为FALSE

For proper evaluation, it is necessary to be able to check whether the SELECT has produced any rows at all, so outer_expr = inner_expr cannot be pushed down into the subquery.

为了进行正确的计算，必须能够检查SELECT是否产生了任何行，因此不能将outer_expr = inner_expr下推到子查询中。

This is a problem because many real world subqueries become very slow unless the equality can be pushed down.

这是一个问题，因为许多真实世界的子查询变得非常慢，除非相等可以下推。

Essentially, there must be different ways to execute the subquery depending on the value of outer_expr.

本质上，根据outer_expr的值，必须有不同的方式来执行子查询。

The optimizer chooses SQL compliance over speed, so it accounts for the possibility that outer_expr might be NULL:

优化器选择SQL遵从性而不是速度，所以它考虑了outer_expr可能是NULL的可能性:

If outer_expr is NULL, to evaluate the following expression, it is necessary to execute the SELECT to determine whether it produces any rows:

如果outer_expr为NULL，为了计算下面的表达式，需要执行SELECT来确定它是否产生任何行:

It is necessary to execute the original SELECT here, without any pushed-down equalities of the kind mentioned previously.

这里需要执行原始的SELECT，而不需要前面提到的那种下推等式。

On the other hand, when outer_expr is not NULL, it is absolutely essential that this comparison:

另一方面，当outer_expr不为NULL时，这种比较是绝对必要的:

Be converted to this expression that uses a pushed-down condition:

转换为使用下推条件的表达式:

Without this conversion, subqueries are slow.

如果没有这种转换，子查询就会很慢。

To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within “trigger” functions. Thus, an expression of the following form:

为了解决是否将条件下推到子查询中的难题，条件被包装在“触发器”函数中。因此，有如下形式的表达式

More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:

更一般地，如果子查询比较是基于几对外部和内部表达式，则转换采用以下比较:

Each trigcond(X) is a special function that evaluates to the following values:

每个trigcond(X)都是一个特殊的函数，其计算结果如下:

X when the “linked” outer expression oe_i is not NULL
当“链接”外部表达式oe_i不为NULL时
TRUE when the “linked” outer expression oe_i is NULL
当“链接”外部表达式oe_i为NULL时为TRUE

Note

Trigger functions are not triggers of the kind that you create with CREATE TRIGGER.

触发器函数不是用create Trigger创建的那种触发器。

Equalities that are wrapped within trigcond() functions are not first class predicates for the query optimizer.

包装在trigcond()函数中的相等项不是查询优化器的第一类谓词。

Most optimizations cannot deal with predicates that may be turned on and off at query execution time, so they assume any trigcond(X) to be an unknown function and ignore it.

大多数优化不能处理可能在查询执行时打开或关闭的谓词，因此它们将任何trigcond(X)假设为未知函数并忽略它。

Triggered equalities can be used by those optimizations:

触发等式可以被这些优化使用:

Reference optimizations: trigcond(X=Y [OR Y IS NULL]) can be used to construct ref, eq_ref, or ref_or_null table accesses.
引用优化:trigcond(X=Y [OR Y IS NULL])可以用来构造ref, eq_ref或ref_or_null表访问。
Index lookup-based subquery execution engines: trigcond(X=Y) can be used to construct unique_subquery or index_subquery accesses.
基于索引查找的子查询执行引擎:可以使用trigcond(X=Y)构造unique_subquery或index_subquery访问。
Table-condition generator: If the subquery is a join of several tables, the triggered condition is checked as soon as possible.
表条件生成器:如果子查询是几个表的连接，则会尽快检查触发的条件。

When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first two items of the preceding list), it must have a fallback strategy for the case when the condition is turned off.

当优化器使用一个触发条件来创建某种基于索引查找的访问时(对于前面列表的前两个项)，它必须有一个备用策略来应对条件关闭时的情况。

This fallback strategy is always the same: Do a full table scan. In EXPLAIN output, the fallback shows up as Full scan on NULL key in the Extra column:

这个回退策略总是相同的:执行全表扫描。在EXPLAIN输出中，回退显示为在Extra列的NULL键上的Full scan:

If you run EXPLAIN followed by SHOW WARNINGS, you can see the triggered condition:

如果你运行EXPLAIN，然后是SHOW WARNINGS，你会看到触发的条件:

The use of triggered conditions has some performance implications. A NULL IN (SELECT ...) expression now may cause a full table scan (which is slow) when it previously did not.

触发条件的使用对性能有一定的影响。NULL IN (SELECT…)表达式现在可能导致全表扫描(这是缓慢的)，而以前没有。

This is the price paid for correct results (the goal of the trigger-condition strategy is to improve compliance, not speed).

这是为正确的结果所付出的代价(触发条件策略的目标是提高遵从性，而不是速度)。

For multiple-table subqueries, execution of NULL IN (SELECT ...) is particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL.

对于多表子查询，NULL IN (SELECT…)的执行特别慢，因为连接优化器不会针对外部表达式为NULL的情况进行优化。

It assumes that subquery evaluations with NULL on the left side are very rare, even if there are statistics that indicate otherwise.

它假设左边带有NULL的子查询计算非常罕见，即使有统计数据表明不是这样。

On the other hand, if the outer expression might be NULL but never actually is, there is no performance penalty.

另一方面，如果外层表达式可能是NULL，但实际上从未是NULL，则不会有性能损失。

To help the query optimizer better execute your queries, use these suggestions:

为了帮助查询优化器更好地执行查询，请使用以下建议:

Declare a column as NOT NULL if it really is. This also helps other aspects of the optimizer by simplifying condition testing for the column.
声明一个列为NOT NULL(如果它真的是)。通过简化列的条件测试，这也有助于优化器的其他方面。

If you need not distinguish a NULL from FALSE subquery result, you can easily avoid the slow execution path. Replace a comparison that looks like this:
如果不需要区分NULL和FALSE子查询结果，可以轻松避免缓慢的执行路径。替换像这样的比较:

Then NULL IN (SELECT ...) is never evaluated because MySQL stops evaluating AND parts as soon as the expression result is clear.

那么NULL IN (SELECT…)永远不会被计算，因为MySQL一旦表达式结果清晰，就会停止计算AND部分。

Another possible rewrite:

另一个可能的修改:

This would apply when you need not distinguish NULL from FALSE subquery results, in which case you may actually want EXISTS.

这适用于不需要区分NULL和FALSE子查询结果的情况，在这种情况下，您可能需要EXISTS。

The subquery_materialization_cost_based flag of the optimizer_switch system variable enables control over the choice between subquery materialization and IN-to-EXISTS subquery transformation.

optimizer_switch系统变量的subquery_materialization_cost_based标志允许控制子查询物化和IN-to-EXISTS子查询转换之间的选择。

See Section 8.9.2, “Switchable Optimizations”.

KING BOMB

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
MySQL-5.7-8.2.2.3 Optimizing Subqueries with the EXISTS Strategy

Certain optimizations are applicable to comparisons that use theIN(or=ANY) operator to test subquery results.某些优化适用于使用IN(或=ANY)操作符测试子查询结果的比较。This section discusses these optimizations, particularly with regard to the challenges thatNULLvalues pres...
复制链接

扫一扫