MySQL-5.7-8.2.1.15 GROUP BY Optimization

最新推荐文章于 2024-10-02 06:14:43 发布

KING BOMB

最新推荐文章于 2024-10-02 06:14:43 发布

阅读量127

点赞数

分类专栏： Mysql 文章标签： mysql 数据库 sql

原文链接：https://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html

版权

Mysql 专栏收录该内容

142 篇文章 2 订阅

订阅专栏

The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any).

满足GROUP BY子句的最一般方法是扫描整个表，并创建一个新的临时表，其中每个组的所有行都是连续的，然后使用这个临时表来发现组并应用聚合函数(如果有的话)。

In some cases, MySQL is able to do much better than that and avoid creation of temporary tables by using index access.

在某些情况下，MySQL可以做得更好，并通过使用索引访问避免创建临时表。

The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (as is true, for example, for a BTREE index, but not for a HASH index).

使用GROUP BY索引的最重要的先决条件是，所有GROUP BY列引用同一索引的属性，并且索引按顺序存储键(例如，对于BTREE索引是正确的，但对于HASH索引则不是)。

Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.

是否可以用索引访问替换临时表还取决于查询中使用了索引的哪些部分、为这些部分指定的条件以及所选的聚合函数。

There are two ways to execute a GROUP BY query through index access, as detailed in the following sections. The first method applies the grouping operation together with all range predicates (if any). The second method first performs a range scan, and then groups the resulting tuples.

有两种方法可以通过索引访问来执行GROUP BY查询，具体内容将在下面的小节中介绍。第一种方法与所有范围谓词(如果有的话)一起应用分组操作。第二种方法首先执行范围扫描，然后对生成的元组进行分组

In MySQL, GROUP BY is used for sorting, so the server may also apply ORDER BY optimizations to grouping. However, relying on implicit or explicit GROUP BY sorting is deprecated.

在MySQL中，GROUP BY用于排序，因此服务器也可能对分组应用ORDER BY优化。但是，不建议使用隐式或显式GROUP BY排序。

See Section 8.2.1.14, “ORDER BY Optimization”.

Loose Index Scan

松散的索引扫描

The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns.

处理GROUP BY的最有效方法是使用索引直接检索分组列。

With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE).

使用这种访问方法，MySQL使用键排序的某些索引类型的属性(例如，BTREE)。

This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions.

此属性允许在索引中使用查找组，而不必考虑索引中满足所有WHERE条件的所有键。

This access method considers only a fraction of the keys in an index, so it is called a Loose Index Scan.

这种访问方法只考虑索引中的一小部分键，因此称为松散索引扫描。

When there is no WHERE clause, a Loose Index Scan reads as many keys as the number of groups, which may be a much smaller number than that of all keys.

没有WHERE子句时，Loose Index Scan读取的键数与组的数量一样多，组的数量可能比所有键的数量小得多。

If the WHERE clause contains range predicates (see the discussion of the range join type in Section 8.8.1, “Optimizing Queries with EXPLAIN”), a Loose Index Scan looks up the first key of each group that satisfies the range conditions, and again reads the smallest possible number of keys. This is possible under the following conditions:

如果WHERE子句包含范围谓词(请参阅第8.8.1节“用EXPLAIN优化查询”中对范围连接类型的讨论)，则松散索引扫描将查找满足范围条件的每个组的第一个键，并再次读取可能的最小键数。这在以下情况下是可能的:

The query is over a single table.
查询是在单个表上进行的。
The GROUP BY names only columns that form a leftmost prefix of the index and no other columns. (If, instead of GROUP BY, the query has a DISTINCT clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) For example, if a table t1 has an index on (c1,c2,c3), Loose Index Scan is applicable if the query has GROUP BY c1, c2. It is not applicable if the query has GROUP BY c2, c3 (the columns are not a leftmost prefix) or GROUP BY c1, c2, c4 (c4 is not in the index).
GROUP BY只命名构成索引最左边前缀的列，不命名其他列。(如果查询没有使用GROUP BY，而是使用DISTINCT子句，则所有不同的属性都指向构成索引最左边前缀的列。)例如，如果表t1有(c1,c2,c3)上的索引，则如果查询有GROUP BY c1,c2，则可以使用Loose index Scan。如果查询具有GROUP BY c2, c3(列不是最左边的前缀)或GROUP BY c1, c2, c4 (c4不在索引中)，则不适用。
The only aggregate functions used in the select list (if any) are MIN() and MAX(), and all of them refer to the same column. The column must be in the index and must immediately follow the columns in the GROUP BY.
选择列表中使用的惟一聚合函数(如果有的话)是MIN()和MAX()，它们都引用同一列。该列必须在索引中，并且必须紧跟着GROUP BY中的列。
Any other parts of the index than those from the GROUP BY referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of MIN() or MAX() functions.
除了查询中引用的GROUP BY之外，索引的任何其他部分都必须是常量(即，它们必须与常量相等地引用)，MIN()或MAX()函数的参数除外。
For columns in the index, full column values must be indexed, not just a prefix. For example, with c1 VARCHAR(20), INDEX (c1(10)), the index uses only a prefix of c1 values and cannot be used for Loose Index Scan.
对于索引中的列，必须对整个列值建立索引，而不仅仅是一个前缀。例如，对于c1 VARCHAR(20)， INDEX (c1(10))，索引仅使用c1值的前缀，不能用于松散索引扫描。

If Loose Index Scan is applicable to a query, the EXPLAIN output shows Using index for group-by in the Extra column.

如果松散索引扫描适用于查询，那么EXPLAIN输出将在Extra列中显示Using Index for group-by。

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The Loose Index Scan access method can be used for the following queries:

假设表t1(c1,c2,c3,c4)上有一个索引idx(c1,c2,c3)。松散索引扫描访问方法可以用于以下查询:

The following queries cannot be executed with this quick select method, for the reasons given:

下查询不能使用这个快速选择方法执行，原因如下:

There are aggregate functions other than MIN() or MAX():

除了MIN()和MAX()，还有其他聚合函数:

The columns in the GROUP BY clause do not form a leftmost prefix of the index:

GROUP BY子句中的列不构成索引最左边的前缀:

The query refers to a part of a key that comes after the GROUP BY part, and for which there is no equality with a constant:

该查询指向GROUP BY部分后面的键的一部分，并且与常量不相等:

Were the query to include WHERE c3 = const, Loose Index Scan could be used.

如果查询包含WHERE c3 = const，则可以使用松散索引扫描。

The Loose Index Scan access method can be applied to other forms of aggregate function references in the select list, in addition to the MIN() and MAX() references already supported:

除了已经支持的MIN()和MAX()引用之外，Loose Index Scan访问方法还可以应用于选择列表中其他形式的聚合函数引用:

AGV(DISTINCT), SUM(DISTINCT), and COUNT(DISTINCT) are supported. AVG(DISTINCT) and SUM(DISTINCT) take a single argument. COUNT(DISTINCT) can have more than one column argument.
支持AVG(DISTINCT)、SUM(DISTINCT)和COUNT(DISTINCT)。AVG(DISTINCT)和SUM(DISTINCT)使用一个参数。COUNT(DISTINCT)可以有多个列参数。
There must be no GROUP BY or DISTINCT clause in the query.
查询中必须没有GROUP BY或DISTINCT子句。
The Loose Index Scan limitations described previously still apply.
前面描述的松散索引扫描限制仍然适用。

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The Loose Index Scan access method can be used for the following queries:

假设表t1(c1,c2,c3,c4)上有一个索引idx(c1,c2,c3)。松散索引扫描访问方法可以用于以下查询:

Tight Index Scan

紧凑索引扫描

A Tight Index Scan may be either a full index scan or a range index scan, depending on the query conditions.

紧密索引扫描可以是全索引扫描，也可以是范围索引扫描，这取决于查询条件

When the conditions for a Loose Index Scan are not met, it still may be possible to avoid creation of temporary tables for GROUP BY queries.

当不满足松散索引扫描的条件时，仍然可以避免为GROUP BY查询创建临时表。

If there are range conditions in the WHERE clause, this method reads only the keys that satisfy these conditions. Otherwise, it performs an index scan.

如果WHERE子句中有范围条件，则此方法只读取满足这些条件的键。否则，执行索引扫描。

Because this method reads all keys in each range defined by the WHERE clause, or scans the whole index if there are no range conditions, it is called a Tight Index Scan.

因为这个方法读取WHERE子句定义的每个范围中的所有键，或者如果没有范围条件，则扫描整个索引，所以它被称为紧凑索引扫描。

With a Tight Index Scan, the grouping operation is performed only after all keys that satisfy the range conditions have been found.

使用紧密索引扫描，只有在找到满足范围条件的所有键后才执行分组操作。

For this method to work, it is sufficient that there be a constant equality condition for all columns in a query referring to parts of the key coming before or in between parts of the GROUP BY key.

要使此方法起作用，对于引用GROUP BY键之前或中间的键部分的查询中的所有列，只要有一个常数相等条件就足够了。

The constants from the equality conditions fill in any “gaps” in the search keys so that it is possible to form complete prefixes of the index.

相等条件中的常数填充搜索键中的任何“空白”，因此可以形成索引的完整前缀。

These index prefixes then can be used for index lookups. If the GROUP BY result requires sorting, and it is possible to form search keys that are prefixes of the index, MySQL also avoids extra sorting operations because searching with prefixes in an ordered index already retrieves all the keys in order.

然后可以使用这些索引前缀进行索引查找。如果GROUP BY结果需要排序，并且可以形成索引前缀的搜索键，MySQL也避免了额外的排序操作，因为在有序索引中用前缀搜索已经按顺序检索了所有键。

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The following queries do not work with the Loose Index Scan access method described previously, but still work with the Tight Index Scan access method.

假设表t1(c1,c2,c3,c4)上有一个索引idx(c1,c2,c3)。下面的查询不能使用前面描述的松散索引扫描访问方法，但仍然可以使用紧密索引扫描访问方法。

There is a gap in the GROUP BY, but it is covered by the condition c2 = 'a':

在GROUP BY中有一个间隙，但是它被条件c2 = 'a'覆盖了