MySQL Select and Sort Status Variables

最新推荐文章于 2024-03-14 12:58:16 发布

转载最新推荐文章于 2024-03-14 12:58:16 发布 · 1.6k 阅读

文章标签：

#variables #mysql #table #join #optimization #query

数据库（包含了应用程序与数据库之间的操作）专栏收录该内容

71 篇文章

订阅专栏

本文详细解析了MySQL查询过程中涉及的各种状态变量，包括不同类型的SELECT语句和排序操作。通过理解Select_scan、Select_range等变量的应用场景，可以有效评估并优化数据库查询效率。

MySQL Select and Sort Status Variables

MySQL SHOW STATUS; and mysqlreport using the -sas command line option show nine status variables which are counters for various types of SELECT statements MySQL has executed and rows MySQL has sorted:

Select_scan
Select_range
Select_full_join
Select_range_check
Select_full_range_join
Sort_scan
Sort_range
Sort_merge_passes
Sort_rows

In particular, Select_scan and Select_range apply to either single table queries or the first table in the join plan for a multiple table query. Select_full_join, Select_range_check, and Select_full_range_join apply to the second and subsequent tables in the join plan for a multiple table query. The Sort variables apply to any queries that sort rows. The distinction between where the variables apply is explained first, which will help clarify the subsequent explanation of each variable.

Beginning with mysqlreport v1.3 the variables are listed in the same order as above. This document will make references to mysqlreport, although the same data can be obtained from the SHOW STATUS; command and some basic math.

The Distinction Between Where the Variables Apply

An over-simplified but sufficient way to classify all queries are those that select only one table and those that select multiple tables. Regarding the nine Select and Sort variables, single and multiple table queries have one thing in common: The first table. In a single table query the first table is the only table. In a multiple table query the first table listed, the top-most row, in the output from EXPLAIN is the first table. Select_scan and Select_range only apply to the first table.

For multiple table queries Select_full_join, Select_range_check, and Select_full_range_join only apply to the second and subsequent tables (second and subsequent rows) listed in the output from EXPLAIN.

Sort_scan, Sort_range, Sort_merge_passes, and Sort_rows apply to any query that uses ORDER BY or GROUP BY, regardless of how many tables.

For example:

   mysql> EXPLAIN SELECT * FROM tbl2, tbl1 WHERE tbl1.col1 = tbl2.col2 ORDER BY tbl1.col2;
   +-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
   +-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
   | SIMPLE      | tbl2  | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary; Using filesort |
   | SIMPLE      | tbl1  | ALL  | NULL          | NULL | NULL    | NULL |   27 | Using where                     |
   +-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

In the above join plan, tbl2 is the first table and tbl1 is the second table. This query will also effect some of the Sort variables because it uses ORDER BY. Although both tables cause a table scan, tbl2 will apply to Select_scan but tbl1 will apply to Select_full_join. In short, the nine Select and Sort status variables apply where:

First table or only table: Select_scan and Select_range
Second and subsequent tables: Select_full_join, Select_range_check, and Select_full_range_join
Any query that uses ORDER BY or GROUP BY: Sort_scan, Sort_range, Sort_merge_passes, and Sort_rows

Select_scan

Select_scan refers to a table that is completely read in sequence from the hard drive. For such tables EXPLAIN lists "ALL" in the "type" column. As we all know table scans are not desirable because they're slow (because hard drives are slow). However, table scans are prevalent. It's not uncommon to see a server where 50% of all SELECT queries are Select_scan. The fundamental reason why a SELECT results in a table scan is because no index on the table can satisfied the conditions of the query (i.e., everything after WHERE), or there are no indexes so all queries will result in a table scan. From a performance perspective it's safe to say you always want to decrease this value. However, in some case it might increase after optimization because the server is then able to do more. Ultimately, it will have to decrease again when qps (queries per second) gets higher.

Select_range

Select_range refers to a table that was read from the hard drive only in the necessary places to satisfy a limited range of conditions. For such tables EXPLAIN lists type: range. An index tells MySQL where the necessary places to read are, which saves time that would otherwise be wasted on disk seeks. Therefore, Select_range is a lot faster than Select_scan. The queries SELECT * FROM tbl1 WHERE col1 BETWEEN 5 AND 13; and SELECT * FROM tbl1 WHERE col1 > 5 AND col1 < 13; specify a range that MySQL could use if col1 is indexed, otherwise MySQL would have to table scan. The MySQL manual has a section on range optimization that briefly outlines the conditions in which MySQL can use a range.

Select_full_join

Select_full_join is the same as Select_scan with the difference that Select_full_join applies to the second and subsequent tables in the join plan for a multiple table query. For such tables EXPLAIN lists type: ALL. Select_full_join results if there are no indexes on the table, or no indexes can be used to join the table. A query like SELECT * FROM tbl1, tbl2 WHERE tbl1.col1 = tbl2.col1; without any indexes results in a Select_scan and a Select_full_join; Select_scan for the first table, Select_full_join for the second. Select_full_join is no more desirable than Select_scan. Together the two are even worse. When EXPLAIN lists type: ALL for each table in a join "this output indicates that MySQL is generating a Cartesian product of all the tables; that is, every combination of rows" (MySQL manual). In simpler terms: Two tables of 10 rows each joined together does not result in 20 rows, it results in 100 rows (10 multiplied by 10). In real-world applications tables usually have at least a few thousand rows, so the cross product (a.k.a. Cartesian product) of even two small tables can quickly become enormous. The worst I've ever seen was a three table join that caused a cross product of 112 billion rows (actually the query never finished before it caused the server to halt). Consequently, the per second rate and percentage of SELECT values for Select_full_join as shown by mysqlreport should be a lot less than 1. (The counter value can be high as long as the rate and percentage are less than 1.) Although it's beyond the scope of this document to discuss indexing tables for joins, a quick solution to the query would be an index on tbl1.col1. The query still causes a Select_scan for the first table in the join plan, but avoids the Select_full_join for thee second table, and the number of rows produced decreases significantly.

Select_range_check

Select_range_check is a little better than Select_full_join and uses the same range principles as Select_range. The difference is Select_range_check is not sure whether it can use a range to join the table so it keeps checking in case it finds that it can. This "uncertainty" is an effect of the join: With Select_range there's only one table so MySQL can be certain ahead of time. With multiple tables, the preceding tables may alter the range conditions therefore MySQL cannot be certain ahead of time. For such tables EXPLAIN still lists type: ALL because a type: range is not certain. For such tables MySQL also lists "Range checked for each record (index map: #)" in the "Extra" column. Like Select_range at least one of the tables requires and index for this optimization to be possible, otherwise the table will probably cause a Select_full_join. For the query SELECT * FROM tbl1, tbl2 WHERE tbl1.col1 > tbl2.col1; if tbl2.col1 is indexed then MySQL can Select_range_check:

   mysql> EXPLAIN SELECT * FROM tbl1, tbl2 WHERE tbl1.col1 > tbl2.col1;
   +-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                          |
   +-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
   | SIMPLE      | tbl1  | ALL  | NULL          | NULL | NULL    | NULL |   27 |                                                |
   | SIMPLE      | tbl2  | ALL  | col1          | NULL | NULL    | NULL |   18 | Range checked for each record (index map: 0x1) |
   +-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+

With this simple query, MySQL will check the tbl2.col1 index about 27 times, once for each row in tbl1. For each value tbl1.col1 MySQL will read only the rows in tbl2 where tbl2.col1 < tbl1.col1. (With MySQL 5.x, an index merge may be checked instead of a range.) If MySQL does use a range to join the table it will not increment Select_range; it still only increments Select_range_check.

Select_full_range_join

Select_full_range_join is the same as Select_range_check except that MySQL is certain it can join the table using a range. For such tables EXPLAIN lists type: range. Like Select_range, Select_full_range_join requires an index and is much faster than Select_full_join and potentially faster than Select_range_check. The same range optimization principles for Select_range apply to the table being joined. A simple example:

   mysql> EXPLAIN SELECT * FROM tbl1, tbl2 WHERE tbl1.col1 = 10 AND tbl2.col1 > 13;
   +-------------+-------+-------+---------------+------+---------+-------+------+-------------+
   | select_type | table | type  | possible_keys | key  | key_len | ref   | rows | Extra       |
   +-------------+-------+-------+---------------+------+---------+-------+------+-------------+
   | SIMPLE      | tbl1  | ref   | col1          | col1 | 5       | const |    1 | Using where |
   | SIMPLE      | tbl2  | range | col1          | col1 | 5       | NULL  |    1 | Using where |
   +-------------+-------+-------+---------------+------+---------+-------+------+-------------+

Since the range condition doesn't change, MySQL can certainly use the tbl2.col1 index to read only the necessary records for tbl2, then join these results to tbl1.

Sort_scan and Sort_range

Queries that sort rows, either by using ORDER BY or GROUP BY (except GROUP BY with ORDER BY NULL), undergo three primary steps:

Find rows based on WHERE conditions
Sort rows
Read rows in sorted order

Since data is stored in random order, these three steps make logical sense because the query is not going to find the matching rows in the order they need to be returned. Unless step two can be bypassed (which is possible) a Sort_scan and Sort_range will be incurred in step three. If step one results in Select_scan, step three will result in Sort_scan. If step one Select_range, then step three Sort_range. Essentially, there is no functional difference between Sort_scan and Sort_range: In either case only the necessary rows are read in sorted order. Therefore the performance implications of Sort_scan and Sort_range are no different: The effect on performance in either case is usually unnoticeable.

Sort_merge_passes

Sort_merge_passes involves step two of the query-sort process. MySQL first tries to sort all the rows in memory, the allotment of which is control by the sort_buffer_size system variable. If sort_buffer_size is not large enough for all the rows, MySQL creates a temporary file to store the sorted rows, however the temporary file will need to be sorted too after all rows have been found in step one. The re-sorting of the temporary file counts toward Sort_merge_passes. MySQL actually creates a second temporary file into which it puts the sorted contents of the first file. Therefore, it's common to see almost exactly twice as many created temporary files as Sort_merge_passes. Technically, this is a slow process, however it's impact on performance is usually unnoticeable. Increasing sort_buffer_size can decrease the number of Sort_merge_passes and also the number of temporary files created.

Sort_rows

Sort_rows is simply a total count of the number of rows sorted in step two. Since step two can be bypassed in some cases, Sort_rows is not entirely inclusive. Also since Sort_scan and Sort_range in step two are essentially no different, the Sort_rows value is not very indicative of anything. Consequently, it was removed from mysqlreport in version 1.3. Sufficient to say most servers sort hundreds of millions of rows.