SQL Query Writing Order and Execution Order & Duplicate Rows When Paginating with ORDER BY and LIMIT (Repost)

后端小哥_哈哈

Original link: https://www.dazhuanlan.com/2019/12/25/5e029a08047b1/

Duplicate Rows When Paginating with ORDER BY and LIMIT (Repost)

This post is a repost (I'm only a porter of nature, hahaha). Thanks!

Original source
https://www.dazhuanlan.com/2019/12/25/5e029a08047b1/

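SQL query writing order

Order in which a query is written: SELECT columns FROM table [WHERE -> GROUP BY -> HAVING -> ORDER BY -> LIMIT]

SQL execution order

Order in which a query is logically executed: FROM table -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT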
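To make the difference between the two orders concrete, here is a minimal Python sketch that walks a tiny in-memory "table" through the clauses in execution order. The rows, the column names, and the `run_query` helper are all made up for illustration; a real engine is free to arrange the physical work differently as long as the result matches this logical order.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table; the column names are illustrative.
rows = [
    {"type": "A", "amount": 10},
    {"type": "B", "amount": 5},
    {"type": "A", "amount": 7},
    {"type": "C", "amount": 1},
    {"type": "B", "amount": 20},
]


def run_query(table):
    # Mirrors: SELECT type, SUM(amount) AS total FROM table
    #          WHERE amount > 1 GROUP BY type HAVING SUM(amount) > 10
    #          ORDER BY total DESC LIMIT 1
    filtered = [r for r in table if r["amount"] > 1]            # FROM + WHERE
    groups = defaultdict(list)
    for r in filtered:                                           # GROUP BY
        groups[r["type"]].append(r["amount"])
    kept = {t: v for t, v in groups.items() if sum(v) > 10}      # HAVING
    selected = [{"type": t, "total": sum(v)} for t, v in kept.items()]  # SELECT
    ordered = sorted(selected, key=lambda r: r["total"], reverse=True)  # ORDER BY
    return ordered[:1]                                           # LIMIT


result = run_query(rows)
print(result)
```

Note that the alias `total` only exists after the SELECT step, which is why ORDER BY can refer to it while WHERE cannot.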

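The bug when using ORDER BY and LIMIT together in MySQL

Cause analysis
When paging, the next page shows rows that already appeared on the previous page. What is going on? Let's look at the SQL statement first: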

SELECT * FROM tb_settlement ORDER BY type,code LIMIT offset,pageSize

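It looks fine, but since duplicate rows appear, the LIMIT paging clause is evidently either not taking effect or not taking effect the way I expected. Thinking it through, the expected result of this statement is that ORDER BY first sorts by type and then by code, and LIMIT then returns the rows after the given offset. That is not what actually shows up, so my first guess was that ORDER BY and LIMIT are not cooperating well.

What the official MySQL manual says

My MySQL version is 5.6, so I consulted the official manual for MySQL 5.6. These are the sections that cover ORDER BY and LIMIT:

1. ORDER BY Optimization

2. LIMIT Query Optimization

How LIMIT works

Let's set ORDER BY aside for now and look at how LIMIT works first.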

If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.

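The main point here is that when ORDER BY is combined with LIMIT, MySQL stops sorting once it has found the number of rows LIMIT asks for and does not sort the rest. That does not seem to help much with our problem, so let's read on: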

If an index is not used for ORDER BY but a LIMIT clause is also present, the optimizer may be able to avoid using a merge file and sort the rows in memory using an in-memory filesort operation. For details, see The In-Memory filesort Algorithm.

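This sentence finally seems to touch on what we care about: when ORDER BY cannot use an index and a LIMIT clause is present, the optimizer may skip the merge file and sort the rows in memory with an in-memory filesort. On to the next sentence: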

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.

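Here we finally get a conclusion: when the ORDER BY columns contain duplicate (identical) values, the rows that tie are not returned in any guaranteed order, and the order may differ from one execution to the next.
At this point I can breathe a sigh of relief: it is not my bug, it is MySQL optimizing on its own.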
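The manual's statement also points at the usual way out: the MySQL manual advises including additional columns in the ORDER BY clause to make the order deterministic, and a unique column is the simplest choice. The sketch below is only an illustration under stated assumptions. It assumes tb_settlement has a unique `id` column, which the original post does not show, and it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere. `fetch_page` is a hypothetical helper, not part of any library.

```python
import sqlite3


def fetch_page(conn, offset, page_size):
    # `id` is an assumed unique column; appending it as the last sort key
    # means no two rows can tie, so every page boundary is well defined.
    sql = (
        "SELECT id, type, code FROM tb_settlement "
        "ORDER BY type, code, id LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, (page_size, offset)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_settlement (id INTEGER PRIMARY KEY, type INTEGER, code TEXT)"
)
# Many rows share the same (type, code): exactly the situation that causes trouble.
conn.executemany(
    "INSERT INTO tb_settlement (id, type, code) VALUES (?, ?, ?)",
    [(i, i % 2, "X") for i in range(1, 11)],
)

page_size = 3
seen = []
for offset in range(0, 10, page_size):
    seen.extend(row[0] for row in fetch_page(conn, offset, page_size))

print(seen)
```

With the tie-breaker in place, walking through all pages visits each of the ten rows exactly once.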

How ORDER BY works

Next, let's look at ORDER BY Optimization. In fact, where the manual mentions the in-memory sort it already links to the in-memory algorithm, and that link leads to the ORDER BY Optimization page.

MySQL has multiple filesort algorithms for sorting and retrieving results. The original algorithm uses only the ORDER BY columns. The modified algorithm uses not just the ORDER BY columns, but all columns referenced by the query. There is also an algorithm for small result sets that sorts in memory using the sort buffer as a priority queue without a merge file.

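This says that MySQL actually has several filesort algorithms. The original algorithm uses only the columns named in ORDER BY. The modified algorithm uses not just the ORDER BY columns but all columns referenced by the query. In addition, for small result sets, MySQL sorts in memory using the sort buffer as a priority queue.
Here is the manual's description of that algorithm: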

The sort buffer has a size of sort_buffer_size. If the sort elements for N rows are small enough to fit in the sort buffer (M+N rows if M was specified), the server can avoid using a merge file and performs an in-memory sort by treating the sort buffer as a priority queue:
  1. Scan the table, inserting the select list columns from each selected row in sorted order in the queue. If the queue is full, bump out the last row in the sort order.
  2. Return the first N rows from the queue. (If M was specified, skip the first M rows and return the next N rows.)
Absent that optimization, the server performs this operation by using a merge file for the sort:
  1. Scan the table, repeating these steps through the end of the table:
     a. Select rows until the sort buffer is filled.
     b. Write the first N rows in the buffer (M+N rows if M was specified) to a merge file.
  2. Sort the merge file and return the first N rows. (If M was specified, skip the first M rows and return the next N rows.)
The cost of the table-scan operation is the same for the queue and merge-file methods, so the optimizer chooses between methods based on other costs:
  1. The queue method involves more CPU for inserting rows into the queue in order.
  2. The merge-file method has I/O costs to write and read the file and CPU cost to sort it.
The optimizer considers the balance between these factors for particular values of M, N, and the row size.
An ORDER BY with and without LIMIT may return rows in different orders, as discussed in Section 8.2.1.16, “LIMIT Query Optimization”.

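The passage above mainly compares the sorting costs of the algorithms. The priority queue uses heap sort, and heap sort is not a stable sort: rows with equal values may come out in an order that differs from the order in which they were read.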
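To see how a heap-backed queue can produce exactly the duplicate described at the start, here is a small Python sketch of the queue method from the manual excerpt: keep at most M+N rows in a heap, bump out the last row in sort order when the queue is full, then skip M rows and return N. This is not MySQL's actual code. The `Entry` class, the `limit_with_queue` function, and the sample rows are assumptions made for illustration, and the exact tie order shown depends on how Python's heapq arranges equal keys.

```python
import heapq


class Entry:
    """Heap entry ordered by the sort key only; ties are left unresolved."""

    def __init__(self, key, row):
        self.key = key
        self.row = row

    def __lt__(self, other):
        # Reversed comparison turns heapq's min-heap into a max-heap,
        # so heap[0] is always the last row in sort order.
        return self.key > other.key


def limit_with_queue(rows, key, offset, count):
    capacity = offset + count              # M + N rows must fit in the queue
    heap = []
    for row in rows:                       # single scan of the "table"
        entry = Entry(key(row), row)
        if len(heap) < capacity:
            heapq.heappush(heap, entry)
        elif entry.key < heap[0].key:
            heapq.heapreplace(heap, entry)  # bump out the last row in sort order
    ordered = []
    while heap:
        ordered.append(heapq.heappop(heap).row)
    ordered.reverse()                      # pops come out largest first
    return ordered[offset:offset + count]  # skip M rows, return the next N


# Five rows that all tie on the sort column (the second tuple element).
rows = [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 1)]


def sort_key(row):
    return row[1]


page1 = [name for name, _ in limit_with_queue(rows, sort_key, offset=0, count=2)]
page2 = [name for name, _ in limit_with_queue(rows, sort_key, offset=2, count=2)]
print(page1, page2)
```

Because the queue holds two rows for the first page and four rows for the second, the tied rows settle into different positions in each run, and row "a" ends up on both pages while other rows are never shown. Both pages are still valid answers to "sorted by the key", which is why the server is allowed to return them.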