MySQL大数据量分页limit优化

最新推荐文章于 2024-07-22 09:07:23 发布

水墨之白

最新推荐文章于 2024-07-22 09:07:23 发布

阅读量1.7k

点赞数 2

分类专栏： MySQL 文章标签： MySQL 分页 limit

本文链接：https://blog.csdn.net/LJJZJ/article/details/103074816

版权

MySQL 专栏收录该内容

15 篇文章 1 订阅

订阅专栏

一、limit用法

SELECT * FROM t LIMIT 10,10;

第一个参数指定第一个返回记录行的偏移量
第二个参数指定返回记录行的最大数目
如果只给定一个参数：它表示返回最大的记录行数目
第二个参数为 -1 表示检索从某一个偏移量到记录集的结束所有的记录行
初始记录行的偏移量是 0(而不是 1)

所以上面SQL的含义是查询数据库第10条到第20条数据

对于小的偏移量，直接使用limit来查询没有什么问题，但随着数据量的增大，越往后分页，limit语句的偏移量就会越大，速度也会明显变慢

二、limit在大数据量下的表现

参考网上一个案例：
表说明：

表名：order，订单表
字段情况：该表一共37个字段，不包含text等大型数据，最大为varchar(500)，id字段为索引，且为递增
数据量：5709294

select * from order_table where userId = 3 order by id limit 10000,10;

三次查询时间分别为：

3040 ms
3063 ms
3018 ms

针对这种查询方式，下面测试查询记录量对时间的影响：

select * from order_table where userId = 3 order by id limit 10000,1;
select * from order_table where userId = 3 order by id limit 10000,10;
select * from order_table where userId = 3 order by id limit 10000,100;
select * from order_table where userId = 3 order by id limit 10000,1000;
select * from order_table where userId = 3 order by id limit 10000,10000;

三次查询时间如下：

查询1条记录：3072ms 3092ms 3002ms
查询10条记录：3081ms 3077ms 3032ms
查询100条记录：3118ms 3200ms 3128ms
查询1000条记录：3412ms 3468ms 3394ms
查询10000条记录：3749ms 3802ms 3696ms

在查询记录量低于100时，查询时间基本没有差距，随着查询记录量越来越大，所花费的时间也会越来越多

针对偏移量的测试：

select * from order_table where userId = 3 order by id limit 100,100;
select * from order_table where userId = 3 order by id limit 1000,100;
select * from order_table where userId = 3 order by id limit 10000,100;
select * from order_table where userId = 3 order by id limit 100000,100;
select * from order_table where userId = 3 order by id limit 100000,100;

三次查询时间如下：

查询100偏移：25ms 24ms 24ms
查询1000偏移：78ms 76ms 77ms
查询10000偏移：3092ms 3212ms 3128ms
查询100000偏移：3878ms 3812ms 3798ms
查询1000000偏移：14608ms 14062ms 14700ms

随着查询偏移的增大，尤其查询偏移大于10万以后，查询时间急剧增加

三、limit优化

1. 使用覆盖索引

select orderType, orderAmt from order_table limit 10000, 10;

假设我们只需要查询上面两个字段，可以建立联合索引（orderType, orderAmt ），这样可以让查询走联合索引，加快性能。但是我们这个表有37个字段，这么优化明显不合适，而且这种方式加快性能方式并不明显。

2. 假设数据表的id是连续递增的，可以这样写

select * from order_table where id >= 1000000 limit 100;
select * from order_table where id between 10000000 and 1000100 limit 100

直接将对应的offset计算出来，作为where条件，这样可以利用主键索引，性能提升非常明显。

上面这种方案存在严重问题，数据库使用自增id，这没问题，但是，可能存在部分数据被删除过，也就是id并不连续，所以查出的数据并不是我们想要的。

优化方法：
3. 每次分页查询记录上一次分页最后一条id
下次查询：

select * from order_table where id >= 上页最大id limit 100;

这样处理的话，id就算不是连续也是没有问题的，但是必须自增，由于数据库主键我们一般设置成连续自增，所以这种方式可以大幅度优化性能

4. 使用子查询
上述方法每次都需要记录上一次分页的最大id，比较麻烦，我们可以使用子查询代替：

select * from order_table where id in (select id from order_table limit 1000000, 100);

select * from order_table where id >= (select id from order_table limit 1000000, 1) limit 100;

select * from order_table t1 join (select id from order_table limit 1000000, 1) t2 on t1.id >= t2.id limit 100

这种方式之所以能够大幅度优化性能，分析：

直接分页，不会走索引，全表扫描，其实是遍历主键索引树，但是每次都需要把对应行的数据取出来，要取1000100条数据，然后丢弃前一百万条，太消耗性能。可以通过explain执行计划查看对应索引情况
使用子查询，会走主键索引，虽然也是遍历主键索引树，但是只需要取id，不需要取出整行数据，最后外层查询拿到对应的100条id，查询对应数据即可。通过根据索引字段定位后，大大减少了查询的数据量，效率自然大大提升
上述使用in，where条件和连接，性能差距不大

5. 业务优化
真的会有人翻到第100万条数据吗，一般来说翻页不会查过20页，可以通过限制可以翻页的数量解决这个问题。像百度分页最多只能展示76页。还有一种方式，就是使用滚动，和微博一样，没有分页，只能不断下拉，就是使用之前记录上一页最大offset那个方法就可以做到。ES里面可以使用scroll API