offset 这个是phoenix新版本提供的属性,但是存在性能问题,大数据的情况下是不推荐的。
我本来想试试offset,随着数量增加,耗时会如何增加。
但这两个不同的测试sql,让我有了新发现:
两条语句的执行计划:
0: jdbc:phoenix:192.168.199.154> explain select login_date,email from T_EXTENSION_ALL_DATAS_LOGIN where login_date='2018-11-24' and country='China' order by created_date desc limit 5000 offset 20000;
+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 25000 ROWS 1625000 BYTES SERIAL 1-WAY RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_LOGIN_DATE_COUNTRY_CREATED ['2018-11-24','China'] | 1625000 | 25000 | 0 |
| SERVER FILTER BY FIRST KEY ONLY | 1625000 | 25000 | 0 |
| SERVER OFFSET 20000 | 1625000 | 25000 | 0 |
| SERVER 25000 ROW LIMIT | 1625000 | 25000 | 0 |
| CLIENT 5000 ROW LIMIT | 1625000 | 25000 | 0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.038 seconds)
0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' order by seq_id desc limit 20 offset 20000;
+--------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 20020 ROWS 14314300 BYTES SERIAL 1-WAY REVERSE RANGE SCAN OVER T_EXTENSION_ALL_DATAS_SHOW ['2018-11-24'] | 14314300 | 20020 | 0 |
| SERVER FILTER BY FIRST KEY ONLY | 14314300 | 20020 | 0 |
| SERVER OFFSET 20000 | 14314300 | 20020 | 0 |
| SERVER 20020 ROW LIMIT | 14314300 | 20020 | 0 |
| CLIENT 20 ROW LIMIT | 14314300 | 20020 | 0 |
+--------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.032 seconds)
你说 跳跃2W行,查询5000条的速度快? 还是 跳跃2W行,查询20条的速度快?
offset 2W与 20W的结果如下:
select login_date,email from T_EXTENSION_ALL_DATAS_LOGIN
where login_date='2018-11-24' and country='China'
order by created_date desc
limit 5000 offset 20000;
-- 5,000 rows selected (0.416 seconds)
select seq_id from T_EXTENSION_ALL_DATAS_SHOW
where show_date='2018-11-24'
order by seq_id desc
limit 20 offset 20000;
-- 20 rows selected (2.071 seconds)
select login_date,email from T_EXTENSION_ALL_DATAS_LOGIN
where login_date='2018-11-24' and country='China'
order by created_date desc
limit 5000 offset 200000;
-- 5,000 rows selected (0.515 seconds)
select seq_id from T_EXTENSION_ALL_DATAS_SHOW
where show_date='2018-11-24'
order by seq_id desc
limit 20 offset 200000;
-- 20 rows selected (20.508 seconds)
都是走RANGE SCAN 为什么差距,这么大呢?
原因在于:
IDX_T_EXTENSION_ALL_DATAS_LOGIN_DATE_COUNTRY_CREATED 这个二级索引,他比使用 T_EXTENSION_ALL_DATAS_SHOW 这种表row key索引,字段内容要少很多,所以快。
-- 创建排序索引语法
CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_LOGIN_DATE_COUNTRY_CREATED ON T_EXTENSION_ALL_DATAS_LOGIN(LOGIN_DATE,COUNTRY,CREATED_DATE DESC);
0: jdbc:phoenix:192.168.199.154> select * from IDX_T_EXTENSION_ALL_DATAS_LOGIN_DATE_COUNTRY_CREATED limit 1;
+--------------+------------+----------------------+-----------------+
| :LOGIN_DATE | 0:COUNTRY | 0:CREATED_DATE | :EMAIL |
+--------------+------------+----------------------+-----------------+
| 2018-11-24 | China | 2018-11-24 15:26:57 | zrq@500000.com |
+--------------+------------+----------------------+-----------------+
那么使用这种全局的排序索引,4个字段,offset,20W。只要0.5S。
由于这个条件只有50W的数据,只能这么试试了。接近50W,耗时1S左右。
select login_date,email from T_EXTENSION_ALL_DATAS_LOGIN where login_date='2018-11-24' and country='China' order by created_date desc limit 5000 offset 495000;
-- 5,000 rows selected (1.025 seconds)
结论:虽然分页这些不推荐使用offset。但是某些小数据场景下,还是可以使用的。
推荐不超过50W的小数据集,可以使用offset。