Oracle要求顺序的top数据检索问题

今天一位朋友,提了一个很智慧的问题,Oracle查order之后的前多少条,只能用这种写法?

select * from 
(select * from test t order by t.created_date desc) 
where rownum < 10;

印象当中,像SQL Server支持top操作,MySQL支持limit,Oracle中一般情况下,我们按顺序取前几条的操作就是利用order by和rownum,还会有其他方式?另外以上这种写法能不能不要子查询?

这些问题的答案,好像有些含糊。通过实验,论证一下。

1. 顺序取前几条的操作

需求就是得到order by cdate desc排序之后的前9条数据。

为了方便说明,创建一张测试表,存入15条数据,

SQL> desc t_order
 Name                Null?    Type
 ------------------ -------- ----------------------------
 ID                           NUMBER
 CDATE                        DATE

如果不带order by,查询返回的结果集顺序就是如下,

P.S. 关于数据检索顺序的问题,可以参考《Oracle数据顺序问题》。

SQL> select * from t_order;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
        16 2020-12-20
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
        10 2020-12-26
        11 2020-12-25
        12 2020-12-24
        13 2020-12-23
        14 2020-12-22
        15 2020-12-21
15 rows selected.

按照cdate降序排列,结果集如下所示,注意这两种select操作中的id=16的位置是不同的,

SQL> select * from t_order order by cdate desc;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
        10 2020-12-26
        11 2020-12-25
        12 2020-12-24
        13 2020-12-23
        14 2020-12-22
        15 2020-12-21
        16 2020-12-20
15 rows selected.

首先,我们看下原始SQL,因为需求是order by cdate desc排序之后的前9条数据,所以返回这9条数据是正确的,

SQL> select * from (select * from t_order order by cdate desc) where rownum<10;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
        10 2020-12-26
9 rows selected.

第一种替代方式,使用窗口函数row_number(),

select id, cdate from (select id, cdate, row_number() over (order by cdate desc) rn from t_order) where rn<10;


----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |      1 |        |     4 (100)|          |      9 |00:00:00.01 |       7 |
|*  1 |  VIEW                    |         |      1 |     10 |     4  (25)| 00:00:01 |      9 |00:00:00.01 |       7 |
|*  2 |   WINDOW SORT PUSHED RANK|         |      1 |     15 |     4  (25)| 00:00:01 |     10 |00:00:00.01 |       7 |
|   3 |    TABLE ACCESS FULL     | T_ORDER |      1 |     15 |     3   (0)| 00:00:01 |     15 |00:00:00.01 |       7 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN"<10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("CDATE") DESC )<10)

如果用的rownum,外层循环select的是*,结果是对的,

SQL> select * from (select id, cdate, row_number() over (order by cdate desc) from t_order) where rownum<10;
        ID CDATE      ROW_NUMBER()OVER(ORDERBYCDATEDESC)
---------- ---------- ----------------------------------
         1 2021-01-04                                  1
         2 2021-01-03                                  2
         3 2021-01-02                                  3
         4 2021-01-01                                  4
         6 2020-12-30                                  5
         7 2020-12-29                                  6
         8 2020-12-28                                  7
         9 2020-12-27                                  8
        10 2020-12-26                                  9
9 rows selected.

我们从执行计划看下,SORT关键字说明这个是经过排序的,语义上正确,

select * from (select id, cdate, row_number() over (order by cdate desc) from t_order) where rownum<10;
------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |      1 |        |     4 (100)|          |      9 |00:00:00.01 |       7 |
|*  1 |  COUNT STOPKEY       |         |      1 |        |            |          |      9 |00:00:00.01 |       7 |
|   2 |   VIEW               |         |      1 |     15 |     4  (25)| 00:00:01 |      9 |00:00:00.01 |       7 |
|   3 |    WINDOW SORT       |         |      1 |     15 |     4  (25)| 00:00:01 |      9 |00:00:00.01 |       7 |
|   4 |     TABLE ACCESS FULL| T_ORDER |      1 |     15 |     3   (0)| 00:00:01 |     15 |00:00:00.01 |       7 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<10)

但是,如果select的是字段名称且未带着row_number(),结果就可能错,

SQL> select id, cdate from (select id, cdate, row_number() over (order by cdate desc) from t_order) where rownum<10;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
        16 2020-12-20
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
9 rows selected.

如果是字段,带着row_number(),就是正确的,

SQL> select id, cdate, c from (select id, cdate, row_number() over (order by cdate desc) c from t_order) where rownum<10;
        ID CDATE                       C
---------- ------------------ ----------
         1 04-JAN-21                   1
         2 03-JAN-21                   2
         3 02-JAN-21                   3
         4 01-JAN-21                   4
         6 30-DEC-20                   5
         7 29-DEC-20                   6
         8 28-DEC-20                   7
         9 27-DEC-20                   8
        10 26-DEC-20                   9
9 rows selected.

第二种替代方式,12c以上,可以使用fetch,

SQL> select * from t_order order by cdate desc fetch first 9 rows only;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
        10 2020-12-26
9 rows selected.

但是我们看他的执行计划,应该是将fetch自动转成了row_number()操作,

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |      1 |        |     4 (100)|          |      9 |00:00:00.01 |       7 |
|  * 1 |  VIEW                    |         |      1 |      9 |     4  (25)| 00:00:01 |      9 |00:00:00.01 |       7 |
| * 2 |   WINDOW SORT PUSHED RANK|         |      1 |     15 |     4  (25)| 00:00:01 |      9 |00:00:00.01 |       7 |
|    3 |    TABLE ACCESS FULL     | T_ORDER |      1 |     15 |     3   (0)| 00:00:01 |     15 |00:00:00.01 |       7 |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=9)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("T_ORDER"."CDATE") DESC )<=9)

以上是从功能角度的尝试,如果从性能,每次都是全表扫,效率注定不高,因为测试SQL,无其他检索条件,只是有个order by,因此为cdate创建索引,同时将表数据量,扩为10万,再执行SQL,发现还是全表扫,

create index idx_to_01 on t_order(cdate);


select * from (select * from t_o order by cdate desc) where rownum<10
------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |      1 |        |   581 (100)|          |      9 |00:00:00.01 |     268 |
| *  1 |  COUNT STOPKEY          |      |      1 |        |            |          |      9 |00:00:00.01 |     268 |
|   2 |   VIEW                  |      |      1 |    100K|   581   (1)| 00:00:01 |      9 |00:00:00.01 |     268 |
| *  3 |    SORT ORDER BY STOPKEY|      |      1 |    100K|   581   (1)| 00:00:01 |      9 |00:00:00.01 |     268 |
|   4 |     TABLE ACCESS FULL   | T_O  |      1 |    100K|   103   (1)| 00:00:01 |    100K|00:00:00.01 |     268 |
------------------------------------------------------------------------------------------------------------------

此时其实忽略了一个问题,就是cdate字段非空属性的问题,因为cdate,默认允许为空,索引不存储空值,因此即使是order by,都是不会使用这个索引的,避免漏选数据。

设置cdate非空,

SQL> alter table t_order modify cdate not null;
Table altered.

此时执行SQL,索引全扫描,因为他是单块读索引,避免了排序,从Cost和Buffer可以看到,都降下来了,

select * from (select * from t_order order by cdate desc) where rownum<10;
-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |      1 |        |     3 (100)|          |      9 |00:00:00.01 |       5 |
|*  1 |  COUNT STOPKEY                |           |      1 |        |            |          |      9 |00:00:00.01 |       5 |
|   2 |   VIEW                        |           |      1 |      9 |     3   (0)| 00:00:01 |      9 |00:00:00.01 |       5 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T_O       |      1 |    100K|     3   (0)| 00:00:01 |      9 |00:00:00.01 |       5 |
|   4 |     INDEX FULL SCAN DESCENDING| IDX_TO_01 |      1 |      9 |     2   (0)| 00:00:01 |      9 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<10)

2. 原始的写法能不能不要子查询?

即如下这两条SQL,是否是等价的?

select * from (select * from t_order order by cdate desc) where rownum<10;
select * from t_order where rownum<10 order by cdate desc;

从执行结果看,这两个SQL,结果集是不同的,带子查询,id=1到10,

SQL> select * from (select * from t_order order by cdate desc) where rownum<10;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
        10 2020-12-26
9 rows selected.

不带子查询,缺少id=10,多了id=16,

SQL> select * from t_order where rownum<10 order by cdate desc;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
        16 2020-12-20
9 rows selected.

需求是得到order by cdate desc排序之后的前9条数据,明显地,第一条SQL,才是正确的,

SQL> select * from t_order order by cdate desc;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
        10 2020-12-26
        11 2020-12-25
        12 2020-12-24
        13 2020-12-23
        14 2020-12-22
        15 2020-12-21
        16 2020-12-20
15 rows selected.

而第二条SQL,很明显,是从以下结果集中过滤出来的,

SQL> select * from t_order;
        ID CDATE
---------- ----------
         1 2021-01-04
         2 2021-01-03
         3 2021-01-02
         4 2021-01-01
        16 2020-12-20
         6 2020-12-30
         7 2020-12-29
         8 2020-12-28
         9 2020-12-27
        10 2020-12-26
        11 2020-12-25
        12 2020-12-24
        13 2020-12-23
        14 2020-12-22
        15 2020-12-21
15 rows selected.

两者的区别,就在于是先排序再得到前9条,还是先得到前9条再排序。

我们从执行计划,能看到他的执行路径是什么,第一条SQL,先全表扫描得到所有的数据,然后排序,再执行rownum,

select * from (select * from t_order order by cdate desc) where rownum<10;
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT        |         |      1 |        |     4 (100)|          |      9 |00:00:00.01 |       7 |
|*  1 |  COUNT STOPKEY          |         |      1 |        |            |          |      9 |00:00:00.01 |       7 |
|  2 |   VIEW                  |         |      1 |     15 |     4  (25)| 00:00:01 |      9 |00:00:00.01 |       7 |
|*  3 |    SORT ORDER BY STOPKEY|         |      1 |     15 |     4  (25)| 00:00:01 |      9 |00:00:00.01 |       7 |
|  4 |     TABLE ACCESS FULL   | T_ORDER |      1 |     15 |     3   (0)| 00:00:01 |     15 |00:00:00.01 |       7 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)

第二条SQL,虽然显示TABLE ACCESS FULL,但是rows就是9,根据rownum直接得到9条(而且是无序的),然后执行order by排序,

select * from t_order where rownum<10 order by cdate desc;
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |         |      1 |        |     4 (100)|          |      9 |00:00:00.01 |       6 |
|   1 |  SORT ORDER BY      |         |      1 |      9 |     4  (25)| 00:00:01 |      9 |00:00:00.01 |       6 |
|* 2 |   COUNT STOPKEY     |         |      1 |        |            |          |      9 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS FULL| T_ORDER |      1 |      9 |     3   (0)| 00:00:01 |      9 |00:00:00.01 |       6 |
-----------------------------------------------------------------------------------------------------------------

因此,为了得到order by排序后的前几条,需要用到子查询。

当我们碰到这种不知道谁的语义正确的时候,从执行计划,会给我们些提示,判断究竟谁是正确的。

近期更新的文章:

日常工作中碰到的几个技术问题

了解一下sqlhc

Oracle的MD5函数介绍

Oracle 19c的examples静默安装

sqlplus登录缓慢的解决

VMWare 11安装RedHat Linux 7过程中碰到的坑

COST值相同?是真是假?

Oracle 11g的examples静默安装

同名的同义词和视图解惑

v$和v_$的一些玄机

文章分类和索引:

公众号700篇文章分类和索引

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值