今天一位朋友,提了一个很智慧的问题,Oracle查order之后的前多少条,只能用这种写法?
select * from
(select * from test t order by t.created_date desc)
where rownum < 10;
印象当中,像SQL Server支持top操作,MySQL支持limit,Oracle中一般情况下,我们按顺序取前几条的操作就是利用order by和rownum,还会有其他方式?另外以上这种写法能不能不要子查询?
这些问题的答案,好像有些含糊。通过实验,论证一下。
1. 顺序取前几条的操作
需求就是得到order by cdate desc排序之后的前9条数据。
为了方便说明,创建一张测试表,存入15条数据,
SQL> desc t_order
Name Null? Type
------------------ -------- ----------------------------
ID NUMBER
CDATE DATE
如果不带order by,查询返回的结果集顺序就是如下,
P.S. 关于数据检索顺序的问题,可以参考《Oracle数据顺序问题》。
SQL> select * from t_order;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
16 2020-12-20
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
10 2020-12-26
11 2020-12-25
12 2020-12-24
13 2020-12-23
14 2020-12-22
15 2020-12-21
15 rows selected.
按照cdate降序排列,结果集如下所示,注意这两种select操作中的id=16的位置是不同的,
SQL> select * from t_order order by cdate desc;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
10 2020-12-26
11 2020-12-25
12 2020-12-24
13 2020-12-23
14 2020-12-22
15 2020-12-21
16 2020-12-20
15 rows selected.
首先,我们看下原始SQL,因为需求是order by cdate desc排序之后的前9条数据,所以返回这9条数据是正确的,
SQL> select * from (select * from t_order order by cdate desc) where rownum<10;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
10 2020-12-26
9 rows selected.
第一种替代方式,使用窗口函数row_number(),
select id, cdate from (select id, cdate, row_number() over (order by cdate desc) rn from t_order) where rn<10;
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| | 9 |00:00:00.01 | 7 |
|* 1 | VIEW | | 1 | 10 | 4 (25)| 00:00:01 | 9 |00:00:00.01 | 7 |
|* 2 | WINDOW SORT PUSHED RANK| | 1 | 15 | 4 (25)| 00:00:01 | 10 |00:00:00.01 | 7 |
| 3 | TABLE ACCESS FULL | T_ORDER | 1 | 15 | 3 (0)| 00:00:01 | 15 |00:00:00.01 | 7 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN"<10)
2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("CDATE") DESC )<10)
如果用的rownum,外层循环select的是*,结果是对的,
SQL> select * from (select id, cdate, row_number() over (order by cdate desc) from t_order) where rownum<10;
ID CDATE ROW_NUMBER()OVER(ORDERBYCDATEDESC)
---------- ---------- ----------------------------------
1 2021-01-04 1
2 2021-01-03 2
3 2021-01-02 3
4 2021-01-01 4
6 2020-12-30 5
7 2020-12-29 6
8 2020-12-28 7
9 2020-12-27 8
10 2020-12-26 9
9 rows selected.
我们从执行计划看下,SORT关键字说明这个是经过排序的,语义上正确,
select * from (select id, cdate, row_number() over (order by cdate desc) from t_order) where rownum<10;
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| | 9 |00:00:00.01 | 7 |
|* 1 | COUNT STOPKEY | | 1 | | | | 9 |00:00:00.01 | 7 |
| 2 | VIEW | | 1 | 15 | 4 (25)| 00:00:01 | 9 |00:00:00.01 | 7 |
| 3 | WINDOW SORT | | 1 | 15 | 4 (25)| 00:00:01 | 9 |00:00:00.01 | 7 |
| 4 | TABLE ACCESS FULL| T_ORDER | 1 | 15 | 3 (0)| 00:00:01 | 15 |00:00:00.01 | 7 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
但是,如果select的是字段名称且未带着row_number(),结果就可能错,
SQL> select id, cdate from (select id, cdate, row_number() over (order by cdate desc) from t_order) where rownum<10;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
16 2020-12-20
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
9 rows selected.
如果是字段,带着row_number(),就是正确的,
SQL> select id, cdate, c from (select id, cdate, row_number() over (order by cdate desc) c from t_order) where rownum<10;
ID CDATE C
---------- ------------------ ----------
1 04-JAN-21 1
2 03-JAN-21 2
3 02-JAN-21 3
4 01-JAN-21 4
6 30-DEC-20 5
7 29-DEC-20 6
8 28-DEC-20 7
9 27-DEC-20 8
10 26-DEC-20 9
9 rows selected.
第二种替代方式,12c以上,可以使用fetch,
SQL> select * from t_order order by cdate desc fetch first 9 rows only;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
10 2020-12-26
9 rows selected.
但是我们看他的执行计划,应该是将fetch自动转成了row_number()操作,
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| | 9 |00:00:00.01 | 7 |
| * 1 | VIEW | | 1 | 9 | 4 (25)| 00:00:01 | 9 |00:00:00.01 | 7 |
| * 2 | WINDOW SORT PUSHED RANK| | 1 | 15 | 4 (25)| 00:00:01 | 9 |00:00:00.01 | 7 |
| 3 | TABLE ACCESS FULL | T_ORDER | 1 | 15 | 3 (0)| 00:00:01 | 15 |00:00:00.01 | 7 |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=9)
2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("T_ORDER"."CDATE") DESC )<=9)
以上是从功能角度的尝试,如果从性能,每次都是全表扫,效率注定不高,因为测试SQL,无其他检索条件,只是有个order by,因此为cdate创建索引,同时将表数据量,扩为10万,再执行SQL,发现还是全表扫,
create index idx_to_01 on t_order(cdate);
select * from (select * from t_o order by cdate desc) where rownum<10
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 581 (100)| | 9 |00:00:00.01 | 268 |
| * 1 | COUNT STOPKEY | | 1 | | | | 9 |00:00:00.01 | 268 |
| 2 | VIEW | | 1 | 100K| 581 (1)| 00:00:01 | 9 |00:00:00.01 | 268 |
| * 3 | SORT ORDER BY STOPKEY| | 1 | 100K| 581 (1)| 00:00:01 | 9 |00:00:00.01 | 268 |
| 4 | TABLE ACCESS FULL | T_O | 1 | 100K| 103 (1)| 00:00:01 | 100K|00:00:00.01 | 268 |
------------------------------------------------------------------------------------------------------------------
此时其实忽略了一个问题,就是cdate字段非空属性的问题,因为cdate,默认允许为空,索引不存储空值,因此即使是order by,都是不会使用这个索引的,避免漏选数据。
设置cdate非空,
SQL> alter table t_order modify cdate not null;
Table altered.
此时执行SQL,索引全扫描,因为他是单块读索引,避免了排序,从Cost和Buffer可以看到,都降下来了,
select * from (select * from t_order order by cdate desc) where rownum<10;
-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 (100)| | 9 |00:00:00.01 | 5 |
|* 1 | COUNT STOPKEY | | 1 | | | | 9 |00:00:00.01 | 5 |
| 2 | VIEW | | 1 | 9 | 3 (0)| 00:00:01 | 9 |00:00:00.01 | 5 |
| 3 | TABLE ACCESS BY INDEX ROWID| T_O | 1 | 100K| 3 (0)| 00:00:01 | 9 |00:00:00.01 | 5 |
| 4 | INDEX FULL SCAN DESCENDING| IDX_TO_01 | 1 | 9 | 2 (0)| 00:00:01 | 9 |00:00:00.01 | 3 |
-----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
2. 原始的写法能不能不要子查询?
即如下这两条SQL,是否是等价的?
select * from (select * from t_order order by cdate desc) where rownum<10;
select * from t_order where rownum<10 order by cdate desc;
从执行结果看,这两个SQL,结果集是不同的,带子查询,id=1到10,
SQL> select * from (select * from t_order order by cdate desc) where rownum<10;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
10 2020-12-26
9 rows selected.
不带子查询,缺少id=10,多了id=16,
SQL> select * from t_order where rownum<10 order by cdate desc;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
16 2020-12-20
9 rows selected.
需求是得到order by cdate desc排序之后的前9条数据,明显地,第一条SQL,才是正确的,
SQL> select * from t_order order by cdate desc;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
10 2020-12-26
11 2020-12-25
12 2020-12-24
13 2020-12-23
14 2020-12-22
15 2020-12-21
16 2020-12-20
15 rows selected.
而第二条SQL,很明显,是从以下结果集中过滤出来的,
SQL> select * from t_order;
ID CDATE
---------- ----------
1 2021-01-04
2 2021-01-03
3 2021-01-02
4 2021-01-01
16 2020-12-20
6 2020-12-30
7 2020-12-29
8 2020-12-28
9 2020-12-27
10 2020-12-26
11 2020-12-25
12 2020-12-24
13 2020-12-23
14 2020-12-22
15 2020-12-21
15 rows selected.
两者的区别,就在于是先排序再得到前9条,还是先得到前9条再排序。
我们从执行计划,能看到他的执行路径是什么,第一条SQL,先全表扫描得到所有的数据,然后排序,再执行rownum,
select * from (select * from t_order order by cdate desc) where rownum<10;
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| | 9 |00:00:00.01 | 7 |
|* 1 | COUNT STOPKEY | | 1 | | | | 9 |00:00:00.01 | 7 |
| 2 | VIEW | | 1 | 15 | 4 (25)| 00:00:01 | 9 |00:00:00.01 | 7 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 15 | 4 (25)| 00:00:01 | 9 |00:00:00.01 | 7 |
| 4 | TABLE ACCESS FULL | T_ORDER | 1 | 15 | 3 (0)| 00:00:01 | 15 |00:00:00.01 | 7 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
3 - filter(ROWNUM<10)
第二条SQL,虽然显示TABLE ACCESS FULL,但是rows就是9,根据rownum直接得到9条(而且是无序的),然后执行order by排序,
select * from t_order where rownum<10 order by cdate desc;
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| | 9 |00:00:00.01 | 6 |
| 1 | SORT ORDER BY | | 1 | 9 | 4 (25)| 00:00:01 | 9 |00:00:00.01 | 6 |
|* 2 | COUNT STOPKEY | | 1 | | | | 9 |00:00:00.01 | 6 |
| 3 | TABLE ACCESS FULL| T_ORDER | 1 | 9 | 3 (0)| 00:00:01 | 9 |00:00:00.01 | 6 |
-----------------------------------------------------------------------------------------------------------------
因此,为了得到order by排序后的前几条,需要用到子查询。
当我们碰到这种不知道谁的语义正确的时候,从执行计划,会给我们些提示,判断究竟谁是正确的。
近期更新的文章:
《VMWare 11安装RedHat Linux 7过程中碰到的坑》
文章分类和索引: