参见《Oracle SQL高级编程》
概述
分析函数是以一定的方法在一个与当前行相关的结果子集中进行计算,也称为窗口函数。
一般结构为
Function(arg1 , arg2 ……) over(partition by clause order by clause windowing clause )
Windowing clause : rows | range between start_expr and end_expr
Start_expr is unbounded preceding | current row | n preceding | n following
End_expr is unbounded following | current row | n preceding | n following
不是所有的分析函数都支持开窗子句。
创建测试表
SH@ prod> create table sales_fact as
2 select country_name country , country_subregion region , prod_name product , calendar_year year , calendar_week_number week ,
3 sum(amount_sold) sale , sum(amount_sold*
4 (case when mod(rownum , 10 ) = 0 then 1.4
5 when mod(rownum , 5)= 0 then 0.6
6 when mod(rownum , 2)= 0 then 0.9
7 when mod(rownum , 2)=1 then 1.2
8 else 1 end ) ) receipts
9 from sales , times , customers , countries , products
10 where sales.time_id = times.time_id and
11 sales.prod_id = products.prod_id and
12 sales.cust_id = customers.cust_id and
13 customers.country_id = countries.country_id
14 group by country_name , country_subregion , prod_name , calendar_year , calendar_week_number ;
Table created.
把聚合函数当作分析函数使用
分析函数列只是一列数值,每一行对应一个值,对于查询的其它方面没有任何影响。
从以下查询可以得出以下几点:
1.over分区条件中的列可以不在select列表中,但是必须在数据源中。
2.over排序条件中的列可以不在select列表中,但是必须在数据源中。
3.over排序条件是对所在分区中的数据进行排序,与select语句中的排序无关。但是会影响到分析函数的结果。
4.over中的开窗条件的范围一般仅限于分区本身。rows between unbounded preceding and current row表示从分区的最开始到当前行。
5.分析函数的数据来自结果集(施加了where条件之后的)。
下面的查询中的分析列表示该年从开始到该周的销售累计。
SH@ prod> select year , week , sale ,
2 sum(sale) over( partition by region , year
3 order by week
4 rows between unbounded preceding and current row ) running_sum_ytd
5 from sales_fact
6 where country in ('Australia') and product='Xtend Memory' and week < 10
7 order by year , week ;
YEAR WEEK SALE RUNNING_SUM_YTD
---------- ---------- ---------- ---------------
1998 1 58.15 58.15
1998 2 29.39 87.54
1998 3 29.49 117.03
1998 4 29.49 146.52
1998 5 29.8 176.32
1998 6 58.78 235.1
1998 9 58.78 293.88
1999 1 53.52 53.52
1999 3 94.6 148.12
1999 4 40.5 188.62
1999 5 80.01 268.63
1999 6 40.5 309.13
1999 8 103.11 412.24
1999 9 53.34 465.58
2000 1 46.7 46.7
2000 3 93.41 140.11
2000 4 46.54 186.65
2000 5 46.7 233.35
2000 7 70.8 304.15
2000 8 46.54 350.69
2001 1 92.26 92.26
2001 2 118.38 210.64
2001 3 47.24 257.88
2001 4 256.7 514.58
2001 5 93.44 608.02
2001 6 22.44 630.46
2001 7 69.96 700.42
YEAR WEEK SALE RUNNING_SUM_YTD
---------- ---------- ---------- ---------------
2001 8 46.06 746.48
2001 9 92.67 839.15
29 rows selected.
结果与上面相同,只是排序不同方式,分析列看起来就没有规律了。
SH@ prod> select year , week , sale ,
2 sum(sale) over( partition by region , year
3 order by week
4 rows between unbounded preceding and current row ) running_sum_ytd
5 from sales_fact
6 where country in ('Australia') and product='Xtend Memory' and week < 10
7 order by year , sale ;
YEAR WEEK SALE RUNNING_SUM_YTD
---------- ---------- ---------- ---------------
1998 2 29.39 87.54
1998 4 29.49 146.52
1998 3 29.49 117.03
1998 5 29.8 176.32
1998 1 58.15 58.15
1998 6 58.78 235.1
1998 9 58.78 293.88
1999 4 40.5 188.62
1999 6 40.5 309.13
1999 9 53.34 465.58
1999 1 53.52 53.52
1999 5 80.01 268.63
1999 3 94.6 148.12
1999 8 103.11 412.24
2000 4 46.54 186.65
2000 8 46.54 350.69
2000 1 46.7 46.7
2000 5 46.7 233.35
2000 7 70.8 304.15
2000 3 93.41 140.11
2001 6 22.44 630.46
2001 8 46.06 746.48
2001 3 47.24 257.88
2001 7 69.96 700.42
2001 1 92.26 92.26
2001 9 92.67 839.15
2001 5 93.44 608.02
YEAR WEEK SALE RUNNING_SUM_YTD
---------- ---------- ---------- ---------------
2001 2 118.38 210.64
2001 4 256.7 514.58
29 rows selected.
分区中的排序选取不恰当,则分析列结果没有什么意义了。分区开窗排序的选取与分析列的结果密切相关。
SH@ prod> select year , week , sale ,
2 sum(sale) over( partition by region , year
3 order by sale
4 rows between unbounded preceding and current row ) running_sum_ytd
5 from sales_fact
6 where country in ('Australia') and product='Xtend Memory' and week < 10
7 order by year , week ;
YEAR WEEK SALE RUNNING_SUM_YTD
---------- ---------- ---------- ---------------
1998 1 58.15 176.32
1998 2 29.39 29.39
1998 3 29.49 88.37
1998 4 29.49 58.88
1998 5 29.8 118.17
1998 6 58.78 235.1
1998 9 58.78 293.88
1999 1 53.52 187.86
1999 3 94.6 362.47
1999 4 40.5 40.5
1999 5 80.01 267.87
1999 6 40.5 81
1999 8 103.11 465.58
1999 9 53.34 134.34
2000 1 46.7 186.48
2000 3 93.41 350.69
2000 4 46.54 46.54
2000 5 46.7 139.78
2000 7 70.8 257.28
2000 8 46.54 93.08
2001 1 92.26 277.96
2001 2 118.38 582.45
2001 3 47.24 115.74
2001 4 256.7 839.15
2001 5 93.44 464.07
2001 6 22.44 22.44
2001 7 69.96 185.7
YEAR WEEK SALE RUNNING_SUM_YTD
---------- ---------- ---------- ---------------
2001 8 46.06 68.5
2001 9 92.67 370.63
29 rows selected.
分析函数的执行计划
虽然有分析函数还是只需要一次全表扫描,但是需要排序。
WINDOW SORT是分析函数的典型特征。
SH@ prod> explain plan for
2 select year , week , sale ,
3 sum(sale) over( partition by region , year
4 order by sale
5 rows between unbounded preceding and current row ) running_sum_ytd
6 from sales_fact
7 where country in ('Australia') and product='Xtend Memory' and week < 10
8 order by year , week ;
Explained.
SH@ prod> select * from table(dbms_xplan.display()) ;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 173857439
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 1890 | 311 (1)| 00:00:04 |
| 1 | SORT ORDER BY | | 18 | 1890 | 311 (1)| 00:00:04 |
| 2 | WINDOW SORT | | 18 | 1890 | 311 (1)| 00:00:04 |
|* 3 | TABLE ACCESS FULL| SALES_FACT | 18 | 1890 | 309 (1)| 00:00:04 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND
"WEEK"<10)
Note
-----
- dynamic sampling used for this statement (level=2) 说明该表还没有统计信息。
20 rows selected.
不加分析列,只是少了一步window sort。
SH@ prod> explain plan for
2 select year , week , sale
3 from sales_fact
4 where country in ('Australia') and product='Xtend Memory' and week < 10
5 order by year , week ;
Explained.
SH@ prod> select * from table(dbms_xplan.display()) ;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1978576542
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 1584 | 310 (1)| 00:00:04 |
| 1 | SORT ORDER BY | | 18 | 1584 | 310 (1)| 00:00:04 |
|* 2 | TABLE ACCESS FULL| SALES_FACT | 18 | 1584 | 309 (1)| 00:00:04 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND
"WEEK"<10)
Note
-----
- dynamic sampling used for this statement (level=2)
19 rows selected.
如何使窗口充满整个分区
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year
2 order by week
3 rows between unbounded preceding and unbounded following )
4 max_sale
5 from sales_fact
6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10
7 order by product , country , year , week ;
YEAR WEEK SALE MAX_SALE
---------- ---------- ---------- ----------
1998 1 58.15 58.78
1998 2 29.39 58.78
1998 3 29.49 58.78
1998 4 29.49 58.78
1998 5 29.8 58.78
1998 6 58.78 58.78
1998 9 58.78 58.78
1999 1 53.52 103.11
1999 3 94.6 103.11
1999 4 40.5 103.11
1999 5 80.01 103.11
1999 6 40.5 103.11
1999 8 103.11 103.11
1999 9 53.34 103.11
2000 1 46.7 93.41
2000 3 93.41 93.41
2000 4 46.54 93.41
2000 5 46.7 93.41
2000 7 70.8 93.41
2000 8 46.54 93.41
2001 1 92.26 256.7
2001 2 118.38 256.7
2001 3 47.24 256.7
2001 4 256.7 256.7
2001 5 93.44 256.7
2001 6 22.44 256.7
2001 7 69.96 256.7
YEAR WEEK SALE MAX_SALE
---------- ---------- ---------- ----------
2001 8 46.06 256.7
2001 9 92.67 256.7
29 rows selected.
两个边界都滑动的窗口
下面语句的窗口是往前两周,加往后两周,加当前周,一共五周。(到达边界时窗口会自动缩小)
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year
2 order by week
3 rows between 2 preceding and 2 following )
4 max_sale
5 from sales_fact
6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10
7 order by product , country , year , week ;
YEAR WEEK SALE MAX_SALE
---------- ---------- ---------- ----------
1998 1 58.15 58.15
1998 2 29.39 58.15
1998 3 29.49 58.15
1998 4 29.49 58.78
1998 5 29.8 58.78
1998 6 58.78 58.78
1998 9 58.78 58.78
1999 1 53.52 94.6
1999 3 94.6 94.6
1999 4 40.5 94.6
1999 5 80.01 103.11
1999 6 40.5 103.11
1999 8 103.11 103.11
1999 9 53.34 103.11
2000 1 46.7 93.41
2000 3 93.41 93.41
2000 4 46.54 93.41
2000 5 46.7 93.41
2000 7 70.8 70.8
2000 8 46.54 70.8 这里只所以是70.8因为窗口缩小了。
2001 1 92.26 118.38
2001 2 118.38 256.7
2001 3 47.24 256.7
2001 4 256.7 256.7
2001 5 93.44 256.7
2001 6 22.44 256.7
2001 7 69.96 93.44
YEAR WEEK SALE MAX_SALE
---------- ---------- ---------- ----------
2001 8 46.06 92.67
2001 9 92.67 92.67
29 rows selected.
默认窗口是什么?
一看便知。
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year
2 order by week )
3 max_sale
4 from sales_fact
5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10
6 order by product , country , year , week ;
YEAR WEEK SALE MAX_SALE
---------- ---------- ---------- ----------
1998 1 58.15 58.15
1998 2 29.39 58.1