在OLAP系统中,由于数据量比较大,该类系统对查询功能的要求比较高!
ORACLE的分析函数正好能解决这类问题。
分析函数主要用over()来标识,在11g的官方文档上,可以看到下述函数都能用在分析函数中:
AVG *
CORR *
COUNT *
COVAR_POP *
COVAR_SAMP *
CUME_DIST
DENSE_RANK
FIRST
FIRST_VALUE *
LAG
LAST
LAST_VALUE *
LEAD
LISTAGG
MAX *
MIN *
NTH_VALUE*
NTILE
PERCENT_RANK
PERCENTILE_CONT
PERCENTILE_DISC
RANK
RATIO_TO_REPORT
REGR_ (Linear Regression) Functions *
ROW_NUMBER
STDDEV *
STDDEV_POP *
STDDEV_SAMP *
SUM *
VAR_POP *
VAR_SAMP *
VARIANCE *
下面通过实际例子来说明分析函数的应用!
1、实现累加的效果:sum(sum())
SQL> select * from t;
YEARS MONTHS PRODUCT_NA SALES
---------- ---------- ---------- ----------
2008 1 A 1000
2008 1 B 1500
2008 2 A 2000
2008 2 B 3000
2008 2 C 1000
2008 3 A 3000
已选择6行。
SQL> select years,months,sum(sales) sum1,
2 sum(sum(sales)) over(partition by years order by months) sum2 from t
3 group by years,months
4 /
YEARS MONTHS SUM1 SUM2
---------- ---------- ---------- ----------
2008 1 2500 2500
2008 2 6000 8500
2008 3 3000 11500
2、为记录排名:row_number()、rank()、dense_rank()
--在以下的查询结果中,如果用传统的rownum,那么显然取前五名时便会漏了第六名
SQL> select rownum,tt.* from
2 (select * from t order by sales desc)tt
3 /
ROWNUM YEARS MONTHS PRODUCT_NA SALES
---------- ---------- ---------- ---------- ----------
1 2008 2 B 3000
2 2008 3 A 3000
3 2008 2 A 2000
4 2008 1 B 1500
5 2008 1 A 1000
6 2008 2 C 1000
已选择6行。
--使用row_number()SQL> select t.years,t.months,t.product_name,t.sales,
2 row_number() over(order by t.sales desc) row_numer,
3 rank() over(order by t.sales desc) rank,
4 dense_rank() over(order by t.sales desc) dense_rank
5 from t
6 group by t.years,t.months,t.product_name,t.sales
7 /
YEARS MONTHS PRODUCT_NA SALES ROW_NUMER RANK DENSE_RANK
---------- ---------- ---------- ---------- ---------- ---------- ----------
2008 2 B 3000 1 1 1
2008 3 A 3000 2 1 1
2008 2 A 2000 3 3 2
2008 1 B 1500 4 4 3
2008 1 A 1000 5 5 4
2008 2 C 1000 6 5 4
已选择6行。
从上述SQL的结果来看,可以看出row_number()、rank()、dense_rank()三个排名函数的区别,
row_number()返回的结果是连续递增的,不管排序的值是否相同;
rank()函数,如果遇到相同的值,那么返回相同的排名,然后接下来将跳过相同的排位,继续排名,整个序列并不一定是连续的;
dense_rank()函数返回的排序结果是连续的序列,即使它们有相同的值。
在排序时,如果遇到空值null,那么在DESC的降序排列时,NULL将排在第一位,如果要让NULL排到最后,可以添加一个nulls last
比如:
select t.years,t.months,t.product_name,t.sales,
row_number() over(order by t.sales desc) row_numer,
rank() over(order by t.sales desc) rank,
dense_rank() over(order by t.sales desc nulls last) dense_rank
from t
group by t.years,t.months,t.product_name,t.sales
3、返回最多最少的信息:first、last
SQL> select min(months) keep (dense_rank first order by sum(sales) desc) most,
2 min(months) keep (dense_rank last order by sum(sales) desc) least
3 from t
4 group by months
5 /
MOST LEAST
---------- ----------
2 1
4、将查询出来的数据分等级:NTILE(N)
SQL> select t.years,t.months,t.product_name,
2 ntile(5) over(order by sum(sales) desc) tile
3 from t
4 group by t.years,t.months,t.product_name
5 /
YEARS MONTHS PRODUCT_NA TILE
---------- ---------- ---------- ----------
2008 2 B 1
2008 3 A 1
2008 2 A 2
2008 1 B 3
2008 2 C 4
2008 1 A 5
已选择6行。
一共返回6行,切成5份,那么平均每份1.2行,那么第一份2行,余下每份1行。
SQL> select t.years,t.months,t.product_name,
2 cume_dist() over(order by product_name) cume_dist
3 from t
4 group by t.years,t.months,t.product_name
5 /
YEARS MONTHS PRODUCT_NA CUME_DIST
---------- ---------- ---------- ----------
2008 1 A .5
2008 2 A .5
2008 3 A .5
2008 1 B .833333333
2008 2 B .833333333
2008 2 C 1
已选择6行。
cume_dist() 函数返回累计分配比例,上述的总行数N=6,A的行数为3行,那么返回3/6=0.5,B的行数张2行,返回0.5+2/6=0.83333333
SQL> select t.years,t.months,t.product_name,
2 percent_rank() over(order by product_name) cume_dist
3 from t
4 group by t.years,t.months,t.product_name
5 /
YEARS MONTHS PRODUCT_NA CUME_DIST
---------- ---------- ---------- ----------
2008 1 A 0
2008 2 A 0
2008 3 A 0
2008 1 B .6
2008 2 B .6
2008 2 C 1
已选择6行。
percent_rank()函数返回累计百分比,上述的总行数N=6,A的第一行为1,那么返回(1-0)/(6-1)=0,有重复值同样返回0,B的行数是4,那么返回(4-1)/(6-1)=0.6
5、窗口计算:window
5.1、计算累积值SQL> select months,product_name,sum(sales) sum1,
2 sum(sum(sales)) over(order by months rows between unbounded preceding and unbounded following) sum2,
3 sum(sum(sales)) over(order by months rows between unbounded preceding and current row) sum3
4 from t group by months,product_name
5 /
MONTHS PRODUCT_NA SUM1 SUM2 SUM3
---------- ---------- ---------- ---------- ----------
1 A 1000 11500 1000
1 B 1500 11500 2500
2 A 2000 11500 4500
2 B 3000 11500 7500
2 C 1000 11500 8500
3 A 3000 11500 11500
上述中巧妙地运用了rows between N preceding and N follwing 的语法!
5.2、计算上下月份之类的,用first_value、last_valueSQL> select months,sum(sales) curr,
2 first_value(sum(sales)) over(order by months rows between 1 preceding and 1 following) prev,
3 last_value(sum(sales)) over(order by months rows between 1 preceding and 1 following) last,
4 avg(sum(sales)) over(order by months rows between 1 preceding and 1 following) avg
5 from t
6 group by months
7 /
MONTHS CURR PREV LAST AVG
---------- ---------- ---------- ---------- ----------
1 2500 2500 6000 4250
2 6000 2500 3000 3833.33333
3 3000 6000 3000 4500
上述中的curr表示当前月份的累积值,prev表示上月的累积值,last表示下月的累积值。
5.3、求相邻记录,lag、lead
SQL> select months,sales,lag(sales) over(order by months) lag,
2 lead(sales) over(order by months) lead
3 from t
4 /
MONTHS SALES LAG LEAD
---------- ---------- ---------- ----------
1 1000 1500
1 1500 1000 2000
2 2000 1500 3000
2 3000 2000 1000
2 1000 3000 3000
3 3000 1000
已选择6行。
点评:分析函数的功能非常强大,平时要多练习,熟练应用各种不同的场景!做到信手拈来的境界!