function(args) over( partition by xx order by yy desc/asc )
1. 排序函数
row_number () | 排序可以不重复,序号连续 |
rank () | 排序可以重复,序号不连续 |
dense_rank () | 排序可以重复,序号连续 |
案例1:如何写排序代码(不分组排序)
select
scores,
dense_rank() over(order by score desc) as 'rank'
from Scores;
案例2:如何写排序代码(分组排序)
select
department
,employee
,salary
from (
select
b.name as department
,a.name as employee
,salary
,rank() over(partition by departmentid order by salary) as salary_rank
from enployees a
join department b on a.departmentid=b.id )
where salary_rank=1; #查找每个部门中薪资最高的人的薪水
利用窗口函数可以取每个部门最高,也可以取前二高,前三高,也可以只取第一第三。
2. 聚合函数配合聚合函数实现滑动窗口
三个月内的滑动平均
function(args) over( 【 partition by xx】【order by yy asc\desc 】【rows\range frame_start 】 )
function为聚合函数:
count(col) -返回所有非空值的个数
count(*)
count(字段) -空值不计入
avg(col) -返回平均值
sum(col) -返回平均值
max(col) -返回最大的值
min(col) -返回最小的值
first(col) -返回第一个记录的值
last(col) -返回最后一个记录的值
rows模式 | 按物理行来进行划分 |
range模式 | 按数值逻辑进行划分 |
{RANGE|ROWS} frame_start
{RANGE|ROWS} BETWEEN frame_start AND frame_end
UNBOUNDED PRECEDING --从头开始
expression PRECEDING --当前行往前推expression行到当前行
CURRENT ROW --当前行
expression FOLLOWING --往下推两行
UNBOUNDED FOLLOWING --到结束行
默认:BETWEEN unbounded preceding AND CURRENT ROW
滚动求近三个月产品的平均GMV?
select product
, year_month
,gmv
, AVG(gmv) OVER(partition by product order by year_month rows BETWEEN unbounded preceding AND CURRENT ROW) as avg_gmv
from product;
# 滑动函数默认从开始行到当前行
select product
, year_month
, gmv
, AVG(gmv) OVER(partition by product order by year_month ) AS avg_gmv
from product
滚动求从上架开始到本月的平均GMV?
select product
, year_month
, gmv
, AVG(gmv) OVER(partition by product order by year_month asc ROWS 2 PRECEDING ) AS avg_gmv
from product
3、窗口函数:lag、lead函数求环比
function(args) OVER([partition by ] [order by ][asc\desc] [frame])
lag(expression,n) | 返回当前行的前n行 |
lead(expression,n) | 返回当前行的后n行 |
例:求每个产品的GMV的环比涨幅 =(当期gmv / 上一期gmv)-1 其中lag(gmv,1)取上一期的gmv
select product
, year_moth
, gmv
, lag(gmv,1) OVER(partition by department,product order by year_month) as lag_gmv
, cast(gmv as double)/lag(gmv,1) OVER(partition by department,product order by year_month) -1 as growth_rate
FROM product;
Q&A:如果日期不连续怎么办:可以通过join万年历表来解决
如果是求同比怎么办? 这里是以月为单位,只需要将lag(gmv,1)改为lag(gmv,12)即可
例:如何求销量top10%的商品信息
求销量前10:窗口函数、自连接
1、窗口函数解法用到的函数:
precent_rank() | (分组内当前行的RANK值-1) / (分组内总行数-1) |
cume_dist() | 小于等于当前值的行数 、 分组内总行数 |
row_number() 略 | 卡排名范围 |
function(args) OVER(
[partition by ]
[order by ][asc\desc]
[frame]
)
SELECT *
FROM (
SELECT product
,percent_rank() over(order by sales desc) as 'percent_rank'
,cume_dist() over(order by sales desc) as 'cume_dist'
FROM sku_sales
)
WHERE 'percent_rank'<0.1;
2、NTILE( n ) 函数
功能:将排序分区中的行 划分为特定数量的组。
对于每一行,NTILE() 函数返回一个桶号,表示行所属的组。
NTILE(n) over( [partition by yy] [order by xx] [frame] )
销量前百分之10的商品信息,可以先将商品分为10个组,然后从中拿出编号为1的组
SELECT *
FROM (
SELECT product,
NTILE(10) OVER(order by sales desc) as ntile_rank
FROM sku_sales
)
WHERE ntile_rank=1;