.开窗函数
1.聚合函数 : 多行数据 按照一定规则 进行聚合 为一行
sum avg max 。。。
理论上: 聚合后的行数 <= 聚合前的行数 [看维度选取 groupby 里面的字段]
2.需求:
既要显示 聚合前的数据 又要显示 聚合后的数据 ?
eg:
id name sal dt sal_all
1 zs 1000 2022-4 1000
2 ls 2000 2022-4 2000
3 ww 3000 2022-4 3000
1 zs 1000 2022-5 2000
2 ls 2000 2022-5 4000
3 ww 3000 2022-5 6000
窗口函数:
窗口 + 函数
窗口: 函数运行时 计算的数据集的范围
函数:运行时的函数
1.聚合函数
2.内置窗口函数
语法结构:
函数 over([partition by xxx,...] [order by xxx,...] )
over() :以谁进行开窗 【table】
partition by: 以谁进行分组 【group by column】
order by: 以谁进行排序 【column】
窗口函数:
数据:
服务器 每天的启动 次数
linux01,2022-04-15,1
linux01,2022-04-16,5
linux01,2022-04-17,7
linux01,2022-04-18,2
linux01,2022-04-19,3
linux01,2022-04-20,10
linux01,2022-04-21,4
统计累计问题:
需求:
每个服务器每天的累计 启动次数
name dt cnt
linux01,2022-04-15,1
linux01,2022-04-16,5
linux01,2022-04-17,7
linux01,2022-04-18,2
linux01,2022-04-19,3
linux01,2022-04-20,10
linux01,2022-04-21,4
name dt cnt cnt_all
linux01,2022-04-15,1 1
linux01,2022-04-16,5 6
linux01,2022-04-17,7 13
linux01,2022-04-18,2 15
linux01,2022-04-19,3 18
linux01,2022-04-20,10 28
linux01,2022-04-21,4 32
create table window01(
name varchar(50),
dt varchar(20),
cnt int
);
每个服务器每天的累计 启动次数
1.聚合函数
COUNT,SUM,MIN,MAX,AVG
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt ) as cnt_all
from window01;
2.内置窗口函数
窗口大小:
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt ) as sum_all,
– sum(cnt) over(partition by name order by dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as sum_all1
– sum(cnt) over(partition by name order by dt ROWS BETWEEN 3 PRECEDING AND CURRENT ROW ) as sum_all2
– sum(cnt) over(partition by name order by dt ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING ) as sum_all3
– sum(cnt) over(partition by name order by dt ROWS BETWEEN 3 PRECEDING AND UNBOUNDED FOLLOWING) as sum_all4
sum(cnt) over(partition by name order by dt ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as sum_all5
from window01;
2.内置窗口函数
1.取值 串行
2.排序
RANK
ROW_NUMBER
DENSE_RANK
CUME_DIST
PERCENT_RANK
NTILE
NTILE:
ntile
需求:
把数据按照某个字段进行排序 ,把数据分成几分
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt ) as sum_all,
ntile(2) over(partition by name order by dt ) as n2,
ntile(3) over(partition by name order by dt ) as n3
from window01 ;
把数据平均分配 指定 N个桶 ,如果不能平均分配 ,优先分配到 编号 小的里面
RANK
ROW_NUMBER
DENSE_RANK
RANK: 从1 开始 , 按照顺序 相同会重复 名次会留下 空的位置 生成组内的记录编号
ROW_NUMBER: 从1 开始 , 按照顺序 生成组内的记录编号
DENSE_RANK:从1 开始 , 按照顺序 生成组内的记录编号 相同会重复 名次不会会留下空的位置
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt ) as sum_all,
RANK() over(partition by name order by cnt desc ) as rk,
ROW_NUMBER() over(partition by name order by cnt desc) as rw,
DENSE_RANK() over(partition by name order by cnt desc ) as d_rk
from window01 ;
1. 串行
LEAD
LAG
1. 串行
LEAD:窗口内 向下 第n行的值
LAG:窗口内 向上 第n行的值
LEAD(column,n,default)
column => 列名
n =》 取 几行
default =》 取不到就给一个默认值
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt ) as sum_all,
LEAD(dt,1,"9999-99-99") over(partition by name order by dt ) as lead_alias,
LAG(dt,1,"9999-99-99") over(partition by name order by dt ) as lag_alias
from window01 ;
2.取值
FIRST_VALUE
LAST_VALUE