开窗函数
1.聚合函数:group by
聚合后的行数《=聚合前的行数
sum max min count avg
需求:
既要显示聚合前的数据,又要显示聚合后的结果
id name sal
1 zs 3w
2 ls 2.5w
3 ww 2w
cnt
3
rank
id name sal rank
1 zs 3w 1
2 ls 2.5w 2
3 ww 2w 3
开窗函数:
窗口函数: 窗口 + 函数
窗口: 函数运行时计算的数据集的范围
函数:运行时的函数:
1.常用的聚合函数
2.窗口内置的函数
语法结构:
函数 over([partition by xxx,...] [order by xxx,...])
over: 以谁进行开窗 =》 table
partition by:以谁进行分组(类似group by ) =》 字段
order by:以谁进行排序 =》字段
主键:
约束 :确定数据的唯一性
1.自增主键
2.独一无二主键(xxx,...)
1.聚合函数 -》开窗
数据:
zuoshao,2022-03-10,1
zuoshao,2022-03-11,2
zuoshao,2022-03-12,7
zuoshao,2022-03-13,3
zuoshao,2022-03-14,2
zuoshao,2022-03-15,4
zuoshao,2022-03-16,4
create table zuoshao_window(
name varchar(255),
dt varchar(20),
cnt int
);
需求:
统计累计的问题,每个用户每天累计点外卖的次数
name | dt cnt,sum_cnt
zuoshao,2022-03-10,1,1
zuoshao,2022-03-11,2,3
zuoshao,2022-03-12,7,10
zuoshao,2022-03-13,3,13
zuoshao,2022-03-14,2,15
zuoshao,2022-03-15,4,19
zuoshao,2022-03-16,4,23
函数 over([partition by xxx,...] [order by xxx,...])
2.指定窗口大小
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as sum_cnt,
sum(cnt) over(partition by name order by dt rows between unbounded preceding and current row ) as sum_cnt1,
sum(cnt) over(partition by name ) as sum_cnt2,
sum(cnt) over(partition by name order by dt rows between 3 preceding and current row) as sum_cnt3,
sum(cnt) over(partition by name order by dt rows between 3 preceding and 1 following) as sum_cnt4,
sum(cnt) over(partition by name order by dt rows between current row and unbounded following) as sum_cnt5
from zuoshao_window;
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as sum_cnt,
sum(cnt) over(partition by name ) as sum_cnt2,
sum(cnt) over(order by dt ) as sum_cnt3,
sum(cnt) over( ) as sum_cnt4
from zuoshao_window
order by dt,name ;
3.开窗 -内置函数
RANK
ROW_NUMBER
DENSE_RANK
NTILE
1.NTILE
需求:
把数据按照姓名进行分组 时间排序 结果数据分成 3份数
select
name,
dt,
cnt,
NTILE(3) over(partition by name order by dt) as n1,
NTILE(2) over(partition by name order by dt) as n2
from zuoshao_window
order by name ;
NTILE(N):
把数据平均分配到N中,如果不能平均分配,优先分配到较小的编号中。
2.rank相关的
RANK
ROW_NUMBER
DENSE_RANK
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as sum_cnt,
rank() over(partition by name order by cnt) as rk,
row_number() over(partition by name order by cnt) as rn,
dense_rank() over(partition by name order by cnt) as d_rk
from zuoshao_window
order by name ;
RANK:
从1开始,按照顺序,生成组内记录的编号,排序相同会重复,在名次中留下空位
ROW_NUMBER:
从1开始,按照顺序,生成组内记录的编号,序号没有重复的
DENSE_RANK:
从1开始,按照顺序,生成组内记录的编号,排序相同会重复,在名次中不留下空位
3.蹿行问题
lag 向上取第几行
lead 向下取第几行
lag(input[, offset[, default]])
input =》 col
offset =》 取第几行
default=》 取不到时 给一个默认值
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as sum_cnt,
lag(dt,1,"9999-99-99") over(partition by name order by dt) as lag_v,
lead(dt,1,"9999-99-99") over(partition by name order by dt) as lead_v
from zuoshao_window
order by name ;
4.取值问题
FIRST_VALUE(col):取分组后 截止到当前行 第一个值
LAST_VALUE(col):取分组后 截止到当前行 最后一个值
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as sum_cnt,
first_value(dt) over(partition by name order by dt) as first_v,
last_value(dt) over(partition by name order by dt) as last_v
from zuoshao_window
order by name ;
开窗函数案例:
1.我们有如下的用户访问数据
userId visitDate visitCount
u01 2017/1/21 5
u02 2017/1/23 6
u03 2017/1/22 8
u04 2017/1/20 3
u01 2017/1/23 6
u01 2017/2/21 8
U02 2017/1/23 6
U01 2017/2/22 4
要求使用SQL统计出每个用户的累积访问次数,如下表所示:
用户id 月份 小计 累积
u01 2017-01 11 11
u01 2017-02 12 23
u02 2017-01 12 12
u03 2017-01 8 8
u04 2017-01 3 3
每个用户的累积访问次数=》
每个用户每个月累计访问次数
维度: 用户、月
指标:次数、累计访问次数
1.etl:
2017/2/22 =》 2017-02 日期函数 ,string函数 sql里面
2017/2/22=>2017-2-22
2.
1.先求 每个月 次数
2. 1结果 =》 累计
select
-- 每个用户每个月 累计次数
userid,
month,
cnt,
sum(cnt) over(partition by userid order by month) as sum_cnt
from
(
select -- 每个用户每个月 次数
userid,
date_format(replace(visitdate,'/','-'), '%Y-%m') as month ,
sum(visitcount) as cnt
from test1
group by
userid,month
) a;
select
-- 每个用户每个月 累计次数
userid,
month,
cnt,
sum(cnt) over(partition by userid order by month) as sum_cnt
from
(
select -- 每个用户每个月 次数
userid,
substr(replace(visitdate,'/&