文章目录
核心思路
- 使用窗口函数
sum(xx) over(patition by yy order by zz )
- 当
sum
窗口函数没有order by
时,得到的是分组后的指定列值的总和,有order by
时,则是指定列值的前缀累加和
需求:我们有如下的用户访问数据
要求:使用SQL
统计出每个用户的累积访问次数,如下表所示:
造数据:
CREATE TABLE test1 (
userId string,
visitDate string,
visitCount INT )
ROW format delimited FIELDS TERMINATED BY "\t";
INSERT INTO TABLE test1
VALUES
( 'u01', '2017/1/21', 5 ),
( 'u02', '2017/1/23', 6 ),
( 'u03', '2017/1/22', 8 ),
( 'u04', '2017/1/20', 3 ),
( 'u01', '2017/1/23', 6 ),
( 'u01', '2017/2/21', 8 ),
( 'u02', '2017/1/23', 6 ),
( 'u01', '2017/2/22', 4 );
select * from test1;
第一步:首先需要统计每个用户每个月访问了多少次,故需要将年月日转为年月,而后分组聚合
select
userId,
date_format(regexp_replace(visitDate,'/','-'),'YYYY-MM') as visitMonth,
sum(visitCount) as cnt
from test1
group by userId,date_format(regexp_replace(visitDate,'/','-'),'YYYY-MM')
第二步:在第一步的基础上,利用sum
窗口函数有order by
的语句,统计累计到当前月的每个用户访问数,用到with as
语法
with t1 as (
select
userId,
date_format(regexp_replace(visitDate,'/','-'),'YYYY-MM') as visitMonth,
sum(visitCount) as cnt
from test1
group by userId,date_format(regexp_replace(visitDate,'/','-'),'YYYY-MM')
)
select
userId as `用户id`,
visitMonth as `月份`,
cnt as `小计`,
sum(cnt) over(partition by userId order by visitMonth) as `累积`
from t1;