连续问题
以下数据为用户的消费金额,字段分别为用户id,日期,金额,统计连续3天金额大于100的用户;
u0001,2022-05-01,33
u0001,2022-05-01,88
u0001,2022-05-03,121
u0001,2022-05-04,88
u0001,2022-05-04,211
u0001,2022-05-05,201
u0001,2022-05-06,111
u0001,2022-05-07,21
u0001,2022-05-08,111
u0002,2022-05-01,212
u0002,2022-05-03,121
u0002,2022-05-05,201
u0002,2022-05-06,111
u0002,2022-05-07,21
u0002,2022-05-08,111
1.按照用户与日期聚合,计算每一天的金额 ,作为子表t1
select
id,dt,sum(amount) as amount
from tmp.table_test1
group by id,dt
having sum(amount) >100;
2. 运用等差序列的思路,两个等差数列如果等差相同,则相同位置的数据相减等到的结果相同
2.1 按照用户分组,同时按照时间排序,对数据进行编码排号row_number ,作为子表t2
select
id,dt,amount,
row_number() over(partition by id order by dt ) as rnum
from t1;
2.2 每行数据减去排序值,作为t3
select
id,
date_sub(dt,rnum) as flag, --连续的天数为相同的日期
amount
from t2
3.按照用户及Flag相同日期分组,求每个组有多少条数据,并找出大于等于3条的数据
select
id,flag,count(*) as num
from t3
group by id,flag
having count(*) >= 3
完整hql
select
id,flag,count(*) as num
from (
select
id,
date_sub(dt,rnum) as flag, --连续的天数为相同的日期
amount
from (
select
id,dt,amount,
row_number() over(partition by id order by dt ) as rnum
from (
select
id,dt,sum(amount) as amount
from tmp.table_test1
group by id,dt
having sum(amount) >100
)t1
)t2
)t3
group by id,flag
having count(*) >= 3;