hive 计算连续7天登录的用户

整体实现思路:
1.用户每天可能不止登陆一次,将登录日期去重,取出当日登陆成功的日期,row_number()函数分组排序并计数
2.日期减去计数得到值
3.根据每个用户count(值)判断连续登陆天数
4.最后取连续登陆天数大于等于7天的用户

示例:
CREATE TABLE db_test.user_log_test(
datestr string comment ‘日期’,
uid string comment ‘用户id’,
status int comment ‘登陆状态 1:成功 0:失败’)
comment ‘用户登陆日志表’
stored as orc;
insert into db_test.user_log_test values(‘2020-08-30’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-30’,‘1’,0);
insert into db_test.user_log_test values(‘2020-08-29’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-28’,‘1’,0);
insert into db_test.user_log_test values(‘2020-08-27’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-26’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-25’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-24’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-23’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-22’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-21’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-20’,‘1’,1);
insert into db_test.user_log_test values(‘2020-08-25’,‘2’,1);
insert into db_test.user_log_test values(‘2020-08-24’,‘2’,1);
insert into db_test.user_log_test values(‘2020-08-23’,‘2’,0);
insert into db_test.user_log_test values(‘2020-08-22’,‘2’,1);
insert into db_test.user_log_test values(‘2020-08-21’,‘2’,1);
insert into db_test.user_log_test values(‘2020-08-20’,‘2’,1);
insert into db_test.user_log_test values(‘2020-08-26’,‘3’,1);
insert into db_test.user_log_test values(‘2020-08-25’,‘3’,1);
insert into db_test.user_log_test values(‘2020-08-24’,‘3’,1);
insert into db_test.user_log_test values(‘2020-08-23’,‘3’,1);
insert into db_test.user_log_test values(‘2020-08-22’,‘3’,1);
insert into db_test.user_log_test values(‘2020-08-21’,‘3’,1);
insert into db_test.user_log_test values(‘2020-08-20’,‘3’,1);

操作步骤:
1.计算出用户登陆成功的日期
select
uid
,datestr
from (
select
uid
,datestr
,row_number()over(partition by uid,datestr order by datestr asc) as rn
–,row_number()over(distribute by uid,datestr sort by datestr asc) as rn
from db_test.user_log_test
where status=1
) a
where rn=1
;
2.日期减去计数得到值
select
uid,date_sub(datestr,num) date_rn
from (
select
uid
,datestr
,row_number() over(partition by uid order by datestr) as num
from (
select
uid
,datestr
,row_number()over(partition by uid,datestr order by datestr asc) as rn
–,row_number()over(distribute by uid,datestr sort by datestr asc) as rn
from db_test.user_log_test
where status=1
) a
where rn=1
) b
3.根据每个用户count(值)判断连续登陆天数
select uid,count(*) cnt–连续登陆天数
from (
select
uid,date_sub(datestr,num) date_rn
from (
select
uid
,datestr
,row_number() over(partition by uid order by datestr) as num
from (
select
uid
,datestr
,row_number()over(partition by uid,datestr order by datestr asc) as rn
–,row_number()over(distribute by uid,datestr sort by datestr asc) as rn
from db_test.user_log_test
where status=1
) a
where rn=1
) b)c
group by uid,date_rn
4.最后取连续登陆天数大于等于7天的用户
select uid
from (
select
uid,date_sub(datestr,num) date_rn
from (
select
uid
,datestr
,row_number() over(partition by uid order by datestr) as num
from (
select
uid
,datestr
,row_number()over(partition by uid,datestr order by datestr asc) as rn
–,row_number()over(distribute by uid,datestr sort by datestr asc) as rn
from db_test.user_log_test
where status=1
) a
where rn=1
) b)c
group by uid,date_rn
having count(1)>=7

5.结果(只有用户3连续登陆超过7天)
在这里插入图片描述

或者用窗口分析函数更快查询出来
–所有用户信息
select
*
from
(
select uid ,datestr, lead(datestr,6,-1) over(partition by uid order by datestr desc ) as date1
from db_test.user_log_test a
group by uid,datestr
) as b
where
date_sub(cast(b.datestr as date),6)=cast(b.date1 as date);

--用户数
select  
count(distinct uid)

from
(
select uid ,datestr, lead(datestr,6,-1) over(partition by uid order by datestr desc ) as date1
from db_test.user_log_test a
group by uid,datestr
) as b
where
date_sub(cast(b.datestr as date),6)=cast(b.date1 as date);

统计连续登陆7天的用户个数(n天就只需要把lead(date,6,-1)中的6改成n-1并且把date_sub(cast(b.date as date),6)中的6改成n-1)

  • 0
    点赞
  • 11
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值