HiveSQL找出连续日期及连续的天数(方法二)

经常遇到统计连续行为日期及连续天数的问题,之前的解决方案相对复杂,本次给出最简单的解决方案。

1、测试数据:

use xxx;
drop table test_serialdate;
create table if not exists test_serialdate (
  user_id  string comment '用户ID',
  sign_date  string comment '签到日期' 
) stored as orcfile
;
 
insert into table test_serialdate 
select 'u01' as user_id, '2022-01-01' as sign_date union all 
select 'u01' as user_id, '2022-01-02' as sign_date union all 
select 'u01' as user_id, '2022-01-05' as sign_date union all 
select 'u01' as user_id, '2022-01-06' as sign_date union all 
select 'u01' as user_id, '2022-01-08' as sign_date union all 
select 'u01' as user_id, '2022-01-09' as sign_date union all 
select 'u01' as user_id, '2022-01-10' as sign_date union all 
select 'u01' as user_id, '2022-01-11' as sign_date union all 
select 'u01' as user_id, '2022-01-17' as sign_date union all 
select 'u01' as user_id, '2022-01-18' as sign_date 
;

2、思路:

解题思路和举例
行为日期排序date_sub(行为日期,序号)天数(分组count)
2022-01-0212022-01-013天
2022-01-0322022-01-01
2022-01-0432022-01-01
2022-01-0742022-01-031天
2022-01-1552022-01-105天
2022-01-1662022-01-10
2022-01-1772022-01-10
2022-01-1882022-01-10
2022-01-1992022-01-10
2022-01-29102022-01-191天

 

3、代码:

--统计每次连续打卡的次数 
select user_id,sign_group,
       min(sign_date) as sign_start_date,
       max(sign_date) as sign_end_date,
       count(1) as days 
from 
( --打卡日期-排序序号,若连续打卡则结果相同 
   select user_id,sign_date, 
          date_sub(sign_date,rk) as sign_group    --连续行为则日期相同,用来分组
   from 
   ( --打卡记录(先排重),升序排序 
      select user_id,sign_date,
             rank() over(partition by user_id order by sign_date asc) as rk 
      from test_serialdate 
   ) tc1 
) tc2 
group by user_id,sign_group 
;

 

 

 

  • 2
    点赞
  • 11
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值