hive上连续登录天数的查询

60 篇文章 2 订阅
1、数据测试表及测试数据
测试表表结构:
hive> desc data_room;
OK
roomid                  string                                      
pt_month                string                                      
pt_day                  string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
pt_month                string                                      
pt_day                  string                                      
Time taken: 0.158 seconds, Fetched: 9 row(s)

查询测试数据:
select * from data_room where pt_day between '2017-01-01' and '2017-01-16' and roomid=9999589;
9999589 2017-01 2017-01-01
9999589 2017-01 2017-01-02

9999589 2017-01 2017-01-04
9999589 2017-01 2017-01-05
9999589 2017-01 2017-01-06
9999589 2017-01 2017-01-07
9999589 2017-01 2017-01-08
9999589 2017-01 2017-01-09
9999589 2017-01 2017-01-10

9999589 2017-01 2017-01-12
9999589 2017-01 2017-01-13
9999589 2017-01 2017-01-14
9999589 2017-01 2017-01-15
9999589 2017-01 2017-01-16

2、实现连续登录天数据查询的Sql
实现的SQL:
with tab_room_live_continuity_preproc as(
select roomid,compare_day,continuity_days,sum(continuity_days)over(partition by roomid order by compare_day asc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) cumulative_continuity_days,row_number()over(partition by roomid order by compare_day asc) rn 
from (
select roomid,date_sub(pt_day,rn) compare_day,count(*) continuity_days
from (select roomid,to_date(pt_day) pt_day,row_number()over(partition by roomid order by to_date(pt_day) asc) rn
from data_room 
where roomid=9999589 and pt_day between '2017-01-01' and '2017-01-16') x
group by roomid,date_sub(pt_day,rn)) xx)
select roomid,continuity_days,continuity_frist_day,continuity_last_day from (
select roomid,compare_day,continuity_days,cumulative_continuity_days,date_add(compare_day,int(cumulative_continuity_days)-int(continuity_days)+1) continuity_frist_day,date_add(compare_day,int(cumulative_continuity_days)) continuity_last_day 
from tab_room_live_continuity_preproc) x;

查询结果:
9999589 2       2017-01-01      2017-01-02
9999589 7       2017-01-04      2017-01-10
9999589 5       2017-01-12      2017-01-16

3、说明与总结
此处用到了分析函数,如计算累积值、排名函数等,再构造SQL的过程中,也尝试使用了位移函数Lag/Lead和层次查询。总之,初看起来还是比较复杂的,一时半会不一定就能解决的了。
这些函数及SQL代码是在Hadoop/Hive平台上进行测试与验证的,相信在Oracle上实现应该也没有太大的问题;但相对于较低端开源免费的Mysql上,如实现起来可能会相当麻烦,目前Mysql对分析函数还不支持。


补:其实还可以更简单

select roomid,min(pt_day) continuity_frist_day,max(pt_day) continuity_last_day,count(*) continuity_days
from (select roomid,to_date(pt_day) pt_day,row_number()over(partition by roomid order by to_date(pt_day) asc) rn
from data_room 
where roomid=9999589 and pt_day between '2017-01-01' and '2017-01-16') x
group by roomid,date_sub(pt_day,rn);

查询结果:
9999589 2017-01-01      2017-01-02      2
9999589 2017-01-04      2017-01-10      7
9999589 2017-01-12      2017-01-16      5

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值