1、数据测试表及测试数据
测试表表结构:
查询测试数据:
2、实现连续登录天数据查询的Sql
实现的SQL:
查询结果:
3、说明与总结
此处用到了分析函数,如计算累积值、排名函数等,再构造SQL的过程中,也尝试使用了位移函数Lag/Lead和层次查询。总之,初看起来还是比较复杂的,一时半会不一定就能解决的了。
这些函数及SQL代码是在Hadoop/Hive平台上进行测试与验证的,相信在Oracle上实现应该也没有太大的问题;但相对于较低端开源免费的Mysql上,如实现起来可能会相当麻烦,目前Mysql对分析函数还不支持。
查询结果:
测试表表结构:
hive> desc data_room;
OK
roomid string
pt_month string
pt_day string
# Partition Information
# col_name data_type comment
pt_month string
pt_day string
Time taken: 0.158 seconds, Fetched: 9 row(s)
查询测试数据:
select * from data_room where pt_day between '2017-01-01' and '2017-01-16' and roomid=9999589;
9999589 2017-01 2017-01-01
9999589 2017-01 2017-01-02
9999589 2017-01 2017-01-04
9999589 2017-01 2017-01-05
9999589 2017-01 2017-01-06
9999589 2017-01 2017-01-07
9999589 2017-01 2017-01-08
9999589 2017-01 2017-01-09
9999589 2017-01 2017-01-10
9999589 2017-01 2017-01-12
9999589 2017-01 2017-01-13
9999589 2017-01 2017-01-14
9999589 2017-01 2017-01-15
9999589 2017-01 2017-01-16
2、实现连续登录天数据查询的Sql
实现的SQL:
with tab_room_live_continuity_preproc as(
select roomid,compare_day,continuity_days,sum(continuity_days)over(partition by roomid order by compare_day asc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) cumulative_continuity_days,row_number()over(partition by roomid order by compare_day asc) rn
from (
select roomid,date_sub(pt_day,rn) compare_day,count(*) continuity_days
from (select roomid,to_date(pt_day) pt_day,row_number()over(partition by roomid order by to_date(pt_day) asc) rn
from data_room
where roomid=9999589 and pt_day between '2017-01-01' and '2017-01-16') x
group by roomid,date_sub(pt_day,rn)) xx)
select roomid,continuity_days,continuity_frist_day,continuity_last_day from (
select roomid,compare_day,continuity_days,cumulative_continuity_days,date_add(compare_day,int(cumulative_continuity_days)-int(continuity_days)+1) continuity_frist_day,date_add(compare_day,int(cumulative_continuity_days)) continuity_last_day
from tab_room_live_continuity_preproc) x;
查询结果:
9999589 2 2017-01-01 2017-01-02
9999589 7 2017-01-04 2017-01-10
9999589 5 2017-01-12 2017-01-16
3、说明与总结
此处用到了分析函数,如计算累积值、排名函数等,再构造SQL的过程中,也尝试使用了位移函数Lag/Lead和层次查询。总之,初看起来还是比较复杂的,一时半会不一定就能解决的了。
这些函数及SQL代码是在Hadoop/Hive平台上进行测试与验证的,相信在Oracle上实现应该也没有太大的问题;但相对于较低端开源免费的Mysql上,如实现起来可能会相当麻烦,目前Mysql对分析函数还不支持。
补:其实还可以更简单
select roomid,min(pt_day) continuity_frist_day,max(pt_day) continuity_last_day,count(*) continuity_days
from (select roomid,to_date(pt_day) pt_day,row_number()over(partition by roomid order by to_date(pt_day) asc) rn
from data_room
where roomid=9999589 and pt_day between '2017-01-01' and '2017-01-16') x
group by roomid,date_sub(pt_day,rn);
查询结果:
9999589 2017-01-01 2017-01-02 2
9999589 2017-01-04 2017-01-10 7
9999589 2017-01-12 2017-01-16 5