1.建表
create table sigin(
userid int,
sigindate string
)row format delimited
fields terminated by " ";
2.数据准备
1 2017-01-01
1 2017-01-02
2 2017-01-11
1 2017-01-12
2 2017-01-12
1 2017-01-03
1 2017-01-04
2 2017-01-01
2 2017-01-02
2 2017-01-03
1 2017-01-10
2 2017-01-10
1 2017-01-11
3.导入
load data local inpath '/usr/hivedata/user.txt' into table sigin;
4.处理数据
4-1.思路:
将每个用户的所有的日期值减去排序后的row_number值,如果是连续的的日期值就会得到一个相同的日期值,然后用这个日期值分组就可以得到连续多少天登录
4-2.代码实现
select userid,min(sigindate),max(sigindate),count(*) days,sigin_date from
(select userid,sigindate,date_sub(sigindate,row_number() over(partition by userid order by sigindate)) sigin_date from sigin) t1 group by userid,sigin_date;
4-3.结果
+---------+-------------+-------------+-------+-------------+--+
| userid | c1 | c2 | days | sigin_date |
+---------+-------------+-------------+-------+-------------+--+
| 1 | 2017-01-01 | 2017-01-04 | 4 | 2016-12-31 |
| 1 | 2017-01-10 | 2017-01-12 | 3 | 2017-01-05 |
| 2 | 2017-01-01 | 2017-01-03 | 3 | 2016-12-31 |
| 2 | 2017-01-10 | 2017-01-12 | 3 | 2017-01-06 |
+---------+-------------+-------------+-------+-------------+--+
5.总结
这种连续登录的需求的关键是找到数据之间的关系,得出规律就很好解决了