题目1、
该题目的两个核心思路:
1、自连接
2、倒推法现有这么一批数据,现要求出:
每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数
三个字段的意思:
用户名,月份,访问次数
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,11
最后结果展示:
用户 月份 最大访问次数 总访问次数 当月访问次数
A 2015-01 33 33 33
A 2015-02 33 43 10
A 2015-03 38 81 38
B 2015-01 30 30 30
B 2015-02 30 45 15
B 2015-03 44 89 44
第一步
select name,mon,sum(num) from hive1
group by name,mon order by name,mon;
当月访问次数:
A 2015-01 33
A 2015-02 10
A 2015-03 38
B 2015-01 30
B 2015-02 15
B 2015-03 44
第二步:
select name,mon,sum(num)
from hive1 b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a
where b.mon <= a.mon and a.name=b.name
group by a.name,a.mon;
总访问次数:
name mon _c2
A 2015-01 33
A 2015-02 43
A 2015-03 81
B 2015-01 30
B 2015-02 45
B 2015-03 89
第三步:
select name,mon,max(b.s)
from
(select name,mon,sum(num) s from hive1 group by name,mon order by name,mon)b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a
where b.mon <= a.mon and a.name=b.name
group by a.name,a.mon;
最大访问次数:
name mon _c2
A 2015-01 33
A 2015-02 33
A 2015-03 38
B 2015-01 30
B 2015-02 30
B 2015-03 44
第四步:
select aa.name,aa.mon,cc.ss,aa.tt,bb.mm
from
(select name,mon,sum(num) tt
from hive1 b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a
where b.mon <= a.mon and a.name=b.name
group by a.name,a.mon)aa
join
(select name,mon,sum(num) mm from hive1 group by name,mon order by name,mon)bb
on aa.name=bb.name and aa.mon=bb.mon
join
(select name,mon,max(b.s) ss
from
(select name,mon,sum(num) s from hive1 gro