一、业务需求
已有表:
输出表:
为了减轻复杂度,已经去掉了天的信息只留下了年月。
1、原始数据:
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
2、表的创建和加载
create table t_access_times(username string,month string,salary int)
row format delimited fields terminated by ',';
load data local inpath '/root/t_access_times.dat' into table t_access_times;
- 1
- 2
- 3
- 4
内容显示如下:
0: jdbc:hive2://mini01:10000> select * from t_access_times;
+--------------------------+-----------------------+------------------------+--+
| t_access_times.username | t_access_times.month | t_access_times.salary |
+--------------------------+-----------------------+------------------------+--+
| A | 2015-01 | 5 |
| A | 2015-01 | 15 |
| B | 2015-01 | 5 |
| A | 2015-01 | 8 |
| B | 2015-01 | 25 |
| A | 2015-01 | 5 |
| A | 2015-02 | 4 |
| A | 2015-02 | 6 |
| B | 2015-02 | 10 |
| B | 2015-02 | 5 |
+--------------------------+-----------------------+------------------------+--+
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
3、单个用户的月总金额:
select username,month,sum(salary) as salary from t_access_times group by username,month;
+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
+-----------+----------+---------+--+
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
3、上面的表自己跟自己进行inner join
select A.*,B.* from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
4、再在上面的表的基础上进行累计求和
select A.username,A.month,max(A.salary) salary,sum(B.salary) accumulate
from
上面的中间表
where B.month<=A.month
group by A.username,A.month
order by A.username,A.month
5、把所有的步骤合在一起如下:
select A.username,A.month,max(A.salary) salary,sum(B.salary) accumulate
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month<=A.month
group by A.username,A.month
order by A.username,A.month
+-------------+----------+---------+-------------+--+
| a.username | a.month | salary | accumulate |
+-------------+----------+---------+-------------+--+
| A | 2015-01 | 33 | 33 |
| A | 2015-02 | 10 | 43 |
| B | 2015-01 | 30 | 30 |
| B | 2015-02 | 15 | 45 |
+-------------+----------+---------+-------------+--+