Hive中一个较难的面试sql-级联求和

现有有用户每月销售额的表

用户 月份(日期给省了) 销售金额
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5

现在需要输出报表,各用户每月销售金额与总金额(比如2月份总金额应该是1月和2月销售额之和)

+-------------+----------+-------+---------+
| username    | month    | sal   | totalc  |
+-------------+----------+-------+---------+
| A           | 2015-01  | 33.0  | 33.0    |
| A           | 2015-02  | 10.0  | 43.0    |
| B           | 2015-01  | 30.0  | 30.0    |
| B           | 2015-02  | 15.0  | 45.0    |
+-------------+----------+-------+---------+

思路
可以先求出各个用户每个月销售额

+-----------+----------+---------+
| username  |  month   | salary  |
+-----------+----------+---------+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+

然后将这张表自己跟自己内连接,条件为A.username=B.username

+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+--+

左边为A表字段,右边为B表字段。接着根据A表的username和month进行分组,而对B表的salary进行求和,比如A用户的1月份为一组的话,那么求和会是B表的A用户的1月份和2月份销售额之和,显然不行,实际需要的是小于等于1月份的销售额之和,所以需要加判断条件B.month<=A.month。

hive上进行操作
准备工作
将数据写入文件
创建表time_money
将文件数据导入表

[root@mini1 ~]# vi time_money.data;
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
0: jdbc:hive2://localhost:10000> create table time_money(username string,month string, sal string)
0: jdbc:hive2://localhost:10000> row format delimited fields terminated by ',';
No rows affected (0.095 seconds)
0: jdbc:hive2://localhost:10000> load data local inpath '/root/time_money.data' into table time_money;
0: jdbc:hive2://localhost:10000> select * from time_money;
+----------------------+-------------------+-----------------+--+
| time_money.username  | time_money.month  | time_money.sal  |
+----------------------+-------------------+-----------------+--+
| A                    | 2015-01           | 5               |
| A                    | 2015-01           | 15              |
| B                    | 2015-01           | 5               |
| A                    | 2015-01           | 8               |
| B                    | 2015-01           | 25              |
| A                    | 2015-01           | 5               |
| A                    | 2015-02           | 4               |
| A                    | 2015-02           | 6               |
| B                    | 2015-02           | 10              |
| B                    | 2015-02           | 5               |
+----------------------+-------------------+-----------------+--+

第一步:
求每个用户的各个月份销售额

0: jdbc:hive2://localhost:10000> select username,month,sum(sal) sal from time_money group by username,month;
+-----------+----------+-------+--+
| username  |  month   |  sal  |
+-----------+----------+-------+--+
| A         | 2015-01  | 33.0  |
| A         | 2015-02  | 10.0  |
| B         | 2015-01  | 30.0  |
| B         | 2015-02  | 15.0  |
+-----------+----------+-------+--+

第二步:
将上面的表自己跟自己内连接,条件为A.username=B.usernmae

0: jdbc:hive2://localhost:10000> select * 
0: jdbc:hive2://localhost:10000> from
0: jdbc:hive2://localhost:10000>  (select username,month,sum(sal)as sal from time_money group by username,month) A 
0: jdbc:hive2://localhost:10000> inner join
0: jdbc:hive2://localhost:10000> (select username,month,sum(sal)as sal from time_money group by username,month) B
0: jdbc:hive2://localhost:10000> on
0: jdbc:hive2://localhost:10000> A.username=B.username;
+-------------+----------+--------+-------------+----------+--------+--+
| a.username  | a.month  | a.sal  | b.username  | b.month  | b.sal  |
+-------------+----------+--------+-------------+----------+--------+--+
| A           | 2015-01  | 33.0   | A           | 2015-01  | 33.0   |
| A           | 2015-01  | 33.0   | A           | 2015-02  | 10.0   |
| A           | 2015-02  | 10.0   | A           | 2015-01  | 33.0   |
| A           | 2015-02  | 10.0   | A           | 2015-02  | 10.0   |
| B           | 2015-01  | 30.0   | B           | 2015-01  | 30.0   |
| B           | 2015-01  | 30.0   | B           | 2015-02  | 15.0   |
| B           | 2015-02  | 15.0   | B           | 2015-01  | 30.0   |
| B           | 2015-02  | 15.0   | B           | 2015-02  | 15.0   |
+-------------+----------+--------+-------------+----------+--------+--+

第三步:
从上一步的结果中
进行分组查询,分组的字段是a.username a.month
求月累计值: 将b.month <= a.month的所有b.salary求和即可

0: jdbc:hive2://localhost:10000> select A.username, A.month,max(A.sal) sal,sum(B.sal) totalc
0: jdbc:hive2://localhost:10000> from
0: jdbc:hive2://localhost:10000> (select username,month,sum(sal)as sal from time_money group by username,month) A 
0: jdbc:hive2://localhost:10000> inner join
0: jdbc:hive2://localhost:10000>  (select username,month,sum(sal)as sal from time_money group by username,month) B
0: jdbc:hive2://localhost:10000> on
0: jdbc:hive2://localhost:10000> A.username = B.username
0: jdbc:hive2://localhost:10000> where B.month <= A.month
0: jdbc:hive2://localhost:10000> group by A.username,A.month
0: jdbc:hive2://localhost:10000> order by A.username,A.month;
+-------------+----------+-------+---------+--+
| a.username  | a.month  |  sal  | totalc  |
+-------------+----------+-------+---------+--+
| A           | 2015-01  | 33.0  | 33.0    |
| A           | 2015-02  | 10.0  | 43.0    |
| B           | 2015-01  | 30.0  | 30.0    |
| B           | 2015-02  | 15.0  | 45.0    |
+-------------+----------+-------+---------+--+

注:建表的时候应该讲sal设置为int类型而不是string,那样就不会出现小数。

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值