Implementing a Cumulative Report Query in Hive

日拱一卒的Alex

Categories: Hive, Big Data. Tags: JAVA, HIVE, Hadoop, Big Data, HQL

Article link: https://blog.csdn.net/u012808902/article/details/77874722

1. Requirements

Given the following table of visitor access counts, t_access:

Visitor	Month	Access count
A	2015-01	5
A	2015-01	15
B	2015-01	5
A	2015-01	8
B	2015-01	25
A	2015-01	5
A	2015-02	4
A	2015-02	6
B	2015-02	10
B	2015-02	5
……	……	……
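
To try the queries in this post, the sample rows above can be loaded into a small test table. The following is a minimal sketch: the column names username, month, and count are taken from the queries later in the post, while the column types and the use of INSERT ... VALUES (available in Hive 0.14 and later) are assumptions.

```sql
-- Assumed schema: column names come from the queries in this post,
-- the types are a guess.
create table if not exists t_access (
  username string,
  month    string,  -- 'yyyy-MM', so string order matches chronological order
  count    int
);

-- The ten sample rows shown in the table above.
insert into table t_access values
  ('A', '2015-01', 5),
  ('A', '2015-01', 15),
  ('B', '2015-01', 5),
  ('A', '2015-01', 8),
  ('B', '2015-01', 25),
  ('A', '2015-01', 5),
  ('A', '2015-02', 4),
  ('A', '2015-02', 6),
  ('B', '2015-02', 10),
  ('B', '2015-02', 5);
```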

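The goal is to output, for each visitor and each month, the total access count in that month together with the cumulative access count over all months up to and including that month.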

Output table

Visitor	Month	Monthly total	Cumulative total
A	2015-01	33	33
A	2015-02	10	43
…….	…….	…….	…….
B	2015-01	30	30
B	2015-02	15	45
…….	…….	…….	…….

2. Approach

1) Step 1: compute each user's total access count per month

select username,month,sum(count) as count from t_access group by username,month

+-----------+----------+---------+--+
| username  |  month   | count   |
+-----------+----------+---------+--+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+--+

2) Step 2: join the monthly-totals result with itself (inner join) on username

select * from
(select username,month,sum(count) as count from t_access group by username,month) A
join
(select username,month,sum(count) as count from t_access group by username,month) B
on A.username=B.username

+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username  | a.month  | a.count   | b.username  | b.month  | b.count   |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+--+

3) Step 3: group the joined result by a.username and a.month. The cumulative total for each group is the sum of b.count over the rows where b.month <= a.month. For example, for visitor A in 2015-02 the qualifying rows are b.month = 2015-01 (33) and b.month = 2015-02 (10), which gives 33 + 10 = 43.

3. HQL

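select A.username,
       A.month,
       max(A.count) as month_total,      -- A.count is identical on every row of a group, so max() simply picks it
       sum(B.count) as cumulative_total  -- monthly totals of all months up to and including A.month
from 
(select username,month,sum(count) as count from t_access group by username,month) A 
inner join 
(select username,month,sum(count) as count from t_access group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;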
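
As an alternative to the self-join, the same running total can likely be expressed with a window function, which Hive has supported since version 0.11. This is a sketch of a different technique, not the approach used in this post. The aliases month_total and cumulative_total and the subquery alias t are illustrative names, and the query assumes month is stored as a 'yyyy-MM' string so that ordering by it is chronological.

```sql
-- Alternative sketch: running total via a window function instead of a self-join.
select username,
       month,
       month_total,
       -- Sum the monthly totals from the visitor's first month up to the current row.
       sum(month_total) over (
         partition by username
         order by month
         rows between unbounded preceding and current row
       ) as cumulative_total
from (
  -- Same aggregation as step 1: one row per visitor per month.
  select username, month, sum(count) as month_total
  from t_access
  group by username, month
) t
order by username, month;
```

This form aggregates t_access once rather than twice, and it avoids the self-join, whose intermediate result grows with the square of the number of months per visitor.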