聚合窗口
- count、sum、max、min、avg
- sum:其他函数与sum的用法一致
sum(col1) over (partition by col2 order by col3)
partition by:按照某一列进行分区,这一列的 值相同的行会放到一起
order by:分区内部按照某一列进行排序
注意:这里的order by与SQL语句中的全局排序的order by没有关系的
全局排序【order by】和局部排序【sort by】:指的是底层MapReduce的排序
案例:根据以下数据实现统计:每个用户每个月累计的总金额
- 原始数据
username,month,price
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,7
A,2015-03,9
B,2015-03,11
B,2015-03,6
期望结果
用户id 月份 当月金额 累计金额
+-------------+----------+---------+-------------+--+
| A | 2015-01 | 33 | 33 |
| A | 2015-02 | 10 | 43 |
| A | 2015-03 | 16 | 59 |
| B | 2015-01 | 30 | 30 |
| B | 2015-02 | 15 | 45 |
| B | 2015-03 | 17 | 62 |
+-------------+----------+---------+-------------+--+
drop table asmp.test_flume
--创建表
create external table asmp.test_flume (
username string,
month string,
price string
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
LOCATION'/user/asmp/hive/asmp/test_flume';
--加载数据
load data inpath '/user/zzf/hive/zzf/test_flume/test.txt' into asmp.test_flume
--实现
select
a.username,
a.month,
a.mon_price,
sum(a.mon_price) over(partition by username