数据
有如下数据:
用户,日期,月销售额
A,2015-01-08,5
A,2015-01-11,15
B,2015-01-12,5
A,2015-01-12,8
B,2015-01-13,25
A,2015-01-13,5
C,2015-01-09,10
C,2015-01-11,20
A,2015-02-10,4
A,2015-02-11,6
C,2015-01-12,30
C,2015-02-13,10…
需求
使用hql实现如下累积报表:
用户 | 月份 | 月总额 | 累计到当月的总额 |
A | 2015-01 | 33 | 33 |
A | 2015-02 | 10 | 43 |
A | 2015-03 | 30 | 73 |
B | 2015-01 | 30 | 30 |
B | 2015-02 | 15 | 45 |
实现:普通方法
1.建表、导入数据
create table t_sale(username string,day string,msale int)
row format delimited fields terminated by ",";
load data local inpath '/root/sale.txt' into table t_sale;
2.按月份将数据分组聚合
select username,substr(day,1,7) as month_sale,sum(msale)
from t_sale
group by username,substr(day,1,7);
结果:
+———–+————-+——+–+
| username | month_sale | _c2 |
+———–+————-+——+–+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| A | 2015-03 | 20 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
| B | 2015-03 | 45 |
| C | 2015-01 | 60 |
| C | 2015-02 | 10 |
| C | 2015-03 | 30 |
+———–+————-+——+–+
3.将聚合后生成的临时表自关联
select t1.*,t2.* from
(select username,substr(day,1,7) as month_sale,sum(msale) as cnt
from t_sale
group by username,substr(day,1,7)) t1
join
(select username,substr(day,1,7) as month_sale,sum(msale) as cnt
from t_sale
group by username,substr(day,1,7)) t2
on t1.username=t2.username;
结果:
+————–+—————-+———+————–+—————-+———+–+
| t1.username | t1.month_sale | t1.cnt | t2.username | t2.month_sale | t2.cnt |
+————–+—————-+———+————–+—————-+———+–+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-01 | 33 | A | 2015-03 | 20 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-03 | 20 |
| A | 2015-03 | 20 | A | 2015-01 | 33 |
| A | 2015-03 | 20 | A | 2015-02 | 10 |
| A | 2015-03 | 20 | A | 2015-03 | 20 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-01 | 30 | B | 2015-03 | 45 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-03 | 45 |
| B | 2015-03 | 45 | B | 2015-01 | 30 |
| B | 2015-03 | 45 | B | 2015-02 | 15 |
| B | 2015-03 | 45 | B | 2015-03 | 45 |
| C | 2015-01 | 60 | C | 2015-01 | 60 |
| C | 2015-01 | 60 | C | 2015-02 | 10 |
| C | 2015-01 | 60 | C | 2015-03 | 30 |
| C | 2015-02 | 10 | C | 2015-01 | 60 |
| C | 2015-02 | 10 | C | 2015-02 | 10 |
| C | 2015-02 | 10 | C | 2015-03 | 30 |
| C | 2015-03 | 30 | C | 2015-01 | 60 |
| C | 2015-03 | 30 | C | 2015-02 | 10 |
| C | 2015-03 | 30 | C | 2015-03 | 30 |
+————–+—————-+———+————–+—————-+———+–+
4.对自关联后的数据进行筛选,选出右表月份<=左表月份的数据,以便进行累加
select t1.*,t2.* from
(select username,substr(day,1,7) as month_sale,sum(msale) as cnt
from t_sale
group by username,substr(day,1,7)) t1
join
(select username,substr(day,1,7) as month_sale,sum(msale) as cnt
from t_sale
group by username,substr(day,1,7)) t2
on t1.username=t2.username
where t1.month_sale>=t2.month_sale;
结果:
+————–+—————-+———+————–+—————-+———+–+
| t1.username | t1.month_sale | t1.cnt | t2.username | t2.month_sale | t2.cnt |
+————–+—————-+———+————–+—————-+———+–+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| A | 2015-03 | 20 | A | 2015-01 | 33 |
| A | 2015-03 | 20 | A | 2015-02 | 10 |
| A | 2015-03 | 20 | A | 2015-03 | 20 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
| B | 2015-03 | 45 | B | 2015-01 | 30 |
| B | 2015-03 | 45 | B | 2015-02 | 15 |
| B | 2015-03 | 45 | B | 2015-03 | 45 |
| C | 2015-01 | 60 | C | 2015-01 | 60 |
| C | 2015-02 | 10 | C | 2015-01 | 60 |
| C | 2015-02 | 10 | C | 2015-02 | 10 |
| C | 2015-03 | 30 | C | 2015-01 | 60 |
| C | 2015-03 | 30 | C | 2015-02 | 10 |
| C | 2015-03 | 30 | C | 2015-03 | 30
5.将数据进行聚合统计,累加当月销售额
本案例完整hql语句
select t1.username,t1.month_sale,max(t1.cnt),sum(t2.cnt) from
(select username,substr(day,1,7) as month_sale,sum(msale) as cnt
from t_sale
group by username,substr(day,1,7)) t1
join
(select username,substr(day,1,7) as month_sale,sum(msale) as cnt
from t_sale
group by username,substr(day,1,7)) t2
on t1.username=t2.username
where t1.month_sale>=t2.month_sale
group by t1.username,t1.month_sale
;
结果:
+————–+—————-+——+——+–+
| t1.username | t1.month_sale | _c2 | _c3 |
+————–+—————-+——+——+–+
| A | 2015-01 | 33 | 33 |
| A | 2015-02 | 10 | 43 |
| A | 2015-03 | 20 | 63 |
| B | 2015-01 | 30 | 30 |
| B | 2015-02 | 15 | 45 |
| B | 2015-03 | 45 | 90 |
| C | 2015-01 | 60 | 60 |
| C | 2015-02 | 10 | 70 |
| C | 2015-03 | 30 | 100 |
+————–+—————-+——+——+–+
**:当字段对应数据相同时,实现去重可以使用聚合函数max()、min().
实现:窗口函数实现累加
select username,month,m_sale,
sum(m_sale) over (partition by username order by month rows between unbounded preceding and current row)
from
(select username,substr(day,1,7) as month ,sum(msale) as m_sale
from t_sale
group by username,substr(day,1,7))tmp;