基本需求:
根据访客的每日访问信息,进行累计访问,计算每个月访问量,总访问量。
输入数据:
有如下访客访问次数统计表 t_access_times。为了减轻计算复杂度,去掉了天的信息只留下了年月。
+--------------------------+-----------------------+-----------------------------+--+
| t_access_times.username | t_access_times.month | t_access_times.access_time |
+--------------------------+-----------------------+-----------------------------+--+
| A | 2088-01 | 8 |
| A | 2088-01 | 88 |
| B | 2088-01 | 8 |
| A | 2088-01 | 8 |
| B | 2088-01 | 28 |
| A | 2088-01 | 8 |
| A | 2088-02 | 4 |
| A | 2088-02 | 6 |
| B | 2088-02 | 80 |
| B | 2088-02 | 8 |
+--------------------------+-----------------------+-----------------------------+--+
输出数据:
需要输出报表:t_access_times_accumulate
+-------------+----------+------+-------------+--+
| a.username | a.month | _c2 | accumulate |
+-------------+----------+------+-------------+--+
| A | 2088-01 | 112 | 112 |
| A | 2088-02 | 10 | 122 |
| B | 2088-01 | 36 | 36 |
| B | 2088-02 | 88 | 124 |
+-------------+----------+------+-------------+--+
实现步骤:
- 创建表,加载数据
数据源:
A,2088-01,8
A,2088-01,88
B,2088-01,8
A,2088-01,8
B,2088-01,28
A,2088-01,8
A,2088-02,4
A,2088-02,6
B,2088-02,80
B,2088-02,8
创建表:
create table t_access_times(username string,month string,access_time int)
row format delimited fields terminated by ',';
加载数据:
load data local inpath '/home/chunsoft/t_access_times.dat' into table t_access_times;
- 计算单个用户的月访问次数
select username,month,sum(access_time ) as nums from t_access_times group by username,month;
+-----------+----------+-------+--+
| username | month | nums |
+-----------+----------+-------+--+
| A | 2088-01 | 112 |
| A | 2088-02 | 10 |
| B | 2088-01 | 36 |
| B | 2088-02 | 88 |
+-----------+----------+-------+--+
- 上面的表自己跟自己进行inner join 方便求总访问量
select a.*,b.*from
(select username,month,sum(access_time) as nums from t_access_times group by username,month) a inner join
(select username,month,sum(access_time) as nums from t_access_times group by username,month) b
on a.username = b.username;
+-------------+----------+---------+-------------+----------+---------+--+
| a.username | a.month | a.nums | b.username | b.month | b.nums |
+-------------+----------+---------+-------------+----------+---------+--+
| A | 2088-01 | 112 | A | 2088-01 | 112 |
| A | 2088-01 | 112 | A | 2088-02 | 10 |
| A | 2088-02 | 10 | A | 2088-01 | 112 |
| A | 2088-02 | 10 | A | 2088-02 | 10 |
| B | 2088-01 | 36 | B | 2088-01 | 36 |
| B | 2088-01 | 36 | B | 2088-02 | 88 |
| B | 2088-02 | 88 | B | 2088-01 | 36 |
| B | 2088-02 | 88 | B | 2088-02 | 88 |
+-------------+----------+---------+-------------+----------+---------+--+
- 根据当前月份的大小来判断累计求和并排序
select a.username,a.month,max(a.nums) ,sum(b.nums) as accumulate from
(select username,month,sum(access_time) as nums from t_access_times group by username,month) a inner join
(select username,month,sum(access_time) as nums from t_access_times group by username,month) b
on a.username = b.username
where a.month>=b.month
group by a.username,a.month
order by a.username,a.month;
+-------------+----------+------+-------------+--+
| a.username | a.month | _c2 | accumulate |
+-------------+----------+------+-------------+--+
| A | 2088-01 | 112 | 112 |
| A | 2088-02 | 10 | 122 |
| B | 2088-01 | 36 | 36 |
| B | 2088-02 | 88 | 124 |
+-------------+----------+------+-------------+--+