数据准备
创建一个名为data的表,表内数据及存储位置可自定义
create external table data(
cookieid string,
create_time string,
num int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
查看表内数据:
hive> select * from data;
OK
cookie1 2015-04-10 1
cookie1 2015-04-11 5
cookie1 2015-04-12 7
cookie1 2015-04-13 3
cookie1 2015-04-14 2
cookie1 2015-04-15 4
cookie1 2015-04-16 4
1.SUM
sum函数实现分组内和的统计和连续累计的统计,可以做任意行相加
SUM(pv) OVER(partition by cookieid order by createtime) AS pv1,
对pv进行逐个相加,并以日期正序进行排序,其输出结果如下所示,pv1为pv中当前数据之前所有数据之和
cookieid createtime pv pv1
-------------------------------------
cookie1 2015-04-10 1 1
cookie1 2015-04-11 5 6
cookie1 2015-04-12 7 13
cookie1 2015-04-13 3 16
cookie1 2015-04-14 2 18
cookie1 2015-04-15 4 22
cookie1 2015-04-16 4 26
如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
ROWS BETWEEN语句也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点
SUM(pv) OVER(partition BY cookieid order by createtime rows between 3 preceding and current row) AS pv2,
pv2数据为该数据加前三行的数据相加
cookieid createtime pv pv2
-------------------------------------
cookie1 2015-04-10 1 1
cookie1 2015-04-11 5 6
cookie1 2015-04-12 7 13
cookie1 2015-04-13 3 16
cookie1 2015-04-14 2 17
cookie1 2015-04-15 4 16
cookie1 2015-04-16 4 13
其他窗口函数(AVG(MIN) (MAX)用法与sum一致