窗口函数:SUM、AVG、COUNT、MAX、MIN
-》功能:用于实现数据分区后的聚合
-》语法:fun_name(col1) over (partition by col2 order by col3)
实现功能 over 按照什么分区,分区内部按照什么排序
-》示例:实现分区内的累加,其他的原理类似
-》区别:
聚合函数:分组
group by :一组返回一条
开窗聚合:分区
partition by:将相同的数据放到一起
–创建文件:vim /export/datas/window.txt
cookie1,2018-04-10,1
cookie1,2018-04-11,5
cookie1,2018-04-12,7
cookie1,2018-04-13,3
cookie2,2018-04-13,3
cookie2,2018-04-14,2
cookie2,2018-04-15,4
cookie1,2018-04-14,2
cookie1,2018-04-15,4
cookie1,2018-04-16,4
cookie2,2018-04-10,1
cookie2,2018-04-11,5
cookie2,2018-04-12,7
–创建表
create database db_function;
use db_function;
create table itcast_f1(
cookieid string,
daystr string,
pv int
) row format delimited fields terminated by ‘,’;
–加载数据
load data local inpath ‘/export/datas/window.txt’ into table itcast_f1;
set hive.exec.mode.local.auto=true;
–实现分区内起始到当前行的pv累加,默认窗口:取第一行开始到当前行的和
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr) as pv1
from
itcast_f1;
–实现分区内所有pv的累加,不指定排序默认窗口:从第一行到最后一行
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid ) as pv2
from
itcast_f1;
–手动指定窗口的大小:分区
rows between 起始位置 and 结束位置
rows between unbounded preceding and current row
–实现分区内起始到当前行的pv累加
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr rows between unbounded preceding and current row) as pv3
from
itcast_f1;
–实现分区内指定前N行到当前行的pv累加
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr rows between 3 preceding and current row) as pv4
from
itcast_f1;
–实现分区内指定前N行到后N行的pv累加
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr rows between 3 preceding and 1 following) as pv5
from
itcast_f1;
–实现分区内指定当前行到后N行的pv累加
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr rows between current row and unbounded following) as pv6
from
itcast_f1;
- preceding:往前
- following:往后
- current row:当前行
- unbounded:起点
- unbounded preceding 表示从前面的起点
- unbounded following:表示到后面的终点