Hive: Window Functions

Hive window functions

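1. Aggregate function with over(): specifies the size of the data window that the analytic function works on; this window may change from row to row. E.g.: list the customers who made a purchase in April 2015, together with the total count.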

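-- count(*) over () counts the rows left after the where filter;
-- add "group by name" before the semicolon to list each customer once and count distinct customers
select name,count(*) over ()
from tablename
where substring(orderdate,1,7) = '2015-04';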
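
To try this without a Hive cluster, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Hive. It assumes SQLite 3.25 or newer (the first release with window functions); the table name `orders` and the sample rows are made up, and `substr` replaces `substring`. It shows how the count changes once `group by name` is added.

```python
import sqlite3

# In-memory SQLite table standing in for the Hive table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (name text, orderdate text, cost integer)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("jack", "2015-04-06", 42),
        ("jack", "2015-04-13", 94),
        ("mart", "2015-04-08", 62),
        ("tony", "2015-01-02", 15),
    ],
)

# Without group by: one output row per April order, each carrying the row count.
per_order = conn.execute(
    """
    select name, count(*) over () as cnt
    from orders
    where substr(orderdate, 1, 7) = '2015-04'
    order by name
    """
).fetchall()

# With group by: one output row per customer, each carrying the customer count.
per_customer = conn.execute(
    """
    select name, count(*) over () as cnt
    from orders
    where substr(orderdate, 1, 7) = '2015-04'
    group by name
    order by name
    """
).fetchall()

print(per_order)     # [('jack', 3), ('jack', 3), ('mart', 3)]
print(per_customer)  # [('jack', 2), ('mart', 2)]
```

The window is evaluated after the where filter and after grouping, which is why the second query reports 2 customers instead of 3 orders.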

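2. partition by clause: groups the rows by the values of the partition key. E.g.: show each customer's purchase details together with the monthly purchase total.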

select name,orderdate,cost,sum(cost) over(partition by month(orderdate))
from tablename
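
A small runnable sketch of the same idea, again with Python's sqlite3 standing in for Hive (an assumption, needs SQLite 3.25+). SQLite has no `month()` function, so `substr(orderdate, 6, 2)` is used to pull out the month; the table `orders` and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (name text, orderdate text, cost integer)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("jack", "2015-01-01", 10),
        ("tony", "2015-01-02", 15),
        ("tony", "2015-01-04", 29),
        ("jack", "2015-02-03", 23),
    ],
)

# Every row keeps its own detail and also gets the total of its month.
rows = conn.execute(
    """
    select name, orderdate, cost,
           sum(cost) over (partition by substr(orderdate, 6, 2)) as month_total
    from orders
    order by orderdate
    """
).fetchall()

for row in rows:
    print(row)
# ('jack', '2015-01-01', 10, 54)
# ('tony', '2015-01-02', 15, 54)
# ('tony', '2015-01-04', 29, 54)
# ('jack', '2015-02-03', 23, 23)
```

Partitioning by month number alone would merge the same month of different years; partitioning by year and month avoids that when the data spans several years.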

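3. order by clause: forces the rows inside each window to be sorted. E.g.: show each customer's purchase details with the purchase total per month, ordered by date ascending. Note that once order by is present and no window clause is given, the frame defaults to range between unbounded preceding and current row, so sum(cost) here is a running total within each month rather than the full monthly total.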

select name,orderdate,cost,sum(cost) over(partition by month(orderdate) order by orderdate )
from tablename
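
The effect of adding order by can be checked with a short sketch. It uses Python's sqlite3 as a stand-in for Hive (an assumption, needs SQLite 3.25+), with a made-up `orders` table and `substr(orderdate, 6, 2)` in place of `month(orderdate)`. The two columns compare the same sum with and without order by.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (name text, orderdate text, cost integer)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("jack", "2015-01-01", 10),
        ("tony", "2015-01-02", 15),
        ("tony", "2015-01-04", 29),
        ("jack", "2015-02-03", 23),
    ],
)

rows = conn.execute(
    """
    select orderdate, cost,
           -- no order by: the frame is the whole partition
           sum(cost) over (partition by substr(orderdate, 6, 2)) as month_total,
           -- with order by: the default frame ends at the current row
           sum(cost) over (partition by substr(orderdate, 6, 2)
                           order by orderdate) as running_total
    from orders
    order by orderdate
    """
).fetchall()

for row in rows:
    print(row)
# ('2015-01-01', 10, 54, 10)
# ('2015-01-02', 15, 54, 25)
# ('2015-01-04', 29, 54, 54)
# ('2015-02-03', 23, 23, 23)
```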

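4. window clause: divides the partitioned data at a finer granularity
4.1 PRECEDING: rows before the current row
4.2 FOLLOWING: rows after the current row
4.3 CURRENT ROW: the current row
4.4 UNBOUNDED: no bound; UNBOUNDED PRECEDING means starting from the first row of the partition, UNBOUNDED FOLLOWING means up to the last row of the partition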

eg:
select name,orderdate,cost,
sum(cost) over() as sample1, -- sum over all rows
sum(cost) over(partition by name) as sample2, -- partition by name, sum within each partition
sum(cost) over(partition by name order by orderdate) as sample3, -- partition by name, running sum within each partition
sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row) as sample4, -- same as sample3 when orderdate has no duplicates within a name: from the start of the partition to the current row
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, -- current row plus the previous row
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING) as sample6, -- current row plus the previous row and the next row
sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING) as sample7 -- current row plus all rows after it
from tablename;
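
To see concrete numbers for these frames, here is a minimal sketch with Python's sqlite3 standing in for Hive (an assumption, needs SQLite 3.25+). The `orders` table and its four rows for one customer are made up; the column names are illustrative, not from the original query.

```python
import sqlite3

# Hypothetical sample data for a single customer, costs 10, 23, 46, 55 in date order.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (name text, orderdate text, cost integer)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("jack", "2015-01-01", 10),
        ("jack", "2015-01-05", 23),
        ("jack", "2015-01-08", 46),
        ("jack", "2015-02-03", 55),
    ],
)

rows = conn.execute(
    """
    select cost,
           sum(cost) over (partition by name order by orderdate
                           rows between unbounded preceding and current row) as running,
           sum(cost) over (partition by name order by orderdate
                           rows between 1 preceding and current row) as prev_cur,
           sum(cost) over (partition by name order by orderdate
                           rows between 1 preceding and 1 following) as prev_cur_next,
           sum(cost) over (partition by name order by orderdate
                           rows between current row and unbounded following) as cur_to_end
    from orders
    order by orderdate
    """
).fetchall()

for row in rows:
    print(row)
# (10, 10, 10, 33, 134)
# (23, 33, 33, 79, 124)
# (46, 79, 69, 124, 101)
# (55, 134, 101, 101, 55)
```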

5. Sequence function NTILE: splits the ordered rows of a partition into n buckets and returns the bucket number of the current row

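eg:
select name,orderdate,cost,
       ntile(3) over() as sample1, -- split all rows globally into 3 buckets
       ntile(3) over(partition by name) as sample2, -- partition by name, split each partition into 3 buckets
       ntile(3) over(order by cost) as sample3, -- order all rows by cost ascending, split into 3 buckets
       ntile(3) over(partition by name order by cost) as sample4 -- partition by name, order by cost ascending within each partition, split into 3 buckets
from tablename;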
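
A quick sketch of how ntile distributes rows when the count does not divide evenly, using Python's sqlite3 as a stand-in for Hive (an assumption, needs SQLite 3.25+; Hive's handling of the remainder is assumed to follow the same standard behavior). The `orders` table and its five rows are made up.

```python
import sqlite3

# Five hypothetical rows split into 3 buckets: the sizes come out as 2, 2, 1.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (name text, cost integer)")
conn.executemany(
    "insert into orders values (?, ?)",
    [("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 50)],
)

rows = conn.execute(
    """
    select cost, ntile(3) over (order by cost) as bucket
    from orders
    order by cost
    """
).fetchall()

print(rows)  # [(10, 1), (20, 1), (30, 2), (40, 2), (50, 3)]
```

A typical use is picking the top third of rows by filtering on bucket = 1 in an outer query.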

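6. Ranking functions
row_number: like sorting the data and attaching a line number: 1, 2, 3, 4, ...; equal values still get different numbers and the numbering is continuous (no ties, strictly increasing).
rank: 1, 2, 2, 2, 5, 5, 7, ...; equal values share the same rank and the numbering skips after a tie.
dense_rank: 1, 2, 2, 2, 3, 3, 4, 4, 5, ...; equal values share the same rank and the numbering stays continuous after a tie.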

eg:

SELECT 
cookieid,
createtime,
pv,
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1, -- rank
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2, -- dense_rank
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3 -- row_number
FROM tablename 
WHERE cookieid = 'cookie1';
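
The difference between the three functions is easiest to see on data with a tie. Below is a minimal sketch with Python's sqlite3 standing in for Hive (an assumption, needs SQLite 3.25+); the `pageviews` table and its pv values are made up.

```python
import sqlite3

# Hypothetical page-view counts with one tie (two rows with pv = 5).
conn = sqlite3.connect(":memory:")
conn.execute("create table pageviews (cookieid text, createtime text, pv integer)")
conn.executemany(
    "insert into pageviews values (?, ?, ?)",
    [
        ("cookie1", "2015-04-10", 7),
        ("cookie1", "2015-04-11", 5),
        ("cookie1", "2015-04-12", 5),
        ("cookie1", "2015-04-13", 4),
    ],
)

rows = conn.execute(
    """
    select pv,
           rank()       over (partition by cookieid order by pv desc) as rn1,
           dense_rank() over (partition by cookieid order by pv desc) as rn2,
           row_number() over (partition by cookieid order by pv desc) as rn3
    from pageviews
    where cookieid = 'cookie1'
    order by rn3
    """
).fetchall()

for row in rows:
    print(row)
# (7, 1, 1, 1)
# (5, 2, 2, 2)
# (5, 2, 2, 3)
# (4, 4, 3, 4)
```

Which of the two tied rows receives row number 2 and which receives 3 is not determined by the query; add a second sort key if that matters.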

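7. LAG and LEAD functions:
lag(col, n, default): returns the value of col from the row n rows before the current row in the window; n defaults to 1, and default (NULL if omitted) is returned when there is no such row
lead(col, n, default): returns the value of col from the row n rows after the current row, with the same defaults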

eg:
select name,orderdate,cost,
lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate ) as time1,
lag(orderdate,2) over (partition by name order by orderdate) as time2
from tablename;
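
The query above only exercises lag. The following sketch adds lead next to it, using Python's sqlite3 as a stand-in for Hive (an assumption, needs SQLite 3.25+); the `orders` table, its rows, and the column name `next_time` are made up.

```python
import sqlite3

# Three hypothetical orders for one customer.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (name text, orderdate text, cost integer)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("jack", "2015-01-01", 10),
        ("jack", "2015-01-05", 23),
        ("jack", "2015-01-08", 46),
    ],
)

rows = conn.execute(
    """
    select orderdate,
           lag(orderdate, 1, '1900-01-01')
               over (partition by name order by orderdate) as time1,
           lag(orderdate, 2)
               over (partition by name order by orderdate) as time2,
           lead(orderdate, 1)
               over (partition by name order by orderdate) as next_time
    from orders
    order by orderdate
    """
).fetchall()

for row in rows:
    print(row)
# ('2015-01-01', '1900-01-01', None, '2015-01-05')
# ('2015-01-05', '2015-01-01', None, '2015-01-08')
# ('2015-01-08', '2015-01-05', '2015-01-01', None)
```

Python's None corresponds to SQL NULL, which is what lag and lead return when the offset row does not exist and no default is given.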

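8. first_value and last_value functions:
first_value: the first value in the partition after sorting, up to the current row
last_value: the last value in the partition after sorting, up to the current row (with the default frame and no duplicate sort keys, this is simply the current row's own value)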

eg:
select name,orderdate,cost,
first_value(orderdate) over(partition by name order by orderdate) as time1,
last_value(orderdate) over(partition by name order by orderdate) as time2
from tablename;
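
Because the frame stops at the current row by default, last_value often surprises people. Here is a minimal sketch, with Python's sqlite3 standing in for Hive (an assumption, needs SQLite 3.25+), that compares the default frame with an explicit frame reaching the end of the partition. The `orders` table, its rows, and the column name `true_last` are made up.

```python
import sqlite3

# Three hypothetical orders for one customer.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (name text, orderdate text, cost integer)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("jack", "2015-01-01", 10),
        ("jack", "2015-01-05", 23),
        ("jack", "2015-01-08", 46),
    ],
)

rows = conn.execute(
    """
    select orderdate,
           first_value(orderdate)
               over (partition by name order by orderdate) as time1,
           -- default frame: ends at the current row
           last_value(orderdate)
               over (partition by name order by orderdate) as time2,
           -- explicit frame: covers the whole partition
           last_value(orderdate)
               over (partition by name order by orderdate
                     rows between unbounded preceding and unbounded following) as true_last
    from orders
    order by orderdate
    """
).fetchall()

for row in rows:
    print(row)
# ('2015-01-01', '2015-01-01', '2015-01-01', '2015-01-08')
# ('2015-01-05', '2015-01-01', '2015-01-05', '2015-01-08')
# ('2015-01-08', '2015-01-01', '2015-01-08', '2015-01-08')
```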

9. rand(): picks rows at random; the rows first have to be sorted by the random number

select * from student order by rand() limit 3;
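
The same sort-then-limit pattern can be tried locally with Python's sqlite3 as a stand-in for Hive (an assumption). SQLite spells the function `random()` rather than `rand()`; the `student` table contents are made up.

```python
import sqlite3

# Five hypothetical students.
students = [(1, "ann"), (2, "bob"), (3, "cat"), (4, "dan"), (5, "eve")]
conn = sqlite3.connect(":memory:")
conn.execute("create table student (id integer, name text)")
conn.executemany("insert into student values (?, ?)", students)

# Sort by a random key, then keep the first 3 rows: a random sample of 3.
sample = conn.execute(
    "select id, name from student order by random() limit 3"
).fetchall()

print(sample)  # three distinct students; the result differs from run to run
```

Sorting the whole table just to keep a few rows can be expensive on large data, so this pattern suits small tables or ad hoc checks.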