全网最详细的Hive文章系列,强烈建议收藏加关注!
后面更新文章都会列出历史文章目录,帮助大家回顾知识重点。
目录
一、窗口函数 ROW_NUMBER,RANK,DENSE_RANK
系列历史文章
2021年大数据Hive(五):Hive的内置函数(数学、字符串、日期、条件、转换、行转列)
2021年大数据Hive(三):手把手教你如何吃透Hive数据库和表操作(学会秒变数仓大佬)
2021年大数据Hive(二):Hive的三种安装模式和MySQL搭配使用
前言
2021大数据领域优质创作博客,带你从入门到精通,该博客每天更新,逐渐完善大数据各个知识体系的文章,帮助大家更高效学习。
有对大数据感兴趣的可以关注微信公众号:三帮大数据
Hive的开窗函数
一、窗口函数 ROW_NUMBER,RANK,DENSE_RANK
1、数据准备
-
-
cookie1,2021-06-10,1
-
-
cookie1,2021-06-11,5
-
-
cookie1,2021-06-12,7
-
-
cookie1,2021-06-13,3
-
-
cookie1,2021-06-14,2
-
-
cookie1,2021-06-15,4
-
-
cookie1,2021-06-16,4
-
-
cookie2,2021-06-10,2
-
-
cookie2,2021-06-11,3
-
-
cookie2,2021-06-12,5
-
-
cookie2,2021-06-13,6
-
-
cookie2,2021-06-14,3
-
-
cookie2,2021-06-15,9
-
-
cookie2,2021-06-16,7
-
-
-
-
CREATE
TABLE it_t1 (
-
-
cookieid
string,
-
-
createtime
string,
--day
-
-
pv
INT
-
-
)
ROW
FORMAT
DELIMITED
FIELDS
TERMINATED
BY
',';
-
-
-
-
-- 加载数据:
-
-
load
data
local inpath
'/export/data/hivedatas/it_t2.txt'
into
table it_t1;
2、ROW_NUMBER
ROW_NUMBER() 从1开始,按照顺序,生成分组内记录的序列
-
SELECT
-
-
cookieid,
-
-
createtime,
-
-
pv,
-
-
ROW_NUMBER()
OVER(
PARTITION
BY cookieid
ORDER
BY pv
desc)
AS rn
-
-
FROM it_t1;
3、RANK 和 DENSE_RANK
RANK() 生成数据项在分组中的排名,排名相等会在名次中留下空位
DENSE_RANK() 生成数据项在分组中的排名,排名相等会在名次中不会留下空位
-
SELECT
-
-
cookieid,
-
-
createtime,
-
-
pv,
-
-
RANK()
OVER(
PARTITION
BY cookieid
ORDER
BY pv
desc)
AS rn1,
-
-
DENSE_RANK()
OVER(
PARTITION
BY cookieid
ORDER
BY pv
desc)
AS rn2,
-
-
ROW_NUMBER()
OVER(
PARTITION
BY cookieid
ORDER
BY pv
DESC)
AS rn3
-
-
FROM it_t1
-
-
WHERE cookieid =
'cookie1';
二、Hive分析窗口函数 SUM,AVG,MIN,MAX
1、数据准备
-
--建表语句:
-
-
create
table it_t2(
-
-
cookieid
string,
-
-
createtime
string,
--day
-
-
pv
int
-
-
)
row
format
delimited
-
-
fields
terminated
by
',';
-
-
-
-
--加载数据:
-
-
load
data
local inpath
'/root/hivedata/ it_t2.txt'
into
table it_t2;
-
-
-
-
--开启智能本地模式
-
-
SET hive.exec.mode.local.auto=
true;
2、SUM(结果和ORDER BY相关,默认为升序)
-
select cookieid,createtime,pv,
-
-
sum(pv)
over(
partition
by cookieid
order
by createtime)
as pv1
-
-
from it_t2;
-
-
-
-
select cookieid,createtime,pv,
-
-
sum(pv)
over(
partition
by cookieid
order
by createtime
rows
between
unbounded
preceding
and
current
row)
as pv2
-
-
from it_t2;
-
-
-
-
select cookieid,createtime,pv,
-
-
sum(pv)
over(
partition
by cookieid)
as pv3
-
-
from it_t2;
--如果没有order by排序语句 默认把分组内的所有数据进行sum操作
-
-
-
-
select cookieid,createtime,pv,
-
-
sum(pv)
over(
partition
by cookieid
order
by createtime
rows
between
3
preceding
and
current
row)
as pv4
-
-
from it_t2;
-
-
-
-
select cookieid,createtime,pv,
-
-
sum(pv)
over(
partition
by cookieid
order
by createtime
rows
between
3
preceding
and
1
following)
as pv5
-
-
from it_t2;
-
-
-
-
select cookieid,createtime,pv,
-
-
sum(pv)
over(
partition
by cookieid
order
by createtime
rows
between
current
row
and
unbounded
following)
as pv6
-
-
from it_t2;
-
-
-
-
--pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
-
-
--pv2: 同pv1
-
-
--pv3: 分组内(cookie1)所有的pv累加
-
-
--pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号,
-
-
13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
-
-
--pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
-
-
--pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,
-
-
14号=14号+15号+16号=2+4+4=10
-
-
-
-
/*
-
-
- 如果不指定rows between,默认为从起点到当前行;
-
-
- 如果不指定order by,则将分组内所有值累加;
-
-
- 关键是理解rows between含义,也叫做window子句:
-
-
- preceding:往前
-
-
- following:往后
-
-
- current row:当前行
-
-
- unbounded:起点
-
-
- unbounded preceding 表示从前面的起点
-
-
- unbounded following:表示到后面的终点
-
-
*/
3、AVG,MIN,MAX
AVG,MIN,MAX和SUM用法一样
-
select cookieid,createtime,pv,
-
-
avg(pv)
over(
partition
by cookieid
order
by createtime
rows
between
unbounded
preceding
and
current
row)
as pv2
-
-
from it_t2;
-
-
-
-
select cookieid,createtime,pv,
-
-
max(pv)
over(
partition
by cookieid
order
by createtime
rows
between
unbounded
preceding
and
current
row)
as pv2
-
-
from it_t2;
-
-
-
-
select cookieid,createtime,pv,
-
-
min(pv)
over(
partition
by cookieid
order
by createtime
rows
between
unbounded
preceding
and
current
row)
as pv2
-
-
from it_t2;
- 📢博客主页:https://lansonli.blog.csdn.net
- 📢欢迎点赞 👍 收藏 ⭐留言 📝 如有错误敬请指正!
- 📢本文由 Lansonli 原创,首发于 CSDN博客🙉
- 📢大数据系列文章会每天更新,停下休息的时候不要忘了别人还在奔跑,希望大家抓紧时间学习,全力奔赴更美好的生活✨