窗口函数详解
我们常见的窗口函数有max,min,avg,count,sum……除此之外还有几个关于排序的窗口函数。
例如:RANK() OVER(),dense_rank(),row_number()
一,测试准备:
1.数据准备.txt文档
95002, SuSan, women, 19, IS
95017, WangFeng juan, women, 18, IS
95018, WangYi, wonen, 19,IS
95013, FengNei, men, 21.c3
95014. WangXi aoli, women, 19, CS
95019, XingXiaoli, women, 19, IS
95020, ZhaoQi an, men, 21, IS
95003, WangMin, women, 22, MA
95004. ZhangLi, men, 19, I3
95012. SunHua, women, 20, cs
95010, Kongi aotao, men, 19,cS
95005, LiuGang, men, 18, JA
95006, SunQing, men, 23, C3
95007. YiSilin, women, 19, MA
95008, LiNa, women, 18, cs
95021, ZhouEr, men, 17, MA
95022. ZhengMIng, men, 20, MA
95001, 1iYong, men, 20, cs
95011, BaoXi aobai, men, 18, MA
95009, MengYuanyuan, men, 18, MA
95015, WangJun, men, 18, MAs
2.create database myhivetest; --新建数据库
3.在myhivetest数据库中建表
create table staff(
id int,name string, sex string,score int,department string
) row format delimited fields terminated by ",";
4.在表中加载数据
前提:将txt上传到Linux上,
提示:想查看文件路径用 -》》 pwd
load data local inpath "/home/hadoop/lzptest/self-study/student.txt" into table staff;
7.加载完成之后测试一下查询
select * from staff
4.此处在hive中的工作已经准备就绪
5.use myhivetest; --启动使用数据库,你可以试着从hive切还到beeline2再执行此命令,
4.查看当前正在使用的数据库
select current_database();
查看数据库下的表
show tables in myhivetest;
二,函数应用
max,min,avg,count,sum
又称为聚合函数,当只查询单个指标就可以直接查询,不需要order by。而查询语句中既有指标又有维度,这时候就需要order by 维度。
- max
- SELECT MAX(score) AS big_num FROM staff --查询最大值,单个字段
- SELECT id,name,sex,MAX(score) AS big_num ,department FROM staff GROUP BY id,name,sex ,department --查询最大值,多个字段
- min
- SELECT MIN(score) AS big_num FROM staff --查询最小值,单个字段
- SELECT id,name,sex,MIN(score) AS big_num ,department FROM staff GROUP BY id,name,sex ,department --查询最小值,多个字段
- avg
- SELECT AVG(score) AS big_num FROM staff --查询平均值,单个字段
- SELECT id,name,sex,AVG(score) AS big_num ,department FROM staff GROUP BY id,name,sex ,department --查询平均值,多个字段
- count
- SELECT count(字段) AS big_num FROM staff --统计行数,单个字段
- sum
- SELECT SUM(score) AS big_num FROM staff --求和,单个字段
RANK() OVER(),dense_rank(),row_number()
这几个都是排序的窗口函数,首先要说两个词desc和asc,降序和升序
我们最简单的排序是:
SELECT id,name,sex,score,department FROM staff ORDER BY score desc
但是这样排序下来的是这个样子的,这怎么可能知道排名?如果有老板看到你这样会让你滚蛋的
我们要的是什么?老板要的是什么?他要的是这个伢子的
RANK()
SELECT id,name,sex,RANK()OVER(ORDER BY score DESC ) AS rk,department FROM staff
RANK() OVER:排名之后,相同分数会占用名次位置,
dense_rank()
SELECT id,name,sex,dense_rank() OVER(ORDER BY score DESC) AS rk,department FROM test2
dense_rank():排名之后,相同的分数排名相同,但不会占用名次
row_number()
SELECT id,name,sex,score,department,row_number() over(ORDER BY score DESC) AS rk
FROM staff
–row_number():不会占用名次位置,并且以此排名。