【Hive】排序函数row_number的使用
语法格式:row_number() over(partition by 分组列 order by 排序列 desc)
row_number() over()分组排序功能:
在使用 row_number() over()函数时候,over()里头的分组以及排序的执行晚于 where 、group by、 order by 的执行。
例一:
表数据:
create table TEST_ROW_NUMBER_OVER(
id varchar(10) not null,
name varchar(10) null,
age varchar(10) null,
salary int null
);
select * from TEST_ROW_NUMBER_OVER t;
insert into TEST_ROW_NUMBER_OVER(id,name,age,salary) values(1,'a',10,8000);
insert into TEST_ROW_NUMBER_OVER(id,name,age,salary) values(1,'a2',11,6500);
insert into TEST_ROW_NUMBER_OVER(id,name,age,salary) values(2,'b',12,13000);
insert into TEST_ROW_NUMBER_OVER(id,name,age,salary) values(2,'b2',13,4500);
insert into TEST_ROW_NUMBER_OVER(id,name,age,salary) values(3,'c',14,3000);
insert into TEST_ROW_NUMBER_OVER(id,name,age,salary) values(3,'c2',15,20000);
insert into TEST_ROW_NUMBER_OVER(id,name,age,salary) values(4,'d',16,30000);
insert into TEST_ROW_NUMBER_OVER(id,name,age,salary) values(5,'d2',17,1800);
一次排序:对查询结果进行排序(无分组)
select id,name,age,salary,row_number()over(order by salary desc) rn
from TEST_ROW_NUMBER_OVER t
结果:
进一步排序:根据id分组排序
select id,name,age,salary,row_number()over(partition by id order by salary desc) rank
from TEST_ROW_NUMBER_OVER t
结果:
再一次排序:找出每一组中序号为一的数据:
select * from(select id,name,age,salary,row_number()over(partition by id order by salary desc) rank
from TEST_ROW_NUMBER_OVER t)
where rank <2
结果:
排序找出年龄在13岁到16岁数据,按salary排序
select id,name,age,salary,row_number()over(order by salary desc) rank
from TEST_ROW_NUMBER_OVER t where age between '13' and '16'
结果:结果中 rank 的序号,其实就表明了 over(order by salary desc) 是在where age between and 后执行的
排序函数的区别
row_number()是hive排序函数中的一种,常与rank()和dense_rank()作比较,其区别点在与:
- row_number() ,中的排序是顺序排的,数字依次累加,不考虑是否重复为1,2,3,4;
- rank(),中的排序考虑重复问题,如果有重复排序编号相同,但会留下空位,1,2,2,4;
- dense_rank(),中的排序同样考虑重复问题,但不会留下空位,1,2,2,3;