hive06: Sorting methods in Hive

姚家金吼

Published 2020-12-03 21:32:00

Article link: https://blog.csdn.net/yaojaijinhou/article/details/110571377

1,a
4,b
5,b
7,c
3,a
2,a
8,e
6,c

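-- Sample table: one integer id and one string name per row
create table tb_sort(
    id int,
    name string
)
row format delimited fields terminated by ',';

-- Load the comma-separated sample rows from the local file /data/x
load data local inpath '/data/x' into table tb_sort;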

select * from tb_sort order by id;

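order by performs a global sort: every row passes through a single reducer, so the query is slow on large tables.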
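
Because every row passes through one reducer, a global sort is often paired with a limit so that only the first few rows are returned. The query below is a minimal sketch of that pattern; the descending order and the limit of 3 are arbitrary choices for illustration, not part of the original example.

```sql
-- Global descending sort, keeping only the first 3 rows.
-- The limit value 3 is an arbitrary choice for this illustration.
select * from tb_sort order by id desc limit 3;
```

Assuming tb_sort holds only the eight sample rows above, this returns the rows with id 8, 7 and 6, in that order.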

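distribute by works like the partitioner in MapReduce: it decides which reducer each row is sent to. Use it together with sort by; on its own it does not sort the result.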

Note that Hive requires the DISTRIBUTE BY clause to be written before the SORT BY clause.

select *
from tb_sort
distribute by id
sort by name;
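
The effect of distribute by is easier to see when the distribution column has repeated values. The following variation is a sketch only: it distributes by name rather than id, and the reducer count of 3 and the two sort keys are choices made for this illustration.

```sql
-- Use several reducers so that distribution is visible
-- (3 is an arbitrary value for this illustration).
set mapreduce.job.reduces=3;

-- All rows that share a name are sent to the same reducer.
-- Within each reducer, rows are ordered by name, then by id descending.
select *
from tb_sort
distribute by name
sort by name, id desc;
```

With the sample data, the three rows named a always land on the same reducer and appear there in the order 3, 2, 1. Which reducer receives which name depends on Hive's hashing, so the order of the groups across reducers is not guaranteed.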

1. Set the number of reducers

set mapreduce.job.reduces=3;

2. Sort with sort by (rows are ordered within each reducer, not across the whole result)

select *
from tb_sort
sort by id;
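
One way to check that sort by orders rows per reducer is to write the result to a directory and inspect the files. This is a sketch under assumptions: '/tmp/tb_sort_out' is a hypothetical scratch path, and any existing contents of that directory are overwritten.

```sql
-- Assumes 3 reducers, matching the setting used above.
set mapreduce.job.reduces=3;

-- '/tmp/tb_sort_out' is a hypothetical local path; its contents are replaced.
insert overwrite local directory '/tmp/tb_sort_out'
row format delimited fields terminated by ','
select *
from tb_sort
sort by id;
```

The directory should then contain one output file per reducer. Each file is sorted by id internally, but concatenating the files does not in general give a globally sorted result.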

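When the distribution column and the within-reducer sort column are the same and the order is ascending, use cluster by.
It is equivalent to combining distribute by <column> with sort by <column>, but cluster by cannot sort in descending order.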

cluster by id is equivalent to distribute by id sort by id
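
The equivalence, and the way around the ascending-only restriction, can be sketched with the same tb_sort table:

```sql
-- These two queries distribute and sort the rows in the same way.
select * from tb_sort cluster by id;
select * from tb_sort distribute by id sort by id;

-- cluster by does not accept desc, so for a descending sort
-- within each reducer, write both clauses explicitly.
select * from tb_sort distribute by id sort by id desc;
```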
