hive四个排序

数据准备

-- 测试表
select * from test;
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 1            | 6            | 14           |
| 2            | 3            | 4            |
| 9            | 8            | 2            |
| 5            | 2            | 7            |
| 6            | 2            | 6            |
| 1            | 3            | 2            |
| 7            | 13           | 7            |
| 5            | 9            | 1            |
| 2            | 3            | 4            |
| 4            | 3            | 8            |
| 1            | 6            | 4            |
| 7            | 1            | 9            |
+--------------+--------------+--------------+--+

– reducetask的数量为2

set mapred.reduce.tasks=2;

1.order by

全局排序只有一个reduce

select * from test order by field1;
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 1            | 6            | 4            |
| 1            | 3            | 2            |
| 1            | 6            | 14           |
| 2            | 3            | 4            |
| 2            | 3            | 4            |
| 4            | 3            | 8            |
| 5            | 9            | 1            |
| 5            | 2            | 7            |
| 6            | 2            | 6            |
| 7            | 1            | 9            |
| 7            | 13           | 7            |
| 9            | 8            | 2            |
+--------------+--------------+--------------+--+
job: 1
Map: 1  Reduce: 1

多字段排序

select * from test order by field1,field2;
按字段1排序,字段1一样的按照字段2排序.
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 1            | 3            | 2            |
| 1            | 6            | 4            |
| 1            | 6            | 14           |
-----------------------------------------------
| 2            | 3            | 4            |
| 2            | 3            | 4            |
------------------------------------------------
| 4            | 3            | 8            |
-------------------------------------------------
| 5            | 2            | 7            |
| 5            | 9            | 1            |
------------------------------------------------
| 6            | 2            | 6            |
-----------------------------------------------
| 7            | 1            | 9            |
| 7            | 13           | 7            |
-------------------------------------------------
| 9            | 8            | 2            |
+--------------+--------------+--------------+--+
job: 1
Map: 1  Reduce: 1

2.sort by

每个reduce内部进行排序,局部有序全局不是有序

select * from test sort by field1;
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 1            | 6            | 4            |
| 1            | 6            | 14           |
| 4            | 3            | 8            |  reduce task 0
| 5            | 9            | 1            |
| 5            | 2            | 7            |
| 6            | 2            | 6            |
| 7            | 1            | 9            |
| 9            | 8            | 2            |
-------------------------------------------------
| 1            | 3            | 2            |
| 2            | 3            | 4            |  reduce task 1
| 2            | 3            | 4            |
| 7            | 13           | 7            |
+--------------+--------------+--------------+--+
job: 1
Map: 1  Reduce: 2

多字段排序

select * from test sort by field1,field2;
同一个reduce中,按字段1排序,字段1一样的按照字段2排序.
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 1            | 6            | 4            |
| 1            | 6            | 14           |
| 4            | 3            | 8            |
| 5            | 2            | 7            |
| 5            | 9            | 1            |
| 6            | 2            | 6            |
| 7            | 1            | 9            |
| 9            | 8            | 2            |
-----------------------------------------------
| 1            | 3            | 2            |
| 2            | 3            | 4            |
| 2            | 3            | 4            |
| 7            | 13           | 7            |
+--------------+--------------+--------------+--+

3.distribute by

控制在map端如何拆分数据给reduce端的
hive会根据distribute by后面列,根据reduce的个数进行数据分发,默认是采用hash算法。

select * from test distribute by field1 sort by field1;

将字段field1的hashcode值模于redice的numberr得到的结果相同的就分配到了
同一个reduce中,然后sort by在每个reduce中进行排序
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 2            | 3            | 4            | 2.hashCode() % 2 = 0
| 2            | 3            | 4            | 2.hashCode() % 2 = 0
| 4            | 3            | 8            | 4.hashCode() % 2 = 0         reduce task 0
| 6            | 2            | 6            | 6.hashCode() % 2 = 0
----------------------------------------------------
| 1            | 6            | 4            | 1.hashCode() % 2 = 1
| 1            | 3            | 2            | 1.hashCode() % 2 = 1
| 1            | 6            | 14           | 1.hashCode() % 2 = 1
| 5            | 9            | 1            | 5.hashCode() % 2 = 1         reduce task 1
| 5            | 2            | 7            | 5.hashCode() % 2 = 1
| 7            | 1            | 9            | 7.hashCode() % 2 = 1
| 7            | 13           | 7            | 7.hashCode() % 2 = 1
| 9            | 8            | 2            | 9.hashCode() % 2 = 1
+--------------+--------------+--------------+--+
分组字段 field1
排序字段 field1
job: 1
Map: 1  Reduce: 2
select * from test distribute by field1 sort by field2;
distribute by后面字段数据的hashCode值%reduce的个数,值相同的发送到同一个reduce
sort by后面字段,是每个reduce中按照这个字段排序
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 6            | 2            | 6            |
| 4            | 3            | 8            |
| 2            | 3            | 4            |
| 2            | 3            | 4            |
-------------------------------------------------
| 7            | 1            | 9            |
| 5            | 2            | 7            |
| 1            | 3            | 2            |
| 1            | 6            | 4            |
| 1            | 6            | 14           |
| 9            | 8            | 2            |
| 5            | 9            | 1            |
| 7            | 13           | 7            |
+--------------+--------------+--------------+--+
分组字段 field1
排序字段 field2
job:1
Map: 1  Reduce: 2
select * from test distribute by field1,field2 sort by field2;

多字段排序

select * from test distribute by field1 sort by field1,field2;
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 2            | 3            | 4            |
| 2            | 3            | 4            |
| 4            | 3            | 8            |
| 6            | 2            | 6            |
------------------------------------------------
| 1            | 3            | 2            |
| 1            | 6            | 4            |
| 1            | 6            | 14           |
                ---------------
| 5            | 2            | 7            |
| 5            | 9            | 1            |
                --------------
| 7            | 1            | 9            |
| 7            | 13           | 7            |
                ---------------
| 9            | 8            | 2            |
+--------------+--------------+--------------+--+
分组字段 field1
排序字段 field1,field2

job: 1
Map: 1  Reduce: 2

4.cluster by

相当于distribute by+sort by 同一个字段,只能默认升序,不能使用倒序;

select * from test distribute by field1 sort by field1;
select * from test cluster by field1;
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 2            | 3            | 4            |
| 2            | 3            | 4            |
| 4            | 3            | 8            |
| 6            | 2            | 6            |
---------------------------------------------------
| 1            | 6            | 4            |
| 1            | 3            | 2            |
| 1            | 6            | 14           |
| 5            | 9            | 1            |
| 5            | 2            | 7            |
| 7            | 1            | 9            |
| 7            | 13           | 7            |
| 9            | 8            | 2            |
+--------------+--------------+--------------+--+
分组字段 field1
排序字段 field1
job:1
Map: 1  Reduce: 2

多字段控制map数据输出到reduce,多字段排序

select * from test cluster by field1,field2;
+--------------+--------------+--------------+--+
| test.field1  | test.field2  | test.field3  |
+--------------+--------------+--------------+--+
| 1            | 3            | 2            |
| 5            | 9            | 1            |
| 6            | 2            | 6            |
                --------------
| 7            | 1            | 9            |
| 7            | 13           | 7            |
-------------------------------------------------
| 1            | 6            | 4            |
| 1            | 6            | 14           |
| 2            | 3            | 4            |
| 2            | 3            | 4            |
| 4            | 3            | 8            |
| 5            | 2            | 7            |
| 9            | 8            | 2            |
+--------------+--------------+--------------+--+
  • 2
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值