Hive SQL: order by vs. sort by

hive> select * from lpx_partition_test;
OK
99001    xxxcompany_name1    2011-06-08    GuangDong
99001    xxxcompany_name1    2011-06-08    ZheJiang
99002    xxxcompany_name1    2011-06-08    ZheJiang
99003    xxxcom2    2011-06-08    ZheJiang
99001    xxxcompany_name1    2011-06-09    GuangDong
99001    xxxcompany_name1    2011-06-09    ZheJiang
99002    xxxcompany_name1    2011-06-09    ZheJiang
99003    xxxcom2    2011-06-09    ZheJiang
99001    xxxcompany_name1    20110608    GuangDong
99001    xxxcompany_name1    20110608    ZheJiang
99002    xxxcompany_name1    20110609    ZheJiang
99003    xxxcom2    20110609    ZheJiang
Time taken: 0.66 seconds
hive> select * from lpx_partition_test order by global_id;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20110802143737_e2a261fb-af38-45f6-9c36-8fdf7fc57d4b.log
Job running in-process (local Hadoop)
2011-08-02 14:37:46,552 null map = 0%,  reduce = 0%
2011-08-02 14:37:50,761 null map = 100%,  reduce = 0%
2011-08-02 14:37:56,129 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
99001    xxxcompany_name1    20110608    ZheJiang
99001    xxxcompany_name1    2011-06-09    ZheJiang
99001    xxxcompany_name1    20110608    GuangDong
99001    xxxcompany_name1    2011-06-08    ZheJiang
99001    xxxcompany_name1    2011-06-09    GuangDong
99001    xxxcompany_name1    2011-06-08    GuangDong
99002    xxxcompany_name1    2011-06-09    ZheJiang
99002    xxxcompany_name1    2011-06-08    ZheJiang
99002    xxxcompany_name1    20110609    ZheJiang
99003    xxxcom2    20110609    ZheJiang
99003    xxxcom2    2011-06-08    ZheJiang
99003    xxxcom2    2011-06-09    ZheJiang
Time taken: 16.342 seconds

hive> set hive.exec.dynamic.partition.mode=strict;
hive> select * from lpx_partition_test order by global_id limit 8;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20110802144040_8bba22bc-968f-4588-bcdc-0519c2a03115.log
Job running in-process (local Hadoop)
2011-08-02 14:40:45,156 null map = 0%,  reduce = 0%
2011-08-02 14:40:46,202 null map = 100%,  reduce = 0%
2011-08-02 14:40:49,213 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
99001    xxxcompany_name1    20110608    ZheJiang
99001    xxxcompany_name1    2011-06-09    ZheJiang
99001    xxxcompany_name1    20110608    GuangDong
99001    xxxcompany_name1    2011-06-08    ZheJiang
99001    xxxcompany_name1    2011-06-09    GuangDong
99001    xxxcompany_name1    2011-06-08    GuangDong
99002    xxxcompany_name1    2011-06-09    ZheJiang
99002    xxxcompany_name1    2011-06-08    ZheJiang
Time taken: 12.169 seconds

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive>  select * from lpx_partition_test order by global_id limit 8;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20110802144242_5f4ce867-4b96-4dd3-bf65-16588b47bda0.log
Job running in-process (local Hadoop)
2011-08-02 14:42:59,045 null map = 0%,  reduce = 0%
2011-08-02 14:43:02,050 null map = 100%,  reduce = 0%
2011-08-02 14:43:04,052 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
99001    xxxcompany_name1    20110608    ZheJiang
99001    xxxcompany_name1    2011-06-09    ZheJiang
99001    xxxcompany_name1    20110608    GuangDong
99001    xxxcompany_name1    2011-06-08    ZheJiang
99001    xxxcompany_name1    2011-06-09    GuangDong
99001    xxxcompany_name1    2011-06-08    GuangDong
99002    xxxcompany_name1    2011-06-09    ZheJiang
99002    xxxcompany_name1    2011-06-08    ZheJiang
Time taken: 12.595 seconds

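The order by clause has some limitations. In strict mode, order by must be followed by a limit clause; in nonstrict mode the limit is not required. The reason is that, to produce a total order over all results, a single reducer has to sort the entire output, and if the output is very large that reducer can take a very long time. Note that the setting that governs this check is hive.mapred.mode, not the hive.exec.dynamic.partition.mode used in the sessions above. The latter only controls dynamic-partition inserts, which is why the two limit 8 runs above behave identically.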
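As a rough illustration of the strict-mode rule, the statements below could be entered in the Hive CLI. This is only a sketch: it assumes a Hive release of this era in which hive.mapred.mode is honored, and it uses a hypothetical, unpartitioned table demo_ids with a global_id column (not part of the original session), because strict mode also rejects queries on partitioned tables that lack a partition filter.

```sql
-- Assumption: demo_ids is a hypothetical, unpartitioned table with a global_id column.
set hive.mapred.mode=strict;

-- In strict mode the next query would be rejected at compile time,
-- because order by is not followed by limit:
--   select * from demo_ids order by global_id;

-- Accepted in strict mode: order by followed by limit.
select * from demo_ids order by global_id limit 8;

-- Back to nonstrict mode, where limit is optional again.
set hive.mapred.mode=nonstrict;
select * from demo_ids order by global_id;
```

Either way the sort is still done by a single reducer; the limit only bounds how many rows it has to emit.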

hive> select * from lpx_partition_test sort by global_id;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20110802144949_c3fc1b32-bb9e-40fa-af89-de4caefc30e6.log
Job running in-process (local Hadoop)
2011-08-02 14:49:41,150 null map = 0%,  reduce = 0%
2011-08-02 14:49:42,156 null map = 100%,  reduce = 0%
2011-08-02 14:49:44,180 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
99001    xxxcompany_name1    20110608    ZheJiang
99001    xxxcompany_name1    2011-06-09    ZheJiang
99001    xxxcompany_name1    20110608    GuangDong
99001    xxxcompany_name1    2011-06-08    ZheJiang
99001    xxxcompany_name1    2011-06-09    GuangDong
99001    xxxcompany_name1    2011-06-08    GuangDong
99002    xxxcompany_name1    2011-06-09    ZheJiang
99002    xxxcompany_name1    2011-06-08    ZheJiang
99002    xxxcompany_name1    20110609    ZheJiang
99003    xxxcom2    20110609    ZheJiang
99003    xxxcom2    2011-06-08    ZheJiang
99003    xxxcom2    2011-06-09    ZheJiang
Time taken: 10.589 seconds

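The sort by clause sorts rows before they are fed to a reducer. The difference between order by and sort by is that order by guarantees a total order over the whole output, while sort by only guarantees that the rows within each reducer are ordered. If there is more than one reducer task, the final result may be only partially ordered. Because there is only one reducer task here, the difference is not visible: the output is the same as with order by.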
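To make the difference visible, one could ask for more than one reducer. The following is a minimal sketch, assuming nonstrict mode and a real cluster; the in-process local Hadoop runner used in the sessions above may still run only one reducer, in which case the output will again look fully sorted.

```sql
-- Request two reducers (this setting is the one suggested in the job output above).
set mapred.reduce.tasks=2;

-- Each reducer sorts only the rows it receives, so the combined output
-- is two sorted runs rather than one globally sorted list.
select * from lpx_partition_test sort by global_id;

-- distribute by sends all rows with the same global_id to the same reducer;
-- sort by then orders the rows within each reducer.
select * from lpx_partition_test distribute by global_id sort by global_id;
```

With sort by alone, which reducer a row goes to is not tied to global_id, so equal ids may end up in different sorted runs. Adding distribute by keeps rows with the same id together, but it still does not give a total order across reducers.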
