Hive: Global Sorting (Order By), Per-Reducer Sorting (Sort By), and Cluster By

但行益事莫问前程

Article link: https://blog.csdn.net/javahelpyou/article/details/125558398

1. Global Sorting (Order By)

Order By produces a single, globally sorted result. To do this, Hive runs the sort in only one Reducer.

Example: sort the emp table by department, then by salary, in ascending order.

select ename, deptno, sal from emp order by deptno, sal;
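The following minimal Python model shows the single-Reducer behavior. It is not Hive code. The function name `order_by` and the sample rows are hypothetical. The model skips the shuffle step, in which a real job moves output from many Mappers into that one Reducer.

```python
# Minimal model of Hive ORDER BY (illustration only, not Hive's implementation):
# every row is routed to a single reducer, which sorts the whole data set.

def order_by(rows, key):
    """Return rows globally sorted by key, as the one reducer would emit them."""
    single_reducer_input = list(rows)  # all map output lands in one reducer
    return sorted(single_reducer_input, key=key)

# Hypothetical sample rows: (ename, deptno, sal)
emp = [
    ("SMITH", 20, 800.0),
    ("ALLEN", 30, 1600.0),
    ("CLARK", 10, 2450.0),
    ("JONES", 20, 2975.0),
    ("WARD", 30, 1250.0),
    ("KING", 10, 5000.0),
]

# Equivalent of: order by deptno, sal  (both ascending)
result = order_by(emp, key=lambda r: (r[1], r[2]))
for row in result:
    print(row)
```

A single sorted list is the defining property here. For a large table, one Reducer must receive and sort all the data, so the job is slow.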

2. Sorting Within Each Reducer (Sort By)

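For large data sets, order by is very inefficient, because a single Reducer has to sort all of the data. When a global order is not required, you can use sort by instead (desc for descending, asc for ascending). Sort by sorts the rows inside each Reducer, so the combined result set is not globally ordered.
Sort by is usually combined with distribute by. Distribute by controls which Reducer a given row is sent to, similar to partitioning in MapReduce.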
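Here is a minimal Python sketch of sort by without distribute by. It shows that each Reducer's output is sorted while the concatenated output is not. The round-robin routing is an assumption made only for this illustration, since Hive does not promise any particular assignment of rows to Reducers here. The function `sort_by` and the sample rows are hypothetical.

```python
# Minimal model of SORT BY without DISTRIBUTE BY (illustration only).
# Assumption: rows are spread round-robin; Hive guarantees no specific routing.

def sort_by(rows, key, num_reducers):
    """Return one sorted list per reducer (each would be one output file)."""
    buckets = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)  # assumed round-robin routing
    return [sorted(b, key=key) for b in buckets]

# Hypothetical sample rows: (empno, deptno)
employee = [(7369, 20), (7499, 30), (7521, 30), (7566, 20),
            (7654, 30), (7698, 30), (7782, 10), (7839, 10)]

outputs = sort_by(employee, key=lambda r: r[0], num_reducers=3)
for i, part in enumerate(outputs):
    print(f"reducer {i}: {[r[0] for r in part]}")

# Reading the reducer outputs one after another does not give a global order.
merged = [r[0] for part in outputs for r in part]
print("globally sorted?", merged == sorted(merged))  # False for this data
```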

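Example: distribute the employee table by department and sort by employee number within each Reducer. The job must run with more than one Reducer (for example, after set mapreduce.job.reduces=3;), otherwise the result looks the same as order by. Writing to a local directory produces one output file per Reducer, so the ordering inside each file can be inspected.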

insert overwrite local directory '/home/hdfs/data/sortby' select * from employee e distribute by e.deptno sort by e.empno;
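A minimal Python model of `distribute by e.deptno sort by e.empno` follows. It assumes rows are routed with `hash(key) % num_reducers`, which resembles the MapReduce default HashPartitioner. Hive's actual hash function differs, so real rows may land in different Reducers. The function `distribute_sort` and the sample rows are hypothetical.

```python
# Minimal model of DISTRIBUTE BY deptno SORT BY empno (illustration only).
# Assumption: routing is hash(key) % num_reducers; Hive's real hash differs.

def distribute_sort(rows, dist_key, sort_key, num_reducers):
    """Route each row by dist_key, then sort every reducer's rows by sort_key."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[hash(dist_key(row)) % num_reducers].append(row)
    return [sorted(b, key=sort_key) for b in buckets]

# Hypothetical sample rows: (empno, deptno)
employee = [(7369, 20), (7499, 30), (7521, 30), (7566, 20),
            (7654, 30), (7698, 30), (7782, 10), (7839, 10)]

parts = distribute_sort(employee,
                        dist_key=lambda r: r[1],  # distribute by deptno
                        sort_key=lambda r: r[0],  # sort by empno
                        num_reducers=3)
for i, part in enumerate(parts):
    print(f"reducer {i}: {part}")
```

All rows of one department land in the same Reducer. When there are more departments than Reducers, one Reducer receives several departments, and its file is ordered by empno only, so those departments' rows are interleaved.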

3. Cluster By

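When distribute by and sort by use the same column, you can use cluster by instead. Cluster by sorts only in ascending order; asc or desc cannot be specified.
Example: distribute the emp table by department number and sort by department number.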

insert overwrite local directory '/home/hdfs/data/clusterby' select * from emp cluster by deptno;
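The Python sketch below checks that `cluster by deptno` yields the same per-Reducer output as `distribute by deptno sort by deptno`. As before, `hash(key) % num_reducers` routing is an assumption, since Hive uses its own hash. The functions `distribute_sort` and `cluster_by`, and the sample rows, are hypothetical. Four Reducers are used so that one Reducer receives two departments and two Reducers receive none.

```python
# Minimal model: CLUSTER BY k behaves like DISTRIBUTE BY k SORT BY k (ascending).
# Assumption: routing is hash(key) % num_reducers; Hive's real hash differs.

def distribute_sort(rows, dist_key, sort_key, num_reducers):
    """Route each row by dist_key, then sort every reducer's rows by sort_key."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[hash(dist_key(row)) % num_reducers].append(row)
    return [sorted(b, key=sort_key) for b in buckets]

def cluster_by(rows, key, num_reducers):
    """Distribute and sort by the same key; ascending order only."""
    return distribute_sort(rows, key, key, num_reducers)

def dept(row):
    return row[1]

# Hypothetical sample rows: (empno, deptno)
emp = [(7369, 20), (7499, 30), (7521, 30), (7566, 20),
       (7654, 30), (7698, 30), (7782, 10), (7839, 10)]

clustered = cluster_by(emp, dept, num_reducers=4)
manual = distribute_sort(emp, dept, dept, num_reducers=4)
print(clustered == manual)  # True
for i, part in enumerate(clustered):
    print(f"reducer {i}: {[dept(r) for r in part]}")
```

A Reducer that receives no rows still corresponds to an output file in a real job. Its file is simply empty.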