Order By
Order By 会对输入做全局的排序,所以只会有一个reduce,这样的话在大量数据面前查询效率较低,费时较长
select * from salary order by salary desc;
//desc 升序 asc降序
结果:
+--------------+----------------+----------------+
| salary.name | salary.depart | salary.salary |
+--------------+----------------+----------------+
| yanyan | 2 | 12000 |
| xiaolong | 2 | 8000 |
| john | 2 | 8000 |
| qianba | 1 | 8000 |
| zhaowu | 2 | 7000 |
| xmith | 2 | 5000 |
| lilei | 2 | 5000 |
| zhaoliu | 1 | 4000 |
| wangwu | 1 | 4000 |
| lisi | 1 | 4000 |
| zhangsan | 1 | 3000 |
+--------------+----------------+----------------+
Sort By
Sort By不是全局的排序,它在进入reduce阶段之前就已经排序。
使用Sort by 可以设定reduce的数量
set mapred.reduce.tasks=2; //启动两个reduce
设置过后,然后使用sort by 执行一条查询 你会发现hiveserver2中的reduce信息如下:reduce启动了两个
2018-09-03 22:28:57,698 Stage-1 map = 0%, reduce = 0%
2018-09-03 22:29:07,527 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.69 sec
2018-09-03 22:29:23,086 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.07 sec
MapReduce Total cumulative CPU time: 6 seconds 70 msec
Ended Job = job_1535982778257_0006
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 2 Cumulative CPU: 6.07 sec HDFS Read: 11241 HDFS Write: 460 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 70 msec
distribute by
distribute by通过计算一个列的值进行分区,有几个分区 sory by 就执行几次,也就会有多少个reduce
select * from salary distribute by depart sort by salary asc;
//我们通过 distribute by 对列计算值分区,然后查询排序
结果:
+--------------+----------------+----------------+
| salary.name | salary.depart | salary.salary |
+--------------+----------------+----------------+
| xmith | 2 | 5000 |
| lilei | 2 | 5000 |
| zhaowu | 2 | 7000 |
| xiaolong | 2 | 8000 |
| john | 2 | 8000 |
| yanyan | 2 | 12000 |
| zhangsan | 1 | 3000 |
| zhaoliu | 1 | 4000 |
| wangwu | 1 | 4000 |
| lisi | 1 | 4000 |
| qianba | 1 | 8000 |
+--------------+----------------+----------------+
CLUSTER BY //聚集
cluster by就是distribute by+sort by的组合,但是只能默认升序,不支持自定义升序排序
select * from salary cluster by salary;
结果:
+--------------+----------------+----------------+
| salary.name | salary.depart | salary.salary |
+--------------+----------------+----------------+
| zhangsan | 1 | 3000 |
| zhaoliu | 1 | 4000 |
| wangwu | 1 | 4000 |
| lisi | 1 | 4000 |
| xmith | 2 | 5000 |
| lilei | 2 | 5000 |
| zhaowu | 2 | 7000 |
| xiaolong | 2 | 8000 |
| john | 2 | 8000 |
| qianba | 1 | 8000 |
| yanyan | 2 | 12000 |
+--------------+----------------+----------------+