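I had never paid much attention to the difference between ORDER BY and SORT BY, so today I took a closer look.
First, in strict mode, using ORDER BY on its own raises an error; a LIMIT clause must be added: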
In strict mode, if ORDER BY is specified, LIMIT must also be specified.
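Either switch to nonstrict mode, which lifts this requirement (the value is nonstrict, not nostrict):
set hive.mapred.mode=nonstrict;
or stay in strict mode and add LIMIT: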
select
*
from
A
where d ='2018-10-22'
order by checkin_time
limit 100
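SORT BY is not affected by the hive.mapred.mode setting. It only sorts rows within each reducer, so the overall output is globally ordered only when there is a single reducer: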
select
*
from
A
where d ='2018-10-22'
sort by checkin_time
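One way to see the per-reducer behaviour of SORT BY is to force several reducers and write the result to a directory, so that each reducer produces its own file. This is only a sketch: the output path /tmp/sort_by_demo is a made-up example, and the reducer count of 3 is arbitrary.

```sql
-- Assumption: table A and column checkin_time are the ones used above.
-- Assumption: /tmp/sort_by_demo is a hypothetical local path for illustration.
-- Request three reducers (older versions use mapred.reduce.tasks instead).
set mapreduce.job.reduces=3;

insert overwrite local directory '/tmp/sort_by_demo'
select *
from A
where d = '2018-10-22'
sort by checkin_time;
```

Each output file should be ordered by checkin_time internally, but concatenating the files does not give a globally ordered result.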
DISTRIBUTE BY distributes rows by the specified key; rows with the same key are sent to the same reducer:
select
*
from
A
where d ='2018-10-22'
distribute by clientname
sort by checkin_time
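A single reducer can receive more than one clientname value, so sorting by checkin_time alone may interleave rows from different clients within that reducer. A possible variant, assuming the same table and columns, puts the distribution key first in SORT BY so that each client's rows stay together, and also shows that SORT BY accepts an explicit direction:

```sql
-- Assumption: same table A and columns as in the query above.
select *
from A
where d = '2018-10-22'
distribute by clientname
-- Leading with clientname keeps each client's rows contiguous within a reducer;
-- checkin_time desc then orders each client's rows from newest to oldest.
sort by clientname, checkin_time desc;
```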
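CLUSTER BY x is shorthand for DISTRIBUTE BY x SORT BY x on the same column, so it matches the statement above only when the distribute and sort columns are identical. CLUSTER BY always sorts in ascending order; ASC or DESC cannot be specified: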
select
*
from
A
where d ='2018-10-22'
cluster by checkin_time
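To make the equivalence concrete, the sketch below writes the CLUSTER BY query in its long form, and then shows the long form with a descending sort, which CLUSTER BY itself cannot express. Table and column names are the ones already used in this post.

```sql
-- Shorthand: distribute and sort by the same column, ascending.
select *
from A
where d = '2018-10-22'
cluster by checkin_time;

-- Equivalent long form.
select *
from A
where d = '2018-10-22'
distribute by checkin_time
sort by checkin_time;

-- Descending order within each reducer requires the long form.
select *
from A
where d = '2018-10-22'
distribute by checkin_time
sort by checkin_time desc;
```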