细说Hive SQL触发MR的场景

背景

首先,说一下结论,Hive是否触发作业跟hive.fetch.task.conversion这个参数有关

那么我们先来看看hive.fetch.task.conversion这个参数是啥

打开hive终端

hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=more

通过参阅文档,得知hive.fetch.task.conversion这个参数有以下三个值

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.fetch.task.conversion

下面我们通过一下几类查询来做试验

  1. desc live_info;
  2. select * from live_info;
  3. select * from live_info where traffic in ("1", "3");
  4. select count(1) from live_info;
  5. select user, count(1) as user_count from live_info group by user;
  6. select user, count(1) as user_count from live_info group by user order by user desc;

hive.fetch.task.conversion=none

我本机就默认为more了,我先把它改成none

hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=none

select * from live_info;

select * from live_info where traffic in ("1", "3"); 

 select count(1) from live_info;

 select user, count(1) as user_count from live_info group by user;

 select user, count(1) as user_count from live_info group by user order by user desc;

总结一下,通过上面六幅图可以看出,当hive.fetch.task.conversion=none的时候,只有desc不走MR

hive.fetch.task.conversion=minimal

hive> set hive.fetch.task.conversion=minimal;
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=minimal

select * from live_info;马上出结果 

select * from live_info where traffic in ("1", "3");

select count(1) from live_info;

select user, count(1) as user_count from live_info group by user; 

select user, count(1) as user_count from live_info group by user order by user desc;

总结一下,通过上面六幅图可以看出,当hive.fetch.task.conversion=minimal的时候,desc和select * 不走MR

hive.fetch.task.conversion=more

hive> set hive.fetch.task.conversion=more;
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=more

select * from live_info;马上出结果 

select * from live_info where traffic in ("1", "3"); 马上出结果 

select count(1) from live_info;

select user, count(1) as user_count from live_info group by user; 

select user, count(1) as user_count from live_info group by user order by user desc;

总结一下,通过上面六幅图可以看出,当hive.fetch.task.conversion=more的时候,desc和select * 还有select * from live_info where traffic in ("1", "3"); 都不走MR

下面做个大总结

对于Describe table这种SQL语句,Hive会直接的从metastore去读取数据

对于select * from table这种SQL查询语句,Hive不用经历map或者reduce过程

对于select * from live_info where traffic in ("1", "3"); 这种SQL查询语句,Hive仅仅只有map,没有reduce,仅仅filter出traffic为"1","3"的记录

对于select count(1) from live_info;这种SQL查询语句,只有reduce,没有map,但是如果我们除了count还需要取某个字段,reduce阶段之前还是需要走map的,比如说下面的SQL

对于select user, count(1) as user_count from live_info group by user; 这种SQL查询语句,除了要经历map阶段过滤出user字段,还要经历reduce阶段做聚合

对于select user, count(1) as user_count from live_info group by user order by user desc; 这种SQL查询语句,同样需要经历map和reduce阶段,不同的是,这条SQL语句有两个reduce,其中一个reduce用来排序保持全局排序好的结果。

 

  • 10
    点赞
  • 29
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值