背景
首先,说一下结论,Hive是否触发作业跟hive.fetch.task.conversion这个参数有关
那么我们先来看看hive.fetch.task.conversion这个参数是啥
打开hive终端
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=more
通过参阅文档,得知hive.fetch.task.conversion这个参数有以下三个值
下面我们通过一下几类查询来做试验
- desc live_info;
- select * from live_info;
- select * from live_info where traffic in ("1", "3");
- select count(1) from live_info;
- select user, count(1) as user_count from live_info group by user;
- select user, count(1) as user_count from live_info group by user order by user desc;
hive.fetch.task.conversion=none
我本机就默认为more了,我先把它改成none
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=none
select * from live_info;
select * from live_info where traffic in ("1", "3");
select count(1) from live_info;
select user, count(1) as user_count from live_info group by user;
select user, count(1) as user_count from live_info group by user order by user desc;
总结一下,通过上面六幅图可以看出,当hive.fetch.task.conversion=none的时候,只有desc不走MR
hive.fetch.task.conversion=minimal
hive> set hive.fetch.task.conversion=minimal;
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=minimal
select * from live_info;马上出结果
select * from live_info where traffic in ("1", "3");
select count(1) from live_info;
select user, count(1) as user_count from live_info group by user;
select user, count(1) as user_count from live_info group by user order by user desc;
总结一下,通过上面六幅图可以看出,当hive.fetch.task.conversion=minimal的时候,desc和select * 不走MR
hive.fetch.task.conversion=more
hive> set hive.fetch.task.conversion=more;
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=more
select * from live_info;马上出结果
select * from live_info where traffic in ("1", "3"); 马上出结果
select count(1) from live_info;
select user, count(1) as user_count from live_info group by user;
select user, count(1) as user_count from live_info group by user order by user desc;
总结一下,通过上面六幅图可以看出,当hive.fetch.task.conversion=more的时候,desc和select * 还有select * from live_info where traffic in ("1", "3"); 都不走MR
下面做个大总结
对于Describe table这种SQL语句,Hive会直接的从metastore去读取数据
对于select * from table这种SQL查询语句,Hive不用经历map或者reduce过程
对于select * from live_info where traffic in ("1", "3"); 这种SQL查询语句,Hive仅仅只有map,没有reduce,仅仅filter出traffic为"1","3"的记录
对于select count(1) from live_info;这种SQL查询语句,只有reduce,没有map,但是如果我们除了count还需要取某个字段,reduce阶段之前还是需要走map的,比如说下面的SQL
对于select user, count(1) as user_count from live_info group by user; 这种SQL查询语句,除了要经历map阶段过滤出user字段,还要经历reduce阶段做聚合
对于select user, count(1) as user_count from live_info group by user order by user desc; 这种SQL查询语句,同样需要经历map和reduce阶段,不同的是,这条SQL语句有两个reduce,其中一个reduce用来排序保持全局排序好的结果。