Hive: Running Simple Queries Without MapReduce

If you query a column of a table, Hive launches a MapReduce job to do it by default, as shown below:

hive> SELECT id, money FROM m limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job  -kill job_1384246387966_0229
hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-11-13 11:35:16,167 Stage-1 map = 0%,  reduce = 0%
2013-11-13 11:35:21,327 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
2013-11-13 11:35:22,377 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1384246387966_0229
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.26sec   HDFS Read: 8388865 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
1       122
1       185
1       231
1       292
1       316
1       329
1       355
1       356
1       362
1       364
Time taken: 16.802 seconds, Fetched: 10 row(s)

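Launching a MapReduce job carries real overhead: the run above took 16.802 seconds to return 10 rows, of which only 1.26 seconds was map CPU time. Starting with Hive 0.10.0, simple queries that need no aggregation, such as SELECT <col> FROM <table> LIMIT n, can skip the MapReduce job entirely and read the data directly through a Fetch task. There are several ways to enable this: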
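The eligibility rules are easier to check when written out. The sketch below is a minimal Python model of the conditions that Hive's own description of hive.fetch.task.conversion lists: the query must read a single source and have no subquery, aggregation, DISTINCT, lateral view or join; minimal then allows only SELECT *, filters on partition columns and LIMIT, while more also allows column lists and arbitrary filters (TABLESAMPLE and virtual columns are left out here). QueryShape and fetch_eligible are hypothetical names; this is not Hive's actual planner logic, only a readable restatement of the documented rules.

```python
from dataclasses import dataclass


@dataclass
class QueryShape:
    """Hypothetical, simplified summary of a HiveQL query's structure."""
    single_source: bool = True               # reads from exactly one table
    has_subquery: bool = False
    has_aggregation: bool = False            # GROUP BY or aggregate functions
    has_distinct: bool = False
    has_join: bool = False
    has_lateral_view: bool = False
    select_star: bool = False                # SELECT * instead of a column list
    filters_only_on_partitions: bool = True  # WHERE (if any) uses only partition columns


def fetch_eligible(q: QueryShape, mode: str) -> bool:
    """Model of the documented hive.fetch.task.conversion rules (not Hive's planner)."""
    if mode not in ("minimal", "more"):
        raise ValueError(f"unsupported mode: {mode!r}")
    # Conditions shared by both modes, as listed in the property description.
    if (not q.single_source or q.has_subquery or q.has_aggregation
            or q.has_distinct or q.has_join or q.has_lateral_view):
        return False
    if mode == "minimal":
        # minimal: SELECT *, filters on partition columns, LIMIT only
        return q.select_star and q.filters_only_on_partitions
    # more: any column list and any filter, plus LIMIT
    return True


# SELECT id, money FROM m limit 10: a column list with no WHERE clause
article_query = QueryShape(select_star=False)
print(fetch_eligible(article_query, "minimal"))  # False
print(fetch_eligible(article_query, "more"))     # True
```

Under minimal, a column list such as SELECT id, money does not qualify, which is consistent with the MapReduce job in the first run if that cluster was using minimal.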
Method 1: set the property in the current Hive session:

hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
OK
1       122
1       185
1       231
1       292
1       316
1       329
1       355
1       356
1       362
1       364
Time taken: 0.138 seconds, Fetched: 10 row(s)

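The statement set hive.fetch.task.conversion=more; above enables Fetch task conversion, so simple column queries like this one no longer launch a MapReduce job. The time taken drops from 16.802 seconds to 0.138 seconds.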
Method 2: pass the setting on the command line when starting the Hive CLI:

bin/hive --hiveconf hive.fetch.task.conversion=more
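When the Hive CLI is driven from a script, the same override can be passed per invocation. The sketch below is a minimal Python example, assuming the Hive CLI accepts --hiveconf property=value and -e "query" (both standard CLI options) and that the binary lives at the relative path bin/hive used above; build_hive_command and run_query are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List


def build_hive_command(query: str, hive_bin: str = "bin/hive",
                       conversion: str = "more") -> List[str]:
    """Argument list for one Hive CLI run with fetch-task conversion overridden."""
    return [
        hive_bin,
        "--hiveconf", f"hive.fetch.task.conversion={conversion}",  # this invocation only
        "-e", query,  # run the query string, then exit
    ]


def run_query(query: str) -> str:
    """Run the query through the Hive CLI and return its stdout.

    Requires a working Hive installation at the path given by hive_bin.
    """
    result = subprocess.run(build_hive_command(query),
                            capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_hive_command("SELECT id, money FROM m limit 10")
print(shlex.join(cmd))
# bin/hive --hiveconf hive.fetch.task.conversion=more -e 'SELECT id, money FROM m limit 10'
```

Passing an argument list to subprocess.run, rather than a single shell string, keeps the query's spaces and quotes from being reinterpreted by a shell.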

Method 3:
Both methods above enable the Fetch task, but only temporarily, for the current session. To enable it permanently, add the following configuration to ${HIVE_HOME}/conf/hive-site.xml:

<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task
    minimizing latency. Currently the query should be single
    sourced not having any subquery and should not have
    any aggregations or distincts (which incurs RS),
    lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
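If hive-site.xml is managed by scripts rather than edited by hand, the property can be set programmatically. The sketch below uses Python's standard xml.etree.ElementTree and assumes the usual Hadoop-style layout: a <configuration> root holding <property> elements with <name> and <value> children. get_property, set_property and enable_fetch_conversion are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PROP = "hive.fetch.task.conversion"


def get_property(root: ET.Element, name: str) -> Optional[str]:
    """Return the <value> text of the named <property>, or None if it is absent."""
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def set_property(root: ET.Element, name: str, value: str) -> None:
    """Update the named <property> in place, appending a new one if missing."""
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def enable_fetch_conversion(path: str, mode: str = "more") -> None:
    """Rewrite the hive-site.xml at `path` with the fetch-task property set."""
    tree = ET.parse(path)
    set_property(tree.getroot(), PROP, mode)
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

ElementTree discards XML comments when parsing by default, so rewriting the file this way drops any comments in hive-site.xml; keep a backup, or edit by hand if the comments matter.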

With this in place, the Fetch task stays enabled permanently. Pretty handy, so go give it a try!
