如果你想查询某个表的某一列,Hive默认是会启用MapReduce Job来完成这个任务,如下:
01 | hive> SELECT id, money FROM m limit 10 ; |
02 | Total MapReduce jobs = 1 |
03 | Launching Job 1 out of 1 |
04 | Number of reduce tasks is set to 0 since there's no reduce operator |
05 | Cannot run job locally: Input Size (= 235105473 ) is larger than |
06 | hive.exec.mode.local.auto.inputbytes.max (= 134217728 ) |
07 | Starting Job = job_1384246387966_0229, Tracking URL = |
11 | Kill Command = /home/q/hadoop- 2.2 . 0 /bin/hadoop job |
12 | -kill job_1384246387966_0229 |
13 | hadoop job information for Stage- 1 : number of mappers: 1 ; |
15 | 2013 - 11 - 13 11 : 35 : 16 , 167 Stage- 1 map = 0 %, reduce = 0 % |
16 | 2013 - 11 - 13 11 : 35 : 21 , 327 Stage- 1 map = 100 %, reduce = 0 %, |
17 | Cumulative CPU 1.26 sec |
18 | 2013 - 11 - 13 11 : 35 : 22 , 377 Stage- 1 map = 100 %, reduce = 0 %, |
19 | Cumulative CPU 1.26 sec |
20 | MapReduce Total cumulative CPU time: 1 seconds 260 msec |
21 | Ended Job = job_1384246387966_0229 |
22 | MapReduce Jobs Launched: |
23 | Job 0 : Map: 1 Cumulative CPU: 1.26 sec |
24 | HDFS Read: 8388865 HDFS Write: 60 SUCCESS |
25 | Total MapReduce CPU Time Spent: 1 seconds 260 msec |
37 | Time taken: 16.802 seconds, Fetched: 10 row(s) |
我们都知道,启用MapReduce Job是会消耗系统开销的。对于这个问题,从Hive0.10.0版本开始,对于简单的不需要聚合的类似SELECT <col> from <table> LIMIT n语句,不需要起MapReduce job,直接通过Fetch task获取数据,可以通过下面几种方法实现:
方法一:
01 | hive> set hive.fetch.task.conversion=more; |
02 | hive> SELECT id, money FROM m limit 10 ; |
14 | Time taken: 0.138 seconds, Fetched: 10 row(s) |
上面 set hive.fetch.task.conversion=more;开启了Fetch任务,所以对于上述简单的列查询不在启用MapReduce job!
方法二:
1 | bin/hive --hiveconf hive.fetch.task.conversion=more |
方法三:
上面的两种方法都可以开启了Fetch任务,但是都是临时起作用的;如果你想一直启用这个功能,可以在${HIVE_HOME}/conf/hive-site.xml里面加入以下配置:
02 | <name>hive.fetch.task.conversion</name> |
05 | Some select queries can be converted to single FETCH task |
06 | minimizing latency.Currently the query should be single |
07 | sourced not having any subquery and should not have |
08 | any aggregations or distincts (which incurrs RS), |
09 | lateral views and joins. |
10 | 1 . minimal : SELECT STAR, FILTER on partition columns, LIMIT only |
11 | 2 . more : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns) |
这样就可以长期启用Fetch任务了,很不错吧,也赶紧去试试吧!