Hive性能调优之Fetch抓取

Fetch抓取是指hive在某些情况的查询可以不必使用MapReduce计算,例如在执行一个简单的select * from XX 时,我们只需要简单的进行抓取对应目录下的数据即可。

hive-default.xml.template中,hive.fetch.task.conversion默认是morn,老版本中默认是minimal。
该属性为morn时,在全局查找,字段查找,limit查找等都不走MapReduce。

<property>
    <name>hive.fetch.task.conversion</name>
    <value>more</value>
    <description>
      Expects one of [none, minimal, more].
      Some select queries can be converted to single FETCH task minimizing latency.
      Currently the query should be single sourced not having any subquery and should not have
      any aggregations or distincts (which incurs RS), lateral views and joins.
      0. none : disable hive.fetch.task.conversion
      1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
      2. more    : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns)
    </description>
  </property>

我们来做个比较:

hive (default)> set hive.fetch.task.conversion=none;
hive (default)> select * from emp ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200710194508_8561a541-492c-48f4-a381-95f041e276c8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1594379941865_0001, Tracking URL = http://master:18088/proxy/application_1594379941865_0001/
Kill Command = /usr/hdk/hadoop/bin/hadoop job  -kill job_1594379941865_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-07-10 19:45:29,124 Stage-1 map = 0%,  reduce = 0%
2020-07-10 19:45:44,427 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.05 sec
MapReduce Total cumulative CPU time: 3 seconds 50 msec
Ended Job = job_1594379941865_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 3.05 sec   HDFS Read: 5489 HDFS Write: 916 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 50 msec
OK
emp.empno	emp.ename	emp.job	emp.mgr	emp.hiredate	emp.sal	emp.comm	emp.deptno
7369	SMITH	CLERK	7902	1980-12-17	800.0	NULL	20
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.0	300.0	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.0	500.0	30
7566	JONES	MANAGER	7839	1981-4-2	2975.0	NULL	20
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.0	1400.0	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.0	NULL	30
7782	CLARK	MANAGER	7839	1981-6-9	2450.0	NULL	10
7788	SCOTT	ANALYST	7566	1987-4-19	3000.0	NULL	20
7839	KING	PRESIDENT	NULL	1981-11-17	5000.0	NULL	10
7844	TURNER	SALESMAN	7698	1981-9-8	1500.0	0.0	30
7876	ADAMS	CLERK	7788	1987-5-23	1100.0	NULL	20
7900	JAMES	CLERK	7698	1981-12-3	950.0	NULL	30
7902	FORD	ANALYST	7566	1981-12-3	3000.0	NULL	20
7934	MILLER	CLERK	7782	1982-1-23	1300.0	NULL	10
Time taken: 37.739 seconds, Fetched: 14 row(s)

这是使用mr去查询的运行结果,运行了37s。

hive (default)> set hive.fetch.task.conversion=more;
hive (default)> select * from emp ;
OK
emp.empno	emp.ename	emp.job	emp.mgr	emp.hiredate	emp.sal	emp.comm	emp.deptno
7369	SMITH	CLERK	7902	1980-12-17	800.0	NULL	20
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.0	300.0	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.0	500.0	30
7566	JONES	MANAGER	7839	1981-4-2	2975.0	NULL	20
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.0	1400.0	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.0	NULL	30
7782	CLARK	MANAGER	7839	1981-6-9	2450.0	NULL	10
7788	SCOTT	ANALYST	7566	1987-4-19	3000.0	NULL	20
7839	KING	PRESIDENT	NULL	1981-11-17	5000.0	NULL	10
7844	TURNER	SALESMAN	7698	1981-9-8	1500.0	0.0	30
7876	ADAMS	CLERK	7788	1987-5-23	1100.0	NULL	20
7900	JAMES	CLERK	7698	1981-12-3	950.0	NULL	30
7902	FORD	ANALYST	7566	1981-12-3	3000.0	NULL	20
7934	MILLER	CLERK	7782	1982-1-23	1300.0	NULL	10
Time taken: 0.289 seconds, Fetched: 14 row(s)

这样只需要0.2s。

本地模式

我们在处理小数据集的时候可以使用本地模式来处理所有任务,这样可以明显的缩短执行时间。
本地模式要求数据量小于128MB(一个标准块大小),同时文件数小于4个(可调整)。

hive (default)> set hive.exec.mode.local.auto;
hive.exec.mode.local.auto=false
hive (default)> set hive.exec.mode.local.auto=true;
hive (default)> set hive.exec.mode.local.auto.inputbytes.max;
hive.exec.mode.local.auto.inputbytes.max=134217728
hive (default)> set hive.exec.mode.local.auto.input.files.max;
hive.exec.mode.local.auto.input.files.max=4
hive (default)> select * from emp cluster by deptno;
Automatically selecting local only mode for query
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200710210745_10f9202c-678f-4ddb-aa2b-555551d11e1d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2020-07-10 21:07:46,989 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local897334446_0004
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 10848 HDFS Write: 6412 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
emp.empno	emp.ename	emp.job	emp.mgr	emp.hiredate	emp.sal	emp.comm	emp.deptno
7934	MILLER	CLERK	7782	1982-1-23	1300.0	NULL	10
7839	KING	PRESIDENT	NULL	1981-11-17	5000.0	NULL	10
7782	CLARK	MANAGER	7839	1981-6-9	2450.0	NULL	10
7876	ADAMS	CLERK	7788	1987-5-23	1100.0	NULL	20
7788	SCOTT	ANALYST	7566	1987-4-19	3000.0	NULL	20
7369	SMITH	CLERK	7902	1980-12-17	800.0	NULL	20
7566	JONES	MANAGER	7839	1981-4-2	2975.0	NULL	20
7902	FORD	ANALYST	7566	1981-12-3	3000.0	NULL	20
7844	TURNER	SALESMAN	7698	1981-9-8	1500.0	0.0	30
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.0	300.0	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.0	NULL	30
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.0	1400.0	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.0	500.0	30
7900	JAMES	CLERK	7698	1981-12-3	950.0	NULL	30
Time taken: 1.559 seconds, Fetched: 14 row(s)
hive (default)> set hive.exec.mode.local.auto;
hive.exec.mode.local.auto=false
hive (default)> set hive.fetch.task.conversion;
hive.fetch.task.conversion=more
hive (default)> select * from emp cluster by deptno;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200710210943_088e87fe-968d-4c11-868f-267a89bb5fef
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1594379941865_0003, Tracking URL = http://master:18088/proxy/application_1594379941865_0003/
Kill Command = /usr/hdk/hadoop/bin/hadoop job  -kill job_1594379941865_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-10 21:09:53,706 Stage-1 map = 0%,  reduce = 0%
2020-07-10 21:10:03,663 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.74 sec
2020-07-10 21:10:13,125 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.11 sec
MapReduce Total cumulative CPU time: 5 seconds 110 msec
Ended Job = job_1594379941865_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.11 sec   HDFS Read: 10306 HDFS Write: 916 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 110 msec
OK
emp.empno	emp.ename	emp.job	emp.mgr	emp.hiredate	emp.sal	emp.comm	emp.deptno
7934	MILLER	CLERK	7782	1982-1-23	1300.0	NULL	10
7839	KING	PRESIDENT	NULL	1981-11-17	5000.0	NULL	10
7782	CLARK	MANAGER	7839	1981-6-9	2450.0	NULL	10
7876	ADAMS	CLERK	7788	1987-5-23	1100.0	NULL	20
7788	SCOTT	ANALYST	7566	1987-4-19	3000.0	NULL	20
7369	SMITH	CLERK	7902	1980-12-17	800.0	NULL	20
7566	JONES	MANAGER	7839	1981-4-2	2975.0	NULL	20
7902	FORD	ANALYST	7566	1981-12-3	3000.0	NULL	20
7844	TURNER	SALESMAN	7698	1981-9-8	1500.0	0.0	30
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.0	300.0	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.0	NULL	30
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.0	1400.0	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.0	500.0	30
7900	JAMES	CLERK	7698	1981-12-3	950.0	NULL	30
Time taken: 30.785 seconds, Fetched: 14 row(s)

如果不开启本地模式,就要30s。

本地模式是搭配着hive.exec.mode.local.auto.inputbytes.maxhive.exec.mode.local.auto.input.files.max使用的。

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

寒 暄

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值