1. 执行计划(Explain)
1)基本语法
EXPLAIN [EXTENDED | DEPENDENCY | AUTHORIZATION] query
2)案例实操
(1)查看下面这条语句的执行计划
没有生成 MR 任务的
hive (default)> explain select * from emp;
OK
Explain
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: emp
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: empno (type: int), ename (type: string), job (type: string), mgr (type: int), hiredate (type: string), sal (type: double), comm (type: double), deptno (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
ListSink
Time taken: 14.564 seconds, Fetched: 17 row(s)
有生成 MR 任务的
hive (default)> explain select deptno, avg(sal) avg_sal from emp group by deptno;
OK
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1 #stage-1依赖于stage-0阶段
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree: #Map操作
TableScan
alias: emp #扫描的表
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
Select Operator #查询的操作
expressions: sal (type: double), deptno (type: int)
outputColumnNames: sal, deptno
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
Group By Operator #读取的字段
aggregations: sum(sal), count(sal)
keys: deptno (type: int)
mode: hash
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: double), _col2 (type: bigint)
Execution mode: vectorized
Reduce Operator Tree: #reduce操作
Group By Operator
aggregations: sum(VALUE._col0), count(VALUE._col1)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), (_col1 / _col2) (type: double)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0 #第二个阶段
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 2.475 seconds, Fetched: 53 row(s)
(2)查看详细执行计划(多加一个extends)
hive (default)> explain extended select * from emp;
hive (default)> explain extended select deptno, avg(sal) avg_sal from emp group by deptno;
2. Fetch 抓取
Fetch 抓取是指,Hive 中对某些情况的查询可以不必使用 MapReduce 计算。例如:SELECT * FROM employees;在这种情况下,Hive 可以简单地读取 employee 对应的存储目录下的文件, 然后输出查询结果到控制台。
在 hive-default.xml.template 文件中 hive.fetch.task.conversion 默认是 more,老版本 hive 默认是 minimal,该属性修改为 more 以后,在全局查找、字段查找、limit 查找等都不走 mapreduce。
<property>
<name>hive.fetch.task.conversion</name>
<value>more</value>
<description>
Expects one of [none, minimal, more].Some select queries can be converted to single FETCH task minimizing
latency. Currently the query should be single sourced not having any subquery and should not have any aggregations or distincts (which incurs RS), lateral views and joins. 0. none : disable hive.fetch.task.conversion 1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only 2. more : SELECT, FILTER, LIMIT only (support TABLESAMPLE and
virtual columns --虚拟字段)
</description>
</property>
1)案例实操:
(1)把 hive.fetch.task.conversion 设置成 none,然后执行查询语句,都会执行 mapreduce 程序。
hive (default)> set hive.fetch.task.conversion=none;
hive (default)> select * from emp;
hive (default)> select ename from emp;
hive (default)> select ename from emp limit 3;
(2)把 hive.fetch.task.conversion 设置成 more,然后执行查询语句,如下查询方式都不 会执行 mapreduce 程序。
hive (default)> set hive.fetch.task.conversion=more;
hive (default)> select * from emp;
hive (default)> select ename from emp;
hive (default)> select ename from emp limit 3;