Getting Started with Hive: Understanding EXPLAIN Execution Plans
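When MapReduce is the execution engine, Hive compiles most SQL statements into MapReduce jobs. But how can we see what those jobs actually do? This is what the EXPLAIN keyword is for: it prints the execution plan that Hive generates for a statement, including the MapReduce stages it will run. The syntax is shown below. Adding the EXTENDED keyword makes the plan more detailed.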
EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query
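As an illustration of the syntax, the optional keywords could be tried on a simple aggregation. This is only a sketch: it assumes the emp table used later in this post, with job and sal columns, and the query itself is made up for the example.

```sql
-- Plain EXPLAIN: stage dependencies plus the operator tree of each stage.
EXPLAIN
SELECT job, sum(sal) FROM emp GROUP BY job;

-- EXTENDED: the same plan with extra detail, such as file paths and table properties.
EXPLAIN EXTENDED
SELECT job, sum(sal) FROM emp GROUP BY job;

-- DEPENDENCY: lists the tables and partitions the query reads.
EXPLAIN DEPENDENCY
SELECT job, sum(sal) FROM emp GROUP BY job;

-- AUTHORIZATION: lists the inputs, outputs, and operation that would be checked for authorization.
EXPLAIN AUTHORIZATION
SELECT job, sum(sal) FROM emp GROUP BY job;
```

DEPENDENCY and AUTHORIZATION were added in later Hive releases than plain EXPLAIN, so an older installation may reject them.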
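EXPLAIN turns the query into a sequence of stages. Its output has three main parts:
1: The abstract syntax tree of the query
2: The dependencies between the stages of the plan
3: A description of each stage, listing the operators it runs and the data they operate on, for example the select operator, the filter operator, the fetch operator, and so on.
Let's look at a concrete example: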
explain
from emp insert overwrite table emp_explain
select job, sum(substr(emp.sal, 4))
group by emp.job;
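The statement above uses Hive's FROM-first form, in which the source table is named before the INSERT clause. As a sketch for readers more used to standard SQL ordering, the same statement can be written like this:

```sql
-- Same statement with the FROM clause in its usual position.
EXPLAIN
INSERT OVERWRITE TABLE emp_explain
SELECT job, sum(substr(sal, 4))
FROM emp
GROUP BY job;
```

The FROM-first form is mainly useful when one scan of a table feeds several INSERT clauses; with a single INSERT, either spelling works.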
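Running the statement prints output like the following. The dependency section of the output (not reproduced here) shows that the statement is split into three stages: Stage-1 is the root stage, Stage-0 depends on Stage-1, and Stage-2 depends on Stage-0. The STAGE PLANS section below then describes each stage in detail.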
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: emp  // reads from the emp table
            Statistics: Num rows: 6 Data size: 656 Basic stats: COMPLETE Column stats: NONE
            Select Operator  // the select operator
              expressions: job (type: string), sal (type: double)  // selected columns and their data types
              outputColumnNames: job, sal
              Statistics: Num rows: 6 Data size: 656 Basic stats: COMPLETE Column stats: NONE
              Group By Operator  // the group by operator
                aggregations: sum(substr(sal, 4))  // the aggregation
                keys: job (type: string)
                mode: hash
                outputColumnNames: _col0, _col1  // output columns of the aggregation
                Statistics: Num rows: 6 Data size: 656 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 6 Data size: 656 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: double)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 3 Data size: 328 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), UDFToInteger(_col1) (type: int)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 3 Data size: 328 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 3 Data size: 328 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: emp_dept.emp_explain
  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: emp_dept.emp_explain
  Stage: Stage-2
    Stats-Aggr Operator  // aggregates statistics for the target table
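The Group By Operator appears twice in the plan above: with mode: hash in the map phase, where each mapper pre-aggregates its rows in an in-memory hash table, and with mode: mergepartial in the reduce phase, where those partial sums are merged per key. One hedged way to see this for yourself is to switch map-side aggregation off through the hive.map.aggr setting and compare the two plans. The SELECT-only query below is a simplified stand-in for the example above, and the exact output depends on your Hive version.

```sql
-- Default behaviour: map-side partial aggregation is enabled.
SET hive.map.aggr=true;
EXPLAIN
SELECT job, sum(substr(sal, 4)) FROM emp GROUP BY job;

-- Disable map-side aggregation, then compare the Map Operator Tree of the new plan.
SET hive.map.aggr=false;
EXPLAIN
SELECT job, sum(substr(sal, 4)) FROM emp GROUP BY job;
```

With the setting disabled, the map-side Group By Operator should no longer appear, and all of the aggregation is left to the reducer.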
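Digging further into these details requires some background in programming languages and compiler principles, so we will not go deeper here.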