1. Data preparation
join_d1.c1<int> | join_d1.c2<string> | join_d1.c3<double> | join_d1.c4<string>
1 | a | 1.1 | a
2 | b | 1.2 | b
3 | c | 1.3 | c
4 | d | 1.4 | d
2 | e | 1.5 | e
3 | f | 1.6 | f
2. Viewing the execution plan
hive> explain select c1,upper(c2) from join_d1 where c3 > '1.2';
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: join_d1
            Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (c3 > 1.3) (type: boolean)
              Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: c1 (type: int), c2 (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 10
                  Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 10
      Processor Tree:
        ListSink
Time taken: 0.041 seconds, Fetched: 35 row(s)
3. Interpreting the execution plan
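Note: starting from the SQL text, the Hive execution engine performs lexical analysis and parsing to build an AST, then generates the QB (query block) from the AST, then generates the OperatorTree from the QB. It then optimizes the OperatorTree and generates the MR (MapReduce) jobs.
Here we care about how the OperatorTree executes. Operators execute as a chain, which can be thought of as the chain-of-responsibility pattern.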
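First, this SQL is split into two stages.
Stage0 is a FetchTask, and it depends on Stage1. The FetchTask creates a FetchOperator, which reads the data on HDFS row by row and pushes each row into Stage1.
Stage1 is the chain TableScanOperator -> FilterOperator -> SelectOperator -> ... At the end it calls Stage0's ListSinkOperator, which outputs the result.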
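The push-style chain described above can be sketched in a few lines of Python. This is only an illustration of the chain-of-responsibility idea, not Hive's actual Java implementation: the class names mirror the operator names in the plan, but the `process`/`forward` methods, the `run_chain` helper and the dict-based rows are assumptions made for this sketch. The filter threshold 1.2 is taken from the query text.

```python
class Operator:
    """Hypothetical base operator: handle one row, then hand it to the child."""

    def __init__(self, child=None):
        self.child = child

    def process(self, row):
        raise NotImplementedError

    def forward(self, row):
        # Push the row to the next operator in the chain, if there is one.
        if self.child is not None:
            self.child.process(row)


class TableScanOperator(Operator):
    def process(self, row):
        self.forward(row)  # passes every row through unchanged


class FilterOperator(Operator):
    def __init__(self, predicate, child=None):
        super().__init__(child)
        self.predicate = predicate

    def process(self, row):
        if self.predicate(row):  # rows failing the predicate are dropped here
            self.forward(row)


class SelectOperator(Operator):
    def __init__(self, columns, child=None):
        super().__init__(child)
        self.columns = columns

    def process(self, row):
        self.forward(tuple(row[c] for c in self.columns))


class ListSinkOperator(Operator):
    def __init__(self):
        super().__init__()
        self.results = []

    def process(self, row):
        self.results.append(row)  # end of the chain: collect the output


# The six rows of join_d1 from section 1.
ROWS = [
    {"c1": 1, "c2": "a", "c3": 1.1, "c4": "a"},
    {"c1": 2, "c2": "b", "c3": 1.2, "c4": "b"},
    {"c1": 3, "c2": "c", "c3": 1.3, "c4": "c"},
    {"c1": 4, "c2": "d", "c3": 1.4, "c4": "d"},
    {"c1": 2, "c2": "e", "c3": 1.5, "c4": "e"},
    {"c1": 3, "c2": "f", "c3": 1.6, "c4": "f"},
]


def run_chain(rows):
    """Build TableScan -> Filter -> Select -> ListSink and push each row in."""
    sink = ListSinkOperator()
    select = SelectOperator(["c1", "c2"], sink)
    chain = TableScanOperator(FilterOperator(lambda r: r["c3"] > 1.2, select))
    for row in rows:  # plays the role of FetchOperator reading row by row
        chain.process(row)
    return sink.results


print(run_chain(ROWS))  # [(3, 'c'), (4, 'd'), (2, 'e'), (3, 'f')]
```

Each operator only knows its child, so adding another step (for example a limit) would mean inserting one more object into the chain without touching the others.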
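The process is as follows:
FetchOperator reads the first row 1,a,1.1,a from HDFS -> TableScanOperator (a limit can be applied here) -> FilterOperator (1.1 > 1.2) is false, so the row is discarded; move on to the next row
FetchOperator reads the second row 2,b,1.2,b from HDFS -> TableScanOperator (a limit can be applied here) -> FilterOperator (1.2 > 1.2) is false, so the row is discarded; move on to the next row
FetchOperator reads the third row 3,c,1.3,c from HDFS -> TableScanOperator (a limit can be applied here) -> FilterOperator (1.3 > 1.2) is true -> SelectOperator selects c1,c2 -> ListSinkOperator
.. and so on for the remaining rows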
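To carry the iteration through all six rows, the following small Python trace prints the filter decision for each row. It is a sketch for illustration only: the `trace` function and its output format are made up here, and the threshold 1.2 is the one used in the walkthrough above.

```python
# The six rows of join_d1 as (c1, c2, c3, c4) tuples, in file order.
ROWS = [
    (1, "a", 1.1, "a"),
    (2, "b", 1.2, "b"),
    (3, "c", 1.3, "c"),
    (4, "d", 1.4, "d"),
    (2, "e", 1.5, "e"),
    (3, "f", 1.6, "f"),
]


def trace(rows, threshold=1.2):
    """Return one line per row describing what the filter step decides."""
    lines = []
    for n, (c1, c2, c3, _c4) in enumerate(rows, start=1):
        if c3 > threshold:
            lines.append(f"row {n}: ({c3} > {threshold}) true -> select ({c1}, {c2})")
        else:
            lines.append(f"row {n}: ({c3} > {threshold}) false -> dropped")
    return lines


for line in trace(ROWS):
    print(line)
```

Under these assumptions rows 1 and 2 are dropped and rows 3 through 6 reach the sink.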