Flink JOIN Execution Plan
Code:
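// Older Flink 1.x Scala Table API: env = ExecutionEnvironment.getExecutionEnvironment, tEnv = TableEnvironment.getTableEnvironment(env)
// (imports: org.apache.flink.api.scala._, org.apache.flink.table.api.scala._)
val table1 = env.fromElements((1, "hello")).toTable(tEnv, 'a, 'b)
val table2 = env.fromElements((1, "hello")).toTable(tEnv, 'c, 'd)
val table = table1.join(table2).where("b = d").select("a, c")
println(tEnv.explain(table)) // prints the execution plan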
Execution plan:
== Abstract Syntax Tree ==
LogicalProject(a=[$0], c=[$2])
  LogicalFilter(condition=[=($1, $3)])
    LogicalJoin(condition=[true], joinType=[inner])
      LogicalTableScan(table=[[_DataSetTable_0]])
      LogicalTableScan(table=[[_DataSetTable_1]])
== Optimized Logical Plan ==
DataSetCalc(select=[a, c])
  DataSetJoin(where=[=(b, d)], join=[a, b, c, d], joinType=[InnerJoin])
    DataSetScan(table=[[_DataSetTable_0]])
    DataSetScan(table=[[_DataSetTable_1]])
== Physical Execution Plan ==
Stage 4 : Data Source
  content : collect elements with CollectionInputFormat
  Partitioning : RANDOM_PARTITIONED
Stage 3 : Map
  content : from: (a, b)
  ship_strategy : Forward
  exchange_mode : PIPELINED
  driver_strategy : Map
  Partitioning : RANDOM_PARTITIONED
Stage 6 : Data Source
  content : collect elements with CollectionInputFormat
  Partitioning : RANDOM_PARTITIONED
Stage 5 : Map
  content : from: (c, d)
  ship_strategy : Forward
  exchange_mode : PIPELINED
  driver_strategy : Map
  Partitioning : RANDOM_PARTITIONED
Stage 2 : Join
  content : where: (=(b, d)), join: (a, b, c, d)
  ship_strategy : Hash Partition on [1]
  exchange_mode : PIPELINED
  driver_strategy : Hybrid Hash (build: from: (a, b) (id: 3))
  Partitioning : RANDOM_PARTITIONED
Stage 1 : FlatMap
  content : select: (a, c)
  ship_strategy : Forward
  exchange_mode : PIPELINED
  driver_strategy : FlatMap
  Partitioning : RANDOM_PARTITIONED
Stage 0 : Data Sink
  content : org.apache.flink.api.java.io.DiscardingOutputFormat
  ship_strategy : Forward
  exchange_mode : PIPELINED
  Partitioning : RANDOM_PARTITIONED
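Stage 2 is where the join actually runs: both inputs are shipped with `Hash Partition on [1]` (field b on one side, field d on the other), and the driver builds a hash table from the (a, b) side and probes it with (c, d). The Java sketch below mimics that with plain collections. It is a simplified, fully in-memory hash join, not Flink's hybrid hash join (which can spill partitions to disk), and every name in it (`HashJoinSketch`, `Row`, `Out`, `hashPartition`, `crossThenFilter`) is hypothetical. `crossThenFilter` reproduces the unoptimized AST shape (a `LogicalJoin` with `condition=[true]` under a `LogicalFilter`) for comparison; both return the same rows, which is why the optimizer can fold the filter into `DataSetJoin`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {

    // A two-field row: f0 is a (or c), f1 is b (or d), i.e. field [1] is the join key.
    public record Row(int f0, String f1) {}

    // Output of select(a, c).
    public record Out(int a, int c) {}

    // Hash-partition rows on field [1] into `parallelism` partitions.
    public static List<List<Row>> hashPartition(List<Row> rows, int parallelism) {
        List<List<Row>> parts = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            parts.add(new ArrayList<>());
        }
        for (Row r : rows) {
            parts.get(Math.floorMod(r.f1().hashCode(), parallelism)).add(r);
        }
        return parts;
    }

    // Partitioned in-memory hash join on b = d, projecting (a, c).
    // Unlike Flink's hybrid hash join, nothing is ever spilled to disk.
    public static List<Out> hashJoin(List<Row> build, List<Row> probe, int parallelism) {
        List<List<Row>> buildParts = hashPartition(build, parallelism);
        List<List<Row>> probeParts = hashPartition(probe, parallelism);
        List<Out> result = new ArrayList<>();
        for (int p = 0; p < parallelism; p++) {
            // Build phase: index this partition of the build side by its key.
            Map<String, List<Row>> table = new HashMap<>();
            for (Row r : buildParts.get(p)) {
                table.computeIfAbsent(r.f1(), k -> new ArrayList<>()).add(r);
            }
            // Probe phase: rows with equal keys always land in the same partition.
            for (Row s : probeParts.get(p)) {
                for (Row r : table.getOrDefault(s.f1(), List.of())) {
                    result.add(new Out(r.f0(), s.f0()));
                }
            }
        }
        return result;
    }

    // Unoptimized AST shape: cross join (condition=[true]), then filter b = d, then project.
    public static List<Out> crossThenFilter(List<Row> left, List<Row> right) {
        List<Out> result = new ArrayList<>();
        for (Row r : left) {
            for (Row s : right) {
                if (r.f1().equals(s.f1())) {
                    result.add(new Out(r.f0(), s.f0()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> table1 = List.of(new Row(1, "hello"));
        List<Row> table2 = List.of(new Row(1, "hello"));
        System.out.println(hashJoin(table1, table2, 4));     // [Out[a=1, c=1]]
        System.out.println(crossThenFilter(table1, table2)); // [Out[a=1, c=1]]
    }
}
```

The partitioning step is what makes the join parallelizable: since equal keys hash to the same partition, each partition can be built and probed independently, just as each parallel instance of Stage 2 only sees its own hash partition.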
Depth-first traversal in the Flink optimizer (the plan is walked from each data sink towards the sources):
/**
 * Plan.accept
 * Traverses the job depth first from all data sinks on towards the sources.
 *
 * @see Visitable#accept(Visitor)
 */
@Override
public void accept(Visitor<Operator<?>> visitor) {
    for (GenericDataSinkBase<?> sink : this.sinks) {
        sink.accept(visitor);
    }
}
/**
 * GenericDataSinkBase.accept
 * Accepts the visitor and applies it to this instance. This method applies the visitor in a depth-first traversal.
 * The visitor's pre-visit method is called and, if it returns
 * <tt>true</tt>, the visitor is recursively applied to the single input. After the recursion returns,
 * the post-visit method is called.
 *
 * @param visitor The visitor.
 *
 * @see org.apache.flink.util.Visitable#accept(org.apache.flink.util.Visitor)
 */
@Override
public void accept(Visitor<Operator<?>> visitor) {
    boolean descend = visitor.preVisit(this);
    if (descend) {
        this.input.accept(visitor);
        visitor.postVisit(this);
    }
}
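The stage numbers in the physical plan line up with this traversal: numbering operators in pre-order starting from the sink gives Stage 0 (sink), 1 (FlatMap), 2 (Join), 3 (Map on (a, b)), 4 (its source), 5 (Map on (c, d)), 6 (its source), and listing them in post-order gives exactly the printed order 4, 3, 6, 5, 2, 1, 0. The sketch below reproduces that with toy classes. `Node`, `Visitor` and `StageNumberingVisitor` are hypothetical stand-ins, not Flink's `Operator`/`Visitor` API; a two-input node (the join) simply recurses into both inputs, and the "skip already-visited nodes" check is an assumption added so that a shared input in a DAG is visited only once.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PlanTraversalSketch {

    // Hypothetical visitor with the same preVisit/postVisit contract as described above.
    public interface Visitor {
        boolean preVisit(Node node); // return true to descend into the inputs
        void postVisit(Node node);
    }

    // Hypothetical operator: zero inputs (source), one (map, sink) or two (join).
    public static class Node {
        public final String name;
        public final List<Node> inputs;

        public Node(String name, Node... inputs) {
            this.name = name;
            this.inputs = List.of(inputs);
        }

        // Same shape as GenericDataSinkBase.accept, generalized to any number of inputs.
        public void accept(Visitor visitor) {
            if (visitor.preVisit(this)) {
                for (Node in : inputs) {
                    in.accept(visitor);
                }
                visitor.postVisit(this);
            }
        }
    }

    // Assigns ids in pre-order and records nodes in post-order.
    public static class StageNumberingVisitor implements Visitor {
        public final Map<Node, Integer> ids = new IdentityHashMap<>();
        public final List<String> postOrder = new ArrayList<>();

        @Override
        public boolean preVisit(Node node) {
            if (ids.containsKey(node)) {
                return false; // already visited through another path: do not descend again
            }
            ids.put(node, ids.size());
            return true;
        }

        @Override
        public void postVisit(Node node) {
            postOrder.add("Stage " + ids.get(node) + " : " + node.name);
        }
    }

    // Builds the operator graph of the example job and returns the post-order listing.
    public static List<String> run() {
        Node src1 = new Node("Data Source");
        Node map1 = new Node("Map (a, b)", src1);
        Node src2 = new Node("Data Source");
        Node map2 = new Node("Map (c, d)", src2);
        Node join = new Node("Join", map1, map2);
        Node flatMap = new Node("FlatMap", join);
        Node sink = new Node("Data Sink", flatMap);

        StageNumberingVisitor v = new StageNumberingVisitor();
        for (Node s : List.of(sink)) { // like Plan.accept: start from every sink
            s.accept(v);
        }
        return v.postOrder;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // Stage 4 : Data Source, Stage 3 : Map (a, b), Stage 6 : Data Source,
        // Stage 5 : Map (c, d), Stage 2 : Join, Stage 1 : FlatMap, Stage 0 : Data Sink
    }
}
```

Pre-visit is where a node is first reached (so sinks get the smallest ids), while post-visit fires only after all inputs are finished, which is why sources appear first in the listing.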