Flink DAG Compilation and Optimization

Aegeaner

Article link: https://blog.csdn.net/Aegeaner/article/details/53506102

1. Create the ProgramPlan.

class ExecutionEnvironment:

public Plan createProgramPlan();

A Plan describes all data sources, all sinks and all operations of the program. It can be executed as a self-contained unit by a PlanExecutor.
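To make the idea concrete, here is a minimal Java sketch, assuming a simplified model in which a plan only stores its sinks and every source and intermediate operator is reached by following input edges backwards. The classes `Op` and `MiniPlan` are hypothetical stand-ins, not Flink's `Plan` or `Operator` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified program operator (not Flink's Operator class).
class Op {
    final String name;
    final List<Op> inputs;

    Op(String name, Op... inputs) {
        this.name = name;
        this.inputs = Arrays.asList(inputs);
    }

    boolean isSource() {
        return inputs.isEmpty();
    }
}

// Hypothetical stand-in for Plan: it stores only the sinks; everything else
// is reachable by following inputs backwards from them.
class MiniPlan {
    final List<Op> sinks;

    MiniPlan(List<Op> sinks) {
        this.sinks = sinks;
    }

    // Collects every operator reachable from the sinks, each exactly once.
    Set<Op> allOperators() {
        Set<Op> seen = new LinkedHashSet<>();
        for (Op sink : sinks) {
            collect(sink, seen);
        }
        return seen;
    }

    private void collect(Op op, Set<Op> seen) {
        if (!seen.add(op)) {
            return; // already collected via another path
        }
        for (Op in : op.inputs) {
            collect(in, seen);
        }
    }

    // Sources are the reachable operators without inputs.
    List<Op> sources() {
        List<Op> result = new ArrayList<>();
        for (Op op : allOperators()) {
            if (op.isSource()) {
                result.add(op);
            }
        }
        return result;
    }
}
```

In this model, storing only the sinks is enough: every source and operator that contributes to some result is reachable from a sink, and an operator shared by two sinks is still collected once.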

2. Compile.

class Optimizer:
private OptimizerPostPass getPostPassFromPlan(Plan program);
public OptimizedPlan compile(Plan program) throws CompilerException;

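getPostPassFromPlan obtains the OptimizerPostPass from the program and instantiates it. OptimizerPostPass is the interface for visitors that post-process the plan generated by the optimizer.
compile translates the given program into an OptimizedPlan.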
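Below is a minimal sketch of how a post-pass could be instantiated reflectively from a class name, assuming the plan carries the fully qualified class name of its post-pass. `PostPass`, `NoOpPostPass` and `PostPassLoader` are hypothetical names; Flink's actual `OptimizerPostPass` and its error handling (which reports failures as `CompilerException`) differ in detail.

```java
// Hypothetical post-pass interface, loosely modelled on OptimizerPostPass.
interface PostPass {
    void postPass(StringBuilder plan);
}

// Hypothetical post-pass that only marks the plan as processed.
class NoOpPostPass implements PostPass {
    public NoOpPostPass() {}

    @Override
    public void postPass(StringBuilder plan) {
        plan.append("[post-pass applied]");
    }
}

final class PostPassLoader {
    private PostPassLoader() {}

    // Resolves the class by name, checks that it implements PostPass,
    // and creates an instance through its public no-arg constructor.
    static PostPass instantiate(String className) {
        if (className == null) {
            throw new IllegalArgumentException("post-pass class name is null");
        }
        try {
            Class<? extends PostPass> clazz =
                    Class.forName(className).asSubclass(PostPass.class);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("post-pass class not found: " + className, e);
        } catch (ClassCastException e) {
            throw new IllegalStateException(className + " does not implement PostPass", e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

One plausible reason for loading by name is that the optimizer then depends only on the common interface, while the program decides which concrete post-pass applies to it.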

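In the OptimizedPlan, every node has been assigned a local strategy and every channel a shipping strategy. The OptimizedPlan describes which strategy each operator uses (e.g. hash join vs. sort-merge join), which data exchange method is used (local pipe forward, shuffle, broadcast), which exchange mode is used (pipelined, batch), where intermediate results are cached, and so on.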
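As an illustration of what assigning a local strategy to a node and a shipping strategy to each of its channels can mean for a join, here is a toy Java heuristic: broadcast a small input and build a hash table on it; otherwise hash-partition both inputs and use sort-merge if they are already sorted on the key. The enums, the 10 MiB threshold and the decision rule are hypothetical. Flink's optimizer instead enumerates alternative plans and compares them with a cost model, and exchange modes and caching are left out of this sketch.

```java
// Hypothetical, heavily simplified strategy enums (Flink's real ones have more values).
enum ToyShipStrategy { FORWARD, HASH_PARTITION, BROADCAST }

enum ToyLocalStrategy { HASH_JOIN_BUILD_FIRST, HASH_JOIN_BUILD_SECOND, SORT_MERGE_JOIN }

// The chosen local strategy for the join node plus one shipping strategy per input channel.
final class ToyJoinPlan {
    final ToyLocalStrategy local;
    final ToyShipStrategy firstInput;
    final ToyShipStrategy secondInput;

    ToyJoinPlan(ToyLocalStrategy local, ToyShipStrategy firstInput, ToyShipStrategy secondInput) {
        this.local = local;
        this.firstInput = firstInput;
        this.secondInput = secondInput;
    }

    @Override
    public String toString() {
        return local + " (in1=" + firstInput + ", in2=" + secondInput + ")";
    }
}

final class ToyJoinChooser {
    // Hypothetical threshold below which an input is considered small enough to broadcast.
    static final long BROADCAST_THRESHOLD_BYTES = 10L * 1024 * 1024;

    private ToyJoinChooser() {}

    static ToyJoinPlan choose(long firstBytes, long secondBytes, boolean sortedOnJoinKey) {
        if (firstBytes <= secondBytes && firstBytes < BROADCAST_THRESHOLD_BYTES) {
            // Small first input: broadcast it and build the hash table on it.
            return new ToyJoinPlan(ToyLocalStrategy.HASH_JOIN_BUILD_FIRST,
                    ToyShipStrategy.BROADCAST, ToyShipStrategy.FORWARD);
        }
        if (secondBytes < BROADCAST_THRESHOLD_BYTES) {
            // Small second input: broadcast it instead.
            return new ToyJoinPlan(ToyLocalStrategy.HASH_JOIN_BUILD_SECOND,
                    ToyShipStrategy.FORWARD, ToyShipStrategy.BROADCAST);
        }
        // Both inputs are large: co-partition them by key (a shuffle on both sides).
        ToyLocalStrategy local = sortedOnJoinKey
                ? ToyLocalStrategy.SORT_MERGE_JOIN
                : (firstBytes <= secondBytes
                        ? ToyLocalStrategy.HASH_JOIN_BUILD_FIRST
                        : ToyLocalStrategy.HASH_JOIN_BUILD_SECOND);
        return new ToyJoinPlan(local, ToyShipStrategy.HASH_PARTITION, ToyShipStrategy.HASH_PARTITION);
    }
}
```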

The optimization proceeds in three phases:

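1. Create a DAG representation of the program.
   - Using a GraphCreatingVisitor, traverse depth-first starting from each sink and create one node per operator; the nodes are stored in the visitor's con2node map.
   - Connect the nodes with channels.
   - Look up hints about local strategies and channel types, and set the types and strategies accordingly.
   - Using an IdAndEstimatesVisitor, run a DFS that estimates the data volume of the sources and propagates these estimates through the plan.
2. Using the BranchesVisitor and the InterestingPropertyVisitor, run DFS passes that compute the relevant properties and auxiliary data structures.
3. Generate the remaining parts of the plan:
   - PlanFinalizer
   - BinaryUnionReplacer
   - RangePartitionRewriter
   - postPasser.postPass(plan);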
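Phase 1 can be sketched in Java as a single memoized depth-first traversal from the sinks: an operator-to-node map plays the role of con2node, so an operator shared by several consumers yields exactly one node, and IDs and size estimates are computed in post-order once all inputs are done. This is a simplification: Flink performs these steps in separate visitors (GraphCreatingVisitor, then IdAndEstimatesVisitor), and the `ProgOp`/`OptNode`/`DagBuilder` classes and the per-operator `selectivity` factor are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical program operator: a source carries a record-count estimate,
// any other operator a selectivity (output records per input record).
final class ProgOp {
    final String name;
    final List<ProgOp> inputs;
    final long sourceRecords;
    final double selectivity;

    private ProgOp(String name, long sourceRecords, double selectivity, ProgOp... inputs) {
        this.name = name;
        this.sourceRecords = sourceRecords;
        this.selectivity = selectivity;
        this.inputs = Arrays.asList(inputs);
    }

    static ProgOp source(String name, long records) {
        return new ProgOp(name, records, 1.0);
    }

    static ProgOp op(String name, double selectivity, ProgOp... inputs) {
        return new ProgOp(name, 0L, selectivity, inputs);
    }
}

// Hypothetical optimizer node; the entries in 'inputs' play the role of channels.
final class OptNode {
    final ProgOp op;
    final List<OptNode> inputs = new ArrayList<>();
    int id;
    long estimatedRecords;

    OptNode(ProgOp op) {
        this.op = op;
    }
}

final class DagBuilder {
    // Same role as con2node: each operator maps to exactly one node.
    final Map<ProgOp, OptNode> con2node = new HashMap<>();
    private int nextId = 1;

    OptNode build(ProgOp op) {
        OptNode existing = con2node.get(op);
        if (existing != null) {
            return existing; // operator shared by several consumers: reuse its node
        }
        OptNode node = new OptNode(op);
        con2node.put(op, node);
        for (ProgOp in : op.inputs) {
            node.inputs.add(build(in)); // connect the node to its input via a "channel"
        }
        // Post-order: all inputs are finished, so the ID and estimate can be set now.
        node.id = nextId++;
        if (node.inputs.isEmpty()) {
            node.estimatedRecords = op.sourceRecords;
        } else {
            long total = 0;
            for (OptNode in : node.inputs) {
                total += in.estimatedRecords;
            }
            node.estimatedRecords = Math.round(total * op.selectivity);
        }
        return node;
    }
}
```

For a plan with several sinks, calling `build` on each sink with the same `DagBuilder` reuses nodes across sinks, because the map is shared.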

The Visitor interface:

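package org.apache.flink.util;

import org.apache.flink.annotation.Internal;

/**
 * A visitor encapsulates functionality that is applied to each node in the process of a traversal of a tree or DAG.
 */
@Internal
public interface Visitor<T extends Visitable<T>> {

    /**
     * Invoked on a node before any of its inputs are visited.
     *
     * @param visitable the node being visited
     * @return True, if the traversal should continue into this node's inputs, false otherwise.
     */
    boolean preVisit(T visitable);

    /**
     * Invoked after all inputs of the node have been visited.
     *
     * @param visitable the node being visited
     */
    void postVisit(T visitable);
}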
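The contract is easiest to see on a DAG in which one node is reachable along two paths. In this sketch, the hypothetical `DagVisitor`/`DagVisitable` interfaces mirror the shape of `Visitor` and `Visitable` so that the code compiles without Flink, and the visitor returns `false` from `preVisit` for nodes it has already seen, which prunes the second visit of the shared source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical mirrors of Visitable/Visitor, declared locally for the example.
interface DagVisitable<T extends DagVisitable<T>> {
    void accept(DagVisitor<T> visitor);
}

interface DagVisitor<T extends DagVisitable<T>> {
    boolean preVisit(T visitable);

    void postVisit(T visitable);
}

// Hypothetical DAG node with any number of inputs.
final class DagNode implements DagVisitable<DagNode> {
    final String name;
    final List<DagNode> inputs;

    DagNode(String name, DagNode... inputs) {
        this.name = name;
        this.inputs = Arrays.asList(inputs);
    }

    @Override
    public void accept(DagVisitor<DagNode> visitor) {
        if (visitor.preVisit(this)) {
            for (DagNode in : inputs) {
                in.accept(visitor); // depth-first into each input
            }
            visitor.postVisit(this);
        }
    }
}

// Returns false from preVisit for nodes already seen, so a node that is
// reachable via several paths is processed exactly once.
final class OncePerNodeVisitor implements DagVisitor<DagNode> {
    private final Set<DagNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    final List<String> postOrder = new ArrayList<>();

    @Override
    public boolean preVisit(DagNode node) {
        return seen.add(node); // false if the node was visited before
    }

    @Override
    public void postVisit(DagNode node) {
        postOrder.add(node.name);
    }
}
```

Without the seen-set, the shared source in a diamond-shaped DAG would be visited and post-visited twice, once per path.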

The accept method of DataSinkNode, from which the recursive depth-first traversal starts:

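    @Override
    public void accept(Visitor<OptimizerNode> visitor) {
        // preVisit decides whether this node and its inputs are traversed at all
        if (visitor.preVisit(this)) {
            if (getPredecessorNode() != null) {
                // recurse into the single input, depth-first towards the sources
                getPredecessorNode().accept(visitor);
            } else {
                // a sink must always have an input
                throw new CompilerException();
            }
            // called only after the whole upstream subgraph has been visited
            visitor.postVisit(this);
        }
    }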
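A small Java sketch of the resulting visiting order, assuming a linear chain source → map → sink and a hypothetical `ChainNode` class whose `accept` follows the same pattern as `DataSinkNode.accept` (with `IllegalStateException` standing in for `CompilerException`). The traversal enters at the sink, so `preVisit` sees sink, map, source, while `postVisit` sees source, map, sink.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface for the chain example (not Flink's Visitor).
interface ChainVisitor {
    boolean preVisit(ChainNode node);

    void postVisit(ChainNode node);
}

// Hypothetical node with at most one predecessor; accept() follows the
// same pattern as DataSinkNode.accept().
class ChainNode {
    final String name;
    final ChainNode predecessor;  // null for a source
    final boolean requiresInput;  // false only for sources

    ChainNode(String name, ChainNode predecessor, boolean requiresInput) {
        this.name = name;
        this.predecessor = predecessor;
        this.requiresInput = requiresInput;
    }

    void accept(ChainVisitor visitor) {
        if (visitor.preVisit(this)) {
            if (predecessor != null) {
                predecessor.accept(visitor); // depth-first towards the source
            } else if (requiresInput) {
                // stands in for the CompilerException thrown by DataSinkNode
                throw new IllegalStateException(name + " has no predecessor");
            }
            visitor.postVisit(this);
        }
    }
}

// Records the order in which preVisit and postVisit are called.
final class OrderRecorder implements ChainVisitor {
    final List<String> pre = new ArrayList<>();
    final List<String> post = new ArrayList<>();

    @Override
    public boolean preVisit(ChainNode node) {
        pre.add(node.name);
        return true;
    }

    @Override
    public void postVisit(ChainNode node) {
        post.add(node.name);
    }
}
```

This post-order suggests why input-dependent per-node results, such as size estimates, fit naturally into `postVisit`: by then every upstream node has already been processed.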