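Before Flink 1.10, job submission typically ran into the following problems:
- Job submission was handled by the Execution Environments, which were tightly coupled to the deployment target (YARN, Kubernetes, Mesos). As a result, the number of Execution Environments grew large, users had to maintain a lot of configuration for each target, and little code could be reused.
- Users had to maintain a separate flink-conf.yaml for each job and could not specify options dynamically through -D parameters, as is possible in Spark.
- Users could obtain information about a Flink job only through the REST API, which made it hard for downstream tools to integrate.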
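To make the second problem above concrete, here is a minimal sketch of what overriding a shared base configuration with -D style arguments could look like. It is not Flink's actual CLI code: the class `DynamicPropertiesSketch` and the helper `applyDynamicProperties` are hypothetical names introduced only for illustration, and the base configuration is modeled as a plain map rather than a parsed flink-conf.yaml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicPropertiesSketch {

    /**
     * Hypothetical helper: returns a copy of the base configuration with every
     * "-Dkey=value" argument applied on top. Arguments in any other form are ignored.
     */
    public static Map<String, String> applyDynamicProperties(Map<String, String> base, String[] args) {
        Map<String, String> effective = new LinkedHashMap<>(base);
        for (String arg : args) {
            if (!arg.startsWith("-D")) {
                continue; // not a dynamic property
            }
            int eq = arg.indexOf('=');
            if (eq <= 2) {
                continue; // missing '=' or empty key
            }
            effective.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return effective;
    }

    public static void main(String[] args) {
        // Stand-in for the values a shared flink-conf.yaml would provide.
        Map<String, String> base = new LinkedHashMap<>();
        base.put("parallelism.default", "1");
        base.put("taskmanager.numberOfTaskSlots", "2");

        // Per-job override supplied on the command line instead of a second config file.
        Map<String, String> effective =
                applyDynamicProperties(base, new String[] {"-Dparallelism.default=4"});
        System.out.println(effective);
    }
}
```

With this kind of layering, one base file can serve many jobs, and each job only states what differs.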
Flink 1.10 addresses these problems through the following three FLIPs.
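1. FLIP-73: a generic Executor interface
FLIP-73 gives the following formula:
Final number of Execution Environments = number of APIs (batch, streaming) × number of deployment targets (local, remote, collection, cli/context) + ε (optimizedPlan, previewPlan)
Taking StreamExecutionEnvironment, the environment for the streaming API, as an example, it has the following subclasses:
The execute() method of StreamExecutionEnvironment is declared abstract, so each subclass has to provide its own execute() implementation: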
public abstract JobExecutionResult execute(StreamGraph streamGraph) throws Exception;
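The pattern described above can be illustrated with a simplified model. This is only a sketch under stated assumptions, not Flink code: `SketchStreamGraph`, `SketchEnvironment`, `SketchLocalEnvironment`, and `SketchRemoteEnvironment` are hypothetical stand-ins, and execute() returns a string instead of a JobExecutionResult. It shows why every new deployment target required yet another environment subclass with its own submission logic.

```java
public class EnvironmentSketch {

    /** Hypothetical stand-in for Flink's StreamGraph; it only carries a job name. */
    public static final class SketchStreamGraph {
        public final String jobName;

        public SketchStreamGraph(String jobName) {
            this.jobName = jobName;
        }
    }

    /** Simplified model of the pre-1.10 design: submission is an abstract method on the environment. */
    public abstract static class SketchEnvironment {
        public abstract String execute(SketchStreamGraph graph) throws Exception;
    }

    /** Hypothetical local variant: pretends to run the job in the current JVM. */
    public static final class SketchLocalEnvironment extends SketchEnvironment {
        @Override
        public String execute(SketchStreamGraph graph) {
            return "local:" + graph.jobName;
        }
    }

    /** Hypothetical remote variant: pretends to submit the job to host:port. */
    public static final class SketchRemoteEnvironment extends SketchEnvironment {
        private final String host;
        private final int port;

        public SketchRemoteEnvironment(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String execute(SketchStreamGraph graph) {
            return "remote(" + host + ":" + port + "):" + graph.jobName;
        }
    }

    public static void main(String[] args) throws Exception {
        SketchStreamGraph graph = new SketchStreamGraph("wordcount");
        SketchEnvironment[] environments = {
            new SketchLocalEnvironment(),
            new SketchRemoteEnvironment("jobmanager", 8081)
        };
        // The same graph is submitted differently depending on the environment subclass.
        for (SketchEnvironment env : environments) {
            System.out.println(env.execute(graph));
        }
    }
}
```

Because the submission logic lives inside each subclass, supporting another API or another target multiplies the number of classes, which is the growth the FLIP-73 formula describes.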
In Flink 1.10, the job submission logic is abstracted into a generic Executor interface, PipelineExecutor (FLIP-73):
// PipelineExecutor.java
/**
 * The entity responsible for executing a {@link Pipeline}, i.e. a user job.
 */
@Internal
public interface PipelineExecutor {
    /**
     * Executes a {@link Pipeline} based on the provided configuration and returns a {@link JobClient} which allows to
     * interact with the job being executed, e.g. cancel it or take a savepoint.
     *
     * <p><b>ATTENTION:</b> The caller is responsible for managing the lifecycle of the returned {@link JobClient}. This
     * means that e.g. {@code close()} should be called explicitly at the call-site.
     *
     * @param pipeline the {@link Pipeline} to execute
     * @param configuration the {@link Configuration} with the required execution parameters
     * @return a {@link CompletableFuture} with the {@link JobClient} corresponding to the pipeline.
     */
    CompletableFuture<JobClient> execute(final Pipeline pipeline, final Configuration configuration)