Clients can submit jobs in the following main ways:
- Command line
- REST API
- SQL
- Python
- Scala
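As a rough illustration of the REST route, the sketch below only builds the request for running a jar that has already been uploaded; it does not send anything. It assumes the JobManager's REST endpoint `/jars/:jarid/run` with the JSON fields `entryClass`, `parallelism` and `programArgs`, which should be checked against the REST documentation of your Flink version. The address `http://localhost:8081`, the jar id and the entry class are hypothetical placeholders.

```python
import json


def build_run_request(base_url, jar_id, entry_class=None,
                      parallelism=None, program_args=None):
    """Build (method, url, body) for running an already-uploaded jar.

    Assumption: the REST API exposes POST /jars/:jarid/run and accepts
    the JSON fields used below. Nothing is sent over the network here.
    """
    body = {}
    if entry_class is not None:
        body["entryClass"] = entry_class
    if parallelism is not None:
        body["parallelism"] = parallelism
    if program_args is not None:
        body["programArgs"] = program_args
    url = f"{base_url.rstrip('/')}/jars/{jar_id}/run"
    return "POST", url, json.dumps(body, sort_keys=True)


# Hypothetical values, for illustration only.
method, url, body = build_run_request(
    "http://localhost:8081", "abc_job.jar",
    entry_class="com.example.WordCount", parallelism=2)
print(method, url, body)
```

The resulting triple can be handed to any HTTP client; keeping the request construction separate from the sending makes it easy to inspect what would be submitted.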
The JobManager supports three job submission modes:
- Application Mode: runs the cluster exclusively for one application. The job’s main method (or client) gets executed on the JobManager. Calling execute/executeAsync multiple times in an application is supported. This mode exists only from version 1.11 onward and is not covered in this article.
- Per-Job Mode: runs the cluster exclusively for one job. The job’s main method (or client) runs only prior to the cluster creation. Every job starts its own cluster.
- Session Mode: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers. The cluster has only one JobManager, and all jobs share the TaskManagers.
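The practical difference between the three modes is how many clusters end up running for a given workload. The toy function below is not Flink code; it is a minimal sketch that counts clusters under the descriptions above, with a workload modelled as a list of applications, each containing one or more jobs (one job per `execute` call). The function name and the job names are made up for illustration.

```python
def clusters_started(mode, applications):
    """Count the clusters needed to run `applications` in a given mode.

    `applications` is a list of lists: each inner list holds the jobs
    that a single application submits.
    """
    if mode == "session":
        # One long-running cluster is shared by every job.
        return 1
    if mode == "per-job":
        # Every job gets a cluster of its own.
        return sum(len(jobs) for jobs in applications)
    if mode == "application":
        # One cluster per application, however many jobs it submits.
        return len(applications)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical workload: one application with two jobs, one with a single job.
workload = [["etl-1", "etl-2"], ["report"]]
for mode in ("session", "per-job", "application"):
    print(mode, clusters_started(mode, workload))
```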
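Deployment
However the cluster is deployed, it consists of roughly these four kinds of processes. Note that master/yarn at the bottom of the figure only indicates that the process is deployed on the YARN AM (ApplicationMaster) node.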
The JobManager (JM) has three components:
The JobManager has a number of responsibilities related to coordinating the distributed execution of Flink Applications: it decides when to schedule the next task (or set of tasks), reacts to finished tasks or execution failures, coordinates checkpoints, and coordinates recovery on failures, among others. This process consists of three different components:
ResourceManager (note that this RM and the YARN ResourceManager are two different things)
The ResourceManager is responsible for resource de-/allocation and provisioning in a Flink cluster — it manages task slots, which are the unit of resource scheduling in a Flink cluster (see TaskManagers). Flink implements multiple ResourceManagers for different environments and resource providers such as YARN, Mesos, Kubernetes and standalone deployments. In a standalone setup, the ResourceManager can only distribute the slots of available TaskManagers and cannot start new TaskManagers on its own.
Dispatcher
The Dispatcher provides a REST interface to submit Flink applications for execution and starts a new JobMaster for each submitted job. It also runs the Flink WebUI to provide information about job executions.
JobMaster
A JobMaster is responsible for managing the execution of a single JobGraph. Multiple jobs can run simultaneously in a Flink cluster, each having its own JobMaster.
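To make the division of labour concrete, here is a deliberately simplified model of the three components. It is a sketch under stated assumptions, not Flink's implementation: all class and method names are invented, slots are plain integers, and a job that cannot get enough slots is simply marked as not running, whereas a real cluster would wait and eventually time out. The resource manager models the standalone case described above, where only the slots of existing TaskManagers can be handed out.

```python
class StandaloneResourceManager:
    """Hands out slots of a fixed set of TaskManagers; cannot start new ones."""

    def __init__(self, task_managers, slots_per_task_manager):
        self.free_slots = task_managers * slots_per_task_manager

    def allocate(self, slots):
        if slots > self.free_slots:
            return False  # standalone: no way to provision more TaskManagers
        self.free_slots -= slots
        return True


class JobMaster:
    """Manages the execution of exactly one job."""

    def __init__(self, job_name, slots_needed):
        self.job_name = job_name
        self.slots_needed = slots_needed
        self.running = False

    def start(self, resource_manager):
        # The JobMaster asks the ResourceManager for the slots it needs.
        self.running = resource_manager.allocate(self.slots_needed)
        return self.running


class Dispatcher:
    """Accepts submissions and starts one JobMaster per submitted job."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager
        self.job_masters = {}

    def submit(self, job_name, slots_needed):
        job_master = JobMaster(job_name, slots_needed)
        self.job_masters[job_name] = job_master
        job_master.start(self.resource_manager)
        return job_master


# Hypothetical cluster: 2 TaskManagers with 2 slots each.
dispatcher = Dispatcher(StandaloneResourceManager(2, 2))
first = dispatcher.submit("job-a", 3)
second = dispatcher.submit("job-b", 2)
print(first.running, second.running)
```

Each submission gets its own `JobMaster`, while the single `StandaloneResourceManager` is shared, which mirrors the one-per-job versus one-per-cluster split in the text.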
Depending on the Resource Provider, there are four deployment options:
Standalone: the most basic mode.
Kubernetes
YARN
Mesos
Flink Standalone
Kubernetes
https://zhuanlan.zhihu.com/p/108302052?utm_source=wechat_timeline
YARN
Or this truly excellent diagram
Mesos
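External dependencies
High-availability service
Its main purpose is to guard against JobManager crashes. Several standby JobManagers are kept, and when the active one crashes, a standby takes over quickly with the help of the high-availability service. The main providers are: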
- Zookeeper
- Kubernetes HA
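The failover idea can be sketched with a toy stand-in for the high-availability service. This is not how ZooKeeper or Kubernetes leader election actually works: the class name is invented, and picking the first remaining candidate is an assumption made only to keep the example deterministic.

```python
class HaService:
    """Toy stand-in for an HA service that tracks the leading JobManager."""

    def __init__(self, job_managers):
        self.candidates = list(job_managers)
        self.leader = self.candidates[0]

    def report_crash(self, name):
        """Remove a crashed JobManager and return the current leader."""
        self.candidates.remove(name)
        if name == self.leader:
            if not self.candidates:
                raise RuntimeError("no standby JobManager left")
            # Assumption: the first remaining standby becomes leader.
            self.leader = self.candidates[0]
        return self.leader


# Hypothetical setup: one active JobManager and two standbys.
ha = HaService(["jm-1", "jm-2", "jm-3"])
print(ha.report_crash("jm-1"))
```

A standby crashing leaves the leader unchanged; only the loss of the active JobManager triggers a handover.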
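Persistence service
Relies mainly on local or remote file systems of various kinds.
Resource provider service
Depends on the deployment method.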
Metrics Storage
Application-level data sources and sinks
External inputs and outputs, for example Kafka, Elasticsearch, and Cassandra.