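Preface:
To understand how Flink works in more depth, I am starting to study the Flink source code. This covers the Flink cluster startup flow and the job execution flow.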
RPC implementations of the major components:
Component | RPC implementation |
---|---|
HDFS | Netty |
HBase | Before HBase 2.x: NIO + ProtoBuf; HBase 2.x and later: Netty |
ZooKeeper | BIO (leader election) + NIO (3.4) + Netty (3.6) |
Spark | Spark 1.x: Akka; Spark 2.x: Netty |
Flink | Akka + Netty |
1.start-cluster.sh
As the figure shows, Flink's JobManager and TaskManager are both started through flink-daemon.sh.
The arguments passed to flink-daemon.sh are roughly:
JobManager: flink-daemon.sh start standalonesession
TaskManager: flink-daemon.sh start taskexecutor
Looking at flink-daemon.sh:
case $DAEMON in
    (taskexecutor)
        # Main class that starts the TaskManager: TaskManagerRunner
        CLASS_TO_RUN=org.apache.flink.runtime.taskexecutor.TaskManagerRunner
    ;;
    (zookeeper)
        CLASS_TO_RUN=org.apache.flink.runtime.zookeeper.FlinkZooKeeperQuorumPeer
    ;;
    (historyserver)
        CLASS_TO_RUN=org.apache.flink.runtime.webmonitor.history.HistoryServer
    ;;
    (standalonesession)
        # Main class that starts the JobManager: StandaloneSessionClusterEntrypoint
        CLASS_TO_RUN=org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint
    ;;
    (standalonejob)
        CLASS_TO_RUN=org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint
    ;;
    (*)
        echo "Unknown daemon '${DAEMON}'. $USAGE."
        exit 1
    ;;
esac
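Findings:
1. The JobManager's daemon name is standalonesession, and its main class is StandaloneSessionClusterEntrypoint
2. The TaskManager's daemon name is taskexecutor, and its main class is TaskManagerRunner
StandaloneSessionClusterEntrypoint and TaskManagerRunner are also the names of the processes that get started.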
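The mapping found above can be checked in isolation. The following is only a minimal sketch: `class_for_daemon` is a hypothetical helper that does not exist in Flink, and it copies just the two branches of the `case` statement that matter for a standalone cluster. It lets you look up a daemon name without starting a JVM.

```shell
# Hypothetical helper (not part of flink-daemon.sh): prints the main class
# that the script would assign to CLASS_TO_RUN for a given daemon name.
class_for_daemon() {
    case "$1" in
        (taskexecutor)
            echo "org.apache.flink.runtime.taskexecutor.TaskManagerRunner"
        ;;
        (standalonesession)
            echo "org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint"
        ;;
        (*)
            # Same fallback idea as the original script: report and fail.
            echo "Unknown daemon '$1'." >&2
            return 1
        ;;
    esac
}

class_for_daemon standalonesession   # JobManager entry point
class_for_daemon taskexecutor        # TaskManager entry point
```

The helper returns a non-zero status for any other name. This mirrors the `(*)` branch of the real script, which prints the usage message and exits.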
Both are launched with the following java command:
echo "Starting $DAEMON daemon on host $HOSTNAME."
$JAVA_RUN $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN} "${ARGS[@]}" > "$out" 200<&- 2>&1 < /dev/null &
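The redirections at the end of that launch line are what turn the JVM into a background daemon. The sketch below reproduces the same pattern with a stand-in command so it can be run without Flink. `fake_daemon` and the temporary file are assumptions made for illustration; the real script computes `$out` itself, and why it has file descriptor 200 open is not shown in the excerpt above.

```shell
# Hypothetical stand-in for the JVM: writes one line to stdout and one to stderr.
fake_daemon() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Hypothetical log location; flink-daemon.sh derives its own "$out" path.
out="$(mktemp)"

# Same redirection pattern as the launch line:
#   > "$out"      send stdout to the .out file
#   200<&-        close file descriptor 200 in the child, so it is not inherited
#   2>&1          send stderr to wherever stdout now points (the same file)
#   < /dev/null   detach stdin from the terminal
#   &             run in the background
fake_daemon > "$out" 200<&- 2>&1 < /dev/null &
pid=$!

# The real script records the background PID; here we just wait for the
# short-lived stand-in to finish and then show what landed in the file.
wait "$pid"
cat "$out"
```

The order matters: `2>&1` comes after `> "$out"`, so stderr ends up in the same file as stdout rather than on the terminal.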