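Flink Source Code Walkthrough
Startup Flow
This article uses SocketWindowWordCount, from the package org.apache.flink.streaming.examples.socket in the Flink source tree,
as an example to walk through the source code and trace how a Flink job is executed.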
// the hostname and port of the text server to connect to
final String hostname;
final int port;
try {
    final ParameterTool params = ParameterTool.fromArgs(args);
    hostname = params.has("hostname") ? params.get("hostname") : "localhost";
    port = params.getInt("port");
} catch (Exception e) {
    System.err.println("No port specified. Please run 'SocketWindowWordCount " +
        "--hostname <hostname> --port <port>', where hostname (localhost by default) " +
        "and port is the address of the text server");
    System.err.println("To start a simple text server, run 'netcat -l <port>' and " +
        "type the input text into the command line");
    return;
}
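The code above reads the hostname and port options from the command line. As the usage message states, they are the address of the text server that the job reads from over a socket, not the address of the Flink JobManager. hostname falls back to localhost when it is absent, while a missing port makes getInt throw, which prints the usage message and exits.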
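To make the parsing step concrete, here is a minimal JDK-only sketch of the same pattern: turn `--key value` pairs into a map, default the hostname, and fail when the port is missing. `ArgsSketch` and its methods are hypothetical names introduced for illustration. This is a simplified stand-in, not Flink's `ParameterTool` implementation (for example, it treats any token starting with `-` as a key, so negative numbers cannot be used as values).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the kind of parsing ParameterTool.fromArgs does. */
final class ArgsSketch {

    /** Parses "--key value" or "-key value" pairs into a map. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            String key;
            if (arg.startsWith("--")) {
                key = arg.substring(2);
            } else if (arg.startsWith("-")) {
                key = arg.substring(1);
            } else {
                throw new IllegalArgumentException("Expected a key starting with '-', got: " + arg);
            }
            if (key.isEmpty()) {
                throw new IllegalArgumentException("Empty key in argument: " + arg);
            }
            i++;
            if (i < args.length && !args[i].startsWith("-")) {
                params.put(key, args[i]); // the next token is this key's value
                i++;
            } else {
                params.put(key, ""); // flag without a value (simplification)
            }
        }
        return params;
    }

    /** Mirrors the example: the hostname is optional and defaults to localhost. */
    static String hostname(Map<String, String> params) {
        return params.containsKey("hostname") ? params.get("hostname") : "localhost";
    }

    /** Mirrors the example: the port is required, so a missing value is an error. */
    static int port(Map<String, String> params) {
        String value = params.get("port");
        if (value == null) {
            throw new IllegalArgumentException("No data for required key 'port'");
        }
        return Integer.parseInt(value); // NumberFormatException is an IllegalArgumentException
    }

    public static void main(String[] args) {
        try {
            Map<String, String> params = parse(args);
            System.out.println("hostname=" + hostname(params) + ", port=" + port(params));
        } catch (IllegalArgumentException e) {
            System.err.println("No port specified: " + e.getMessage());
        }
    }
}
```

Running it with `--port 9000` yields the default hostname, while running it without arguments takes the error branch, matching the control flow of the try/catch in the example.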
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
- The line above obtains the basic Flink runtime environment, which includes:
  - The parallelism of the job (when running locally, it defaults to the number of CPU cores)
  - ExecutionConfig
    - This includes the execution mode, ExecutionMode: PIPELINED, PIPELINED_FORCED, BATCH, BATCH_FORCED
  - The checkpointing mode, CheckpointingMode: EXACTLY_ONCE, AT_LEAST_ONCE
  - The state backend that stores runtime state, identified by the names defined in AbstractStateBackend: MEMORY_STATE_BACKEND_NAME = "jobmanager", FS_STATE_BACKEND_NAME = "filesystem", ROCKSDB_STATE_BACKEND_NAME = "rocksdb"
DataStream<String> text = env.socketTextStream(hostname, port, "\n");
This obtains a socketTextStream from env: a stream that reads text data from a socket, split into records at the "\n" delimiter.
env.socketTextStream returns a DataStreamSource object.
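As a rough illustration of what reading text from a socket "as a stream" involves, the sketch below buffers incoming characters and emits one record each time the delimiter appears. It uses only the JDK. `SocketTextSketch` and `readRecords` are hypothetical names, and this is not Flink's actual source implementation. With a real socket, the reader would be an `InputStreamReader` wrapped around `socket.getInputStream()`; the demo uses a `StringReader` so it runs without a server.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of delimiter-based record splitting over a character stream. */
final class SocketTextSketch {

    /** Reads until end of stream and returns the delimiter-separated records. */
    static List<String> readRecords(Reader reader, String delimiter) throws IOException {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        char[] chunk = new char[8192];
        int read;
        while ((read = reader.read(chunk)) != -1) {
            buffer.append(chunk, 0, read);
            int end;
            // Emit every complete record currently sitting in the buffer.
            while ((end = buffer.indexOf(delimiter)) != -1) {
                records.add(buffer.substring(0, end));
                buffer.delete(0, end + delimiter.length());
            }
        }
        // Whatever remains after the stream closes is emitted as a last record.
        if (buffer.length() > 0) {
            records.add(buffer.toString());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Stands in for the bytes a text server such as netcat would send.
        Reader input = new StringReader("hello flink\nhello world\n");
        System.out.println(readRecords(input, "\n")); // prints [hello flink, hello world]
    }
}
```

A real streaming source would hand each record downstream as soon as it is complete instead of collecting them into a list. The list is used here only to keep the sketch easy to check.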
The following declarations show how DataStream and DataStreamSource are related:
public class SingleOutputStreamOperator<T> extends DataStream<T> {
public class DataStreamSource<T> extends SingleOutputStreamOperator<T> {
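The inheritance chain explains why the result of env.socketTextStream can be assigned to a plain DataStream variable. The sketch below reproduces only that chain with JDK-only classes. The nested class names mirror Flink's, but the fields, constructors, and the `socketTextStream` helper are hypothetical simplifications, not Flink's real members.

```java
/** Hypothetical, stripped-down model of the DataStream class hierarchy. */
final class HierarchySketch {

    /** Base type: stands for a stream of elements of type T. */
    static class DataStream<T> {
        private final String name; // illustrative field, not Flink's
        DataStream(String name) { this.name = name; }
        String getName() { return name; }
    }

    /** A stream produced by a single operator; adds operator-level settings. */
    static class SingleOutputStreamOperator<T> extends DataStream<T> {
        private int parallelism = 1;
        SingleOutputStreamOperator(String name) { super(name); }
        SingleOutputStreamOperator<T> setParallelism(int parallelism) {
            this.parallelism = parallelism;
            return this;
        }
        int getParallelism() { return parallelism; }
    }

    /** The starting point of a stream, as returned by a source method. */
    static class DataStreamSource<T> extends SingleOutputStreamOperator<T> {
        DataStreamSource(String name) { super(name); }
    }

    /** Hypothetical helper standing in for env.socketTextStream. */
    static DataStreamSource<String> socketTextStream(String hostname, int port) {
        return new DataStreamSource<>("Socket Stream " + hostname + ":" + port);
    }

    public static void main(String[] args) {
        // Upcast: a DataStreamSource<String> is also a DataStream<String>.
        DataStream<String> text = socketTextStream("localhost", 9000);
        System.out.println(text.getName());
    }
}
```

Because of the upcast, later transformations only need the DataStream API, while source-specific and operator-specific settings stay available on the more specific types.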
DataStream<WordWithCount> windowCounts = text
.flatMap(new FlatMapFunction