Flink Source Code Analysis --- Startup Flow

Flink Source Code Walkthrough

Startup Flow

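This article uses SocketWindowWordCount (package org.apache.flink.streaming.examples.socket in the Flink source tree) as an example, walking through the Flink source code to analyze how a Flink job is executed.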

// Excerpt from SocketWindowWordCount.main(String[] args):
// read the address of the text server from the command line
final String hostname;
final int port;
try {
    final ParameterTool params = ParameterTool.fromArgs(args);
    // hostname is optional and defaults to "localhost"
    hostname = params.has("hostname") ? params.get("hostname") : "localhost";
    // port is required; a missing --port throws and lands in the catch block
    port = params.getInt("port");
} catch (Exception e) {
    System.err.println("No port specified. Please run 'SocketWindowWordCount " +
        "--hostname <hostname> --port <port>', where hostname (localhost by default) " +
        "and port is the address of the text server");
    System.err.println("To start a simple text server, run 'netcat -l <port>' and " +
        "type the input text into the command line");
    return;
}

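The code above reads the hostname and port of the text server (the socket the job will read its input from, not the Flink JobManager) from the command-line arguments. If --hostname is omitted it defaults to localhost; if --port is missing, the job prints a usage message and exits.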
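To make the argument handling concrete, here is a minimal plain-Java sketch of the same logic. ArgsSketch is a hypothetical stand-in, not Flink's ParameterTool: it only understands "--key value" and "-key value" pairs, whereas the real class supports more input forms. As in the example, a missing --port surfaces as an exception, which the example's catch block turns into the usage message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ParameterTool.fromArgs; NOT the Flink class.
final class ArgsSketch {
    private final Map<String, String> values = new HashMap<>();

    // Collects "--key value" / "-key value" pairs from the raw argument array.
    static ArgsSketch fromArgs(String[] args) {
        ArgsSketch result = new ArgsSketch();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("-")) {
                throw new IllegalArgumentException("Expected a key, got: " + arg);
            }
            String key = arg.startsWith("--") ? arg.substring(2) : arg.substring(1);
            // A key not followed by a value is recorded with a null value.
            if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                result.values.put(key, args[++i]);
            } else {
                result.values.put(key, null);
            }
        }
        return result;
    }

    boolean has(String key) { return values.containsKey(key); }

    String get(String key) { return values.get(key); }

    // Throws when the key has no value, like a missing required --port.
    int getInt(String key) {
        String value = values.get(key);
        if (value == null) {
            throw new IllegalStateException("No value for required key '" + key + "'");
        }
        return Integer.parseInt(value);
    }

    // Same decision logic as the example: hostname optional, port required.
    static String describe(String[] args) {
        ArgsSketch params = fromArgs(args);
        String hostname = params.has("hostname") ? params.get("hostname") : "localhost";
        int port = params.getInt("port");
        return hostname + ":" + port;
    }
}
```

For example, ArgsSketch.describe(new String[] {"--port", "9000"}) returns "localhost:9000", because hostname falls back to localhost, while omitting --port throws IllegalStateException.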

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

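  • The code above obtains the basic runtime environment of the Flink job, which includes:
    • the job's parallelism (when running locally, this defaults to the number of CPU cores)
    • the ExecutionConfig, which among other things holds the execution mode, ExecutionMode: PIPELINED, PIPELINED_FORCED, BATCH, BATCH_FORCED
    • the checkpointing mode, CheckpointingMode (set through the checkpoint configuration): EXACTLY_ONCE, AT_LEAST_ONCE
    • the state backend that stores the job's runtime state (AbstractStateBackend), identified by name: MEMORY_STATE_BACKEND_NAME = "jobmanager", FS_STATE_BACKEND_NAME = "filesystem", ROCKSDB_STATE_BACKEND_NAME = "rocksdb"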
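The settings listed above can be pictured as a small configuration object. The sketch below is a hypothetical, Flink-free model: the enum constant names and the three backend names are copied from the list, while the default values (parallelism equal to the number of available processors, PIPELINED, EXACTLY_ONCE) are assumptions about Flink's local defaults rather than code taken from Flink.

```java
import java.util.Locale;

// Hypothetical model of the environment settings above; these are NOT Flink classes.
final class EnvSketch {
    // Constant names copied from Flink's ExecutionMode and CheckpointingMode.
    enum ExecutionMode { PIPELINED, PIPELINED_FORCED, BATCH, BATCH_FORCED }
    enum CheckpointingMode { EXACTLY_ONCE, AT_LEAST_ONCE }
    enum StateBackendKind { MEMORY, FILESYSTEM, ROCKSDB }

    // Assumed local default: one parallel instance per available CPU core.
    int parallelism = Runtime.getRuntime().availableProcessors();
    // Assumed defaults; a real job would change these via the Flink API.
    ExecutionMode executionMode = ExecutionMode.PIPELINED;
    CheckpointingMode checkpointingMode = CheckpointingMode.EXACTLY_ONCE;

    // Resolves the short backend names from the list, ignoring case.
    static StateBackendKind backendFromName(String name) {
        switch (name.toLowerCase(Locale.ROOT)) {
            case "jobmanager": return StateBackendKind.MEMORY;
            case "filesystem": return StateBackendKind.FILESYSTEM;
            case "rocksdb":    return StateBackendKind.ROCKSDB;
            default: throw new IllegalArgumentException("Unknown state backend: " + name);
        }
    }
}
```

Keeping the backend choice as a name-to-kind lookup mirrors why the list quotes string constants: the backend can be picked from configuration text instead of code.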

DataStream<String> text = env.socketTextStream(hostname, port, "\n");
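This obtains a socketTextStream from env. socketTextStream reads text data from the socket as a stream and splits it into records at the given delimiter ("\n" here, i.e. one record per line).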
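Conceptually, a socket text source keeps reading characters and cuts them into records at the delimiter, holding back any incomplete record until more data arrives. The sketch below models that behaviour on a java.io.Reader. DelimiterSplitSketch is hypothetical and is not Flink's source function; stripping a trailing "\r" when the delimiter is "\n" and emitting leftover data at end of stream are choices made for this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of delimiter-based record splitting; NOT Flink code.
final class DelimiterSplitSketch {
    static List<String> readRecords(Reader in, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        StringBuilder buffer = new StringBuilder(); // holds a partial record between reads
        char[] chunk = new char[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.append(chunk, 0, n);
                int delimPos;
                // Emit every complete record currently in the buffer.
                while ((delimPos = buffer.indexOf(delimiter)) != -1) {
                    String record = buffer.substring(0, delimPos);
                    // Tolerate Windows line endings when splitting on "\n".
                    if (delimiter.equals("\n") && record.endsWith("\r")) {
                        record = record.substring(0, record.length() - 1);
                    }
                    records.add(record);
                    buffer.delete(0, delimPos + delimiter.length());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Data left after end of stream becomes the final record.
        if (buffer.length() > 0) {
            records.add(buffer.toString());
        }
        return records;
    }
}
```

Reading "hello world\r\nfoo\nbar" with delimiter "\n" yields the three records hello world, foo and bar.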

env.socketTextStream returns a DataStreamSource object (more precisely, a DataStreamSource<String>).

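How DataStream and DataStreamSource relate:
public class SingleOutputStreamOperator<T> extends DataStream<T> {
public class DataStreamSource<T> extends SingleOutputStreamOperator<T> {
So a DataStreamSource is a SingleOutputStreamOperator, which is in turn a DataStream; this is why the result of socketTextStream can be assigned to a DataStream variable.
(Figure: DataStream inheritance hierarchy)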
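The inheritance chain can be reproduced with three tiny hypothetical classes (the Sketch suffix marks them as stand-ins; the real Flink classes carry an environment, a transformation and many more methods). Because each class extends the previous one, an instance of the source type can be held in a variable of the base type, which is what assigning env.socketTextStream(...) to a DataStream variable relies on.

```java
// Hypothetical stand-ins mirroring only the inheritance chain; NOT Flink classes.
class DataStreamSketch<T> {
    String describe() { return "DataStream"; }
}

class SingleOutputStreamOperatorSketch<T> extends DataStreamSketch<T> {
    @Override
    String describe() { return "SingleOutputStreamOperator -> " + super.describe(); }
}

class DataStreamSourceSketch<T> extends SingleOutputStreamOperatorSketch<T> {
    @Override
    String describe() { return "DataStreamSource -> " + super.describe(); }
}

final class HierarchyDemo {
    public static void main(String[] args) {
        // Upcast: a source is usable wherever the base stream type is expected.
        DataStreamSketch<String> text = new DataStreamSourceSketch<>();
        // Dynamic dispatch walks the whole chain via super.describe().
        System.out.println(text.describe());
    }
}
```

Running HierarchyDemo prints "DataStreamSource -> SingleOutputStreamOperator -> DataStream". In Flink, transformations such as flatMap also return a SingleOutputStreamOperator, which is where per-operator settings like parallelism are applied.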

DataStream<WordWithCount> windowCounts = text
.flatMap(new FlatMapFunction<String, WordWithCount>
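The excerpt breaks off at the flatMap call. As a hedged illustration of what a word-count job computes, the sketch below models the flatMap step (split a line on whitespace and emit (word, 1)) and a keyed summing reduce in plain Java. WordCountSketch and its nested WordWithCount are hypothetical stand-ins modelled on the example's WordWithCount type; windowing is simplified to "one batch of lines", and skipping empty tokens is a choice made here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the per-record word-count logic; no Flink types are used.
final class WordCountSketch {
    // Hypothetical POJO modelled on the example's WordWithCount (word + count).
    static final class WordWithCount {
        final String word;
        final long count;
        WordWithCount(String word, long count) { this.word = word; this.count = count; }
        @Override public String toString() { return word + " : " + count; }
    }

    // flatMap step: one input line -> zero or more (word, 1) records.
    static List<WordWithCount> flatMap(String line) {
        List<WordWithCount> out = new ArrayList<>();
        for (String word : line.split("\\s")) {
            if (!word.isEmpty()) { // skip empty tokens produced by repeated whitespace
                out.add(new WordWithCount(word, 1L));
            }
        }
        return out;
    }

    // keyBy(word) + reduce(sum) over one batch of lines, standing in for one window.
    static Map<String, Long> countWindow(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (WordWithCount wc : flatMap(line)) {
                counts.merge(wc.word, wc.count, Long::sum);
            }
        }
        return counts;
    }
}
```

For the lines "a b a" and "b  c", countWindow returns a=2, b=2, c=1.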
