Please credit the source when reposting: SeaTunnel 2.1.1 Source Code Analysis, from Adobee Chen's blog on CSDN.
Part 1: Startup script analysis

The entry script is /bin/start-seatunnel-flink.sh:
#!/bin/bash

function usage() {
  echo "Usage: start-seatunnel-flink.sh [options]"
  echo "  options:"
  echo "    --config, -c FILE_PATH        Config file"
  echo "    --variable, -i PROP=VALUE     Variable substitution, such as -i city=beijing, or -i date=20190318"
  echo "    --check, -t                   Check config"
  echo "    --help, -h                    Show this help message"
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]] || [[ $# -le 1 ]]; then
  usage
  exit 0
fi

is_exist() {
  if [ -z $1 ]; then
    usage
    exit -1
  fi
}

PARAMS=""
while (( "$#" )); do
  case "$1" in
    -c|--config)
      CONFIG_FILE=$2
      is_exist ${CONFIG_FILE}
      shift 2
      ;;

    -i|--variable)
      variable=$2
      is_exist ${variable}
      java_property_value="-D${variable}"
      variables_substitution="${java_property_value} ${variables_substitution}"
      shift 2
      ;;

    *) # preserve positional arguments
      PARAMS="$PARAMS $1"
      shift
      ;;
  esac
done

if [ -z ${CONFIG_FILE} ]; then
  echo "Error: The following option is required: [-c | --config]"
  usage
  exit -1
fi

# set positional arguments in their proper place
eval set -- "$PARAMS"

BIN_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
APP_DIR=$(dirname ${BIN_DIR})
CONF_DIR=${APP_DIR}/config
PLUGINS_DIR=${APP_DIR}/lib
DEFAULT_CONFIG=${CONF_DIR}/application.conf
CONFIG_FILE=${CONFIG_FILE:-$DEFAULT_CONFIG}

assemblyJarName=$(find ${PLUGINS_DIR} -name seatunnel-core-flink*.jar)

if [ -f "${CONF_DIR}/seatunnel-env.sh" ]; then
  source ${CONF_DIR}/seatunnel-env.sh
fi

string_trim() {
  echo $1 | awk '{$1=$1;print}'
}

export JVM_ARGS=$(string_trim "${variables_substitution}")

exec ${FLINK_HOME}/bin/flink run \
  ${PARAMS} \
  -c org.apache.seatunnel.SeatunnelFlink \
  ${assemblyJarName} --config ${CONFIG_FILE}
The startup script accepts --config, --variable, --check (not yet supported), and --help.
Any argument that is not --config or --variable is accumulated in PARAMS; at the end the script calls flink run and passes PARAMS straight through as Flink arguments (for example, an extra -m yarn-cluster would reach flink run untouched).
The class org.apache.seatunnel.SeatunnelFlink is the main entry point.
Part 2: Source code analysis
1. The entry point
public class SeatunnelFlink {
    public static void main(String[] args) throws Exception {
        FlinkCommandArgs flinkArgs = CommandLineUtils.parseFlinkArgs(args);
        Seatunnel.run(flinkArgs);
    }
}
Command-line arguments are parsed into a FlinkCommandArgs object:
public static FlinkCommandArgs parseFlinkArgs(String[] args) {
    FlinkCommandArgs flinkCommandArgs = new FlinkCommandArgs();
    JCommander.newBuilder()
        .addObject(flinkCommandArgs)
        .build()
        .parse(args);
    return flinkCommandArgs;
}
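FlinkCommandArgs itself is just a JCommander-annotated bean, which is why the builder call above is all the parsing that is needed. The sketch below shows the general shape of such a bean; the exact field names and defaults in SeaTunnel's FlinkCommandArgs may differ, this is only an illustration of how -c/--config, -i/--variable and -t/--check map onto fields:

import com.beust.jcommander.JCommander;
import com.beust.jcommander.Parameter;

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a JCommander args bean; not the exact SeaTunnel class.
public class FlinkArgsSketch {

    @Parameter(names = {"-c", "--config"}, description = "config file", required = true)
    private String configFile;

    @Parameter(names = {"-i", "--variable"}, description = "variable substitution")
    private List<String> variables = new ArrayList<>();

    @Parameter(names = {"-t", "--check"}, description = "check config")
    private boolean checkConfig = false;

    @Parameter(names = {"-h", "--help"}, help = true)
    private boolean help = false;

    public static void main(String[] args) {
        FlinkArgsSketch parsed = new FlinkArgsSketch();
        JCommander.newBuilder().addObject(parsed).build().parse(args);
        System.out.println("config file: " + parsed.configFile);
    }
}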
Now step into Seatunnel.run(flinkArgs):
public static <T extends CommandArgs> void run(T commandArgs) {
    if (!Common.setDeployMode(commandArgs.getDeployMode().getName())) {
        throw new IllegalArgumentException(
            String.format("Deploy mode: %s is Illegal", commandArgs.getDeployMode()));
    }
    try {
        Command<T> command = CommandFactory.createCommand(commandArgs);
        command.execute(commandArgs);
    } catch (ConfigRuntimeException e) {
        showConfigError(e);
        throw e;
    } catch (Exception e) {
        showFatalError(e);
        throw e;
    }
}
Next, CommandFactory.createCommand(commandArgs) picks a Command implementation based on the engine type; here we follow the Flink branch.
public static <T extends CommandArgs> Command<T> createCommand(T commandArgs) {
    switch (commandArgs.getEngineType()) {
        case FLINK:
            return (Command<T>) new FlinkCommandBuilder().buildCommand((FlinkCommandArgs) commandArgs);
        case SPARK:
            return (Command<T>) new SparkCommandBuilder().buildCommand((SparkCommandArgs) commandArgs);
        default:
            throw new RuntimeException(String.format("engine type: %s is not supported", commandArgs.getEngineType()));
    }
}
buildCommand then returns a different implementation depending on whether the config is only being validated:
public Command<FlinkCommandArgs> buildCommand(FlinkCommandArgs commandArgs) {
    return commandArgs.isCheckConfig() ? new FlinkConfValidateCommand() : new FlinkTaskExecuteCommand();
}
FlinkConfValidateCommand and FlinkTaskExecuteCommand both implement the Command interface, and each exposes a single execute() method:
public class FlinkConfValidateCommand implements Command<FlinkCommandArgs>
public class FlinkTaskExecuteCommand extends BaseTaskExecuteCommand<FlinkCommandArgs, FlinkEnvironment>
Back in Seatunnel.run(flinkArgs), command.execute(commandArgs) is invoked.
Let's look at the execute() method of FlinkTaskExecuteCommand first.
2. The execute() core method
public void execute(FlinkCommandArgs flinkCommandArgs) {
    // engine type: FLINK
    EngineType engine = flinkCommandArgs.getEngineType();
    // the --config file path
    String configFile = flinkCommandArgs.getConfigFile();
    // parse the config file into a Config object
    Config config = new ConfigBuilder<>(configFile, engine).getConfig();
    // build the execution context
    ExecutionContext<FlinkEnvironment> executionContext = new ExecutionContext<>(config, engine);
    // parse the source block
    List<BaseSource<FlinkEnvironment>> sources = executionContext.getSources();
    // parse the transform block
    List<BaseTransform<FlinkEnvironment>> transforms = executionContext.getTransforms();
    // parse the sink block
    List<BaseSink<FlinkEnvironment>> sinks = executionContext.getSinks();
    baseCheckConfig(sinks, transforms, sinks);
    showAsciiLogo();
    try (Execution<BaseSource<FlinkEnvironment>,
        BaseTransform<FlinkEnvironment>,
        BaseSink<FlinkEnvironment>,
        FlinkEnvironment> execution = new ExecutionFactory<>(executionContext).createExecution()) {
        // prepare all plugins
        prepare(executionContext.getEnvironment(), sources, transforms, sinks);
        // start the job
        execution.start(sources, transforms, sinks);
        // close all plugins
        close(sources, transforms, sinks);
    } catch (Exception e) {
        throw new RuntimeException("Execute Flink task error", e);
    }
}
1. BaseSource, BaseTransform, and BaseSink are all interfaces, and all of them extend the Plugin interface; their implementation classes are the concrete plugin types. If our source and sink are both Kafka, the source is KafkaTableStream and the sink is KafkaSink. (A simplified model of the plugin contract is sketched below.)
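To make that contract concrete, here is a minimal, self-contained model of the plugin lifecycle (the type names MiniEnv, MiniPlugin and KafkaSourceLike are made up for this demo and are not SeaTunnel classes): every plugin gets its own config block, is prepared against the engine environment, and is closed when the job ends.

import java.util.List;
import java.util.Map;

// Made-up stand-ins for the SeaTunnel contract: setConfig -> prepare -> (run) -> close.
interface MiniEnv {
}

interface MiniPlugin<E extends MiniEnv> extends AutoCloseable {
    void setConfig(Map<String, Object> config); // the plugin's own block from the config file
    void prepare(E env);                        // read options, initialize state against the engine env
    @Override
    default void close() throws Exception {     // release resources when the job finishes
    }
}

class KafkaSourceLike implements MiniPlugin<MiniEnv> {
    private String topic;

    @Override
    public void setConfig(Map<String, Object> config) {
        this.topic = (String) config.get("topics");
    }

    @Override
    public void prepare(MiniEnv env) {
        System.out.println("preparing source for topic " + topic);
    }
}

public class PluginLifecycleDemo {
    public static void main(String[] args) throws Exception {
        MiniEnv env = new MiniEnv() {
        };
        List<MiniPlugin<MiniEnv>> plugins = List.of(new KafkaSourceLike());
        for (MiniPlugin<MiniEnv> plugin : plugins) {
            plugin.setConfig(Map.of("topics", "demo_topic"));
            plugin.prepare(env);
        }
        for (MiniPlugin<MiniEnv> plugin : plugins) {
            plugin.close();
        }
    }
}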
2. Continuing down execute(), an execution (and its environment) is created for the job. Step into createExecution() in ExecutionFactory:
public Execution<BaseSource<ENVIRONMENT>, BaseTransform<ENVIRONMENT>, BaseSink<ENVIRONMENT>, ENVIRONMENT> createExecution() {
    Execution execution = null;
    switch (executionContext.getEngine()) {
        case SPARK:
            SparkEnvironment sparkEnvironment = (SparkEnvironment) executionContext.getEnvironment();
            switch (executionContext.getJobMode()) {
                case STREAMING:
                    execution = new SparkStreamingExecution(sparkEnvironment);
                    break;
                case STRUCTURED_STREAMING:
                    execution = new StructuredStreamingExecution(sparkEnvironment);
                    break;
                default:
                    execution = new SparkBatchExecution(sparkEnvironment);
            }
            break;
        case FLINK:
            FlinkEnvironment flinkEnvironment = (FlinkEnvironment) executionContext.getEnvironment();
            switch (executionContext.getJobMode()) {
                case STREAMING:
                    execution = new FlinkStreamExecution(flinkEnvironment);
                    break;
                default:
                    execution = new FlinkBatchExecution(flinkEnvironment);
            }
            break;
        default:
            throw new IllegalArgumentException("No suitable engine");
    }
    LOGGER.info("current execution is [{}]", execution.getClass().getName());
    return (Execution<BaseSource<ENVIRONMENT>, BaseTransform<ENVIRONMENT>, BaseSink<ENVIRONMENT>, ENVIRONMENT>) execution;
}
Step into FlinkStreamExecution: its constructor simply stores the FlinkEnvironment, and it is that FlinkEnvironment which holds the actual Flink execution environment (a rough sketch of it follows the constructor below).
private final FlinkEnvironment flinkEnvironment;

public FlinkStreamExecution(FlinkEnvironment streamEnvironment) {
    this.flinkEnvironment = streamEnvironment;
}
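FlinkEnvironment itself is not shown in this post, but it is the wrapper that actually builds the Flink environments the plugins run against. Roughly, it behaves like the sketch below; the field and method names here only illustrate the idea, check FlinkEnvironment in the source for the real implementation:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

// Illustrative wrapper: roughly what FlinkEnvironment hands to sources/transforms/sinks.
public class FlinkEnvSketch {

    private StreamExecutionEnvironment environment;
    private StreamTableEnvironment tableEnvironment;
    private final String jobName = "seatunnel";

    public void prepare() {
        // create the underlying Flink streaming environment once per job
        environment = StreamExecutionEnvironment.getExecutionEnvironment();
        tableEnvironment = StreamTableEnvironment.create(environment);
    }

    public StreamExecutionEnvironment getStreamExecutionEnvironment() {
        return environment;
    }

    public StreamTableEnvironment getStreamTableEnvironment() {
        return tableEnvironment;
    }

    public String getJobName() {
        return jobName;
    }
}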
3. Calling plugin.prepare(env)
protected final void prepare(E env, List<? extends Plugin<E>>... plugins) {
    for (List<? extends Plugin<E>> pluginList : plugins) {
        pluginList.forEach(plugin -> plugin.prepare(env));
    }
}
For example, in a Kafka-to-Kafka job:
KafkaTableStream's prepare():
public void prepare(FlinkEnvironment env) {
    topic = config.getString(TOPICS);
    PropertiesUtil.setProperties(config, kafkaParams, consumerPrefix, false);
    tableName = config.getString(RESULT_TABLE_NAME);
    if (config.hasPath(ROWTIME_FIELD)) {
        rowTimeField = config.getString(ROWTIME_FIELD);
        if (config.hasPath(WATERMARK_VAL)) {
            watermark = config.getLong(WATERMARK_VAL);
        }
    }
    String schemaContent = config.getString(SCHEMA);
    format = FormatType.from(config.getString(SOURCE_FORMAT).trim().toLowerCase());
    schemaInfo = JSONObject.parse(schemaContent, Feature.OrderedField);
}
KafkaSink's prepare():
public void prepare(FlinkEnvironment env) {
    topic = config.getString("topics");
    if (config.hasPath("semantic")) {
        semantic = config.getString("semantic");
    }
    String producerPrefix = "producer.";
    PropertiesUtil.setProperties(config, kafkaParams, producerPrefix, false);
    kafkaParams.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    kafkaParams.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
}
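Note that prepare() only collects the producer Properties; the actual write happens later in outputStream(), which wires those Properties into Flink's Kafka producer. The snippet below is only a rough sketch of that wiring using the standard Flink Kafka connector, not the real KafkaSink code (in particular, the real sink serializes Row records rather than Strings):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

// Sketch only: roughly how the collected kafkaParams end up in a Flink Kafka producer.
public class KafkaSinkSketch {

    public static void attach(DataStream<String> stream, String topic, Properties kafkaParams) {
        FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(
                topic,
                new SimpleStringSchema(),   // placeholder serializer, not what KafkaSink actually uses
                kafkaParams);
        stream.addSink(producer);
    }
}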
4. Finally the job is started with execution.start(sources, transforms, sinks);
From step 2 we know that the execution is created per engine and job mode; for our streaming Kafka job it is FlinkStreamExecution, so we look for the start() method there.
5. Running the Flink program
Here source.getData() resolves to KafkaTableStream's getData() method, and the sink side ends up in KafkaSink's outputStream() method.
public void start(List<FlinkStreamSource> sources, List<FlinkStreamTransform> transforms, List<FlinkStreamSink> sinks) throws Exception {
    List<DataStream<Row>> data = new ArrayList<>();
    for (FlinkStreamSource source : sources) {
        DataStream<Row> dataStream = source.getData(flinkEnvironment);
        data.add(dataStream);
        registerResultTable(source, dataStream);
    }
    DataStream<Row> input = data.get(0);
    for (FlinkStreamTransform transform : transforms) {
        DataStream<Row> stream = fromSourceTable(transform.getConfig()).orElse(input);
        input = transform.processStream(flinkEnvironment, stream);
        registerResultTable(transform, input);
        transform.registerFunction(flinkEnvironment);
    }
    for (FlinkStreamSink sink : sinks) {
        DataStream<Row> stream = fromSourceTable(sink.getConfig()).orElse(input);
        sink.outputStream(flinkEnvironment, stream);
    }
    try {
        LOGGER.info("Flink Execution Plan:{}", flinkEnvironment.getStreamExecutionEnvironment().getExecutionPlan());
        flinkEnvironment.getStreamExecutionEnvironment().execute(flinkEnvironment.getJobName());
    } catch (Exception e) {
        LOGGER.warn("Flink with job name [{}] execute failed", flinkEnvironment.getJobName());
        throw e;
    }
}
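registerResultTable() and fromSourceTable() used above are the glue between streams and the table registry: each plugin's output is registered as a table under its result_table_name, and a downstream plugin whose config sets source_table_name looks that table up again instead of taking the previous stream. The sketch below shows that register/lookup idea with plain Flink table-bridge calls; it is not the exact SeaTunnel helper code, and the table name is made up:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

// Sketch of the idea behind registerResultTable()/fromSourceTable().
public class TableRegistrySketch {

    public static DataStream<Row> roundTrip(StreamTableEnvironment tEnv, DataStream<Row> upstream) {
        // register the upstream plugin's output under its result_table_name
        tEnv.createTemporaryView("my_result_table", upstream);

        // a later plugin declaring source_table_name = "my_result_table" reads it back
        Table table = tEnv.from("my_result_table");
        return tEnv.toAppendStream(table, Row.class);
    }
}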
6. Finally, closing the plugins
Each plugin is closed through try-with-resources; any close failure is collected as a suppressed exception on a single PluginClosedException, which is rethrown once all plugins have been visited. (A small illustration of this idiom follows the method.)
protected final void close(List<? extends Plugin<E>>... plugins) {
    PluginClosedException exceptionHolder = null;
    for (List<? extends Plugin<E>> pluginList : plugins) {
        for (Plugin<E> plugin : pluginList) {
            try (Plugin<?> closed = plugin) {
                // ignore
            } catch (Exception e) {
                exceptionHolder = exceptionHolder == null ?
                    new PluginClosedException("below plugins closed error:") : exceptionHolder;
                exceptionHolder.addSuppressed(new PluginClosedException(
                    String.format("plugin %s closed error", plugin.getClass()), e));
            }
        }
    }
    if (exceptionHolder != null) {
        throw exceptionHolder;
    }
}
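The try (Plugin<?> closed = plugin) { } line is the whole trick: try-with-resources guarantees close() runs on the plugin even though the block body is empty, and any exception it throws lands in the catch clause to be recorded as suppressed. A tiny self-contained illustration of the same idiom (Resource is a made-up class for the demo):

// Demonstrates the empty-body try-with-resources idiom used in close() above.
public class AutoCloseDemo {

    static class Resource implements AutoCloseable {
        private final String name;

        Resource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            // try-with-resources guarantees this runs, even with an empty body
            System.out.println("closed " + name);
        }
    }

    public static void main(String[] args) {
        Resource resource = new Resource("kafka-source");
        try (Resource closed = resource) {
            // intentionally empty: we only want the automatic close() call
        }
    }
}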