flink run 参数_聊聊flink的Parallel Execution

最新推荐文章于 2024-07-02 23:24:06 发布

高编辑

最新推荐文章于 2024-07-02 23:24:06 发布

阅读量236

点赞数

文章标签： flink run 参数

本文链接：https://blog.csdn.net/weixin_36044301/article/details/112103129

版权

序

本文主要研究一下flink的Parallel Execution

实例

Operator Level

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();DataStream text = [...]DataStream> wordCounts = text .flatMap(new LineSplitter()) .keyBy(0) .timeWindow(Time.seconds(5)) .sum(1).setParallelism(5);wordCounts.print();env.execute("Word Count Example");

operators、data sources、data sinks都可以调用setParallelism()方法来设置parallelism

Execution Environment Level

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();env.setParallelism(3);DataStream text = [...]DataStream> wordCounts = [...]wordCounts.print();env.execute("Word Count Example");

在ExecutionEnvironment里头可以通过setParallelism来给operators、data sources、data sinks设置默认的parallelism；如果operators、data sources、data sinks自己有设置parallelism则会覆盖ExecutionEnvironment设置的parallelism

Client Level

./bin/flink run -p 10 ../examples/*WordCount-java*.jar

或者

try { PackagedProgram program = new PackagedProgram(file, args); InetSocketAddress jobManagerAddress = RemoteExecutor.getInetFromHostport("localhost:6123"); Configuration config = new Configuration(); Client client = new Client(jobManagerAddress, config, program.getUserCodeClassLoader()); // set the parallelism to 10 here client.run(program, 10, true);} catch (ProgramInvocationException e) { e.printStackTrace();}

使用CLI client，可以在命令行调用是用-p来指定，或者Java/Scala调用时在Client.run的参数中指定parallelism

System Level

# The parallelism used for programs that did not specify and other parallelism.parallelism.default: 1

可以在flink-conf.yaml中通过parallelism.default配置项给所有execution environments指定系统级的默认parallelism

ExecutionEnvironment

flink-java-1.7.1-sources.jar!/org/apache/flink/api/java/ExecutionEnvironment.java

@Publicpublic abstract class ExecutionEnvironment { //...... private final ExecutionConfig config = new ExecutionConfig(); /** * Sets the parallelism for operations executed through this environment. * Setting a parallelism of x here will cause all operators (such as join, map, reduce) to run with * x parallel instances. * *

This method overrides the default parallelism for this environment. * The {@link LocalEnvironment} uses by default a value equal to the number of hardware * contexts (CPU cores / threads). When executing the program via the command line client * from a JAR file, the default parallelism is the one configured for that setup. * * @param parallelism The parallelism */ public void setParallelism(int parallelism) { config.setParallelism(parallelism); } @Internal public Plan createProgramPlan(String jobName, boolean clearSinks) { if (this.sinks.isEmpty()) { if (wasExecuted) { throw new RuntimeException("No new data sinks have been defined since the " + "last execution. The last execution refers to the latest call to " + "'execute()', 'count()', 'collect()', or 'print()'."); } else { throw new RuntimeException("No data sinks have been created yet. " + "A program needs at least one sink that consumes data. " + "Examples are writing the data set or printing it."); } } if (jobName == null) { jobName = getDefaultName(); } OperatorTranslation translator = new OperatorTranslation(); Plan plan = translator.translateToPlan(this.sinks, jobName); if (getParallelism() > 0) { plan.setDefaultParallelism(getParallelism()); } plan.setExecutionConfig(getConfig()); // Check plan for GenericTypeInfo's and register the types at the serializers. if (!config.isAutoTypeRegistrationDisabled()) { plan.accept(new Visitor>() { private final Set> registeredTypes = new HashSet<>(); private final Set> visitedOperators = new HashSet<>(); @Override public boolean preVisit(org.apache.flink.api.common.operators.Operator> visitable) { if (!visitedOperators.add(visitable)) { return false; } OperatorInformation> opInfo = visitable.getOperatorInfo(); Serializers.recursivelyRegisterType(opInfo.getOutputType(), config, registeredTypes); return true; } @Override public void postVisit(org.apache.flink.api.common.operators.Operator> visitable) {} }); } try { registerCachedFilesWithPlan(plan); } catch (Exception e) { throw new RuntimeException("Error while registering cached files: " + e.getMessage(), e); } // clear all the sinks such that the next execution does not redo everything if (clearSinks) { this.sinks.clear(); wasExecuted = true; } // All types are registered now. Print information. int registeredTypes = config.getRegisteredKryoTypes().size() + config.getRegisteredPojoTypes().size() + config.getRegisteredTypesWithKryoSerializerClasses().size() + config.getRegisteredTypesWithKryoSerializers().size(); int defaultKryoSerializers = config.getDefaultKryoSerializers().size() + config.getDefaultKryoSerializerClasses().size(); LOG.info("The job has {} registered types and {} default Kryo serializers

高编辑

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫