The previous post covered how the batch WordCount is executed.
This post turns to the streaming version of WordCount, whose class-level Javadoc reads:

    /**
     * Implements the "WordCount" program that computes a simple word occurrence
     * histogram over text files in a streaming fashion.
     *
     * <p>The input is a plain text file with lines separated by newline characters.
     *
     * <p>Usage: <code>WordCount --input &lt;path&gt; --output &lt;path&gt;</code><br>
     * If no parameters are provided, the program is run with default data from
     * {@link WordCountData}.
     *
     * <p>This example shows how to:
     * <ul>
     * <li>write a simple Flink Streaming program,
     * <li>use tuple data types,
     * <li>write and use user-defined functions.
     * </ul>
     */

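
To make "in a streaming fashion" concrete, here is a minimal plain-Java sketch of what a streaming word count produces. It uses no Flink APIs at all; the class `StreamingWordCountSketch` and its methods are hypothetical names chosen for illustration. The point it tries to show is that, unlike the batch job, a running count is emitted for every incoming word instead of one final total per word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Flink.
class StreamingWordCountSketch {

    /** Splits a line into lower-case words, skipping empty tokens. */
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /**
     * Processes lines one at a time and emits an updated "(word,count)" record
     * for every incoming word, the way a keyed running sum would.
     */
    static List<String> runningCounts(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : tokenize(line)) {
                int updated = counts.merge(word, 1, Integer::sum);
                emitted.add("(" + word + "," + updated + ")");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        runningCounts(lines).forEach(System.out::println);
    }
}
```

For the single line "to be or not to be" the sketch emits (to,1), (be,1), (or,1), (not,1), (to,2), (be,2): each repeated word shows up again with its updated count.
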
The overall execution flow is shown in the figure below:
![497ab32949cfb554d28412461c39d616.png](https://i-blog.csdnimg.cn/blog_migrate/f2c4698e7453d51211432401ba77c68b.jpeg)
Steps 1-4: the main method reads the file and adds operators.

    private <OUT> DataStreamSource<OUT> createFileInput(FileInputFormat<OUT> inputFormat,
                                                        TypeInformation<OUT> typeInfo,
                                                        String sourceName,
                                                        FileProcessingMode monitoringMode,
                                                        long interval) {

        Preconditions.checkNotNull(inputFormat, "Unspecified file input format.");
        Preconditions.checkNotNull(typeInfo, "Unspecified output type information.");
        Preconditions.checkNotNull(sourceName, "Unspecified name for the source.");
        Preconditions.checkNotNull(monitoringMode, "Unspecified monitoring mode.");

        Preconditions.checkArgument(monitoringMode.equals(FileProcessingMode.PROCESS_ONCE) ||
                interval >= ContinuousFileMonitoringFunction.MIN_MONITORING_INTERVAL,
            "The path monitoring interval cannot be less than " +
                ContinuousFileMonitoringFunction.MIN_MONITORING_INTERVAL + " ms.");

        ContinuousFileMonitoringFunction<OUT> monitoringFunction =
            new ContinuousFileMonitoringFunction<>(inputFormat, monitoringMode, getParallelism(), interval);

        ContinuousFileReaderOperator<OUT> reader =
            new ContinuousFileReaderOperator<>(inputFormat);

        SingleOutputStreamOperator<OUT> source = addSource(monitoringFunction, sourceName)
                .transform("Split Reader: " + sourceName, typeInfo, reader); //1

        return new DataStreamSource<>(source);
    }

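
The method above does two things: it validates the monitoring interval, then wires a monitoring function to a reader operator. A rough plain-Java sketch of that shape follows. Everything in it is an assumption made for illustration: the class `FileSourceSketch`, the constant `MIN_INTERVAL_MS` (standing in for `ContinuousFileMonitoringFunction.MIN_MONITORING_INTERVAL`), and the in-memory map that plays the role of the file system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Flink.
class FileSourceSketch {

    enum Mode { PROCESS_ONCE, PROCESS_CONTINUOUSLY }

    // Assumed lower bound in milliseconds, used only for this sketch.
    static final long MIN_INTERVAL_MS = 1L;

    /** Mirrors the checkArgument call: the interval only matters when monitoring continuously. */
    static void checkInterval(Mode mode, long intervalMs) {
        if (mode != Mode.PROCESS_ONCE && intervalMs < MIN_INTERVAL_MS) {
            throw new IllegalArgumentException(
                "The path monitoring interval cannot be less than " + MIN_INTERVAL_MS + " ms.");
        }
    }

    /** Stage 1 (monitor): emits one split per file name; it does not read any data. */
    static List<String> discoverSplits(Map<String, List<String>> files) {
        return new ArrayList<>(files.keySet());
    }

    /** Stage 2 (reader): turns each split into the records it contains. */
    static List<String> readSplits(Map<String, List<String>> files, List<String> splits) {
        List<String> records = new ArrayList<>();
        for (String split : splits) {
            records.addAll(files.get(split));
        }
        return records;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a directory with two text files.
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("part-0.txt", Arrays.asList("to be", "or not to be"));
        files.put("part-1.txt", Arrays.asList("that is the question"));

        checkInterval(Mode.PROCESS_ONCE, -1L);
        readSplits(files, discoverSplits(files)).forEach(System.out::println);
    }
}
```

Keeping discovery and reading as separate stages is what lets the reading side be scaled independently of the single component that watches the path.
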
This is the method that adds an operator. The operators added here are the ones that run when execute is called.

    /**
     * Adds an operator to the list of operators that should be executed when calling
     * {@link #execute}.
     *
     * <p>When calling {@link #execute()} only the operators that where previously added to the list
     * are executed.
     *
     * <p>This is not meant to be used by users. The API methods that create operators must call
     * this method.
     */
    @Internal
    public void addOperator(StreamTransformation<?> transformation) {
        Preconditions.checkNotNull(transformation, "transformation must not be null.");
        this.transformations.add(transformation);
    }

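
The record-now, run-later behaviour can be sketched in a few lines of plain Java. This is only a toy model under stated assumptions: `LazyPipelineSketch` is a hypothetical class, operators are reduced to `Function<String, String>`, and `execute` runs them in-process, which is far simpler than what Flink does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical illustration class; not part of Flink.
class LazyPipelineSketch {

    private final List<Function<String, String>> transformations = new ArrayList<>();
    private int applied = 0;

    /** Records an operator; nothing is computed at this point. */
    LazyPipelineSketch addOperator(Function<String, String> transformation) {
        Objects.requireNonNull(transformation, "transformation must not be null.");
        transformations.add(transformation);
        return this;
    }

    /** Number of operator invocations performed so far. */
    int appliedCount() {
        return applied;
    }

    /** Runs every recorded operator, in insertion order, on each input record. */
    List<String> execute(List<String> input) {
        List<String> output = new ArrayList<>();
        for (String record : input) {
            String value = record;
            for (Function<String, String> transformation : transformations) {
                value = transformation.apply(value);
                applied++;
            }
            output.add(value);
        }
        return output;
    }

    public static void main(String[] args) {
        LazyPipelineSketch pipeline = new LazyPipelineSketch()
            .addOperator(String::trim)
            .addOperator(String::toUpperCase);

        System.out.println("applied before execute: " + pipeline.appliedCount());
        System.out.println(pipeline.execute(Arrays.asList("  flink ", "stream")));
        System.out.println("applied after execute: " + pipeline.appliedCount());
    }
}
```

Building the pipeline only grows the list; the counter stays at zero until `execute` is called, which is the same contract the Javadoc above describes.
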
Step 5: build the StreamGraph and derive the JobGraph from it, that is, convert the streaming program into a JobGraph.

    // transform the streaming program into a JobGraph
    StreamGraph streamGraph = getStreamGraph();
    streamGraph.setJobName(jobName);

    JobGraph jobGraph = streamGraph.getJobGraph();
    jobGraph.setAllowQueuedScheduling(true);

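
One thing that happens during this conversion is operator chaining: operators that can share a task are merged into a single job vertex. The sketch below is a heavily simplified model of that idea, not Flink's actual algorithm. It assumes a strictly linear pipeline and reduces the real chaining conditions to one boolean flag; `ChainingSketch`, `Node`, and the operator names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Flink.
class ChainingSketch {

    /** One operator in a linear pipeline. */
    static final class Node {
        final String name;
        // Simplifying assumption: a single flag decides whether this operator
        // may share a task with the operator directly before it.
        final boolean chainableToPrevious;

        Node(String name, boolean chainableToPrevious) {
            this.name = name;
            this.chainableToPrevious = chainableToPrevious;
        }
    }

    /** Groups consecutive chainable operators into one vertex name, e.g. "a -> b". */
    static List<String> toJobVertices(List<Node> streamNodes) {
        List<String> vertices = new ArrayList<>();
        for (Node node : streamNodes) {
            if (node.chainableToPrevious && !vertices.isEmpty()) {
                int last = vertices.size() - 1;
                vertices.set(last, vertices.get(last) + " -> " + node.name);
            } else {
                vertices.add(node.name);
            }
        }
        return vertices;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node("source", false),
            new Node("flatMap", true),
            new Node("sum", false),   // assumed to follow a repartitioning step, so it starts a new vertex
            new Node("sink", true));
        toJobVertices(nodes).forEach(System.out::println);
    }
}
```

With the four operators in `main`, the sketch produces two vertices, "source -> flatMap" and "sum -> sink", so the graph handed to the cluster is smaller than the list of operators the program declared.
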
Steps 6-8: start the MiniCluster to prepare for running the job.

    /**
     * Starts the mini cluster, based on the configured properties.
     *
     * @throws Exception This method passes on any exception that occurs during the startup of
     *         the mini cluster.
     */
    public void start() throws Exception {
        synchronized (lock) {
            checkState(!running, "MiniCluster is already running");

            LOG.info("Starting Flink Mini Cluster");
            LOG.debug("Using configuration {}