Flink Source Code Primer 01

Latest recommended article published on 2024-04-28 06:16:09

黄瓜炖啤酒鸭

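1. Download the source code from the official site and build it locally. After a successful build you will see that the project is split into many modules. For a first look it is best to start with something simple, so go into the examples module.

2. Find the classic program: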

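package org.apache.flink.streaming.examples.wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.examples.wordcount.util.WordCountData;
import org.apache.flink.util.Collector;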

public class WordCount {

   // *************************************************************************
   // PROGRAM
   // *************************************************************************

   public static void main(String[] args) throws Exception {

      // Checking input parameters
      final ParameterTool params = ParameterTool.fromArgs(args);

      // set up the execution environment
      final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

      // make parameters available in the web interface
      env.getConfig().setGlobalJobParameters(params);

      // get input data
      DataStream<String> text;
      if (params.has("input")) {
         // read the text file from given input path
         text = env.readTextFile(params.get("input"));
      } else {
         System.out.println("Executing WordCount example with default input data set.");
         System.out.println("Use --input to specify file input.");
         // get default test text data
         text = env.fromElements(WordCountData.WORDS);
      }

      DataStream<Tuple2<String, Integer>> counts =
         // split up the lines in pairs (2-tuples) containing: (word,1)
         text.flatMap(new Tokenizer())
         // group by the tuple field "0" and sum up tuple field "1"
         .keyBy(0).sum(1);

      // emit result
      if (params.has("output")) {
         counts.writeAsText(params.get("output"));
      } else {
         System.out.println("Printing result to stdout. Use --output to specify output path.");
         counts.print();
      }

      // execute program
      env.execute("Streaming WordCount");
   }

   // *************************************************************************
   // USER FUNCTIONS
   // *************************************************************************

   /**
    * Implements the string tokenizer that splits sentences into words as a
    * user-defined FlatMapFunction. The function takes a line (String) and
    * splits it into multiple pairs in the form of "(word,1)" ({@code Tuple2<String,
    * Integer>}).
    */
   public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {

      @Override
      public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
         // normalize and split the line
         String[] tokens = value.toLowerCase().split("\\W+");

         // emit the pairs
         for (String token : tokens) {
            if (token.length() > 0) {
               out.collect(new Tuple2<>(token, 1));
            }
         }
      }
   }

}

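1) The program starts at final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().

The environment object is created by a StreamExecutionEnvironmentFactory.

The StreamExecutionEnvironment class has many fields. To find out which environment parameters can be configured, look at this class.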
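To make the factory idea concrete, here is a minimal plain-JDK sketch of the pattern, not Flink's actual code. It assumes that a client installs a context factory before user code runs and that the lookup falls back to a local environment when none is installed. EnvFactorySketch, initializeContext and resetContext are hypothetical names, and the environment is reduced to a String.

```java
import java.util.function.Supplier;

class EnvFactorySketch {

    // Installed by a (hypothetical) client before user code runs; null when started from an IDE.
    private static Supplier<String> contextFactory;

    static void initializeContext(Supplier<String> factory) {
        contextFactory = factory;
    }

    static void resetContext() {
        contextFactory = null;
    }

    // Uses the installed factory if there is one, otherwise falls back to a local environment.
    static String getExecutionEnvironment() {
        return contextFactory != null ? contextFactory.get() : "local";
    }

    public static void main(String[] args) {
        System.out.println(getExecutionEnvironment());
        initializeContext(() -> "remote");
        System.out.println(getExecutionEnvironment());
        resetContext();
    }
}
```

The point of the pattern is that the same user program can run unchanged in an IDE and on a cluster, because the decision about which environment to create is made outside the user code.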

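2) Registering operators

Here is what the flatMap operator does.

It does two things. The first is to obtain the output type of the flatMap operator through reflection: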

public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T, R> flatMapper) {

   TypeInformation<R> outType = TypeExtractor.getFlatMapReturnTypes(clean(flatMapper),
         getType(), Utils.getCallLocationName(), true);

   return transform("Flat Map", outType, new StreamFlatMap<>(clean(flatMapper)));

}
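The reflection step can be illustrated without Flink. The sketch below is not TypeExtractor; it only shows the underlying JDK mechanism, assuming the function is a named class that implements a generic interface. MyFlatMap, the reduced Tokenizer and TypeSketch are hypothetical names introduced for illustration.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class TypeSketch {

    // Hypothetical stand-in for a two-parameter function interface such as FlatMapFunction<T, R>.
    interface MyFlatMap<T, R> {
        void flatMap(T value, List<R> out);
    }

    // A named class keeps its generic interface signature in the class metadata.
    static class Tokenizer implements MyFlatMap<String, Integer> {
        @Override
        public void flatMap(String value, List<Integer> out) {
            for (String token : value.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    out.add(token.length());
                }
            }
        }
    }

    // Returns the second type argument (the output type R) declared by the function's class.
    static Type outputTypeOf(Object function) {
        for (Type iface : function.getClass().getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) iface;
                if (pt.getRawType() == MyFlatMap.class) {
                    return pt.getActualTypeArguments()[1];
                }
            }
        }
        throw new IllegalArgumentException("No parameterized MyFlatMap interface found");
    }

    public static void main(String[] args) {
        System.out.println(outputTypeOf(new Tokenizer()));
    }
}
```

This is also a plausible reason why the example declares Tokenizer as a named class: type arguments are recorded for classes, while a lambda typically does not carry them, which is why Flink offers returns(...) to supply the type explicitly.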

The second is to create an Operator:

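The core idea of Flink stream processing is that records from an input stream are passed one by one through Operators for chained processing and are finally handed to an output stream. Each processing step applied to the data is logically an operator, and for the sake of local processing efficiency, operators can also be linked into a chain and executed together (the chain-of-responsibility pattern is a helpful analogy). The figure below shows how Flink views a user's processing flow: it is abstracted into a series of operators that starts with a source and ends with a sink. The operations performed by the operators in between are called transforms, and several operations can be chained and executed together.
[Figure: image_1cae39t06eoo3ml1be8o0412c69.png]
We can also change Flink's settings to tell it not to chain a particular operation, or to start a new chain from a particular operation.
The transform method on the last line of the code above returns a SingleOutputStreamOperator, which extends the DataStream class and defines some helper methods that make working with streams easier. Before returning, transform also registers it with the execution environment. The other operations, including keyBy, sum and print, are just different operators. They all have the same effect here: each creates an operator and registers it with the execution environment, where it is later used to build the DAG.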
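The key point of this step is that calling an operator method only records the operation and runs nothing. A minimal sketch of that idea in plain Java follows. MiniEnv, MiniStream and RegistrationSketch are hypothetical stand-ins, and the real classes carry far more information (types, parallelism, inputs).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class RegistrationSketch {

    // Hypothetical stand-in for the execution environment: it only collects transformations.
    static class MiniEnv {
        final List<String> transformations = new ArrayList<>();
    }

    // Hypothetical stand-in for DataStream: it holds no data, only a handle to the environment.
    static class MiniStream<T> {
        private final MiniEnv env;

        MiniStream(MiniEnv env) {
            this.env = env;
        }

        // Records the operation and returns a new stream handle; fn is not invoked here.
        <R> MiniStream<R> transform(String name, Function<T, R> fn) {
            env.transformations.add(name);
            return new MiniStream<>(env);
        }
    }

    static List<String> buildPlan() {
        MiniEnv env = new MiniEnv();
        new MiniStream<String>(env)
            .transform("Flat Map", line -> Arrays.asList(line.split("\\W+")))
            .transform("Keyed Aggregation", words -> words.size())
            .transform("Sink", count -> count);
        return env.transformations;
    }

    public static void main(String[] args) {
        System.out.println(buildPlan());
    }
}
```

After buildPlan() returns, the environment knows the three steps in order, yet none of the lambdas has been called. That is the state a program is in just before execute is invoked.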

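3) Finally, let's look at the line that triggers execution:

env.execute("Streaming WordCount");

This line mainly does the following:

Generates the StreamGraph. It represents the topology of the program and is built directly from the user code.
Generates the JobGraph. This is the graph that is handed to Flink to generate tasks from.
Generates a set of configuration values.
Hands the JobGraph and the configuration to the Flink cluster to run. If the job is not running locally, the jar files are also sent over the network to the other nodes.
When running in local mode, you can watch the startup sequence, for example the metrics system, the web module, the JobManager, the ResourceManager, the TaskManager and so on.
Starts the job. Note that a user class loader is started before the job starts. This class loader can be used to load classes dynamically at runtime.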
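The step from StreamGraph to JobGraph is where chaining happens. The sketch below shows only the general idea on a linear pipeline, under the simplifying assumption that each node carries a flag saying whether it may be fused with its predecessor. Flink's real chaining conditions are more involved, and Node, ChainingSketch and toJobVertices are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ChainingSketch {

    // Hypothetical node of a linear stream graph; chainable means
    // "may be fused with the previous node" (for example over a forward connection).
    static class Node {
        final String name;
        final boolean chainable;

        Node(String name, boolean chainable) {
            this.name = name;
            this.chainable = chainable;
        }
    }

    // Collapses each run of chainable nodes into a single vertex name.
    static List<String> toJobVertices(List<Node> streamGraph) {
        List<String> vertices = new ArrayList<>();
        StringBuilder current = null;
        for (Node node : streamGraph) {
            if (current != null && node.chainable) {
                current.append(" -> ").append(node.name);
            } else {
                if (current != null) {
                    vertices.add(current.toString());
                }
                current = new StringBuilder(node.name);
            }
        }
        if (current != null) {
            vertices.add(current.toString());
        }
        return vertices;
    }

    static List<String> wordCountVertices() {
        List<Node> graph = new ArrayList<>();
        graph.add(new Node("Source", false));
        graph.add(new Node("Flat Map", true));           // forward connection, can be chained
        graph.add(new Node("Keyed Aggregation", false)); // keyBy repartitions, so a new chain starts
        graph.add(new Node("Sink", true));
        return toJobVertices(graph);
    }

    public static void main(String[] args) {
        wordCountVertices().forEach(System.out::println);
    }
}
```

Four logical operators become two vertices here, which mirrors why a job with several operators often shows fewer boxes in the web UI than the program has method calls.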

The abstract execute method:
public abstract JobExecutionResult execute(String jobName) throws Exception;

It returns a JobExecutionResult.

Stepping into the source, we reach

the LocalStreamEnvironment.java class:

public JobExecutionResult execute(String jobName) throws Exception {
   // transform the streaming program into a JobGraph
   StreamGraph streamGraph = getStreamGraph();
   streamGraph.setJobName(jobName);

   JobGraph jobGraph = streamGraph.getJobGraph();
   jobGraph.setAllowQueuedScheduling(true);

   Configuration configuration = new Configuration();
   configuration.addAll(jobGraph.getJobConfiguration());
   configuration.setString(TaskManagerOptions.MANAGED_MEMORY_SIZE, "0");

   // add (and override) the settings with what the user defined
   configuration.addAll(this.configuration);

   if (!configuration.contains(RestOptions.BIND_PORT)) {
      configuration.setString(RestOptions.BIND_PORT, "0");
   }

   int numSlotsPerTaskManager = configuration.getInteger(TaskManagerOptions.NUM_TASK_SLOTS, jobGraph.getMaximumParallelism());

   MiniClusterConfiguration cfg = new MiniClusterConfiguration.Builder()
      .setConfiguration(configuration)
      .setNumSlotsPerTaskManager(numSlotsPerTaskManager)
      .build();

   if (LOG.isInfoEnabled()) {
      LOG.info("Running job on local embedded Flink mini cluster");
   }

   MiniCluster miniCluster = new MiniCluster(cfg);

   try {
      miniCluster.start();
      configuration.setInteger(RestOptions.PORT, miniCluster.getRestAddress().get().getPort());

      return miniCluster.executeJobBlocking(jobGraph);
   }
   finally {
      transformations.clear();
      miniCluster.close();
   }
}

Stepping in further, we reach

the MiniCluster.java class:

The core logic of this code is the call to the submitJob method.

public JobExecutionResult executeJobBlocking(JobGraph job) throws JobExecutionException, InterruptedException {
   checkNotNull(job, "job is null");

   final CompletableFuture<JobSubmissionResult> submissionFuture = submitJob(job);

   final CompletableFuture<JobResult> jobResultFuture = submissionFuture.thenCompose(
      (JobSubmissionResult ignored) -> requestJobResult(job.getJobID()));

   final JobResult jobResult;

   try {
      jobResult = jobResultFuture.get();
   } catch (ExecutionException e) {
      throw new JobExecutionException(job.getJobID(), "Could not retrieve JobResult.", ExceptionUtils.stripExecutionException(e));
   }

   try {
      return jobResult.toJobExecutionResult(Thread.currentThread().getContextClassLoader());
   } catch (IOException | ClassNotFoundException e) {
      throw new JobExecutionException(job.getJobID(), e);
   }
}
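The blocking pattern used here (wait on a future, then unwrap the ExecutionException so the caller sees the real cause) can be tried in isolation. This is a minimal JDK-only sketch. BlockingResultSketch, getBlocking and describe are hypothetical names, and it rethrows the cause directly rather than wrapping it in a job-specific exception as the code above does.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

class BlockingResultSketch {

    // Waits for the future and rethrows the real cause instead of the ExecutionException wrapper.
    static <T> T getBlocking(CompletableFuture<T> future) throws Exception {
        try {
            return future.get();
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    // Small helper for demonstration: turns the outcome into a readable string.
    static String describe(CompletableFuture<String> future) {
        try {
            return "ok: " + getBlocking(future);
        } catch (Exception e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(CompletableFuture.completedFuture("done")));

        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("boom"));
        System.out.println(describe(failed));
    }
}
```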

The submitJob method:

public CompletableFuture<JobSubmissionResult> submitJob(JobGraph jobGraph) {
   final CompletableFuture<DispatcherGateway> dispatcherGatewayFuture = getDispatcherGatewayFuture();

   // we have to allow queued scheduling in Flip-6 mode because we need to request slots
   // from the ResourceManager
   jobGraph.setAllowQueuedScheduling(true);

   final CompletableFuture<InetSocketAddress> blobServerAddressFuture = createBlobServerAddress(dispatcherGatewayFuture);

   final CompletableFuture<Void> jarUploadFuture = uploadAndSetJobFiles(blobServerAddressFuture, jobGraph);

   final CompletableFuture<Acknowledge> acknowledgeCompletableFuture = jarUploadFuture
      .thenCombine(
         dispatcherGatewayFuture, 
         // this is where the actual submit happens
         (Void ack, DispatcherGateway dispatcherGateway) -> dispatcherGateway.submitJob(jobGraph, rpcTimeout))
      .thenCompose(Function.identity());

   return acknowledgeCompletableFuture.thenApply(
      (Acknowledge ignored) -> new JobSubmissionResult(jobGraph.getJobID()));
}
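The future composition in submitJob is easier to follow on a reduced example. The sketch below uses only java.util.concurrent and mirrors the shape of the code above: thenCombine waits for two futures, and thenCompose(Function.identity()) flattens the resulting future of a future. Gateway, SubmitFlowSketch and the string payloads are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class SubmitFlowSketch {

    // Hypothetical stand-in for a gateway whose submit call is itself asynchronous.
    interface Gateway {
        CompletableFuture<String> submit(String job);
    }

    static CompletableFuture<String> submitJob(String job) {
        Gateway gateway = j -> CompletableFuture.completedFuture("ACK " + j);
        CompletableFuture<Gateway> gatewayFuture = CompletableFuture.completedFuture(gateway);

        // Pretend upload step; it only has to finish before the submit call is made.
        CompletableFuture<Void> uploadFuture = CompletableFuture.runAsync(() -> { });

        CompletableFuture<String> ackFuture = uploadFuture
            // wait for both the upload and the gateway, then call the gateway
            .thenCombine(gatewayFuture, (Void ignored, Gateway gw) -> gw.submit(job))
            // thenCombine produced a future of a future; flatten it
            .thenCompose(Function.identity());

        return ackFuture.thenApply(ack -> "submitted after " + ack);
    }

    public static void main(String[] args) {
        System.out.println(submitJob("wordcount").join());
    }
}
```

Nothing in submitJob blocks: the method only wires the stages together and returns immediately, and the caller decides whether to wait with join() or get().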

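The Dispatcher here is a class that receives a job and then assigns a JobMaster to start it. Looking at its class hierarchy, it has two implementations: in a local environment the one started is MiniDispatcher, and when a job is submitted to a cluster, the one started on the cluster is StandaloneDispatcher.
[Figure: image_1cenfj3p9fp110p0a8unn1mrh9.png]

So what does this Dispatcher do? It starts a JobManagerRunner (a complaint about Flink's naming: this should really be called JobMasterRunner, because JobMaster and JobManager are not the same thing in Flink) and delegates to it the work of starting the JobMaster for this job. The corresponding code is in:

//JobManagerRunner.java

Then, after a long series of nested method calls, JobMaster arrives here: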

private void scheduleExecutionGraph() {
   checkState(jobStatusListener == null);
   // register self as job status change listener
   jobStatusListener = new JobManagerJobStatusListener();
   executionGraph.registerJobStatusListener(jobStatusListener);

   try {
      executionGraph.scheduleForExecution(); // start execution
   }
   catch (Throwable t) {
      executionGraph.failGlobal(t);
   }
}

The scheduleForExecution() method:

public void scheduleForExecution() throws JobException {

   assertRunningInJobMasterMainThread();

   final long currentGlobalModVersion = globalModVersion;

   if (transitionState(JobStatus.CREATED, JobStatus.RUNNING)) {

      final CompletableFuture<Void> newSchedulingFuture;

      switch (scheduleMode) {

         case LAZY_FROM_SOURCES:
            newSchedulingFuture = scheduleLazy(slotProvider);
            break;

         case EAGER:
            newSchedulingFuture = scheduleEager(slotProvider, allocationTimeout);
            break;

         default:
            throw new JobException("Schedule mode is invalid.");
      }

      if (state == JobStatus.RUNNING && currentGlobalModVersion == globalModVersion) {
         schedulingFuture = newSchedulingFuture;
         newSchedulingFuture.whenComplete(
            (Void ignored, Throwable throwable) -> {
               if (throwable != null && !(throwable instanceof CancellationException)) {
                  // only fail if the scheduling future was not canceled
                  failGlobal(ExceptionUtils.stripCompletionException(throwable));
               }
            });
      } else {
         newSchedulingFuture.cancel(false);
      }
   }
   else {
      throw new IllegalStateException("Job may only be scheduled from state " + JobStatus.CREATED);
   }
}
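The guard at the top of scheduleForExecution() ensures that a job is scheduled at most once, and only from the CREATED state. A minimal sketch of that guard follows, assuming an atomic compare-and-set for the transition. ScheduleGuardSketch and its reduced Status enum are hypothetical, and the real method goes on to pick a schedule mode, which is omitted here.

```java
import java.util.concurrent.atomic.AtomicReference;

class ScheduleGuardSketch {

    // Hypothetical, reduced job status enum.
    enum Status { CREATED, RUNNING }

    private final AtomicReference<Status> state = new AtomicReference<>(Status.CREATED);

    // Succeeds only if the current state equals the expected one.
    boolean transitionState(Status expected, Status next) {
        return state.compareAndSet(expected, next);
    }

    void scheduleForExecution() {
        if (!transitionState(Status.CREATED, Status.RUNNING)) {
            throw new IllegalStateException("Job may only be scheduled from state " + Status.CREATED);
        }
        // scheduling of the tasks would start here
    }

    Status status() {
        return state.get();
    }

    public static void main(String[] args) {
        ScheduleGuardSketch job = new ScheduleGuardSketch();
        job.scheduleForExecution();
        System.out.println(job.status());
        try {
            job.scheduleForExecution();
        } catch (IllegalStateException e) {
            System.out.println("second call rejected: " + e.getMessage());
        }
    }
}
```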

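Summary:

Flink's framework has three layers of graph structure, and ExecutionGraph is the layer that is actually executed. At this point we have therefore walked through the whole path of a job from submission to actual execution. Let's review it once more (and note where remote submission differs):

The execute method in the client code runs.
In a local environment, MiniCluster does most of the work and hands the job directly to MiniDispatcher.
In a remote environment, a RestClusterClient is started. This class submits the user code to the cluster over HTTP REST.
In a remote environment, once the request reaches the cluster there has to be a handler to process it, which here is JobSubmitHandler. After taking the request, it delegates to StandaloneDispatcher to start the job. From this point on, local and remote submission follow the same logic again.
After the Dispatcher takes over the job, it instantiates a JobManagerRunner and uses this runner to start the job.
JobManagerRunner then hands the job to JobMaster.
JobMaster uses the ExecutionGraph's methods to start the whole execution graph, and the job is running.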
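The hand-off order in this summary can be traced with a toy model. The classes below are hypothetical stand-ins that only record the order of the calls. They share nothing with the real Flink classes beyond the role names, and the real components communicate asynchronously over RPC rather than by direct method calls.

```java
import java.util.ArrayList;
import java.util.List;

class HandOffSketch {

    // Toy stand-in for the layer that actually schedules tasks.
    static class ToyExecutionGraph {
        void scheduleForExecution(List<String> trace) {
            trace.add("ExecutionGraph schedules tasks");
        }
    }

    static class ToyJobMaster {
        void start(List<String> trace) {
            trace.add("JobMaster starts");
            new ToyExecutionGraph().scheduleForExecution(trace);
        }
    }

    static class ToyJobManagerRunner {
        void start(List<String> trace) {
            trace.add("JobManagerRunner starts");
            new ToyJobMaster().start(trace);
        }
    }

    static class ToyDispatcher {
        void submitJob(String jobName, List<String> trace) {
            trace.add("Dispatcher receives " + jobName);
            new ToyJobManagerRunner().start(trace);
        }
    }

    // Runs the whole hand-off chain and returns the recorded order of steps.
    static List<String> submit(String jobName) {
        List<String> trace = new ArrayList<>();
        new ToyDispatcher().submitJob(jobName, trace);
        return trace;
    }

    public static void main(String[] args) {
        submit("Streaming WordCount").forEach(System.out::println);
    }
}
```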