Flink Source Code Analysis 3: Formation of the ExecutionGraph and Physical Execution

After the client builds the JobGraph, Flink submits it to the JobMaster, where the ExecutionGraph is formed.

The JobMaster constructor contains this line:

this.executionGraph = this.createAndRestoreExecutionGraph(this.jobManagerJobMetricGroup);

Tracing the calls all the way down leads to ExecutionGraphBuilder#buildGraph.

The most important line in this method is:

// generate the execution graph from the (topologically sorted) JobVertex list
executionGraph.attachJobGraph(sortedTopology);

Most of the logic for generating the ExecutionGraph from the JobGraph lives in this method:

for (JobVertex jobVertex : topologiallySorted) {

   if (jobVertex.isInputVertex() && !jobVertex.isStoppable()) {
      this.isStoppable = false;
   }

   // create the execution job vertex and attach it to the graph
   ExecutionJobVertex ejv = new ExecutionJobVertex(
      this,
      jobVertex,
      1,
      rpcTimeout,
      globalModVersion,
      createTimestamp);

   ejv.connectToPredecessors(this.intermediateResults);

   ExecutionJobVertex previousTask = this.tasks.putIfAbsent(jobVertex.getID(), ejv);
   if (previousTask != null) {
      throw new JobException(String.format("Encountered two job vertices with ID %s : previous=[%s] / new=[%s]",
            jobVertex.getID(), ejv, previousTask));
   }

   // ...
}

Looking at the logic of the code above:

It iterates over all the JobVertex instances (in topological order) and creates one ExecutionJobVertex per JobVertex. The interesting part is in the ExecutionJobVertex constructor.

The key code snippets:

this.producedDataSets = new IntermediateResult[jobVertex.getNumberOfProducedIntermediateDataSets()];
for (int i = 0; i < jobVertex.getProducedDataSets().size(); i++) {
   final IntermediateDataSet result = jobVertex.getProducedDataSets().get(i);

   this.producedDataSets[i] = new IntermediateResult(
         result.getId(),
         this,
         numTaskVertices,
         result.getResultType());
}

First a producedDataSets array is created, then it is filled based on the IntermediateDataSets produced by the JobVertex.

for (int i = 0; i < numTaskVertices; i++) {
   ExecutionVertex vertex = new ExecutionVertex(
         this,
         i,
         producedDataSets,
         timeout,
         initialGlobalModVersion,
         createTimestamp,
         maxPriorAttemptsHistoryLength);

   this.taskVertices[i] = vertex;
}

Here, one ExecutionVertex is created per parallel subtask: each unit of parallelism becomes an ExecutionVertex, stored in the taskVertices array.
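As a minimal illustrative sketch in plain Java (not Flink API; only the "(i/n)" subtask naming convention is borrowed from Flink), the fan-out from one JobVertex into its parallel ExecutionVertex instances looks like this:

public class FanOutSketch {
    public static void main(String[] args) {
        // hypothetical JobVertex named "FlatMap" with parallelism 3
        String jobVertexName = "FlatMap";
        int parallelism = 3;

        // one ExecutionVertex per parallel subtask, as in the loop above
        String[] taskVertices = new String[parallelism];
        for (int i = 0; i < parallelism; i++) {
            taskVertices[i] = jobVertexName + " (" + (i + 1) + "/" + parallelism + ")";
        }

        // prints: FlatMap (1/3), FlatMap (2/3), FlatMap (3/3)
        for (String v : taskVertices) {
            System.out.println(v);
        }
    }
}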

After the ExecutionJobVertex is created, execution proceeds to:

ejv.connectToPredecessors(this.intermediateResults);

Tracing into ExecutionVertex#connectSource:

public void connectSource(int inputNumber, IntermediateResult source, JobEdge edge, int consumerNumber) {

   final DistributionPattern pattern = edge.getDistributionPattern();
   final IntermediateResultPartition[] sourcePartitions = source.getPartitions();

   ExecutionEdge[] edges;

   switch (pattern) {
      case POINTWISE:
         edges = connectPointwise(sourcePartitions, inputNumber);
         break;

      case ALL_TO_ALL:
         edges = connectAllToAll(sourcePartitions, inputNumber);
         break;

      default:
         throw new RuntimeException("Unrecognized distribution pattern.");

   }

   this.inputEdges[inputNumber] = edges;

   // register the new edges as consumers of the source partitions
   for (ExecutionEdge ee : edges) {
      ee.getSource().addConsumer(ee, consumerNumber);
   }
}

The switch has two branches; let's look at the POINTWISE one first:

private ExecutionEdge[] connectPointwise(IntermediateResultPartition[] sourcePartitions, int inputNumber) {
   final int numSources = sourcePartitions.length;
   final int parallelism = getTotalNumberOfParallelSubtasks();

   // simple case same number of sources as targets
   if (numSources == parallelism) {
      return new ExecutionEdge[] { new ExecutionEdge(sourcePartitions[subTaskIndex], this, inputNumber) };
   }
   else if (numSources < parallelism) {

      int sourcePartition;

      // check if the pattern is regular or irregular
      // we use int arithmetics for regular, and floating point with rounding for irregular
      if (parallelism % numSources == 0) {
         // same number of targets per source
         int factor = parallelism / numSources;
         sourcePartition = subTaskIndex / factor;
      }
      else {
         // different number of targets per source
         float factor = ((float) parallelism) / numSources;
         sourcePartition = (int) (subTaskIndex / factor);
      }

      return new ExecutionEdge[] { new ExecutionEdge(sourcePartitions[sourcePartition], this, inputNumber) };
   }
   else {
      if (numSources % parallelism == 0) {
         // same number of targets per source
         int factor = numSources / parallelism;
         int startIndex = subTaskIndex * factor;

         ExecutionEdge[] edges = new ExecutionEdge[factor];
         for (int i = 0; i < factor; i++) {
            edges[i] = new ExecutionEdge(sourcePartitions[startIndex + i], this, inputNumber);
         }
         return edges;
      }
      else {
         float factor = ((float) numSources) / parallelism;

         int start = (int) (subTaskIndex * factor);
         int end = (subTaskIndex == getTotalNumberOfParallelSubtasks() - 1) ?
               sourcePartitions.length :
               (int) ((subTaskIndex + 1) * factor);

         ExecutionEdge[] edges = new ExecutionEdge[end - start];
         for (int i = 0; i < edges.length; i++) {
            edges[i] = new ExecutionEdge(sourcePartitions[start + i], this, inputNumber);
         }

         return edges;
      }
   }
}

The logic is a bit involved, so let's describe it:

The method compares the ExecutionVertex's parallelism with the number of upstream IntermediateResultPartitions and applies one of several strategies (see the sketch after this list):

(1) If the parallelism equals the partition count (numSources == parallelism), partitions and subtasks are connected one-to-one.

(2) If the parallelism is greater than the partition count (numSources < parallelism), each partition is connected to several subtasks (one-to-many); when parallelism % numSources == 0 every partition gets the same number of consumers, otherwise the split is computed with float arithmetic and rounding.

(3) If the parallelism is less than the partition count (numSources > parallelism), each subtask consumes several partitions (many-to-one); likewise an even split when numSources % parallelism == 0, otherwise rounding.
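As a standalone sketch (plain Java; the helper name pointwiseRange is invented for illustration and mirrors the numSources > parallelism, non-divisible branch of connectPointwise above), here is how a subtask index maps to its half-open range of source partitions:

public class PointwiseSketch {

    // mirrors the many-to-one, non-divisible branch of connectPointwise
    static int[] pointwiseRange(int numSources, int parallelism, int subTaskIndex) {
        float factor = ((float) numSources) / parallelism;
        int start = (int) (subTaskIndex * factor);
        int end = (subTaskIndex == parallelism - 1)
                ? numSources                           // the last subtask takes the remainder
                : (int) ((subTaskIndex + 1) * factor);
        return new int[] { start, end };               // half-open range [start, end)
    }

    public static void main(String[] args) {
        // 5 source partitions consumed by 2 subtasks:
        // subtask 0 -> partitions [0, 2), subtask 1 -> partitions [2, 5)
        for (int i = 0; i < 2; i++) {
            int[] r = pointwiseRange(5, 2, i);
            System.out.println("subtask " + i + " -> partitions [" + r[0] + ", " + r[1] + ")");
        }
    }
}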

 

Now look at connectAllToAll, the all-to-all pattern:

private ExecutionEdge[] connectAllToAll(IntermediateResultPartition[] sourcePartitions, int inputNumber) {
   ExecutionEdge[] edges = new ExecutionEdge[sourcePartitions.length];

   for (int i = 0; i < sourcePartitions.length; i++) {
      IntermediateResultPartition irp = sourcePartitions[i];
      edges[i] = new ExecutionEdge(irp, this, inputNumber);
   }

   return edges;
}

This is a bit like the Cartesian-product pattern of a SQL join: every consumer subtask gets an edge to every upstream partition.

Suppose an ExecutionVertex has two inputs, A and B, where input A has 1 partition and input B has 8. The two-dimensional inputEdges array then looks as follows (irp abbreviates IntermediateResultPartition):

[ ExecutionEdge[ A.irp[0] ] ]
[ ExecutionEdge[ B.irp[0], B.irp[1], ..., B.irp[7] ] ]
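A tiny sketch (plain Java with strings standing in for IntermediateResultPartition; purely illustrative) that builds exactly this shape:

public class InputEdgesSketch {
    public static void main(String[] args) {
        // two inputs: A with 1 partition, B with 8 partitions
        String[][] inputEdges = new String[2][];

        // input 0: a single edge to A's only partition
        inputEdges[0] = new String[] { "A.irp[0]" };

        // input 1: all-to-all, one edge per partition of B
        inputEdges[1] = new String[8];
        for (int i = 0; i < 8; i++) {
            inputEdges[1][i] = "B.irp[" + i + "]";
        }

        System.out.println(java.util.Arrays.deepToString(inputEdges));
    }
}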

------------------------------------

Next, the physical execution graph.

Find the ExecutionGraph#scheduleForExecution method. Streaming jobs usually end up in the EAGER branch:

case EAGER:
   newSchedulingFuture = scheduleEager(slotProvider, allocationTimeout);

private CompletableFuture<Void> scheduleEager(SlotProvider slotProvider, final Time timeout) {
   checkState(state == JobStatus.RUNNING, "job is not running currently");

   // Important: reserve all the space we need up front.
   // that way we do not have any operation that can fail between allocating the slots
   // and adding them to the list. If we had a failure in between there, that would
   // cause the slots to get lost
   final boolean queued = allowQueuedScheduling;

   // collecting all the slots may resize and fail in that operation without slots getting lost
   final ArrayList<CompletableFuture<Execution>> allAllocationFutures = new ArrayList<>(getNumberOfExecutionJobVertices());

   // allocate the slots (obtain all their futures)
   for (ExecutionJobVertex ejv : getVerticesTopologically()) {
      // these calls are not blocking, they only return futures
      Collection<CompletableFuture<Execution>> allocationFutures = ejv.allocateResourcesForAll(
         slotProvider,
         queued,
         LocationPreferenceConstraint.ALL,
         allocationTimeout);

      allAllocationFutures.addAll(allocationFutures);
   }

   // this future is complete once all slot futures are complete.
   // the future fails once one slot future fails.
   final ConjunctFuture<Collection<Execution>> allAllocationsFuture = FutureUtils.combineAll(allAllocationFutures);

   final CompletableFuture<Void> currentSchedulingFuture = allAllocationsFuture
      .thenAccept(
         (Collection<Execution> executionsToDeploy) -> {
            for (Execution execution : executionsToDeploy) {
               try {
                  execution.deploy();
               } catch (Throwable t) {
                  throw new CompletionException(
                     new FlinkException(
                        String.format("Could not deploy execution %s.", execution),
                        t));
               }
            }
         })
      // Generate a more specific failure message for the eager scheduling
      .exceptionally(
         (Throwable throwable) -> {
            final Throwable strippedThrowable = ExceptionUtils.stripCompletionException(throwable);
            final Throwable resultThrowable;

            if (strippedThrowable instanceof TimeoutException) {
               int numTotal = allAllocationsFuture.getNumFuturesTotal();
               int numComplete = allAllocationsFuture.getNumFuturesCompleted();
               String message = "Could not allocate all requires slots within timeout of " +
                  timeout + ". Slots required: " + numTotal + ", slots allocated: " + numComplete;

               resultThrowable = new NoResourceAvailableException(message);
            } else {
               resultThrowable = strippedThrowable;
            }

            throw new CompletionException(resultThrowable);
         });

   return currentSchedulingFuture;
}

This code makes heavy use of the CompletableFuture API added in JDK 8; it is not covered here, since plenty of articles already explain it.

For every ExecutionJobVertex (in topological order), it requests slot-allocation futures for all of its Executions, combines them, and, once every slot has been allocated, calls execution.deploy() on each Execution (see the sketch below).
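As a minimal sketch of that combine-then-act pattern using only the plain JDK 8 CompletableFuture API (Flink's FutureUtils.combineAll behaves similarly but additionally fails fast as soon as any single future fails):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CombineThenDeploySketch {
    public static void main(String[] args) {
        // one future per hypothetical slot allocation
        List<CompletableFuture<String>> allocations = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int n = i;
            allocations.add(CompletableFuture.supplyAsync(() -> "execution-" + n));
        }

        // completes only once every allocation future has completed
        CompletableFuture<Void> all =
                CompletableFuture.allOf(allocations.toArray(new CompletableFuture[0]));

        // only then "deploy" every execution, mirroring the execution.deploy() loop
        all.thenRun(() ->
                allocations.forEach(f -> System.out.println("deploying " + f.join()))
        ).join();
    }
}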

Stepping into deploy(), the key lines are:

final TaskDeploymentDescriptor deployment = vertex.createDeploymentDescriptor(
   attemptId,
   slot,
   taskRestore,
   attemptNumber);
final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();

final CompletableFuture<Acknowledge> submitResultFuture = taskManagerGateway.submitTask(deployment, rpcTimeout);

The first statement performs the transition from the ExecutionGraph to the actual physical execution graph: an IntermediateResultPartition is turned into a ResultPartition, and an ExecutionEdge into an InputChannelDeploymentDescriptor (which at runtime finally becomes an InputGate).

Finally the task is submitted through an RPC call, which ends up in TaskExecutor.submitTask. That method creates the actual Task and calls task.startTaskThread() to start executing it.

In the Task constructor, the InputGate, ResultPartition, ResultPartitionWriter, and so on are created from the incoming parameters.

startTaskThread then calls executingThread.start(), which in turn invokes Task.run.

Step into TaskExecutor#submitTask and find task.startTaskThread(); inside it we reach:

executingThread.start();

What we care about is the run method of executingThread, i.e. Task.run. Its core line:

invokable = loadAndInstantiateInvokable(userCodeClassLoader, nameOfInvokableClass, env);

Here invokable is the object that will drive the operators (for streaming jobs, a StreamTask subclass such as OneInputStreamTask); it is created via reflection from nameOfInvokableClass.
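A minimal illustration of that reflective instantiation pattern in plain Java (the class name org.example.MyInvokable is made up; Flink's loadAndInstantiateInvokable additionally requires the class to extend AbstractInvokable and resolves a constructor taking an Environment):

public class ReflectiveLoadSketch {
    public static void main(String[] args) throws Exception {
        // load the class by name through the (user-code) class loader, then instantiate it
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        Class<?> clazz = Class.forName("org.example.MyInvokable", true, cl);
        Object invokable = clazz.getDeclaredConstructor().newInstance();
        System.out.println("instantiated: " + invokable);
    }
}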

So where is the logic the user actually wrote? For example, where did the Tokenizer from word count go?

The base class of OneInputStreamTask, StreamTask, holds the headOperator and the operatorChain. When we call dataStream.flatMap(new Tokenizer()), a StreamFlatMap operator is generated; it is an AbstractUdfStreamOperator, and the user's new Tokenizer() becomes its userFunction (see the sketch below).
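For reference, a Tokenizer in the spirit of the classic WordCount example (a plain FlatMapFunction; the exact splitting logic here is illustrative):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Passed to dataStream.flatMap(...), this object becomes the userFunction of
// the generated StreamFlatMap operator (an AbstractUdfStreamOperator).
public class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {

    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        for (String word : value.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}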

Tying it all back together, with OneInputStreamTask as the example: the Task's core execution entry point is the invoke method (inherited from StreamTask); it calls StreamTask.run, which is abstract and ultimately implemented by the concrete subclasses, such as OneInputStreamTask and SourceStreamTask.

The run method of OneInputStreamTask looks like this:

final OneInputStreamOperator<IN, OUT> operator = this.headOperator;
final StreamInputProcessor<IN> inputProcessor = this.inputProcessor;
final Object lock = getCheckpointLock();

while (running && inputProcessor.processInput(operator, lock)) {
    // all the work happens in the "processInput" method
}

It simply loops, calling inputProcessor.processInput(operator, lock) over and over, i.e. StreamInputProcessor.processInput:

public boolean processInput(OneInputStreamOperator<IN, ?> streamOperator, final Object lock) throws Exception {
    // ...

    while (true) {
        if (currentRecordDeserializer != null) {
            // ...

            if (result.isFullRecord()) {
                StreamElement recordOrMark = deserializationDelegate.getInstance();

                if (recordOrMark.isWatermark()) {
                    // watermarks are handled by the framework
                    // ...
                    continue;
                } else if (recordOrMark.isLatencyMarker()) {
                    // latency markers are also handled by the framework
                    synchronized (lock) {
                        streamOperator.processLatencyMarker(recordOrMark.asLatencyMarker());
                    }
                    continue;
                } else {
                    // ***** this is where the user's logic actually runs *****
                    StreamRecord<IN> record = recordOrMark.asRecord();
                    synchronized (lock) {
                        numRecordsIn.inc();
                        streamOperator.setKeyContextElement1(record);
                        streamOperator.processElement(record);
                    }
                    return true;
                }
            }
        }

        // other handling logic
        // ...
    }
}

In the code above, streamOperator.processElement(record) is the call that actually runs the user's logic. Taking StreamFlatMap as the example, this is its processElement method:

public void processElement(StreamRecord<IN> element) throws Exception {
    collector.setTimestamp(element);
    userFunction.flatMap(element.getValue(), collector);
}