Spark on YARN Job Submission Flow: A Source-Code Walkthrough


A walkthrough of the submission flow, written down as notes for future reference:

The shell invokes org.apache.spark.deploy.SparkSubmit, which in turn calls org.apache.spark.deploy.yarn.Client.


Open the main method of org.apache.spark.deploy.yarn.Client:
def main(argStrings: Array[String]) {
    if (!sys.props.contains("SPARK_SUBMIT")) {
      logWarning("WARNING: This client is deprecated and will be removed in a " +
        "future version of Spark. Use ./bin/spark-submit with \"--master yarn\"")
    }

    // Set an env variable indicating we are running in YARN mode.
    // Note that any env variable with the SPARK_ prefix gets propagated to all (remote) processes
    System.setProperty("SPARK_YARN_MODE", "true")
    val sparkConf = new SparkConf

    val args = new ClientArguments(argStrings, sparkConf)
    // to maintain backwards-compatibility
    if (!Utils.isDynamicAllocationEnabled(sparkConf)) {
      sparkConf.setIfMissing("spark.executor.instances", args.numExecutors.toString)
    }
    new Client(args, sparkConf).run()
  }
Step into the run() method:
def run(): Unit = {
    val appId = submitApplication()
    // Decide whether to wait for completion, based on the deploy mode (client vs. cluster):
    // private val fireAndForget = isClusterMode && !sparkConf.getBoolean("spark.yarn.submit.waitAppCompletion", true)
    if (fireAndForget) {
      val report = getApplicationReport(appId)      // ApplicationReport describes the application (user, queue, name, and so on)
      val state = report.getYarnApplicationState    // current state of the application
      logInfo(s"Application report for $appId (state: $state)")
      logInfo(formatReportDetails(report))
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {    // check the state
        throw new SparkException(s"Application $appId finished with status: $state")
      }
    } else {
      // Handling for client-mode submission (monitorApplication is covered in detail later in this post).
      // Two kinds of state are involved: YarnApplicationState (the application's state as seen by YARN)
      // and FinalApplicationStatus (the final status reported by the application itself).
      val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
      if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
        throw new SparkException(s"Application $appId finished with failed status")
      }
      if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
        throw new SparkException(s"Application $appId is killed")
      }
      if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
        throw new SparkException(s"The final status of application $appId is undefined")
      }
    }
  }
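As a usage illustration, here is a minimal, hypothetical sketch of driving this Client in fire-and-forget style from your own code. Only the constructor signatures and the spark.yarn.submit.waitAppCompletion key come from the source above; the object name and package placement (needed because Client and ClientArguments are private[spark]) are assumptions:

package org.apache.spark.deploy.yarn   // assumed placement, so the private[spark] classes are visible

import org.apache.spark.SparkConf

object SubmitAndForget {
  def main(argStrings: Array[String]): Unit = {
    System.setProperty("SPARK_YARN_MODE", "true")        // same YARN-mode flag main() sets
    val sparkConf = new SparkConf()
      // In cluster mode, waitAppCompletion=false makes fireAndForget true,
      // so run() logs a single report and returns instead of polling.
      .set("spark.yarn.submit.waitAppCompletion", "false")
    val args = new ClientArguments(argStrings, sparkConf)
    new Client(args, sparkConf).run()
  }
}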
Follow into the submitApplication() method:
def submitApplication(): ApplicationId = {
    var appId: ApplicationId = null
    try {
      // Setup the credentials before doing anything else,
      // so we don't have issues at any point.
      setupCredentials()
      yarnClient.init(yarnConf)
      yarnClient.start()

      logInfo("Requesting a new application from cluster with %d NodeManagers"
        .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

      // Get a new application from our RM
      val newApp = yarnClient.createApplication()
      val newAppResponse = newApp.getNewApplicationResponse()
      appId = newAppResponse.getApplicationId()

      // Verify whether the cluster has enough resources for our AM
      verifyClusterResources(newAppResponse)      // memory check (covered in detail later in this post)

      // Set up the appropriate contexts to launch our AM
      val containerContext = createContainerLaunchContext(newAppResponse)      // build the ApplicationMaster's container launch context (jar paths, userClass, etc.)
      val appContext = createApplicationSubmissionContext(newApp, containerContext)

      // Finally, submit and monitor the application
      logInfo(s"Submitting application ${appId.getId} to ResourceManager")
      yarnClient.submitApplication(appContext)
      appId
    } catch {
      case e: Throwable =>
        if (appId != null) {
          cleanupStagingDir(appId)
        }
        throw e
    }
  }
Let's see where yarnClient comes from:
import org.apache.hadoop.yarn.client.api.{YarnClient, YarnClientApplication}
private val yarnClient = YarnClient.createYarnClient
Looking it up in the Hadoop source:
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class YarnClient extends AbstractService
{
  @InterfaceAudience.Public
  public static YarnClient createYarnClient()
  {
    YarnClient client = new YarnClientImpl();
    return client;
  }

  @InterfaceAudience.Private
  protected YarnClient(String name) {
    super(name);
  }

  // ......
}

Now look at the createApplication() method:
public YarnClientApplication createApplication()
    throws YarnException, IOException
  {
    ApplicationSubmissionContext context = (ApplicationSubmissionContext)Records.newRecord(ApplicationSubmissionContext.class);

    GetNewApplicationResponse newApp = getNewApplication();
    ApplicationId appId = newApp.getApplicationId();
    context.setApplicationId(appId);
    return new YarnClientApplication(newApp, context);
  }
   private GetNewApplicationResponse getNewApplication() throws YarnException, IOException
  {
    GetNewApplicationRequest request = (GetNewApplicationRequest)Records.newRecord(GetNewApplicationRequest.class);

    return this.rmClient.getNewApplication(request);
  }
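To make the Hadoop side concrete, here is a small, self-contained sketch that drives the same YarnClient API directly. It only uses the standard YARN client calls shown above (createYarnClient, init, start, createApplication); the demo object itself is an assumption:

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object YarnClientDemo {
  def main(args: Array[String]): Unit = {
    val yarnConf = new YarnConfiguration()          // picks up yarn-site.xml from the classpath
    val yarnClient = YarnClient.createYarnClient    // same factory method Spark's Client uses
    yarnClient.init(yarnConf)
    yarnClient.start()

    // Ask the ResourceManager for a new application, exactly like createApplication() above
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    println(s"Got application id: ${newAppResponse.getApplicationId}")
    println(s"Max container memory: ${newAppResponse.getMaximumResourceCapability.getMemory} MB")

    yarnClient.stop()
  }
}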

Now look at the getNewApplication(request) method of ApplicationClientProtocol:
 
  @InterfaceAudience.Public
  @InterfaceStability.Stable
  @Idempotent
  public abstract GetNewApplicationResponse getNewApplication(GetNewApplicationRequest paramGetNewApplicationRequest)
    throws YarnException, IOException;



Finally, back in the run() method, let's look at how a client-mode submission is handled. Step into the monitorApplication() method:

  def monitorApplication(
      appId: ApplicationId,
      returnOnRunning: Boolean = false,
      logApplicationReport: Boolean = true): (YarnApplicationState, FinalApplicationStatus) = {
    val interval = sparkConf.getLong("spark.yarn.report.interval", 1000)    // polling interval for the application report, in ms
    var lastState: YarnApplicationState = null
    while (true) {                                                          // hard-coded loop: keeps polling until the application reaches a terminal state
      Thread.sleep(interval)
      val report: ApplicationReport =
        try {
          getApplicationReport(appId)
        } catch {
          case e: ApplicationNotFoundException =>
            logError(s"Application $appId not found.")
            return (YarnApplicationState.KILLED, FinalApplicationStatus.KILLED)
          case NonFatal(e) =>
            logError(s"Failed to contact YARN for application $appId.", e)
            return (YarnApplicationState.FAILED, FinalApplicationStatus.FAILED)
        }
      val state = report.getYarnApplicationState                        

      if (logApplicationReport) {
        logInfo(s"Application report for $appId (state: $state)")

        // If DEBUG is enabled, log report details every iteration
        // Otherwise, log them every time the application changes state
        if (log.isDebugEnabled) {
          logDebug(formatReportDetails(report))
        } else if (lastState != state) {
          logInfo(formatReportDetails(report))
        }
      }

      if (state == YarnApplicationState.FINISHED ||
        state == YarnApplicationState.FAILED ||
        state == YarnApplicationState.KILLED) {
        cleanupStagingDir(appId)
        return (state, report.getFinalApplicationStatus)   // return the final result
      }

      if (returnOnRunning && state == YarnApplicationState.RUNNING) {
        return (state, report.getFinalApplicationStatus)
      }

      lastState = state
    }

    // Never reached, but keeps compiler happy
    throw new SparkException("While loop is depleted! This should never happen...")
  }
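For reference, a caller might interpret monitorApplication's return value roughly like this. The helper below is hypothetical calling code that mirrors the checks already performed in run(); the function name and the String appId parameter are made up:

import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}
import org.apache.spark.SparkException

// Hypothetical caller-side handling of monitorApplication's (state, finalStatus) result.
def checkFinalState(appId: String,
                    state: YarnApplicationState,
                    finalStatus: FinalApplicationStatus): Unit = (state, finalStatus) match {
  case (YarnApplicationState.FINISHED, FinalApplicationStatus.SUCCEEDED) =>
    println(s"Application $appId finished successfully")
  case (YarnApplicationState.KILLED, _) | (_, FinalApplicationStatus.KILLED) =>
    throw new SparkException(s"Application $appId is killed")
  case _ =>
    throw new SparkException(s"Application $appId finished with state $state / $finalStatus")
}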

At this point the submission flow is complete. YARN then uses its distributed cache mechanism to distribute the application to the compute nodes.

Now let's dig into the verifyClusterResources(newAppResponse) method:
private def verifyClusterResources(newAppResponse: GetNewApplicationResponse): Unit = {
    val maxMem = newAppResponse.getMaximumResourceCapability().getMemory()              // the maximum memory a single container may be allocated
    logInfo("Verifying our application has not requested more than the maximum " +
      s"memory capability of the cluster ($maxMem MB per container)")
    val executorMem = args.executorMemory + executorMemoryOverhead     // args.executorMemory is hard-coded to 1024 MB in Spark 1.5, plus the configured overhead
    if (executorMem > maxMem) {                   // the executor needs more memory than a container can provide
      throw new IllegalArgumentException(s"Required executor memory (${args.executorMemory}" +
        s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " +
        "Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.")
    }
    // args.amMemory is hard-coded to 512 MB in Spark 1.5.
    // amMemoryOverhead is read from the key chosen by:
    //   if (isClusterMode) driverMemOverheadKey else amMemOverheadKey
    //   where driverMemOverheadKey = "spark.yarn.driver.memoryOverhead"
    //   and   amMemOverheadKey     = "spark.yarn.am.memoryOverhead"
    val amMem = args.amMemory + amMemoryOverhead

    if (amMem > maxMem) {
      throw new IllegalArgumentException(s"Required AM memory (${args.amMemory}" +
        s"+$amMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " +
        "Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.")
    }
    logInfo("Will allocate AM container, with %d MB memory including %d MB overhead".format(
      amMem,
      amMemoryOverhead))

    // We could add checks to make sure the entire cluster has enough resources but that involves
    // getting all the node reports and computing ourselves.
  }
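As a rough illustration of the arithmetic, the standalone sketch below reproduces the two checks above. The overhead formula (10% of the heap with a 384 MB floor) and the example numbers are assumptions about the Spark 1.x defaults, not values taken from the snippet itself:

object MemoryCheckDemo {
  // Assumed Spark 1.x defaults: overhead = max(10% of heap, 384 MB)
  val MEMORY_OVERHEAD_FACTOR = 0.10
  val MEMORY_OVERHEAD_MIN = 384

  def overhead(heapMb: Int): Int =
    math.max((MEMORY_OVERHEAD_FACTOR * heapMb).toInt, MEMORY_OVERHEAD_MIN)

  def main(args: Array[String]): Unit = {
    val maxMem = 8192             // example yarn.scheduler.maximum-allocation-mb
    val executorMemory = 1024     // the 1.5-era default mentioned above
    val amMemory = 512            // the 1.5-era default mentioned above

    val executorMem = executorMemory + overhead(executorMemory)   // 1024 + 384 = 1408 MB
    val amMem = amMemory + overhead(amMemory)                     // 512 + 384 = 896 MB

    require(executorMem <= maxMem, s"executor needs $executorMem MB, above container max $maxMem MB")
    require(amMem <= maxMem, s"AM needs $amMem MB, above container max $maxMem MB")
    println(s"executorMem = $executorMem MB, amMem = $amMem MB, container max = $maxMem MB")
  }
}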


Summary:

  Cluster mode:

    Client-side steps:
   1. In SparkSubmit, yarnClient is initialized from yarnConf and started.
   2. A client-side application is created and its application ID obtained; the client then checks whether the cluster has enough resources for the executors and the ApplicationMaster, and throws an IllegalArgumentException if it does not.
   3. Resources and environment variables are set up: the application's staging directory, the local resources (jar files, log4j.properties), the application's environment variables, the container launch context, and so on.
   4. The application submission context is set up, including the application name, the queue, the container requested for the AM, and the job type "spark" (a minimal sketch of this context follows below).
   5. Memory is requested, and the application is finally submitted to the ResourceManager via the submitApplication method.
   Once the job has been handed over to YARN, the client has nothing left to do and its process exits; the whole job runs on the YARN cluster, and its results are saved to HDFS or to the logs.
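Step 4 above boils down to filling in an ApplicationSubmissionContext. Below is a minimal sketch using the standard YARN API; the application name, queue, and memory figure are assumptions, not Spark's actual values:

import org.apache.hadoop.yarn.api.records.{ApplicationSubmissionContext, ContainerLaunchContext, Resource}
import org.apache.hadoop.yarn.client.api.YarnClientApplication
import org.apache.hadoop.yarn.util.Records

// newApp is the YarnClientApplication returned by yarnClient.createApplication(),
// amContainer is the container launch context built for the ApplicationMaster.
def buildSubmissionContext(newApp: YarnClientApplication,
                           amContainer: ContainerLaunchContext): ApplicationSubmissionContext = {
  val appContext = newApp.getApplicationSubmissionContext
  appContext.setApplicationName("my-spark-app")    // assumed application name
  appContext.setQueue("default")                   // assumed queue
  appContext.setAMContainerSpec(amContainer)       // the AM's container launch context
  appContext.setApplicationType("SPARK")           // marks the job type as Spark
  val capability = Records.newRecord(classOf[Resource])
  capability.setMemory(896)                        // assumed AM memory including overhead, in MB
  appContext.setResource(capability)
  appContext
}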
    YARN-side steps:
   1. The ApplicationMaster's run method is executed.
   2. The relevant environment variables are set up.
   3. amClient is created and started.
   4. The AmIpFilter for the Spark UI is set before the Spark UI starts.
   5. In the startUserClass function, a dedicated thread (named "Driver") is started to run the user-submitted application, i.e. the Driver is launched. The Driver initializes the SparkContext.
   6. The AM waits for the SparkContext to finish initializing, for at most spark.yarn.applicationMaster.waitTries attempts (10 by default). If that limit is exceeded the program exits; otherwise the SparkContext is used to initialize yarnAllocator.
      How does the AM know the SparkContext has finished initializing?
      During step 5, initializing the SparkContext creates a YarnClusterScheduler; when the SparkContext finishes initializing, it calls the postStartHook method of YarnClusterScheduler, and that method notifies the ApplicationMaster that the SparkContext is ready (see the sketch after this list).

      Why wait for the SparkContext at all?
      Because each CoarseGrainedExecutorBackend needs to register with the CoarseGrainedSchedulerBackend after it starts.
   7. Once the SparkContext is initialized, the ApplicationMaster registers itself with the ResourceManager through amClient.
   8. Executors are allocated and launched. Before launching them, the AM first obtains numExecutors containers through yarnAllocator and then starts the executors inside those containers. If executor launches fail maxNumExecutorFailures times, the application fails: its status is marked FAILED and the SparkContext is shut down. Executors are actually launched through ExecutorRunnable, which internally starts a CoarseGrainedExecutorBackend; once started, the CoarseGrainedExecutorBackend registers with the SchedulerBackend. (How does the ResourceManager decide how many containers to allocate? The count is passed as a parameter at submit time; by default two executors are started.)
   9. Finally, tasks run inside the CoarseGrainedExecutorBackend, and their status is reported back to the CoarseGrainedScheduler over Akka until the job completes.
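The postStartHook handshake described in step 6 can be pictured with a simple latch-based sketch. This is only an illustration of the idea, not Spark's actual implementation; the object and method names are made up:

import java.util.concurrent.{CountDownLatch, TimeUnit}

// Illustration only: how the AM-side wait and the scheduler-side notification could be wired.
object SparkContextReadyHandshake {
  private val sparkContextReady = new CountDownLatch(1)

  // Called from something like YarnClusterScheduler.postStartHook()
  // once the SparkContext has finished initializing.
  def notifySparkContextInitialized(): Unit = sparkContextReady.countDown()

  // Called by the ApplicationMaster: wait up to `waitTries` times, one second each,
  // before giving up (mirrors the spark.yarn.applicationMaster.waitTries idea).
  def waitForSparkContext(waitTries: Int = 10): Boolean = {
    var tries = 0
    var ready = false
    while (!ready && tries < waitTries) {
      ready = sparkContextReady.await(1, TimeUnit.SECONDS)
      tries += 1
    }
    ready
  }
}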



  Client mode:

    Client-side steps:

   1. The launch function of the SparkSubmit class calls the job's main function directly (via reflection); in cluster mode it would call Client's main function instead.
   2. The application's main function always has a SparkContext and initializes it.
   3. During SparkContext initialization the following happens in order: the relevant configuration is applied; MapOutputTracker, BlockManagerMaster and BlockManager are registered; and the taskScheduler and dagScheduler are created, the last two being the most important part. When the taskScheduler is created, the Scheduler and SchedulerBackend are chosen according to the master we passed in. Since we chose yarn-client mode, the program picks YarnClientClusterScheduler and YarnClientSchedulerBackend, and initializes the YarnClientClusterScheduler with the YarnClientSchedulerBackend instance; both instances are obtained via reflection. YarnClientSchedulerBackend is a subclass of CoarseGrainedSchedulerBackend, and YarnClientClusterScheduler is a subclass of TaskSchedulerImpl that merely overrides TaskSchedulerImpl's getRackForHost method.
   4. After the taskScheduler is initialized, the dagScheduler is created, and then the taskScheduler is started via taskScheduler.start(), which in turn calls the SchedulerBackend's start method. While the SchedulerBackend starts, it initializes some parameters, wraps them in a ClientArguments instance, passes that into the Client class, and obtains the application ID via client.submitApplication(). (A minimal sketch of how user code enters this path follows below.)
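From the user's perspective, all of the above is triggered simply by constructing a SparkContext with a yarn-client master. A minimal sketch against the Spark 1.5-era API; the application name and the sample job are assumptions:

import org.apache.spark.{SparkConf, SparkContext}

object YarnClientModeDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("yarn-client-demo")    // assumed name
      .setMaster("yarn-client")          // selects YarnClientClusterScheduler + YarnClientSchedulerBackend
    val sc = new SparkContext(conf)      // scheduler creation and client.submitApplication() happen in here
    println(sc.parallelize(1 to 100).count())   // any job; executors are requested from YARN as described above
    sc.stop()
  }
}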

   
    YARN-side steps:
     
   1. The ApplicationMaster's run method is executed (runExecutorLauncher).
   2. There is no need to wait for the SparkContext to initialize (the YarnClientClusterScheduler is already up); the application is registered with the Spark YARN AM.
   3. Executors are allocated; the allocation logic is similar to yarn-cluster mode, so it is not repeated here.
   4. Tasks run inside the CoarseGrainedExecutorBackend, and their status is reported back to the CoarseGrainedScheduler over Akka until the job completes.
   5. While the job is running, YarnClientSchedulerBackend polls the job's status through the client once per second and prints the corresponding progress information; when the application's state becomes FINISHED, FAILED or KILLED, the wait loop exits (a simplified sketch of this monitoring thread follows below).
   6. Finally, a thread double-checks the application's state; once it is FINISHED, FAILED or KILLED, the job is considered done and the SparkContext is stopped. That concludes the whole flow.
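The one-second polling in steps 5 and 6 can be sketched as a small daemon thread around YarnClient.getApplicationReport. This is illustrative only, not the actual YarnClientSchedulerBackend code; the helper name and callback are made up:

import org.apache.hadoop.yarn.api.records.{ApplicationId, YarnApplicationState}
import org.apache.hadoop.yarn.client.api.YarnClient

// Illustration of the client-mode monitoring loop: poll once per second and
// invoke the callback once the application reaches a terminal state.
def startMonitorThread(yarnClient: YarnClient, appId: ApplicationId)
                      (onFinish: YarnApplicationState => Unit): Thread = {
  val t = new Thread(new Runnable {
    override def run(): Unit = {
      var state = yarnClient.getApplicationReport(appId).getYarnApplicationState
      while (state != YarnApplicationState.FINISHED &&
             state != YarnApplicationState.FAILED &&
             state != YarnApplicationState.KILLED) {
        Thread.sleep(1000)
        state = yarnClient.getApplicationReport(appId).getYarnApplicationState
      }
      onFinish(state)   // e.g. stop the SparkContext here, as step 6 describes
    }
  })
  t.setDaemon(true)
  t.start()
  t
}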
 
 

Source: "ITPUB Blog", link: http://blog.itpub.net/29754888/viewspace-1815323/. If reposting, please credit the source; otherwise legal liability may be pursued.

