Spark on YARN Job Submission Flow: A Source-Code Walkthrough


A walkthrough of the submission flow, written down as notes for future reference:

The shell invokes org.apache.spark.deploy.SparkSubmit, which in turn calls org.apache.spark.deploy.yarn.Client.


Open the main method of org.apache.spark.deploy.yarn.Client:
def main(argStrings: Array[String]) {
    if (!sys.props.contains("SPARK_SUBMIT")) {
      logWarning("WARNING: This client is deprecated and will be removed in a " +
        "future version of Spark. Use ./bin/spark-submit with \"--master yarn\"")
    }

    // Set an env variable indicating we are running in YARN mode.
    // Note that any env variable with the SPARK_ prefix gets propagated to all (remote) processes
    System.setProperty("SPARK_YARN_MODE", "true")
    val sparkConf = new SparkConf

    val args = new ClientArguments(argStrings, sparkConf)
    // to maintain backwards-compatibility
    if (!Utils.isDynamicAllocationEnabled(sparkConf)) {
      sparkConf.setIfMissing("spark.executor.instances", args.numExecutors.toString)
    }
    new Client(args, sparkConf).run()
  }
Step into the run() method:
def run(): Unit = {
    val appId = submitApplication()
    // Decide whether to wait for completion, based on the deploy mode (client vs. cluster):
    // private val fireAndForget = isClusterMode && !sparkConf.getBoolean("spark.yarn.submit.waitAppCompletion", true)
    if (fireAndForget) {
      val report = getApplicationReport(appId)      // ApplicationReport describes the application (user, queue, name, and so on)
      val state = report.getYarnApplicationState    // current state of the application
      logInfo(s"Application report for $appId (state: $state)")
      logInfo(formatReportDetails(report))
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {    // check the state
        throw new SparkException(s"Application $appId finished with status: $state")
      }
    } else {
      // Handling for client-mode submission (monitorApplication is covered in detail later in this post).
      // Two kinds of state are involved: YarnApplicationState (the application's state as seen by YARN)
      // and FinalApplicationStatus (the final status reported by the application itself).
      val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
      if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
        throw new SparkException(s"Application $appId finished with failed status")
      }
      if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
        throw new SparkException(s"Application $appId is killed")
      }
      if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
        throw new SparkException(s"The final status of application $appId is undefined")
      }
    }
  }
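As a usage illustration, here is a minimal, hypothetical sketch of driving this Client in fire-and-forget style from your own code. Only the constructor signatures and the spark.yarn.submit.waitAppCompletion key come from the source above; the object name and package placement (needed because Client and ClientArguments are private[spark]) are assumptions:

package org.apache.spark.deploy.yarn   // assumed placement, so the private[spark] classes are visible

import org.apache.spark.SparkConf

object SubmitAndForget {
  def main(argStrings: Array[String]): Unit = {
    System.setProperty("SPARK_YARN_MODE", "true")        // same YARN-mode flag main() sets
    val sparkConf = new SparkConf()
      // In cluster mode, waitAppCompletion=false makes fireAndForget true,
      // so run() logs a single report and returns instead of polling.
      .set("spark.yarn.submit.waitAppCompletion", "false")
    val args = new ClientArguments(argStrings, sparkConf)
    new Client(args, sparkConf).run()
  }
}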
Follow into the submitApplication() method:
def submitApplication(): ApplicationId = {
    var appId: ApplicationId = null
    try {
      // Setup the credentials before doing anything else,
      // so we don't have issues at any point.
      setupCredentials()
      yarnClient.init(yarnConf)
      yarnClient.start()

      logInfo("Requesting a new application from cluster with %d NodeManagers"
        .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

      // Get a new application from our RM
      val newApp = yarnClient.createApplication()
      val newAppResponse = newApp.getNewApplicationResponse()
      appId = newAppResponse.getApplicationId()

      // Verify whether the cluster has enough resources for our AM
      verifyClusterResources(newAppResponse)      // memory check (covered in detail later in this post)

      // Set up the appropriate contexts to launch our AM
      val containerContext = createContainerLaunchContext(newAppResponse)      // build the ApplicationMaster's container launch context (jar paths, userClass, etc.)
      val appContext = createApplicationSubmissionContext(newApp, containerContext)

      // Finally, submit and monitor the application
      logInfo(s"Submitting application ${appId.getId} to ResourceManager")
      yarnClient.submitApplication(appContext)
      appId
    } catch {
      case e: Throwable =>
        if (appId != null) {
          cleanupStagingDir(appId)
        }
        throw e
    }
  }
Let's see where yarnClient comes from:
import org.apache.hadoop.yarn.client.api.{YarnClient, YarnClientApplication}
private val yarnClient = YarnClient.createYarnClient
Looking it up in the Hadoop source:
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class YarnClient extends AbstractService
{
  @InterfaceAudience.Public
  public static YarnClient createYarnClient()
  {
    YarnClient client = new YarnClientImpl();
    return client;
  }

  @InterfaceAudience.Private
  protected YarnClient(String name) {
    super(name);
  }

  // ......
}

Now look at the createApplication() method:
public YarnClientApplication createApplication()
    throws YarnException, IOException
  {
    ApplicationSubmissionContext context = (ApplicationSubmissionContext)Records.newRecord(ApplicationSubmissionContext.class);

    GetNewApplicationResponse newApp = getNewApplication();
    ApplicationId appId = newApp.getApplicationId();
    context.setApplicationId(appId);
    return new YarnClientApplication(newApp, context);
  }
   private GetNewApplicationResponse getNewApplication() throws YarnException, IOException
  {
    GetNewApplicationRequest request = (GetNewApplicationRequest)Records.newRecord(GetNewApplicationRequest.class);

    return this.rmClient.getNewApplication(request);
  }
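To make the Hadoop side concrete, here is a small, self-contained sketch that drives the same YarnClient API directly. It only uses the standard YARN client calls shown above (createYarnClient, init, start, createApplication); the demo object itself is an assumption:

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object YarnClientDemo {
  def main(args: Array[String]): Unit = {
    val yarnConf = new YarnConfiguration()          // picks up yarn-site.xml from the classpath
    val yarnClient = YarnClient.createYarnClient    // same factory method Spark's Client uses
    yarnClient.init(yarnConf)
    yarnClient.start()

    // Ask the ResourceManager for a new application, exactly like createApplication() above
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    println(s"Got application id: ${newAppResponse.getApplicationId}")
    println(s"Max container memory: ${newAppResponse.getMaximumResourceCapability.getMemory} MB")

    yarnClient.stop()
  }
}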

Now look at the getNewApplication(request) method of ApplicationClientProtocol:
 
  @InterfaceAudience.Public
  @InterfaceStability.Stable
  @Idempotent
  public abstract GetNewApplicationResponse getNewApplication(GetNewApplicationRequest paramGetNewApplicationRequest)
    throws YarnException, IOException;



Finally, back in the run() method, let's look at how a client-mode submission is handled. Step into the monitorApplication() method:

  def monitorApplication(
      appId: ApplicationId,
      returnOnRunning: Boolean = false,
      logApplicationReport: Boolean = true): (YarnApplicationState, FinalApplicationStatus) = {
    val interval = sparkConf.getLong("spark.yarn.report.interval", 1000)    // polling interval for the application report, in ms
    var lastState: YarnApplicationState = null
    while (true) {                                                          // hard-coded loop: keeps polling until the application reaches a terminal state
      Thread.sleep(interval)
      val report: ApplicationReport =
        try {
          getApplicationReport(appId)
        } catch {
          case e: ApplicationNotFoundException =>
            logError(s"Application $appId not found.")
            return (YarnApplicationState.KILLED, FinalApplicationStatus.KILLED)
          case NonFatal(e) =>
            logError(s"Failed to contact YARN for application $appId.", e)
            return (YarnApplicationState.FAILED, FinalApplicationStatus.FAILED)
        }
      val state = report.getYarnApplicationState                        

      if (logApplicationReport) {
        logInfo(s"Application report for $appId (state: $state)")

        // If DEBUG is enabled, log report details every iteration
        // Otherwise, log them every time the application changes state
        if (log.isDebugEnabled) {
          logDebug(formatReportDetails(report))
        } else if (lastState != state) {
          logInfo(formatReportDetails(report))
        }
      }

      if (state == YarnApplicationState.FINISHED ||
        state == YarnApplicationState.FAILED ||
        state == YarnApplicationState.KILLED) {
        cleanupStagingDir(appId)
        return (state, report.getFinalApplicationStatus)   // return the final result
      }

      if (returnOnRunning && state == YarnApplicationState.RUNNING) {
        return (state, report.getFinalApplicationStatus)
      }

      lastState = state
    }

    // Never reached, but keeps compiler happy
    throw new SparkException("While loop is depleted! This should never happen...")
  }
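For reference, a caller might interpret monitorApplication's return value roughly like this. The helper below is hypothetical calling code that mirrors the checks already performed in run(); the function name and the String appId parameter are made up:

import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}
import org.apache.spark.SparkException

// Hypothetical caller-side handling of monitorApplication's (state, finalStatus) result.
def checkFinalState(appId: String,
                    state: YarnApplicationState,
                    finalStatus: FinalApplicationStatus): Unit = (state, finalStatus) match {
  case (YarnApplicationState.FINISHED, FinalApplicationStatus.SUCCEEDED) =>
    println(s"Application $appId finished successfully")
  case (YarnApplicationState.KILLED, _) | (_, FinalApplicationStatus.KILLED) =>
    throw new SparkException(s"Application $appId is killed")
  case _ =>
    throw new SparkException(s"Application $appId finished with state $state / $finalStatus")
}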

At this point the submission flow is complete. YARN then uses its distributed cache mechanism to distribute the application to the compute nodes.

Now let's dig into the verifyClusterResources(newAppResponse) method:
private def verifyClusterResources(newAppResponse: GetNewApplicationResponse): Unit = {
    val maxMem = newAppResponse.getMaximumResourceCapability().getMemory()              // the maximum memory a single container may be allocated
    logInfo("Verifying our application has not requested more than the maximum " +
      s"memory capability of the cluster ($maxMem MB per container)")
    val executorMem = args.executorMemory + executorMemoryOverhead     // args.executorMemory is hard-coded to 1024 MB in Spark 1.5, plus the configured overhead
    if (executorMem > maxMem) {                   // the executor needs more memory than a container can provide
      throw new IllegalArgumentException(s"Required executor memory (${args.executorMemory}" +
        s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " +
        "Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.")
    }
    // args.amMemory is hard-coded to 512 MB in Spark 1.5.
    // amMemoryOverhead is read from the key chosen by:
    //   if (isClusterMode) driverMemOverheadKey else amMemOverheadKey
    //   where driverMemOverheadKey = "spark.yarn.driver.memoryOverhead"
    //   and   amMemOverheadKey     = "spark.yarn.am.memoryOverhead"
    val amMem = args.amMemory + amMemoryOverhead

    if (amMem > maxMem) {
      throw new IllegalArgumentException(s"Required AM memory (${args.amMemory}" +
        s"+$amMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " +
        "Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.")
    }
    logInfo("Will allocate AM container, with %d MB memory including %d MB overhead".format(
      amMem,
      amMemoryOverhead))

    // We could add checks to make sure the entire cluster has enough resources but that involves
    // getting all the node reports and computing ourselves.
  }
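As a rough illustration of the arithmetic, the standalone sketch below reproduces the two checks above. The overhead formula (10% of the heap with a 384 MB floor) and the example numbers are assumptions about the Spark 1.x defaults, not values taken from the snippet itself:

object MemoryCheckDemo {
  // Assumed Spark 1.x defaults: overhead = max(10% of heap, 384 MB)
  val MEMORY_OVERHEAD_FACTOR = 0.10
  val MEMORY_OVERHEAD_MIN = 384

  def overhead(heapMb: Int): Int =
    math.max((MEMORY_OVERHEAD_FACTOR * heapMb).toInt, MEMORY_OVERHEAD_MIN)

  def main(args: Array[String]): Unit = {
    val maxMem = 8192             // example yarn.scheduler.maximum-allocation-mb
    val executorMemory = 1024     // the 1.5-era default mentioned above
    val amMemory = 512            // the 1.5-era default mentioned above

    val executorMem = executorMemory + overhead(executorMemory)   // 1024 + 384 = 1408 MB
    val amMem = amMemory + overhead(amMemory)                     // 512 + 384 = 896 MB

    require(executorMem <= maxMem, s"executor needs $executorMem MB, above container max $maxMem MB")
    require(amMem <= maxMem, s"AM needs $amMem MB, above container max $maxMem MB")
    println(s"executorMem = $executorMem MB, amMem = $amMem MB, container max = $maxMem MB")
  }
}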


Summary:

  Cluster mode:

    Client-side steps:
   1. In SparkSubmit, yarnClient is initialized from yarnConf and started.
   2. A client-side application is created and its application ID obtained; the client then checks whether the cluster has enough resources for the executors and the ApplicationMaster, and throws an IllegalArgumentException if it does not.
   3. Resources and environment variables are set up: the application's staging directory, the local resources (jar files, log4j.properties), the application's environment variables, the container launch context, and so on.
   4. The application submission context is set up, including the application name, the queue, the container requested for the AM, and the job type "spark" (a minimal sketch of this context follows below).
   5. Memory is requested, and the application is finally submitted to the ResourceManager via the submitApplication method.
   Once the job has been handed over to YARN, the client has nothing left to do and its process exits; the whole job runs on the YARN cluster, and its results are saved to HDFS or to the logs.
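Step 4 above boils down to filling in an ApplicationSubmissionContext. Below is a minimal sketch using the standard YARN API; the application name, queue, and memory figure are assumptions, not Spark's actual values:

import org.apache.hadoop.yarn.api.records.{ApplicationSubmissionContext, ContainerLaunchContext, Resource}
import org.apache.hadoop.yarn.client.api.YarnClientApplication
import org.apache.hadoop.yarn.util.Records

// newApp is the YarnClientApplication returned by yarnClient.createApplication(),
// amContainer is the container launch context built for the ApplicationMaster.
def buildSubmissionContext(newApp: YarnClientApplication,
                           amContainer: ContainerLaunchContext): ApplicationSubmissionContext = {
  val appContext = newApp.getApplicationSubmissionContext
  appContext.setApplicationName("my-spark-app")    // assumed application name
  appContext.setQueue("default")                   // assumed queue
  appContext.setAMContainerSpec(amContainer)       // the AM's container launch context
  appContext.setApplicationType("SPARK")           // marks the job type as Spark
  val capability = Records.newRecord(classOf[Resource])
  capability.setMemory(896)                        // assumed AM memory including overhead, in MB
  appContext.setResource(capability)
  appContext
}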
    YARN-side steps:
   1. The ApplicationMaster's run method is executed.
   2. The relevant environment variables are set up.
   3. amClient is created and started.
   4. The AmIpFilter for the Spark UI is set before the Spark UI starts.
   5. In the startUserClass function, a dedicated thread (named "Driver") is started to run the user-submitted application, i.e. the Driver is launched. The Driver initializes the SparkContext.
   6. The AM waits for the SparkContext to finish initializing, for at most spark.yarn.applicationMaster.waitTries attempts (10 by default). If that limit is exceeded the program exits; otherwise the SparkContext is used to initialize yarnAllocator.
      How does the AM know the SparkContext has finished initializing?
      During step 5, initializing the SparkContext creates a YarnClusterScheduler; when the SparkContext finishes initializing, it calls the postStartHook method of YarnClusterScheduler, and that method notifies the ApplicationMaster that the SparkContext is ready (see the sketch after this list).

      Why wait for the SparkContext at all?
      Because each CoarseGrainedExecutorBackend needs to register with the CoarseGrainedSchedulerBackend after it starts.
   7. Once the SparkContext is initialized, the ApplicationMaster registers itself with the ResourceManager through amClient.
   8. Executors are allocated and launched. Before launching them, the AM first obtains numExecutors containers through yarnAllocator and then starts the executors inside those containers. If executor launches fail maxNumExecutorFailures times, the application fails: its status is marked FAILED and the SparkContext is shut down. Executors are actually launched through ExecutorRunnable, which internally starts a CoarseGrainedExecutorBackend; once started, the CoarseGrainedExecutorBackend registers with the SchedulerBackend. (How does the ResourceManager decide how many containers to allocate? The count is passed as a parameter at submit time; by default two executors are started.)
   9. Finally, tasks run inside the CoarseGrainedExecutorBackend, and their status is reported back to the CoarseGrainedScheduler over Akka until the job completes.
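The postStartHook handshake described in step 6 can be pictured with a simple latch-based sketch. This is only an illustration of the idea, not Spark's actual implementation; the object and method names are made up:

import java.util.concurrent.{CountDownLatch, TimeUnit}

// Illustration only: how the AM-side wait and the scheduler-side notification could be wired.
object SparkContextReadyHandshake {
  private val sparkContextReady = new CountDownLatch(1)

  // Called from something like YarnClusterScheduler.postStartHook()
  // once the SparkContext has finished initializing.
  def notifySparkContextInitialized(): Unit = sparkContextReady.countDown()

  // Called by the ApplicationMaster: wait up to `waitTries` times, one second each,
  // before giving up (mirrors the spark.yarn.applicationMaster.waitTries idea).
  def waitForSparkContext(waitTries: Int = 10): Boolean = {
    var tries = 0
    var ready = false
    while (!ready && tries < waitTries) {
      ready = sparkContextReady.await(1, TimeUnit.SECONDS)
      tries += 1
    }
    ready
  }
}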



  Client mode:

    Client-side steps:

   1. The launch function of the SparkSubmit class calls the job's main function directly (via reflection); in cluster mode it would call Client's main function instead.
   2. The application's main function always has a SparkContext and initializes it.
   3. During SparkContext initialization the following happens in order: the relevant configuration is applied; MapOutputTracker, BlockManagerMaster and BlockManager are registered; and the taskScheduler and dagScheduler are created, the last two being the most important part. When the taskScheduler is created, the Scheduler and SchedulerBackend are chosen according to the master we passed in. Since we chose yarn-client mode, the program picks YarnClientClusterScheduler and YarnClientSchedulerBackend, and initializes the YarnClientClusterScheduler with the YarnClientSchedulerBackend instance; both instances are obtained via reflection. YarnClientSchedulerBackend is a subclass of CoarseGrainedSchedulerBackend, and YarnClientClusterScheduler is a subclass of TaskSchedulerImpl that merely overrides TaskSchedulerImpl's getRackForHost method.
   4. After the taskScheduler is initialized, the dagScheduler is created, and then the taskScheduler is started via taskScheduler.start(), which in turn calls the SchedulerBackend's start method. While the SchedulerBackend starts, it initializes some parameters, wraps them in a ClientArguments instance, passes that into the Client class, and obtains the application ID via client.submitApplication(). (A minimal sketch of how user code enters this path follows below.)
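From the user's perspective, all of the above is triggered simply by constructing a SparkContext with a yarn-client master. A minimal sketch against the Spark 1.5-era API; the application name and the sample job are assumptions:

import org.apache.spark.{SparkConf, SparkContext}

object YarnClientModeDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("yarn-client-demo")    // assumed name
      .setMaster("yarn-client")          // selects YarnClientClusterScheduler + YarnClientSchedulerBackend
    val sc = new SparkContext(conf)      // scheduler creation and client.submitApplication() happen in here
    println(sc.parallelize(1 to 100).count())   // any job; executors are requested from YARN as described above
    sc.stop()
  }
}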

   
    YARN-side steps:
     
   1. The ApplicationMaster's run method is executed (runExecutorLauncher).
   2. There is no need to wait for the SparkContext to initialize (the YarnClientClusterScheduler is already up); the application is registered with the Spark YARN AM.
   3. Executors are allocated; the allocation logic is similar to yarn-cluster mode, so it is not repeated here.
   4. Tasks run inside the CoarseGrainedExecutorBackend, and their status is reported back to the CoarseGrainedScheduler over Akka until the job completes.
   5. While the job is running, YarnClientSchedulerBackend polls the job's status through the client once per second and prints the corresponding progress information; when the application's state becomes FINISHED, FAILED or KILLED, the wait loop exits (a simplified sketch of this monitoring thread follows below).
   6. Finally, a thread double-checks the application's state; once it is FINISHED, FAILED or KILLED, the job is considered done and the SparkContext is stopped. That concludes the whole flow.
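The one-second polling in steps 5 and 6 can be sketched as a small daemon thread around YarnClient.getApplicationReport. This is illustrative only, not the actual YarnClientSchedulerBackend code; the helper name and callback are made up:

import org.apache.hadoop.yarn.api.records.{ApplicationId, YarnApplicationState}
import org.apache.hadoop.yarn.client.api.YarnClient

// Illustration of the client-mode monitoring loop: poll once per second and
// invoke the callback once the application reaches a terminal state.
def startMonitorThread(yarnClient: YarnClient, appId: ApplicationId)
                      (onFinish: YarnApplicationState => Unit): Thread = {
  val t = new Thread(new Runnable {
    override def run(): Unit = {
      var state = yarnClient.getApplicationReport(appId).getYarnApplicationState
      while (state != YarnApplicationState.FINISHED &&
             state != YarnApplicationState.FAILED &&
             state != YarnApplicationState.KILLED) {
        Thread.sleep(1000)
        state = yarnClient.getApplicationReport(appId).getYarnApplicationState
      }
      onFinish(state)   // e.g. stop the SparkContext here, as step 6 describes
    }
  })
  t.setDaemon(true)
  t.start()
  t
}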
 
 

Source: "ITPUB Blog", link: http://blog.itpub.net/29754888/viewspace-1815323/. If reposting, please credit the source; otherwise legal liability may be pursued.

