Spark on YARN: The Startup Process in Cluster Mode and Client Mode

Cluster mode

Launching the ApplicationMaster

After the job is launched with the spark-submit command:

${SPARK_HOME}/bin/spark-submit --master yarn-cluster --class com.bigdata.WordCount --executor-memory 2G \
--num-executors 4 ${SPARK_HOME}/wordcount-1.0-SNAPSHOT.jar hdfs://spark-master:9000/temp/inputdir /temp/outputdir

what actually runs is the org.apache.spark.deploy.SparkSubmit class.

Inside its prepareSubmitEnvironment() function there is a snippet that selects the launch class for YARN cluster mode:

private[deploy] def prepareSubmitEnvironment(
      args: SparkSubmitArguments,
      conf: Option[HadoopConfiguration] = None)
      : (Seq[String], Seq[String], SparkConf, String) = {
    // ... (abridged)

    // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
    if (isYarnCluster) {
      childMainClass = YARN_CLUSTER_SUBMIT_CLASS
      // ... (abridged)
    }

    // ... (abridged)
}


object SparkSubmit extends CommandLineUtils with Logging {

  private val CLASS_NOT_FOUND_EXIT_STATUS = 101
  // Following constants are visible for testing.
  private[deploy] val YARN_CLUSTER_SUBMIT_CLASS =
    "org.apache.spark.deploy.yarn.YarnClusterApplication"
  private[deploy] val REST_CLUSTER_SUBMIT_CLASS = classOf[RestSubmissionClientApp].getName()
  private[deploy] val STANDALONE_CLUSTER_SUBMIT_CLASS = classOf[ClientApp].getName()
  private[deploy] val KUBERNETES_CLUSTER_SUBMIT_CLASS =
    "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"
}
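Downstream, SparkSubmit.runMain loads childMainClass reflectively, and because YarnClusterApplication implements the SparkApplication trait, its start() method is invoked with the rewritten arguments. Roughly, abridged and paraphrased from memory (the exact code varies across Spark versions):

val mainClass = Utils.classForName(childMainClass)
val app: SparkApplication =
  if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    // YarnClusterApplication takes this branch
    mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication]
  } else {
    // an ordinary main() class gets wrapped instead
    new JavaMainApplication(mainClass)
  }
app.start(childArgs.toArray, sparkConf)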

From this you can see that org.apache.spark.deploy.yarn.YarnClusterApplication is the class that gets launched. It is defined in the org.apache.spark.deploy.yarn.Client source file: Client submits the application to YARN, YARN picks a NodeManager and starts a container on it, that container launches org.apache.spark.deploy.yarn.ApplicationMaster through the AM launch context, and the ApplicationMaster finally runs your application code directly (in cluster mode the user class becomes the driver inside the AM process).
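For intuition, here is a minimal sketch of that submission handshake using the plain Hadoop YarnClient API. Spark's Client does the same thing with far more setup (local resources, environment variables, delegation tokens); the command string, application name, and resource sizes below are illustrative only:

import org.apache.hadoop.yarn.api.records.{ContainerLaunchContext, Resource}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import scala.collection.JavaConverters._

object YarnSubmitSketch {
  def main(args: Array[String]): Unit = {
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // Ask the ResourceManager for a new application id.
    val app = yarnClient.createApplication()
    val appContext = app.getApplicationSubmissionContext
    appContext.setApplicationName("wordcount-sketch")

    // The AM container command: for Spark this is what eventually runs
    // org.apache.spark.deploy.yarn.ApplicationMaster on some NodeManager.
    val amCommand =
      "{{JAVA_HOME}}/bin/java org.apache.spark.deploy.yarn.ApplicationMaster" +
        " 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"
    val amContainer = ContainerLaunchContext.newInstance(
      null, null, List(amCommand).asJava, null, null, null)
    appContext.setAMContainerSpec(amContainer)
    appContext.setResource(Resource.newInstance(1024, 1)) // 1 GB, 1 vcore

    // The RM schedules the AM container onto a NodeManager and starts it.
    val appId = yarnClient.submitApplication(appContext)
    println(s"Submitted $appId")
  }
}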

Launching the executors

After the ApplicationMaster starts, it calls runDriver(), which calls createAllocator(); as the allocator receives containers from YARN it creates a new ExecutorRunnable for each one, and ExecutorRunnable assembles the executor launch command in prepareCommand():

private def prepareCommand(): List[String] = {
    YarnSparkHadoopUtil.addOutOfMemoryErrorArgument(javaOpts)
    val commands = prefixEnv ++
      Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
      javaOpts ++
      Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
        "--driver-url", masterAddress,
        "--executor-id", executorId,
        "--hostname", hostname,
        "--cores", executorCores.toString,
        "--app-id", appId) ++
      userClassPath ++
      Seq(
        s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
        s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")

    // TODO: it would be nicer to just make sure there are no null commands here
    commands.map(s => if (s == null) "null" else s).toList
}
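On a NodeManager the assembled command expands to something like the following (all values illustrative; YARN substitutes the JAVA_HOME and <LOG_DIR> placeholders at launch time):

{{JAVA_HOME}}/bin/java -server -Xmx2048m \
  org.apache.spark.executor.CoarseGrainedExecutorBackend \
  --driver-url spark://CoarseGrainedScheduler@driver-host:43125 \
  --executor-id 1 \
  --hostname nodemanager-host \
  --cores 2 \
  --app-id application_1510000000000_0001 \
  1><LOG_DIR>/stdout 2><LOG_DIR>/stderr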

ExecutorRunnable then wraps this command in a ContainerLaunchContext and calls nmClient.startContainer(container.get, ctx), which sends an RPC request to the NodeManager that owns the container; the NodeManager executes the command, CoarseGrainedExecutorBackend comes up, and the real executor is now running. A sketch of that call follows.
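A minimal sketch of the launch call using the Hadoop NMClient API (ExecutorRunnable additionally ships local resources, environment, and security tokens in the same context; the helper name here is illustrative):

import org.apache.hadoop.yarn.api.records.{Container, ContainerLaunchContext}
import org.apache.hadoop.yarn.client.api.NMClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import scala.collection.JavaConverters._

object ExecutorLaunchSketch {
  // Launch an already-allocated container with a prepared command list,
  // the way ExecutorRunnable does for CoarseGrainedExecutorBackend.
  def launchExecutor(container: Container, commands: List[String]): Unit = {
    val nmClient = NMClient.createNMClient()
    nmClient.init(new YarnConfiguration())
    nmClient.start()

    val ctx = ContainerLaunchContext.newInstance(
      null,            // local resources (executor jars), omitted in this sketch
      null,            // environment variables, omitted
      commands.asJava, // the "java ... CoarseGrainedExecutorBackend" command
      null, null, null)

    // RPC to the NodeManager that owns `container`; the NM runs the command.
    nmClient.startContainer(container, ctx)
  }
}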

 

YARN client mode

1 The driver program is launched directly, in the same JVM as spark-submit.

2 When the driver creates its SparkContext, it discovers and loads

org.apache.spark.scheduler.cluster.YarnClusterManager

via SPI (a sketch of the mechanism follows).
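Here SPI means the JDK ServiceLoader: the spark-yarn module ships a META-INF/services entry naming YarnClusterManager, and SparkContext scans the registered ExternalClusterManager implementations for one whose canCreate accepts the master URL. A self-contained sketch of the mechanism, with an illustrative trait standing in for Spark's private ExternalClusterManager:

import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Illustrative stand-in for Spark's (private) ExternalClusterManager trait.
trait ClusterManagerPlugin {
  def canCreate(masterURL: String): Boolean
}

object SpiLookupSketch {
  def main(args: Array[String]): Unit = {
    // ServiceLoader reads META-INF/services/<fully.qualified.name> files on
    // the classpath and instantiates every implementation listed there.
    val candidates = ServiceLoader.load(classOf[ClusterManagerPlugin])
      .asScala.filter(_.canCreate("yarn"))
    candidates.headOption match {
      case Some(m) => println(s"Picked cluster manager: ${m.getClass.getName}")
      case None    => println("No registered manager handles this master URL")
    }
  }
}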

YarnClusterManager then calls createSchedulerBackend:

override def createSchedulerBackend(sc: SparkContext,
      masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend = {
    sc.deployMode match {
      case "cluster" =>
        new YarnClusterSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc)
      case "client" =>
        new YarnClientSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc)
      case _ =>
        throw new SparkException(s"Unknown deploy mode '${sc.deployMode}' for Yarn")
    }
  }

The resulting YarnClientSchedulerBackend in turn calls org.apache.spark.deploy.yarn.Client, which takes care of requesting and managing resources from YARN while the driver itself keeps running locally.
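For comparison with the cluster-mode command at the top, the same hypothetical WordCount job submitted in client mode (modern spark-submit spells the mode via --deploy-mode):

${SPARK_HOME}/bin/spark-submit --master yarn --deploy-mode client --class com.bigdata.WordCount \
  --executor-memory 2G --num-executors 4 ${SPARK_HOME}/wordcount-1.0-SNAPSHOT.jar \
  hdfs://spark-master:9000/temp/inputdir /temp/outputdir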
