Spark source code: task submission flow (4) - launching executors in containers

1.Overview

In the previous article of this series, spark源码-任务提交流程之ApplicationMaster, we saw that the ApplicationMaster (AM) does three things:

1. Creates and starts the driver thread, which runs the main method of the user class;

2. Registers the AM with the ResourceManager (RM);

3. Requests resources from the RM, selects usable containers according to the matching rules, and launches executors in the allocated containers.

This article analyzes how executors are launched inside those containers.

2.Entry point

This code is invoked while the AM is requesting resources from the RM.

The AM asks the RM for resources, the RM returns a list of containers, and the allocator calls this code to filter the containers and launch executors. See spark源码-任务提交流程之ApplicationMaster (the section on the AM requesting resources from the RM) for details.

private[yarn] class YarnAllocator(
    driverUrl: String,
    driverRef: RpcEndpointRef,
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    amClient: AMRMClient[ContainerRequest],
    appAttemptId: ApplicationAttemptId,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource],
    resolver: SparkRackResolver,
    clock: Clock = new SystemClock)
  extends Logging {
  
  def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
    // list of usable containers
    val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)

    // ... unrelated code omitted (containers are matched to requests by host, rack, then any host)

    // launch executors in the allocated containers
    runAllocatedContainers(containersToUse)

    // ... unrelated code omitted
  }
}
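
For context, handleAllocatedContainers is called from YarnAllocator.allocateResources, which wraps the AMRMClient heartbeat. A simplified sketch of that call site (abridged from memory; blacklist handling, logging, and completed-container processing are omitted, and details vary slightly across Spark versions):

// Simplified sketch of the call site inside YarnAllocator
def allocateResources(): Unit = synchronized {
  updateResourceRequests()                        // sync outstanding container requests with the RM
  val allocateResponse = amClient.allocate(0.1f)  // heartbeat to the RM; returns newly allocated containers
  val allocatedContainers = allocateResponse.getAllocatedContainers()
  if (allocatedContainers.size > 0) {
    // filter the containers and launch executors on the usable ones
    handleAllocatedContainers(allocatedContainers.asScala)
  }
}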

3.runAllocatedContainers

This method launches executors in the allocated containers.

It iterates over the usable containers and, for each one, submits a task to a launcher thread pool; that task starts the executor (a sketch of the launcher pool follows the code below).

private[yarn] class YarnAllocator(
    driverUrl: String,
    driverRef: RpcEndpointRef,
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    amClient: AMRMClient[ContainerRequest],
    appAttemptId: ApplicationAttemptId,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource],
    resolver: SparkRackResolver,
    clock: Clock = new SystemClock)
  extends Logging {
  
  private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = {
    // iterate over the usable containers
    for (container <- containersToUse) {
      executorIdCounter += 1
      val executorHostname = container.getNodeId.getHost
      val containerId = container.getId
      val executorId = executorIdCounter.toString
      assert(container.getResource.getMemory >= resource.getMemory)
      logInfo(s"Launching container $containerId on host $executorHostname " +
        s"for executor with ID $executorId")

      def updateInternalState(): Unit = synchronized {
        runningExecutors.add(executorId)
        numExecutorsStarting.decrementAndGet()
        executorIdToContainer(executorId) = container
        containerIdToExecutorId(container.getId) = executorId

        val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
          new HashSet[ContainerId])
        containerSet += containerId
        allocatedContainerToHostMap.put(containerId, executorHostname)
      }

      // only launch a new executor if the number of running executors is below the target
      if (runningExecutors.size() < targetNumExecutors) {
        // track the number of executors currently being started
        numExecutorsStarting.incrementAndGet()
        if (launchContainers) {
          // submit a task to the launcher thread pool; its run method launches the executor
          launcherPool.execute(new Runnable {
            override def run(): Unit = {
              try {
                // build an ExecutorRunnable and call its run method, which starts the executor in the container
                new ExecutorRunnable(
                  Some(container),
                  conf,
                  sparkConf,
                  driverUrl,
                  executorId,
                  executorHostname,
                  executorMemory,
                  executorCores,
                  appAttemptId.getApplicationId.toString,
                  securityMgr,
                  localResources
                ).run()
                // update internal bookkeeping state
                updateInternalState()
              } catch {
                case e: Throwable =>
                  numExecutorsStarting.decrementAndGet()
                  if (NonFatal(e)) {
                    logError(s"Failed to launch executor $executorId on container $containerId", e)
                    // Assigned container should be released immediately
                    // to avoid unnecessary resource occupation.
                    amClient.releaseAssignedContainer(containerId)
                  } else {
                    throw e
                  }
              }
            }
          })
        } else {
          // For test only
          updateInternalState()
        }
      } else {
        logInfo(("Skip launching executorRunnable as running executors count: %d " +
          "reached target executors count: %d.").format(
          runningExecutors.size, targetNumExecutors))
      }
    }
  }
}
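
The launcherPool used above is a daemon cached thread pool defined in YarnAllocator. A sketch of roughly how it is created (the config key spark.yarn.containerLauncherMaxThreads and its default of 25 are quoted from memory; the real code reads it through a typed config entry rather than a raw string key):

// Sketch of the launcher pool inside YarnAllocator
private val launcherPool = ThreadUtils.newDaemonCachedThreadPool(
  "ContainerLauncher", sparkConf.getInt("spark.yarn.containerLauncherMaxThreads", 25))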

4.ExecutorRunnable.run

private[yarn] class ExecutorRunnable(
    container: Option[Container],
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    masterAddress: String,
    executorId: String,
    hostname: String,
    executorMemory: Int,
    executorCores: Int,
    appId: String,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource]) extends Logging {

  var rpc: YarnRPC = YarnRPC.create(conf)
  var nmClient: NMClient = _

  def run(): Unit = {
    logDebug("Starting Executor Container")
    // create, initialize, and start the NodeManager client (NMClient) in one go
    nmClient = NMClient.createNMClient()
    nmClient.init(conf)
    nmClient.start()
    // start the container
    startContainer()
  }
}
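
NMClient follows the standard Hadoop service lifecycle (init, start, stop). A minimal standalone sketch of that lifecycle, independent of Spark, for readers unfamiliar with the YARN client API:

import org.apache.hadoop.yarn.client.api.NMClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Minimal sketch of the NMClient lifecycle (illustration only, not Spark code)
object NMClientLifecycleSketch {
  def main(args: Array[String]): Unit = {
    val yarnConf = new YarnConfiguration()
    val nmClient = NMClient.createNMClient()
    nmClient.init(yarnConf)   // load configuration
    nmClient.start()          // the client can now talk to NodeManagers
    // nmClient.startContainer(container, launchContext) would start a container on its NodeManager
    nmClient.stop()           // release client resources
  }
}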

4.1.startContainer: start the container

This method does three main things:

1. Prepares the environment for launching the executor in the container;

2. Assembles the command that launches the executor in the container;

3. Asks the NodeManager (NM) to start the container, which launches the executor.

private[yarn] class ExecutorRunnable(
    container: Option[Container],
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    masterAddress: String,
    executorId: String,
    hostname: String,
    executorMemory: Int,
    executorCores: Int,
    appId: String,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource]) extends Logging {

  def startContainer(): java.util.Map[String, ByteBuffer] = {
    // container launch context
    val ctx = Records.newRecord(classOf[ContainerLaunchContext])
      .asInstanceOf[ContainerLaunchContext]
    // prepare the environment
    val env = prepareEnvironment().asJava

    ctx.setLocalResources(localResources.asJava)
    ctx.setEnvironment(env)

    val credentials = UserGroupInformation.getCurrentUser().getCredentials()
    val dob = new DataOutputBuffer()
    credentials.writeTokenStorageToStream(dob)
    ctx.setTokens(ByteBuffer.wrap(dob.getData()))

    // assemble the launch command
    val commands = prepareCommand()

    ctx.setCommands(commands.asJava)
    ctx.setApplicationACLs(
      YarnSparkHadoopUtil.getApplicationAclsForYarn(securityMgr).asJava)

    // If external shuffle service is enabled, register with the Yarn shuffle service already
    // started on the NodeManager and, if authentication is enabled, provide it with our secret
    // key for fetching shuffle files later
    if (sparkConf.get(SHUFFLE_SERVICE_ENABLED)) {
      val secretString = securityMgr.getSecretKey()
      val secretBytes =
        if (secretString != null) {
          // This conversion must match how the YarnShuffleService decodes our secret
          JavaUtils.stringToBytes(secretString)
        } else {
          // Authentication is not enabled, so just provide dummy metadata
          ByteBuffer.allocate(0)
        }
      ctx.setServiceData(Collections.singletonMap("spark_shuffle", secretBytes))
    }

    // Send the start request to the ContainerManager
    try {
      // ask the NodeManager to start the container
      nmClient.startContainer(container.get, ctx)
    } catch {
      case ex: Exception =>
        throw new SparkException(s"Exception while starting container ${container.get.getId}" +
          s" on host $hostname", ex)
    }
  }
}
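
The spark_shuffle service data above only matters when the external shuffle service is enabled. A hedged sketch of the setup that makes this branch active (the yarn-site.xml keys and the YarnShuffleService class name follow the standard Spark-on-YARN documentation):

import org.apache.spark.SparkConf

object ShuffleServiceSetupSketch {
  // 1) Every NodeManager registers the auxiliary service in yarn-site.xml:
  //      yarn.nodemanager.aux-services                     = spark_shuffle
  //      yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
  // 2) The application enables the external shuffle service, which is what makes
  //    startContainer() call ctx.setServiceData("spark_shuffle", secretBytes):
  val sparkConf = new SparkConf().set("spark.shuffle.service.enabled", "true")
}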

4.1.1.prepareEnvironment

This method collects, into a HashMap, the environment variables whose names start with SPARK, the executor environment variables from sparkConf (spark.executorEnv.*), and the executor log URL variables.

private[yarn] class ExecutorRunnable(
    container: Option[Container],
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    masterAddress: String,
    executorId: String,
    hostname: String,
    executorMemory: Int,
    executorCores: Int,
    appId: String,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource]) extends Logging {

  private def prepareEnvironment(): HashMap[String, String] = {
    val env = new HashMap[String, String]()
    Client.populateClasspath(null, conf, sparkConf, env, sparkConf.get(EXECUTOR_CLASS_PATH))

    // determine the HTTP scheme from the YARN HTTP policy
    val yarnHttpPolicy = conf.get(
      YarnConfiguration.YARN_HTTP_POLICY_KEY,
      YarnConfiguration.YARN_HTTP_POLICY_DEFAULT
    )
    val httpScheme = if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://"

    // copy environment variables whose names start with SPARK into the env map
    System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
      .foreach { case (k, v) => env(k) = v }

    // copy the executor environment variables (spark.executorEnv.*) from sparkConf into the env map
    sparkConf.getExecutorEnv.foreach { case (key, value) =>
      if (key == Environment.CLASSPATH.name()) {
        // If the key of env variable is CLASSPATH, we assume it is a path and append it.
        // This is kept for backward compatibility and consistency with hadoop
        YarnSparkHadoopUtil.addPathToEnvironment(env, key, value)
      } else {
        // For other env variables, simply overwrite the value.
        env(key) = value
      }
    }

    // set the executor log URLs (stdout/stderr on the NodeManager web UI) into the env map
    container.foreach { c =>
      sys.env.get("SPARK_USER").foreach { user =>
        val containerId = ConverterUtils.toString(c.getId)
        val address = c.getNodeHttpAddress
        val baseUrl = s"$httpScheme$address/node/containerlogs/$containerId/$user"

        env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
        env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
      }
    }

    env
  }
}
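
sparkConf.getExecutorEnv above returns every spark.executorEnv.* entry, so executor environment variables can be injected from the application configuration. A small sketch (MY_ENV_VAR is a hypothetical name used only for illustration):

import org.apache.spark.SparkConf

// Sketch: how executor environment variables reach prepareEnvironment()
object ExecutorEnvSketch {
  val sparkConf = new SparkConf()
    .set("spark.executorEnv.MY_ENV_VAR", "some-value")    // surfaces via sparkConf.getExecutorEnv
    .set("spark.executorEnv.CLASSPATH", "/extra/jars/*")  // CLASSPATH entries are appended, not overwritten
  // prepareEnvironment() copies these into the ContainerLaunchContext environment,
  // so the executor JVM sees MY_ENV_VAR=some-value.
}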

4.1.2.prepareCommand: assemble the launch command

This method assembles the command that the container uses to launch the executor.

The command covers the executor memory setting, extra Java options and library paths, the container temp directory, the log directory, the backend class, and the driver and executor details.

As the arguments show, the backend that executes tasks after the executor starts is org.apache.spark.executor.CoarseGrainedExecutorBackend, launched as a separate process via $JAVA_HOME/bin/java (an example of the rendered command is shown after the code below).

private[yarn] class ExecutorRunnable(
    container: Option[Container],
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    masterAddress: String,
    executorId: String,
    hostname: String,
    executorMemory: Int,
    executorCores: Int,
    appId: String,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource]) extends Logging {

  private def prepareCommand(): List[String] = {
    // buffer of JVM options
    val javaOpts = ListBuffer[String]()

    // set the executor heap size (-Xmx)
    val executorMemoryString = executorMemory + "m"
    javaOpts += "-Xmx" + executorMemoryString

    // resolve spark.executor.extraJavaOptions, substitute the app/executor IDs, and add the options
    sparkConf.get(EXECUTOR_JAVA_OPTIONS).foreach { opts =>
      val subsOpt = Utils.substituteAppNExecIds(opts, appId, executorId)
      javaOpts ++= Utils.splitCommandString(subsOpt).map(YarnSparkHadoopUtil.escapeForShell)
    }

    // build the command prefix from spark.executor.extraLibraryPath, if set
    val prefixEnv = sparkConf.get(EXECUTOR_LIBRARY_PATH).map { libPath =>
      Client.createLibraryPathPrefix(libPath, sparkConf)
    }

    // set the container temp directory
    javaOpts += "-Djava.io.tmpdir=" +
      new Path(Environment.PWD.$$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)

    // Certain configs need to be passed here because they are needed before the Executor
    // registers with the Scheduler and transfers the spark configs. Since the Executor backend
    // uses RPC to connect to the scheduler, the RPC settings are needed as well as the
    // authentication settings.
    sparkConf.getAll
      .filter { case (k, v) => SparkConf.isExecutorStartupConf(k) }
      .foreach { case (k, v) => javaOpts += YarnSparkHadoopUtil.escapeForShell(s"-D$k=$v") }

    

    // pass the container log directory to the executor
    javaOpts += ("-Dspark.yarn.app.container.log.dir=" + ApplicationConstants.LOG_DIR_EXPANSION_VAR)

    // build the user class path entries
    val userClassPath = Client.getUserClasspath(sparkConf).flatMap { uri =>
      val absPath =
        if (new File(uri.getPath()).isAbsolute()) {
          Client.getClusterPath(sparkConf, uri.getPath())
        } else {
          Client.buildPath(Environment.PWD.$(), uri.getPath())
        }
      Seq("--user-class-path", "file:" + absPath)
    }.toSeq

    YarnSparkHadoopUtil.addOutOfMemoryErrorArgument(javaOpts)
      
    // assemble the final command
    val commands = prefixEnv ++
      Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
      javaOpts ++
      Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
        "--driver-url", masterAddress,
        "--executor-id", executorId,
        "--hostname", hostname,
        "--cores", executorCores.toString,
        "--app-id", appId) ++
      userClassPath ++
      Seq(
        s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
        s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")

    // TODO: it would be nicer to just make sure there are no null commands here
    commands.map(s => if (s == null) "null" else s).toList
  }
  
}
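
The assembled command handed to the NodeManager looks roughly like the following ({{VAR}} and <LOG_DIR> are placeholders that YARN expands at launch time; the host, port, memory size, IDs, and jar name are made-up values, and the exact set of -D options depends on the application configuration):

{{JAVA_HOME}}/bin/java -server -Xmx4096m \
  -Djava.io.tmpdir={{PWD}}/tmp \
  -Dspark.driver.port=43303 \
  -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
  org.apache.spark.executor.CoarseGrainedExecutorBackend \
  --driver-url spark://CoarseGrainedScheduler@192.168.1.10:43303 \
  --executor-id 1 \
  --hostname node-01 \
  --cores 2 \
  --app-id application_1600000000000_0001 \
  --user-class-path file:$PWD/__app__.jar \
  1><LOG_DIR>/stdout 2><LOG_DIR>/stderr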

4.1.3.nmClient.startContainer: start the container

The actual container start goes through Hadoop's NMClientImpl, which talks to the NodeManager over the ContainerManagementProtocol:

public class NMClientImpl extends NMClient {
    protected ConcurrentMap<ContainerId, NMClientImpl.StartedContainer> startedContainers = new ConcurrentHashMap();
    
    public Map<String, ByteBuffer> startContainer(Container container, ContainerLaunchContext containerLaunchContext) throws YarnException, IOException {
        // record the container that is about to be started
        NMClientImpl.StartedContainer startingContainer = new NMClientImpl.StartedContainer(container.getId(), container.getNodeId());
        
        synchronized(startingContainer) {
            // track this container in the startedContainers map
            this.addStartingContainer(startingContainer);
            ContainerManagementProtocolProxyData proxy = null;

            Map allServiceResponse;
            try {
                // get a ContainerManagementProtocol proxy for the target NodeManager; the proxy performs the actual start
                proxy = this.cmProxy.getProxy(container.getNodeId().toString(), container.getId());
                // wrap the launch context and container token into a StartContainersRequest
                StartContainerRequest scRequest = StartContainerRequest.newInstance(containerLaunchContext, container.getContainerToken());
                List<StartContainerRequest> list = new ArrayList();
                list.add(scRequest);
                StartContainersRequest allRequests = StartContainersRequest.newInstance(list);
                // ask the NodeManager to start the container, which runs the executor launch command
                StartContainersResponse response = proxy.getContainerManagementProtocol().startContainers(allRequests);
                if (response.getFailedRequests() != null && response.getFailedRequests().containsKey(container.getId())) {
                    Throwable t = ((SerializedException)response.getFailedRequests().get(container.getId())).deSerialize();
                    this.parseAndThrowException(t);
                }

                allServiceResponse = response.getAllServicesMetaData();
                startingContainer.state = ContainerState.RUNNING;
            } catch (YarnException var19) {
                startingContainer.state = ContainerState.COMPLETE;
                this.startedContainers.remove(startingContainer.containerId);
                throw var19;
            } catch (IOException var20) {
                startingContainer.state = ContainerState.COMPLETE;
                this.startedContainers.remove(startingContainer.containerId);
                throw var20;
            } catch (Throwable var21) {
                startingContainer.state = ContainerState.COMPLETE;
                this.startedContainers.remove(startingContainer.containerId);
                throw RPCUtil.getRemoteException(var21);
            } finally {
                if (proxy != null) {
                    this.cmProxy.mayBeCloseProxy(proxy);
                }

            }

            return allServiceResponse;
        }
    }
}
4.1.3.1.The StartedContainer class

This is a static nested class of NMClientImpl;

It holds the identity and state of a container started through the client.

protected static class StartedContainer {
        private ContainerId containerId;
        private NodeId nodeId;
        private ContainerState state;

        public StartedContainer(ContainerId containerId, NodeId nodeId) {
            this.containerId = containerId;
            this.nodeId = nodeId;
            this.state = ContainerState.NEW;
        }

        public ContainerId getContainerId() {
            return this.containerId;
        }

        public NodeId getNodeId() {
            return this.nodeId;
        }
    }

5.Summary

The AM requests resources from the RM; once the RM returns them, the AM's allocator (YarnAllocator) launches executors on the allocated containers.

During executor launch:

The environment is prepared by collecting the environment variables whose names start with SPARK, the spark.executorEnv.* settings from sparkConf, and the log URL variables into a HashMap;

The launch command is built from the executor memory setting, extra Java options and library paths, the container temp directory, the log directory, and the driver and executor details; it starts an org.apache.spark.executor.CoarseGrainedExecutorBackend process via $JAVA_HOME/bin/java, and that process later executes the tasks;

Finally, the NodeManager starts the container and thereby the executor.

6.References

spark源码-任务提交流程之ApplicationMaster

Spark源码——Spark on YARN Executor执行Task的过程
