Learning Spark, Part 4: The SparkContext Execution Process

This article walks through what happens in standalone mode when an application starts up: the creation of the SparkConf and the SparkContext.
In an application, you normally create a SparkConf object first, set the relevant parameters on it, and then use it to initialize a SparkContext.

1. SparkConf

SparkConf manages the configuration of a Spark application; every parameter is represented as a key-value pair.
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
  import SparkConf._
  /** Create a SparkConf that loads defaults from system properties and the classpath */
  def this() = this(true)
  private val settings = new ConcurrentHashMap[String, String]()
  if (loadDefaults) {
    // Load any spark.* system properties
    for ((key, value) <- Utils.getSystemProperties if key.startsWith("spark.")) {
      set(key, value)
    }
  }
If loadDefaults is true, the constructor reads the Spark properties set by SparkSubmit through System.getProperties (only keys beginning with "spark." are picked up).
All parameters are stored as key-value pairs in a ConcurrentHashMap.
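For reference, a minimal driver-side sketch of how these key-value pairs get populated; the application name, master URL and memory value below are placeholders:
  import org.apache.spark.SparkConf

  object ConfExample {
    def main(args: Array[String]): Unit = {
      // Each setter ends up as a key-value pair in SparkConf's internal map,
      // e.g. "spark.app.name" -> "conf-example".
      val conf = new SparkConf()                    // loadDefaults = true: picks up spark.* system properties
        .setAppName("conf-example")                 // placeholder application name
        .setMaster("spark://master-host:7077")      // placeholder standalone master URL
        .set("spark.executor.memory", "1g")

      println(conf.get("spark.master"))             // spark://master-host:7077
      println(conf.get("spark.executor.memory"))    // 1g

      // The conf is then used to construct the SparkContext:
      // val sc = new SparkContext(conf)
    }
  }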

2. SparkContext

2.1 Creating SparkEnv

  private[spark] def createSparkEnv(
      conf: SparkConf,
      isLocal: Boolean,
      listenerBus: LiveListenerBus): SparkEnv = {
    SparkEnv.createDriverEnv(conf, isLocal, listenerBus)
  }
  private[spark] val env = createSparkEnv(conf, isLocal, listenerBus)
  SparkEnv.set(env)
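SparkEnv.createDriverEnv builds the driver-side runtime environment (roughly: the ActorSystem, serializer, BlockManager, MapOutputTracker and related components), and SparkEnv.set registers it as the process-wide environment.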

2.2 Creating the TaskScheduler

  private[spark] var (schedulerBackend, taskScheduler) =
    SparkContext.createTaskScheduler(this, master)

2.2.1 SparkContext.createTaskScheduler

In standalone mode, the following branch of the match is executed:
  private def createTaskScheduler(
      sc: SparkContext,
      master: String): (SchedulerBackend, TaskScheduler) = {
      ...
    master match {
      ...
      case SPARK_REGEX(sparkUrl) =>
        val scheduler = new TaskSchedulerImpl(sc)
        val masterUrls = sparkUrl.split(",").map("spark://" + _)
        val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
        scheduler.initialize(backend)
        (backend, scheduler)
      ...
  }
(1) Create a TaskSchedulerImpl object; this class extends the TaskScheduler trait.
(2) Create a SparkDeploySchedulerBackend object. Its inheritance hierarchy (the class diagram from the original post is not reproduced here): SparkDeploySchedulerBackend extends CoarseGrainedSchedulerBackend, which in turn implements the SchedulerBackend trait; this is why super.start() in section 2.4.1 resolves to CoarseGrainedSchedulerBackend.start.
(3) Initialize the TaskSchedulerImpl, setting its backend to the SparkDeploySchedulerBackend.
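The SPARK_REGEX branch above accepts a comma-separated list of standalone masters. A minimal, self-contained sketch of that parsing logic (the regex mirrors the SPARK_REGEX defined in SparkContext; the host names are placeholders):
  object MasterUrlDemo {
    // Same shape as SparkContext's SPARK_REGEX: captures everything after "spark://"
    val SparkRegex = """spark://(.*)""".r

    def main(args: Array[String]): Unit = {
      "spark://host1:7077,host2:7077" match {
        case SparkRegex(sparkUrl) =>
          // Identical to the split/map in createTaskScheduler above
          val masterUrls = sparkUrl.split(",").map("spark://" + _)
          println(masterUrls.mkString(", "))   // spark://host1:7077, spark://host2:7077
        case other =>
          println(s"not a standalone master URL: $other")
      }
    }
  }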

2.3 Creating the DAGScheduler

  @volatile private[spark] var dagScheduler: DAGScheduler = _
  try {
    dagScheduler = new DAGScheduler(this)
  } catch {
    ...
  }
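The DAGScheduler constructor takes the SparkContext and, through it, sets itself as the DAGScheduler reference on the TaskScheduler created above; this is why the DAGScheduler must exist before taskScheduler.start() is called (see the comment quoted in section 2.4).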

2.4 Starting the TaskScheduler

  // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
  // constructor
  taskScheduler.start()
The method actually invoked is TaskSchedulerImpl.start.
The call flow that starts from this method (the flow diagram from the original post is not reproduced here) involves interactions among several actors:
(1) DriverActor;
(2) ClientActor;
(3) Master;
(4) Worker;
(5) CoarseGrainedExecutorBackend.
All of these interactions are initiated by the ClientActor, which is created by AppClient.
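In outline (for the Spark 1.x standalone deployment described here), the flow is roughly: the ClientActor sends a RegisterApplication message to the Master; the Master registers the application, replies with RegisteredApplication, and asks Workers to start executors via LaunchExecutor; each Worker launches a CoarseGrainedExecutorBackend process, which in turn registers itself with the DriverActor using RegisterExecutor (see section 2.6).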

2.4.1 SparkDeploySchedulerBackend.start

  override def start() {
    super.start()
    // The endpoint for executors to talk to us
    val driverUrl = AkkaUtils.address(
      AkkaUtils.protocol(actorSystem),
      SparkEnv.driverActorSystemName,
      conf.get("spark.driver.host"),
      conf.get("spark.driver.port"),
      CoarseGrainedSchedulerBackend.ACTOR_NAME)
    val args = Seq(
      "--driver-url", driverUrl,
      "--executor-id", "{{EXECUTOR_ID}}",
      "--hostname", "{{HOSTNAME}}",
      "--cores", "{{CORES}}",
      "--app-id", "{{APP_ID}}",
      "--worker-url", "{{WORKER_URL}}")
    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
      .map(Utils.splitCommandString).getOrElse(Seq.empty)
    val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
      .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
    val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
      .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
    // When testing, expose the parent class path to the child. This is processed by
    // compute-classpath.{cmd,sh} and makes all needed jars available to child processes
    // when the assembly is built with the "*-provided" profiles enabled.
    val testingClassPath =
      if (sys.props.contains("spark.testing")) {
        sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
      } else {
        Nil
      }
    // Start executors with a few necessary configs for registering with the scheduler
    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
    val javaOpts = sparkJavaOpts ++ extraJavaOpts
    val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
      args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
    val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
    val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
      appUIAddress, sc.eventLogDir, sc.eventLogCodec)
    client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
    client.start()
    waitForRegistration()
  }
The method's responsibilities:
(1) Call the superclass start method, i.e. CoarseGrainedSchedulerBackend.start, which creates the DriverActor;
(2) Assemble the parameters needed to launch executors and build an ApplicationDescription object;
(3) Create an AppClient object and start it.
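The three spark.executor.extra* options read above come straight from the application's SparkConf. A minimal sketch of how they might be set (the values and paths are placeholders, not recommendations):
  import org.apache.spark.SparkConf

  object ExecutorOptsExample {
    // Placeholder values; the keys are exactly the ones read in
    // SparkDeploySchedulerBackend.start above.
    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC -Dmy.flag=true")
      .set("spark.executor.extraClassPath", "/opt/libs/extra.jar")
      .set("spark.executor.extraLibraryPath", "/opt/native")
  }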

2.4.2 AppClient.start

  def start() {
    // Just launch an actor; it will call back into the listener.
    actor = actorSystem.actorOf(Props(new ClientActor))
  }
This creates the ClientActor.
In its preStart method, the ClientActor initiates the application registration flow with the Master.
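This is the usual Akka pattern of sending the first message from preStart. A minimal, self-contained sketch of the pattern (DemoClientActor, Registration and the master path are made up for illustration; the real ClientActor sends Spark's RegisterApplication message to the configured masters):
  import akka.actor.{Actor, ActorSystem, Props}

  // Hypothetical messages standing in for Spark's RegisterApplication / RegisteredApplication.
  case class Registration(appName: String)
  case object Registered

  class DemoClientActor(masterPath: String, appName: String) extends Actor {
    // Like Spark's ClientActor: kick off registration as soon as the actor starts.
    override def preStart(): Unit = {
      context.actorSelection(masterPath) ! Registration(appName)
    }

    def receive = {
      case Registered => println("registered with master")
    }
  }

  object DemoClient {
    def main(args: Array[String]): Unit = {
      val system = ActorSystem("demo")
      // No real master is running here, so the Registration goes to dead letters;
      // the point is only the preStart registration pattern used by ClientActor.
      system.actorOf(Props(new DemoClientActor("akka://demo/user/master", "demo-app")))
    }
  }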

2.5 Initializing the BlockManager on the Driver

  // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
  // constructor
  taskScheduler.start()
  val applicationId: String = taskScheduler.applicationId()
  conf.set("spark.app.id", applicationId)
  env.blockManager.initialize(applicationId)
(1) taskScheduler.start(): start the TaskScheduler, as described in section 2.4;
(2) Obtain the application id, which is assigned by the Master;
(3) Initialize the BlockManager.
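In standalone mode the Master typically assigns ids of the form app-yyyyMMddHHmmss-NNNN (for example app-20150316120000-0000); storing this value under spark.app.id is what later lets executors look it up (see section 2.6).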

2.6 Initializing the BlockManager on the Executor

When the DriverActor receives the RegisterExecutor message sent by a CoarseGrainedExecutorBackend, it normally replies with a RegisteredExecutor message, which the CoarseGrainedExecutorBackend then handles:
  override def receiveWithLogging = {
    case RegisteredExecutor =>
      logInfo("Successfully registered with driver")
      val (hostname, _) = Utils.parseHostPort(hostPort)
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
This creates the Executor object, whose primary constructor kicks off the BlockManager initialization:
  if (!isLocal) {
    env.metricsSystem.registerSource(executorSource)
    env.blockManager.initialize(conf.getAppId)
  }
The application id in conf is passed from the Master to the Worker and is set on the SparkConf by the CoarseGrainedExecutorBackend singleton object.
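conf.getAppId essentially returns the value stored under spark.app.id, so the executor-side BlockManager is initialized with the same application id that the driver obtained from the Master in section 2.5.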