Initializing a SparkContext is the prerequisite for submitting and running a Driver application. Here we walk through the initialization process in local mode, using
val conf = new SparkConf().setAppName("mytest").setMaster("local[2]")
val sc = new SparkContext(conf)
as the running example, stepping through it in debug mode.
I. Overview of SparkConf
SparkContext takes a SparkConf to initialize itself; SparkConf maintains Spark's configuration properties. The official documentation describes it as:
Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
A brief look at the SparkConf source:
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {

  import SparkConf._

  /** Create a SparkConf that loads defaults from system properties and the classpath */
  def this() = this(true)

  private val settings = new ConcurrentHashMap[String, String]()

  if (loadDefaults) {
    // Load any spark.* system properties
    for ((key, value) <- Utils.getSystemProperties if key.startsWith("spark.")) {
      set(key, value)
    }
  }

  /** Set a configuration variable. */
  def set(key: String, value: String): SparkConf = {
    if (key == null) {
      throw new NullPointerException("null key")
    }
    if (value == null) {
      throw new NullPointerException("null value for " + key)
    }
    logDeprecationWarning(key)
    settings.put(key, value)
    this
  }

  /**
   * The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
   * run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
   */
  def setMaster(master: String): SparkConf = {
    set("spark.master", master)
  }

  /** Set a name for your application. Shown in the Spark web UI. */
  def setAppName(name: String): SparkConf = {
    set("spark.app.name", name)
  }

  // ... omitted
}
Internally, SparkConf uses a ConcurrentHashMap to hold all configuration entries. Because every setter returns this (the SparkConf object itself), setters can be chained, as in:
new SparkConf().setAppName("mytest").setMaster("local[2]")
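The fluent-setter pattern above can be sketched in a few lines. The MiniConf class below is a hypothetical, stripped-down stand-in (not Spark's actual SparkConf): it shows how storing entries in a ConcurrentHashMap and returning `this` from each setter enables the chained style.

```scala
import java.util.concurrent.ConcurrentHashMap

// Minimal sketch of SparkConf's fluent-setter pattern (hypothetical class, not Spark's).
class MiniConf {
  private val settings = new ConcurrentHashMap[String, String]()

  def set(key: String, value: String): MiniConf = {
    require(key != null, "null key")
    require(value != null, s"null value for $key")
    settings.put(key, value)
    this // returning `this` is what makes chaining possible
  }

  def setMaster(master: String): MiniConf = set("spark.master", master)
  def setAppName(name: String): MiniConf = set("spark.app.name", name)
  def get(key: String): Option[String] = Option(settings.get(key))
}

// Chained usage, mirroring the example from the text.
val conf = new MiniConf().setAppName("mytest").setMaster("local[2]")
println(conf.get("spark.app.name")) // Some(mytest)
println(conf.get("spark.master"))   // Some(local[2])
```

Each setter is just a thin wrapper over set with a well-known key ("spark.master", "spark.app.name"), which is exactly how the real setMaster and setAppName are written.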
II. Initialization of SparkContext
SparkContext initialization mainly involves the following steps:
1) Create the JobProgressListener
2) Create the SparkEnv
3) Create
1. Copy the SparkConf configuration, then validate it and add new configuration entries
SparkContext's primary constructor takes a SparkConf:
class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {

  // The call site where this SparkContext was constructed.
  private val creationSite: CallSite = Utils.getCallSite()

  // If true, log warnings instead of throwing exceptions when multiple SparkContexts are active
  private val allowMultipleContexts: Boolean =
    config.getBoolean("spark.driver.allowMultipleContexts", false)

  // In order to prevent multiple SparkContexts from being active at the same time, mark this
  // context as having started construction.
  // NOTE: this must be placed at the beginning of the SparkContext constructor.
  SparkContext.markPartiallyConstructed(this, allowMultipleContexts)

  // ... omitted
}
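The markPartiallyConstructed guard can be illustrated with a small sketch. ContextGuard below is a hypothetical stand-in for the companion-object bookkeeping (the real logic lives in Spark's SparkContext companion object and differs in detail): under a lock it tracks both a context that has started construction and the currently active one, and refuses a second context unless multiple contexts are explicitly allowed.

```scala
// Hypothetical sketch of the "at most one active context per JVM" guard.
object ContextGuard {
  private val lock = new Object
  private var contextBeingConstructed: Option[AnyRef] = None
  private var activeContext: Option[AnyRef] = None

  // Called at the very start of the constructor, before any real initialization.
  def markPartiallyConstructed(ctx: AnyRef, allowMultiple: Boolean): Unit =
    lock.synchronized {
      if (!allowMultiple && (contextBeingConstructed.nonEmpty || activeContext.nonEmpty)) {
        throw new IllegalStateException(
          "Only one SparkContext may be running in this JVM")
      }
      contextBeingConstructed = Some(ctx)
    }

  // Called once construction finishes successfully.
  def setActiveContext(ctx: AnyRef): Unit = lock.synchronized {
    contextBeingConstructed = None
    activeContext = Some(ctx)
  }
}
```

Marking the context before any other work is done is what closes the race window: even a half-constructed SparkContext is enough to make a concurrent constructor fail fast.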
The getCallSite method returns a CallSite object, which records the user class closest to the top of the thread's stack and the Scala or Spark core class closest to the bottom. By default, SparkContext allows only one active instance.
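The idea behind getCallSite can be sketched as follows. This is a simplified, hypothetical version (sketchCallSite is not Spark's Utils.getCallSite, whose frame-filtering rules are more involved): split the current stack trace into the internal frames at the top and the user frames below them, and combine the last internal frame with the first user frame.

```scala
// Hypothetical, simplified sketch of what a call-site capture does.
case class CallSite(shortForm: String, longForm: String)

def sketchCallSite(internalPrefix: String = "org.apache.spark."): CallSite = {
  // drop(1) skips the Thread.getStackTrace frame itself.
  val frames = Thread.currentThread.getStackTrace.drop(1)
  // span splits at the first frame that is NOT an internal (framework) class.
  val (internal, user) = frames.span(f => f.getClassName.startsWith(internalPrefix))
  val lastInternal = internal.lastOption
    .map(f => s"${f.getClassName}.${f.getMethodName}")
    .getOrElse("(driver)")
  val firstUser = user.headOption
    .map(f => s"${f.getFileName}:${f.getLineNumber}")
    .getOrElse("(unknown)")
  CallSite(
    shortForm = s"$lastInternal at $firstUser", // e.g. shown in the web UI
    longForm  = frames.take(10).mkString("\n"))
}
```

The real CallSite's shortForm is what the web UI shows for a stage (for example "textFile at MyApp.scala:12"), which is why the frame closest to user code matters.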