Initializing a SparkContext is the prerequisite for submitting and running a Driver application. Here we walk through the initialization process in local mode, using
val conf = new SparkConf().setAppName("mytest").setMaster("local[2]")
val sc = new SparkContext(conf)
as the running example, stepping through it in debug mode.
I. Overview of SparkConf
SparkContext takes a SparkConf to initialize itself; SparkConf maintains Spark's configuration properties. The official documentation describes it as:
Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
A brief look at the SparkConf source:
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {

  import SparkConf._

  /** Create a SparkConf that loads defaults from system properties and the classpath */
  def this() = this(true)

  private val settings = new ConcurrentHashMap[String, String]()

  if (loadDefaults) {
    // Load any spark.* system properties
    for ((key, value) <- Utils.getSystemProperties if key.startsWith("spark.")) {
      set(key, value)
    }
  }

  /** Set a configuration variable. */
  def set(key: String, value: String): SparkConf = {
    if (key == null) {
      throw new NullPointerException("null key")
    }
    if (value == null) {
      throw new NullPointerException("null value for " + key)
    }
    logDeprecationWarning(key)
    settings.put(key, value)
    this
  }

  /**
   * The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
   * run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
   */
  def setMaster(master: String): SparkConf = {
    set("spark.master", master)
  }

  /** Set a name for your application. Shown in the Spark web UI. */
  def setAppName(name: String): SparkConf = {
    set("spark.app.name", name)
  }

  // ... omitted
}
Internally, SparkConf uses a ConcurrentHashMap to hold all configuration entries. Because every setter returns this (the SparkConf object itself), setters can be chained, as in:
new SparkConf().setAppName("mytest").setMaster("local[2]")
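The fluent-setter pattern above can be sketched in a few lines. The MiniConf class below is a hypothetical, stripped-down stand-in (not Spark's actual SparkConf): it shows how storing entries in a ConcurrentHashMap and returning `this` from each setter enables the chained style.

```scala
import java.util.concurrent.ConcurrentHashMap

// Minimal sketch of SparkConf's fluent-setter pattern (hypothetical class, not Spark's).
class MiniConf {
  private val settings = new ConcurrentHashMap[String, String]()

  def set(key: String, value: String): MiniConf = {
    require(key != null, "null key")
    require(value != null, s"null value for $key")
    settings.put(key, value)
    this // returning `this` is what makes chaining possible
  }

  def setMaster(master: String): MiniConf = set("spark.master", master)
  def setAppName(name: String): MiniConf = set("spark.app.name", name)
  def get(key: String): Option[String] = Option(settings.get(key))
}

// Chained usage, mirroring the example from the text.
val conf = new MiniConf().setAppName("mytest").setMaster("local[2]")
println(conf.get("spark.app.name")) // Some(mytest)
println(conf.get("spark.master"))   // Some(local[2])
```

Each setter is just a thin wrapper over set with a well-known key ("spark.master", "spark.app.name"), which is exactly how the real setMaster and setAppName are written.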
II. Initialization of SparkContext
SparkContext initialization mainly involves the following steps:
1) Create the JobProgressListener
2) Create the SparkEnv
3) Create
1. Copy the SparkConf configuration, then validate it and add new configuration entries
SparkContext's primary constructor takes a SparkConf:
class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {

  // The call site where this SparkContext was constructed.
  private val creationSite: CallSite = Utils.getCallSite()

  // If true, log warnings instead of throwing exceptions when multiple SparkContexts are active
  private val allowMultipleContexts: Boolean =
    config.getBoolean("spark.driver.allowMultipleContexts", false)

  // In order to prevent multiple SparkContexts from being active at the same time, mark this
  // context as having started construction.
  // NOTE: this must be placed at the beginning of the SparkContext constructor.
  SparkContext.markPartiallyConstructed(this, allowMultipleContexts)

  // ... omitted
}
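The markPartiallyConstructed guard can be illustrated with a small sketch. ContextGuard below is a hypothetical stand-in for the companion-object bookkeeping (the real logic lives in Spark's SparkContext companion object and differs in detail): under a lock it tracks both a context that has started construction and the currently active one, and refuses a second context unless multiple contexts are explicitly allowed.

```scala
// Hypothetical sketch of the "at most one active context per JVM" guard.
object ContextGuard {
  private val lock = new Object
  private var contextBeingConstructed: Option[AnyRef] = None
  private var activeContext: Option[AnyRef] = None

  // Called at the very start of the constructor, before any real initialization.
  def markPartiallyConstructed(ctx: AnyRef, allowMultiple: Boolean): Unit =
    lock.synchronized {
      if (!allowMultiple && (contextBeingConstructed.nonEmpty || activeContext.nonEmpty)) {
        throw new IllegalStateException(
          "Only one SparkContext may be running in this JVM")
      }
      contextBeingConstructed = Some(ctx)
    }

  // Called once construction finishes successfully.
  def setActiveContext(ctx: AnyRef): Unit = lock.synchronized {
    contextBeingConstructed = None
    activeContext = Some(ctx)
  }
}
```

Marking the context before any other work is done is what closes the race window: even a half-constructed SparkContext is enough to make a concurrent constructor fail fast.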
The getCallSite method returns a CallSite object, which records the user class closest to the top of the thread's stack and the Scala or Spark core class closest to the bottom. By default, SparkContext allows only one active instance.
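The idea behind getCallSite can be sketched as follows. This is a simplified, hypothetical version (sketchCallSite is not Spark's Utils.getCallSite, whose frame-filtering rules are more involved): split the current stack trace into the internal frames at the top and the user frames below them, and combine the last internal frame with the first user frame.

```scala
// Hypothetical, simplified sketch of what a call-site capture does.
case class CallSite(shortForm: String, longForm: String)

def sketchCallSite(internalPrefix: String = "org.apache.spark."): CallSite = {
  // drop(1) skips the Thread.getStackTrace frame itself.
  val frames = Thread.currentThread.getStackTrace.drop(1)
  // span splits at the first frame that is NOT an internal (framework) class.
  val (internal, user) = frames.span(f => f.getClassName.startsWith(internalPrefix))
  val lastInternal = internal.lastOption
    .map(f => s"${f.getClassName}.${f.getMethodName}")
    .getOrElse("(driver)")
  val firstUser = user.headOption
    .map(f => s"${f.getFileName}:${f.getLineNumber}")
    .getOrElse("(unknown)")
  CallSite(
    shortForm = s"$lastInternal at $firstUser", // e.g. shown in the web UI
    longForm  = frames.take(10).mkString("\n"))
}
```

The real CallSite's shortForm is what the web UI shows for a stage (for example "textFile at MyApp.scala:12"), which is why the frame closest to user code matters.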