SparkContext 的构建的过程
SparkContext的初始化综述
- SparkContext是进行Spark应用开发的主要接口,是Spark上层应用与底层应用实现的中转站,即整个应用的上下文,控制应用的生命周期。
- SparkContext在初始化的过程中,主要涉及以下内容
- SparkEnv:线程级别的上下文,存储运行时的重要组件的引用。
- MapOutPutTracker:负责Shuffle元信息的存储
- BroadcastManager:负责广播变量的控制与元信息的存储。
- BlockManager: 负责存储管理,创建和查找块。
- MetricsSystem:监控运行时性能指标信息。
- SparkConf: 负责存储配置信息
- DAGScheduler: 根据作业(Job)构建基于Stage的DAG,并提交Stage给TaskScheduler
- TaskScheduler: 将任务(Task)分发给Executor执行。
- SchedulerBackend:在TaskScheduler下层,用于对接不同的资源管理系统,SchedulerBackend是个接口
- WebUI
SparkContext的构造函数中最重要的传入参数是SparkConf
SparkContext是我们Spark程序在创建过程非常非常非常重要的一个步骤。
* 1)是我们Spark程序执行各种操作的主要的入口点。
* 2)在SparkContext中我们主要用来加载数据并创建RDD,连接Spark集群,同时我们还可以创建共享变量:累加器,广播变量
* sparkContext(conf.setMaster(“local”))
* 3)在创建SparkContext的时候,需要传入一个SparkConf参数,这个参数是我们对当前application所持有的配置信息的一个描述
* 4)最最重要的还是创建TaskScheduler和DAGScheduler
- 步骤1:根据初始化入参生成SparkConf,再根据SparkConf来创建SparkEnv。SparkEnv中主要包含以下关键性组件:BlockManager、MapOUtputTracker、ShuffleFetcher、ConnectionManager。
// Create the Spark execution environment (cache, map output tracker, etc)
// This function allows components created by SparkEnv to be mocked in unit tests:
private[spark] def createSparkEnv(
conf: SparkConf,
isLocal: Boolean,
listenerBus: LiveListenerBus): SparkEnv = {
SparkEnv.createDriverEnv(conf, isLocal, listenerBus)
}
private[spark] val env = createSparkEnv(conf, isLocal, listenerBus)
SparkEnv.set(env)
SparkEnv
@DeveloperApi
class SparkEnv (
val executorId: String,
val actorSystem: ActorSystem,
val serializer: Serializer,
val closureSerializer: Serializer,
val cacheManager: CacheManager, //用以计算中间计算结果
val mapOutputTracker: MapOutputTracker,//用来缓存MapStatus信息,并提供从MapOutput获取信息的功能
val shuffleManager: ShuffleManager, //路由维护表
val broadcastManager: BroadcastManager, //广播
val blockTransferService: BlockTransferService,
val blockManager: BlockManager, //块管理
val securityManager: SecurityManager, //安全管理
val httpFileServer: HttpFileServer, //文件存储服务器
val sparkFilesDir: String, //文件存储目录
val metricsSystem: MetricsSystem, //测量
val shuffleMemoryManager: ShuffleMemoryManager,
val outputCommitCoordinator: OutputCommitCoordinator,
val conf: SparkConf) extends Logging { //配置文件
- 步骤2:创建TaskScheduler,根据Spark的运行模式来选择相应的SchedulerBackend,同时启动TaskScheduler,这一步至关重要。
生成TaskScheduler
/**
* SparkContext.createTaskScheduler这个方法是在我们SparkContext构造器执行之后就会来执行
* 需要传递两个参数(SparkContext对象,spark.master[相当于sparkconf.setMaster])
*/
private[spark] var (schedulerBackend, taskScheduler) = SparkContext.createTaskScheduler(this, master)
private val heartbeatReceiver = env.actorSystem.actorOf(
Props(new HeartbeatReceiver(taskScheduler)), "HeartbeatReceiver")
@volatile private[spark] var dagScheduler: DAGScheduler = _
try {
dagScheduler = new DAGScheduler(this)
} catch {
case e: Exception => {
try {
stop()
} finally {
throw new SparkException("Error while constructing DAGScheduler", e)
}
}
}
// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
taskScheduler.start()
createTaskScheduler最为关键的一点就是根据Master环境变量来判断Spark当前的部署方式,进而生成相应的SchedulerBackend的不同子类。创建的SchedulerBackend放置到TaskScheduler中,在后续的Task分发过程中扮演重要作用。
/**
* createTaskScheduler基于master url地址创建TaskScheduler,SchedulerBackend
* (SchedulerBackend, TaskScheduler)
*/
/**
* Create a task scheduler based on a given master URL.
* Return a 2-tuple of the scheduler backend and the task scheduler.
*/
private def createTaskScheduler(
sc: SparkContext,
master: String): (SchedulerBackend, TaskScheduler) = {
/**
* 这里就设置了我们的Spark程序的调度运行方式
* local:spark程序在本地创建SparkContext,在本地运行,而且单线程执行spark application
* local[N]/local[*]
* local[N]--》spark程序在本地创建SparkContext,在本地运行,而且开启N线程执行spark application
* local[*]--》spark程序在本地创建SparkContext,在本地运行,而且根据当前机器的CPU CORE的信息来创建相应数目的线程来运行我们spark application
****上述这些local模式,有一个问题,就是说当我们的rdd执行失败,不会重新计算并执行
* local[N, M]-->--》spark程序在本地创建SparkContext,在本地运行,而且开启N线程执行spark application
* 但是和上述local模式有不同,当spark程序执行失败的时候,可以有最多M次机会重新启动执行
****local-cluster[numSlaves, coresPerSlave, memoryPerSlave]-->伪分布的部署方式,numSlaves-->worker的数量
* coresPerSlave-->每个worker有几个core,memoryPerSlave-->每个worker分配内存
* spark://master:7077 -->
* yarn--->yarn-client/cluster
* mesos
*/
// Regular expression used for local[N] and local[*] master formats
val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r
// Regular expression for local[N, maxRetries], used in tests with failing tasks
val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r
// Regular expression for simulating a Spark cluster of [N, cores, memory] locally
val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
// Regular expression for connecting to Spark deploy clusters
val SPARK_REGEX = """spark://(.*)""".r
// Regular expression for connection to Mesos cluster by mesos:// or zk:// url
val MESOS_REGEX = """(mesos|zk)://.*""".r
// Regular expression for connection to Simr cluster
val SIMR_REGEX = """simr://(.*)""".r
// When running locally, don't try to re-execute tasks on failure.
val MAX_LOCAL_TASK_FAILURES = 1
//spark.master
下面是根据以上的变量进行判断,确定当前Spark的部署方式
master match {
case "local" =>
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalBackend(scheduler, 1)
scheduler.initialize(backend)
(backend, scheduler)
case LOCAL_N_REGEX(threads) =>
//获取当前可运行的线程(core)数量
def localCpuCount = Runtime.getRuntime.availableProcessors()
// local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
val threadCount = if (threads == "*") localCpuCount else threads.toInt
if (threadCount <= 0) {
throw new SparkException(s"Asked to run locally with $threadCount threads")
}
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalBackend(scheduler, threadCount)
scheduler.initialize(backend)
(backend, scheduler)
case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
def localCpuCount = Runtime.getRuntime.availableProcessors()
// local[*, M] means the number of cores on the computer with M failures
// local[N, M] means exactly N threads with M failures
val threadCount = if (threads == "*") localCpuCount else threads.toInt
val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)
val backend = new LocalBackend(scheduler, threadCount)
scheduler.initialize(backend)
(backend, scheduler)
case SPARK_REGEX(sparkUrl) =>
val scheduler = new TaskSchedulerImpl(sc)
val masterUrls = sparkUrl.split(",").map("spark://" + _)
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
(backend, scheduler)
case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
// Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
val memoryPerSlaveInt = memoryPerSlave.toInt
if (sc.executorMemory > memoryPerSlaveInt) {
throw new SparkException(
"Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
memoryPerSlaveInt, sc.executorMemory))
}
val scheduler = new TaskSchedulerImpl(sc)
val localCluster = new LocalSparkCluster(
numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt, sc.conf)
val masterUrls = localCluster.start()
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
backend.shutdownCallback = (backend: SparkDeploySchedulerBackend) => {
localCluster.stop()
}
(backend, scheduler)
case "yarn-standalone" | "yarn-cluster" =>
if (master == "yarn-standalone") {
logWarning(
"\"yarn-standalone\" is deprecated as of Spark 1.0. Use \"yarn-cluster\" instead.")
}
val scheduler = try {
val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
val cons = clazz.getConstructor(classOf[SparkContext])
cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
} catch {
// TODO: Enumerate the exact reasons why it can fail
// But irrespective of it, it means we cannot proceed !
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
val backend = try {
val clazz =
Class.forName("org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend")
val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
} catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
scheduler.initialize(backend)
(backend, scheduler)
case "yarn-client" =>
val scheduler = try {
val clazz =
Class.forName("org.apache.spark.scheduler.cluster.YarnScheduler")
val cons = clazz.getConstructor(classOf[SparkContext])
cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
} catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
val backend = try {
val clazz =
Class.forName("org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend")
val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
} catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
scheduler.initialize(backend)
(backend, scheduler)
case mesosUrl @ MESOS_REGEX(_) =>
MesosNativeLibrary.load()
val scheduler = new TaskSchedulerImpl(sc)
val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", false)
val url = mesosUrl.stripPrefix("mesos://") // strip scheme from raw Mesos URLs
val backend = if (coarseGrained) {
new CoarseMesosSchedulerBackend(scheduler, sc, url)
} else {
new MesosSchedulerBackend(scheduler, sc, url)
}
scheduler.initialize(backend)
(backend, scheduler)
case SIMR_REGEX(simrUrl) =>
val scheduler = new TaskSchedulerImpl(sc)
val backend = new SimrSchedulerBackend(scheduler, sc, simrUrl)
scheduler.initialize(backend)
(backend, scheduler)
case _ =>
throw new SparkException("Could not parse Master URL: '" + master + "'")
}
}
_taskScheduler.start()的目的是启动相应的SchedulerBackend,并启动定时器进行检测。
生成SchedulerBackend , 这里会根据具体的调度方式,setMaster(),local –>LocalBackend,standalone–>SparkDeployeBackend ,然后创建一个Application主要用来负责和Master进行通信,在SparkDeployeSechduler内部的start里面创建了Driver的Actor,主要用来和Master进行通信
其实是调用父类的start方法
initialize()方法主要来创建SchedulerBackend有两种策略,FIFO,Fait策略
def initialize(backend: SchedulerBackend) {
this.backend = backend
// temporarily set rootPool name to empty
rootPool = new Pool("", schedulingMode, 0, 0)
schedulableBuilder = {
schedulingMode match {
case SchedulingMode.FIFO =>
new FIFOSchedulableBuilder(rootPool)
case SchedulingMode.FAIR =>
new FairSchedulableBuilder(rootPool, conf)
}
}
schedulableBuilder.buildPools()
}
override def start() {
super.start()
// The endpoint for executors to talk to us
val driverUrl = AkkaUtils.address(
AkkaUtils.protocol(actorSystem),
SparkEnv.driverActorSystemName,
conf.get("spark.driver.host"),
conf.get("spark.driver.port"),
CoarseGrainedSchedulerBackend.ACTOR_NAME)
val args = Seq(
"--driver-url", driverUrl,
"--executor-id", "{{EXECUTOR_ID}}",
"--hostname", "{{HOSTNAME}}",
"--cores", "{{CORES}}",
"--app-id", "{{APP_ID}}",
"--worker-url", "{{WORKER_URL}}")
val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
.map(Utils.splitCommandString).getOrElse(Seq.empty)
val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
.map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
.map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
// When testing, expose the parent class path to the child. This is processed by
// compute-classpath.{cmd,sh} and makes all needed jars available to child processes
// when the assembly is built with the "*-provided" profiles enabled.
val testingClassPath =
if (sys.props.contains("spark.testing")) {
sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
} else {
Nil
}
// Start executors with a few necessary configs for registering with the scheduler
val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
val javaOpts = sparkJavaOpts ++ extraJavaOpts
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
appUIAddress, sc.eventLogDir, sc.eventLogCodec)
//sc.env.actorSystem 已经创建的Actor通信模型, masters 传进来的Master地址, appDesc Application 描述信息, this 当前对象 , conf 配置文件
client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
client.start()
waitForRegistration() //等待接收注册返回的信息,以为注册这个过程是开了一个线程去做的,
}
在Application中和master进行通信,持有master地址,application描述器,监听事件,收到具体的事件信息时候,回调相应的接口
**
* Interface allowing applications to speak with a Spark deploy cluster. Takes a master URL,
* an app description, and a listener for cluster events, and calls back the listener when various
* events occur.
*
* @param masterUrls Each url should look like spark://host:port.
*/
private[spark] class AppClient(
actorSystem: ActorSystem,
masterUrls: Array[String],
appDescription: ApplicationDescription,
listener: AppClientListener,
conf: SparkConf)
extends Logging {
val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, AkkaUtils.protocol(actorSystem)))
val REGISTRATION_TIMEOUT = 20.seconds
val REGISTRATION_RETRIES = 3
var masterAddress: Address = null
var actor: ActorRef = null
var appId: String = null
var registered = false
var activeMasterUrl: String = null
class ClientActor extends Actor with ActorLogReceive with Logging {
var master: ActorSelection = null
var alreadyDisconnected = false // To avoid calling listener.disconnected() multiple times
var alreadyDead = false // To avoid calling listener.dead() multiple times
var registrationRetryTimer: Option[Cancellable] = None
override def preStart() {
context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
try {
registerWithMaster()
} catch {
case e: Exception =>
logWarning("Failed to connect to master", e)
markDisconnected()
context.stop(self)
}
}
// 因为我们在配置master的时候,设置了HA ,可以有多个master地址
//获取master的代理对象,同时向master发送注册通知信息RegisterApplication(appDescription)
def tryRegisterAllMasters() {
for (masterAkkaUrl <- masterAkkaUrls) {
logInfo("Connecting to master " + masterAkkaUrl + "...")
val actor = context.actorSelection(masterAkkaUrl)
actor ! RegisterApplication(appDescription)
}
}
def registerWithMaster() {
tryRegisterAllMasters()
import context.dispatcher
var retries = 0
registrationRetryTimer = Some {
context.system.scheduler.schedule(REGISTRATION_TIMEOUT, REGISTRATION_TIMEOUT) {
Utils.tryOrExit {
retries += 1
if (registered) {
registrationRetryTimer.foreach(_.cancel())
} else if (retries >= REGISTRATION_RETRIES) {
markDead("All masters are unresponsive! Giving up.")
} else {
tryRegisterAllMasters()
}
}
}
}
}
def changeMaster(url: String) {
// activeMasterUrl is a valid Spark url since we receive it from master.
activeMasterUrl = url
master = context.actorSelection(
Master.toAkkaUrl(activeMasterUrl, AkkaUtils.protocol(actorSystem)))
masterAddress = Master.toAkkaAddress(activeMasterUrl, AkkaUtils.protocol(actorSystem))
}
private def isPossibleMaster(remoteUrl: Address) = {
masterAkkaUrls.map(AddressFromURIString(_).hostPort).contains(remoteUrl.hostPort)
}
override def receiveWithLogging = {
case RegisteredApplication(appId_, masterUrl) =>
appId = appId_
registered = true
changeMaster(masterUrl)
listener.connected(appId)
case ApplicationRemoved(message) =>
markDead("Master removed our application: %s".format(message))
context.stop(self)
case ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: Int, memory: Int) =>
val fullId = appId + "/" + id
logInfo("Executor added: %s on %s (%s) with %d cores".format(fullId, workerId, hostPort,
cores))
master ! ExecutorStateChanged(appId, id, ExecutorState.RUNNING, None, None)
listener.executorAdded(fullId, workerId, hostPort, cores, memory)
case ExecutorUpdated(id, state, message, exitStatus) =>
val fullId = appId + "/" + id
val messageText = message.map(s => " (" + s + ")").getOrElse("")
logInfo("Executor updated: %s is now %s%s".format(fullId, state, messageText))
if (ExecutorState.isFinished(state)) {
listener.executorRemoved(fullId, message.getOrElse(""), exitStatus)
}
case MasterChanged(masterUrl, masterWebUiUrl) =>
logInfo("Master has changed, new master is at " + masterUrl)
changeMaster(masterUrl)
alreadyDisconnected = false
sender ! MasterChangeAcknowledged(appId)
case DisassociatedEvent(_, address, _) if address == masterAddress =>
logWarning(s"Connection to $address failed; waiting for master to reconnect...")
markDisconnected()
case AssociationErrorEvent(cause, _, address, _, _) if isPossibleMaster(address) =>
logWarning(s"Could not connect to $address: $cause")
case StopAppClient =>
markDead("Application has been stopped.")
sender ! true
context.stop(self)
}
/**
* Notify the listener that we disconnected, if we hadn't already done so before.
*/
def markDisconnected() {
if (!alreadyDisconnected) {
listener.disconnected()
alreadyDisconnected = true
}
}
def markDead(reason: String) {
if (!alreadyDead) {
listener.dead(reason)
alreadyDead = true
}
}
override def postStop() {
registrationRetryTimer.foreach(_.cancel())
}
}
def start() {
// Just launch an actor; it will call back into the listener.
actor = actorSystem.actorOf(Props(new ClientActor))
}
def stop() {
if (actor != null) {
try {
val timeout = AkkaUtils.askTimeout(conf)
val future = actor.ask(StopAppClient)(timeout)
Await.result(future, timeout)
} catch {
case e: TimeoutException =>
logInfo(“Stop request to Master timed out; it may already be shut down.”)
}
actor = null
}
}
}
步骤3:以上一步创建的TaskScheduler实例为入参创建DAGScheduler并启动运行。
_dagScheduler = new DAGScheduler(this)
DAGScheduler是最高级别的scheduler调度层的抽象,主要用来对我们操作划分stage,能够为我们的每一个job划分
DAG的stage,同时跟踪每一个rdd 和 每一个阶段的输出被物化(就是说数据写到磁盘或者内存),同时为我们的job找到一条最佳的调度方法,然后以一批taskSets的方式作为stage 提交给TaskScheduler
/**
* The high-level scheduling layer that implements stage-oriented scheduling. It computes a DAG of
* stages for each job, keeps track of which RDDs and stage outputs are materialized, and finds a
* minimal schedule to run the job. It then submits stages as TaskSets to an underlying
* TaskScheduler implementation that runs them on the cluster.
*
* In addition to coming up with a DAG of stages, this class also determines the preferred
* locations to run each task on, based on the current cache status, and passes these to the
* low-level TaskScheduler. Furthermore, it handles failures due to shuffle output files being
* lost, in which case old stages may need to be resubmitted. Failures *within* a stage that are
* not caused by shuffle file loss are handled by the TaskScheduler, which will retry each task
* a small number of times before cancelling the whole stage.
*
* Here's a checklist to use when making or reviewing changes to this class:
*
* - When adding a new data structure, update `DAGSchedulerSuite.assertDataStructuresEmpty` to
* include the new structure. This will help to catch memory leaks.
*/
步骤4:启动WebUI(SparkUI)
_ui =
if (conf.getBoolean("spark.ui.enabled", true)) {
Some(SparkUI.createLiveUI(this, _conf, listenerBus, _jobProgressListener,
_env.securityManager, appName, startTime = startTime))
} else {
// For tests, do not enable the UI
None
}
// Bind the UI before starting the task scheduler to communicate
// the bound port to the cluster manager properly
_ui.foreach(_.bind())