1. The Spark DAG Engine
A DAG (directed acyclic graph) describes the ordering dependencies between tasks. In Spark, the DAGScheduler handles the logical scheduling of a job: it splits the job into stages of interdependent task batches and determines the order in which they run. Its key characteristics:
- The graph is directed and acyclic, so there are no circular dependencies
- Independent stages (with no dependency between them) can be scheduled in parallel
- Lineage-based recovery: a lost result can be recomputed from its ancestors in the graph
- Spark can persist intermediate results in memory to speed up the program
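The scheduling properties above can be illustrated with a minimal, Spark-independent sketch (the stage names and the dependency map are invented for illustration): at each step, every stage whose dependencies are all finished forms one "wave", and the stages within a wave can run in parallel.

```scala
object DagWaves {
  // A toy DAG: each stage maps to the set of stages it depends on.
  // Stages whose dependencies are all satisfied form one parallel "wave".
  def waves(deps: Map[String, Set[String]]): List[Set[String]] = {
    var remaining = deps
    var done = Set.empty[String]
    var result = List.empty[Set[String]]
    while (remaining.nonEmpty) {
      // every stage whose dependencies have all finished is runnable now
      val ready = remaining.collect { case (s, d) if d.subsetOf(done) => s }.toSet
      require(ready.nonEmpty, "cycle detected: a DAG must be acyclic")
      result = result :+ ready
      done ++= ready
      remaining = remaining -- ready
    }
    result
  }

  def main(args: Array[String]): Unit = {
    // stage0 and stage1 are independent, so they can be scheduled in parallel;
    // stage2 must wait for both (e.g. a join after two shuffles)
    val deps = Map(
      "stage0" -> Set.empty[String],
      "stage1" -> Set.empty[String],
      "stage2" -> Set("stage0", "stage1"))
    println(waves(deps))
  }
}
```

The `require` check also demonstrates why the graph must be acyclic: with a cycle, at some point no stage would be runnable even though stages remain.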
2. Key Components of the Spark Runtime
2.1 DAGScheduler
How it works:
The DAGScheduler maintains two job queues, waiting jobs and active jobs,
as well as waiting stages, active stages, and failed stages, plus the mapping between stages and jobs.
2.2 TaskScheduler
How it works:
The TaskScheduler's job is to match tasks to the resources available in the cluster. Internally it maintains a queue of tasks waiting to run.
TaskScheduler itself is only an interface; Spark ships a single implementation, TaskSchedulerImpl, so in principle task scheduling is pluggable.
2.3 SchedulerBackend & ExecutorBackend
Both are communication components handling messaging between the driver and the executors. SchedulerBackend lives in the Driver; ExecutorBackend lives in each Executor.
2.4 SparkContext
SparkContext is the runtime context of a Spark application and holds many essential components:
- SparkContext is the user's single entry point to a Spark cluster; through it you create RDDs, accumulators (Accumulator), and broadcast variables (Broadcast Variable)
- During instantiation, SparkContext initializes the DAGScheduler, TaskScheduler, and SchedulerBackend
- SparkContext has the DAGScheduler split the whole job into smaller stages (Stage); the TaskScheduler then dispatches each stage's tasks (Task) to suitable executors, while the SchedulerBackend manages the compute resources (Executors) allocated across the cluster to the current application
2.5 Summary
The three most important components in the Spark driver are the DAGScheduler, TaskScheduler, and SchedulerBackend.
Their relationship can be pictured by comparing Spark to a company. The DAGScheduler is the chief architect: it breaks a large job into stages and tracks overall progress. The SchedulerBackend is the HR and facilities manager: it keeps account of the company's people and resources. The TaskScheduler is the operations manager: it decomposes each of the architect's stages into concrete subtasks, obtains the necessary people and resources from the HR manager, and hands each piece of work to a suitable person.
3. Message Flow of a Spark Application Submission
- spark-submit launches the client, which creates a ClientEndpoint; the ClientEndpoint sends a RequestSubmitDriver message to the Master
- On receiving RequestSubmitDriver, the Master sends a LaunchDriver message to a Worker to start the driver
- The Worker starts the driver; once that is done it reports back to the Master with a DriverStateChanged message
- On receiving DriverStateChanged, the Master replies to the client with a SubmitDriverResponse message, which completes the driver-submission exchange
- The driver initializes a ClientEndpoint of its own to talk to the Master; in its onStart method it sends a RegisterApplication message to the Master
- The Master processes RegisterApplication and returns a RegisteredApplication message to the driver
- Based on the same RegisterApplication message, the Master also sends LaunchExecutor messages to start executors
- Once an executor starts, it registers itself with the driver by sending a RegisterExecutor message to the driver's DriverEndpoint
- The DriverEndpoint processes the registration and replies RegisteredExecutor to the executor's communication component
- The executor reports its state to the Worker's endpoint with an ExecutorStateChanged message
- The Worker forwards ExecutorStateChanged to the Master
- The Master then informs the driver's ClientEndpoint which executors are available
- After startup, the driver's DriverEndpoint sends LaunchTask messages to the executors to start task execution
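The handshake above can be written down as a toy transcript. The message names mirror the real Spark case classes; the `Step` type and the participant labels here are invented for illustration:

```scala
// A toy transcript of the submission handshake between client, Master,
// Worker, driver and executor. Step is an invented helper type.
object SubmitFlow {
  final case class Step(from: String, to: String, msg: String)

  val flow: Seq[Step] = Seq(
    Step("ClientEndpoint", "Master",   "RequestSubmitDriver"),
    Step("Master",         "Worker",   "LaunchDriver"),
    Step("Worker",         "Master",   "DriverStateChanged"),
    Step("Master",         "Client",   "SubmitDriverResponse"),
    Step("Driver",         "Master",   "RegisterApplication"),
    Step("Master",         "Driver",   "RegisteredApplication"),
    Step("Master",         "Worker",   "LaunchExecutor"),
    Step("Executor",       "Driver",   "RegisterExecutor"),
    Step("Driver",         "Executor", "RegisteredExecutor"),
    Step("Driver",         "Executor", "LaunchTask"))

  def main(args: Array[String]): Unit =
    flow.zipWithIndex.foreach { case (s, i) =>
      println(s"${i + 1}. ${s.from} -> ${s.to}: ${s.msg}")
    }
}
```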
4. Source-Level Walkthrough of a Spark Application
4.1 spark-submit sends RequestSubmitDriver to the Master
- Submitting an app via the spark-submit script boils down to the SparkSubmit class, which starts a Client whose core component is the ClientEndpoint.
- In ClientEndpoint, focus on the onStart method: during initialization it builds a RequestSubmitDriver message and sends it to the Master
/**
 * 1. Execution starts in SparkSubmit's main method
 */
object SparkSubmit extends CommandLineUtils with Logging {
  // main: the entry point
  override def main(args: Array[String]): Unit = {
    // create a SparkSubmit instance
    val submit = new SparkSubmit() {
      self =>
      // override the parent's doSubmit to translate user-app exceptions into exit codes
      override def doSubmit(args: Array[String]): Unit = {
        try {
          super.doSubmit(args)
        } catch {
          case e: SparkUserAppException =>
            exitFn(e.exitCode)
        }
      }
    }
    // run doSubmit
    submit.doSubmit(args)
  }
}
// step into doSubmit
def doSubmit(args: Array[String]): Unit = {
  // parse the command-line arguments
  val appArgs = parseArguments(args)
  if (appArgs.verbose) {
    logInfo(appArgs.toString)
  }
  // match on the requested action
  appArgs.action match {
    // SUBMIT: submit the application
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    // KILL: kill a running application
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    case SparkSubmitAction.PRINT_VERSION => printVersion()
  }
}
// step into submit(appArgs, uninitLog)
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  // decide from the spark-submit arguments how to initialize the app
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
}
// step into prepareSubmitEnvironment (excerpt)
private[deploy] def prepareSubmitEnvironment(
  // taking yarn-cluster mode as the example
  if (isYarnCluster) {
    // in yarn-cluster mode the main class is YARN_CLUSTER_SUBMIT_CLASS,
    // i.e. org.apache.spark.deploy.yarn.YarnClusterApplication
    childMainClass = YARN_CLUSTER_SUBMIT_CLASS
    if (args.isPython) {
      childArgs += ("--primary-py-file", args.primaryResource)
      childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
    } else if (args.isR) {
      val mainFile = new Path(args.primaryResource).getName
      childArgs += ("--primary-r-file", mainFile)
      childArgs += ("--class", "org.apache.spark.deploy.RRunner")
    } else {
      if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
        childArgs += ("--jar", args.primaryResource)
      }
      // for a regular JVM app, add the user's main class to childArgs
      childArgs += ("--class", args.mainClass)
    }
    if (args.childArgs != null) {
      args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
    }
  }
  // return the four values
  (childArgs, childClasspath, sparkConf, childMainClass)
}
// back in submit, look at the nested doRunMain method
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  def doRunMain(): Unit = {
    // the key call is runMain, using the values produced by prepareSubmitEnvironment
    runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
  }
}
// runMain starts the RPC client
private def runMain(
    childArgs: Seq[String],
    childClasspath: Seq[String],
    sparkConf: SparkConf,
    childMainClass: String,
    verbose: Boolean): Unit = {
  app.start(childArgs.toArray, sparkConf)
}
// step into Client's start method
override def start(args: Array[String], conf: SparkConf): Unit = {
  val driverArgs = new ClientArguments(args)
  // create the RpcEnv, the runtime environment for communication components
  val rpcEnv =
    RpcEnv.create("driverClient", Utils.localHostName(), 0, conf, new SecurityManager(conf))
  // create the ClientEndpoint that talks to the Master
  rpcEnv.setupEndpoint("client", new ClientEndpoint(rpcEnv, driverArgs, masterEndpoints, conf))
  rpcEnv.awaitTermination()
}
// in ClientEndpoint, focus on the onStart and receive methods
private class ClientEndpoint(
    override val rpcEnv: RpcEnv,
    driverArgs: ClientArguments,
    masterEndpoints: Seq[RpcEndpointRef],
    conf: SparkConf)
  extends ThreadSafeRpcEndpoint with Logging {
  // onStart runs once, at initialization
  override def onStart(): Unit = {
    // build a DriverDescription holding the driver's details:
    // the jar URL and the requested resources (cpu and memory)
    val driverDescription = new DriverDescription(
      driverArgs.jarUrl,
      driverArgs.memory,
      driverArgs.cores,
      driverArgs.supervise,
      command)
    // send RequestSubmitDriver to the Master and wait for a SubmitDriverResponse
    asyncSendToMasterAndForwardReply[SubmitDriverResponse](
      RequestSubmitDriver(driverDescription))
  }
}
4.2 The Master receives RequestSubmitDriver and sends LaunchDriver to a Worker
- In the Master class, look at the RequestSubmitDriver case inside receiveAndReply
- The Master takes the driver details from the RequestSubmitDriver message and wraps them in a DriverInfo
- The DriverInfo is appended to the waitingDrivers array, which holds every driver waiting to be launched
- The schedule() method invoked there has two jobs: 1. find a suitable worker for each driver in waitingDrivers, and 2. find suitable executors for each app in waitingApps (explained later)
- To pick a worker for a waiting driver, the Master randomly shuffles the alive workers, then walks through them comparing each worker's free cpu and memory against the driver's requirements; the first worker that fits is sent a LaunchDriver message carrying the worker and driver information
private[deploy] class Master(
    override val rpcEnv: RpcEnv,
    address: RpcAddress,
    webUiPort: Int,
    val securityMgr: SecurityManager,
    val conf: SparkConf)
  extends ThreadSafeRpcEndpoint with Logging with LeaderElectable {
  // focus on the RequestSubmitDriver case in receiveAndReply
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case RequestSubmitDriver(description) =>
      // create a DriverInfo object
      val driver: DriverInfo = createDriver(description)
      // add it to the persistence engine
      persistenceEngine.addDriver(driver)
      // append it to waitingDrivers, the queue of drivers waiting to launch
      waitingDrivers += driver
      drivers.add(driver)
      // schedule() dispatches both drivers and executors
      schedule()
      // reply to RequestSubmitDriver with a SubmitDriverResponse
      context.reply(SubmitDriverResponse(self, true, Some(driver.id),
        s"Driver successfully submitted as ${driver.id}"))
  }
}
// step into schedule
private def schedule(): Unit = {
  // randomly shuffle all alive workers
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  // count the usable workers
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  // walk through the drivers waiting to be launched
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    var launched = false
    var numWorkersVisited = 0
    // keep probing while there are unvisited workers and the driver has not been launched
    while (numWorkersVisited < numWorkersAlive && !launched) {
      // pick the next worker
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      // if the worker's free memory and cores both satisfy the driver's requirements
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // send LaunchDriver to that worker to start the driver
        launchDriver(worker, driver)
        // remove the driver from the waiting queue
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // launch executors on workers; detailed below
  startExecutorsOnWorkers()
}
// send LaunchDriver to the chosen worker
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // the actual send of the LaunchDriver message
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  driver.state = DriverState.RUNNING
}
4.3 The Worker receives LaunchDriver and starts the Driver
- When a Worker receives a LaunchDriver message it must start a driver JVM process; the core entry point is its LaunchDriver handler
- The Worker wraps the driver information in a DriverRunner and calls its start method
- DriverRunner.start launches the driver; two calls matter:
- prepareAndRunDriver() actually starts the driver process
- worker.send(DriverStateChanged(...)) reports the driver's final state back to the Worker
- When the Worker handles DriverStateChanged, it forwards the message to the Master via sendToMaster(driverStateChanged)
private[deploy] class Worker() extends ThreadSafeRpcEndpoint with Logging {
  // cores already in use
  var coresUsed = 0
  // memory already in use
  var memoryUsed = 0
  case LaunchDriver(driverId, driverDesc) =>
    logInfo(s"Asked to launch driver $driverId")
    // wrap the driver information in a DriverRunner
    val driver = new DriverRunner(
      conf,
      driverId,
      workDir,
      sparkHome,
      driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
      self,
      workerUri,
      securityMgr)
    drivers(driverId) = driver
    // start the driver
    driver.start()
    // account for the memory and cores now in use
    coresUsed += driverDesc.cores
    memoryUsed += driverDesc.mem
}
// step into DriverRunner's start method
private[deploy] class DriverRunner(
    conf: SparkConf,
    val driverId: String,
    val workDir: File,
    val sparkHome: File,
    val driverDesc: DriverDescription,
    val worker: RpcEndpointRef,
    val workerUrl: String,
    val securityManager: SecurityManager)
  extends Logging {
  private[worker] def start() = {
    new Thread("DriverRunner for " + driverId) {
      override def run() {
        var shutdownHook: AnyRef = null
        try {
          // register a shutdown hook that kills the driver process
          shutdownHook = ShutdownHookManager.addShutdownHook { () =>
            logInfo(s"Worker shutting down, killing driver $driverId")
            kill()
          }
          // prepare and run the driver process
          val exitCode = prepareAndRunDriver()
          // report the driver's final state back to the Worker
          worker.send(DriverStateChanged(driverId, finalState.get, finalException))
        }
      }
    }.start()
  }
  // step into prepareAndRunDriver
  private[worker] def prepareAndRunDriver(): Int = {
    // TODO: If we add ability to submit multiple jars they should also be added here
    val builder = CommandUtils.buildProcessBuilder(driverDesc.command, securityManager,
      driverDesc.mem, sparkHome.getAbsolutePath, substituteVariables)
    // the method that actually runs the driver
    runDriver(builder, driverDir, driverDesc.supervise)
  }
  // step into runDriver
  private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
    // run the driver command, with retries if supervised
    runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
  }
  // step into runCommandWithRetry
  private[worker] def runCommandWithRetry(
      command: ProcessBuilderLike, initialize: Process => Unit, supervise: Boolean): Int = {
    var exitCode = -1
    // Time to wait between submission retries.
    var waitSeconds = 1
    // A run of this many seconds resets the exponential back-off.
    val successfulRunDuration = 5
    var keepTrying = !killed
    while (keepTrying) {
      logInfo("Launch Command: " + command.command.mkString("\"", "\" \"", "\""))
      synchronized {
        if (killed) { return exitCode }
        // start the driver process
        process = Some(command.start())
        initialize(process.get)
      }
      val processStart = clock.getTimeMillis()
      exitCode = process.get.waitFor()
    }
    exitCode
  }
}
// how the Worker handles DriverStateChanged
case driverStateChanged @ DriverStateChanged(driverId, state, exception) =>
  handleDriverStateChanged(driverStateChanged)
// step into handleDriverStateChanged
private[worker] def handleDriverStateChanged(driverStateChanged: DriverStateChanged): Unit = {
  val driverId = driverStateChanged.driverId
  val exception = driverStateChanged.exception
  val state = driverStateChanged.state
  // forward the driver's state change to the Master
  sendToMaster(driverStateChanged)
  val driver = drivers.remove(driverId).get
  finishedDrivers(driverId) = driver
  trimFinishedDriversIfNecessary()
  memoryUsed -= driver.driverDesc.mem
  coresUsed -= driver.driverDesc.cores
}
4.4 The Driver starts: DriverWrapper's main method
- Start the driver and bring up its RPC endpoint
- Load the user's main class via reflection and run the user-written Spark code
// continuing from process = Some(command.start()) above:
// the main class actually launched is DriverWrapper
object DriverWrapper extends Logging {
  def main(args: Array[String]) {
    args.toList match {
      // bind the incoming arguments
      case workerUrl :: userJar :: mainClass :: extraArgs =>
        val conf = new SparkConf()
        val host: String = Utils.localHostName()
        val port: Int = sys.props.getOrElse("spark.driver.port", "0").toInt
        // create the RpcEnv
        val rpcEnv = RpcEnv.create("Driver", host, port, conf, new SecurityManager(conf))
        logInfo(s"Driver address: ${rpcEnv.address}")
        // create the RpcEndpoint
        rpcEnv.setupEndpoint("workerWatcher", new WorkerWatcher(rpcEnv, workerUrl))
        // launch the user's main class by reflection
        val clazz = Utils.classForName(mainClass)
        val mainMethod = clazz.getMethod("main", classOf[Array[String]])
        mainMethod.invoke(null, extraArgs.toArray[String])
        rpcEnv.shutdown()
    }
  }
}
4.5 SparkContext initialization
- The first thing the user code does is initialize a SparkContext
- Of the initialization work, we focus only on the DAGScheduler, TaskScheduler, and SchedulerBackend
- First the TaskScheduler is created via the SparkContext.createTaskScheduler entry point, which mainly instantiates a TaskSchedulerImpl and a StandaloneSchedulerBackend
- Next the DAGScheduler is initialized; the key piece is the DAGSchedulerEventProcessLoop event processor
- Then _taskScheduler.start() is called
- backend.start() mainly brings up the DriverEndpoint and the ClientEndpoint
- The DriverEndpoint talks to the executors; its onStart method starts a timer that periodically triggers launchTasks, dispatching tasks to suitable executors
- The ClientEndpoint talks to the Master; its onStart method registers the application by sending a RegisterApplication message, which leads to executors being started
// core entry point
new SparkContext(sparkConf)
class SparkContext(config: SparkConf) extends Logging {
  // the SparkConf
  private var _conf: SparkConf = _
  private var _schedulerBackend: SchedulerBackend = _
  private var _taskScheduler: TaskScheduler = _
  private var _dagScheduler: DAGScheduler = _
  val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
  _schedulerBackend = sched
  _taskScheduler = ts
  // create the DAGScheduler
  _dagScheduler = new DAGScheduler(this)
  _taskScheduler.start()
}
// step into createTaskScheduler to see how the TaskScheduler and SchedulerBackend are created
private def createTaskScheduler(
    sc: SparkContext,
    master: String,
    deployMode: String): (SchedulerBackend, TaskScheduler) = {
  import SparkMasterRegex._
  // match on the master URL; here the standalone-cluster case applies
  master match {
    case SPARK_REGEX(sparkUrl) =>
      // create the TaskScheduler
      val scheduler = new TaskSchedulerImpl(sc)
      val masterUrls = sparkUrl.split(",").map("spark://" + _)
      // create the backend
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      (backend, scheduler)
  }
}
// step into _dagScheduler = new DAGScheduler(this); focus on DAGSchedulerEventProcessLoop
private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {
  // initialize DAGSchedulerEventProcessLoop, an asynchronous event-driven processor
  private[spark] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)
  taskScheduler.setDAGScheduler(this)
  // and start the event loop
  eventProcessLoop.start()
}
// looking at _taskScheduler.start()
override def start() {
  // start the SchedulerBackend
  backend.start()
  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    speculationScheduler.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
        checkSpeculatableTasks()
      }
    }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
  }
}
// backend.start() enters StandaloneSchedulerBackend's start method
override def start() {
  super.start()
  val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
  val javaOpts = sparkJavaOpts ++ extraJavaOpts
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
  val appDesc = ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
    webUrl, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
  // create the StandaloneAppClient
  client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start()
  launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
  waitForRegistration()
  launcherBackend.setState(SparkAppHandle.State.RUNNING)
}
// the start method of CoarseGrainedSchedulerBackend, StandaloneSchedulerBackend's parent class
override def start() {
  // copy the spark.* configuration entries into properties
  val properties = new ArrayBuffer[(String, String)]
  for ((key, value) <- scheduler.sc.conf.getAll) {
    if (key.startsWith("spark.")) {
      properties += ((key, value))
    }
  }
  // create the DriverEndpoint, which communicates with the executors
  driverEndpoint = createDriverEndpointRef(properties)
}
// step into client.start()
def start() {
  // create the ClientEndpoint, which communicates with the Master
  endpoint.set(rpcEnv.setupEndpoint("AppClient", new ClientEndpoint(rpcEnv)))
}
// step into the ClientEndpoint's onStart to see its initialization
private[spark] class StandaloneAppClient(
    rpcEnv: RpcEnv,
    masterUrls: Array[String],
    appDescription: ApplicationDescription,
    listener: StandaloneAppClientListener,
    conf: SparkConf)
  extends Logging {
  override def onStart(): Unit = {
    try {
      // register with the Master
      registerWithMaster(1)
    } catch {
      case e: Exception =>
        logWarning("Failed to connect to master", e)
        markDisconnected()
        stop()
    }
  }
  // step into registerWithMaster(1)
  private def registerWithMaster(nthRetry: Int) {
    // register with all known Masters
    registerMasterFutures.set(tryRegisterAllMasters())
  }
  // step into tryRegisterAllMasters
  private def tryRegisterAllMasters(): Array[JFuture[_]] = {
    for (masterAddress <- masterRpcAddresses) yield {
      registerMasterThreadPool.submit(new Runnable {
        override def run(): Unit = try {
          if (registered.get) {
            return
          }
          // obtain an RpcEndpointRef to the Master
          val masterRef = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
          // send RegisterApplication to the Master, carrying the app description
          masterRef.send(RegisterApplication(appDescription, self))
        } catch {
          case ie: InterruptedException => // Cancelled
          case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
        }
      })
    }
  }
}
// the Master's handling of RegisterApplication
case RegisterApplication(description, driver) =>
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    // wrap the app details in an ApplicationInfo
    val app: ApplicationInfo = createApplication(description, driver)
    // add the app to waitingApps, the array of apps waiting to start
    registerApplication(app)
    persistenceEngine.addApplication(app)
    // send RegisteredApplication back to the driver
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
// how StandaloneAppClient, the Master-facing endpoint, handles RegisteredApplication
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredApplication(appId_, masterRef) =>
    // record the established connection to the Master
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
}
// back in schedule() (invoked from the RegisterApplication handler), focus on startExecutorsOnWorkers()
// this method finds suitable workers on which to launch executors for each waiting app
private def startExecutorsOnWorkers(): Unit = {
  // iterate over the apps waiting to start
  for (app <- waitingApps) {
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
    // only proceed while the app still needs at least one executor's worth of cores
    if (app.coresLeft >= coresPerExecutor) {
      // find the usable workers and sort them by free cores, descending
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor)
        .sortBy(_.coresFree).reverse
      // decide how many cores to allocate on each worker; see scheduleExecutorsOnWorkers below
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
      // Now that we've decided how many cores to allocate on each worker, let's allocate them
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
      }
    }
  }
}
// step into scheduleExecutorsOnWorkers
// the return value is the number of cores assigned on each worker
private def scheduleExecutorsOnWorkers(
    app: ApplicationInfo,
    usableWorkers: Array[WorkerInfo],
    spreadOutApps: Boolean): Array[Int] = {
  val coresPerExecutor = app.desc.coresPerExecutor
  val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
  val oneExecutorPerWorker = coresPerExecutor.isEmpty
  val memoryPerExecutor = app.desc.memoryPerExecutorMB
  val numUsable = usableWorkers.length
  val assignedCores = new Array[Int](numUsable) // Number of cores to give to each worker
  val assignedExecutors = new Array[Int](numUsable) // Number of new executors on each worker
  var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

  /** Return whether the specified worker can launch an executor for this app. */
  def canLaunchExecutor(pos: Int): Boolean = {
    val keepScheduling = coresToAssign >= minCoresPerExecutor
    val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor
    val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
    if (launchingNewExecutor) {
      val assignedMemory = assignedExecutors(pos) * memoryPerExecutor
      val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor
      val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit
      keepScheduling && enoughCores && enoughMemory && underLimit
    } else {
      keepScheduling && enoughCores
    }
  }

  var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
  while (freeWorkers.nonEmpty) {
    freeWorkers.foreach { pos =>
      var keepScheduling = true
      while (keepScheduling && canLaunchExecutor(pos)) {
        coresToAssign -= minCoresPerExecutor
        assignedCores(pos) += minCoresPerExecutor
        // If we are launching one executor per worker, then every iteration assigns 1 core
        // to the executor. Otherwise, every iteration assigns cores to a new executor.
        if (oneExecutorPerWorker) {
          assignedExecutors(pos) = 1
        } else {
          assignedExecutors(pos) += 1
        }
        if (spreadOutApps) {
          keepScheduling = false
        }
      }
    }
    freeWorkers = freeWorkers.filter(canLaunchExecutor)
  }
  assignedCores
}
// step into allocateWorkerResourceToExecutors
// in essence it sends LaunchExecutor to the chosen worker and marks the app as RUNNING
private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    // send LaunchExecutor to the corresponding worker
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}
// step into launchExecutor
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  // send LaunchExecutor to the worker to start the executor
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  // notify the driver with an ExecutorAdded message
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
// the Worker's handling of LaunchExecutor
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  ...
  // wrap the app details and configuration in an ExecutorRunner
  val manager = new ExecutorRunner(
    appId,
    execId,
    appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
    cores_,
    memory_,
    self,
    workerId,
    host,
    webUi.boundPort,
    publicAddress,
    sparkHome,
    executorDir,
    workerUri,
    conf,
    appLocalDirs, ExecutorState.RUNNING)
  executors(appId + "/" + execId) = manager
  // start the executor
  manager.start()
  coresUsed += cores_
  memoryUsed += memory_
  // notify the Master with an ExecutorStateChanged message
  sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
} catch {
  case e: Exception =>
    logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
    if (executors.contains(appId + "/" + execId)) {
      executors(appId + "/" + execId).kill()
      executors -= appId + "/" + execId
    }
    sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
      Some(e.toString), None))
}
// step into manager.start()
private[worker] def start() {
  // create a worker thread that launches the executor
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() {
      // the method that actually launches it
      fetchAndRunExecutor()
    }
  }
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = ShutdownHookManager.addShutdownHook { () =>
    // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
    // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
    if (state == ExecutorState.RUNNING) {
      state = ExecutorState.FAILED
    }
    killProcess(Some("Worker shutting down"))
  }
}
// step into fetchAndRunExecutor
private def fetchAndRunExecutor() {
  val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
    memory, sparkHome.getAbsolutePath, substituteVariables)
  // start the executor process
  process = builder.start()
  // wait for it to finish
  val exitCode = process.waitFor()
  // afterwards, send ExecutorStateChanged to the Worker
  worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
}
// the Worker's handling of ExecutorStateChanged
case executorStateChanged @ ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
  handleExecutorStateChanged(executorStateChanged)
private[worker] def handleExecutorStateChanged(executorStateChanged: ExecutorStateChanged):
    Unit = {
  // forward ExecutorStateChanged to the Master
  sendToMaster(executorStateChanged)
  val state = executorStateChanged.state
}
// the Master's handling of ExecutorStateChanged
case ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
  ...
  // send an ExecutorUpdated message to the driver
  exec.application.driver.send(ExecutorUpdated(execId, state, message, exitStatus, false))
// StandaloneAppClient's handling of ExecutorUpdated
case ExecutorUpdated(id, state, message, exitStatus, workerLost) =>
  val fullId = appId + "/" + id
  val messageText = message.map(s => " (" + s + ")").getOrElse("")
  logInfo("Executor updated: %s is now %s%s".format(fullId, state, messageText))
  if (ExecutorState.isFinished(state)) {
    listener.executorRemoved(fullId, message.getOrElse(""), exitStatus, workerLost)
  }
// next, the initialization of the DriverEndpoint:
// step into the DriverEndpoint's onStart method in CoarseGrainedSchedulerBackend
override def onStart() {
  // Periodically revive offers to allow delay scheduling to work
  val reviveIntervalMs = conf.getTimeAsMs("spark.scheduler.revive.interval", "1s")
  // a periodic task; note the send call with the ReviveOffers message
  reviveThread.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = Utils.tryLogNonFatalError {
      // the endpoint sends ReviveOffers to itself
      Option(self).foreach(_.send(ReviveOffers))
    }
  }, 0, reviveIntervalMs, TimeUnit.MILLISECONDS)
}
case ReviveOffers =>
  makeOffers()
// step into makeOffers, whose job is to dispatch tasks to the chosen executors
private def makeOffers() {
  // Make sure no executor is killed while some task is launching on it
  val taskDescs = withLock {
    // Filter out executors under killing
    val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
    val workOffers = activeExecutors.map {
      case (id, executorData) =>
        new WorkerOffer(id, executorData.executorHost, executorData.freeCores,
          Some(executorData.executorAddress.hostPort))
    }.toIndexedSeq
    scheduler.resourceOffers(workOffers)
  }
  if (!taskDescs.isEmpty) {
    // hand the task descriptions to the executors on the workers
    launchTasks(taskDescs)
  }
}
// step into launchTasks
private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
  ...
  // the core line: send a LaunchTask message to the executor side
  executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
}
// CoarseGrainedExecutorBackend's handling of LaunchTask
case LaunchTask(data) =>
  if (executor == null) {
    exitExecutor(1, "Received LaunchTask command but executor was null")
  } else {
    // decode the task description
    val taskDesc = TaskDescription.decode(data.value)
    logInfo("Got assigned task " + taskDesc.taskId)
    // run the task
    executor.launchTask(this, taskDesc)
  }
// step into launchTask: tasks run on a thread pool, created with newCachedThreadPool
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  val tr = new TaskRunner(context, taskDescription)
  runningTasks.put(taskDescription.taskId, tr)
  threadPool.execute(tr)
}