This article is based on the Spark 1.6 source code.
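The Worker class lives in the org.apache.spark.deploy.worker package. This article analyzes how the worker launches drivers and executors. As covered in the previous article, the master's schedule method calls launchDriver to start a driver: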
private def schedule(): Unit = {
  // If this master is not ALIVE (e.g. it is in standby), do nothing
  if (state != RecoveryState.ALIVE) { return }
  // Drivers take strict precedence over executors
  // Random.shuffle returns the workers in random order
  val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
  // For every ALIVE worker, walk the queue of waiting drivers; whenever the worker has enough
  // free memory and cores for a driver, launch the driver there and remove it from the queue
  for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
    for (driver <- waitingDrivers) {
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // Only drivers submitted in cluster deploy mode wait here; in client mode
        // the driver runs locally in the process that submitted the application
        launchDriver(worker, driver)
        waitingDrivers -= driver
      }
    }
  }
  startExecutorsOnWorkers() // executors are scheduled next (not covered here)
}
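To make the placement rule concrete, here is a minimal, self-contained Scala sketch of the driver loop in schedule(). WorkerRes, DriverReq and DriverPlacement are hypothetical simplifications of WorkerInfo and DriverInfo, not Spark classes. The sketch iterates over a snapshot of the waiting queue so that removing a driver inside the loop is well defined.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical, simplified stand-ins for Spark's WorkerInfo and DriverInfo
final case class WorkerRes(id: String, alive: Boolean, var memFree: Int, var coresFree: Int)
final case class DriverReq(id: String, mem: Int, cores: Int)

object DriverPlacement {
  /** Visit alive workers in random order and, on each one, launch every waiting
    * driver that still fits. Returns a map of driverId -> workerId. */
  def place(workers: Seq[WorkerRes], waiting: ArrayBuffer[DriverReq], rng: Random): Map[String, String] = {
    val placed = Map.newBuilder[String, String]
    for (w <- rng.shuffle(workers) if w.alive) {
      // Iterate over a snapshot so that removing from `waiting` is safe
      for (d <- waiting.toList) {
        if (w.memFree >= d.mem && w.coresFree >= d.cores) {
          // Reserve the resources so the next check on this worker sees less free capacity
          w.memFree -= d.mem
          w.coresFree -= d.cores
          placed += d.id -> w.id
          waiting -= d
        }
      }
    }
    placed.result()
  }
}
```

Because resources are reserved as soon as a driver is placed, a worker with 1024 MB and 2 cores accepts two 512 MB/1-core drivers and leaves a third one waiting.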
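The worker launches a driver only in cluster deploy mode: in client mode the driver runs on the client that submitted the application, while in cluster mode it runs as a separate process started by one of the workers. On the master side, launchDriver tells the chosen worker to start the driver: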
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  // Record the driver on the master's view of this worker (this also reserves its memory and cores)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // Send a LaunchDriver message telling the worker to start the driver
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  driver.state = DriverState.RUNNING
}
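The master-to-worker call is one-way: launchDriver marks the driver RUNNING as soon as send returns, without waiting for the worker to act. A minimal sketch of that fire-and-forget behavior, assuming a hypothetical Mailbox class in place of Spark's RPC endpoint reference and a hypothetical MasterModel in place of Master:

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical message modeled on LaunchDriver(driverId, driverDesc)
final case class LaunchDriverMsg(driverId: String, mem: Int, cores: Int)

// Hypothetical one-way endpoint: send() enqueues and returns immediately;
// the receiving side drains the queue on its own schedule
final class Mailbox {
  private val queue = new LinkedBlockingQueue[LaunchDriverMsg]()
  def send(msg: LaunchDriverMsg): Unit = queue.put(msg)
  def poll(): Option[LaunchDriverMsg] = Option(queue.poll())
  def pending: Int = queue.size
}

final class MasterModel {
  private var states = Map.empty[String, String]
  def stateOf(driverId: String): Option[String] = states.get(driverId)

  def launchDriver(worker: Mailbox, driverId: String, mem: Int, cores: Int): Unit = {
    worker.send(LaunchDriverMsg(driverId, mem, cores))
    // Set right after send returns: the worker may not have seen the message yet
    states += driverId -> "RUNNING"
  }
}
```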
The launchDriver method above belongs to the Master class. Next, let's look at how the worker handles the LaunchDriver message once the master has sent it:
case LaunchDriver(driverId, driverDesc) => {
  logInfo(s"Asked to launch driver $driverId")
  // Create a DriverRunner for the new driver
  val driver = new DriverRunner(
    conf,
    driverId,
    workDir,
    sparkHome,
    driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
    self,
    workerUri,
    securityMgr)
  // drivers is a HashMap in which the worker records the drivers it is running
  drivers(driverId) = driver
  // Start the DriverRunner; its thread does the actual launch work in run()
  driver.start()
  // Account for the memory and cores this driver uses on this worker
  coresUsed += driverDesc.cores
  memoryUsed += driverDesc.mem
}
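The worker's handler is one case of a pattern-matching receive block. The sketch below models that shape with hypothetical types (LaunchDriverMsg, FakeRunner, WorkerModel); FakeRunner.start only sets a flag, whereas the real DriverRunner.start spawns a thread.

```scala
import scala.collection.mutable

// Hypothetical message modeled on LaunchDriver(driverId, driverDesc)
final case class LaunchDriverMsg(driverId: String, mem: Int, cores: Int)

// Hypothetical stand-in for DriverRunner; start() only records that it was called
final class FakeRunner(val driverId: String) {
  @volatile var started = false
  def start(): Unit = started = true
}

final class WorkerModel {
  val drivers = mutable.HashMap.empty[String, FakeRunner]
  var coresUsed = 0
  var memoryUsed = 0

  // Same shape as an RPC endpoint's receive: a partial function over incoming messages
  val receive: PartialFunction[Any, Unit] = {
    case LaunchDriverMsg(id, mem, cores) =>
      val runner = new FakeRunner(id)
      drivers(id) = runner // remember which drivers this worker hosts
      runner.start()
      coresUsed += cores   // account for the resources the driver will use
      memoryUsed += mem
  }
}
```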
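The driver object created in the LaunchDriver handler is actually a DriverRunner. Its start method spawns a Java thread that does the setup work (creating the local working directory, downloading the application jar, and so on) and then calls launchDriver to start the driver process: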
private[worker] def start() = {
  // Create a Java thread that performs the launch
  new Thread("DriverRunner for " + driverId) {
    override def run() {
      try {
        // Create the driver's working directory
        val driverDir = createWorkingDirectory()
        // Download the user's jar (e.g. from HDFS) into that directory; returns its local path on the worker
        val localJarFilename = downloadUserJar(driverDir)
        def substituteVariables(argument: String): String = argument match {
          case "{{WORKER_URL}}" => workerUrl
          case "{{USER_JAR}}" => localJarFilename
          case other => other
        }
        // Build the ProcessBuilder for the driver command
        // TODO: If we add ability to submit multiple jars they should also be added here
        val builder = CommandUtils.buildProcessBuilder(driverDesc.command, securityManager,
          driverDesc.mem, sparkHome.getAbsolutePath, substituteVariables)
        // Hand the builder to launchDriver, which starts (and, if supervised, restarts) the process
        launchDriver(builder, driverDir, driverDesc.supervise)
      }
      catch {
        case e: Exception => finalException = Some(e)
      }
      // Derive the driver's final state
      val state =
        if (killed) {
          DriverState.KILLED
        } else if (finalException.isDefined) {
          DriverState.ERROR
        } else {
          finalExitCode match {
            case Some(0) => DriverState.FINISHED
            case _ => DriverState.FAILED
          }
        }
      finalState = Some(state)
      // Send a DriverStateChanged message to the worker that owns this driver; the worker handles it
      worker.send(DriverStateChanged(driverId, state, finalException))
    }
  }.start()
}
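Two pieces of start() are pure logic and easy to isolate: placeholder substitution in the command arguments, and the mapping from (killed, exception, exit code) to a final state. A sketch with a hypothetical DriverStateModel enumeration standing in for Spark's DriverState:

```scala
object DriverOutcome {
  // Hypothetical copy of the four final states start() can assign
  object DriverStateModel extends Enumeration {
    val KILLED, ERROR, FINISHED, FAILED = Value
  }
  import DriverStateModel._

  /** Same precedence as start(): a kill wins, then an exception, then the exit code. */
  def finalState(killed: Boolean, exception: Option[Exception], exitCode: Option[Int]): DriverStateModel.Value =
    if (killed) KILLED
    else if (exception.isDefined) ERROR
    else exitCode match {
      case Some(0) => FINISHED
      case _       => FAILED // non-zero exit code, or no exit code recorded
    }

  /** Like substituteVariables: only arguments that are exactly a placeholder are replaced. */
  def substitute(workerUrl: String, localJar: String)(argument: String): String = argument match {
    case "{{WORKER_URL}}" => workerUrl
    case "{{USER_JAR}}"   => localJar
    case other            => other
  }
}
```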
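DriverRunner.start contains two important calls: launchDriver and worker.send(DriverStateChanged). Start with launchDriver. It sets the working directory, defines an initialize function that redirects the driver process's stdout and stderr to files (prefixing stderr with the launch command), and then calls runCommandWithRetry: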
private def launchDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean) {
  builder.directory(baseDir)
  def initialize(process: Process): Unit = {
    // Redirect stdout and stderr to files
    val stdout = new File(baseDir, "stdout")
    CommandUtils.redirectStream(process.getInputStream, stdout)
    val stderr = new File(baseDir, "stderr")
    val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
    val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
    Files.append(header, stderr, UTF_8)
    CommandUtils.redirectStream(process.getErrorStream, stderr)
  }
  runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
}
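The stderr file therefore begins with a quoted copy of the launch command. The sketch below reproduces that header format and, as a swapped-in technique rather than what Spark does, uses java.lang.ProcessBuilder's own redirectOutput/redirectError instead of the stream-copying done by CommandUtils.redirectStream. DriverLogs is a hypothetical name.

```scala
import java.io.File
import java.lang.ProcessBuilder.Redirect
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, StandardOpenOption}

object DriverLogs {
  /** Quote each argument and append a 40-character rule, like the header written to stderr. */
  def header(command: Seq[String]): String =
    "Launch Command: %s\n%s\n\n".format(command.mkString("\"", "\" \"", "\""), "=" * 40)

  /** Swapped-in technique: let the OS write stdout/stderr to files directly.
    * The header is written first, and the child's stderr is appended after it. */
  def configure(command: Seq[String], baseDir: File): ProcessBuilder = {
    val stderr = new File(baseDir, "stderr")
    Files.write(stderr.toPath, header(command).getBytes(UTF_8),
      StandardOpenOption.CREATE, StandardOpenOption.APPEND)
    new ProcessBuilder(command: _*)
      .directory(baseDir)
      .redirectOutput(new File(baseDir, "stdout"))
      .redirectError(Redirect.appendTo(stderr))
  }
}
```

The builder is only configured here, not started; calling start() on it would launch the process with both files already wired up.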
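runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise) starts the driver as a separate OS process and waits for it to exit. It relaunches the process only when the driver is supervised, exited with a non-zero code, and was not killed; the delay between attempts doubles each time and is reset to 1 second after a run that lasted longer than 5 seconds: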
def runCommandWithRetry(
    command: ProcessBuilderLike, initialize: Process => Unit, supervise: Boolean): Unit = {
  // Time to wait between submission retries.
  var waitSeconds = 1
  // A run of this many seconds resets the exponential back-off.
  val successfulRunDuration = 5
  var keepTrying = !killed
  while (keepTrying) {
    logInfo("Launch Command: " + command.command.mkString("\"", "\" \"", "\""))
    synchronized {
      if (killed) { return }
      // Start the driver process and redirect its output
      process = Some(command.start())
      initialize(process.get)
    }
    val processStart = clock.getTimeMillis()
    // Block until the driver process exits
    val exitCode = process.get.waitFor()
    if (clock.getTimeMillis() - processStart > successfulRunDuration * 1000) {
      waitSeconds = 1
    }
    if (supervise && exitCode != 0 && !killed) {
      logInfo(s"Command exited with status $exitCode, re-launching after $waitSeconds s.")
      // Wait waitSeconds seconds before retrying
      sleeper.sleep(waitSeconds)
      // Double the wait time for the next attempt
      waitSeconds = waitSeconds * 2 // exponential back-off
    }
    // Retry only if the driver is supervised, exited with a non-zero code, and was not killed
    keepTrying = supervise && exitCode != 0 && !killed
    finalExitCode = Some(exitCode)
  }
}
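The supervision policy can be checked without starting real processes. This sketch re-implements the loop against a fake launcher that returns (exitCode, runtimeSeconds); SupervisedRetry is hypothetical, and the killed flag is left out for brevity.

```scala
object SupervisedRetry {
  /** Mirrors the retry loop without real processes (the `killed` flag is omitted).
    * `run` returns (exitCode, runtimeSeconds); `sleep` receives each back-off delay. */
  def runWithRetry(
      supervise: Boolean,
      run: () => (Int, Int),
      sleep: Int => Unit,
      successfulRunDuration: Int = 5): Int = {
    var waitSeconds = 1
    var keepTrying = true
    var lastExitCode = 0
    while (keepTrying) {
      val (exitCode, ranFor) = run()
      lastExitCode = exitCode
      // A run that lasted long enough counts as healthy and resets the back-off
      if (ranFor > successfulRunDuration) waitSeconds = 1
      if (supervise && exitCode != 0) {
        sleep(waitSeconds)
        waitSeconds *= 2 // exponential back-off
      }
      // Relaunch only a supervised driver that exited with a non-zero code
      keepTrying = supervise && exitCode != 0
    }
    lastExitCode
  }
}
```

With exit codes 1, 1, 1, 0 and short runs, the recorded delays are 1, 2 and 4 seconds. If the third run instead lasts longer than five seconds, the delay before the fourth attempt drops back to 1. An unsupervised driver runs exactly once.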
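At this point the driver is running. Next, consider the state-change notification mentioned earlier, i.e. how the worker is told when the driver's state changes. The worker handles the message sent by worker.send(DriverStateChanged(driverId, state, finalException)) by calling handleDriverStateChanged: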
case driverStateChanged @ DriverStateChanged(driverId, state, exception) => {
  handleDriverStateChanged(driverStateChanged)
}
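Looking into handleDriverStateChanged: once the worker learns that the driver has reached a final state, it forwards the message to the master, moves the driver to its cache of finished drivers, and releases the memory and cores the driver used on this worker: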
private[worker] def handleDriverStateChanged(driverStateChanged: DriverStateChanged): Unit = {
  val driverId = driverStateChanged.driverId
  val exception = driverStateChanged.exception
  val state = driverStateChanged.state
  state match {
    case DriverState.ERROR =>
      logWarning(s"Driver $driverId failed with unrecoverable exception: ${exception.get}")
    case DriverState.FAILED =>
      logWarning(s"Driver $driverId exited with failure")
    case DriverState.FINISHED =>
      logInfo(s"Driver $driverId exited successfully")
    case DriverState.KILLED =>
      logInfo(s"Driver $driverId was killed by user")
    case _ =>
      logDebug(s"Driver $driverId changed state to $state")
  }
  // Forward the driver's state change (finished, killed, failed, ...) to the master
  sendToMaster(driverStateChanged)
  val driver = drivers.remove(driverId).get
  // Move the driver into the cache of finished drivers
  finishedDrivers(driverId) = driver
  trimFinishedDriversIfNecessary()
  // Release the memory and cores the driver was using
  memoryUsed -= driver.driverDesc.mem
  coresUsed -= driver.driverDesc.cores
}
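A sketch of this worker-side bookkeeping, using a hypothetical WorkerBook class. The eviction rule for the finished-driver cache (drop the oldest tenth, at least one, once it exceeds its limit) is an assumption made for illustration, since the body of trimFinishedDriversIfNecessary is not shown above.

```scala
import scala.collection.mutable

// Hypothetical, trimmed-down record of a driver running on the worker
final case class RunningDriver(id: String, mem: Int, cores: Int)

final class WorkerBook(retainedDrivers: Int) {
  val drivers = mutable.LinkedHashMap.empty[String, RunningDriver]
  val finishedDrivers = mutable.LinkedHashMap.empty[String, RunningDriver]
  var memoryUsed = 0
  var coresUsed = 0

  def launch(d: RunningDriver): Unit = {
    drivers(d.id) = d
    memoryUsed += d.mem
    coresUsed += d.cores
  }

  /** Move the driver to the finished cache, cap that cache, and release its resources. */
  def onDriverFinished(id: String): Unit = {
    val d = drivers.remove(id).get
    finishedDrivers(id) = d
    if (finishedDrivers.size > retainedDrivers) {
      // Assumed policy: evict the oldest ~10% (at least one entry), in insertion order
      val evict = finishedDrivers.keys.take(math.max(finishedDrivers.size / 10, 1)).toList
      evict.foreach(k => finishedDrivers.remove(k))
    }
    memoryUsed -= d.mem
    coresUsed -= d.cores
  }
}
```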
When the master receives the driver state change forwarded by the worker, it removes the driver and its cached information, then calls schedule again to reassign resources:
case DriverStateChanged(driverId, state, exception) => {
  state match {
    // If the driver is in the ERROR, FINISHED, KILLED or FAILED state, remove it
    case DriverState.ERROR | DriverState.FINISHED | DriverState.KILLED | DriverState.FAILED =>
      removeDriver(driverId, state, exception)
    case _ =>
      throw new Exception(s"Received unexpected state update for driver $driverId: $state")
  }
}
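Only the four terminal states are accepted; anything else is treated as an unexpected update. A minimal sketch of that check, with a hypothetical DriverStateModel enumeration limited to the states that appear in this article:

```scala
object MasterDriverStates {
  // Hypothetical enumeration holding only the driver states that appear in this article
  object DriverStateModel extends Enumeration {
    val RUNNING, FINISHED, KILLED, FAILED, ERROR = Value
  }
  import DriverStateModel._

  private val terminal = Set(ERROR, FINISHED, KILLED, FAILED)

  /** Accepts a terminal state (where removeDriver would be called) and rejects the rest. */
  def onStateChanged(driverId: String, state: DriverStateModel.Value): DriverStateModel.Value =
    if (terminal.contains(state)) state
    else throw new IllegalStateException(
      s"Received unexpected state update for driver $driverId: $state")
}
```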
The removal itself was analyzed in an earlier article:
private def removeDriver(
    driverId: String,
    finalState: DriverState,
    exception: Option[Exception]) {
  // Look up the driver; if it is unknown, log a warning, otherwise remove it from drivers
  drivers.find(d => d.id == driverId) match {
    case Some(driver) =>
      logInfo(s"Removing driver: $driverId")
      // Remove the driver from the set of drivers
      drivers -= driver
      // Bound the history: when it is full, drop the oldest RETAINED_DRIVERS / 10 entries (at least one)
      if (completedDrivers.size >= RETAINED_DRIVERS) {
        val toRemove = math.max(RETAINED_DRIVERS / 10, 1)
        completedDrivers.trimStart(toRemove)
      }
      // Add the removed driver to the record of completed drivers
      completedDrivers += driver
      // Remove the driver's persisted state (used for master recovery)
      persistenceEngine.removeDriver(driver)
      // Record the driver's final state
      driver.state = finalState
      driver.exception = exception
      // Detach the driver from its worker, freeing that worker's resources in the master's view
      driver.worker.foreach(w => w.removeDriver(driver))
      schedule()
    case None =>
      logWarning(s"Asked to remove unknown driver: $driverId")
  }
}
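The retention rule for completedDrivers keeps the master's history bounded: once it reaches RETAINED_DRIVERS entries, the oldest tenth (at least one) is dropped before the new driver is appended. A generic sketch of the same rule; CompletedDriverHistory is hypothetical, and it uses ArrayBuffer.remove(0, n) where Spark calls trimStart.

```scala
import scala.collection.mutable.ArrayBuffer

object CompletedDriverHistory {
  /** When the buffer is full, drop the oldest max(retained / 10, 1) entries,
    * then append the newly completed driver. */
  def record[T](completed: ArrayBuffer[T], retained: Int, driver: T): Unit = {
    if (completed.size >= retained) {
      val toRemove = math.max(retained / 10, 1)
      completed.remove(0, math.min(toRemove, completed.size))
    }
    completed += driver
  }
}
```

With a limit of 20, recording drivers 1 through 25 leaves drivers 7 to 25: two entries are dropped each time the buffer is found full.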
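This completes the walk through the worker's launching and retrying of the driver, the tracking of its state changes and notification of the master, and the master's cache cleanup followed by another round of schedule.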
To summarize:
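A driver launch starts with the master's schedule, which sends a LaunchDriver message to a worker. The worker delegates the actual launch to a DriverRunner, which starts a thread that runs the driver process. When the driver finishes, the DriverRunner sends a state-change message to the worker, the worker forwards it to the master, and the master cleans up and runs schedule again. That completes one cycle.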
The worker schedules and launches executors in essentially the same way as drivers.