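1. Introduction
In the previous post we covered Application registration. Once registration completes, Executors must be started on the appropriate Workers to run the dispatched tasks, so this post walks through the Executor launch flow. The source version is Spark 2.4.0.
2. Code walkthrough
After a Worker, Driver, or Application is registered, the Master calls its schedule() method: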
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Drivers take strict precedence over executors
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // Launch Executors
  startExecutorsOnWorkers()
}
We covered the code above in an earlier post: it mainly launches waiting Drivers on specific Workers. The focus of this post is the last line, startExecutorsOnWorkers(). Here is that method:
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps) {
    // For each waiting Application, read the number of cores per Executor from its
    // ApplicationDescription (defaulting to 1 when it is not set)
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
    // If the cores left is less than the coresPerExecutor, the cores left will not be allocated
    if (app.coresLeft >= coresPerExecutor) {
      // Filter out workers that don't have enough resources to launch an executor:
      // keep only ALIVE Workers whose free memory and free cores are both sufficient
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor)
        .sortBy(_.coresFree).reverse
      // Compute how many cores to assign on each usable Worker
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
      // Now that we've decided how many cores to allocate on each worker, let's allocate them
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
      }
    }
  }
}
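The code above does the following:
1. Iterates over the queue of waiting Applications.
2. Filters for Workers that meet the resource requirements.
3. Computes the resources to assign on each selected Worker.
4. Performs the resource allocation.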
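The filtering step (item 2) can be tried in isolation. The following is a minimal sketch, not Spark's code: the Worker case class and its field names are hypothetical stand-ins for WorkerInfo, and only the filter and sort logic mirrors the method above.

```scala
object UsableWorkersSketch {
  // Hypothetical, simplified stand-in for Spark's WorkerInfo.
  final case class Worker(id: String, alive: Boolean, coresFree: Int, memoryFreeMb: Int)

  /** Keep alive workers that can host at least one executor, most free cores first. */
  def usableWorkers(
      workers: Seq[Worker],
      coresPerExecutor: Int,
      memoryPerExecutorMb: Int): Seq[Worker] =
    workers
      .filter(_.alive)
      .filter(w => w.memoryFreeMb >= memoryPerExecutorMb && w.coresFree >= coresPerExecutor)
      .sortBy(_.coresFree)
      .reverse

  def main(args: Array[String]): Unit = {
    val workers = Seq(
      Worker("w1", alive = true, coresFree = 4, memoryFreeMb = 4096),
      Worker("w2", alive = false, coresFree = 16, memoryFreeMb = 8192), // not alive
      Worker("w3", alive = true, coresFree = 8, memoryFreeMb = 512),    // too little memory
      Worker("w4", alive = true, coresFree = 8, memoryFreeMb = 2048))
    // Prints List(w4, w1)
    println(usableWorkers(workers, coresPerExecutor = 2, memoryPerExecutorMb = 1024).map(_.id))
  }
}
```

Sorting by free cores in descending order means the Workers with the most spare capacity are considered first when cores are handed out.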
Next, let's see how the per-Worker allocation is computed, in scheduleExecutorsOnWorkers:
private def scheduleExecutorsOnWorkers(
    app: ApplicationInfo,
    usableWorkers: Array[WorkerInfo],
    spreadOutApps: Boolean): Array[Int] = {
  // Cores per Executor, if the Application specified it
  val coresPerExecutor = app.desc.coresPerExecutor
  // Minimum number of cores assigned to an Executor at a time
  val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
  // If coresPerExecutor is not set, each Worker hosts a single Executor for this app
  val oneExecutorPerWorker = coresPerExecutor.isEmpty
  // Memory per Executor
  val memoryPerExecutor = app.desc.memoryPerExecutorMB
  // Number of usable Workers
  val numUsable = usableWorkers.length
  val assignedCores = new Array[Int](numUsable) // Number of cores to give to each worker
  val assignedExecutors = new Array[Int](numUsable) // Number of new executors on each worker
  // Total number of cores still to be assigned
  var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
  /** Return whether the specified worker can launch an executor for this app. */
  def canLaunchExecutor(pos: Int): Boolean = {
    // There are still at least minCoresPerExecutor cores left to assign
    val keepScheduling = coresToAssign >= minCoresPerExecutor
    // This Worker still has enough unassigned free cores
    val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor
    // If we allow multiple executors per worker, then we can always launch new executors.
    // Otherwise, if there is already an executor on this worker, just give it more cores.
    val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
    if (launchingNewExecutor) {
      // Memory already assigned on this Worker in this scheduling round
      val assignedMemory = assignedExecutors(pos) * memoryPerExecutor
      // This Worker still has enough memory for one more Executor
      val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor
      // The Application's Executor limit has not been reached yet
      val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit
      keepScheduling && enoughCores && enoughMemory && underLimit
    } else {
      // We're adding cores to an existing executor, so no need
      // to check memory and executor limits
      keepScheduling && enoughCores
    }
  }
  // Keep launching executors until no more workers can accommodate any
  // more executors, or if we have reached this application's limits
  // Select the Workers that can currently launch an Executor
  var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
  while (freeWorkers.nonEmpty) {
    freeWorkers.foreach { pos =>
      var keepScheduling = true
      while (keepScheduling && canLaunchExecutor(pos)) {
        // Update the assigned core counts
        coresToAssign -= minCoresPerExecutor
        assignedCores(pos) += minCoresPerExecutor
        // If we are launching one executor per worker, then every iteration assigns 1 core
        // to the executor. Otherwise, every iteration assigns cores to a new executor.
        // With one Executor per Worker the Executor count stays at 1;
        // otherwise it is incremented for each new Executor.
        if (oneExecutorPerWorker) {
          assignedExecutors(pos) = 1
        } else {
          assignedExecutors(pos) += 1
        }
        // Spreading out an application means spreading out its executors across as
        // many workers as possible. If we are not spreading out, then we should keep
        // scheduling executors on this worker until we use all of its resources.
        // Otherwise, just move on to the next worker.
        if (spreadOutApps) {
          keepScheduling = false
        }
      }
    }
    freeWorkers = freeWorkers.filter(canLaunchExecutor)
  }
  assignedCores
}
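The code above does the following:
1. Computes the inputs: the cores per Executor, the memory per Executor, and per-Worker arrays that track assigned cores and new Executors.
2. Assigns cores to each eligible Worker, either spreading Executors across as many Workers as possible (when spreadOutApps is true) or filling one Worker before moving on to the next.
3. Returns an array holding the number of cores assigned on each Worker.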
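To see the difference between the two strategies, here is a stripped-down sketch of the assignment loop. It is a simplification under stated assumptions: every Executor needs exactly coresPerExecutor cores, and the memory and Executor-limit checks are left out. The object and method names are hypothetical and are not part of Spark.

```scala
object SpreadOutSketch {
  /**
   * Simplified core-assignment loop. `coresFree(i)` is the free core count of worker i,
   * `coresWanted` is the number of cores the application still needs.
   */
  def assignCores(
      coresFree: Array[Int],
      coresWanted: Int,
      coresPerExecutor: Int,
      spreadOut: Boolean): Array[Int] = {
    val assigned = new Array[Int](coresFree.length)
    var coresToAssign = math.min(coresWanted, coresFree.sum)

    // A worker qualifies while cores remain to hand out and it has room for one more executor.
    def canLaunch(pos: Int): Boolean =
      coresToAssign >= coresPerExecutor &&
        coresFree(pos) - assigned(pos) >= coresPerExecutor

    var freeWorkers = coresFree.indices.filter(canLaunch)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          coresToAssign -= coresPerExecutor
          assigned(pos) += coresPerExecutor
          // When spreading out, take one executor's worth of cores and move on.
          if (spreadOut) keepScheduling = false
        }
      }
      freeWorkers = freeWorkers.filter(canLaunch)
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    val free = Array(8, 8, 8)
    // Spread out: 12 cores in units of 2 land evenly, giving 4, 4, 4
    println(assignCores(free, coresWanted = 12, coresPerExecutor = 2, spreadOut = true).toList)
    // Not spread out: the first worker is filled before the next, giving 8, 4, 0
    println(assignCores(free, coresWanted = 12, coresPerExecutor = 2, spreadOut = false).toList)
  }
}
```

With three Workers of 8 free cores each and 12 cores requested in units of 2, spreading out yields 4 cores per Worker, while the non-spread mode uses all 8 cores of the first Worker and 4 of the second.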
With the per-Worker resources computed, we go back to the method that actually performs the allocation, allocateWorkerResourceToExecutors:
private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  // Number of Executors to start on this Worker
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  // Number of cores given to each Executor on this Worker
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    // Call addExecutor to add an Executor to the Application
    val exec = app.addExecutor(worker, coresToAssign)
    // Launch the Executor
    launchExecutor(worker, exec)
    // Set the Application's state to RUNNING
    app.state = ApplicationState.RUNNING
  }
}
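The code above does the following:
1. Computes how many Executors to start on this Worker.
2. Determines how many cores each Executor gets.
3. For each Executor, calls app.addExecutor to add the Executor's information to the Application.
4. Calls launchExecutor to start the Executor.
5. Sets the Application's state to RUNNING.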
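The arithmetic in steps 1 and 2 is easy to check on its own. This small sketch reuses the two expressions from the method above; the wrapper object and the split function are hypothetical names added only for illustration.

```scala
object ExecutorSplitSketch {
  /** Returns (number of executors, cores per executor) for one worker. */
  def split(assignedCores: Int, coresPerExecutor: Option[Int]): (Int, Int) = {
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    (numExecutors, coresToAssign)
  }

  def main(args: Array[String]): Unit = {
    // coresPerExecutor set to 2: 8 assigned cores become 4 executors with 2 cores each
    println(split(8, Some(2))) // (4,2)
    // coresPerExecutor not set: a single executor takes all 8 cores
    println(split(8, None)) // (1,8)
  }
}
```

Because scheduleExecutorsOnWorkers hands out cores in multiples of coresPerExecutor, the integer division here leaves no remainder when the value is set.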
Next, let's see how an Executor is started, in launchExecutor:
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  // Record the Executor to be launched on the Worker
  worker.addExecutor(exec)
  // Send the Worker a message telling it to launch the Executor
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  // Tell the Driver that an Executor has been added
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
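In launchExecutor, the Master first records the Executor on the Worker, then sends a LaunchExecutor message to the Worker, and finally sends an ExecutorAdded message to the Driver to tell it that the Executor has been added.
Next, let's see how the Worker handles the LaunchExecutor message: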
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  // Only accept the request if it comes from the currently active Master
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
  } else {
    try {
      logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
      // Create the executor's working directory
      val executorDir = new File(workDir, appId + "/" + execId)
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }
      // Create local dirs for the executor. These are passed to the executor via the
      // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
      // application finishes.
      val appLocalDirs = appDirectories.getOrElse(appId, {
        val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
        val dirs = localRootDirs.flatMap { dir =>
          try {
            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
            Utils.chmod700(appDir)
            Some(appDir.getAbsolutePath())
          } catch {
            case e: IOException =>
              logWarning(s"${e.getMessage}. Ignoring this directory.")
              None
          }
        }.toSeq
        if (dirs.isEmpty) {
          throw new IOException("No subfolder can be created in " +
            s"${localRootDirs.mkString(",")}.")
        }
        dirs
      })
      appDirectories(appId) = appLocalDirs
      // Wrap the information received from the Master in an ExecutorRunner
      val manager = new ExecutorRunner(
        appId,
        execId,
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs, ExecutorState.RUNNING)
      executors(appId + "/" + execId) = manager
      // Start the Executor
      manager.start()
      // Update the Worker's used cores and memory
      coresUsed += cores_
      memoryUsed += memory_
      // Report the Executor's state change to the Master
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
  }
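The code above does the following:
1. Checks that the message came from the currently active Master.
2. Creates the Executor's working directory and the Application's local directories.
3. Wraps the information sent by the Master in an ExecutorRunner, which manages the execution of one Executor process.
4. Calls the ExecutorRunner's start method.
5. Updates the Worker's used cores and memory.
6. Sends a message to the Master reporting the Executor's current state.
Now look at the start method: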
private[worker] def start() {
  // Create a thread that runs the Executor process
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }
  // Start the thread
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = ShutdownHookManager.addShutdownHook { () =>
    // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
    // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
    if (state == ExecutorState.RUNNING) {
      // Still marked RUNNING at shutdown: record it as FAILED before killing the process
      state = ExecutorState.FAILED
    }
    killProcess(Some("Worker shutting down")) }
}
Next, the fetchAndRunExecutor method:
private def fetchAndRunExecutor() {
  try {
    // Launch the process
    val subsOpts = appDesc.command.javaOpts.map {
      Utils.substituteAppNExecIds(_, appId, execId.toString)
    }
    val subsCommand = appDesc.command.copy(javaOpts = subsOpts)
    // Build the ProcessBuilder that holds the launch command
    val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    logInfo(s"Launch command: $formattedCommand")
    // Set the working directory of the process
    builder.directory(executorDir)
    // Set environment variables
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")
    // Add webUI log urls
    val baseUrl =
      if (conf.getBoolean("spark.ui.reverseProxy", false)) {
        s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
      } else {
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      }
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")
    // Start the process
    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)
    // Redirect its stdout and stderr to files
    // Standard output goes to the "stdout" file
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    // Standard error goes to the "stderr" file
    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code
    val exitCode = process.waitFor()
    // The process has exited, so mark the Executor as EXITED
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    // Notify the Worker of the Executor's state change
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException =>
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    case e: Exception =>
      logError("Error running executor", e)
      // Mark the Executor as FAILED and kill the process
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
  }
}
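This method does the following:
1. Builds a ProcessBuilder, which is used to run a command or script locally.
2. Sets the ProcessBuilder's working directory to executorDir, the Executor working directory that the Worker created.
3. Sets environment variables on the ProcessBuilder.
4. Starts the ProcessBuilder to spawn the process.
5. Redirects the process's standard output to a file.
6. Redirects the process's standard error to a file.
7. Waits for the Executor process to exit, sets the state to EXITED, and sends the Worker an ExecutorStateChanged message carrying the exit code.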
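The same ProcessBuilder pattern can be sketched outside Spark. The example below is an illustration under assumptions, not Spark's implementation: it uses ProcessBuilder's built-in redirectOutput and redirectError instead of Spark's FileAppender, the object and method names are hypothetical, and the demo in main assumes a java launcher exists under the running JVM's java.home.

```scala
import java.io.File
import java.nio.file.Files

object ProcessLaunchSketch {
  /** Runs `command` in `workDir`, sends stdout/stderr to files there, returns the exit code. */
  def runAndWait(command: Seq[String], workDir: File, env: Map[String, String]): Int = {
    val builder = new ProcessBuilder(command: _*)
    // Working directory of the child process
    builder.directory(workDir)
    // Extra environment variables for the child process
    env.foreach { case (k, v) => builder.environment.put(k, v) }
    // Swapped-in technique: let ProcessBuilder write the streams straight to files
    builder.redirectOutput(new File(workDir, "stdout"))
    builder.redirectError(new File(workDir, "stderr"))
    val process = builder.start()
    // Block until the child exits and hand back its exit code
    process.waitFor()
  }

  def main(args: Array[String]): Unit = {
    val workDir = Files.createTempDirectory("executor-sketch").toFile
    // Assumption: the JVM running this sketch ships a `java` launcher in java.home/bin
    val java = new File(new File(System.getProperty("java.home"), "bin"), "java").getPath
    val exitCode = runAndWait(Seq(java, "-version"), workDir, Map("SPARK_LAUNCH_WITH_SCALA" -> "0"))
    println("Command exited with code " + exitCode)
  }
}
```

Blocking on waitFor is why ExecutorRunner runs this logic on its own thread: the calling thread stays parked for the whole lifetime of the child process.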
That completes the Executor launch flow.