Spark ApplicationMaster Class & ApplicationMaster Object: Source-Code Walkthrough of the Spark AppMaster and Executor Startup
I have also uploaded the annotated source code to the Gitee (码云) platform to help readers follow along.
Object ApplicationMaster
This is the companion object of ApplicationMaster. When the AppMaster is launched inside YARN, execution starts from the main method of object ApplicationMaster.
First, let's look at the arguments passed into main at startup:
Seq(userClass) ++ userJar ++ primaryPyFile ++ primaryRFile ++ userArgs ++ Seq("--properties-file", "hdfs___spark_conf__.properties")
- userClass: the --class value, e.g. --class com.yyb.larn.main
- userJar: the --jar value, e.g. --jar hdfs://xx/yy.jar
- primaryPyFile: --primary-py-file None, the user's Python file
- primaryRFile: --primary-r-file None, the user's R file
- userArgs: the --arg values, i.e. the arguments of the user's own job
- --properties-file: a properties file that comes from a zip archive on HDFS, hdfs://user/userName/.sparkStaging/appID/spark_conf/hdfs___spark_conf__.properties, which contains the --conf settings the user passed to spark-submit. By the time the AppMaster runs, this file has already been localized into the container's working directory on the node.
The entire __spark_conf__.zip archive is distributed and downloaded by YARN itself.
Below is the content of the __spark_conf__.properties file:
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/lib/hadoop/lib/native
spark.yarn.jars=local\:/opt/cloudera/parcels/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658/lib/spark2/jars/*
spark.hadoop.mapreduce.application.classpath=
spark.sql.hive.metastore.jars=${env\:HADOOP_COMMON_HOME}/../hive/lib/*\:${env\:HADOOP_COMMON_HOME}/client/*
spark.executor.memory=10g
spark.yarn.cache.types=FILE,FILE,FILE
spark.master=yarn
spark.driver.memory=4g
spark.hadoop.yarn.application.classpath=
spark.authenticate=false
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/lib/hadoop/lib/native
spark.sql.catalogImplementation=hive
spark.submit.deployMode=cluster
spark.dynamicAllocation.enabled=true
spark.sql.hive.metastore.version=1.1.0
spark.app.name=com.saic.portrait.dw.to_dw.statistical.profileAggr.ToDWFactProfileAggr_up5_Job
spark.eventLog.enabled=true
spark.shuffle.service.port=7337
spark.yarn.dist.jars=hdfs\://nameservice1/user/center/script/jars/mongo-java-driver-3.4.2.jar,hdfs\://nameservice1/user/center/script/jars/amqp-client-4.2.0.jar
spark.yarn.cache.visibilities=PUBLIC,PUBLIC,PUBLIC
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.yarn.cache.timestamps=1571901062837,1571901062692,1571901061930
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.yarn.cache.filenames=hdfs\://nameservice1/user/center/script/jars/personportrait-1.0-SNAPSHOT.jar\#__app__.jar,hdfs\://nameservice1/user/center/script/jars/mongo-java-driver-3.4.2.jar\#mongo-java-driver-3.4.2.jar,hdfs\://nameservice1/user/center/script/jars/amqp-client-4.2.0.jar\#amqp-client-4.2.0.jar
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.cache.sizes=1504481,1673643,491199
spark.yarn.cache.confArchive=hdfs\://nameservice1/user/center/.sparkStaging/application_1566690562041_9626/__spark_conf__.zip
spark.eventLog.dir=hdfs\://nameservice1/user/spark/spark2ApplicationHistory
spark.executor.instances=5
spark.ui.killEnabled=true
spark.yarn.historyServer.address=http\://njtest-cdh5-nn02.nj\:18089
Now let's look at the main method:
def main(args: Array[String]): Unit = {
SignalUtils.registerLogger(log)
//Parse the incoming arguments; extraction is done via pattern matching
val amArgs = new ApplicationMasterArguments(args)
//See the "Class ApplicationMaster" section below
master = new ApplicationMaster(amArgs)
//master's run method is analyzed next
System.exit(master.run())
}
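For intuition, here is a simplified, hypothetical sketch of the pattern-matching style that Spark's argument parsers (such as ApplicationMasterArguments) use; it is not the actual Spark source, just the idea:
object ArgsParsingSketch {
  def main(args: Array[String]): Unit = {
    var userClass: String = null
    var userJar: String = null
    var propertiesFile: String = null
    val userArgs = scala.collection.mutable.ListBuffer[String]()
    var argv = args.toList
    while (argv.nonEmpty) {
      argv match {
        case "--class" :: value :: tail           => userClass = value;      argv = tail
        case "--jar" :: value :: tail             => userJar = value;        argv = tail
        case "--arg" :: value :: tail             => userArgs += value;      argv = tail
        case "--properties-file" :: value :: tail => propertiesFile = value; argv = tail
        case _ :: tail                            => argv = tail // the real parser reports an error here
      }
    }
    println(s"class=$userClass jar=$userJar propertiesFile=$propertiesFile userArgs=$userArgs")
  }
}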
Class ApplicationMaster
This is the companion class of ApplicationMaster.
First, let's look at its constructor (the field initializers):
private val isClusterMode = args.userClass != null
private val sparkConf = new SparkConf()
//--properties-file is normally non-null, so this sparkConf holds all of the user-related config
//See the __spark_conf__.properties entries listed above; note that by now the file is a local file
if (args.propertiesFile != null) {
Utils.getPropertiesFromFile(args.propertiesFile).foreach { case (k, v) =>
sparkConf.set(k, v)
}
}
private val securityMgr = new SecurityManager(sparkConf)
sparkConf.getAll.foreach { case (k, v) =>
//Copy every property in sparkConf into the JVM system properties, so that when the
//user's job runs it can read the config back from the system properties
//(SparkConf loads from system properties by default; see the small sketch after the constructor fields below)
//See also my other post: https://blog.csdn.net/u010374412/article/details/103038530
//Question: if several drivers ran on this host at the same time, could these properties conflict?
//No: java.lang.System properties exist exactly once per JVM instance, i.e. they are shared only inside a single JVM, and each driver runs in its own JVM
sys.props(k) = v
}
private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
private val ugi = {}
//Client used to connect to the ResourceManager so resources can be requested
private val client = doAsUser { new YarnRMClient() }
//Maximum number of executor failures tolerated
private val maxNumExecutorFailures = {
val effectiveNumExecutors =
if (Utils.isDynamicAllocationEnabled(sparkConf)) {
sparkConf.get(DYN_ALLOCATION_MAX_EXECUTORS)
} else {
sparkConf.get(EXECUTOR_INSTANCES).getOrElse(0)
}
// By default, effectiveNumExecutors is Int.MaxValue if dynamic allocation is enabled. We need
// avoid the integer overflow here.
val defaultMaxNumExecutorFailures = math.max(3,
if (effectiveNumExecutors > Int.MaxValue / 2) Int.MaxValue else (2 * effectiveNumExecutors))
sparkConf.get(MAX_EXECUTOR_FAILURES).getOrElse(defaultMaxNumExecutorFailures)
}
@volatile private var exitCode = 0
@volatile private var unregistered = false
@volatile private var finished = false
@volatile private var finalStatus = getDefaultFinalStatus
@volatile private var finalMsg: String = ""
@volatile private var userClassThread: Thread = _
@volatile private var reporterThread: Thread = _
@volatile private var allocator: YarnAllocator = _
// A flag to check whether user has initialized spark context
@volatile private var registered = false
private val userClassLoader = {}
private val allocatorLock = new Object()
private val heartbeatInterval ={}
private val initialAllocationInterval = {}
private var nextAllocationInterval = initialAllocationInterval
private var rpcEnv: RpcEnv = null
private val sparkContextPromise = Promise[SparkContext]()
private var credentialRenewer: AMCredentialRenewer = _
private val localResources = {}
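One detail worth making concrete is why copying the config into sys.props matters: a SparkConf created with its default constructor (loadDefaults = true) picks up every JVM system property whose key starts with "spark.". A minimal, standalone sketch (the object name is mine, not Spark's):
import org.apache.spark.SparkConf

object SysPropsSketch {
  def main(args: Array[String]): Unit = {
    // What the AM constructor does for every entry of __spark_conf__.properties:
    sys.props("spark.executor.memory") = "10g"
    // A SparkConf created later inside the user's job (loadDefaults = true by default)
    // automatically sees that value:
    val conf = new SparkConf()
    println(conf.get("spark.executor.memory")) // prints "10g"
  }
}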
The run method:
final def run(): Int = {
doAsUser {
//Delegates to runImpl
runImpl()
}
exitCode
}
The runImpl method:
private def runImpl(): Unit = {
try {//Get the ApplicationAttemptId of this AM container
val appAttemptId = client.getAttemptId()
var attemptID: Option[String] = None
if (isClusterMode) {
// Set the web ui port to be ephemeral for yarn so we don't conflict with
// other spark processes running on the same box
System.setProperty("spark.ui.port", "0")
// Set the master and deploy mode property to match the requested mode.
System.setProperty("spark.master", "yarn")
System.setProperty("spark.submit.deployMode", "cluster")
// Set this internal configuration if it is running on cluster mode, this
// configuration will be checked in SparkContext to avoid misuse of yarn cluster mode.
System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
attemptID = Option(appAttemptId.getAttemptId.toString)
}
new CallerContext(
"APPMASTER", sparkConf.get(APP_CALLER_CONTEXT),
Option(appAttemptId.getApplicationId.toString), attemptID).setCurrentContext()
logInfo("ApplicationAttemptId: " + appAttemptId)
// This shutdown hook should run *after* the SparkContext is shut down.
val priority = ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY - 1
ShutdownHookManager.addShutdownHook(priority) { () =>
val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
val isLastAttempt = client.getAttemptId().getAttemptId() >= maxAppAttempts
if (!finished) {
// The default state of ApplicationMaster is failed if it is invoked by shut down hook.
// This behavior is different compared to 1.x version.
// If user application is exited ahead of time by calling System.exit(N), here mark
// this application as failed with EXIT_EARLY. For a good shutdown, user shouldn't call
// System.exit(0) to terminate the application.
finish(finalStatus,
ApplicationMaster.EXIT_EARLY,
"Shutdown hook called before final status was reported.")
}
if (!unregistered) {
// we only want to unregister if we don't want the RM to retry
if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
unregister(finalStatus, finalMsg)
cleanupStagingDir()
}
}
}
// If the credentials file config is present, we must periodically renew tokens. So create
// a new AMDelegationTokenRenewer
if (sparkConf.contains(CREDENTIALS_FILE_PATH)) {
// Start a short-lived thread for AMCredentialRenewer, the only purpose is to set the
// classloader so that main jar and secondary jars could be used by AMCredentialRenewer.
val credentialRenewerThread = new Thread {
setName("AMCredentialRenewerStarter")
setContextClassLoader(userClassLoader)
override def run(): Unit = {
val credentialManager = new YARNHadoopDelegationTokenManager(
sparkConf,
yarnConf,
conf => YarnSparkHadoopUtil.hadoopFSsToAccess(sparkConf, conf))
val credentialRenewer =
new AMCredentialRenewer(sparkConf, yarnConf, credentialManager)
credentialRenewer.scheduleLoginFromKeytab()
}
}
credentialRenewerThread.start()
credentialRenewerThread.join()
}
if (isClusterMode) {
//This is the main path in cluster mode
runDriver()
} else {
runExecutorLauncher()
}
} catch {
case e: Exception =>
// catch everything else if not specifically handled
logError("Uncaught exception: ", e)
finish(FinalApplicationStatus.FAILED,
ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
"Uncaught exception: " + e)
}
}
runDriver:
private def runDriver(): Unit = {
addAmIpFilter(None)
//Start the user application
userClassThread = startUserApplication()
// This a bit hacky, but we need to wait until the spark.driver.port property has
// been set by the Thread executing the user class.
logInfo("Waiting for spark context initialization...")
val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
try {
//Wait for the user job thread to finish initializing the SparkContext.
//Once the sc is initialized (in yarn-cluster mode), YarnClusterScheduler.postStartHook calls
//the sparkContextInitialized method of the ApplicationMaster object, which
//hands the initialized sc to this ApplicationMaster instance's sparkContextPromise,
//so the sc awaited here is the one created on the user thread; by default the wait is at most 100 s.
//And when is YarnClusterScheduler.postStartHook invoked?
//In the SparkContext constructor, after _schedulerBackend and _taskScheduler are created, the scheduler's postStartHook() is called, which in yarn-cluster mode is YarnClusterScheduler.postStartHook. (A standalone sketch of this handshake follows right after runDriver.)
val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
Duration(totalWaitTime, TimeUnit.MILLISECONDS))
if (sc != null) {
//Get the RPC environment used for communication
rpcEnv = sc.env.rpcEnv
val driverRef = createSchedulerRef(
sc.getConf.get("spark.driver.host"),
sc.getConf.get("spark.driver.port"))
//Register the AppMaster
//This requests the executors' resources; after sc is instantiated the user thread is blocked, waiting for the resources requested here
registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl))
registered = true
} else {
// Sanity check; should never happen in normal operation, since sc should only be null
// if the user app did not create a SparkContext.
throw new IllegalStateException("User did not initialize spark context!")
}
//Wake up the user thread that was blocked just now
resumeDriver()
//Wait for the user thread to finish
userClassThread.join()
} catch {
case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
logError(
s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
"Please check earlier log output for errors. Failing the application.")
finish(FinalApplicationStatus.FAILED,
ApplicationMaster.EXIT_SC_NOT_INITED,
"Timed out waiting for SparkContext.")
} finally {
resumeDriver()
}
}
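To make the thread handshake easier to see, below is a minimal, self-contained sketch of the pattern (it is not Spark source): the "user thread" completes a Promise once its "SparkContext" is ready and then blocks; the "AM thread" awaits the Promise, does its registration and allocation work, wakes the user thread up again, and finally joins on it.
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object HandshakeSketch {
  private val sparkContextPromise = Promise[String]() // stands in for Promise[SparkContext]
  private val lock = new Object
  @volatile private var resumed = false

  def main(args: Array[String]): Unit = {
    val userThread = new Thread {
      override def run(): Unit = {
        sparkContextPromise.trySuccess("sc")               // like sparkContextInitialized(sc)
        lock.synchronized { while (!resumed) lock.wait() } // blocked until the AM resumes us
        println("user job continues: build RDDs, submit stages ...")
      }
    }
    userThread.setName("Driver")
    userThread.start()

    val sc = Await.result(sparkContextPromise.future, 100.seconds) // AM waits for sc (default 100 s)
    println(s"AM thread got '$sc'; now registerAM + allocateResources ...")
    lock.synchronized { resumed = true; lock.notifyAll() }          // like resumeDriver()
    userThread.join()                                               // AM waits for the user job to finish
  }
}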
startUserApplication:
private def startUserApplication(): Thread = {
logInfo("Starting the user application in a separate Thread")
var userArgs = args.userArgs //The user's own job arguments
//A Scala/Java application does not take this if branch
if (args.primaryPyFile != null && args.primaryPyFile.endsWith(".py")) {
// When running pyspark, the app is run using PythonRunner. The second argument is the list
// of files to add to PYTHONPATH, which Client.scala already handles, so it's empty.
userArgs = Seq(args.primaryPyFile, "") ++ userArgs
}
if (args.primaryRFile != null && args.primaryRFile.endsWith(".R")) {
// TODO(davies): add R dependencies here
}
//Locate the main method of the user class
val mainMethod = userClassLoader.loadClass(args.userClass)
.getMethod("main", classOf[Array[String]])
val userThread = new Thread {
override def run() {
try {
//Invoke main with the arguments; from here on the user's own job is running
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
} catch {
case e: InvocationTargetException =>
e.getCause match {
case _: InterruptedException =>
// Reporter thread can interrupt to stop user class
case SparkUserAppException(exitCode) =>
val msg = s"User application exited with status $exitCode"
logError(msg)
finish(FinalApplicationStatus.FAILED, exitCode, msg)
case cause: Throwable =>
logError("User class threw exception: " + cause, cause)
finish(FinalApplicationStatus.FAILED,
ApplicationMaster.EXIT_EXCEPTION_USER_CLASS,
"User class threw exception: " + StringUtils.stringifyException(cause))
}
sparkContextPromise.tryFailure(e.getCause())
} finally {
// Notify the thread waiting for the SparkContext, in case the application did not
// instantiate one. This will do nothing when the user code instantiates a SparkContext
// (with the correct master), or when the user code throws an exception (due to the
// tryFailure above).
sparkContextPromise.trySuccess(null)
}
}
}
userThread.setContextClassLoader(userClassLoader)
userThread.setName("Driver")
userThread.start()
userThread
}
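For reference, the user class loaded here is just an ordinary object with a main method. A hypothetical example (all names are made up) of what startUserApplication ends up invoking reflectively on the "Driver" thread:
object MyJob {
  def main(args: Array[String]): Unit = {
    // master/deploy-mode/app-id were already placed into system properties by the AM,
    // so a plain getOrCreate() is enough; constructing the SparkContext inside it is
    // what ultimately fulfils the AM's sparkContextPromise.
    val spark = org.apache.spark.sql.SparkSession.builder().appName("my-job").getOrCreate()
    spark.range(0, 100).count()
    spark.stop()
  }
}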
registerAM
This method requests and launches the executors. Let's look at the code in detail:
//This method runs on the driver side on the AM thread, not on the user thread
private def registerAM(
_sparkConf: SparkConf,
_rpcEnv: RpcEnv,
driverRef: RpcEndpointRef,
uiAddress: Option[String]) = {
val appId = client.getAttemptId().getApplicationId().toString() //appId
val attemptId = client.getAttemptId().getAttemptId().toString()
val historyAddress = ApplicationMaster
.getHistoryServerAddress(_sparkConf, yarnConf, appId, attemptId) //Spark History Server address
//Note: this uses the driver's host and port, and the endpoint name is CoarseGrainedScheduler
//This URL will be used later inside the executor
val driverUrl = RpcEndpointAddress(
_sparkConf.get("spark.driver.host"),
_sparkConf.get("spark.driver.port").toInt,
CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
logInfo {
val executorMemory = sparkConf.get(EXECUTOR_MEMORY).toInt
val executorCores = sparkConf.get(EXECUTOR_CORES)
val dummyRunner = new ExecutorRunnable(None, yarnConf, sparkConf, driverUrl, "<executorId>",
"<hostname>", executorMemory, executorCores, appId, securityMgr, localResources)
dummyRunner.launchContextDebugInfo()
}
//Register with the RM in preparation for requesting executor resources; returns a YarnAllocator
allocator = client.register(driverUrl,
driverRef,
yarnConf,
_sparkConf,
uiAddress,
historyAddress,
securityMgr,
localResources)
//Install the "YarnAM" RPC endpoint (AMEndpoint)
//so that after the executors finish their own startup, the driver can send commands through this endpoint
rpcEnv.setupEndpoint("YarnAM", new AMEndpoint(rpcEnv, driverRef))
//Start allocating executor resources; see the next section for details
allocator.allocateResources()
reporterThread = launchReporterThread()
}
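To see the shape of driverUrl: with hypothetical values spark.driver.host = 10.21.33.106 and spark.driver.port = 36521, the resulting string is spark://CoarseGrainedScheduler@10.21.33.106:36521, i.e. spark://&lt;endpoint name&gt;@&lt;driver host&gt;:&lt;driver port&gt;. This exact string is later handed to every executor as --driver-url, and the executor resolves it with setupEndpointRefByURI to find the driver's scheduler endpoint.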
YarnAllocator
Key methods:
//Start requesting executor resources
def allocateResources(): Unit = synchronized {
updateResourceRequests() //Submit/refresh the container requests to the RM
val progressIndicator = 0.1f
// Poll the ResourceManager. This doubles as a heartbeat if there are no pending container
// requests.
val allocateResponse = amClient.allocate(progressIndicator)
val allocatedContainers = allocateResponse.getAllocatedContainers()
if (allocatedContainers.size > 0) {
logDebug(("Allocated containers: %d. Current executor count: %d. " +
"Launching executor count: %d. Cluster resources: %s.")
.format(
allocatedContainers.size,
runningExecutors.size,
numExecutorsStarting.get,
allocateResponse.getAvailableResources))
//Initialize and launch executors
handleAllocatedContainers(allocatedContainers.asScala)
}
val completedContainers = allocateResponse.getCompletedContainersStatuses()
if (completedContainers.size > 0) {
logDebug("Completed %d containers".format(completedContainers.size))
processCompletedContainers(completedContainers.asScala)
logDebug("Finished processing %d completed containers. Current running executor count: %d."
.format(completedContainers.size, runningExecutors.size))
}
}
//Request as many containers as needed to reach the target number of executors; nothing is launched yet
def updateResourceRequests(): Unit = {
val pendingAllocate = getPendingAllocate
val numPendingAllocate = pendingAllocate.size
//Compute how many more executors need to be requested (a small worked example follows after this method)
val missing = targetNumExecutors - numPendingAllocate -
numExecutorsStarting.get - runningExecutors.size
logDebug(s"Updating resource requests, target: $targetNumExecutors, " +
s"pending: $numPendingAllocate, running: ${runningExecutors.size}, " +
s"executorsStarting: ${numExecutorsStarting.get}")
if (missing > 0) {//Usually this branch is taken
logInfo(s"Will request $missing executor container(s), each with " +
s"${resource.getVirtualCores} core(s) and " +
s"${resource.getMemory} MB memory (including $memoryOverhead MB of overhead)")
// Split the pending container request into three groups: locality matched list, locality
// unmatched list and non-locality list. Take the locality matched container request into
// consideration of container placement, treat as allocated containers.
// For locality unmatched and locality free container requests, cancel these container
// requests, since required locality preference has been changed, recalculating using
// container placement strategy.
val (localRequests, staleRequests, anyHostRequests) = splitPendingAllocationsByLocality(
hostToLocalTaskCounts, pendingAllocate)
// cancel "stale" requests for locations that are no longer needed
staleRequests.foreach { stale =>
amClient.removeContainerRequest(stale)
}
val cancelledContainers = staleRequests.size
if (cancelledContainers > 0) {
logInfo(s"Canceled $cancelledContainers container request(s) (locality no longer needed)")
}
// consider the number of new containers and cancelled stale containers available
val availableContainers = missing + cancelledContainers
// to maximize locality, include requests with no locality preference that can be cancelled
val potentialContainers = availableContainers + anyHostRequests.size
val containerLocalityPreferences = containerPlacementStrategy.localityOfRequestedContainers(
potentialContainers, numLocalityAwareTasks, hostToLocalTaskCounts,
allocatedHostToContainersMap, localRequests)
val newLocalityRequests = new mutable.ArrayBuffer[ContainerRequest]
containerLocalityPreferences.foreach {
case ContainerLocalityPreferences(nodes, racks) if nodes != null =>
newLocalityRequests += createContainerRequest(resource, nodes, racks)
case _ =>
}
if (availableContainers >= newLocalityRequests.size) {
// more containers are available than needed for locality, fill in requests for any host
for (i <- 0 until (availableContainers - newLocalityRequests.size)) {
newLocalityRequests += createContainerRequest(resource, null, null)
}
} else {
val numToCancel = newLocalityRequests.size - availableContainers
// cancel some requests without locality preferences to schedule more local containers
anyHostRequests.slice(0, numToCancel).foreach { nonLocal =>
amClient.removeContainerRequest(nonLocal)
}
if (numToCancel > 0) {
logInfo(s"Canceled $numToCancel unlocalized container requests to resubmit with locality")
}
}
newLocalityRequests.foreach { request =>
amClient.addContainerRequest(request)
}
if (log.isInfoEnabled()) {
val (localized, anyHost) = newLocalityRequests.partition(_.getNodes() != null)
if (anyHost.nonEmpty) {
logInfo(s"Submitted ${anyHost.size} unlocalized container requests.")
}
localized.foreach { request =>
logInfo(s"Submitted container request for host ${hostStr(request)}.")
}
}
} else if (numPendingAllocate > 0 && missing < 0) {
val numToCancel = math.min(numPendingAllocate, -missing)
logInfo(s"Canceling requests for $numToCancel executor container(s) to have a new desired " +
s"total $targetNumExecutors executors.")
val matchingRequests = amClient.getMatchingRequests(RM_REQUEST_PRIORITY, ANY_HOST, resource)
if (!matchingRequests.isEmpty) {
matchingRequests.iterator().next().asScala
.take(numToCancel).foreach(amClient.removeContainerRequest)
} else {
logWarning("Expected to find pending requests, but found none.")
}
}
}
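A quick worked example of the missing arithmetic (the numbers are made up): with a target of 5 executors, 1 request still pending at the RM, 1 container currently starting, and 2 executors already running, only one additional container is requested:
val targetNumExecutors = 5
val numPendingAllocate = 1
val numExecutorsStarting = 1
val runningExecutorsSize = 2
val missing = targetNumExecutors - numPendingAllocate - numExecutorsStarting - runningExecutorsSize
// missing == 1, so exactly one new ContainerRequest is added; if missing were negative,
// that many pending requests would be cancelled instead.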
//Initialize and launch the executors
def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)
// Match incoming requests by host
val remainingAfterHostMatches = new ArrayBuffer[Container]
for (allocatedContainer <- allocatedContainers) {
matchContainerToRequest(allocatedContainer, allocatedContainer.getNodeId.getHost,
containersToUse, remainingAfterHostMatches)
}
// Match remaining by rack
val remainingAfterRackMatches = new ArrayBuffer[Container]
for (allocatedContainer <- remainingAfterHostMatches) {
val rack = resolver.resolve(conf, allocatedContainer.getNodeId.getHost)
matchContainerToRequest(allocatedContainer, rack, containersToUse,
remainingAfterRackMatches)
}
// Assign remaining that are neither node-local nor rack-local
val remainingAfterOffRackMatches = new ArrayBuffer[Container]
for (allocatedContainer <- remainingAfterRackMatches) {
matchContainerToRequest(allocatedContainer, ANY_HOST, containersToUse,
remainingAfterOffRackMatches)
}
if (!remainingAfterOffRackMatches.isEmpty) {
logDebug(s"Releasing ${remainingAfterOffRackMatches.size} unneeded containers that were " +
s"allocated to us")
for (container <- remainingAfterOffRackMatches) {
internalReleaseContainer(container)
}
}
//Control then moves to this method
runAllocatedContainers(containersToUse)
logInfo("Received %d containers from YARN, launching executors on %d of them."
.format(allocatedContainers.size, containersToUse.size))
}
//Launch all of the executors
private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = {
for (container <- containersToUse) {
executorIdCounter += 1
val executorHostname = container.getNodeId.getHost
val containerId = container.getId
val executorId = executorIdCounter.toString
assert(container.getResource.getMemory >= resource.getMemory)
logInfo(s"Launching container $containerId on host $executorHostname " +
s"for executor with ID $executorId")
def updateInternalState(): Unit = synchronized {
runningExecutors.add(executorId)
numExecutorsStarting.decrementAndGet()
executorIdToContainer(executorId) = container
containerIdToExecutorId(container.getId) = executorId
val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
new HashSet[ContainerId])
containerSet += containerId
allocatedContainerToHostMap.put(containerId, executorHostname)
}
if (runningExecutors.size() < targetNumExecutors) {
numExecutorsStarting.incrementAndGet()
if (launchContainers) {
//Run the ExecutorRunnable code on the ContainerLauncher thread pool (max pool size 25; a rough stand-in sketch follows after this method)
launcherPool.execute(new Runnable {
override def run(): Unit = {
try {//Control moves to the ExecutorRunnable class; see the next section for details
new ExecutorRunnable(
Some(container),
conf,
sparkConf,
driverUrl,
executorId,
executorHostname,
executorMemory,
executorCores,
appAttemptId.getApplicationId.toString,
securityMgr,
localResources
).run()
updateInternalState()
} catch {
case e: Throwable =>
numExecutorsStarting.decrementAndGet()
if (NonFatal(e)) {
logError(s"Failed to launch executor $executorId on container $containerId", e)
// Assigned container should be released immediately
// to avoid unnecessary resource occupation.
amClient.releaseAssignedContainer(containerId)
} else {
throw e
}
}
}
})
} else {
// For test only
updateInternalState()
}
} else {
logInfo(("Skip launching executorRunnable as running executors count: %d " +
"reached target executors count: %d.").format(
runningExecutors.size, targetNumExecutors))
}
}
}
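As a rough stand-in for the launcherPool used above (this is not the actual Spark field, just a sketch), each allocated container's ExecutorRunnable.run() is submitted to a bounded pool, so at most spark.yarn.containerLauncherMaxThreads (default 25) NodeManager start requests are in flight at once:
import java.util.concurrent.Executors

// Hypothetical stand-in for YarnAllocator's launcherPool:
val launcherPool = Executors.newFixedThreadPool(25)
launcherPool.execute(new Runnable {
  override def run(): Unit = {
    // in the real code, new ExecutorRunnable(...).run() happens here
  }
})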
ExecutorRunnable
This class is responsible for assembling the executor launch command.
//Entry method
def run(): Unit = {
logDebug("Starting Executor Container")
nmClient = NMClient.createNMClient()
nmClient.init(conf)
nmClient.start()
//The main work happens in this method
startContainer()
}
def startContainer(): java.util.Map[String, ByteBuffer] = {
val ctx = Records.newRecord(classOf[ContainerLaunchContext])
.asInstanceOf[ContainerLaunchContext]
val env = prepareEnvironment().asJava
ctx.setLocalResources(localResources.asJava)
ctx.setEnvironment(env)
val credentials = UserGroupInformation.getCurrentUser().getCredentials()
val dob = new DataOutputBuffer()
credentials.writeTokenStorageToStream(dob)
ctx.setTokens(ByteBuffer.wrap(dob.getData()))
//Assemble the executor launch command
val commands = prepareCommand()
ctx.setCommands(commands.asJava)
ctx.setApplicationACLs(
YarnSparkHadoopUtil.getApplicationAclsForYarn(securityMgr).asJava)
// If external shuffle service is enabled, register with the Yarn shuffle service already
// started on the NodeManager and, if authentication is enabled, provide it with our secret
// key for fetching shuffle files later
if (sparkConf.get(SHUFFLE_SERVICE_ENABLED)) {
val secretString = securityMgr.getSecretKey()
val secretBytes =
if (secretString != null) {
// This conversion must match how the YarnShuffleService decodes our secret
JavaUtils.stringToBytes(secretString)
} else {
// Authentication is not enabled, so just provide dummy metadata
ByteBuffer.allocate(0)
}
ctx.setServiceData(Collections.singletonMap("spark_shuffle", secretBytes))
}
// Send the start request to the ContainerManager
try {
//Start this container
nmClient.startContainer(container.get, ctx)
} catch {
case ex: Exception =>
throw new SparkException(s"Exception while starting container ${container.get.getId}" +
s" on host $hostname", ex)
}
}
//Assemble the executor launch command
private def prepareCommand(): List[String] = {
// Extra options for the JVM
val javaOpts = ListBuffer[String]()
// Set the environment variable through a command prefix
// to append to the existing value of the variable
var prefixEnv: Option[String] = None
// Set the JVM memory
val executorMemoryString = executorMemory + "m"
javaOpts += "-Xmx" + executorMemoryString
// Set extra Java options for the executor, if defined
sparkConf.get(EXECUTOR_JAVA_OPTIONS).foreach { opts =>
javaOpts ++= Utils.splitCommandString(opts).map(YarnSparkHadoopUtil.escapeForShell)
}
sparkConf.get(EXECUTOR_LIBRARY_PATH).foreach { p =>
prefixEnv = Some(Client.getClusterPath(sparkConf, Utils.libraryPathEnvPrefix(Seq(p))))
}
javaOpts += "-Djava.io.tmpdir=" +
new Path(Environment.PWD.$$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)
// Certain configs need to be passed here because they are needed before the Executor
// registers with the Scheduler and transfers the spark configs. Since the Executor backend
// uses RPC to connect to the scheduler, the RPC settings are needed as well as the
// authentication settings.
sparkConf.getAll
.filter { case (k, v) => SparkConf.isExecutorStartupConf(k) }
.foreach { case (k, v) => javaOpts += YarnSparkHadoopUtil.escapeForShell(s"-D$k=$v") }
// Commenting it out for now - so that people can refer to the properties if required. Remove
// it once cpuset version is pushed out.
// The context is, default gc for server class machines end up using all cores to do gc - hence
// if there are multiple containers in same node, spark gc effects all other containers
// performance (which can also be other spark containers)
// Instead of using this, rely on cpusets by YARN to enforce spark behaves 'properly' in
// multi-tenant environments. Not sure how default java gc behaves if it is limited to subset
// of cores on a node.
/*
else {
// If no java_opts specified, default to using -XX:+CMSIncrementalMode
// It might be possible that other modes/config is being done in
// spark.executor.extraJavaOptions, so we don't want to mess with it.
// In our expts, using (default) throughput collector has severe perf ramifications in
// multi-tenant machines
// The options are based on
// http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#0.0.0.%20When%20to%20Use
// %20the%20Concurrent%20Low%20Pause%20Collector|outline
javaOpts += "-XX:+UseConcMarkSweepGC"
javaOpts += "-XX:+CMSIncrementalMode"
javaOpts += "-XX:+CMSIncrementalPacing"
javaOpts += "-XX:CMSIncrementalDutyCycleMin=0"
javaOpts += "-XX:CMSIncrementalDutyCycle=10"
}
*/
// For log4j configuration to reference
javaOpts += ("-Dspark.yarn.app.container.log.dir=" + ApplicationConstants.LOG_DIR_EXPANSION_VAR)
val userClassPath = Client.getUserClasspath(sparkConf).flatMap { uri =>
val absPath =
if (new File(uri.getPath()).isAbsolute()) {
Client.getClusterPath(sparkConf, uri.getPath())
} else {
Client.buildPath(Environment.PWD.$(), uri.getPath())
}
Seq("--user-class-path", "file:" + absPath)
}.toSeq
YarnSparkHadoopUtil.addOutOfMemoryErrorArgument(javaOpts)
//The launch command
val commands = prefixEnv ++
Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
javaOpts ++
Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
"--driver-url", masterAddress,
"--executor-id", executorId,
"--hostname", hostname,
"--cores", executorCores.toString,
"--app-id", appId) ++
userClassPath ++
Seq(
s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")
// TODO: it would be nicer to just make sure there are no null commands here
commands.map(s => if (s == null) "null" else s).toList
}
object CoarseGrainedExecutorBackend & class CoarseGrainedExecutorBackend
Once the Spark AppMaster has been granted containers and launches the executors, we arrive at the CoarseGrainedExecutorBackend companion class and object. This is the main class that runs inside an executor.
Its responsibilities include:
- Registering the executor with the driver.
- Handling a successful registration with the driver.
- Handling a failed registration with the driver.
- Running the tasks assigned by the driver.
- Killing running tasks.
- Stopping the executor.
- Shutting down the executor.
object CoarseGrainedExecutorBackend
Let's look at the code in detail:
private[spark] object CoarseGrainedExecutorBackend extends Logging {
//After the arguments have been parsed (in main below), this starts the executor
private def run(
driverUrl: String,
executorId: String,
hostname: String,
cores: Int,
appId: String,
workerUrl: Option[String],
userClassPath: Seq[URL]) {
Utils.initDaemon(log)
SparkHadoopUtil.get.runAsSparkUser { () =>
// Debug code
Utils.checkHost(hostname)
// Bootstrap to fetch the driver's Spark properties.
val executorConf = new SparkConf
/**
 * The Netty RPC environment is ready at this point,
 * but this fetcher exists only to retrieve the driver's Spark config;
 * it is shut down right afterwards
 */
val fetcher: RpcEnv = RpcEnv.create(
"driverPropsFetcher",
hostname, //hostname is this machine's own hostname
-1,
executorConf,
new SecurityManager(executorConf),
clientMode = true)
// setupEndpointRefByURI verifies that the host in driverUrl already has the RpcEndpointVerifier endpoint installed
//If the host address is valid, that host is guaranteed to have RpcEndpointVerifier, because a NettyRpcEnv always installs RpcEndpointVerifier first
val driver: RpcEndpointRef = fetcher.setupEndpointRefByURI(driverUrl)
//Use this ref to talk to the driver; the corresponding endpoint was installed on the driver side by org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
//The driver-side handling is in org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend line 231
val cfg: SparkAppConfig = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig)//Fetch the driver's config entries starting with "spark."; note both sides use the SparkAppConfig type
val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
fetcher.shutdown()
//Only now is the executor's own Netty RPC environment created
// Create SparkEnv using properties we fetched from the driver.
val driverConf = new SparkConf()
for ((key, value) <- props) {
// this is required for SSL in standalone mode
if (SparkConf.isExecutorStartupConf(key)) {
driverConf.setIfMissing(key, value)
} else {
driverConf.set(key, value)
}
}
if (driverConf.contains("spark.yarn.credentials.file")) {
logInfo("Will periodically update credentials from: " +
driverConf.get("spark.yarn.credentials.file"))
Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
.getMethod("startCredentialUpdater", classOf[SparkConf])
.invoke(null, driverConf)
}
cfg.hadoopDelegationCreds.foreach { tokens =>
SparkHadoopUtil.get.addDelegationTokens(tokens, driverConf)
}
//Create the communication environment (NettyRpcEnv)
val env = SparkEnv.createExecutorEnv(
driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)
//The executor installs the "Executor" endpoint,
//i.e. starts the CoarseGrainedExecutorBackend endpoint
//From here we can look directly at the CoarseGrainedExecutorBackend endpoint
env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
workerUrl.foreach { url =>
env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
}
env.rpcEnv.awaitTermination()
if (driverConf.contains("spark.yarn.credentials.file")) {
Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
.getMethod("stopCredentialUpdater")
.invoke(null)
}
}
}
//Entry point when a yarn-cluster executor starts up
def main(args: Array[String]) {
var driverUrl: String = null //Passed in when the driver assembled the executor launch command
var executorId: String = null //Assigned from YarnAllocator's executorIdCounter, monotonically increasing
var hostname: String = null //The hostname of this machine
var cores: Int = 0
var appId: String = null
var workerUrl: Option[String] = None
val userClassPath = new mutable.ListBuffer[URL]()
var argv = args.toList
while (!argv.isEmpty) {
argv match {
case ("--driver-url") :: value :: tail =>
driverUrl = value
argv = tail
case ("--executor-id") :: value :: tail =>
executorId = value
argv = tail
case ("--hostname") :: value :: tail =>
hostname = value
argv = tail
case ("--cores") :: value :: tail =>
cores = value.toInt
argv = tail
case ("--app-id") :: value :: tail =>
appId = value
argv = tail
case ("--worker-url") :: value :: tail =>
// Worker url is used in spark standalone mode to enforce fate-sharing with worker
workerUrl = Some(value)
argv = tail
case ("--user-class-path") :: value :: tail =>
userClassPath += new URL(value)
argv = tail
case Nil =>
case tail =>
// scalastyle:off println
System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
// scalastyle:on println
printUsageAndExit()
}
}
//Validate the arguments
if (driverUrl == null || executorId == null || hostname == null || cores <= 0 ||
appId == null) {
printUsageAndExit() //Invalid arguments: print the usage message and exit with exitCode = 1
}
//Hand off to the run method
run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
System.exit(0)
}
private def printUsageAndExit() = {
// scalastyle:off println
System.err.println(
"""
|Usage: CoarseGrainedExecutorBackend [options]
|
| Options are:
| --driver-url <driverUrl>
| --executor-id <executorId>
| --hostname <hostname>
| --cores <cores>
| --app-id <appid>
| --worker-url <workerUrl>
| --user-class-path <url>
|""".stripMargin)
// scalastyle:on println
System.exit(1)
}
}
class CoarseGrainedExecutorBackend
Now the class itself:
/**
* This companion class and its object are the main classes
* of an executor that the driver launches on YARN
* @param rpcEnv
* @param driverUrl
* @param executorId
* @param hostname
* @param cores
* @param userClassPath
* @param env
*/
private[spark] class CoarseGrainedExecutorBackend(
override val rpcEnv: RpcEnv,
driverUrl: String,
executorId: String,
hostname: String,
cores: Int,
userClassPath: Seq[URL],
env: SparkEnv)
extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging { //Note: this is itself an RPC endpoint
private[this] val stopping = new AtomicBoolean(false)
var executor: Executor = null
@volatile var driver: Option[RpcEndpointRef] = None
// If this CoarseGrainedExecutorBackend is changed to support multiple threads, then this may need
// to be changed so that we don't share the serializer instance across threads
private[this] val ser: SerializerInstance = env.closureSerializer.newInstance()
//Initialization callback
override def onStart() {
/**
* val driverUrl = RpcEndpointAddress(
* _sparkConf.get("spark.driver.host"),
* _sparkConf.get("spark.driver.port").toInt,
* CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
*/
//Note: the driver URL points at the CoarseGrainedSchedulerBackend endpoint, so the driver-side ref that handles what follows is
//the DriverEndpoint inner class of CoarseGrainedSchedulerBackend
logInfo("Connecting to driver: " + driverUrl)
val x: Future[RpcEndpointRef] = rpcEnv.asyncSetupEndpointRefByURI(driverUrl)
rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
// This is a very fast action so we can use "ThreadUtils.sameThread"
driver = Some(ref)
//Register this executor with the driver; the handling logic is in the DriverEndpoint inner class of CoarseGrainedSchedulerBackend
//Note the serialization of RegisterExecutor: it carries executorRef: RpcEndpointRef, so serializing an RpcEndpointRef is involved, which here means serializing a NettyRpcEndpointRef
//See the NettyRpcEndpointRef class definition: only the endpointAddress field is serialized, and the class overrides readObject and writeObject
ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
}(ThreadUtils.sameThread).onComplete {
// This is a very fast action so we can use "ThreadUtils.sameThread"
//Asynchronously wait for the reply here
case Success(msg) =>
// Always receive `true`. Just ignore it
case Failure(e) => //Registration failed: exit with exitCode = 1 (notifyDriver could be set to true to inform the driver; here it is false)
exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
}(ThreadUtils.sameThread)
}
def extractLogUrls: Map[String, String] = {
val prefix = "SPARK_LOG_URL_"
sys.env.filterKeys(_.startsWith(prefix))
.map(e => (e._1.substring(prefix.length).toLowerCase(Locale.ROOT), e._2))
}
//Handle one-way messages from the driver
override def receive: PartialFunction[Any, Unit] = {
//Handling once the driver confirms that registration succeeded
case RegisteredExecutor =>
logInfo("Successfully registered with driver")
try {
executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false) //Instantiate the Executor object
} catch {
case NonFatal(e) =>
exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
}
//Response to an explicit registration-failure message from the driver: this executor exits, and normally the driver is also told to remove it
case RegisterExecutorFailed(message) =>
exitExecutor(1, "Slave registration failed: " + message)
//Run a task assigned by the driver
case LaunchTask(data) =>
if (executor == null) {
exitExecutor(1, "Received LaunchTask command but executor was null")
} else {
val taskDesc = TaskDescription.decode(data.value)
logInfo("Got assigned task " + taskDesc.taskId)
executor.launchTask(this, taskDesc)
}
//Handle a kill-task request from the driver
case KillTask(taskId, _, interruptThread, reason) =>
if (executor == null) {
exitExecutor(1, "Received KillTask command but executor was null")
} else {
executor.killTask(taskId, interruptThread, reason)
}
case StopExecutor =>
stopping.set(true)
logInfo("Driver commanded a shutdown")
// Cannot shutdown here because an ack may need to be sent back to the caller. So send
// a message to self to actually do the shutdown.
self.send(Shutdown) //Send the Shutdown message to this endpoint itself (not to the driver), so the ack for StopExecutor can still be returned first
case Shutdown =>
stopping.set(true)
new Thread("CoarseGrainedExecutorBackend-stop-executor") {
override def run(): Unit = {
// executor.stop() will call `SparkEnv.stop()` which waits until RpcEnv stops totally.
// However, if `executor.stop()` runs in some thread of RpcEnv, RpcEnv won't be able to
// stop until `executor.stop()` returns, which becomes a dead-lock (See SPARK-14180).
// Therefore, we put this line in a new thread.
executor.stop()
}
}.start()
case UpdateDelegationTokens(tokenBytes) =>
logInfo(s"Received tokens of ${tokenBytes.length} bytes")
SparkHadoopUtil.get.addDelegationTokens(tokenBytes, env.conf)
}
override def onDisconnected(remoteAddress: RpcAddress): Unit = {
if (stopping.get()) {
logInfo(s"Driver from $remoteAddress disconnected during shutdown")
} else if (driver.exists(_.address == remoteAddress)) {
exitExecutor(1, s"Driver $remoteAddress disassociated! Shutting down.", null,
notifyDriver = false)
} else {
logWarning(s"An unknown ($remoteAddress) driver disconnected.")
}
}
//Send a task status update message to the driver
override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
val msg = StatusUpdate(executorId, taskId, state, data)
driver match {
case Some(driverRef) => driverRef.send(msg)
case None => logWarning(s"Drop $msg because has not yet connected to driver")
}
}
/**
* This function can be overloaded by other child classes to handle
* executor exits differently. For e.g. when an executor goes down,
* back-end may not want to take the parent process down.
*/
protected def exitExecutor(code: Int,
reason: String,
throwable: Throwable = null,
notifyDriver: Boolean = true) = {
val message = "Executor self-exiting due to : " + reason
if (throwable != null) {
logError(message, throwable)
} else {
logError(message)
}
if (notifyDriver && driver.nonEmpty) {
driver.get.send(RemoveExecutor(executorId, new ExecutorLossReason(reason)))
}
System.exit(code)
}
}
Summary
In Spark yarn-cluster mode, two threads run on the driver side:
1. The AppMaster thread
2. The user thread
The user thread is started by the AppMaster thread, and the two threads coordinate with each other.
After the AppMaster thread starts the user thread, it blocks and waits for the user thread to finish initializing the SparkContext, which establishes spark.driver.host, spark.driver.port (the port is passed in as 0 at startup, so the system assigns a random one) and other settings. Once the SparkContext is ready, the AppMaster thread is woken up and the user thread is blocked; the AppMaster thread completes registerAM and requests resources, then wakes the user thread so it can continue; finally the AppMaster thread waits for the user thread to finish.
3. The executor launch command:
val commands = prefixEnv ++
Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
javaOpts ++
Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
"--driver-url", masterAddress,
"--executor-id", executorId,
"--hostname", hostname,
"--cores", executorCores.toString,
"--app-id", appId) ++
userClassPath ++
Seq(
s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")