1. Introduction
Source version: spark-2.4.0
Spark is one of the mainstream frameworks for big-data processing today, and it has attracted many people to study it. I am no exception: I have long been curious about what Spark actually does, under the hood, when it runs the programs we write. So, starting with this post, I will walk through Spark's execution flow in detail. Partly this is to consolidate what I have learned, and partly it is to throw out a few ideas in the hope that more people will write about Spark.
This is the first post in the series. Cluster startup is described here using Standalone mode as the example. Corrections and comments are welcome.
As we all know, after setting up a Spark environment we must first start the whole cluster before we can submit programs, run them, and get results. To do so we run start-all.sh from the $SPARK_HOME/sbin/ directory. That command really consists of three parts: spark-config.sh, which loads the configuration; start-master.sh, which starts the Master; and start-slaves.sh, which starts the Workers.
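The three-step flow can be seen in a simplified, runnable sketch of start-all.sh. This is my own paraphrase of the Spark 2.4.0 script, with the delegated scripts stubbed out by echo so the control flow can run anywhere; the real script resolves SPARK_HOME from its own location and actually sources and executes these files:

```shell
#!/usr/bin/env bash
# Sketch of start-all.sh's control flow (paraphrased, delegated scripts stubbed).
SPARK_HOME="${SPARK_HOME:-/opt/spark}"             # 1. resolve SPARK_HOME if unset
echo "using SPARK_HOME=$SPARK_HOME"
echo "sourcing $SPARK_HOME/sbin/spark-config.sh"   # 2. load shared configuration
echo "running $SPARK_HOME/sbin/start-master.sh"    # 3. start the Master on this host
echo "running $SPARK_HOME/sbin/start-slaves.sh"    # 4. start a Worker on each host in conf/slaves
```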
2. Starting the Master
Starting the Master just means invoking the start-master.sh script, so let's look at its contents.
#Starts the master on the machine this script is executed on.
# First check whether SPARK_HOME is set
if [ -z "${SPARK_HOME}" ]; then
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
#NOTE: This exact class name is matched downstream by SparkSubmit.
#Any changes need to be reflected there.
CLASS="org.apache.spark.deploy.master.Master"
if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
echo "Usage: ./sbin/start-master.sh [options]"
pattern="Usage:"
pattern+="\|Using Spark's default log4j profile:"
pattern+="\|Registered signal handlers for"
"${SPARK_HOME}"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
exit 1
fi
ORIGINAL_ARGS="$@"
# Load the configuration
. "${SPARK_HOME}/sbin/spark-config.sh"
# Load the environment variables
. "${SPARK_HOME}/bin/load-spark-env.sh"
# The Master's default port is 7077
if [ "$SPARK_MASTER_PORT" = "" ]; then
SPARK_MASTER_PORT=7077
fi
if [ "$SPARK_MASTER_HOST" = "" ]; then
case `uname` in
(SunOS)
SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
;;
(*)
SPARK_MASTER_HOST="`hostname -f`"
;;
esac
fi
if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
SPARK_MASTER_WEBUI_PORT=8080
fi
# Launch CLASS, i.e. org.apache.spark.deploy.master.Master, via spark-daemon.sh
"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
--host $SPARK_MASTER_HOST --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT \
$ORIGINAL_ARGS
So start-master.sh really does two things: it loads the configuration and environment, and then it launches org.apache.spark.deploy.master.Master. Let's step into that class, starting with the main function:
def main(argStrings: Array[String]) {
Thread.setDefaultUncaughtExceptionHandler(new SparkUncaughtExceptionHandler(
exitOnUncaughtException = false))
Utils.initDaemon(log)
//Create the SparkConf
val conf = new SparkConf
//Parse the Master's command-line arguments
val args = new MasterArguments(argStrings, conf)
//Create the RpcEnv and start the underlying RPC service so the Master can talk to other endpoints
val (rpcEnv, _, _) = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, conf)
//Process messages as they arrive; block while waiting
rpcEnv.awaitTermination()
}
The key work in main is creating the RpcEnv, i.e. starting a NettyRpcEnv, and registering the Master with it to form the Master endpoint. Next, let's step into the startRpcEnvAndEndpoint method:
def startRpcEnvAndEndpoint(
host: String,
port: Int,
webUiPort: Int,
conf: SparkConf): (RpcEnv, Int, Option[Int]) = {
//Create the SecurityManager
val securityMgr = new SecurityManager(conf)
//Create the RpcEnv (concretely, a NettyRpcEnv)
val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
//Register with the NettyRpcEnv, producing the Master endpoint
val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME,
new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
val portsResponse = masterEndpoint.askSync[BoundPortsResponse](BoundPortsRequest)
(rpcEnv, portsResponse.webUIPort, portsResponse.restPort)
}
This method does two main things:
1. It creates the RpcEnv communication environment; the concrete RPC implementation is NettyRpcEnv.
2. The Master registers with the NettyRpcEnv by calling setupEndpoint, which builds the Master endpoint from the given arguments. This touches on Spark's RPC mechanism, which I won't cover in this post; it will get its own post later. As an endpoint instance, the Master follows the usual lifecycle: onStart -> receive / receiveAndReply -> onStop.
With that, the Master is up and running.
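The lifecycle just described can be illustrated with a toy endpoint. This is a simplified model of my own, not Spark's actual RpcEndpoint trait (the real one lives in org.apache.spark.rpc and is driven by a dispatcher that delivers messages to it):

```scala
// A minimal model of the endpoint lifecycle: onStart -> receive -> onStop.
trait SimpleEndpoint {
  def onStart(): Unit = {}                 // called once after registration
  def receive: PartialFunction[Any, Unit]  // handles messages delivered via send()
  def onStop(): Unit = {}                  // called once on shutdown
}

class EchoEndpoint extends SimpleEndpoint {
  override def onStart(): Unit = println("endpoint started")
  override def receive: PartialFunction[Any, Unit] = {
    case msg: String => println(s"received: $msg")
  }
  override def onStop(): Unit = println("endpoint stopped")
}

object LifecycleDemo extends App {
  val e = new EchoEndpoint
  e.onStart()         // lifecycle step 1
  e.receive("hello")  // lifecycle step 2: message delivery
  e.onStop()          // lifecycle step 3
}
```

In Spark, both Master and Worker follow this shape; the dispatcher invokes onStart after setupEndpoint and routes incoming messages to receive or receiveAndReply.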
3. Starting the Worker
The Worker is started through a chain of scripts, start-slaves.sh -> slaves.sh -> start-slave.sh, which finally launches org.apache.spark.deploy.worker.Worker.
Let's step into the org.apache.spark.deploy.worker.Worker class and look at its main function first:
def main(argStrings: Array[String]) {
Thread.setDefaultUncaughtExceptionHandler(new SparkUncaughtExceptionHandler(
exitOnUncaughtException = false))
Utils.initDaemon(log)
//Create the SparkConf
val conf = new SparkConf
//Parse the Worker's command-line arguments
val args = new WorkerArguments(argStrings, conf)
//Create the RpcEnv and register the Worker endpoint so it can communicate with other endpoints
val rpcEnv = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, args.cores,
args.memory, args.masters, args.workDir, conf = conf)
// With external shuffle service enabled, if we request to launch multiple workers on one host,
// we can only successfully launch the first worker and the rest fails, because with the port
// bound, we may launch no more than one external shuffle service on each host.
// When this happens, we should give explicit reason of failure instead of fail silently. For
// more detail see SPARK-20989.
val externalShuffleServiceEnabled = conf.get(config.SHUFFLE_SERVICE_ENABLED)
val sparkWorkerInstances = scala.sys.env.getOrElse("SPARK_WORKER_INSTANCES", "1").toInt
require(externalShuffleServiceEnabled == false || sparkWorkerInstances <= 1,
"Starting multiple workers on one host is failed because we may launch no more than one " +
"external shuffle service on each host, please set spark.shuffle.service.enabled to " +
"false or set SPARK_WORKER_INSTANCES to 1 to resolve the conflict.")
//Block while waiting for messages
rpcEnv.awaitTermination()
}
As you can see, the startup flow mirrors the Master's: create an RpcEnv, register with it to form an endpoint, and get back an endpoint reference used to communicate with other endpoints. Next, into startRpcEnvAndEndpoint:
def startRpcEnvAndEndpoint(
host: String,
port: Int,
webUiPort: Int,
cores: Int,
memory: Int,
masterUrls: Array[String],
workDir: String,
workerNumber: Option[Int] = None,
conf: SparkConf = new SparkConf): RpcEnv = {
// The LocalSparkCluster runs multiple local sparkWorkerX RPC Environments
val systemName = SYSTEM_NAME + workerNumber.map(_.toString).getOrElse("")
//Create the SecurityManager
val securityMgr = new SecurityManager(conf)
//Create the RpcEnv from the given arguments (again, concretely a NettyRpcEnv)
val rpcEnv = RpcEnv.create(systemName, host, port, conf, securityMgr)
//Register the Worker endpoint
val masterAddresses = masterUrls.map(RpcAddress.fromSparkURL(_))
rpcEnv.setupEndpoint(ENDPOINT_NAME, new Worker(rpcEnv, webUiPort, cores, memory,
masterAddresses, ENDPOINT_NAME, workDir, conf, securityMgr))
rpcEnv
}
4. Worker Registration with the Master
The Worker is also an endpoint instance, and its lifecycle is the same as the Master's: onStart -> receive / receiveAndReply -> onStop. After starting, the Worker still has to communicate with the Master; that is, it needs to register itself with the Master.
Step into the onStart method:
override def onStart() {
//This worker is assumed not to have registered with the Master yet
assert(!registered)
logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
host, port, cores, Utils.megabytesToString(memory)))
logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
logInfo("Spark home: " + sparkHome)
//Create the Worker's working directory
createWorkDir()
//Start the external shuffle service
startExternalShuffleService()
//Create the Worker web UI
webUi = new WorkerWebUI(this, workDir, webUiPort)
//Bind the web UI port
webUi.bind()
workerWebUiUrl = s"http://$publicAddress:${webUi.boundPort}"
//Register with the Master
registerWithMaster()
//Register the worker source with the metrics system
metricsSystem.registerSource(workerSource)
//Start the metrics system
metricsSystem.start()
// Attach the worker metrics servlet handler to the web ui after the metrics system is started.
metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
}
The code above sets up the Worker's runtime pieces; the most important call is registerWithMaster, which registers the Worker with the Master so the two can communicate normally. Here is the registerWithMaster method:
private def registerWithMaster() {
// onDisconnected may be triggered multiple times, so don't attempt registration
// if there are outstanding registration attempts scheduled.
registrationRetryTimer match {
case None =>
//Not yet registered
registered = false
//Try to register with every Master
registerMasterFutures = tryRegisterAllMasters()
//Number of connection attempts so far
connectionAttemptCount = 0
//Periodically retry registration; the Worker gives up once the retry limit is exceeded
registrationRetryTimer = Some(forwordMessageScheduler.scheduleAtFixedRate(
new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
Option(self).foreach(_.send(ReregisterWithMaster))
}
},
INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
TimeUnit.SECONDS))
case Some(_) =>
logInfo("Not spawning another attempt to register with the master, since there is an" +
" attempt scheduled already.")
}
}
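The bounded-retry scheduling above can be modeled in isolation. The sketch below is a toy of my own: the counter and limit stand in for connectionAttemptCount and Spark's retry limit, and it uses the same java.util.concurrent scheduler API that forwordMessageScheduler wraps (in Spark, exhausting the retries ultimately terminates the Worker process rather than just shutting down the scheduler):

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Toy model of periodic registration retries with an upper bound.
object RetryDemo extends App {
  val maxAttempts = 3          // stands in for Spark's registration retry limit
  var attempts = 0             // stands in for connectionAttemptCount
  val scheduler = Executors.newSingleThreadScheduledExecutor()

  val task: Runnable = () => {
    attempts += 1
    println(s"registration attempt $attempts")
    if (attempts >= maxAttempts) {
      println("giving up: all retries exhausted")
      scheduler.shutdown()     // in Spark this path ends the Worker process
    }
  }
  // Fire at a fixed interval, like forwordMessageScheduler.scheduleAtFixedRate
  scheduler.scheduleAtFixedRate(task, 0, 100, TimeUnit.MILLISECONDS)
  scheduler.awaitTermination(5, TimeUnit.SECONDS)
}
```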
Now into the tryRegisterAllMasters method:
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
//Iterate over all Master RPC addresses
masterRpcAddresses.map { masterAddress =>
registerMasterThreadPool.submit(new Runnable {
override def run(): Unit = {
try {
logInfo("Connecting to master " + masterAddress + "...")
//Create an RpcEndpointRef for this Master address, used to send messages to that endpoint
val masterEndpoint = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
//Send the registration message to the Master through the ref
sendRegisterMessageToMaster(masterEndpoint)
} catch {
case ie: InterruptedException => // Cancelled
case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
}
}
})
}
}
The code above iterates over all the Master addresses, creates a corresponding RpcEndpointRef for each, and then sends each Master a registration message.
Now into the sendRegisterMessageToMaster method:
private def sendRegisterMessageToMaster(masterEndpoint: RpcEndpointRef): Unit = {
//Send a RegisterWorker message to the Master
masterEndpoint.send(RegisterWorker(
workerId,
host,
port,
self,
cores,
memory,
workerWebUiUrl,
masterEndpoint.address))
}
The Worker sends the Master a registration message carrying its own parameters along with the Master's address. Because the message is delivered with send, a one-way call with no reply expected, the Master handles it by pattern matching in its receive method. Let's see how the Master processes it: step into the case RegisterWorker branch of Master's receive method.
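The difference between the two delivery paths can be shown in miniature. This toy sketch is mine, not Spark's dispatcher: a one-way send is routed to receive, while a blocking askSync, the call used earlier for BoundPortsRequest, is routed to receiveAndReply and hands the reply back to the caller:

```scala
// Toy model of an endpoint's two message-handling paths.
class ToyMaster {
  // Handles one-way messages delivered via send(); nothing is returned.
  def receive: PartialFunction[Any, Unit] = {
    case ("RegisterWorker", id: String) => println(s"registering worker $id")
  }
  // Handles request-reply messages delivered via ask()/askSync(); the
  // return value flows back to the asker.
  def receiveAndReply: PartialFunction[Any, Any] = {
    case "BoundPortsRequest" => ("webUIPort", 8080)
  }
}

object SendVsAsk extends App {
  val master = new ToyMaster
  master.receive(("RegisterWorker", "worker-1"))          // fire-and-forget path
  val reply = master.receiveAndReply("BoundPortsRequest") // request-reply path
  println(s"reply: $reply")
}
```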
case RegisterWorker(
id, workerHost, workerPort, workerRef, cores, memory, workerWebUiUrl, masterAddress) =>
logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
workerHost, workerPort, cores, Utils.megabytesToString(memory)))
//First check the Master's state; if it is STANDBY, reply MasterInStandby to the worker
if (state == RecoveryState.STANDBY) {
workerRef.send(MasterInStandby)
//If this worker ID already exists, reply to the worker that registration failed
} else if (idToWorker.contains(id)) {
workerRef.send(RegisterWorkerFailed("Duplicate worker ID"))
} else {
//Otherwise register normally: wrap the worker's details in a WorkerInfo
val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
workerRef, workerWebUiUrl)
//Now attempt the actual registration
if (registerWorker(worker)) {
//Persist the worker with the persistence engine
persistenceEngine.addWorker(worker)
//Tell the worker that registration succeeded
workerRef.send(RegisteredWorker(self, masterWebUiUrl, masterAddress))
//Finally, schedule resources
schedule()
} else {
val workerAddress = worker.endpoint.address
logWarning("Worker registration failed. Attempted to re-register worker at same " +
"address: " + workerAddress)
workerRef.send(RegisterWorkerFailed("Attempted to re-register worker at same address: "
+ workerAddress))
}
}
Now into the registerWorker method:
private def registerWorker(worker: WorkerInfo): Boolean = {
// There may be one or more refs to dead workers on this same node (w/ different ID's),
// remove them.
//Remove workers on the same host and port that are already in the DEAD state
workers.filter { w =>
(w.host == worker.host && w.port == worker.port) && (w.state == WorkerState.DEAD)
}.foreach { w =>
workers -= w
}
val workerAddress = worker.endpoint.address
//If a worker is already registered at this address, deal with the old entry first
if (addressToWorker.contains(workerAddress)) {
val oldWorker = addressToWorker(workerAddress)
if (oldWorker.state == WorkerState.UNKNOWN) {
// A worker registering from UNKNOWN implies that the worker was restarted during recovery.
// The old worker must thus be dead, so we will remove it and accept the new worker.
removeWorker(oldWorker, "Worker replaced by a new worker with same address")
} else {
logInfo("Attempted to re-register worker at same address: " + workerAddress)
return false
}
}
//Update the bookkeeping: add the new worker to workers
workers += worker
//Index the worker by ID
idToWorker(worker.id) = worker
//Index the worker by address
addressToWorker(workerAddress) = worker
true
}
Once registration completes, the Master sends a message back telling the Worker whether it succeeded, via workerRef.send(RegisteredWorker(...)). How does the Worker handle this reply? Look at the case RegisterWorkerResponse branch in the Worker's receive method:
case msg: RegisterWorkerResponse =>
//Handle the response in the method below
handleRegisterResponse(msg)
Into the handleRegisterResponse method:
private def handleRegisterResponse(msg: RegisterWorkerResponse): Unit = synchronized {
msg match {
//The Master successfully registered this worker
case RegisteredWorker(masterRef, masterWebUiUrl, masterAddress) =>
if (preferConfiguredMasterAddress) {
logInfo("Successfully registered with master " + masterAddress.toSparkURL)
} else {
logInfo("Successfully registered with master " + masterRef.address.toSparkURL)
}
//Mark this Worker as registered
registered = true
//Update the Worker's record of which Master it is connected to
changeMaster(masterRef, masterWebUiUrl, masterAddress)
forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
//The Worker sends itself SendHeartbeat, which triggers a heartbeat to the Master
self.send(SendHeartbeat)
}
}, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
if (CLEANUP_ENABLED) {
logInfo(
s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
//The Worker sends itself a WorkDirCleanup message to clean up old application directories
self.send(WorkDirCleanup)
}
}, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
}
val execs = executors.values.map { e =>
new ExecutorDescription(e.appId, e.execId, e.cores, e.state)
}
//Report the Worker's latest executor and driver state to the Master
masterRef.send(WorkerLatestState(workerId, execs.toList, drivers.keys.toSeq))
From then on the Worker sends the Master a heartbeat at a fixed interval, and the Master runs a dedicated thread that periodically checks when it last heard from each worker; once a worker's heartbeat has timed out, the Master removes that worker.
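The Master-side timeout check can be sketched as follows. This is a simplified model of my own (the WorkerRecord class and the constant names are made up; in Spark the real check lives in the Master, triggered by a periodic CheckForWorkerTimeOut message, and the default worker timeout is 60 seconds):

```scala
// Toy model of the Master removing workers whose heartbeats have timed out.
case class WorkerRecord(id: String, lastHeartbeat: Long)

object TimeoutDemo extends App {
  val workerTimeoutMs = 60 * 1000L          // assumed default: 60s
  val now = System.currentTimeMillis()
  val workers = scala.collection.mutable.Set(
    WorkerRecord("worker-1", now),                       // fresh heartbeat: kept
    WorkerRecord("worker-2", now - 2 * workerTimeoutMs)  // silent too long: removed
  )
  // Any worker whose last heartbeat is older than the timeout is considered dead
  val dead = workers.filter(w => w.lastHeartbeat < now - workerTimeoutMs)
  dead.foreach { w =>
    println(s"Removing ${w.id}: no heartbeat for more than $workerTimeoutMs ms")
    workers -= w
  }
  println(s"alive workers: ${workers.map(_.id)}")
}
```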
This completes the startup of the Master and the Workers, and the communication between them. The heartbeat mechanism detects whether a worker is still alive; a worker found to be DEAD is removed from the Master's bookkeeping.