Logic between job submission and execution:
Client side:
1. The spark-submit script submits the job, and SparkSubmit invokes the main method of the user-supplied class via reflection.
2. The user code executes new SparkContext (a minimal driver sketch follows this list), which:
2.1. Creates the actorSystem
2.2. Creates TaskSchedulerImpl, the class that distributes tasks
2.3. Creates SparkDeploySchedulerBackend, which schedules tasks
2.4. Creates DAGScheduler, which splits the job into tasks and starts a thread backed by a blocking task/event queue
2.5. Creates clientActor, which registers the application (the job jar) information with the Master
2.6. Creates driverActor
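For reference, the user class mentioned in steps 1-2 is just an ordinary object with a main method that builds a SparkContext. A minimal sketch, assuming a hypothetical class name, app name, and master URL (none of them come from the text above):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical driver program: this is the class whose main method SparkSubmit invokes
    // via reflection; new SparkContext(conf) is what triggers steps 2.1-2.6 above.
    object MyApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("MyApp")                      // app info later registered with the Master
          .setMaster("spark://master-host:7077")    // placeholder standalone master URL
        val sc = new SparkContext(conf)
        try {
          val count = sc.parallelize(1 to 100).count()  // a trivial job so tasks actually run
          println(s"count = $count")
        } finally {
          sc.stop()
        }
      }
    }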
Master side:
1. After receiving the application's registration information, the Master saves it.
2. Through Worker registration and heartbeats, the Master knows how many resources the cluster has.
3. It compares the resources the application needs against what the cluster has and allocates resources accordingly, either spread out across workers or consolidated on a few of them (see the allocation sketch after this list).
4. It notifies the Workers to launch executors.
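A simplified sketch of the two placement strategies in step 3. The names (WorkerInfo, assignCores, spreadOut) and the structure are illustrative only; this is not the actual Master.schedule code, just the core idea of spread-out versus consolidated allocation:

    // Each worker advertises how many free cores it still has.
    case class WorkerInfo(id: String, var freeCores: Int)

    def assignCores(workers: Seq[WorkerInfo], coresNeeded: Int, spreadOut: Boolean): Map[String, Int] = {
      val usable = workers.filter(_.freeCores > 0).sortBy(-_.freeCores)
      val assigned = scala.collection.mutable.Map[String, Int]().withDefaultValue(0)
      var remaining = coresNeeded
      if (spreadOut) {
        // Spread out: hand out one core at a time, round-robin across the usable workers.
        var i = 0
        while (remaining > 0 && usable.exists(_.freeCores > 0)) {
          val w = usable(i % usable.length)
          if (w.freeCores > 0) {
            w.freeCores -= 1
            assigned(w.id) += 1
            remaining -= 1
          }
          i += 1
        }
      } else {
        // Consolidate: fill each worker completely before moving on to the next one.
        for (w <- usable if remaining > 0) {
          val take = math.min(w.freeCores, remaining)
          w.freeCores -= take
          assigned(w.id) += take
          remaining -= take
        }
      }
      assigned.toMap
    }

By default the standalone Master spreads applications out across workers (spark.deploy.spreadOut defaults to true).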
Worker side:
1. Launches the executor.
2. The executor communicates with driverActor, listening for and receiving tasks (a simplified sketch follows this list).
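To make the last two steps concrete, here is a heavily simplified sketch of the executor side: the executor registers with the driver, then runs each received task on a thread pool. The message case classes and the receive method are illustrative only, not Spark's actual actor protocol or class names:

    import java.util.concurrent.{Executors, ExecutorService}

    sealed trait DriverMessage
    case class RegisteredExecutor(executorId: String) extends DriverMessage
    case class LaunchTask(taskId: Long, serializedTask: Array[Byte]) extends DriverMessage

    // Hypothetical stand-in for the executor backend that talks to driverActor.
    class SketchExecutorBackend(cores: Int) {
      private val threadPool: ExecutorService = Executors.newFixedThreadPool(cores)

      // Called once per message received from the driver.
      def receive(msg: DriverMessage): Unit = msg match {
        case RegisteredExecutor(id) =>
          println(s"executor $id registered with the driver and is ready for tasks")
        case LaunchTask(taskId, bytes) =>
          // Each task is run on a worker thread from the executor's pool.
          threadPool.execute(new Runnable {
            override def run(): Unit = {
              println(s"running task $taskId (${bytes.length} serialized bytes)")
              // ... deserialize the task, run it, and send the result back to the driver ...
            }
          })
      }
    }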
Scheduling after the job is submitted to the cluster: spark-submit
The spark-submit script starts the SparkSubmit class and calls SparkSubmit's main method.
Let's take a look at SparkSubmit's main method:
def main(args: Array[String]): Unit = {
  val appArgs = new SparkSubmitArguments(args)
  if (appArgs.verbose) {
    printStream.println(appArgs)
  }
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}
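main only parses the arguments and dispatches on the requested action. For SUBMIT, the call chain submit -> doRunMain -> runMain ends in the reflective invocation of the user class's main method mentioned in client step 1. A simplified sketch of that final call (not the verbatim runMain source; the two placeholder values stand in for the childMainClass and childArgs produced by prepareSubmitEnvironment):

    val childMainClass = "com.example.MyApp"   // placeholder: the user class passed via --class
    val childArgs = Seq("arg1", "arg2")        // placeholder: the application's own arguments

    val mainClass = Class.forName(childMainClass, true, Thread.currentThread().getContextClassLoader)
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    // Method.invoke takes one Object per parameter; main has a single String[] parameter.
    mainMethod.invoke(null, childArgs.toArray)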
The action matches SUBMIT, so let's look at the submit method:
private[spark] def submit(args: SparkSubmitArguments): Unit = {
  val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
          }
        })
      } catch {
        case e: Exception =>
          // Hadoop's AuthorizationException suppresses the exception's stack trace, which
          // makes the message printed to the output by the JVM not very helpful. Instead,
          // detect exceptions with empty stack traces here, and treat them differently.
          if (e.getStackTrace().length == 0) {
            printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
            exitFn()
          } else {
            throw e
          }
      }
    } else {
      runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
    }
  }

  // In standalone cluster mode, there are two submission gateways:
  //   (1) The traditional Akka gateway using o.a.s.deploy.Client as a wrapper
  //   (2) The new REST-based gateway introduced in Spark 1.3
  // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
  // to use the legacy gateway if the master endpoint turns out to be not a REST server.
  if (args.isStandaloneCluster && args.useRest) {
    try {
      printStream.println("Running Spark using the REST application submission protocol.")
      doRunMain()
    } catch {
      // Fail over to use the legacy submission gateway
      case e: SubmitRestConnectionException =>
        printWarning(s"Master endpoint ${args.master} was not a REST server. " +
          "Falling back to legacy submission gateway instead.")
        args.useRest = false