Spark Submit Flow: Source Code Walkthrough (on YARN)


Spark Submit Flow (on YARN)

This article is based on Spark 3.0.1.

 # Submit command:
 bin/spark-submit --master yarn \
 --deploy-mode cluster \
 --class com.Tencent.spark.WordCount \
 /opt/module/spark-standalone/bin/spark-1.0-SNAPSHOT.jar hdfs://hadoop:9820/input
 
 The spark-submit script actually runs:
  bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
 so execution really starts in the org.apache.spark.deploy.SparkSubmit class.


1. When a job is submitted, the main method of the SparkSubmit class is executed.
2. In main():
val submit = new SparkSubmit() // create a SparkSubmit object, line 986
submit.doSubmit(args) // use the submit object to perform the submission, line 1016

3. This calls doSubmit in the parent class:
  def doSubmit(args: Array[String]): Unit = {
	// parse the arguments
    val appArgs = parseArguments(args) // line 85
  }

  protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
    new SparkSubmitArguments(args) // line 98
  }

4. The SparkSubmitArguments class defines the parameters needed to submit a job, such as master and deployMode, and it also invokes the parsing method:
  parse(args.asJava) // line 108

5. Inside parse(), the handle() method processes each argument (a standalone sketch of this pattern follows the snippet):
	// opt and value are the options the user passed on submit, e.g. --master yarn
  override protected def handle(opt: String, value: String): Boolean = { // line 331
    opt match { 
      case NAME =>  // --name
        name = value
      case MASTER => // --master
        master = value
	...
    }
  }
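The idea is just a match on each option token. A minimal, self-contained sketch of the same pattern (the names here are hypothetical, not Spark's) is:

  object ArgParseSketch {
    // hypothetical holder for a few parsed options
    final case class SubmitArgs(master: String = "local", deployMode: String = "client", mainClass: String = "")

    // consume "--opt value" pairs, mirroring how handle() matches on each option
    def parse(args: List[String], acc: SubmitArgs = SubmitArgs()): SubmitArgs = args match {
      case "--master" :: value :: rest      => parse(rest, acc.copy(master = value))
      case "--deploy-mode" :: value :: rest => parse(rest, acc.copy(deployMode = value))
      case "--class" :: value :: rest       => parse(rest, acc.copy(mainClass = value))
      case Nil                              => acc
      case unknown :: _                     => throw new IllegalArgumentException(s"Unknown option: $unknown")
    }

    def main(args: Array[String]): Unit = {
      val parsed = parse(List("--master", "yarn", "--deploy-mode", "cluster", "--class", "com.Tencent.spark.WordCount"))
      println(parsed)
    }
  }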
6. After the arguments are parsed:
      appArgs.action match { // line 89
      // action = Option(action).getOrElse(SUBMIT), see SparkSubmitArguments line 227
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog) // so this is the default path; the submission starts here
      case SparkSubmitAction.KILL => kill(appArgs)
      case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
      case SparkSubmitAction.PRINT_VERSION => printVersion()
    }
7. In submit(), runMain(args, uninitLog) is executed // line 180
8. In runMain():
   // prepare the submission environment
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args) // line 871
  
  mainClass = Utils.classForName(childMainClass) // load the class by name via reflection
  // create an instance of mainClass and cast it to SparkApplication, line 912
  app: SparkApplication = mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication] 
      
  				So what exactly is childMainClass?
                       cluster mode: childMainClass = YARN_CLUSTER_SUBMIT_CLASS // line 715
                                   = org.apache.spark.deploy.yarn.YarnClusterApplication
                       client mode:  childMainClass = args.mainClass // the class we submitted, line 627
      
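The reflective instantiation itself is plain JVM reflection. A minimal, self-contained sketch with a hypothetical DemoApplication trait (a stand-in, not Spark's actual SparkApplication) is:

  // hypothetical stand-in for Spark's SparkApplication trait
  trait DemoApplication { def start(args: Array[String]): Unit }

  class HelloApp extends DemoApplication {
    override def start(args: Array[String]): Unit =
      println(s"started with ${args.mkString(",")}")
  }

  object ReflectionSketch {
    def main(args: Array[String]): Unit = {
      // load the class by its fully qualified name, then instantiate it via the
      // no-arg constructor, which is essentially what runMain does with childMainClass
      val clazz = Class.forName("HelloApp")
      val app = clazz.getConstructor().newInstance().asInstanceOf[DemoApplication]
      app.start(Array("hdfs://hadoop:9820/input"))
    }
  }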
9. app.start() // line 928, which is actually YarnClusterApplication.start
      
"------- The submission environment is ready; the Application now starts running --------"

10. In Client.scala:
private[spark] class YarnClusterApplication extends SparkApplication { // line 1575
  override def start(args: Array[String], conf: SparkConf): Unit = {
    // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
    // so remove them from sparkConf here for yarn mode.
    conf.remove(JARS)
    conf.remove(FILES)

    new Client(new ClientArguments(args), conf, rpcEnv = null).run()
  }
}
    
11. The Client class has a key field, the YARN client object:
    val yarnClient = YarnClient.createYarnClient // line 72
12. After the Client object is created, its run() method is executed.

13. In Client.run():
  def run(): Unit = { // line 1176
    this.appId = submitApplication() 
  }

14. In submitApplication():
// Submit an application running our ApplicationMaster to the ResourceManager.
def submitApplication(): ApplicationId = { // line 165
    var appId: ApplicationId = null
    try {
      launcherBackend.connect()
      // initialize the YARN client with the Hadoop configuration
      yarnClient.init(hadoopConf)
      // start the YARN client
      yarnClient.start()

      // Get a new application from our RM
      val newApp = yarnClient.createApplication()
      val newAppResponse = newApp.getNewApplicationResponse()
      // get the application id; every YARN application has a unique id
      appId = newAppResponse.getApplicationId()

      // Set up the appropriate contexts to launch our AM.
      // Internally this sets up the container for the ApplicationMaster, the JVM options,
      // and the launch command for the AM:
      // [cluster mode] command = bin/java org.apache.spark.deploy.yarn.ApplicationMaster // assembled command, line 980
      // [client mode]  command = bin/java org.apache.spark.deploy.yarn.ExecutorLauncher
      val containerContext = createContainerLaunchContext(newAppResponse)  // line 196
      val appContext = createApplicationSubmissionContext(newApp, containerContext)

      // Finally, submit and monitor the application. What gets submitted is the launch
      // command plus the associated resources (jars, paths, etc.)
      yarnClient.submitApplication(appContext)
  }
}
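Stripped of Spark's own container setup, the underlying YARN client API sequence is roughly the following sketch (the AM command, application name and resource sizes here are made up for illustration):

  import java.util.Collections
  import org.apache.hadoop.yarn.api.records.{ContainerLaunchContext, Resource}
  import org.apache.hadoop.yarn.client.api.YarnClient
  import org.apache.hadoop.yarn.conf.YarnConfiguration
  import org.apache.hadoop.yarn.util.Records

  object YarnSubmitSketch {
    def main(args: Array[String]): Unit = {
      val conf = new YarnConfiguration()
      val yarnClient = YarnClient.createYarnClient()
      yarnClient.init(conf)
      yarnClient.start()

      // ask the ResourceManager for a new application id
      val newApp = yarnClient.createApplication()
      val appId = newApp.getNewApplicationResponse.getApplicationId
      println(s"got application id $appId")

      // describe the AM container; the command is just a placeholder
      val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
      amContainer.setCommands(Collections.singletonList(
        "java -Xmx512m some.demo.ApplicationMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"))

      val appContext = newApp.getApplicationSubmissionContext
      appContext.setApplicationName("yarn-submit-sketch")
      appContext.setAMContainerSpec(amContainer)
      appContext.setResource(Resource.newInstance(512, 1)) // 512 MB, 1 vcore for the AM

      // hand the whole description to the ResourceManager
      yarnClient.submitApplication(appContext)
      yarnClient.stop()
    }
  }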

"----------- The above requests the launch of the ApplicationMaster -----------"

In the org.apache.spark.deploy.yarn.ApplicationMaster class:
1. val amArgs = new ApplicationMasterArguments(args) // gather the arguments the AM needs, line 842
2. master = new ApplicationMaster(amArgs, sparkConf, yarnConf) // create the ApplicationMaster, line 859
3. override def run(): Unit = System.exit(master.run()) // run the ApplicationMaster, line 890

4. In run():
	if (isClusterMode) {  // line 264
	        runDriver()
	      }

5. In runDriver(), start the user-defined program; this shows the Driver is the user-defined program
     userClassThread = startUserApplication()  // line 492

6. In startUserApplication(), look up the user-defined main method
	val mainMethod = userClassLoader.loadClass(args.userClass) // line 718
      .getMethod("main", classOf[Array[String]])

7. Start a new thread to run the user-defined main method
	val userThread = new Thread {
      override def run(): Unit = {
            mainMethod.invoke(null, userArgs.toArray)
      }
    }

8.  userThread.setName("Driver") // the new thread is named Driver, which shows the Driver runs as a thread inside the ApplicationMaster
    userThread.start() // start the thread; since it runs the user's main method, SparkContext initialization happens on it (a sketch of this pattern follows)
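Outside of Spark, the same "load a user main method via reflection and run it on a thread named Driver" pattern can be sketched as follows (UserProgram and its argument are made up for the example):

  // a stand-in "user program" with a standard main method
  object UserProgram {
    def main(args: Array[String]): Unit = println(s"user main running with ${args.mkString(",")}")
  }

  object DriverThreadSketch {
    def main(args: Array[String]): Unit = {
      // look up the user class and its main(Array[String]) method via reflection
      val userClass = Class.forName("UserProgram")
      val mainMethod = userClass.getMethod("main", classOf[Array[String]])

      val userThread = new Thread {
        // invoking a static method: the first argument to invoke is null
        override def run(): Unit = mainMethod.invoke(null, Array("hdfs://hadoop:9820/input"))
      }
      userThread.setName("Driver") // the AM names this thread "Driver"
      userThread.start()
      userThread.join()
    }
  }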

"------ A digression: SparkContext initialization ------"
When the SparkContext initializes the TaskScheduler, the following code runs:
val backend = cm.createSchedulerBackend(sc, masterUrl, scheduler) // SparkContext, line 2928
createSchedulerBackend is an abstract method; its implementation is:
  // in YarnClusterManager
  override def createSchedulerBackend(...)
      case "cluster" =>
        new YarnClusterSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc) // line 45
  }
This creates a YarnClusterSchedulerBackend, which extends YarnSchedulerBackend, which in turn extends CoarseGrainedSchedulerBackend. The constructor of CoarseGrainedSchedulerBackend contains this field:
val driverEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME, createDriverEndpoint()) // line 435
So when val backend = cm.createSchedulerBackend runs, the constructor of YarnClusterSchedulerBackend's grandparent class also runs, creating the DriverEndpoint. In other words, "the DriverEndpoint is created during SparkContext initialization." (A small sketch of this constructor-chain behavior follows the digression.)
"------------- End of digression; back to the main flow -------------"
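The point that a field defined in a grandparent constructor is initialized as soon as the subclass is instantiated can be shown with plain classes (all names here are hypothetical stand-ins, not Spark's):

  // stand-in for CoarseGrainedSchedulerBackend: its constructor body creates the "endpoint"
  class CoarseGrainedBackendSketch {
    val driverEndpoint: String = {
      println("DriverEndpoint created")
      "driver-endpoint"
    }
  }

  class YarnBackendSketch extends CoarseGrainedBackendSketch      // stand-in for YarnSchedulerBackend
  class YarnClusterBackendSketch extends YarnBackendSketch        // stand-in for YarnClusterSchedulerBackend

  object ConstructorChainSketch {
    def main(args: Array[String]): Unit = {
      // instantiating the most-derived class runs the grandparent constructor first,
      // so the endpoint field already exists by the time the value is returned
      val backend = new YarnClusterBackendSketch
      println(backend.driverEndpoint)
    }
  }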


9.  Block the current thread and wait for the Driver to finish creating the SparkContext, then get it back (a sketch of this promise/await pattern follows)
    val sc: SparkContext = ThreadUtils.awaitResult // line 499
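The AM thread and the Driver thread hand the SparkContext across via a promise; a self-contained sketch of the same pattern (using a plain String in place of a SparkContext) is:

  import scala.concurrent.duration.Duration
  import scala.concurrent.{Await, Promise}

  object AwaitResultSketch {
    def main(args: Array[String]): Unit = {
      // the "Driver" thread completes this promise once its context is ready
      val contextPromise = Promise[String]()

      val driverThread = new Thread {
        override def run(): Unit = {
          Thread.sleep(500)                  // pretend to initialize a SparkContext
          contextPromise.success("sc-ready") // hand the result back to the AM thread
        }
      }
      driverThread.setName("Driver")
      driverThread.start()

      // the "AM" thread blocks here, like ThreadUtils.awaitResult in runDriver()
      val sc = Await.result(contextPromise.future, Duration.Inf)
      println(s"ApplicationMaster got: $sc")
      driverThread.join()
    }
  }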

10. val rpcEnv = sc.env.rpcEnv // get the RPC environment from the SparkContext, which shows that the RPC environment lives inside the SparkContext.

11. Register the AM with the RM through the yarnRMClient
	registerAM(host, port, userConf, sc.ui.map(_.webUrl), appAttemptId) // line 507

12. Get the Driver's EndpointRef. Note: the Driver's Endpoint was already created while the SparkContext was being created.
    val driverRef = rpcEnv.setupEndpointRef // line 509 

13. Key code: 
createAllocator(driverRef, userConf, rpcEnv, appAttemptId, distCacheConf) // line 512
Inside createAllocator(), the resource allocator is created through the YarnRMClient:
	allocator: YarnAllocator = client.createAllocator // line 465
	allocator is an instance of YarnAllocator, which is responsible for requesting containers from the RM and deciding what to do with them once allocated (per the source comment on the YarnAllocator class).

14. Use the allocator to request resources
	allocator.allocateResources() // line 479
	In allocateResources():
	def allocateResources(): Unit = synchronized {
	    val allocateResponse = amClient.allocate(progressIndicator)
	    // the containers that were actually allocated
	    val allocatedContainers = allocateResponse.getAllocatedContainers()
		// if any were allocated, decide what each container will be used for
	    if (allocatedContainers.size > 0) {
	      handleAllocatedContainers(allocatedContainers.asScala)
	    }
    }
	handleAllocatedContainers() decides how each container is used and calls
	runAllocatedContainers(containersToUse) // line 481
      
	In runAllocatedContainers():
	// for every container
    for (container <- containersToUse) 
        // check whether the number of running Executors is below the target; note: Executors run inside containers
        if (runningExecutors.size() < targetNumExecutors) 
            // if so, check whether containers should actually be launched
        	if (launchContainers) 
                // if yes, launch the Executor on a new thread from the pool
        	  launcherPool.execute(() => {
        	    try {
        	      new ExecutorRunnable().run() // line 571
                }
              })

           In run():
        def run(): Unit = {
 		   logDebug("Starting Executor Container")
 		   nmClient = NMClient.createNMClient() // create a client for talking to the NodeManager
 		   nmClient.init(conf)
 		   nmClient.start()
 		   startContainer() // containers run on the NodeManager; this actually starts the container on the NM
        }
		In startContainer():
		val commands = prepareCommand() // line 101, build the Java command to run in the container
		In prepareCommand(), the command is assembled:
		// this is the backend process that runs in the container
		commands = /bin/java org.apache.spark.executor.YarnCoarseGrainedExecutorBackend // line 207

// So far, the ApplicationMaster process contains: the Driver thread, the DriverEndpoint, the YarnRMClient, and the NMClient
"Summary: the above starts the ApplicationMaster, launches the Driver on a new thread inside the AM, requests container resources, starts the containers, and runs the org.apache.spark.executor.YarnCoarseGrainedExecutorBackend backend process on each container."

 In the org.apache.spark.executor.YarnCoarseGrainedExecutorBackend backend process:
1. createFn: CoarseGrainedExecutorBackend = new YarnCoarseGrainedExecutorBackend // line 75 

// the CoarseGrainedExecutorBackend below is not the class above; it is that class's companion object
2. CoarseGrainedExecutorBackend.run(backendArgs, createFn) // line 81

3. In run():
val fetcher: RpcEnv = RpcEnv.create() // create() builds a NettyRpcEnv; this one is just a lightweight, temporary environment, line 289
 
4. With this RPC environment, use driverUrl to obtain the Driver's EndpointRef
var driver: RpcEndpointRef = null
driver = fetcher.setupEndpointRefByURI(arguments.driverUrl) // line 303

5. Through the Driver's EndpointRef, ask the Driver for the SparkApp configuration
val cfg = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig(arguments.resourceProfileId)) // line 311

6. Using the returned SparkAppConfig, create the real RPC environment.
val driverConf = new SparkConf()
val env = SparkEnv.createExecutorEnv(driverConf, ...) // line 331
                                   
7. Register an endpoint named "Executor" with the RPC environment; it is a CoarseGrainedExecutorBackend object. Note: this is not the Executor that runs computation tasks, it is an Endpoint used for communication.
env.rpcEnv.setupEndpoint("Executor", ...) // line 334

8. After the endpoint is registered, the Dispatcher puts an OnStart message into the endpoint's Inbox, which triggers the endpoint's onStart() initialization.
 
9. env.rpcEnv.awaitTermination() // block the current thread until the RpcEnv shuts down, keeping the backend process alive
    
"Summary: YarnCoarseGrainedExecutorBackend is a backend process running in a container on another NodeManager, much like the ApplicationMaster; the Executor object that runs Tasks lives inside this process. CoarseGrainedExecutorBackend is an Endpoint within that process, named Executor, responsible for communicating with the Driver."

CoarseGrainedExecutorBackend.onStart()

1. As this log line shows, onStart() connects to the driver and then performs a series of steps
	logInfo("Connecting to driver: " + driverUrl) // line 83

2. driver = Some(ref) // keep the Driver's EndpointRef, line 92

3. With the Driver's EndpointRef in hand, CoarseGrainedExecutorBackend (the endpoint named Executor) sends a RegisterExecutor message to the Driver, asking for a true/false reply. This is the "reverse registration": the executor registers itself with the Driver.
	ref.ask[Boolean](RegisterExecutor())  // line 93

"------ How the Driver handles the RegisterExecutor message ------"
The Driver thread communicates through the DriverEndpoint.
Message sending and receiving methods pair up as follows:
send --> receive
ask --> receiveAndReply
Since ref.ask[Boolean](RegisterExecutor()) above expects a reply, it is handled by receiveAndReply (a minimal sketch of this send/ask pairing follows).
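Spark's RpcEndpoint API is internal, but the send/receive vs. ask/receiveAndReply split can be sketched with a hypothetical endpoint class and futures (all names below are stand-ins):

  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration.Duration
  import scala.concurrent.{Await, Future}

  object RpcPatternSketch {
    // hypothetical messages, loosely mirroring RegisterExecutor / LaunchedExecutor
    sealed trait Message
    case class RegisterExecutor(id: String) extends Message
    case class LaunchedExecutor(id: String) extends Message

    // hypothetical endpoint: receive handles one-way messages, receiveAndReply returns an answer
    class DriverEndpointSketch {
      def receive(msg: Message): Unit = msg match {
        case LaunchedExecutor(id) => println(s"executor $id launched, it can now be offered tasks")
        case other                => println(s"unexpected one-way message: $other")
      }
      def receiveAndReply(msg: Message): Any = msg match {
        case RegisterExecutor(id) => println(s"registering executor $id"); true
        case other                => new IllegalStateException(s"unexpected ask: $other")
      }
    }

    def main(args: Array[String]): Unit = {
      val driver = new DriverEndpointSketch
      // "ask": the sender expects a reply, so the message goes through receiveAndReply
      val reply: Future[Any] = Future(driver.receiveAndReply(RegisterExecutor("exec-1")))
      println(s"driver replied: ${Await.result(reply, Duration.Inf)}")
      // "send": fire-and-forget, handled by receive
      driver.receive(LaunchedExecutor("exec-1"))
    }
  }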

4. In DriverEndpoint's receiveAndReply(), the message is matched:
   case RegisterExecutor(...) =>  // line 207
	// if executorDataMap already contains this executorId, fail with a duplicate-ID exception
     if (executorDataMap.contains(executorId)) {
       context.sendFailure(new IllegalStateException(s"Duplicate executor ID: $executorId"))
       // if the executorId or hostname is blacklisted, fail with an "Executor is blacklisted" exception
     } else if (scheduler.nodeBlacklist.contains(hostname) ||
         isBlacklisted(executorId, hostname)) {
        context.sendFailure(new IllegalStateException(s"Executor is blacklisted: $executorId"))
     } else {
         // reply true
         context.reply(true)  // line 255
     }

"How CoarseGrainedExecutorBackend handles the Driver's reply"

5. In CoarseGrainedExecutorBackend's onStart(), the reply is handled:
   case Success(_) =>  // line 96
        self.send(RegisteredExecutor) // if the Driver replied true, send a RegisteredExecutor message to itself
   case Failure(e) =>  // on failure, exit the Executor
        exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)

6. The RegisteredExecutor message it sent to itself is handled in receive():
  override def receive: PartialFunction[Any, Unit] = {  // line 147
    case RegisteredExecutor =>
      logInfo("Successfully registered with driver") 
        // create an Executor object; this is the Executor we usually talk about, the one that runs Tasks
        executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false,
          resources = _resources)
      	// then send the Driver a LaunchedExecutor(executorId) message: the executor has been created
        driver.get.send(LaunchedExecutor(executorId))
  }
                        
7. DriverEndpoint handles the LaunchedExecutor(executorId) message in receive():
      case LaunchedExecutor(executorId) =>  // line 198
        executorDataMap.get(executorId).foreach { data =>
          data.freeCores = data.totalCores
        }
        makeOffers(executorId)  // make an offer to the executor, meaning the executor can now run Tasks
      
                      
                                   
