Spark 2.2 Source Code Analysis: Driver Registration and Launch

Spark 2.2 source code reading order

1. Spark 2.2 Source Code Analysis: Spark-Submit Task Submission
2. Spark 2.2 Source Code Analysis: Driver Registration and Launch


After the spark-submit command is issued, the client submits the driver to the master for registration, and the master performs a series of operations on that driver (corresponding to part 1 of the diagram below).

[Figure: overall driver submission flow; part 1 is the client registering the driver with the master]
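For context, the client side (covered in the previous post) builds a DriverDescription whose command points at DriverWrapper and sends a RequestSubmitDriver to the master. A condensed, paraphrased sketch of that spot in org.apache.spark.deploy.Client (names may differ slightly from the exact 2.2 source):

// Inside ClientEndpoint.onStart(), handling the "launch" action
val mainClass = "org.apache.spark.deploy.worker.DriverWrapper"
val command = new Command(mainClass,
  Seq("{{WORKER_URL}}", "{{USER_JAR}}", driverArgs.mainClass) ++ driverArgs.driverOptions,
  sys.env, classPathEntries, libraryPathEntries, javaOpts)
val driverDescription = new DriverDescription(
  driverArgs.jarUrl, driverArgs.memory, driverArgs.cores, driverArgs.supervise, command)
// Send RequestSubmitDriver to the master and forward its SubmitDriverResponse
asyncSendToMasterAndForwardReply[SubmitDriverResponse](RequestSubmitDriver(driverDescription))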
The Master receives the driver submission request and handles it:
org.apache.spark.deploy.master.Master

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    // This case handles the driver submission request
    case RequestSubmitDriver(description) =>
      // If this master is not ALIVE, reply to the client with a failure
      if (state != RecoveryState.ALIVE) {
        val msg = s"${Utils.BACKUP_STANDALONE_MASTER_PREFIX}: $state. " +
          "Can only accept driver submissions in ALIVE state."
        context.reply(SubmitDriverResponse(self, false, None, msg))
      } else {
        logInfo("Driver submitted " + description.command.mainClass)
        // Create the DriverInfo that this master process will track
        val driver = createDriver(description)
        // Persist the driver so it can be re-read after a master failover or restart
        persistenceEngine.addDriver(driver)
        // Add it to the list of drivers waiting to be scheduled
        waitingDrivers += driver
        // Add it to the set of drivers managed in the master's memory
        drivers.add(driver)
        // A new driver needs to run, so kick off resource scheduling
        schedule()
        // Reply to the client, which then exits
        context.reply(SubmitDriverResponse(self, true, Some(driver.id),
          s"Driver successfully submitted as ${driver.id}"))
      }
    case ...
}
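A side note on persistenceEngine.addDriver: the persistence engine simply serializes the DriverInfo under a well-known key so that a standby master can rebuild its state. A paraphrased sketch of org.apache.spark.deploy.master.PersistenceEngine (details may vary from the exact 2.2 source):

abstract class PersistenceEngine {
  // Concrete implementations (ZooKeeper, local filesystem, ...) provide these two
  def persist(name: String, obj: Object): Unit
  def unpersist(name: String): Unit

  // Drivers are stored under a "driver_" prefix plus their id
  final def addDriver(driver: DriverInfo): Unit = {
    persist("driver_" + driver.id, driver)
  }

  final def removeDriver(driver: DriverInfo): Unit = {
    unpersist("driver_" + driver.id)
  }
}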

First, step one: creating the DriverInfo.
It mainly generates a driver ID and wraps up the submission time, whether the driver should be supervised, how many CPU cores it needs, and so on.

private def createDriver(desc: DriverDescription): DriverInfo = {
    val now = System.currentTimeMillis()
    val date = new Date(now)
    new DriverInfo(now, newDriverId(date), desc, date)
  }
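The driver ID encodes the submission timestamp plus a monotonically increasing counter. Paraphrasing newDriverId from the same class (minor details may differ from the exact 2.2 source):

// Produces IDs like driver-20170601120000-0003
private def newDriverId(submitDate: Date): String = {
  val appId = "driver-%s-%04d".format(createDateFormat.format(submitDate), nextDriverNumber)
  nextDriverNumber += 1
  appId
}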

Step two: since there is a new demand (the driver) to schedule, the schedule() method is called to allocate resources.
schedule() is invoked from roughly ten places in the Master class and is very important; it will be analyzed in detail later. For now we only look at its driver-related part.

private def schedule(): Unit = {
    // A standby master does nothing
    if (state != RecoveryState.ALIVE) {
      return
    }
    // Take all ALIVE workers and shuffle them randomly (a simple swap-based shuffle)
    val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
    val numWorkersAlive = shuffledAliveWorkers.size
    // Index of the worker currently being considered
    var curPos = 0
    // Iterate over every driver waiting to be scheduled
    for (driver <- waitingDrivers.toList) {
      // Whether the current driver has been launched; if so, the while loop below ends
      var launched = false
      // Count of workers visited so far; once it reaches the number of alive workers, the loop also ends
      var numWorkersVisited = 0
      while (numWorkersVisited < numWorkersAlive && !launched) {
        // Take the next worker from the shuffled list
        val worker = shuffledAliveWorkers(curPos)
        // One more worker visited
        numWorkersVisited += 1
        // Check whether this worker has enough free memory and cores to run the driver
        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
          // Launch the driver on this worker
          launchDriver(worker, driver)
          // Remove the driver from the waiting list
          waitingDrivers -= driver
          // Mark it as launched and end this inner loop (launch is not guaranteed to succeed)
          launched = true
        }
        // Advance the index, wrapping around so it never exceeds the number of alive workers
        curPos = (curPos + 1) % numWorkersAlive
      }
    }
    // Launch executors
    startExecutorsOnWorkers()
  }
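To see the placement policy in isolation, here is a minimal, self-contained sketch. Worker and Driver below are hypothetical stand-ins for WorkerInfo/DriverInfo, not Spark types; only the shuffle-then-round-robin logic mirrors the code above:

import scala.util.Random

object DriverPlacementDemo extends App {
  case class Worker(id: String, var memFree: Int, var coresFree: Int)
  case class Driver(id: String, mem: Int, cores: Int)

  val workers = Seq(Worker("w1", 4096, 4), Worker("w2", 1024, 1), Worker("w3", 8192, 8))
  val waiting = Seq(Driver("driver-1", 2048, 2), Driver("driver-2", 512, 1))

  // Shuffle once, then walk the workers round-robin for each driver
  val shuffled = Random.shuffle(workers)
  var curPos = 0
  for (driver <- waiting) {
    var launched = false
    var visited = 0
    while (visited < shuffled.size && !launched) {
      val w = shuffled(curPos)
      visited += 1
      if (w.memFree >= driver.mem && w.coresFree >= driver.cores) {
        // Reserve the resources, as launchDriver plus the worker's bookkeeping would
        w.memFree -= driver.mem
        w.coresFree -= driver.cores
        println(s"${driver.id} -> ${w.id}")
        launched = true
      }
      curPos = (curPos + 1) % shuffled.size
    }
    if (!launched) println(s"${driver.id} stays in waitingDrivers")
  }
}

Shuffling first spreads drivers across workers instead of always loading up the first registered worker.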

The part we care about most here is launchDriver.
The loop above has already selected an alive worker with sufficient resources, and it is passed into this function to launch the driver.

private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
    logInfo("Launching driver " + driver.id + " on worker " + worker.id)
    // Give the worker and the driver references to each other
    worker.addDriver(driver)
    driver.worker = Some(worker)
    // Use the Spark RPC layer to tell the worker it can start the driver process
    worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
    // Optimistically mark the driver as RUNNING (send() is asynchronous, so if the launch
    // fails or anything else goes wrong, the worker will send a message back to correct the state)
    driver.state = DriverState.RUNNING
  }
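That correction arrives later as a DriverStateChanged message. On the master side it is handled roughly like this (paraphrased from Master.receive in 2.2):

case DriverStateChanged(driverId, state, exception) =>
  state match {
    case DriverState.ERROR | DriverState.FINISHED | DriverState.KILLED | DriverState.FAILED =>
      // removeDriver drops the driver from the master's in-memory structures and the
      // persistence engine, records the final state, and calls schedule() again
      removeDriver(driverId, state, exception)
    case _ =>
      throw new Exception(s"Received unexpected state update for driver $driverId: $state")
  }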

After the message is sent, the worker process receives and handles it:

override def receive: PartialFunction[Any, Unit] = synchronized {
    case LaunchDriver(driverId, driverDesc) =>
      logInfo(s"Asked to launch driver $driverId")
      // Wrap everything in a DriverRunner; it holds the Process that will
      // actually run the driver as a separate JVM
      val driver = new DriverRunner(
        conf,
        driverId,
        workDir,
        sparkHome,
        // The mainClass of this command was set by default on the client side to
        // org.apache.spark.deploy.worker.DriverWrapper,
        // so launching the driver means running DriverWrapper's main method
        driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
        self,
        workerUri,
        securityMgr)
      // The worker process also maintains its own list of drivers
      drivers(driverId) = driver
      // Launch the driver
      driver.start()
      // Account for the cores and memory now in use; this is reported to the
      // master and feeds into its free-resource bookkeeping
      coresUsed += driverDesc.cores
      memoryUsed += driverDesc.mem
}

org.apache.spark.deploy.worker.DriverRunner
start() spawns a thread that runs the command to launch the driver; the run itself is blocking.

/** Starts a thread to run and manage the driver. */
  private[worker] def start() = {
    new Thread("DriverRunner for " + driverId) {
      override def run() {
        var shutdownHook: AnyRef = null
        try {
          // A JVM exit might otherwise leave state unsaved, so register a shutdown hook
          shutdownHook = ShutdownHookManager.addShutdownHook { () =>
            // Kill the current driver process when the JVM exits
            logInfo(s"Worker shutting down, killing driver $driverId")
            kill()
          }

          // prepare driver jars and run driver
          // This method does a lot of work:
          //  - creates a local working directory named after the driver id
          //  - downloads the user jar into that directory
          //  - sets up stdout/stderr log files in the working directory
          //  - under certain conditions, keeps retrying the command that starts the driver process
          //  - returns the exit code
          // Normally the thread blocks here; as long as nothing goes wrong,
          // the driver state on the master side stays RUNNING
          val exitCode = prepareAndRunDriver()

          // A returned exit code means the process finished, was deliberately killed, or failed
          // set final state depending on if forcibly killed and process exit code
          finalState = if (exitCode == 0) {
            Some(DriverState.FINISHED)
          } else if (killed) {
            Some(DriverState.KILLED)
          } else {
            Some(DriverState.FAILED)
          }
        } catch {
          case e: Exception =>
            kill()
            finalState = Some(DriverState.ERROR)
            finalException = Some(e)
        } finally {
          // Remove the shutdown hook if it is still registered
          if (shutdownHook != null) {
            ShutdownHookManager.removeShutdownHook(shutdownHook)
          }
        }
        // Notify the worker's main endpoint that this driver's state has changed
        // notify worker of final driver state, possible exception
        worker.send(DriverStateChanged(driverId, finalState.get, finalException))
      }
    }.start()
  }
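For completeness, prepareAndRunDriver looks roughly like this (paraphrased from the 2.2 DriverRunner; the retry loop lives further down in runDriver/runCommandWithRetry):

private[worker] def prepareAndRunDriver(): Int = {
  // Working directory named after the driver id, e.g. <workDir>/<driverId>
  val driverDir = createWorkingDirectory()
  // Download the user jar into that directory
  val localJarFilename = downloadUserJar(driverDir)

  // Fill in placeholders the client baked into the command arguments
  def substituteVariables(argument: String): String = argument match {
    case "{{WORKER_URL}}" => workerUrl
    case "{{USER_JAR}}" => localJarFilename
    case other => other
  }

  val builder = CommandUtils.buildProcessBuilder(driverDesc.command, securityManager,
    driverDesc.mem, sparkHome.getAbsolutePath, substituteVariables)

  // Blocks until the driver process exits; retries it if supervise was requested
  runDriver(builder, driverDir, driverDesc.supervise)
}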

At this point the driver launch is complete. DriverWrapper's main method essentially uses reflection to invoke the main method of the class specified via --class in spark-submit, i.e. the Spark application class we wrote ourselves.
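A condensed sketch of DriverWrapper.main (paraphrased; the RPC setup, the class loader that adds the user jar, and error handling are trimmed):

// args: workerUrl :: userJar :: mainClass :: extraArgs
case workerUrl :: userJar :: mainClass :: extraArgs =>
  // ... create an RpcEnv and a WorkerWatcher, put the user jar on the class loader ...
  // Load the user's class and reflectively invoke its main method
  val clazz = Utils.classForName(mainClass)
  val mainMethod = clazz.getMethod("main", classOf[Array[String]])
  mainMethod.invoke(null, extraArgs.toArray)
  rpcEnv.shutdown()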
Finally, let's look at the important work that happens once exitCode is returned.

org.apache.spark.deploy.worker.Worker

override def receive: PartialFunction[Any, Unit] = synchronized {
    case driverStateChanged @ DriverStateChanged(driverId, state, exception) =>
      handleDriverStateChanged(driverStateChanged)
 }
 // Dedicated handler for driver state changes
private[worker] def handleDriverStateChanged(driverStateChanged: DriverStateChanged): Unit = {
    val driverId = driverStateChanged.driverId
    val exception = driverStateChanged.exception
    val state = driverStateChanged.state
    // Log a different message depending on the final state
    state match {
      case DriverState.ERROR =>
        logWarning(s"Driver $driverId failed with unrecoverable exception: ${exception.get}")
      case DriverState.FAILED =>
        logWarning(s"Driver $driverId exited with failure")
      case DriverState.FINISHED =>
        logInfo(s"Driver $driverId exited successfully")
      case DriverState.KILLED =>
        logInfo(s"Driver $driverId was killed by user")
      case _ =>
        logDebug(s"Driver $driverId changed state to $state")
    }
    // Tell the master that one of its drivers changed state. The master removes the
    // driver from the data structures it keeps in memory, deletes the driver's
    // persisted state, and then calls schedule() again, because a finished driver
    // means its worker has freed resources that can run other work
    sendToMaster(driverStateChanged)
    // Remove the driver from the list maintained by this worker
    val driver = drivers.remove(driverId).get
    // Add it to the finished list; this list is not persisted, so it is empty after a restart
    finishedDrivers(driverId) = driver
    // If the number of finished drivers kept in memory exceeds the configured limit
    // (spark.worker.ui.retainedDrivers, default 1000), drop max(count / 10, 1) of them
    trimFinishedDriversIfNecessary()
    // Release the driver's resources
    memoryUsed -= driver.driverDesc.mem
    coresUsed -= driver.driverDesc.cores
  }
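The trimming itself is only a few lines (paraphrased from the 2.2 Worker):

private def trimFinishedDriversIfNecessary(): Unit = {
  // retainedDrivers comes from spark.worker.ui.retainedDrivers (default 1000)
  if (finishedDrivers.size > retainedDrivers) {
    // Drop a tenth of the entries (at least one) to bound memory usage
    finishedDrivers.take(math.max(finishedDrivers.size / 10, 1)).foreach {
      case (driverId, _) => finishedDrivers.remove(driverId)
    }
  }
}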

With that, driver registration and launch are completely finished; what comes next is the initialization and execution of the Spark code we wrote ourselves.
3. Spark 2.2 Source Code Analysis: SparkContext Initialization
