After SparkContext is initialized, it registers the Application with the Master, and the Master then calls the schedule method to launch Executors for that Application on the workers. Only once the Executors are up can the DAGScheduler and TaskScheduler assign tasks to them for computation. schedule is therefore the method that ties the whole flow together.
private def schedule(): Unit = {
  // A standby Master does not schedule Applications or any other resources
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Drivers take strict precedence over executors
  // First important line: Random.shuffle randomly reorders the elements of a collection.
  // Take all workers previously registered with the Master, keep only the ones whose state
  // is ALIVE, and shuffle them randomly.
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  // Schedule Drivers first. A Driver is only registered with the Master (and therefore scheduled
  // here) when the job is submitted in cluster deploy mode; in client mode the Driver is started
  // directly on the submitting machine, so it is never registered and never scheduled here.
  // Iterate over the waitingDrivers ArrayBuffer.
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    // Keep looking as long as there are alive workers we have not visited yet and the driver
    // has not been launched (launched is still false)
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      // If this worker's free memory and free cores are at least what the Driver requires
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // launch the Driver on it
        launchDriver(worker, driver)
        // and remove the driver from the waiting ArrayBuffer
        waitingDrivers -= driver
        launched = true
      }
      // Move the pointer on to the next worker
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  startExecutorsOnWorkers()
}
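For reference on the shuffle call used above, scala.util.Random.shuffle simply returns a new collection with the same elements in a random order; a minimal standalone example (the worker ids are made up for illustration):

import scala.util.Random

// The result order is random, e.g. List("worker-3", "worker-1", "worker-2") on one run
// and a different order on the next.
val shuffled = Random.shuffle(List("worker-1", "worker-2", "worker-3"))
println(shuffled)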
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  // Add the driver to the worker's in-memory caches, which also adds the driver's required
  // memory and cores to the worker's used memory and used cores
  worker.addDriver(driver)
  // Also record the worker in the driver's own cache
  driver.worker = Some(worker)
  // Then send a LaunchDriver message to the worker's RpcEndpoint, asking the worker to start the Driver
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  // Finally mark the driver as RUNNING
  driver.state = DriverState.RUNNING
}
The Application scheduling mechanism (the core of the core)
There are two algorithms: spreadOutApps (the default) and non-spreadOutApps.
With the spreadOutApps (default) algorithm, the executors each application needs are spread as evenly as possible across all the workers.
For example, with 20 CPU cores to assign and 10 workers, the allocation loop actually passes over the workers twice, giving each worker one core per pass, so in the end each worker gets 2 cores.
The non-spreadOutApps algorithm is the exact opposite: each application is packed onto as few workers as possible. For example, with 10 workers of 10 cores each and an application that needs 20 cores in total, the cores go to just two workers, each fully occupied with its 10 cores, and other applications can only be placed on the remaining workers. So if spark-submit asks for 10 executors with 2 cores each (20 cores in total), under this algorithm only two executors are actually launched, each with 10 cores. Both strategies are sketched below.
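To make the two strategies concrete, here is a minimal, self-contained model of the core-assignment loop. It is a simplification of, not a copy of, Spark's scheduleExecutorsOnWorkers (which additionally checks memory, executor limits and so on); the helper name assignCores and its parameters are made up for illustration.

import scala.math.min

// freeCores(i) = free cores on worker i; coresToAllocate = cores the application still needs.
// spreadOut = true  -> round-robin, one core per worker per pass (the spreadOutApps behaviour)
// spreadOut = false -> drain each worker completely before moving on to the next one
def assignCores(freeCores: Array[Int], coresToAllocate: Int, spreadOut: Boolean): Array[Int] = {
  val free = freeCores.clone()                 // do not mutate the caller's array
  val assigned = Array.fill(free.length)(0)
  var left = coresToAllocate
  var pos = 0
  while (left > 0 && free.exists(_ > 0)) {
    if (free(pos) > 0) {
      val take = if (spreadOut) 1 else min(free(pos), left)
      assigned(pos) += take
      free(pos) -= take
      left -= take
    }
    pos = (pos + 1) % free.length              // move the pointer on to the next worker
  }
  assigned
}

// 10 workers with 10 free cores each, application needs 20 cores in total:
assignCores(Array.fill(10)(10), 20, spreadOut = true)   // Array(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
assignCores(Array.fill(10)(10), 20, spreadOut = false)  // Array(10, 10, 0, 0, 0, 0, 0, 0, 0, 0)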
// This is the method that actually launches executors. Before it is called, the scheduler has
// already computed an assignedCores array that says how many cores each worker contributes to
// this application; from that per-worker number we can work out how many executors to start.
private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  // Number of executors to start on the current worker:
  // cores assigned to this worker / cores per executor; if cores per executor is not
  // configured, default to a single executor
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  // If cores per executor is not configured, give that single executor all the cores assigned
  // to this worker; otherwise use the configured value
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}

/**
 * Summary: suppose the job is submitted with 2 cores per executor and 3 executors.
 * Under the default spreadOutApps algorithm Spark spreads the resources evenly over the workers:
 * it first computes the total core count, 2 * 3 = 6, and assigns 2 cores to each of three
 * workers (the assignedCores array holds three Ints: 2, 2, 2).
 * Then, per worker: assignedCores (cores this worker contributes) / coresPerExecutor (configured
 * cores per executor) = 2 / 2 = 1 executor to start on that worker,
 * and coresPerExecutor.getOrElse(assignedCores) = 2 cores to give that executor.
 * Finally it loops over the executor count and launches each executor with its 2 cores.
 */
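Plugging the summary's numbers into the two formulas above, as a standalone snippet (the values are chosen to match the example, not taken from Spark itself):

// spark-submit asked for 2 cores per executor; the scheduler gave this worker 2 cores
val coresPerExecutor: Option[Int] = Some(2)
val assignedCores = 2

val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)   // 2 / 2 = 1 executor on this worker
val coresToAssign = coresPerExecutor.getOrElse(assignedCores)             // 2 cores for that executor

// If coresPerExecutor were None, this worker would instead start a single executor
// holding all of its assigned cores.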