[spark-src-core] 8. trivial bug in spark standalone executor assignment

 

   Yep, from [1] we know that Spark divides a job into two steps: a. the master launches executors on workers, and b. the driver assigns tasks to those executors. So how executors are assigned to workers by the master is very important!

  For standalone mode, when you dive into the source of Master#receiveWithLogging(), case RequestSubmitDriver, you will figure it out.

 

1. what

  As you step in further, you will see the details in the code path below:

 

/**-Compared with Hadoop's MR slots, Spark's executor allocation is smarter: it allocates across the
   * whole cluster based on the app's overall core and mem demand, and workers with more resources get
   * more executors. Clearly it is not driven by the number of splits as in Hadoop.
   * Schedule executors to be launched on the workers. -note: the assigned app is not cleared out here.
   * vip==> spread out purpose:
   * There are two modes of launching executors. The first attempts to spread out an application's
   * executors on as many workers as possible, while the second does the opposite (i.e. launch them
   * on as few workers as possible). The former is usually better for data locality purposes and is
   * the default.<==
   *
   * The number of cores assigned to each executor is configurable. When this is explicitly set,
   * multiple executors from the same application may be launched on the same worker if the worker
   * has enough cores and memory. Otherwise, each executor grabs all the cores available on the
   * worker by default, in which case only one executor may be launched on each worker.
   */
  private def startExecutorsOnWorkers(): Unit = {
    // Right now this is a very simple <<FIFO scheduler>>. We keep trying to fit in the first app
    // in the queue, then the second app, etc.
    if (spreadOutApps) { //-what does 'spread out' mean concretely? see the while() loop below
      // Try to spread out each app among all the workers, until it has all its cores
      for (app <- waitingApps if app.coresLeft > 0) { //-outer loop over apps; per-worker packing is deferred
        //-1. keep only the workers that can hold at least one of this app's executors, i.e. a worker's
        // free mem and free cores must satisfy at least one executor's mem and cpu requirement
        val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
          .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
            worker.coresFree >= app.desc.coresPerExecutor.getOrElse(1))
          .sortBy(_.coresFree).reverse //-the more free cores a worker has, the higher its priority
        //-2. spread the requested cores across the cluster as evenly as possible
        val numUsable = usableWorkers.length
        val assigned = new Array[Int](numUsable) // Number of cores to give on each node
        //-if app.coresLeft > sum(workers' free cores), additional executors will be assigned to the same
        // workers in a later round; app.coresLeft can be thought of as spark.cores.max per app
        var toAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum) //-either side may be the smaller one
        var pos = 0
        //-spread the target cores (spark.cores.max) across the cluster for load balance
        while (toAssign > 0) { //-coresFree is not changed in this loop, so a worker's cores can all be claimed
          if (usableWorkers(pos).coresFree - assigned(pos) > 0) { //-nothing forces assigned(pos) to be a multiple of coresPerExecutor
            toAssign -= 1
            assigned(pos) += 1
          }
          pos = (pos + 1) % numUsable
        }
        //-3. Now that we've decided how many cores to give on each node, let's actually give them.
        // Executors are created strictly within each worker's own cores and mem: if a worker's share is
        // smaller than one executor's requirement, nothing is launched on it.
        for (pos <- 0 until numUsable if assigned(pos) > 0) { //-breadth first (across workers)
          //-worker.memoryFree, worker.coresFree and app.coresLeft will all be decreased below
          allocateWorkerResourceToExecutors(app, assigned(pos), usableWorkers(pos))
        }
      }
    } else {
      //-e.g. with spark.deploy.spreadOut=false, 25 executors are launched (i.e. 50 cores, the same as specified)
      // Pack each app into as few workers as possible until we've assigned all its cores.
      //-allocate on one worker at a time; if it is not enough, move on to the next worker.
      //-worker.coresFree will be decreased in allocateWorkerResourceToExecutors();
      // each allocation here only checks mem, while cores are only re-checked in the next scheduling round!
      // this means that with spreadOut=false the executors on a single worker may claim more cores than the worker has.
      for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) { //-breadth is deferred
        for (app <- waitingApps if app.coresLeft > 0) { //-depth first (fill one worker, then move to the next if short)
          allocateWorkerResourceToExecutors(app, app.coresLeft, worker)
        }
      }
    }
  }

  /**-Executors are created based on worker.memoryFree and the coresToAllocate argument. For spreadOut=true
    * the cores handed out here are exactly what the worker was assigned.
    * Assign cores and mem to executors according to the app's per-executor requests (core and mem units).
    * For spreadOut=false this method effectively only checks mem while creating executors; cores are only
    * re-checked in the next scheduling round, see startExecutorsOnWorkers(). For spreadOut=true both mem
    * and cores are respected.
   * Allocate a worker's resources to one or more executors. -i.e. several executors may run on the same worker
   * @param app the info of the application which the executors belong to
   * @param coresToAllocate cores on this worker to be allocated to this application (-the total cores to be
   *                        assigned on this worker)
   * @param worker the worker info
   */
  private def allocateWorkerResourceToExecutors(
      app: ApplicationInfo,
      coresToAllocate: Int,
      worker: WorkerInfo): Unit = {
    val memoryPerExecutor = app.desc.memoryPerExecutorMB
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(coresToAllocate)
    var coresLeft = coresToAllocate
    //-stop as soon as either the cores or the mem condition fails
    while (coresLeft >= coresPerExecutor && worker.memoryFree >= memoryPerExecutor) {
      val exec = app.addExecutor(worker, coresPerExecutor) //-this increases app.coresGranted, i.e. decreases app.coresLeft
      coresLeft -= coresPerExecutor
      launchExecutor(worker, exec) //-this decreases the worker's free cores and mem
      app.state = ApplicationState.RUNNING
    }
  }

  (figure omitted: round-robin, spread-out core assignment across workers)
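
  In place of the figure, here is a minimal, self-contained sketch of the round-robin split performed by the while() loop above. It uses plain Ints instead of the real WorkerInfo/ApplicationInfo types, so it is only an illustration of the idea, not Master code:

object SpreadOutSketch {
  // Round-robin split of `coresToAssign` across workers, mirroring the while() loop in
  // startExecutorsOnWorkers. coresFreePerWorker(i) is worker i's free cores.
  def spreadOut(coresToAssign: Int, coresFreePerWorker: Array[Int]): Array[Int] = {
    val assigned = new Array[Int](coresFreePerWorker.length)
    var toAssign = math.min(coresToAssign, coresFreePerWorker.sum)
    var pos = 0
    while (toAssign > 0) {
      if (coresFreePerWorker(pos) - assigned(pos) > 0) {
        toAssign -= 1
        assigned(pos) += 1
      }
      pos = (pos + 1) % coresFreePerWorker.length
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    // 10 workers with 16 free cores each, spark.cores.max = 15:
    println(spreadOut(15, Array.fill(10)(16)).mkString(",")) // 2,2,2,2,2,1,1,1,1,1
  }
}

  With 15 requested cores and 10 equally free workers, the split is 2,2,2,2,2,1,1,1,1,1, which is exactly the shape that matters for case 8 in the next section.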
2. how about the bug

   The annotation in the Spark source says that 'spark.cores.max' is the number of cores to be allocated to one app, as many as possible. That means there is a computation bug in Spark (with spreadOut=true), as the following cases show:

case | spark.cores.max | #workers | cores per worker | mem per worker | coresPerExecutor | memPerExecutor | result
---- | --------------- | -------- | ---------------- | -------------- | ---------------- | -------------- | ------
1    | 10              | 10       | 16               | 16g            | 2                | 2g             | failed: no executors allocated
2    | 20              |          |                  |                |                  |                | 10 executors allocated in one wave
3    | 40              |          |                  |                |                  |                | 20 executors in one wave
4    | 40              |          |                  |                | 2                | 16g            | 10 executors in one wave, 10 in another wave, 20 in total
5    | 40              |          |                  |                | 16               | 2g             | similar to case 4
6    | 40              |          |                  |                | 20               |                | failed: worker cores < 20
7    | 40              |          |                  |                | 2                | 20g            | failed: worker mem < 20g
8    | 15              | 10       | 16               | 16g            | 2                | 2g             | only 5 executors allocated

(blank cells keep the value of case 1)

notes:

  case 1: 10 cores spread over 10 workers gives 1 core per worker, and 1 < coresPerExecutor (2), so no executors are allocated at all.

  case 2: a. 20/10 = 2 >= coresPerExecutor and 2/2 = 1 executor per worker; b. 10 cores >= 2 x 1; c. 16g >= 2g x 1, so 10 executors launch in one wave.

  case 8: the first round-robin pass gives every worker 1 core, the remaining 15 - 10 = 5 cores give 5 workers a second core; only those 5 workers reach coresPerExecutor, so just 5 executors launch and only 10 cores are actually assigned.
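
  To make the table easy to verify, here is a small, self-contained sketch that combines the round-robin split with the per-executor core/mem check from allocateWorkerResourceToExecutors. The object and method names are made up for illustration, and it assumes every worker is alive with identical free resources and that everything happens in a single wave (so it covers cases 1, 2 and 8, not the multi-wave cases 4 and 5):

object ExecutorCountSketch {
  // How many executors actually launch in one wave, given the per-worker core shares computed
  // by the spread-out loop and the per-executor core/mem requirements (memory in MB).
  def executorsLaunched(
      coresMax: Int, numWorkers: Int, coresPerWorker: Int, memPerWorkerMB: Int,
      coresPerExecutor: Int, memPerExecutorMB: Int): Int = {
    // round-robin split of coresMax across the workers, same idea as the Master's while() loop
    val assigned = new Array[Int](numWorkers)
    var toAssign = math.min(coresMax, numWorkers * coresPerWorker)
    var pos = 0
    while (toAssign > 0) {
      if (coresPerWorker - assigned(pos) > 0) { toAssign -= 1; assigned(pos) += 1 }
      pos = (pos + 1) % numWorkers
    }
    // per worker: one executor per coresPerExecutor share, further limited by the worker's memory
    assigned.map(cores => math.min(cores / coresPerExecutor, memPerWorkerMB / memPerExecutorMB)).sum
  }

  def main(args: Array[String]): Unit = {
    // case 1: cores.max=10, 10 workers of 16 cores/16g, executors of 2 cores/2g -> 0 executors
    println(executorsLaunched(10, 10, 16, 16 * 1024, 2, 2 * 1024)) // 0
    // case 2: cores.max=20 -> 10 executors in one wave
    println(executorsLaunched(20, 10, 16, 16 * 1024, 2, 2 * 1024)) // 10
    // case 8: cores.max=15 -> only 5 executors (10 cores), although 15 cores were requested
    println(executorsLaunched(15, 10, 16, 16 * 1024, 2, 2 * 1024)) // 5
  }
}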

 

  So from cases 1 and 8 we know that the cluster has enough resources to allocate executors, but in fact no executors (or fewer than the reasonable number of executors) get launched. Then you will see something weird occur:

 

16/11/18 14:07:10 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks

16/11/18 14:07:25 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/11/18 14:07:40 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

 

 3. workarounds

  a. use a reasonable number of cores, i.e. choose spark.cores.max so that

cores.max % (#workers x coresPerExecutor) == 0   (the quotient is a natural number)

  b. embed a small check of cores.max in the scheduling path,

  i.e. after assigning cores to workers and before allocating executors, check whether each worker's share lines up with coresPerExecutor (no matter whether it is more or less than coresPerExecutor), and fold any leftover back into the previous wave's computation; a sketch of such a check follows.
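
  A minimal sketch of such a check (not code from the Spark source; the names are made up, and it assumes all workers are usable and memory is not the limiting factor): it reports how many of the requested cores can never be turned into executors under spreadOut=true:

object CoresMaxCheck {
  // Sketch of workaround b: before allocating executors, check whether spark.cores.max will
  // leave cores stranded under spreadOut=true (per-worker shares that are not multiples of
  // coresPerExecutor are silently wasted by allocateWorkerResourceToExecutors).
  def strandedCores(coresMax: Int, numWorkers: Int, coresPerExecutor: Int): Int = {
    val low  = coresMax / numWorkers   // cores each worker gets from the full round-robin rounds
    val rem  = coresMax % numWorkers   // the first `rem` workers get one extra core
    val usedOnHigh = ((low + 1) / coresPerExecutor) * coresPerExecutor * rem
    val usedOnLow  = (low / coresPerExecutor) * coresPerExecutor * (numWorkers - rem)
    coresMax - usedOnHigh - usedOnLow
  }

  def main(args: Array[String]): Unit = {
    // case 1 from the table: all 10 cores stranded, nothing launches
    println(strandedCores(10, 10, 2)) // 10
    // case 8: 5 of the 15 cores can never become executors
    println(strandedCores(15, 10, 2)) // 5
    // a "reasonable" setting per workaround a: 20 % (10 x 2) == 0, nothing stranded
    println(strandedCores(20, 10, 2)) // 0
  }
}

  In a real deployment the worker count would come from the Master's alive-worker list, and a non-zero result would be logged as a warning instead of letting the app wait forever.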

 

4. conclusion

  No doubt the property 'spark.cores.max' may cause some misunderstanding, but you can avoid this case by adopting the workarounds above.

  Generally speaking, this property lets Spark allocate executors more intelligently and dynamically than computation frameworks on YARN, etc.

 

ref:

[1] [spark-src-core] 4.2 communications b/t certain kernal components

[2] Spark scheduling series, part 1: how the Master allocates each executor's resources on the workers in Spark standalone mode

[3] Spark internals: executor allocation explained

