Spark TaskScheduler: Functionality and Source Code Analysis

TaskScheduler is a trait; TaskSchedulerImpl is currently the only implementation Spark provides. It is created and initialized inside SparkContext (a simplified sketch of that dispatch follows the class signature below).

private[spark] class TaskSchedulerImpl(
    val sc: SparkContext,
    val maxTaskFailures: Int,
    isLocal: Boolean = false)
  extends TaskScheduler with Logging
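
As a rough illustration, here is an abridged sketch of SparkContext.createTaskScheduler, which pairs a TaskSchedulerImpl with a SchedulerBackend chosen by the master URL (condensed from the Spark 1.x source; only two of the master patterns are shown and surrounding details are omitted):

// SparkContext.createTaskScheduler (abridged sketch, Spark 1.x)
master match {
  case "local" =>
    val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
    val backend = new LocalBackend(sc.getConf, scheduler, 1)
    scheduler.initialize(backend)
    (backend, scheduler)
  case SPARK_REGEX(sparkUrl) =>  // e.g. spark://host:7077 (Standalone)
    val scheduler = new TaskSchedulerImpl(sc)
    val masterUrls = sparkUrl.split(",").map("spark://" + _)
    val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
    scheduler.initialize(backend)
    (backend, scheduler)
  ...
}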

TaskScheduler is in effect a proxy in front of SchedulerBackend: the backend talks to the cluster manager, while TaskScheduler itself handles the common logic, such as the scheduling order among different jobs and resubmitting slow-running tasks on idle nodes (speculation).

// SparkContext calls TaskSchedulerImpl.initialize, passing in the SchedulerBackend instance
def initialize(backend: SchedulerBackend) {
  this.backend = backend
  // temporarily set rootPool name to empty
  rootPool = new Pool("", schedulingMode, 0, 0)
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>
        new FairSchedulableBuilder(rootPool, conf)
    }
  }
  schedulableBuilder.buildPools()
}

A Pool is used to schedule TaskSetManagers. In fact both Pool and TaskSetManager extend the Schedulable trait, so a Pool can contain TaskSetManagers or other Pools, as the sketch below illustrates.
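
A rough sketch of that composite structure, assuming the Spark 1.x constructor shapes (simplified; taskSetManager stands in for a real TaskSetManager instance):

// Both Pool and TaskSetManager are Schedulable, so pools can nest
val rootPool = new Pool("root", SchedulingMode.FAIR, initMinShare = 0, initWeight = 0)
val subPool  = new Pool("production", SchedulingMode.FIFO, initMinShare = 2, initWeight = 1)
rootPool.addSchedulable(subPool)        // a Pool may contain another Pool...
subPool.addSchedulable(taskSetManager)  // ...or a TaskSetManager (a leaf)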

Spark defaults to the FIFO (First In, First Out) scheduling mode; a FAIR mode is also available. FIFO mode has a single pool, while FAIR mode supports multiple pools, and each Pool itself can run in either FIFO or FAIR mode. The two modes are built by FIFOSchedulableBuilder and FairSchedulableBuilder respectively; switching to FAIR mode is a configuration change, sketched below.
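
A minimal configuration sketch (the allocation-file path and pool name are placeholders; pool weights and minShares live in that XML file):

// Enable FAIR scheduling and route jobs into a named pool
val conf = new SparkConf()
  .setAppName("fair-scheduling-demo")
  .set("spark.scheduler.mode", "FAIR")
  // optional: per-pool definitions (schedulingMode/weight/minShare)
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val sc = new SparkContext(conf)
// jobs submitted from this thread now go into the "production" pool
sc.setLocalProperty("spark.scheduler.pool", "production")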

SparkContext decides which SchedulerBackend to use based on the master parameter. Taking Spark Standalone mode as an example, the backend is SparkDeploySchedulerBackend, which extends the CoarseGrainedSchedulerBackend parent class.

private[spark] class SparkDeploySchedulerBackend(
    scheduler: TaskSchedulerImpl,
    sc: SparkContext,
    masters: Array[String])
  extends CoarseGrainedSchedulerBackend(scheduler, sc.env.rpcEnv)
  with AppClientListener
  with Logging {

// SparkContext calls TaskSchedulerImpl.start
override def start() {
  backend.start()

  // If speculation is enabled, start a periodic check that resubmits
  // slow-running tasks on idle resources
  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    sc.env.actorSystem.scheduler.schedule(
        SPECULATION_INTERVAL_MS milliseconds, SPECULATION_INTERVAL_MS milliseconds) {
      Utils.tryOrStopSparkContext(sc) { checkSpeculatableTasks() }
    }(sc.env.actorSystem.dispatcher)
  }
}
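
The relevant settings, with their Spark 1.x names (a sketch; the values shown for multiplier and quantile are the documented defaults, while spark.speculation itself is off by default):

// Speculation knobs (Spark 1.x)
val conf = new SparkConf()
  .set("spark.speculation", "true")            // enable speculative execution
  .set("spark.speculation.interval", "100ms")  // how often to check for speculatable tasks
  .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking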

The SparkDeploySchedulerBackend.start method initializes an AppClient object, which is mainly used for the Driver to exchange messages with the Master over Akka and to register the Spark Application.

// SparkDeploySchedulerBackend.start()
override def start() {
  super.start()
  ...
  val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
  val javaOpts = sparkJavaOpts ++ extraJavaOpts

  // Wrap CoarseGrainedExecutorBackend into a Command
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
  val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
  val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
  val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
    command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)

  // Initialize the AppClient object and start it
  client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start()
  waitForRegistration()
}
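
After registration, AppClient reports cluster events back to the backend through the AppClientListener callbacks that SparkDeploySchedulerBackend implements. For orientation, the listener contract looks like this (shape follows the Spark 1.x source; simplified):

// Callbacks AppClient delivers to SparkDeploySchedulerBackend
private[spark] trait AppClientListener {
  def connected(appId: String): Unit  // application registered with the Master
  def disconnected(): Unit            // temporary loss of the Master
  def dead(reason: String): Unit      // unrecoverable failure; the app should stop
  def executorAdded(fullId: String, workerId: String, hostPort: String,
      cores: Int, memory: Int): Unit
  def executorRemoved(fullId: String, message: String, exitStatus: Option[Int]): Unit
}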