Spark Source Code - Job Submission Flow - 6.1 - SparkContext Initialization - Creating the Spark Driver-Side Execution Environment SparkEnv

zdaiqing

Last modified 2022-07-28 16:41:47

Categories: Spark, Big Data Source Code; Tags: spark, big data, scala

First published 2022-07-18 21:58:54

Article link: https://blog.csdn.net/m0_37817767/article/details/125855073

Creating the Spark driver-side execution environment, SparkEnv

1. Process analysis
2. References

1. Process analysis

1.1. Entry point

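The Spark driver-side execution environment is created during SparkContext initialization.
The entry point is shown in the snippet below: it sits inside the try/catch block of the SparkContext class.
Call chain: try/catch block -> createSparkEnv method -> SparkEnv.createDriverEnv method.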

class SparkContext(config: SparkConf) extends Logging {

  // Unrelated code omitted...
  // Helper that creates the SparkEnv
  private[spark] def createSparkEnv(
      conf: SparkConf,
      isLocal: Boolean,
      listenerBus: LiveListenerBus): SparkEnv = {
    // The last argument is the number of CPU cores the driver may use
    SparkEnv.createDriverEnv(conf, isLocal, listenerBus, SparkContext.numDriverCores(master, conf))
  }
  // Unrelated code omitted...
  try {
    // Unrelated code omitted...
    // Create the Spark execution environment (cache, map output tracker, etc)
    // Entry point for creating the SparkEnv
    _env = createSparkEnv(_conf, isLocal, listenerBus)
    SparkEnv.set(_env)
    // Unrelated code omitted...
  } catch {
    // Unrelated code omitted...
  }
}

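1.1.1. Specifying the number of driver CPU cores

Local mode:
	No thread count specified (master is local): the value defaults to 1.
	Thread count specified (local[N] or local[N, maxFailures]): the value is taken from N;
		if N is *, the value is the number of processors available to the JVM.
YARN cluster mode (master is yarn and spark.submit.deployMode is cluster): the value is read from spark.driver.cores, default 0.
All other cases, including YARN client mode: the value is 0.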
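
The rules above can be sketched as a standalone function that runs without Spark on the classpath. This is a minimal sketch, not Spark's implementation: the names `DriverCoresSketch`, `LocalN` and `LocalNFailures` are hypothetical, the two patterns are assumptions modeled on the `local[N]` and `local[N, maxFailures]` master strings, and a plain `Map` stands in for `SparkConf`.

```scala
import scala.util.matching.Regex

object DriverCoresSketch {
  // Assumed patterns for local[N] and local[N, maxFailures]; N may be a number or *
  val LocalN: Regex = """local\[([0-9]+|\*)\]""".r
  val LocalNFailures: Regex = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r

  // conf is a plain Map standing in for SparkConf (hypothetical simplification)
  def numDriverCores(master: String, conf: Map[String, String]): Int = {
    // "*" means every processor available to the JVM
    def convertToInt(threads: String): Int =
      if (threads == "*") Runtime.getRuntime.availableProcessors() else threads.toInt

    master match {
      // Local mode without a thread count
      case "local" => 1
      // Local mode with a thread count
      case LocalN(threads) => convertToInt(threads)
      case LocalNFailures(threads, _) => convertToInt(threads)
      // YARN cluster mode: read spark.driver.cores, falling back to 0
      case "yarn" if conf.get("spark.submit.deployMode").contains("cluster") =>
        conf.get("spark.driver.cores").map(_.toInt).getOrElse(0)
      // Everything else, including YARN client mode
      case _ => 0
    }
  }
}
```

With this sketch, `DriverCoresSketch.numDriverCores("local[4]", Map.empty)` yields 4, while a standalone master URL such as `spark://host:7077` falls through to the default of 0. Scala's regex extractors match the whole string, so `local[4]` cannot be mistaken for `local[4, 2]`.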

object SparkContext extends Logging {

  private[spark] def numDriverCores(master: String, conf: SparkConf): Int = {
    // String -> Int; "*" means every processor available to the JVM
    def convertToInt(threads: String): Int = {
      if (threads == "*") Runtime.getRuntime.availableProcessors() else threads.toInt
    }
    master match {
      // Local mode with no thread count: defaults to 1
      case "local" => 1
      // Local mode with a thread count: use that count
      case SparkMasterRegex.LOCAL_N_REGEX(threads) => convertToInt(threads)
      case SparkMasterRegex.LOCAL_N_FAILURES_REGEX(threads, _) => convertToInt(threads)
      // YARN cluster mode: read spark.driver.cores, default 0
      case "yarn" =>
        if (conf != null && conf.getOption("spark.submit.deployMode").contains("cluster")) {
          conf.getInt("spark.driver.cores", 0)
        } else {
          0
        }
      // Driver not used, or its core count is set later: default 0
      case _ =