SparkContext Initialization

宝哥大数据


Article link: https://blog.csdn.net/wuxintdrh/article/details/88856716

This series is based on Spark 2.1.1.

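The Spark driver runs and submits the user's application. SparkContext is created inside the driver, and its initialization is traced step by step below.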

1. SparkConf

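SparkConf loads the configuration parameters for SparkContext. It keeps the various spark.* configuration properties in a ConcurrentHashMap. When it is constructed with loadDefaults = true, which is what the no-argument constructor does, it copies every JVM system property whose key starts with "spark.".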

```scala
// Excerpt from org.apache.spark.SparkConf (Spark 2.1.1)
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Serializable {
    import SparkConf._

    /** Create a SparkConf that loads defaults from system properties and the classpath */
    def this() = this(true)

    /** A ConcurrentHashMap that stores the Spark configuration */
    private val settings = new ConcurrentHashMap[String, String]()
    if (loadDefaults) {
        loadFromSystemProperties(false)
    }

    /**
     * Load the spark.* configuration entries
     */
    private[spark] def loadFromSystemProperties(silent: Boolean): SparkConf = {
        // Load any spark.* system properties; keys without the "spark." prefix are skipped
        for ((key, value) <- Utils.getSystemProperties if key.startsWith("spark.")) {
            set(key, value, silent)
        }
        this
    }

    // ... (remaining members omitted)
}
```
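The sketch below isolates the loading step shown in the excerpt. `MiniSparkConf` and its `systemProps` parameter are hypothetical names used only for illustration. The parameter exists so the loading step can be tested without touching real JVM properties. This is not Spark's class. It only models what the excerpt does: keep settings in a `ConcurrentHashMap` and, when `loadDefaults` is true, copy only the keys that start with `spark.`.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical, simplified model of SparkConf's loading step; NOT Spark's real class.
// `systemProps` is an illustration-only parameter so the loading step can be exercised in tests.
class MiniSparkConf(
    loadDefaults: Boolean,
    systemProps: () => Map[String, String] = () => sys.props.toMap) {

  // As in SparkConf, settings live in a thread-safe ConcurrentHashMap.
  private val settings = new ConcurrentHashMap[String, String]()

  // Mirrors loadFromSystemProperties: copy only keys that start with "spark.".
  if (loadDefaults) {
    for ((key, value) <- systemProps() if key.startsWith("spark.")) {
      set(key, value)
    }
  }

  def set(key: String, value: String): MiniSparkConf = {
    require(key != null, "null key")
    require(value != null, s"null value for $key")
    settings.put(key, value)
    this
  }

  def getOption(key: String): Option[String] = Option(settings.get(key))

  def get(key: String, default: String): String = getOption(key).getOrElse(default)

  def contains(key: String): Boolean = settings.containsKey(key)
}
```

The real API follows the same pattern. `new SparkConf()` behaves like `loadDefaults = true`, while `new SparkConf(false)` skips the system properties, which can be convenient in unit tests.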

2. SparkContext

2.1 Creating the Spark execution environment, SparkEnv
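In Spark 2.1.1, SparkContext builds the driver-side SparkEnv through `createSparkEnv`, which delegates to `SparkEnv.createDriverEnv`. It then publishes the result with `SparkEnv.set`, so other code in the same JVM can reach it through `SparkEnv.get`. The sketch below models only that publish-and-lookup pattern, under simplifying assumptions. `MiniEnv` and its fields are hypothetical. The real SparkEnv holds many more components, such as the serializer, the RPC environment, the BlockManager and the MapOutputTracker.

```scala
// Hypothetical stand-in for SparkEnv; the real one bundles many more components
// (serializer, RPC environment, BlockManager, MapOutputTracker, ...).
final case class MiniEnv(executorId: String, serializerName: String)

object MiniEnv {
  // Process-wide holder. @volatile makes the value written by set() visible to other threads.
  @volatile private var env: MiniEnv = null

  def set(e: MiniEnv): Unit = { env = e }

  def get: MiniEnv = env

  // The driver's environment uses the executor id "driver".
  def createDriverEnv(): MiniEnv = MiniEnv("driver", "JavaSerializer")
}
```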

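~~2.2 Creating the RDD cleaner, metadataCleaner~~
Note: there is no MetadataCleaner in Spark 2.1.1. Its role may have been taken over by ContextCleaner. This is not certain, so corrections from readers who know are welcome.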

2.3 Creating and initializing SparkUI
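The UI is created only when `spark.ui.enabled` is true, which is the default. Its web server starts on `spark.ui.port`, 4040 by default. If that port is taken, Spark tries the following ports, making up to `spark.port.maxRetries` extra attempts (16 by default). This is why a second application on the same host often appears on 4041. The following is a minimal sketch of that retry idea. It assumes a caller-supplied `tryBind` function in place of actually starting the web server. `PortRetry` and `firstBoundPort` are hypothetical names, and the sketch ignores port wrap-around near 65535.

```scala
// Hypothetical sketch of the port-retry idea used when SparkUI starts its web server.
// `tryBind` stands in for "start the server on this port"; it returns false if the port is taken.
object PortRetry {
  def firstBoundPort(startPort: Int, maxRetries: Int)(tryBind: Int => Boolean): Option[Int] =
    (0 to maxRetries).iterator
      .map(offset => startPort + offset) // e.g. 4040, 4041, 4042, ...
      .find(tryBind)                     // lazily stops at the first port that binds
}
```

Because the iterator is lazy, `tryBind` is called only until the first success. With 4040 and 4041 occupied, `firstBoundPort(4040, 16)` yields `Some(4042)`.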

2.4 Hadoop configuration and executor environment variables
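This step relies on two prefix conventions. Properties named `spark.hadoop.<key>` are copied into the Hadoop `Configuration` as `<key>` by `SparkHadoopUtil`. Properties named `spark.executorEnv.<VAR>` become environment variables for executors, alongside `SPARK_EXECUTOR_MEMORY`, which is derived from the executor memory setting. The sketch below shows only the key mapping, using plain Scala maps. `PrefixConfig`, its methods and the `executorMemoryMb` parameter are hypothetical, and no Hadoop classes are involved.

```scala
// Hypothetical helpers illustrating two prefix conventions used during SparkContext setup:
//   spark.hadoop.<key>       -> <key> in the Hadoop Configuration
//   spark.executorEnv.<VAR>  -> environment variable <VAR> for executors
object PrefixConfig {
  def withPrefixStripped(conf: Map[String, String], prefix: String): Map[String, String] =
    conf.collect {
      case (k, v) if k.startsWith(prefix) && k.length > prefix.length => k.substring(prefix.length) -> v
    }

  def hadoopProps(conf: Map[String, String]): Map[String, String] =
    withPrefixStripped(conf, "spark.hadoop.")

  // executorMemoryMb is an illustration-only parameter; user-supplied executorEnv values,
  // appended on the right of ++, override the default memory entry.
  def executorEnvs(conf: Map[String, String], executorMemoryMb: Int): Map[String, String] =
    Map("SPARK_EXECUTOR_MEMORY" -> s"${executorMemoryMb}m") ++
      withPrefixStripped(conf, "spark.executorEnv.")
}
```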

2.5 Creating the TaskScheduler

Communication flow among SparkLauncher, LauncherServer, and LauncherBackend
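`SparkContext.createTaskScheduler` picks a TaskScheduler/SchedulerBackend pair by pattern-matching the master URL:

- `local` runs one thread.
- `local[N]` runs N threads; `local[*]` runs one thread per core.
- `local[N, F]` also sets the number of allowed task failures.
- `spark://host:port` selects the standalone backend.
- Anything else, for example `yarn`, goes to a pluggable external cluster manager.

The hypothetical `MasterUrl.classify` below reproduces only that dispatch, and it returns a description instead of real scheduler objects. `local-cluster[...]` and most error handling are left out.

```scala
// Hypothetical classifier mirroring how SparkContext.createTaskScheduler dispatches on the
// master URL; it returns a description instead of constructing real scheduler objects.
sealed trait SchedulerChoice
final case class Local(threads: Int, maxTaskFailures: Int) extends SchedulerChoice
final case class Standalone(masterUrls: Seq[String]) extends SchedulerChoice
final case class External(masterUrl: String) extends SchedulerChoice

object MasterUrl {
  private val LocalN = """local\[([0-9]+|\*)\]""".r
  private val LocalNFailures = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r
  private val SparkUrl = """spark://(.*)""".r

  // "*" means one thread per available core.
  private def threadCount(s: String): Int = {
    val n = if (s == "*") Runtime.getRuntime.availableProcessors() else s.toInt
    require(n > 0, s"Asked to run locally with $n threads")
    n
  }

  def classify(master: String): SchedulerChoice = master match {
    case "local"                     => Local(1, 1) // one thread; a failed task is not retried
    case LocalN(n)                   => Local(threadCount(n), 1)
    case LocalNFailures(n, failures) => Local(threadCount(n), failures.toInt)
    case SparkUrl(urls)              => Standalone(urls.split(",").toSeq.map("spark://" + _))
    case other                       => External(other) // e.g. "yarn": left to a cluster manager plugin
  }
}
```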

2.6 Creating and starting the DAGScheduler
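At the end of its constructor, DAGScheduler starts `eventProcessLoop`, a `DAGSchedulerEventProcessLoop` built on Spark's internal `EventLoop`. Events such as job submissions are put on a blocking deque and handled one at a time by a single daemon thread named `dag-scheduler-event-loop`. The following is a minimal sketch of that pattern. `MiniEventLoop`, `EventLoopDemo` and `SubmitJob` are hypothetical names. The real loop also reports failures through an `onError` hook, which is omitted here.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingDeque, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer

// Hypothetical minimal event loop in the spirit of Spark's EventLoop: callers post events,
// and a single daemon thread takes them off a blocking deque and handles them in order.
abstract class MiniEventLoop[E](name: String) {
  private val queue = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val thread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit =
      try {
        while (!stopped.get) onReceive(queue.take()) // take() blocks until an event arrives
      } catch {
        case _: InterruptedException => // stop() interrupts the blocked thread
      }
  }

  protected def onReceive(event: E): Unit

  def start(): Unit = thread.start()

  def post(event: E): Unit = queue.put(event)

  def stop(): Unit =
    if (stopped.compareAndSet(false, true)) {
      thread.interrupt()
      thread.join()
    }
}

object EventLoopDemo {
  sealed trait DagEvent
  final case class SubmitJob(jobId: Int) extends DagEvent // hypothetical event type

  // Posts `jobs` events and returns the job ids in the order the loop thread handled them.
  def run(jobs: Int): List[Int] = {
    val handled = ArrayBuffer[Int]()
    val done = new CountDownLatch(jobs)
    val loop = new MiniEventLoop[DagEvent]("dag-scheduler-event-loop") {
      override protected def onReceive(event: DagEvent): Unit = event match {
        case SubmitJob(id) =>
          handled += id
          done.countDown()
      }
    }
    loop.start()
    (1 to jobs).foreach(id => loop.post(SubmitJob(id)))
    done.await(5, TimeUnit.SECONDS)
    loop.stop()
    handled.toList
  }
}
```

A single consumer thread means events are handled strictly in posting order. This lets the real scheduler update its internal state without extra locking.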

2.7 Starting the TaskScheduler

2.8 Initializing the BlockManager

2.9 Starting the metrics system, MetricsSystem

2.10 Creating and starting the executor allocation manager, ExecutorAllocationManager
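ExecutorAllocationManager is created only when dynamic allocation is enabled through `spark.dynamicAllocation.enabled`. While tasks remain backlogged, it raises its executor target in rounds. The number of executors added per round grows exponentially (1, 2, 4, ...) up to a bound set by the pending work and by `spark.dynamicAllocation.maxExecutors`. The following is a minimal simulation of that ramp-up, assuming a backlog that never drains. `RampUp.targets` and its parameters are hypothetical.

```scala
// Hypothetical simulation of the ramp-up policy described for ExecutorAllocationManager,
// assuming the backlog never drains. Each round raises the target by up to `toAdd`;
// if the full increment fit under the bound, the next increment doubles, otherwise it resets to 1.
object RampUp {
  def targets(initial: Int, maxNeeded: Int, maxExecutors: Int, rounds: Int): List[Int] = {
    val upperBound = math.min(maxNeeded, maxExecutors)
    var target = initial
    var toAdd = 1
    val history = List.newBuilder[Int]
    for (_ <- 1 to rounds) {
      val newTarget = math.min(target + toAdd, upperBound)
      val delta = newTarget - target
      toAdd = if (delta == toAdd) toAdd * 2 else 1
      target = newTarget
      history += target
    }
    history.result()
  }
}
```

For example, `RampUp.targets(0, 20, 100, 6)` yields `List(1, 3, 7, 15, 20, 20)`. The increments double until the 20 executors needed for the backlog are reached, and after that the per-round increment resets to 1.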

2.11 Creating and starting the ContextCleaner
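ContextCleaner is created when `spark.cleaner.referenceTracking` is true, which is the default. It cleans up RDD, shuffle and broadcast state asynchronously. For each tracked object it keeps a weak reference registered with a `ReferenceQueue`. Once the JVM garbage-collects the object, the reference appears on the queue, and a daemon cleaning thread removes the associated data. The sketch below models a single pass of that loop. `MiniCleaner` and `CleanupTaskWeakRef` are hypothetical. The names `CleanRDD` and `CleanBroadcast` only echo Spark's internal task types; this is an independent sketch.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical, simplified model of the weak-reference idea behind ContextCleaner (not Spark's code).
sealed trait CleanupTask
final case class CleanRDD(rddId: Int) extends CleanupTask
final case class CleanBroadcast(broadcastId: Long) extends CleanupTask

// A weak reference that remembers which cleanup to run once its referent is collected.
final class CleanupTaskWeakRef(val task: CleanupTask, referent: AnyRef, queue: ReferenceQueue[AnyRef])
  extends WeakReference[AnyRef](referent, queue)

class MiniCleaner(doCleanup: CleanupTask => Unit) {
  private val queue = new ReferenceQueue[AnyRef]()
  // The weak references themselves must stay strongly reachable, or they are never enqueued.
  private val buffer = ConcurrentHashMap.newKeySet[CleanupTaskWeakRef]()

  def register(obj: AnyRef, task: CleanupTask): CleanupTaskWeakRef = {
    val ref = new CleanupTaskWeakRef(task, obj, queue)
    buffer.add(ref)
    ref
  }

  // One pass of the cleaning loop; the real cleaner repeats this on a daemon thread.
  def cleanOnce(): Int = {
    var cleaned = 0
    var ref = queue.poll()
    while (ref != null) {
      ref match {
        case r: CleanupTaskWeakRef =>
          buffer.remove(r)
          doCleanup(r.task)
          cleaned += 1
        case _ => // not one of ours
      }
      ref = queue.poll()
    }
    cleaned
  }
}
```

A test can call `enqueue()` on the returned reference to simulate garbage collection deterministically. In a real run, the JVM enqueues the reference after the tracked object becomes unreachable.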

2.12 Updating the Spark environment

2.13 Creating DAGSchedulerSource and BlockManagerSource

2.14 Marking the SparkContext as active

```scala
  // In order to prevent multiple SparkContexts from being active at the same time, mark this
  // context as having finished construction.
  // NOTE: this must be placed at the end of the SparkContext constructor.
  SparkContext.setActiveContext(this, allowMultipleContexts)
```
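`setActiveContext` is the second half of a pair. At the start of its constructor, SparkContext calls `markPartiallyConstructed`. Both calls run under a global lock, and both throw if another SparkContext is already active, unless `spark.driver.allowMultipleContexts` is set. The sketch below models that bookkeeping. `MiniContext` and `ActiveContextRegistry` are hypothetical. When multiple contexts are allowed, the real code logs a warning, whereas this sketch simply skips the check.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical model of the "one active context per JVM" bookkeeping done by
// SparkContext.markPartiallyConstructed (start of constructor) and setActiveContext (end).
final class MiniContext(val name: String, allowMultiple: Boolean = false) {
  ActiveContextRegistry.markPartiallyConstructed(this, allowMultiple)
  // ... SparkEnv, UI, schedulers, cleaner, etc. would be created here ...
  ActiveContextRegistry.setActive(this, allowMultiple) // must stay last, as in SparkContext

  def stop(): Unit = ActiveContextRegistry.clearActive(this)
}

object ActiveContextRegistry {
  private val lock = new Object
  private val active = new AtomicReference[MiniContext](null)

  private def assertNoOtherActive(ctx: MiniContext, allowMultiple: Boolean): Unit =
    Option(active.get).filter(_ ne ctx).foreach { other =>
      if (!allowMultiple)
        throw new IllegalStateException(s"Only one context may be active; '${other.name}' is still running")
    }

  def markPartiallyConstructed(ctx: MiniContext, allowMultiple: Boolean): Unit =
    lock.synchronized { assertNoOtherActive(ctx, allowMultiple) }

  def setActive(ctx: MiniContext, allowMultiple: Boolean): Unit =
    lock.synchronized {
      assertNoOtherActive(ctx, allowMultiple)
      active.set(ctx)
    }

  def clearActive(ctx: MiniContext): Unit =
    lock.synchronized { active.compareAndSet(ctx, null); () }

  def getActive: Option[MiniContext] = Option(active.get)
}
```

Placing the `setActive` call last means a constructor that fails halfway never registers a half-built context as active.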