SparkContext functions:
Load configuration and maintain the application context
Create the TaskScheduler, SchedulerBackend, and DAGScheduler objects
Launch jobs and submit them to the DAGScheduler (sketched below)
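To make these responsibilities concrete, here is a minimal driver sketch (the object name and the input.txt path are placeholders): the SparkConf carries the configuration, constructing the SparkContext wires up the schedulers, and an action triggers job submission.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountSketch {
      def main(args: Array[String]): Unit = {
        // Loading configuration: SparkConf carries the master, app name, etc.
        val conf = new SparkConf().setMaster("local[2]").setAppName("WordCountSketch")
        // Creating the context wires up DAGScheduler, TaskScheduler, and SchedulerBackend.
        val sc = new SparkContext(conf)
        val counts = sc.textFile("input.txt")      // create an RDD
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        // collect() is an action: it launches a job through sc.runJob,
        // which submits it to the DAGScheduler.
        counts.collect().foreach(println)
        sc.stop()
      }
    }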
SparkContext is the main entry point to Spark: it represents the connection to a Spark cluster and is used to create RDDs, accumulators, and broadcast variables on that cluster. In general only one SparkContext may be active per JVM, so an existing SparkContext must be stopped before a new one is created (this restriction can now be relaxed via spark.driver.allowMultipleContexts).

Construction proceeds roughly as follows:

1. Initialize from the SparkConf settings and add further configuration. Creating a SparkContext requires a SparkConf object carrying the relevant parameters; the master, jars, etc. are then set on the SparkContext to complete initialization.
2. Create SparkEnv, which holds the runtime environment: serializer, Akka actor system, block manager, map output tracker, and so on.
3. Create ContextCleaner, which asynchronously cleans up RDD, shuffle, and broadcast state.
4. Create JobProgressListener, which tracks task information for display in the UI.
5. Create SparkStatusTracker, a low-level status-reporting API for monitoring jobs and stages.
6. Create ConsoleProgressBar, which renders stage progress on the console.
7. Create SparkUI, the top-level user interface of a Spark application.
8. Create the Hadoop Configuration, the entry point for Hadoop-related settings.
9. Register the heartbeatReceiver.
10. Create the dagScheduler, the high-level, stage-oriented scheduler.
11. Create the taskScheduler, the low-level task-scheduling interface, currently implemented by TaskSchedulerImpl.
12. Create the schedulerBackend, which works with TaskSchedulerImpl to handle backend scheduling.

The corresponding fields in SparkContext:

    private var _conf: SparkConf = _
    private var _eventLogDir: Option[URI] = None
    private var _eventLogCodec: Option[String] = None
    private var _env: SparkEnv = _
    private var _jobProgressListener: JobProgressListener = _
    private var _statusTracker: SparkStatusTracker = _
    private var _progressBar: Option[ConsoleProgressBar] = None
    private var _ui: Option[SparkUI] = None
    private var _hadoopConfiguration: Configuration = _
    private var _executorMemory: Int = _
    private var _schedulerBackend: SchedulerBackend = _
    private var _taskScheduler: TaskScheduler = _
    private var _heartbeatReceiver: RpcEndpointRef = _
    @volatile private var _dagScheduler: DAGScheduler = _
    private var _applicationId: String = _
    private var _applicationAttemptId: Option[String] = None
    private var _eventLogger: Option[EventLoggingListener] = None
    private var _executorAllocationManager: Option[ExecutorAllocationManager] = None
    private var _cleaner: Option[ContextCleaner] = None
    private var _listenerBusStarted: Boolean = false
    private var _jars: Seq[String] = _
    private var _files: Seq[String] = _
    private var _shutdownHookRef: AnyRef = _
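Several of the handles above (broadcast variables, accumulators, the status tracker, the one-context-per-JVM rule) can be exercised directly from user code. A short sketch, assuming a Spark 2.x API (on 1.x, sc.accumulator would be used instead of longAccumulator):

    import org.apache.spark.{SparkConf, SparkContext}

    object ContextHandlesSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local[*]").setAppName("ContextHandlesSketch"))

        // Broadcast variables and accumulators are created through the SparkContext.
        val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
        val hits = sc.longAccumulator("hits")

        sc.parallelize(Seq("a", "b", "c")).foreach { key =>
          if (lookup.value.contains(key)) hits.add(1)
        }
        println(s"hits = ${hits.value}") // prints: hits = 2

        // SparkStatusTracker: low-level monitoring of jobs and stages.
        println(s"active jobs: ${sc.statusTracker.getActiveJobIds().mkString(",")}")

        // Only one active SparkContext per JVM: stop it before creating another.
        sc.stop()
      }
    }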