The previous post covered the basics of RDDs; this one covers Initializing Spark.
- 官网地址 : http://spark.apache.org/docs/latest/rdd-programming-guide.html#resilient-distributed-datasets-rdds
Initializing Spark
The first thing a Spark program must do is to create a SparkContext object,
which tells Spark how to access a cluster.
To create a SparkContext you first need to build a SparkConf object that contains information about your application.
- A minimal creation demo:
val conf = new SparkConf().setAppName(appName).setMaster(master)
new SparkContext(conf)
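Putting those two lines into a complete driver might look like this (the app name "SimpleApp", the `local[2]` master, and the sample job are illustrative choices, not from the docs; running it requires the Spark dependency on the classpath):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Describe the application, then connect to the cluster.
    val conf = new SparkConf()
      .setAppName("SimpleApp")   // shown in the Spark web UI
      .setMaster("local[2]")     // run locally with 2 threads; for testing only
    val sc = new SparkContext(conf)

    // A trivial job to confirm the context works: 1 + 2 + ... + 10 = 55.
    val sum = sc.parallelize(1 to 10).reduce(_ + _)
    println(s"sum = $sum")

    sc.stop()                    // release resources when done
  }
}
```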
So what exactly is SparkConf? Let's take a look.
- SparkConf
Here are the key parts of the SparkConf source:
/**
 * Configuration for a Spark application.
 * Used to set various Spark parameters as key-value pairs.
 */
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Serializable {

  import SparkConf._

  /** Create a SparkConf that loads defaults from system properties and the classpath */
  def this() = this(true)

  private val settings = new ConcurrentHashMap[String, String]()

  /** Set a configuration variable. */
  def set(key: String, value: String): SparkConf = {
    set(key, value, false)
  }

  /**
   * The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
   * run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
   */
  def setMaster(master: String): SparkConf = {
    set("spark.master", master)
  }

  /** Set a name for your application. Shown in the Spark web UI. */
  def setAppName(name: String): SparkConf = {
    set("spark.app.name", name)
  }

  /** Set JAR files to distribute to the cluster. */
  def setJars(jars: Seq[String]): SparkConf = {
    for (jar <- jars if (jar == null)) logWarning("null jar passed to SparkContext constructor")
    set("spark.jars", jars.filter(_ != null).mkString(","))
  }

  // ... (remaining members omitted)
}

private[spark] object SparkConf extends Logging {
  // ... (companion object body omitted)
}
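Notice the pattern in the excerpt: `settings` is just a `ConcurrentHashMap[String, String]`, and every setter writes one key and returns the conf itself so calls can be chained. A minimal pure-Scala sketch of that builder pattern (the `MiniConf` class is hypothetical, for illustration only, not Spark code):

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical MiniConf: mimics SparkConf's chained key-value setters.
class MiniConf {
  private val settings = new ConcurrentHashMap[String, String]()

  def set(key: String, value: String): MiniConf = {
    settings.put(key, value)
    this                       // returning `this` is what makes chaining work
  }

  // Named setters are thin wrappers over well-known keys, as in SparkConf.
  def setMaster(master: String): MiniConf = set("spark.master", master)
  def setAppName(name: String): MiniConf = set("spark.app.name", name)

  def get(key: String): Option[String] = Option(settings.get(key))
}

val conf = new MiniConf().setAppName("demo").setMaster("local[2]")
```

This is why `new SparkConf().setAppName(appName).setMaster(master)` reads as one fluent expression: each setter mutates the map and hands the same object back.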
- SparkContext
Next, let's look at what SparkContext contains:
/**
* Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
* cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
*
* Only one SparkContext may be active per JVM. You must `stop()` the active SparkContext before
* creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.
*
* @param config a Spark Config object describing the application configuration. Any settings in
* this config overrides the default configs as well as system properties.
*/
class SparkContext(config: SparkConf) extends Logging {
The constructor mainly initializes a large amount of internal state:
private var _conf: SparkConf = _
private var _eventLogDir: Option[URI] = None
private var _eventLogCodec: Option[String] = None
private var _listenerBus: LiveListenerBus = _
private var _env: SparkEnv = _
private var _statusTracker: SparkStatusTracker = _
private var _progressBar: Option[ConsoleProgressBar] = None
private var _ui: Option[SparkUI] = None
private var _hadoopConfiguration: Configuration = _
private var _executorMemory: Int = _
private var _schedulerBackend: SchedulerBackend = _
private var _taskScheduler: TaskScheduler = _
private var _heartbeatReceiver: RpcEndpointRef = _
@volatile private var _dagScheduler: DAGScheduler = _
private var _applicationId: String = _
private var _applicationAttemptId: Option[String] = None
private var _eventLogger: Option[EventLoggingListener] = None
private var _executorAllocationManager: Option[ExecutorAllocationManager] = None
private var _cleaner: Option[ContextCleaner] = None
private var _listenerBusStarted: Boolean = false
private var _jars: Seq[String] = _
private var _files: Seq[String] = _
private var _shutdownHookRef: AnyRef = _
private var _statusStore: AppStatusStore = _
There is too much here to cover at once; later posts will walk through these members one by one.
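One constraint from the scaladoc above is worth calling out now: only one SparkContext may be active per JVM (SPARK-2243). A sketch of the required stop-then-recreate pattern (assuming a `conf` built as in the earlier demo):

```scala
import org.apache.spark.SparkContext

// Only one SparkContext may be active per JVM (SPARK-2243).
val sc = new SparkContext(conf)   // `conf` built as in the earlier demo
// ... run jobs with sc ...
sc.stop()                         // must stop the active context first
val sc2 = new SparkContext(conf)  // only now is a second context allowed
```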
In practice, when running on a cluster, you will not want to hardcode master in the program,
but rather launch the application with spark-submit and receive it there.
However, for local testing and unit tests,
you can pass "local" to run Spark in-process.
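Concretely, when the job is meant for spark-submit, leave `setMaster` out of the code entirely so the master can be supplied at launch time (the app name "MyApp" here is illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// No setMaster here: spark-submit supplies it via --master at launch time.
val conf = new SparkConf().setAppName("MyApp")
val sc = new SparkContext(conf)
```

At launch, something like `spark-submit --master spark://master:7077 --class MyApp my-app.jar` fills in the master (the host and jar name are placeholders).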