The previous post covered the basics of RDDs; this one covers Initializing Spark.
- 官网地址 : http://spark.apache.org/docs/latest/rdd-programming-guide.html#resilient-distributed-datasets-rdds
Initializing Spark
The first thing a Spark program must do is to create a SparkContext object,
which tells Spark how to access a cluster.
To create a SparkContext you first need to build a SparkConf object that contains information about your application.
- A minimal creation demo:
val conf = new SparkConf().setAppName(appName).setMaster(master)
new SparkContext(conf)
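Putting those two lines into a complete driver might look like this (the app name "SimpleApp", the `local[2]` master, and the sample job are illustrative choices, not from the docs; running it requires the Spark dependency on the classpath):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Describe the application, then connect to the cluster.
    val conf = new SparkConf()
      .setAppName("SimpleApp")   // shown in the Spark web UI
      .setMaster("local[2]")     // run locally with 2 threads; for testing only
    val sc = new SparkContext(conf)

    // A trivial job to confirm the context works: 1 + 2 + ... + 10 = 55.
    val sum = sc.parallelize(1 to 10).reduce(_ + _)
    println(s"sum = $sum")

    sc.stop()                    // release resources when done
  }
}
```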
So what exactly is SparkConf? Let's take a look.
- SparkConf
Here are the key parts of the SparkConf source:
/**
 * Configuration for a Spark application.
 * Used to set various Spark parameters as key-value pairs.
 */
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Serializable {

  import SparkConf._

  /** Create a SparkConf that loads defaults from system properties and the classpath */
  def this() = this(true)

  private val settings = new ConcurrentHashMap[String, String]()

  /** Set a configuration variable. */
  def set(key: String, value: String): SparkConf = {
    set(key, value, false)
  }

  /**
   * The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
   * run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
   */
  def setMaster(master: String): SparkConf = {
    set("spark.master", master)
  }

  /** Set a name for your application. Shown in the Spark web UI. */
  def setAppName(name: String): SparkConf = {
    set("spark.app.name", name)
  }

  /** Set JAR files to distribute to the cluster. */
  def setJars(jars: Seq[String]): SparkConf = {
    for (jar <- jars if (jar == null)) logWarning("null jar passed to SparkContext constructor")
    set("spark.jars", jars.filter(_ != null).mkString(","))
  }

  // ... (remaining members omitted)
}

private[spark] object SparkConf extends Logging {
  // ... (companion object body omitted)
}
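Notice the pattern in the excerpt: `settings` is just a `ConcurrentHashMap[String, String]`, and every setter writes one key and returns the conf itself so calls can be chained. A minimal pure-Scala sketch of that builder pattern (the `MiniConf` class is hypothetical, for illustration only, not Spark code):

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical MiniConf: mimics SparkConf's chained key-value setters.
class MiniConf {
  private val settings = new ConcurrentHashMap[String, String]()

  def set(key: String, value: String): MiniConf = {
    settings.put(key, value)
    this                       // returning `this` is what makes chaining work
  }

  // Named setters are thin wrappers over well-known keys, as in SparkConf.
  def setMaster(master: String): MiniConf = set("spark.master", master)
  def setAppName(name: String): MiniConf = set("spark.app.name", name)

  def get(key: String): Option[String] = Option(settings.get(key))
}

val conf = new MiniConf().setAppName("demo").setMaster("local[2]")
```

This is why `new SparkConf().setAppName(appName).setMaster(master)` reads as one fluent expression: each setter mutates the map and hands the same object back.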
- SparkContext
Next, let's look at what SparkContext contains:
/**
* Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
* cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
*
* Only one SparkContext may be active per JVM. You must `stop()` the active SparkContext before
* creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.
*
* @param config a Spark Config object describing the application configuration. Any settings in
* this config overrides the default configs as well as system properties.
*/
class SparkContext(config: SparkConf) extends Logging {
The constructor mainly initializes a large amount of internal state:
private var _conf: SparkConf = _
private var _eventLogDir: Option[URI] = None
private var _eventLogCodec: Option[String] = None
private var _listenerBus: LiveListenerBus = _
private var _env: SparkEnv = _
private var _statusTracker: SparkStatusTracker = _
private var _progressBar: Option[ConsoleProgressBar] = None
private var _ui: Option[SparkUI] = None
private var _hadoopConfiguration: Configuration = _
private var _executorMemory: Int = _
private var _schedulerBackend: SchedulerBackend = _
private var _taskScheduler: TaskScheduler = _
private var _heartbeatReceiver: RpcEndpointRef = _
@volatile private var _dagScheduler: DAGScheduler = _
private var _applicationId: String = _
private var _applicationAttemptId: Option[String] = None
private var _eventLogger: Option[EventLoggingListener] = None
private var _executorAllocationManager: Option[ExecutorAllocationManager] = None
private var _cleaner: Option[ContextCleaner] = None
private var _listenerBusStarted: Boolean = false
private var _jars: Seq[String] = _
private var _files: Seq[String] = _
private var _shutdownHookRef: AnyRef = _
private var _statusStore: AppStatusStore = _
There is too much here to cover at once; later posts will walk through these members one by one.
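One constraint from the scaladoc above is worth calling out now: only one SparkContext may be active per JVM (SPARK-2243). A sketch of the required stop-then-recreate pattern (assuming a `conf` built as in the earlier demo):

```scala
import org.apache.spark.SparkContext

// Only one SparkContext may be active per JVM (SPARK-2243).
val sc = new SparkContext(conf)   // `conf` built as in the earlier demo
// ... run jobs with sc ...
sc.stop()                         // must stop the active context first
val sc2 = new SparkContext(conf)  // only now is a second context allowed
```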
In practice, when running on a cluster, you will not want to hardcode master in the program,
but rather launch the application with spark-submit and receive it there.
However, for local testing and unit tests,
you can pass "local" to run Spark in-process.
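Concretely, when the job is meant for spark-submit, leave `setMaster` out of the code entirely so the master can be supplied at launch time (the app name "MyApp" here is illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// No setMaster here: spark-submit supplies it via --master at launch time.
val conf = new SparkConf().setAppName("MyApp")
val sc = new SparkContext(conf)
```

At launch, something like `spark-submit --master spark://master:7077 --class MyApp my-app.jar` fills in the master (the host and jar name are placeholders).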