Spark Source Code Analysis: The SparkContext Family (Part 1)

AlferWei

Categories: Spark, Spark column. Tags: spark, source code

Article link: https://blog.csdn.net/OiteBody/article/details/53541982

SparkContext

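SparkContext is the main entry point of a Spark application. It represents the connection to a Spark cluster and can be used to create RDDs, accumulators, and broadcast variables on that cluster. ==Only one SparkContext may be active per JVM; the active SparkContext must be stopped with stop() before a new one is created==.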

/**
 * Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
 * cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
 *
 * Only one SparkContext may be active per JVM.  You must `stop()` the active SparkContext before
 * creating a new one.  This limitation may eventually be removed; see SPARK-2243 for more details.
 *
 * @param config a Spark Config object describing the application configuration. Any settings in
 *   this config overrides the default configs as well as system properties.
 */
class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
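
To make the entry-point role and the one-active-context-per-JVM rule concrete, here is a minimal sketch. It assumes Spark is on the classpath and runs in local mode; the app names, the `local[2]` master, and the helpers `withLocalContext` and `scaledSum` are hypothetical names introduced only for illustration and are not part of Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextSketch {
  // Hypothetical helper: runs `body` with a fresh local SparkContext
  // and always stops that context afterwards.
  def withLocalContext[T](appName: String)(body: SparkContext => T): T = {
    val conf = new SparkConf().setAppName(appName).setMaster("local[2]")
    val sc = new SparkContext(conf)
    try body(sc) finally sc.stop()
  }

  // Uses the context to create a broadcast variable and an RDD.
  def scaledSum(sc: SparkContext): Int = {
    val factor = sc.broadcast(10)          // read-only value shared with tasks
    sc.parallelize(1 to 5)                 // RDD built from a local collection
      .map(_ * factor.value)
      .reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // The first context is stopped before the second one is created,
    // so at most one SparkContext is active in this JVM at any time.
    val first = withLocalContext("context-sketch-1")(scaledSum)
    val second = withLocalContext("context-sketch-2")(scaledSum)
    println(s"first = $first, second = $second")
  }
}
```

Wrapping the context in a helper that calls `stop()` in a `finally` block is one way to make sure a failed job does not leave an active context behind, which would otherwise block the creation of the next one.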

StreamingContext

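StreamingContext is the main entry point of a Spark Streaming application. It provides methods for creating DStreams from various input sources. It can be created by specifying a Spark master URL and an appName, from a SparkConf, ==or from an existing SparkContext==. The associated SparkContext can be accessed through context.sparkContext. After DStreams have been created and transformed, the streaming computation can be started with context.start() and stopped with context.stop().
context.awaitTermination() makes the current thread wait until the context is terminated, either by stop() or by an exception.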

/**
 * Main entry point for Spark Streaming functionality. It provides methods used to create
 * [[org.apache.spark.streaming.dstream.DStream]]s from various input sources. It can be either
 * created by providing a Spark master URL and an appName, or from a org.apache.spark.SparkConf
 * configuration (see core Spark documentation), or from an existing org.apache.spark.SparkContext.
 * The associated SparkContext can be accessed using `context.sparkContext`. After
 * creating and transforming DStreams, the streaming computation can be started and stopped
 * using `context.start()` and `context.stop()`, respectively.
 * `context.awaitTermination()` allows the current thread to wait for the termination
 * of the context by `stop()` or by an exception.
 */
class StreamingContext private[streaming] (
    sc_ : SparkContext,
    cp_ : Checkpoint,
    batchDur_ : Duration
  ) extends Logging {
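
The lifecycle described above (create, define DStreams, start, wait, stop) can be sketched as follows. This is a minimal illustration that assumes Spark Streaming is on the classpath and uses local mode; the app name, the one-second batch interval, the three-second timeout, and the helper `fromExisting` are assumptions made for the example. A queue of RDDs stands in for a real input source so that the sketch needs no external service.

```scala
import scala.collection.mutable
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingContextSketch {
  // Hypothetical helper: builds a StreamingContext on top of an existing
  // SparkContext with a one-second batch interval.
  def fromExisting(sc: SparkContext): StreamingContext =
    new StreamingContext(sc, Seconds(1))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // The other two creation paths mentioned in the text would look like:
    //   new StreamingContext("local[2]", "streaming-sketch", Seconds(1))
    //   new StreamingContext(conf, Seconds(1))
    val ssc = fromExisting(sc)

    // Input DStream fed from an in-memory queue of RDDs.
    val queue = mutable.Queue[RDD[Int]](sc.parallelize(1 to 3), sc.parallelize(4 to 6))
    // At least one output operation (here print) must be registered before start().
    ssc.queueStream(queue).map(_ * 2).print()

    ssc.start()
    // Bounded variant of awaitTermination(), so the sketch ends on its own.
    ssc.awaitTerminationOrTimeout(3000L)
    // Also stops the underlying SparkContext.
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

A long-running job would normally call `awaitTermination()` instead of the bounded variant and rely on `stop()` or an exception to end the wait.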

SQLContext

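SQLContext is the main entry point for working with ==structured data== (rows and columns) in Spark. It can be used to create DataFrame objects and to execute SQL queries.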

/**
 * The entry point for working with structured data (rows and columns) in Spark.  Allows the
 * creation of [[DataFrame]] objects as well as the execution of SQL queries.
 *
 * @groupname basic Basic Operations
 * @groupname ddl_ops Persistent Catalog DDL
 * @groupname cachemgmt Cached Table Management
 * @groupname genericdata Generic Data Sources
 * @groupname specificdata Specific Data Sources
 * @groupname config Configuration
 * @groupname dataframes Custom DataFrame Creation
 * @groupname Ungrouped Support functions for language integrated queries
 *
 * @since 1.0.0
 */
class SQLContext private[sql](
    @transient val sparkContext: SparkContext,
    @transient protected[sql] val cacheManager: CacheManager,
    @transient private[sql] val listener: SQLListener,
    val isRootContext: Boolean)
  extends org.apache.spark.Logging with Serializable {
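
As a small illustration of creating a DataFrame and running a SQL query through SQLContext, here is a sketch written against the Spark 1.x-era API that the excerpt above comes from (later versions steer users toward SparkSession, and `registerTempTable` is deprecated there). The `Person` case class, the table name `people`, and the helper `namesOlderThan` are hypothetical and exist only for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type; defined at top level so Spark can derive its schema.
case class Person(name: String, age: Int)

object SQLContextSketch {
  // Builds a DataFrame from local data, registers it as a temporary table,
  // and returns the names selected by a SQL query.
  def namesOlderThan(sqlContext: SQLContext, minAge: Int): Seq[String] = {
    val people = sqlContext.createDataFrame(
      Seq(Person("Ann", 34), Person("Bob", 27), Person("Cid", 41)))
    people.registerTempTable("people")
    sqlContext
      .sql(s"SELECT name FROM people WHERE age > $minAge ORDER BY name")
      .collect()
      .map(_.getString(0))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // SQLContext wraps an existing SparkContext.
      val sqlContext = new SQLContext(sc)
      println(namesOlderThan(sqlContext, 30).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```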
