0028. SparkContext Source Code Analysis

1. SparkContext: the Gateway to Spark
(1) A running Spark program is divided into two parts: the Driver and the Executors.
(2) Spark programs are written against SparkContext, in two senses:
a) Spark's core abstraction is the RDD, which is initially created by SparkContext (the first RDD is always created by SparkContext).
b) Spark's scheduling and optimization of the program are also based on SparkContext.
(3) A Spark program is registered with the cluster through an object produced when SparkContext is instantiated (concretely, it is the SchedulerBackend that registers the program).
(4) At runtime a Spark program obtains its concrete computing resources from the Cluster Manager, and these resources are likewise requested through an object produced by SparkContext (concretely, it is the SchedulerBackend that acquires them).
(5) When SparkContext crashes or is stopped, the whole Spark program ends.
     Summary:
(a) A Spark program is published to the Spark cluster through SparkContext;
(b) A Spark program runs entirely under the direction of the scheduler built around SparkContext;
(c) When SparkContext crashes or is stopped, the whole Spark program ends (a minimal lifecycle sketch follows).
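
The sketch below illustrates points (2)a and (5): the first RDD comes from SparkContext, later RDDs come from transformations, and stopping SparkContext ends the program. This is a hypothetical minimal example (object and variable names are invented for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object FirstRDDDemo {
    def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("FirstRDDDemo").setMaster("local")
        val sc = new SparkContext(conf)      // instantiation registers the program and requests resources

        // The first RDD in any lineage is created by SparkContext itself:
        val firstRDD = sc.parallelize(1 to 10)
        // Every later RDD is derived via a transformation, not via SparkContext:
        val doubled = firstRDD.map(_ * 2)
        println(doubled.collect().mkString(","))

        sc.stop()                            // stopping SparkContext ends the whole program
    }
}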

2. A Look at SparkContext Usage
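
Beyond creating RDDs, SparkContext is also the factory for shared variables such as broadcast variables and accumulators. A small hypothetical sketch of typical Spark 1.x usage (all names invented for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object SparkContextUsage {
    def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("Usage").setMaster("local[2]"))

        // Creating RDDs is SparkContext's job:
        val words = sc.parallelize(Seq("spark", "context", "spark"))

        // Shared variables are also created through SparkContext:
        val factor = sc.broadcast(2)        // read-only broadcast variable
        val counter = sc.accumulator(0)     // accumulator (Spark 1.x API)

        words.foreach(_ => counter += 1)
        println("records=" + counter.value + ", factor=" + factor.value)

        sc.stop()
    }
}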

3. The three top-level core objects built by SparkContext: DAGScheduler, TaskScheduler, and SchedulerBackend
(1) DAGScheduler is the high-level, Stage-oriented scheduler for each Job.
(2) TaskScheduler is an interface; it has a different implementation for each Cluster Manager, and in Standalone mode the concrete implementation is TaskSchedulerImpl.
(3) SchedulerBackend is likewise an interface with one implementation per Cluster Manager; in Standalone mode the concrete implementation is SparkDeploySchedulerBackend. (How SparkContext wires the three together is sketched below.)
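
The following is a condensed, simplified paraphrase of the Spark 1.6 sources (SparkContext.createTaskScheduler and the surrounding constructor code), not a verbatim excerpt:

// createTaskScheduler pattern-matches on the master URL; for "spark://host:port"
// (Standalone mode) it builds a TaskSchedulerImpl plus a SparkDeploySchedulerBackend:
val (schedulerBackend, taskScheduler) = master match {
    case SPARK_REGEX(sparkUrl) =>
        val scheduler = new TaskSchedulerImpl(sc)
        val masterUrls = sparkUrl.split(",").map("spark://" + _)
        val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
        scheduler.initialize(backend)    // TaskSchedulerImpl holds a reference to its backend
        (backend, scheduler)
    // ... other cases handle local, YARN, Mesos, etc. ...
}

// Back in the SparkContext constructor:
_dagScheduler = new DAGScheduler(this)   // DAGScheduler is built directly on SparkContext
_taskScheduler.start()                   // starting the TaskScheduler starts the backend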

4. Viewed from the running of the whole program, SparkContext has four core objects: DAGScheduler, TaskScheduler, SchedulerBackend, and MapOutputTrackerMaster (see the paraphrase below for the last one).
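
MapOutputTrackerMaster is created inside SparkEnv during SparkContext's instantiation on the Driver (compare the "Registering MapOutputTracker" line in the log of section 6). A simplified paraphrase of SparkEnv.create in Spark 1.6:

val mapOutputTracker = if (isDriver) {
    new MapOutputTrackerMaster(conf)    // driver side: records where each map output lives
} else {
    new MapOutputTrackerWorker(conf)    // executor side: fetches shuffle output locations from the master tracker
}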

5. SparkDeploySchedulerBackend has three core functions:
(1) It connects to the Master and registers the current application;
(2) It receives the registrations of the Executors that the cluster allocates to the current application, and manages those Executors;
(3) It sends Tasks to concrete Executors for execution.
Note: SparkDeploySchedulerBackend is managed by TaskSchedulerImpl, and it is taskScheduler.start() that starts the backend. How the backend registers the application is sketched below.
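
The registration in (1) happens in SparkDeploySchedulerBackend.start(): the backend builds a description of the application, including the command each Worker will use to launch an Executor (CoarseGrainedExecutorBackend), and hands that description to an AppClient, which registers it with the Master. A condensed, simplified paraphrase of the Spark 1.6 code (parameter lists abbreviated):

val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries, libraryPathEntries, javaOpts)
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
    command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
client.start()    // sends RegisterApplication to the Master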

6. A runnable example for observing SparkContext's instantiation
import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
    def main(args: Array[String]) {
        // The input file can be a local Linux file or come from another
        // source, such as HDFS
        if (args.length == 0) {
            System.err.println("Usage: SparkWordCount <inputfile>")
            System.exit(1)
        }
        // Run with local threads; the number of threads can be specified,
        // e.g. .setMaster("local[2]") for two threads.
        // Here a single thread is used.
        val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local")
        val sc = new SparkContext(conf)

        // Count the lines of the file that contain "hello"
        // (despite the object's name, this is a line count, not a word count)
        val count = sc.textFile(args(0)).filter(line => line.contains("hello")).count()
        // Print the result
        println("count=" + count)
        sc.stop()
    }
}
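
For reference, instead of running from the IDE as in the log below, the program can also be packaged and launched with spark-submit; a hypothetical invocation (the jar name is a placeholder) might look like:

spark-submit --class SparkWordCount --master local sparkwordcount.jar /hadoop/mr/wordcount.txt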

Run log (input file: /hadoop/mr/wordcount.txt):
/usr/local/jdk1.7.0_72/bin/java -Didea.launcher.port=7532 -Didea.launcher.bin.path=/usr/local/idea-IC-141.1532.4/bin -Dfile.encoding=UTF-8 -classpath /usr/local/jdk1.7.0_72/jre/lib/management-agent.jar:/usr/local/jdk1.7.0_72/jre/lib/jsse.jar:/usr/local/jdk1.7.0_72/jre/lib/deploy.jar:/usr/local/jdk1.7.0_72/jre/lib/jfxrt.jar:/usr/local/jdk1.7.0_72/jre/lib/resources.jar:/usr/local/jdk1.7.0_72/jre/lib/jce.jar:/usr/local/jdk1.7.0_72/jre/lib/javaws.jar:/usr/local/jdk1.7.0_72/jre/lib/jfr.jar:/usr/local/jdk1.7.0_72/jre/lib/charsets.jar:/usr/local/jdk1.7.0_72/jre/lib/rt.jar:/usr/local/jdk1.7.0_72/jre/lib/plugin.jar:/usr/local/jdk1.7.0_72/jre/lib/ext/sunpkcs11.jar:/usr/local/jdk1.7.0_72/jre/lib/ext/dnsns.jar:/usr/local/jdk1.7.0_72/jre/lib/ext/sunec.jar:/usr/local/jdk1.7.0_72/jre/lib/ext/sunjce_provider.jar:/usr/local/jdk1.7.0_72/jre/lib/ext/zipfs.jar:/usr/local/jdk1.7.0_72/jre/lib/ext/localedata.jar:/root/IdeaProjects/test/out/production/test:/hadoop/scala/lib/scala-actors.jar:/hadoop/scala/lib/scala-swing.jar:/hadoop/scala/lib/scala-library.jar:/hadoop/scala/lib/scala-actors-migration.jar:/hadoop/scala/lib/scala-reflect.jar:/hadoop/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/usr/local/idea-IC-141.1532.4/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain SparkWordCount /hadoop/mr/wordcount.txt
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/19 00:38:54 INFO SparkContext: Running Spark version 1.6.0
16/03/19 00:38:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/19 00:38:58 INFO SecurityManager: Changing view acls to: root
16/03/19 00:38:58 INFO SecurityManager: Changing modify acls to: root
16/03/19 00:38:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/03/19 00:38:59 INFO Utils: Successfully started service 'sparkDriver' on port 43290.
16/03/19 00:39:00 INFO Slf4jLogger: Slf4jLogger started
16/03/19 00:39:00 INFO Remoting: Starting remoting
16/03/19 00:39:01 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.6.135:57495]
16/03/19 00:39:01 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 57495.
16/03/19 00:39:01 INFO SparkEnv: Registering MapOutputTracker
16/03/19 00:39:01 INFO SparkEnv: Registering BlockManagerMaster
16/03/19 00:39:02 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-a99dec66-4d57-4213-bd3a-55cd4f7af0ac
16/03/19 00:39:02 INFO MemoryStore: MemoryStore started with capacity 1200.4 MB
16/03/19 00:39:02 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/19 00:39:03 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/19 00:39:03 INFO SparkUI: Started SparkUI at http://192.168.6.135:4040
16/03/19 00:39:04 INFO Executor: Starting executor ID driver on host localhost
16/03/19 00:39:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51867.
16/03/19 00:39:04 INFO NettyBlockTransferService: Server created on 51867
16/03/19 00:39:04 INFO BlockManagerMaster: Trying to register BlockManager
16/03/19 00:39:04 INFO BlockManagerMasterEndpoint: Registering block manager localhost:51867 with 1200.4 MB RAM, BlockManagerId(driver, localhost, 51867)
16/03/19 00:39:04 INFO BlockManagerMaster: Registered BlockManager
16/03/19 00:39:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 153.6 KB)
16/03/19 00:39:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 167.5 KB)
16/03/19 00:39:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:51867 (size: 13.9 KB, free: 1200.4 MB)
16/03/19 00:39:06 INFO SparkContext: Created broadcast 0 from textFile at SparkWordCount.scala:21
16/03/19 00:39:07 INFO FileInputFormat: Total input paths to process : 1
16/03/19 00:39:07 INFO SparkContext: Starting job: count at SparkWordCount.scala:21
16/03/19 00:39:07 INFO DAGScheduler: Got job 0 (count at SparkWordCount.scala:21) with 1 output partitions
16/03/19 00:39:07 INFO DAGScheduler: Final stage: ResultStage 0 (count at SparkWordCount.scala:21)
16/03/19 00:39:07 INFO DAGScheduler: Parents of final stage: List()
16/03/19 00:39:07 INFO DAGScheduler: Missing parents: List()
16/03/19 00:39:07 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at filter at SparkWordCount.scala:21), which has no missing parents
16/03/19 00:39:07 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 170.6 KB)
16/03/19 00:39:07 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1858.0 B, free 172.4 KB)
16/03/1… [log truncated]
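
Reading the log against the earlier sections: the SparkEnv lines ("Registering MapOutputTracker", "Registering BlockManagerMaster") correspond to the core objects of section 4; "Starting executor ID driver on host localhost" shows that in local mode the Driver process itself hosts the Executor; and the DAGScheduler lines ("Got job 0", "Final stage: ResultStage 0", "Submitting ResultStage 0") show the Stage-oriented high-level scheduler of section 3 turning the count action into a single ResultStage.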