This section walks through the logic of SparkContext. Let's start with the simplest example that ships with Spark:

```scala
import scala.math.random
import org.apache.spark._

object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
```
Starting from this small program, we can analyze the internals step by step. SparkContext is where the real essence of Spark lives; the startup flow of the master and workers covered earlier does not differ much from that of a typical distributed system.
First, a SparkConf is created to load Spark's configuration.
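As a quick aside (not part of SparkPi itself), a SparkConf can also be populated programmatically instead of through spark-submit. A minimal sketch; the master URL and memory value below are placeholders, not recommendations:

```scala
import org.apache.spark.SparkConf

// Illustrative only: set the app name, a local master and one tuning option by hand.
val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("local[2]")                // run locally with 2 threads
  .set("spark.executor.memory", "1g")   // example executor memory setting

println(conf.get("spark.app.name"))     // -> "Spark Pi"
```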
Then the SparkContext is created. At construction time you may optionally pass preferredNodeLocationData (preferred node locations for data); it is not required.
Creating a SparkContext is a fairly involved process, so we will only cover the more important objects and methods.
1. listenerBus: any number of SparkListener listeners can be registered on it; whenever a SparkListenerEvent arrives, the bus forwards the event to every registered listener. A _jobProgressListener is created up front because SparkEnv needs some of the state it tracks:

```scala
private[spark] val listenerBus = new LiveListenerBus

// "_jobProgressListener" should be set up before creating SparkEnv because when creating
// "SparkEnv", some messages will be posted to "listenerBus" and we should not miss them.
_jobProgressListener = new JobProgressListener(_conf)
listenerBus.addListener(jobProgressListener)
```
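To see the listener bus from the user side, here is a small sketch of registering a custom listener (addSparkListener is a developer API; the listener class name here is invented for illustration):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Illustrative listener: logs job boundaries as the bus delivers events to it.
class JobLoggingListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"Job ${jobStart.jobId} started with ${jobStart.stageInfos.size} stages")
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    println(s"Job ${jobEnd.jobId} finished: ${jobEnd.jobResult}")
}

sc.addSparkListener(new JobLoggingListener) // events now flow to our listener too
```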
2. persistentRdds: caches the persisted RDDs, keyed by RDD id:

```scala
// Keeps track of all persisted RDDs
private[spark] val persistentRdds = new TimeStampedWeakValueHashMap[Int, RDD[_]]
```

3. Create the SparkEnv, which on the driver side goes through createDriverEnv:

```scala
// Create the Spark execution environment (cache, map output tracker, etc)
_env = createSparkEnv(_conf, isLocal, listenerBus)
```
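From the user's perspective, an RDD ends up in persistentRdds once it is persisted. A minimal sketch (getPersistentRDDs is a developer API):

```scala
import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 1000).map(_ * 2)
rdd.persist(StorageLevel.MEMORY_ONLY) // registers the RDD in persistentRdds
rdd.count()                           // the first action actually materializes the cache

println(sc.getPersistentRDDs.keys)    // the RDD's id now appears here
rdd.unpersist()                       // and is removed again
```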
This is a fairly involved process, and reading the code here reveals a lot about how Spark is designed. The flow:

1) Create the RPC environment (rpcEnv) that wraps the driver's endpoint references (ActorRefs):

```scala
val rpcEnv = RpcEnv.create(actorSystemName, hostname, port, conf, securityManager,
  clientMode = !isDriver) // netty by default
```

2) Create the mapOutputTracker, concretely a MapOutputTrackerMaster on the driver, which tracks where map outputs live, and register it with a MapOutputTrackerMasterEndpoint. A note on what this registration does: it returns mapOutputTracker.trackerEndpoint (an endpoint reference of ActorRef type); messages sent to that reference later call back into the tracker. For example, sending an AkkaMessage invokes the MapOutputTrackerMasterEndpoint's receiveAndReply or receive method.

```scala
def registerOrLookupEndpoint(
    name: String, endpointCreator: => RpcEndpoint): RpcEndpointRef = {
  if (isDriver) {
    logInfo("Registering " + name)
    rpcEnv.setupEndpoint(name, endpointCreator)
  } else {
    RpcUtils.makeDriverRef(name, conf, rpcEnv)
  }
}

val mapOutputTracker = if (isDriver) {
  new MapOutputTrackerMaster(conf)
} else {
  new MapOutputTrackerWorker(conf)
}

// Have to assign trackerActor after initialization as MapOutputTrackerActor
// requires the MapOutputTracker itself
mapOutputTracker.trackerEndpoint = registerOrLookupEndpoint(MapOutputTracker.ENDPOINT_NAME,
  new MapOutputTrackerMasterEndpoint(
    rpcEnv, mapOutputTracker.asInstanceOf[MapOutputTrackerMaster], conf))
```
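The register-or-lookup pattern above is worth internalizing: the driver hosts the real endpoint, while executors merely obtain a reference to it. A toy, Spark-free sketch of the same idea (all names here are invented for illustration):

```scala
// Toy model of the driver-registers / executor-looks-up pattern.
trait Endpoint { def receive(msg: Any): Any }

class Registry {
  private val endpoints = scala.collection.mutable.Map[String, Endpoint]()
  def register(name: String, e: Endpoint): Endpoint = { endpoints(name) = e; e }
  def lookup(name: String): Endpoint = endpoints(name)
}

def registerOrLookup(registry: Registry, isDriver: Boolean,
                     name: String, create: => Endpoint): Endpoint =
  if (isDriver) registry.register(name, create) // driver hosts the real endpoint
  else registry.lookup(name)                    // executors just get a handle to it
```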
3) Create the shuffleManager. The default is the sort-based org.apache.spark.shuffle.sort.SortShuffleManager (the hash-based org.apache.spark.shuffle.hash.HashShuffleManager was the default in older releases), as the default value of spark.shuffle.manager below shows:

```scala
val shortShuffleMgrNames = Map(
  "hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
  "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager",
  "tungsten-sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
val shuffleMgrName = conf.get("spark.shuffle.manager", "sort")
val shuffleMgrClass = shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase, shuffleMgrName)
val shuffleManager = instantiateClass[ShuffleManager](shuffleMgrClass)
```

4) Create the MemoryManager:

```scala
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)
  } else {
    UnifiedMemoryManager(conf, numUsableCores)
  }
```

5) Create the blockTransferService (netty by default), the service used to fetch blocks during shuffle:

```scala
val blockTransferService = new NettyBlockTransferService(conf, securityManager, numUsableCores)
```

6) Create the blockManagerMaster, which keeps track of which worker each block (BlockId) is stored on:

```scala
val blockManagerMaster = new BlockManagerMaster(registerOrLookupEndpoint(
  BlockManagerMaster.DRIVER_ENDPOINT_NAME,
  new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
  conf, isDriver)
```

7) Create the blockManager, which provides the actual read/write interface for blocks:

```scala
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
  serializer, conf, memoryManager, mapOutputTracker, shuffleManager,
  blockTransferService, securityManager, numUsableCores)
```
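Both the shuffle manager and the memory manager are user-selectable through configuration. A hedged sketch (the values are examples, not recommendations):

```scala
import org.apache.spark.SparkConf

// Illustrative settings only: pick the sort-based shuffle explicitly and
// opt back into the pre-unified static memory model.
val conf = new SparkConf()
  .set("spark.shuffle.manager", "sort")        // "hash", "sort" or "tungsten-sort"
  .set("spark.memory.useLegacyMode", "true")   // use StaticMemoryManager
  .set("spark.storage.memoryFraction", "0.5")  // only honored in legacy mode
```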
8) Create the broadcastManager:

```scala
val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)
```
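At the API level, the broadcastManager is what backs SparkContext.broadcast. A minimal usage sketch, assuming an existing SparkContext sc:

```scala
// Broadcast a read-only lookup table once per executor instead of
// shipping it inside every task's closure.
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

val total = sc.parallelize(Seq("a", "b", "a"))
  .map(k => lookup.value.getOrElse(k, 0)) // .value reads the broadcast copy
  .reduce(_ + _)
```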
9) Create the cacheManager, which sits on top of the blockManager: when an RDD is computed, data is fetched through the CacheManager, and computed results are stored through it as well:

```scala
val cacheManager = new CacheManager(blockManager)
```

10) Create the metricsSystem object:

```scala
val metricsSystem = if (isDriver) {
  // Don't start metrics system right now for Driver.
  // We need to wait for the task scheduler to give us an app ID.
  // Then we can start the metrics system.
  MetricsSystem.createMetricsSystem("driver", conf, securityManager)
} else {
  // We need to set the executor ID before the MetricsSystem is created because sources and
  // sinks specified in the metrics configuration file will want to incorporate this executor's
  // ID into the metrics they report.
  conf.set("spark.executor.id", executorId)
  val ms = MetricsSystem.createMetricsSystem("executor", conf, securityManager)
  ms.start()
  ms
}
```

11) Create the httpFileServer. Both the driver and the executors may depend on third-party jars at runtime. The driver side is simple: spark-submit specifies at submission time where the required jars are read from. Executors, however, are launched by workers, and a worker must download the jars an executor needs before starting it. To solve this, the driver starts an HttpFileServer that hosts the third-party jars, and the workers fetch them from it.

12) Create the outputCommitCoordinator:

```scala
val outputCommitCoordinator = mockOutputCommitCoordinator.getOrElse {
  new OutputCommitCoordinator(conf, isDriver)
}
```

Finally, all of the objects above are wrapped together into a SparkEnv:

```scala
val envInstance = new SparkEnv(executorId, rpcEnv, actorSystem, serializer,
  closureSerializer, cacheManager, mapOutputTracker, shuffleManager,
  broadcastManager, blockTransferService, blockManager, securityManager,
  sparkFilesDir, metricsSystem, memoryManager, outputCommitCoordinator, conf)

// Add a reference to tmp dir created by driver, we will delete this tmp dir when stop() is
// called, and we only need to do it for driver. Because driver may run as a service, and if we
// don't delete this tmp dir when sc is stopped, then will create too many tmp dirs.
if (isDriver) {
  envInstance.driverTmpDirToDelete = Some(sparkFilesDir)
}

envInstance
```

4. Create _metadataCleaner, which periodically cleans up metadata (such as the persistentRdds map above); a toy sketch of the idea follows.
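A Spark-free sketch of the idea behind TimeStampedWeakValueHashMap plus a periodic cleaner: values are held weakly and timestamped, and a timer drops entries that are stale or already garbage-collected. All names here are invented for illustration:

```scala
import java.lang.ref.WeakReference
import java.util.{Timer, TimerTask}
import scala.collection.concurrent.TrieMap

// Toy metadata map: weak values plus an insertion timestamp per key.
class TimestampedWeakMap[K, V <: AnyRef] {
  private val entries = TrieMap[K, (Long, WeakReference[V])]()

  def put(k: K, v: V): Unit =
    entries.put(k, (System.currentTimeMillis(), new WeakReference(v)))
  def get(k: K): Option[V] = entries.get(k).flatMap { case (_, ref) => Option(ref.get) }

  // Drop entries older than ttlMs or whose value has been garbage-collected.
  def clearOldEntries(ttlMs: Long): Unit = {
    val cutoff = System.currentTimeMillis() - ttlMs
    val stale = entries.collect { case (k, (ts, ref)) if ts < cutoff || ref.get == null => k }
    stale.foreach(entries.remove)
  }
}

// Periodic cleaner in the spirit of MetadataCleaner: one daemon timer per map.
def startCleaner(map: TimestampedWeakMap[_, _ <: AnyRef], ttlMs: Long): Timer = {
  val timer = new Timer("metadata-cleaner", /* isDaemon = */ true)
  timer.schedule(new TimerTask { def run(): Unit = map.clearOldEntries(ttlMs) }, ttlMs, ttlMs)
  timer
}
```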
5. Create _statusTracker (a SparkStatusTracker), used to monitor job and stage status:

```scala
_statusTracker = new SparkStatusTracker(this)
```

6. Create executorEnvs, the executor-related configuration (the environment handed to executors).

7. Create _heartbeatReceiver, which receives executor heartbeats and also runs a timer that checks whether executors have expired.

8. Call createTaskScheduler to create _taskScheduler and _schedulerBackend:

1) The logic branches on the master URL; we take standalone mode (URLs starting with spark://) as our example.

2) The taskScheduler actually created is a TaskSchedulerImpl, and the backend is a SparkDeploySchedulerBackend, which itself extends CoarseGrainedSchedulerBackend. CoarseGrainedSchedulerBackend is a coarse-grained resource scheduling class built on the actor model (Akka-style RPC). For the lifetime of a Spark job it listens for and holds on to the executor resources registered with it, handles executor registration and status updates, responds to scheduler requests, and launches task scheduling based on the executor resources currently available. In short, the scheduler and the backend cooperate, dividing the work between them, to carry out the whole task-scheduling flow.
3) Scheduler initialization. The role of Pool deserves a note here: a single SparkContext may simultaneously have several runnable task sets with no dependencies between them, and how those task sets are scheduled against one another is decided by the pool. The default is FIFO; a Fair scheduler is also available (see the configuration sketch below).
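A hedged sketch of switching to fair scheduling and assigning jobs to a named pool ("production" is an invented pool name; it would normally be defined in fairscheduler.xml):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Enable the fair scheduler instead of the default FIFO ordering.
val conf = new SparkConf()
  .setAppName("fair-demo")
  .set("spark.scheduler.mode", "FAIR")

val sc = new SparkContext(conf)

// Jobs submitted from this thread now go to the "production" pool.
sc.setLocalProperty("spark.scheduler.pool", "production")
sc.parallelize(1 to 100).count()

// Reset back to the default pool.
sc.setLocalProperty("spark.scheduler.pool", null)
```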
9. Create _dagScheduler, which splits our program into stages and builds task sets with dependencies between them. Internally the DAGScheduler starts an event loop that polls and handles incoming events (a small two-stage illustration follows below).

10. Call _taskScheduler.start() -> backend.start(). This creates the driverEndpoint used for interaction with the outside world, and assembles everything needed to run executors: the app name, the cores, memory, classpath, jars and arguments each executor needs, with org.apache.spark.executor.CoarseGrainedExecutorBackend as the class to run, all wrapped into an ApplicationDescription. The ApplicationDescription, together with the master URLs, is then wrapped into an AppClient, which serves as the app's entry point for submitting itself to the masters.
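Before diving into the two start() implementations, here is the stage-splitting illustration promised in point 9: a shuffle (reduceByKey) forces a stage boundary, which toDebugString makes visible. Assuming an existing SparkContext sc:

```scala
// One shuffle dependency -> two stages: the map side and the reduce side.
val words = sc.parallelize(Seq("a b", "b c", "a a"))
val counts = words
  .flatMap(_.split(" "))
  .map(w => (w, 1))
  .reduceByKey(_ + _)   // shuffle boundary: the DAGScheduler cuts a stage here

println(counts.toDebugString) // prints the lineage, indented by stage
counts.collect()              // submitting the job runs stage 0, then stage 1
```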
TaskSchedulerImpl.start() first:

```scala
override def start() {
  backend.start()

  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    speculationScheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
        checkSpeculatableTasks()
      }
    }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
  }
}
```
Then SparkDeploySchedulerBackend.start():

```scala
override def start() {
  super.start()
  launcherBackend.connect()

  // The endpoint for executors to talk to us
  val driverUrl = rpcEnv.uriOf(SparkEnv.driverActorSystemName,
    RpcAddress(sc.conf.get("spark.driver.host"), sc.conf.get("spark.driver.port").toInt),
    CoarseGrainedSchedulerBackend.ENDPOINT_NAME)
  val args = Seq(
    "--driver-url", driverUrl,
    "--executor-id", "{{EXECUTOR_ID}}",
    "--hostname", "{{HOSTNAME}}",
    "--cores", "{{CORES}}",
    "--app-id", "{{APP_ID}}",
    "--worker-url", "{{WORKER_URL}}")
  val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
    .map(Utils.splitCommandString).getOrElse(Seq.empty)
  val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
    .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
  val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
    .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)

  // When testing, expose the parent class path to the child. This is processed by
  // compute-classpath.{cmd,sh} and makes all needed jars available to child processes
  // when the assembly is built with the "*-provided" profiles enabled.
  val testingClassPath =
    if (sys.props.contains("spark.testing")) {
      sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
    } else {
      Nil
    }

  // Start executors with a few necessary configs for registering with the scheduler
  val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
  val javaOpts = sparkJavaOpts ++ extraJavaOpts
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
  val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
  val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
  val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
    command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
  client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start()
  launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
  waitForRegistration()
  launcherBackend.setState(SparkAppHandle.State.RUNNING)
}
```
As you can see, in the end this simply sends a RegisterApplication message to each master's endpoint reference (ActorRef):

```scala
/**
 * Register with all masters asynchronously and returns an array `Future`s for cancellation.
 */
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
  for (masterAddress <- masterRpcAddresses) yield {
    registerMasterThreadPool.submit(new Runnable {
      override def run(): Unit = try {
        if (registered.get) {
          return
        }
        logInfo("Connecting to master " + masterAddress.toSparkURL + "...")
        val masterRef = rpcEnv.setupEndpointRef(Master.SYSTEM_NAME, masterAddress,
          Master.ENDPOINT_NAME)
        masterRef.send(RegisterApplication(appDescription, self))
      } catch {
        case ie: InterruptedException => // Cancelled
        case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
      }
    })
  }
}
```
How does the master handle this message when it arrives? For that, see the walkthrough of the Master source code.