A simple example
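import org.apache.spark.{SparkConf, SparkContext}

object sum {
  def main(args: Array[String]): Unit = {
    // Both the application name and the master URL must be set
    val conf = new SparkConf().setAppName("SUM")
    conf.setMaster("local[3]")
    val sc = new SparkContext(conf)
    val data = sc.parallelize(1 to 10000)
    val d = data.reduce(sum(_, _))
    println(d)
    sc.stop()
  }

  // Adds two values; the one-second sleep per call makes the job run slowly
  def sum(x: Int, y: Int): Int = {
    val s = x + y
    Thread.sleep(1000)
    s
  }
}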
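The Spark Driver submits the user program; in practice it can be regarded as the Spark client.
Initialization of the Spark Driver revolves around the initialization of SparkContext, which can fairly be called the engine of a Spark application. Tasks can only be submitted to the Spark cluster after SparkContext has finished initializing.
SparkConf instantiation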
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
import SparkConf._
/** Create a SparkConf that loads defaults from system properties and the classpath */
def this() = this(true)
private val settings = new ConcurrentHashMap[String, String]()
if (loadDefaults) {
// Load any spark.* system properties
for ((key, value) <- Utils.getSystemProperties if key.startsWith("spark.")) {
set(key, value)
}
}
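Instantiating SparkConf as conf keeps every JVM system property whose key starts with "spark.".
The application name and the master are required. They are set with conf.setAppName() and conf.setMaster(), or alternatively through command-line arguments.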
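The filtering step above can be sketched without Spark on the classpath. MiniConf below is a hypothetical stand-in for SparkConf (the class name and its reduced method set are assumptions made for illustration). It only shows how keys starting with "spark." are picked out of the JVM system properties, and how setMaster and setAppName end up as the ordinary keys spark.master and spark.app.name.

```scala
import java.util.concurrent.ConcurrentHashMap

// MiniConf is a hypothetical, much smaller stand-in for SparkConf.
class MiniConf(loadDefaults: Boolean) {
  private val settings = new ConcurrentHashMap[String, String]()

  if (loadDefaults) {
    // Keep only the JVM system properties whose key starts with "spark."
    for ((key, value) <- sys.props.toMap if key.startsWith("spark.")) {
      set(key, value)
    }
  }

  def set(key: String, value: String): MiniConf = {
    settings.put(key, value)
    this
  }

  // The two required settings are stored under plain string keys.
  def setMaster(master: String): MiniConf = set("spark.master", master)
  def setAppName(name: String): MiniConf = set("spark.app.name", name)

  def contains(key: String): Boolean = settings.containsKey(key)
  def get(key: String): Option[String] = Option(settings.get(key))
}
```

With this shape, a check such as contains("spark.master") is all that is needed to detect a missing master URL.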
When SparkContext is instantiated, the SparkConf instance is cloned, validated, and filled in with further settings:
_conf = config.clone()
_conf.validateSettings()
if (!_conf.contains("spark.master")) {
throw new SparkException("A master URL must be set in your configuration")
}
if (!_conf.contains("spark.app.name")) {
throw new SparkException("An application name must be set in your configuration")
}
// System property spark.yarn.app.id must be set if user code ran by AM on a YARN cluster
// yarn-standalone is deprecated, but still supported
if ((master == "yarn-cluster" || master == "yarn-standalone") &&
!_conf.contains("spark.yarn.app.id")) {
throw new SparkException("Detected yarn-cluster mode, but isn't running on a cluster. " +
"Deployment to YARN is not supported directly by SparkContext. Please use spark-submit.")
}
if (_conf.getBoolean("spark.logConf", false)) {
logInfo("Spark configuration:\n" + _conf.toDebugString)
}
// Set Spark driver host and port system properties
_conf.setIfMissing("spark.driver.host", Utils.localHostName())
_conf.setIfMissing("spark.driver.port", "0")
...................................
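SparkContext instantiation
Initialization code in the body of the SparkContext class (the core part)
The key fields of SparkContext make up the state of the SparkContext object and are assigned during instantiation: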
/* ------------------------------------------------------------------------------------- *
| Private variables. These variables keep the internal state of the context, and are |
| not accessible by the outside world. They're mutable since we want to initialize all |
| of them to some neutral value ahead of time, so that calling "stop()" while the |
| constructor is still running is safe. |
* ------------------------------------------------------------------------------------- */
private var _conf: SparkConf = _
private var _eventLogDir: Option[URI] = None
private var _eventLogCodec: Option[String] = None
private var _env: SparkEnv = _
private var _metadataCleaner: MetadataCleaner = _
private var _jobProgressListener: JobProgressListener = _
private var _statusTracker: SparkStatusTracker = _
private var _progressBar: Option[ConsoleProgressBar] = None
private var _ui: Option[SparkUI] = None
private var _hadoopConfiguration: Configuration = _
private var _executorMemory: Int = _
private var _schedulerBackend: SchedulerBackend = _
private var _taskScheduler: TaskScheduler = _
private var _heartbeatReceiver: RpcEndpointRef = _
@volatile private var _dagScheduler: DAGScheduler = _
private var _applicationId: String = _
private var _applicationAttemptId: Option[String] = None
private var _eventLogger: Option[EventLoggingListener] = None
private var _executorAllocationManager: Option[ExecutorAllocationManager] = None
private var _cleaner: Option[ContextCleaner] = None
private var _listenerBusStarted: Boolean = false
private var _jars: Seq[String] = _
private var _files: Seq[String] = _
private var _shutdownHookRef: AnyRef = _
/* ------------------------------------------------------------------------------------- *
| Accessors and public fields. These provide access to the internal state of the |
| context. |
* ------------------------------------------------------------------------------------- */
1,JobProgressListener
#JobProgressListener listens for events, collects the related job and stage information, and keeps it in one place so that the Spark UI can display it live
//* :: DeveloperApi ::
//* Tracks task-level information to be displayed in the UI.
// Application:
@volatile var startTime = -1L
@volatile var endTime = -1L
// Jobs:
val activeJobs = new HashMap[JobId, JobUIData]
val completedJobs = ListBuffer[JobUIData]()
val failedJobs = ListBuffer[JobUIData]()
val jobIdToData = new HashMap[JobId, JobUIData]
val jobGroupToJobIds = new HashMap[JobGroupId, HashSet[JobId]]
// Stages:
val pendingStages = new HashMap[StageId, StageInfo]
val activeStages = new HashMap[StageId, StageInfo]
val completedStages = ListBuffer[StageInfo]()
val skippedStages = ListBuffer[StageInfo]()
val failedStages = ListBuffer[StageInfo]()
val stageIdToData = new HashMap[(StageId, StageAttemptId), StageUIData]
val stageIdToInfo = new HashMap[StageId, StageInfo]
val stageIdToActiveJobIds = new HashMap[StageId, HashSet[JobId]]
val poolToActiveStages = HashMap[PoolName, HashMap[StageId, StageInfo]]()
// Total of completed and failed stages that have ever been run. These may be greater than
// `completedStages.size` and `failedStages.size` if we have run more stages or jobs than
// JobProgressListener's retention limits.
var numCompletedStages = 0
var numFailedStages = 0
var numCompletedJobs = 0
var numFailedJobs = 0
// Misc:
val executorIdToBlockManagerId = HashMap[ExecutorId, BlockManagerId]()
def blockManagerIds: Seq[BlockManagerId] = executorIdToBlockManagerId.values.toSeq
var schedulingMode: Option[SchedulingMode] = None
// To limit the total memory usage of JobProgressListener, we only track information for a fixed
// number of non-active jobs and stages (there is no limit for active jobs and stages):
val retainedStages = conf.getInt("spark.ui.retainedStages", SparkUI.DEFAULT_RETAINED_STAGES)
val retainedJobs = conf.getInt("spark.ui.retainedJobs", SparkUI.DEFAULT_RETAINED_JOBS)
2,LiveListenerBus
All listeners in Spark are managed centrally through the listenerBus
#listenerBus = new LiveListenerBus
listenerBus.addListener(jobProgressListener)
Class inheritance:
class LiveListenerBus extends AsynchronousListenerBus[SparkListener, SparkListenerEvent]("SparkListenerBus")
with SparkListenerBus
abstract class AsynchronousListenerBus[L <: AnyRef, E](name: String) extends ListenerBus[L, E]
#trait ListenerBus.postToAll(~) method:
#trait SparkListenerBus.onPostEvent(~) implements the ListenerBus method; it matches on the event type and passes the event to the corresponding listener callback
/**
* Post the event to all registered listeners. The `postToAll` caller should guarantee calling
* `postToAll` in the same thread for all events.
*/
final def postToAll(event: E): Unit = {
// JavaConverters can create a JIterableWrapper if we use asScala.
// However, this method will be called frequently. To avoid the wrapper cost, here we use
// Java Iterator directly.
val iter = listeners.iterator
while (iter.hasNext) {
val listener = iter.next()
try {
onPostEvent(listener, event)
} catch {
case NonFatal(e) =>
logError(s"Listener ${Utils.getFormattedClassName(listener)} threw an exception", e)
}
}
}
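The division of labour between postToAll and onPostEvent can be illustrated with a small bus that needs no Spark dependency. This is only a sketch: Event, JobStart, JobEnd, Listener, and MiniListenerBus are hypothetical names, and the real bus also delivers events asynchronously from a queue, which is left out here.

```scala
import java.util.concurrent.CopyOnWriteArrayList
import scala.util.control.NonFatal

// Hypothetical event and listener types, far smaller than Spark's.
sealed trait Event
case class JobStart(jobId: Int) extends Event
case class JobEnd(jobId: Int) extends Event

trait Listener {
  def onJobStart(e: JobStart): Unit = ()
  def onJobEnd(e: JobEnd): Unit = ()
}

class MiniListenerBus {
  private val listeners = new CopyOnWriteArrayList[Listener]()

  def addListener(l: Listener): Unit = {
    listeners.add(l)
    ()
  }

  // Route the event to the matching callback (the role played by onPostEvent).
  private def onPostEvent(l: Listener, event: Event): Unit = event match {
    case e: JobStart => l.onJobStart(e)
    case e: JobEnd   => l.onJobEnd(e)
  }

  // Deliver to every listener; one failing listener does not stop the others.
  def postToAll(event: Event): Unit = {
    val iter = listeners.iterator
    while (iter.hasNext) {
      val l = iter.next()
      try onPostEvent(l, event)
      catch { case NonFatal(e) => Console.err.println(s"listener failed: $e") }
    }
  }
}
```

A listener only overrides the callbacks it cares about, which is why the trait gives every callback an empty default body.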
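setupAndStartListenerBus() method:
Custom listeners named in the Spark setting spark.extraListeners are added to the listenerBus, and then listenerBus.start() is called
spark.extraListeners is a comma-separated list of class names; each listener object is obtained by reflection.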
conf.get("spark.extraListeners", "").split(',').map(_.trim).filter(_ != "")
...........
listenerBus.addListener(listener)
listenerBus.start(this)
_listenerBusStarted = true
postEnvironmentUpdate() posts an environment-update event so that the environment information is refreshed.
/** Post the environment update event once the task scheduler is ready */
private def postEnvironmentUpdate() {
if (taskScheduler != null) {
val schedulingMode = getSchedulingMode.toString
val addedJarPaths = addedJars.keys.toSeq
val addedFilePaths = addedFiles.keys.toSeq
val environmentDetails = SparkEnv.environmentDetails(conf, schedulingMode, addedJarPaths,
addedFilePaths)
val environmentUpdate = SparkListenerEnvironmentUpdate(environmentDetails)
listenerBus.post(environmentUpdate)
}
}
postApplicationStart() posts the application start event:
/** Post the application start event */
private def postApplicationStart() {
// Note: this code assumes that the task scheduler has been initialized and has contacted
// the cluster manager to get an application ID (in case the cluster manager provides one).
listenerBus.post(SparkListenerApplicationStart(appName, Some(applicationId),
startTime, sparkUser, applicationAttemptId, schedulerBackend.getDriverLogUrls))
}
3,SparkEnv
// Create the Spark execution environment (cache, map output tracker, etc)
//Helper method to create a SparkEnv for a driver or an executor.
_env = createSparkEnv(_conf, isLocal, listenerBus)
SparkEnv.set(_env)
SparkEnv holds the Spark runtime environment; the driver and every executor each have their own SparkEnv instance;
instances are created by SparkEnv.create(~), which calls the SparkEnv constructor:
val envInstance = new SparkEnv(
executorId,
rpcEnv,
actorSystem,
serializer,
closureSerializer,
cacheManager,
mapOutputTracker,
shuffleManager,
broadcastManager,
blockTransferService,
blockManager,
securityManager,
sparkFilesDir,
metricsSystem,
memoryManager,
outputCommitCoordinator,
conf)
4,_metadataCleaner
//this.cleanup Called by MetadataCleaner to clean up the persistentRdds map periodically
// Creates a Timer that periodically cleans up RDDs held in memory
_metadataCleaner = new MetadataCleaner(MetadataCleanerType.SPARK_CONTEXT, this.cleanup, _conf)
5,_statusTracker
_statusTracker = new SparkStatusTracker(this)
/**
* Low-level status reporting APIs for monitoring job and stage progress.
*
* These APIs intentionally provide very weak consistency semantics; consumers of these APIs should
* be prepared to handle empty / missing information. For example, a job's stage ids may be known
* but the status API may not have any information about the details of those stages, so
* `getStageInfo` could potentially return `None` for a valid stage id.
*
* To limit memory usage, these APIs only provide information on recent jobs / stages. These APIs
* will provide information for the last `spark.ui.retainedStages` stages and
* `spark.ui.retainedJobs` jobs.
*/
Private field of SparkStatusTracker: jobProgressListener = sc.jobProgressListener
The status tracker (statusTracker) obtains the job and stage information through jobProgressListener.
6,_progressBar
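Shows a progress bar in the console. The bar is drawn on the line after the last output and keeps overwriting itself so that it stays on a single line.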
/**
* ConsoleProgressBar shows the progress of stages in the next line of the console. It poll the
* status of active stages from `sc.statusTracker` periodically, the progress bar will be showed
* up after the stage has ran at least 500ms. If multiple stages run in the same time, the status
* of them will be combined together, showed in one line.
*/
_progressBar =
if (_conf.getBoolean("spark.ui.showConsoleProgress", true) && !log.isInfoEnabled) {
Some(new ConsoleProgressBar(this))
} else {
None
}
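The "keeps overwriting itself" behaviour comes down to printing a carriage return before each redraw. The sketch below shows that idea only; ProgressBarSketch, render, and show are hypothetical names, the bar layout is an assumption, and none of the polling of stage status is reproduced.

```scala
object ProgressBarSketch {
  // Render one line such as "[=====>    ] 5/10"; width is the number of bar cells.
  def render(done: Int, total: Int, width: Int = 10): String = {
    require(total > 0 && done >= 0 && done <= total)
    val filled = done * width / total
    val bar =
      if (filled == width) "=" * width
      else "=" * filled + ">" + " " * (width - filled - 1)
    s"[$bar] $done/$total"
  }

  // '\r' moves the cursor back to column 0, so each call overwrites the previous bar.
  def show(done: Int, total: Int): Unit = {
    Console.err.print("\r" + render(done, total))
    Console.err.flush()
  }
}
```

Calling show(1, 10), show(2, 10), and so on from a timer would redraw the same console line each time.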
7,SparkUI
_ui =
if (conf.getBoolean("spark.ui.enabled", true)) {
Some(SparkUI.createLiveUI(this, _conf, listenerBus, _jobProgressListener,
_env.securityManager, appName, startTime = startTime))
} else {
// For tests, do not enable the UI
None
}
// Bind the UI before starting the task scheduler to communicate
// the bound port to the cluster manager properly
_ui.foreach(_.bind())
SparkUI.create(~) creates the instance; it instantiates various listeners and adds them to the SparkListenerBus so that all events are monitored in one place
/**
* Create a new Spark UI.
*
* @param sc optional SparkContext; this can be None when reconstituting a UI from event logs.
* @param jobProgressListener if supplied, this JobProgressListener will be used; otherwise, the
* web UI will create and register its own JobProgressListener.
*/
private def create(
sc: Option[SparkContext],
conf: SparkConf,
listenerBus: SparkListenerBus,
securityManager: SecurityManager,
appName: String,
basePath: String = "",
jobProgressListener: Option[JobProgressListener] = None,
startTime: Long): SparkUI = {
val _jobProgressListener: JobProgressListener = jobProgressListener.getOrElse {
val listener = new JobProgressListener(conf)
listenerBus.addListener(listener)
listener
}
val environmentListener = new EnvironmentListener
val storageStatusListener = new StorageStatusListener
val executorsListener = new ExecutorsListener(storageStatusListener)
val storageListener = new StorageListener(storageStatusListener)
val operationGraphListener = new RDDOperationGraphListener(conf)
listenerBus.addListener(environmentListener)
listenerBus.addListener(storageStatusListener)
listenerBus.addListener(executorsListener)
listenerBus.addListener(storageListener)
listenerBus.addListener(operationGraphListener)
new SparkUI(sc, conf, securityManager, environmentListener, storageStatusListener,
executorsListener, _jobProgressListener, storageListener, operationGraphListener,
appName, basePath, startTime)
}
// initialize() is called in the body of the SparkUI class, so it runs when the object is instantiated.
/** Initialize all components of the server. */
def initialize() {
attachTab(new JobsTab(this))
attachTab(stagesTab)
attachTab(new StorageTab(this))
attachTab(new EnvironmentTab(this))
attachTab(new ExecutorsTab(this))
attachHandler(createStaticHandler(SparkUI.STATIC_RESOURCE_DIR, "/static"))
attachHandler(createRedirectHandler("/", "/jobs/", basePath = basePath))
attachHandler(ApiRootResource.getServletHandler(this))
// This should be POST only, but, the YARN AM proxy won't proxy POSTs
attachHandler(createRedirectHandler(
"/stages/stage/kill", "/stages/", stagesTab.handleKillRequest,
httpMethods = Set("GET", "POST")))
}
initialize()
WebUI.bind() calls: JettyUtils.startJettyServer("0.0.0.0", port, handlers, conf, name)
startJettyServer starts the Spark UI service; if the default port is already in use, it retries with port = port + 1.
/**
* Attempt to start a Jetty server bound to the supplied hostName:port using the given
* context handlers.
*
* If the desired port number is contended, continues incrementing ports until a free port is
* found. Return the jetty Server object, the chosen port, and a mutable collection of handlers.
*/
def startJettyServer(
hostName: String,
port: Int,
handlers: Seq[ServletContextHandler],
conf: SparkConf,
serverName: String = ""): ServerInfo = {
addFilters(handlers, conf)
val collection = new ContextHandlerCollection
val gzipHandlers = handlers.map { h =>
val gzipHandler = new GzipHandler
gzipHandler.setHandler(h)
gzipHandler
}
collection.setHandlers(gzipHandlers.toArray)
// Bind to the given port, or throw a java.net.BindException if the port is occupied
def connect(currentPort: Int): (Server, Int) = {
val server = new Server(new InetSocketAddress(hostName, currentPort))
val pool = new QueuedThreadPool
pool.setDaemon(true)
server.setThreadPool(pool)
val errorHandler = new ErrorHandler()
errorHandler.setShowStacks(true)
server.addBean(errorHandler)
server.setHandler(collection)
try {
server.start()
(server, server.getConnectors.head.getLocalPort)
} catch {
case e: Exception =>
server.stop()
pool.stop()
throw e
}
}
val (server, boundPort) = Utils.startServiceOnPort[Server](port, connect, conf, serverName)
ServerInfo(server, boundPort, collection)
}
}
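The retry-on-the-next-port behaviour can be sketched independently of Jetty. The helper below is a simplified assumption of how such a loop might look, not the body of Utils.startServiceOnPort: PortRetrySketch and the retriesLeft parameter are hypothetical, and the start function is expected to throw java.net.BindException when the port is taken.

```scala
import java.net.BindException

object PortRetrySketch {
  // Try `port`, then port + 1, and so on, until `start` succeeds or no retries remain.
  // `start` returns (service, boundPort) on success and throws BindException
  // when the port is already occupied.
  def startServiceOnPort[T](port: Int, retriesLeft: Int)(start: Int => (T, Int)): (T, Int) = {
    val attempt: Either[BindException, (T, Int)] =
      try Right(start(port))
      catch { case e: BindException => Left(e) }

    attempt match {
      case Right(result)               => result
      case Left(e) if retriesLeft <= 0 => throw e // give up and surface the last failure
      case Left(_)                     => startServiceOnPort(port + 1, retriesLeft - 1)(start)
    }
  }
}
```

Passing the start logic in as a function keeps the retry loop reusable for any service that binds a port, which matches how connect is handed to the helper in the listing above.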
8,_hadoopConfiguration : Configuration
_hadoopConfiguration = SparkHadoopUtil.get.newConfiguration(_conf)
// Code from newConfiguration: it strips the "spark.hadoop." prefix from Spark settings and uses the remainder as Hadoop configuration.
// Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
conf.getAll.foreach { case (key, value) =>
if (key.startsWith("spark.hadoop.")) {
hadoopConf.set(key.substring("spark.hadoop.".length), value)
}
}
val bufferSize = conf.get("spark.buffer.size", "65536")
hadoopConf.set("io.file.buffer.size", bufferSize)
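The prefix-stripping rule is easy to try out on plain maps. The following sketch mirrors the listing above using immutable Map values in place of SparkConf and Hadoop's Configuration; HadoopConfSketch and toHadoopSettings are hypothetical names.

```scala
object HadoopConfSketch {
  private val Prefix = "spark.hadoop."

  // Derive the Hadoop-side settings from the Spark-side ones.
  def toHadoopSettings(sparkSettings: Map[String, String]): Map[String, String] = {
    // "spark.hadoop.foo=bar" becomes "foo=bar"; other keys are ignored.
    val copied = sparkSettings.collect {
      case (key, value) if key.startsWith(Prefix) => key.substring(Prefix.length) -> value
    }
    // The buffer size is always set, falling back to 65536 when not configured.
    val bufferSize = sparkSettings.getOrElse("spark.buffer.size", "65536")
    copied + ("io.file.buffer.size" -> bufferSize)
  }
}
```

For example, a Spark setting spark.hadoop.fs.defaultFS would show up on the Hadoop side as fs.defaultFS, while spark.app.name would be dropped.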
9,_taskScheduler
// We need to register "HeartbeatReceiver" before "createTaskScheduler" because Executor will
// retrieve "HeartbeatReceiver" in the constructor. (SPARK-6640)
_heartbeatReceiver = env.rpcEnv.setupEndpoint(
HeartbeatReceiver.ENDPOINT_NAME, new HeartbeatReceiver(this))
// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)
// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
_taskScheduler.start()
_applicationId = _taskScheduler.applicationId()
_applicationAttemptId = taskScheduler.applicationAttemptId()
_conf.set("spark.app.id", _applicationId)
_ui.foreach(_.setAppId(_applicationId))
_env.blockManager.initialize(_applicationId)
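env.rpcEnv.setupEndpoint(~) returns the _heartbeatReceiver reference
The TaskScheduler is then created and started, the applicationId is obtained, and the related modules are configured with it
SparkContext.createTaskScheduler(~) is as follows: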
/**
* Create a task scheduler based on a given master URL.
* Return a 2-tuple of the scheduler backend and the task scheduler.
*/
private def createTaskScheduler(
sc: SparkContext,
master: String): (SchedulerBackend, TaskScheduler) = {
import SparkMasterRegex._
// When running locally, don't try to re-execute tasks on failure.
val MAX_LOCAL_TASK_FAILURES = 1
master match {
case "local" =>
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalBackend(sc.getConf, scheduler, 1)
scheduler.initialize(backend)
(backend, scheduler)
case LOCAL_N_REGEX(threads) =>
def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
// local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
val threadCount = if (threads == "*") localCpuCount else threads.toInt
if (threadCount <= 0) {
throw new SparkException(s"Asked to run locally with $threadCount threads")
}
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalBackend(sc.getConf, scheduler, threadCount)
scheduler.initialize(backend)
(backend, scheduler)
case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
// local[*, M] means the number of cores on the computer with M failures
// local[N, M] means exactly N threads with M failures
val threadCount = if (threads == "*") localCpuCount else threads.toInt
val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)
val backend = new LocalBackend(sc.getConf, scheduler, threadCount)
scheduler.initialize(backend)
(backend, scheduler)
case SPARK_REGEX(sparkUrl) =>
val scheduler = new TaskSchedulerImpl(sc)
val masterUrls = sparkUrl.split(",").map("spark://" + _)
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
(backend, scheduler)
case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
// Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
val memoryPerSlaveInt = memoryPerSlave.toInt
if (sc.executorMemory > memoryPerSlaveInt) {
throw new SparkException(
"Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
memoryPerSlaveInt, sc.executorMemory))
}
val scheduler = new TaskSchedulerImpl(sc)
val localCluster = new LocalSparkCluster(
numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt, sc.conf)
val masterUrls = localCluster.start()
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
backend.shutdownCallback = (backend: SparkDeploySchedulerBackend) => {
localCluster.stop()
}
(backend, scheduler)
case "yarn-standalone" | "yarn-cluster" =>
if (master == "yarn-standalone") {
logWarning(
"\"yarn-standalone\" is deprecated as of Spark 1.0. Use \"yarn-cluster\" instead.")
}
val scheduler = try {
val clazz = Utils.classForName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
val cons = clazz.getConstructor(classOf[SparkContext])
cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
} catch {
// TODO: Enumerate the exact reasons why it can fail
// But irrespective of it, it means we cannot proceed !
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
val backend = try {
val clazz =
Utils.classForName("org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend")
val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
} catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
scheduler.initialize(backend)
(backend, scheduler)
case "yarn-client" =>
val scheduler = try {
val clazz = Utils.classForName("org.apache.spark.scheduler.cluster.YarnScheduler")
val cons = clazz.getConstructor(classOf[SparkContext])
cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
} catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
val backend = try {
val clazz =
Utils.classForName("org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend")
val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
} catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
scheduler.initialize(backend)
(backend, scheduler)
case MESOS_REGEX(mesosUrl) =>
MesosNativeLibrary.load()
val scheduler = new TaskSchedulerImpl(sc)
val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", defaultValue = true)
val backend = if (coarseGrained) {
new CoarseMesosSchedulerBackend(scheduler, sc, mesosUrl, sc.env.securityManager)
} else {
new MesosSchedulerBackend(scheduler, sc, mesosUrl)
}
scheduler.initialize(backend)
(backend, scheduler)
case SIMR_REGEX(simrUrl) =>
val scheduler = new TaskSchedulerImpl(sc)
val backend = new SimrSchedulerBackend(scheduler, sc, simrUrl)
scheduler.initialize(backend)
(backend, scheduler)
case zkUrl if zkUrl.startsWith("zk://") =>
logWarning("Master URL for a multi-master Mesos cluster managed by ZooKeeper should be " +
"in the form mesos://zk://host:port. Current Master URL will stop working in Spark 2.0.")
createTaskScheduler(sc, "mesos://" + zkUrl)
case _ =>
throw new SparkException("Could not parse Master URL: '" + master + "'")
}
}
}
/**
* A collection of regexes for extracting information from the master string.
*/
private object SparkMasterRegex {
// Regular expression used for local[N] and local[*] master formats
val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r
// Regular expression for local[N, maxRetries], used in tests with failing tasks
val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r
// Regular expression for simulating a Spark cluster of [N, cores, memory] locally
val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
// Regular expression for connecting to Spark deploy clusters
val SPARK_REGEX = """spark://(.*)""".r
// Regular expression for connection to Mesos cluster by mesos:// or mesos://zk:// url
val MESOS_REGEX = """mesos://(.*)""".r
// Regular expression for connection to Simr cluster
val SIMR_REGEX = """simr://(.*)""".r
}
A brief look at the case master = local:
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalBackend(sc.getConf, scheduler, 1)
scheduler.initialize(backend)
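To see how the master string selects a branch, the matching can be tried in isolation. The sketch below reuses three of the regexes from SparkMasterRegex as listed above; everything else (MasterUrlSketch, the Mode result types, the cpuCount parameter) is hypothetical and stands in for the real scheduler and backend objects.

```scala
object MasterUrlSketch {
  // Regexes copied from the SparkMasterRegex listing.
  private val LocalN = """local\[([0-9]+|\*)\]""".r
  private val LocalNFailures = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r
  private val SparkUrl = """spark://(.*)""".r

  // Hypothetical result type: which kind of backend would be chosen, and with what settings.
  sealed trait Mode
  case class Local(threads: Int, maxFailures: Int) extends Mode
  case class Standalone(masterUrls: List[String]) extends Mode

  def parse(master: String, cpuCount: Int = Runtime.getRuntime.availableProcessors()): Mode = {
    def threadCount(s: String): Int = if (s == "*") cpuCount else s.toInt
    master match {
      case "local" => Local(1, 1) // one thread, no re-execution on failure
      case LocalN(threads) =>
        val n = threadCount(threads)
        require(n > 0, s"Asked to run locally with $n threads")
        Local(n, 1)
      case LocalNFailures(threads, maxFailures) =>
        Local(threadCount(threads), maxFailures.toInt)
      case SparkUrl(hosts) =>
        Standalone(hosts.split(",").map("spark://" + _).toList)
      case _ =>
        throw new IllegalArgumentException(s"Could not parse Master URL: '$master'")
    }
  }
}
```

The example program at the top of this post uses local[3], which in this sketch resolves to Local(3, 1): three worker threads and a single allowed task failure.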
10,_executorAllocationManager
The executor allocation manager dynamically adjusts the number of executors an application occupies.
It is not explained here; for details see:
https://blog.csdn.net/zisheng_wang_data/article/details/51737008
11,metricsSystem
// The metrics system for Driver need to be set spark.app.id to app ID.
// So it should start after we get app ID from the task scheduler and set spark.app.id.
metricsSystem.start()
// Attach the driver metrics servlet handler to the web ui after the metrics system is started.
metricsSystem.getServletHandlers.foreach(handler => ui.foreach(_.attachHandler(handler)))
......................
setupAndStartListenerBus()
postEnvironmentUpdate()
postApplicationStart()
// Post init
_taskScheduler.postStartHook()
_env.metricsSystem.registerSource(_dagScheduler.metricsSource)
_env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager))
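MetricsSystem is the measurement system used to track the various metrics of the system. It is roughly key-value in shape.
A simple example: how would I expose information about the current JVM? There are of course many ways, but going through MetricsSystem makes it more standardized. The pieces are:
- Source: where the data comes from, for example org.apache.spark.metrics.source.JvmSource
- Sink: where the data is sent. Sinks are either active or passive. Active ones usually emit their output on a timer, such as CsvSink; passive ones, such as MetricsServlet, have to be invoked by the user.
- MetricRegistry: the bridge between Source and Sink.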
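The three roles can be sketched in a few lines of plain Scala. This is an illustration of the Source / registry / Sink split only, under the assumption that a metric is a named gauge returning a Long; Source, Sink, JvmSourceSketch, and MiniRegistry are hypothetical types and do not reproduce Spark's classes or the metrics library behind them.

```scala
import scala.collection.mutable

// Hypothetical miniature of the Source / registry / Sink split.
trait Source {
  def name: String
  def gauges: Map[String, () => Long]
}

trait Sink {
  def report(snapshot: Map[String, Long]): Unit
}

// A source exposing two JVM memory readings.
class JvmSourceSketch extends Source {
  val name = "jvm"
  val gauges: Map[String, () => Long] = Map(
    "heap.used" -> (() => Runtime.getRuntime.totalMemory() - Runtime.getRuntime.freeMemory()),
    "heap.max" -> (() => Runtime.getRuntime.maxMemory())
  )
}

// The registry sits between sources and sinks and holds key -> gauge pairs.
class MiniRegistry {
  private val gauges = mutable.LinkedHashMap[String, () => Long]()
  private val sinks = mutable.ListBuffer[Sink]()

  def registerSource(s: Source): Unit =
    s.gauges.foreach { case (k, g) => gauges(s"${s.name}.$k") = g }

  def registerSink(s: Sink): Unit = {
    sinks += s
    ()
  }

  // Read every gauge once; a passive sink would call this on demand.
  def snapshot(): Map[String, Long] = gauges.map { case (k, g) => k -> g() }.toMap

  // Push the current values to all sinks; an active sink would trigger this from a timer.
  def report(): Unit = {
    val snap = snapshot()
    sinks.foreach(_.report(snap))
  }
}
```

The key-value shape mentioned above is visible in snapshot(): every metric is a string key mapped to its current value.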
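Processing the computation logic
An RDD is obtained through the SparkContext instance;
operations on the RDD carry out the computation logic
The withScope wrapper is used for DAG visualization on the Spark UI. It wraps every method that creates an RDD and uses RDDOperationScope to record the history and relationships of RDD operations, so that the Spark UI can display the DAG of the run
The reduce method of RDD (an action)
It delegates to SparkContext.runJob(~), which in turn calls dagScheduler.runJob(~)
The DAGScheduler divides the steps (the RDD work) into stages, producing a series of TaskSets that are handed to the TaskScheduler, which gives the individual Tasks to the thread pools of the Executors on the Worker nodes. The threads in the pool do the work and read and write data through the BlockManager.
DAGScheduler.runJob(~):
DAGScheduler.submitJob(~):
The DAGScheduler uses event passing: the event (the job information) is delivered to DAGSchedulerEventProcessLoop.onReceive(~), which hands it to doOnReceive(~):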
private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)
case MapStageSubmitted(jobId, dependency, callSite, listener, properties) =>
dagScheduler.handleMapStageSubmitted(jobId, dependency, callSite, listener, properties)
case StageCancelled(stageId) =>
dagScheduler.handleStageCancellation(stageId)
case JobCancelled(jobId) =>
dagScheduler.handleJobCancellation(jobId)
case JobGroupCancelled(groupId) =>
dagScheduler.handleJobGroupCancelled(groupId)
case AllJobsCancelled =>
dagScheduler.doCancelAllJobs()
case ExecutorAdded(execId, host) =>
dagScheduler.handleExecutorAdded(execId, host)
case ExecutorLost(execId) =>
dagScheduler.handleExecutorLost(execId, fetchFailed = false)
case BeginEvent(task, taskInfo) =>
dagScheduler.handleBeginEvent(task, taskInfo)
case GettingResultEvent(taskInfo) =>
dagScheduler.handleGetTaskResult(taskInfo)
case completion @ CompletionEvent(task, reason, _, _, taskInfo, taskMetrics) =>
dagScheduler.handleTaskCompletion(completion)
case TaskSetFailed(taskSet, reason, exception) =>
dagScheduler.handleTaskSetFailed(taskSet, reason, exception)
case ResubmitFailedStages =>
dagScheduler.resubmitFailedStages()
}
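The event-passing style used here (post an event to a queue, let one dedicated thread match on it) can be shown with a small loop of its own. This is a sketch under simplifying assumptions: MiniEventLoop and the three event types are hypothetical, and error handling is omitted.

```scala
import java.util.concurrent.LinkedBlockingDeque

// Hypothetical events, standing in for DAGSchedulerEvent subclasses.
sealed trait SchedulerEvent
case class JobSubmittedEvt(jobId: Int) extends SchedulerEvent
case class JobCancelledEvt(jobId: Int) extends SchedulerEvent
case object StopEvt extends SchedulerEvent

class MiniEventLoop(handle: SchedulerEvent => Unit) {
  private val queue = new LinkedBlockingDeque[SchedulerEvent]()

  // A single consumer thread, so handlers never run concurrently with each other.
  private val thread = new Thread("mini-event-loop") {
    override def run(): Unit = {
      var running = true
      while (running) {
        queue.take() match {
          case StopEvt => running = false
          case event   => handle(event)
        }
      }
    }
  }
  thread.setDaemon(true)

  def start(): Unit = thread.start()

  // Callers return immediately; the event is processed later on the loop thread.
  def post(event: SchedulerEvent): Unit = queue.put(event)

  // Events posted before stop() are still handled, because StopEvt is queued behind them.
  def stop(): Unit = {
    post(StopEvt)
    thread.join()
  }
}
```

Because all handlers run on one thread, the scheduler state they touch does not need locking, which is one common motivation for this design.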
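dagScheduler.handleJobSubmitted(~): the job is submitted and the division into stages begins. A ResultStage is first created from the finalRDD (the RDD on which the action was triggered); the lineage is then traced backwards from it, and stage boundaries are identified by shuffle operations.
reduce (and similar action operations) always corresponds to a ResultStage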
private[scheduler] def handleJobSubmitted(jobId: Int,
finalRDD: RDD[_],
func: (TaskContext, Iterator[_]) => _,
partitions: Array[Int],
callSite: CallSite,
listener: JobListener,
properties: Properties) {
var finalStage: ResultStage = null
try {
// New stage creation may throw an exception if, for example, jobs are run on a
// HadoopRDD whose underlying HDFS files have been deleted.
finalStage = newResultStage(finalRDD, func, partitions, jobId, callSite)
} catch {
case e: Exception =>
logWarning("Creating new stage failed due to exception - job: " + jobId, e)
listener.jobFailed(e)
return
}
val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
clearCacheLocs()
logInfo("Got job %s (%s) with %d output partitions".format(
job.jobId, callSite.shortForm, partitions.length))
logInfo("Final stage: " + finalStage + " (" + finalStage.name + ")")
logInfo("Parents of final stage: " + finalStage.parents)
logInfo("Missing parents: " + getMissingParentStages(finalStage))
val jobSubmissionTime = clock.getTimeMillis()
jobIdToActiveJob(jobId) = job
activeJobs += job
finalStage.setActiveJob(job)
val stageIds = jobIdToStageIds(jobId).toArray
val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo))
listenerBus.post(
SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties))
submitStage(finalStage)
submitWaitingStages()
}
Building a ResultStage object requires (stageId, parentStages), which are obtained from getParentStagesAndId(~).
getParentStages(~) returns the parentStages of the finalRDD, and the stageId is generated by an atomic increment
/**
* Helper function to eliminate some code re-use when creating new stages.
*/
private def getParentStagesAndId(rdd: RDD[_], firstJobId: Int): (List[Stage], Int) = {
val parentStages = getParentStages(rdd, firstJobId)
val id = nextStageId.getAndIncrement()
(parentStages, id)
}
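The dagScheduler splits stages according to the dependencies between RDDs, with ShuffleDependency (wide dependency) as the boundary:
every stage keeps the stages it depends on (parentStages);
for the starting stage of a job, parentStages is Nil
DAGScheduler.getShuffleMapStage(~) processes a shufDep and returns its corresponding stage
DAGScheduler.shuffleToMapStage stores the stages of the current application, keyed by shuffleId
If shuffleToMapStage already holds a stage for the current shufDep, it is returned directly;
if shuffleToMapStage holds no stage for the current shufDep, the following steps run:
1,getAncestorShuffleDependencies(shuffleDep.rdd) collects every shufDep upstream of the current one that has not yet been saved in shuffleToMapStage; a stage is generated for each of them via newOrUsedShuffleStage(~) and saved in shuffleToMapStage
2,a stage is generated for the current shufDep, saved in shuffleToMapStage, and returned.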
/**
* Get or create a shuffle map stage for the given shuffle dependency's map side.
*/
private def getShuffleMapStage(
shuffleDep: ShuffleDependency[_, _, _],
firstJobId: Int): ShuffleMapStage = {
shuffleToMapStage.get(shuffleDep.shuffleId) match {
case Some(stage) => stage
case None =>
// We are going to register ancestor shuffle dependencies
getAncestorShuffleDependencies(shuffleDep.rdd).foreach { dep =>
shuffleToMapStage(dep.shuffleId) = newOrUsedShuffleStage(dep, firstJobId)
}
// Then register current shuffleDep
val stage = newOrUsedShuffleStage(shuffleDep, firstJobId)
shuffleToMapStage(shuffleDep.shuffleId) = stage
stage
}
}
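getAncestorShuffleDependencies(rdd): walks backwards from rdd and pushes every shufDep that has not yet been saved in shuffleToMapStage onto a Stack; from the top of the stack to the bottom, the entries run from parent to child
Optimization: getShuffleMapStage(~) iterates over the stack, generating the stages in forward order and saving them in shuffleToMapStage
Reaching a starting RDD means that getAncestorShuffleDependencies(~) has exhausted one RDD dependency line of the job and has found one of the job's first ShuffleDependencies (firstShufDep)
Tracing back from firstShufDep there is no further ShuffleDependency, all the way to the starting RDD (the RDD that was created directly), whose deps is Nil;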
/** Find ancestor shuffle dependencies that are not registered in shuffleToMapStage yet */
private def getAncestorShuffleDependencies(rdd: RDD[_]): Stack[ShuffleDependency[_, _, _]] = {
val parents = new Stack[ShuffleDependency[_, _, _]]
val visited = new HashSet[RDD[_]]
// We are manually maintaining a stack here to prevent StackOverflowError
// caused by recursively visiting
val waitingForVisit = new Stack[RDD[_]]
def visit(r: RDD[_]) {
if (!visited(r)) {
visited += r
for (dep <- r.dependencies) {
dep match {
case shufDep: ShuffleDependency[_, _, _] =>
if (!shuffleToMapStage.contains(shufDep.shuffleId)) {
parents.push(shufDep)
}
case _ =>
}
waitingForVisit.push(dep.rdd)
}
}
}
waitingForVisit.push(rdd)
while (waitingForVisit.nonEmpty) {
visit(waitingForVisit.pop())
}
parents
}
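The traversal order is easier to follow on a toy lineage. The sketch below applies the same explicit-stack walk to a hypothetical graph model (Node, Narrow, Shuffle, and LineageSketch are made-up names, and an immutable List plays the role of the Stack). It is meant to show why the oldest ancestor ends up on top, not to reproduce the scheduler.

```scala
import scala.collection.mutable

object LineageSketch {
  // Hypothetical miniature RDD graph: each node lists its dependencies.
  sealed trait Dep { def parent: Node }
  case class Narrow(parent: Node) extends Dep
  case class Shuffle(shuffleId: Int, parent: Node) extends Dep
  case class Node(name: String, deps: List[Dep] = Nil)

  // Collect the shuffle dependencies reachable from `node` that are not in `registered`.
  // The result is used like a stack: its head is the dependency that was pushed last.
  def ancestorShuffleDeps(node: Node, registered: Set[Int]): List[Shuffle] = {
    var parents = List.empty[Shuffle]
    val visited = mutable.HashSet[Node]()
    var waiting = List(node) // explicit stack instead of recursion

    while (waiting.nonEmpty) {
      val r = waiting.head
      waiting = waiting.tail
      if (visited.add(r)) { // add returns false when r was already visited
        for (dep <- r.deps) {
          dep match {
            case s: Shuffle if !registered(s.shuffleId) => parents = s :: parents
            case _ =>
          }
          waiting = dep.parent :: waiting
        }
      }
    }
    parents
  }
}
```

For a lineage a -> (shuffle 0) -> b -> c -> (shuffle 1) -> d, starting from d yields shuffle 0 first and shuffle 1 second, so iterating over the result creates stages from the oldest ancestor forward.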
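newOrUsedShuffleStage(~)
1,calls newShuffleMapStage(~) to generate the stage
2,mapOutputTracker (application-level) checks whether the shuffle map phase of this ShuffleDependency has already been run
a,if it has, the available map output locations are saved into the current stage (only the non-null ones; my guess is that this serves tasks recovered after a failure).
b,if it has not, the shuffle is registered; its partition information, Array[MapStatus](partitionsNum), is entirely empty.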
/**
* Create a shuffle map Stage for the given RDD. The stage will also be associated with the
* provided firstJobId. If a stage for the shuffleId existed previously so that the shuffleId is
* present in the MapOutputTracker, then the number and location of available outputs are
* recovered from the MapOutputTracker
*/
private def newOrUsedShuffleStage(
shuffleDep: ShuffleDependency[_, _, _],
firstJobId: Int): ShuffleMapStage = {
val rdd = shuffleDep.rdd
val numTasks = rdd.partitions.length
val stage = newShuffleMapStage(rdd, numTasks, shuffleDep, firstJobId, rdd.creationSite)
if (mapOutputTracker.containsShuffle(shuffleDep.shuffleId)) {
val serLocs = mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId)
val locs = MapOutputTracker.deserializeMapStatuses(serLocs)
(0 until locs.length).foreach { i =>
if (locs(i) ne null) {
// locs(i) will be null if missing
stage.addOutputLoc(i, locs(i))
}
}
} else {
// Kind of ugly: need to register RDDs with the cache and map output tracker here
// since we can't do it in the RDD constructor because # of partitions is unknown
logInfo("Registering RDD " + rdd.id + " (" + rdd.getCreationSite + ")")
mapOutputTracker.registerShuffle(shuffleDep.shuffleId, rdd.partitions.length)
}
stage
}
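newShuffleMapStage(~): this method is almost identical to newResultStage (only the Stage implementation class differs)
In the new~~Stage methods, the call to getParentStagesAndId(~) forms a backward recursion;
the recursion ends at the job's first shuffle dependency in forward order (firstShufDep), upstream of which there is no ShuffleDependency;
there getParentStages(~) returns Nil, nextStageId.getAndIncrement() generates the stageId, and new~~Stage(~) creates the stage instance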
/**
* Create a ShuffleMapStage as part of the (re)-creation of a shuffle map stage in
* newOrUsedShuffleStage. The stage will be associated with the provided firstJobId.
* Production of shuffle map stages should always use newOrUsedShuffleStage, not
* newShuffleMapStage directly.
*/
private def newShuffleMapStage(
rdd: RDD[_],
numTasks: Int,
shuffleDep: ShuffleDependency[_, _, _],
firstJobId: Int,
callSite: CallSite): ShuffleMapStage = {
val (parentStages: List[Stage], id: Int) = getParentStagesAndId(rdd, firstJobId)
val stage: ShuffleMapStage = new ShuffleMapStage(id, rdd, numTasks, parentStages,
firstJobId, callSite, shuffleDep)
stageIdToStage(id) = stage
updateJobIdStageIdMaps(firstJobId, stage)
stage
}
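As shown above:
1. Starting from the RDD on which the action is invoked, newResultStage(~) is called and the backward traversal begins, reaching the first ShuffleDependency (lastShufDep), marked by the red box.
2. getAncestorShuffleDependencies(lastShufDep._rdd) walks backward to collect all remaining ShuffleDependency instances and pushes them onto a stack in bottom-to-top order.
3. The stack is then traversed from the top down, that is, in forward order, instantiating each stage and saving it in shuffleToMapStage, so that later calls to getShuffleMapStage(~) find the result directly (without entering the loop).
The stage constructor needs the parentStages and stageId parameters.
4. The red box is the dividing line: the part below it (inclusive) is the backward recursion, and the part above it instantiates stages in forward order. Once step 3 is done, lastShufStage is instantiated from lastShufDep (the parentStage of lastShufDep._rdd is already in shuffleToMapStage and can be fetched directly) and is also saved in shuffleToMapStage. The recursion then returns to newResultStage(~), which produces the ResultStage instance.
(Why this two-directional design? A single-direction design would be simpler and clearer.)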
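The backward collection followed by forward instantiation described above can be sketched on a toy lineage graph. Everything here is an assumption made for illustration: `Rdd`, `Narrow`, `Shuffle`, `Stage`, and `Planner` are hypothetical simplified types, not Spark's classes, and the real DAGScheduler tracks considerably more state.

```scala
import scala.collection.mutable

object StagePlanSketch {
  sealed trait Dep { def rdd: Rdd }
  final case class Narrow(rdd: Rdd) extends Dep
  final case class Shuffle(shuffleId: Int, rdd: Rdd) extends Dep // rdd = map-side RDD
  final case class Rdd(id: Int, deps: List[Dep] = Nil)
  final case class Stage(id: Int, rddId: Int, parents: List[Int])

  // Backward pass: collect every shuffle reachable from `start`.
  // Prepending means the shuffle discovered last (furthest upstream) comes first.
  def ancestorShuffles(start: Rdd): List[Shuffle] = {
    var found = List.empty[Shuffle]
    val visited = mutable.Set.empty[Int]
    var waiting = List(start)
    while (waiting.nonEmpty) {
      val r = waiting.head
      waiting = waiting.tail
      if (visited.add(r.id)) r.deps.foreach {
        case s: Shuffle => found = s :: found; waiting = s.rdd :: waiting
        case n: Narrow  => waiting = n.rdd :: waiting
      }
    }
    found
  }

  final class Planner {
    private var nextStageId = 0
    val shuffleToMapStage = mutable.LinkedHashMap.empty[Int, Stage]

    // Shuffles that directly bound the stage containing `rdd`.
    private def parentShuffles(rdd: Rdd): List[Shuffle] = {
      var result = List.empty[Shuffle]
      val visited = mutable.Set.empty[Int]
      var waiting = List(rdd)
      while (waiting.nonEmpty) {
        val r = waiting.head
        waiting = waiting.tail
        if (visited.add(r.id)) r.deps.foreach {
          case s: Shuffle => result = s :: result
          case n: Narrow  => waiting = n.rdd :: waiting
        }
      }
      result
    }

    private def mapStage(s: Shuffle): Stage = shuffleToMapStage.get(s.shuffleId) match {
      case Some(stage) => stage // already registered: no loop needed
      case None =>
        // Forward pass: create the upstream stages first.
        ancestorShuffles(s.rdd).foreach { a =>
          if (!shuffleToMapStage.contains(a.shuffleId)) {
            shuffleToMapStage(a.shuffleId) = newStage(a.rdd)
          }
        }
        val stage = newStage(s.rdd)
        shuffleToMapStage(s.shuffleId) = stage
        stage
    }

    // Parents are resolved before the id is taken, so upstream stages get smaller ids.
    private def newStage(rdd: Rdd): Stage = {
      val parents = parentShuffles(rdd).map(p => mapStage(p).id)
      val stage = Stage(nextStageId, rdd.id, parents)
      nextStageId += 1
      stage
    }

    def resultStage(finalRdd: Rdd): Stage = newStage(finalRdd)
  }
}
```

In this sketch a chain with two shuffles yields three stages, and the result stage receives the highest id because its parents are always created before it.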
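Submitting the stage: submitStage(finalStage)
finalStage:=>
The ResultStage triggered by the action; it is the last stage to run.
getMissingParentStages(~) works backward and returns the parentStages of the current stage that have not yet run successfully. submitStage(undoParentStages) is then called recursively until the job's starting stage is reached (the parentStages of firstStage are empty).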
/** Submits stage, but first recursively submits any missing parents. */
private def submitStage(stage: Stage) {
val jobId = activeJobForStage(stage)
if (jobId.isDefined) {
logDebug("submitStage(" + stage + ")")
if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
val missing = getMissingParentStages(stage).sortBy(_.id)
logDebug("missing: " + missing)
if (missing.isEmpty) {
logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
submitMissingTasks(stage, jobId.get)
} else {
for (parent <- missing) {
submitStage(parent)
}
waitingStages += stage
}
}
} else {
abortStage(stage, "No active job for stage " + stage.id, None)
}
}
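The getMissingParentStages(~) method:
1. rddHasUncachedPartitions: checks whether all partitions of the rdd are already cached. If they are, everything upstream along this lineage can be ignored.
2. Otherwise the RDD has to be traced upward to obtain its data until a shufDep is encountered; getShuffleMapStage(~) then returns the parentStage directly (the stages have already been divided).
3. parentStage.isAvailable() checks whether all tasks of the parentStage completed successfully. If not, the parentStage is returned (there may be several).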
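The three checks above can be sketched on a toy model. The types `MapStage`, `Narrow`, `Shuffle`, and `Rdd` below are hypothetical simplifications made up for this example: caching is reduced to a single `fullyCached` flag, and a shuffle dependency points straight at its already created parent stage.

```scala
import scala.collection.mutable

object MissingParentsSketch {
  final case class MapStage(id: Int, available: Boolean)

  sealed trait Dep
  final case class Narrow(rdd: Rdd) extends Dep
  final case class Shuffle(parent: MapStage) extends Dep
  final case class Rdd(id: Int, deps: List[Dep], fullyCached: Boolean = false)

  // Walk the lineage of one stage's RDD and return the parent stages
  // whose output is not available yet, sorted by stage id.
  def missingParents(stageRdd: Rdd): List[MapStage] = {
    val missing = mutable.Set.empty[MapStage]
    val visited = mutable.Set.empty[Int]
    var waiting = List(stageRdd)
    while (waiting.nonEmpty) {
      val r = waiting.head
      waiting = waiting.tail
      // A fully cached RDD cuts off everything upstream of it.
      if (visited.add(r.id) && !r.fullyCached) {
        r.deps.foreach {
          case Shuffle(parent) => if (!parent.available) missing += parent
          case Narrow(parent)  => waiting = parent :: waiting
        }
      }
    }
    missing.toList.sortBy(_.id)
  }
}
```

Note that the walk stops at each shuffle boundary; the parents of a missing parent stage are only examined later, when submitStage recurses into that stage.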
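The DAGScheduler.submitMissingTasks(stage, jobId) method:
1. Determines which tasks of the current stage still have to run; the criterion is that the partition's map output information is empty.
2. Initializes the TaskAttemptNumber of every task in the stage (default value: -1).
3. Looks up the preferred TaskLocation information; each partition may have several taskLocations.
The stage information is published so that it can be monitored in the Spark UI.
4. Serializes stage.rdd and stage.shuffleDep and broadcasts the resulting byte array, taskBinary.
5. Instantiates one task for each partition of the stage that needs to run.
6. taskScheduler.submitTasks(~) submits the stage's task set (taskSet).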
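A compact way to picture steps 1 and 3 to 6 is the toy function below. It is a sketch under loud assumptions: `Task`, `TaskSet`, and `buildTaskSet` are hypothetical names, the "serialized and broadcast" task binary is faked with the bytes of a string, and preferred locations come from a caller-supplied function.

```scala
object SubmitMissingTasksSketch {
  final case class Task(stageId: Int, partition: Int, preferredLocs: Seq[String], binary: Array[Byte])
  final case class TaskSet(stageId: Int, attempt: Int, tasks: Seq[Task])

  def buildTaskSet(
      stageId: Int,
      attempt: Int,
      outputLocs: Array[List[String]],   // known map outputs, one slot per partition
      preferred: Int => Seq[String],     // hypothetical locality lookup
      closure: String): Option[TaskSet] = {
    // Step 1: only partitions without a map output still need a task.
    val toCompute = outputLocs.indices.filter(i => outputLocs(i).isEmpty)
    if (toCompute.isEmpty) {
      None // nothing left to run for this stage
    } else {
      // Step 4 (faked): serialize once, share the same bytes across all tasks.
      val binary = closure.getBytes("UTF-8")
      // Steps 3 and 5: one task per missing partition, with its preferred locations.
      val tasks = toCompute.map(p => Task(stageId, p, preferred(p), binary))
      // Step 6: the resulting set is what would be handed to the task scheduler.
      Some(TaskSet(stageId, attempt, tasks))
    }
  }
}
```

Serializing once per stage rather than once per task is the point of the shared `binary` field: every task in the set refers to the same byte array.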
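TaskScheduler scheduling
The taskScheduler.submitTasks(~) method:
1. Creates a TaskSetManager instance, which tracks the lifecycle of the tasks.
2. Checks whether another TaskSet of the current stage is still running; the two must not conflict.
3. Adds the TaskSetManager instance to the scheduling pool through schedulableBuilder (application-level).
4. Repeatedly logs a warning (Initial job has not accepted any resources) until resources have been allocated or the timer is cancelled.
5. The taskScheduler's SchedulerBackend (the same backend held by the Driver/SparkContext) calls reviveOffers(~).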
override def submitTasks(taskSet: TaskSet) {
val tasks = taskSet.tasks
logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
this.synchronized {
val manager = createTaskSetManager(taskSet, maxTaskFailures)
val stage = taskSet.stageId
val stageTaskSets =
taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
stageTaskSets(taskSet.stageAttemptId) = manager
val conflictingTaskSet = stageTaskSets.exists { case (_, ts) =>
ts.taskSet != taskSet && !ts.isZombie
}
if (conflictingTaskSet) {
throw new IllegalStateException(s"more than one active taskSet for stage $stage:" +
s" ${stageTaskSets.toSeq.map{_._2.taskSet.id}.mkString(",")}")
}
schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
if (!isLocal && !hasReceivedTask) {
// Keep logging the warning until a task has been launched
starvationTimer.scheduleAtFixedRate(new TimerTask() {
override def run() {
if (!hasLaunchedTask) {
logWarning("Initial job has not accepted any resources; " +
"check your cluster UI to ensure that workers are registered " +
"and have sufficient resources")
} else {
this.cancel()
}
}
}, STARVATION_TIMEOUT_MS, STARVATION_TIMEOUT_MS)
}
hasReceivedTask = true
}
backend.reviveOffers() // The main logic is handled asynchronously in the background
}
CoarseGrainedSchedulerBackend is an implementation class of SchedulerBackend; here it is used for resource scheduling on YARN.
The CoarseGrainedSchedulerBackend.reviveOffers() method: