1. How are SparkListener events posted?
LiveListenerBus stores events in a bounded LinkedBlockingQueue of SparkListenerEvent; the capacity is read from configuration and validated:
private lazy val EVENT_QUEUE_CAPACITY = validateAndGetQueueSize()
private lazy val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](EVENT_QUEUE_CAPACITY)

// Validate the configured queue capacity.
private def validateAndGetQueueSize(): Int = {
  val queueSize = sparkContext.conf.get(LISTENER_BUS_EVENT_QUEUE_SIZE)
  if (queueSize <= 0) {
    throw new SparkException("spark.scheduler.listenerbus.eventqueue.size must be > 0!")
  }
  queueSize
}

// Config entry for the queue capacity (default: 10000 events).
private[spark] val LISTENER_BUS_EVENT_QUEUE_SIZE =
  ConfigBuilder("spark.scheduler.listenerbus.eventqueue.size")
    .intConf
    .createWithDefault(10000)
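The choice of LinkedBlockingQueue matters: its non-blocking offer/poll calls are what let the bus drop events instead of stalling the caller when listeners fall behind. A minimal, self-contained sketch (toy capacity of 2, String events standing in for SparkListenerEvent):

```scala
import java.util.concurrent.LinkedBlockingQueue

// A toy bounded queue with capacity 2, mirroring EVENT_QUEUE_CAPACITY.
val queue = new LinkedBlockingQueue[String](2)

// offer() is non-blocking: it returns false instead of waiting when full.
assert(queue.offer("event-1"))
assert(queue.offer("event-2"))
assert(!queue.offer("event-3")) // capacity reached: this event would be dropped

// poll() is also non-blocking: it returns null when the queue is empty.
assert(queue.poll() == "event-1") // FIFO order is preserved
assert(queue.poll() == "event-2")
assert(queue.poll() == null)
```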
How are events read back out of the queue?
// The dispatch thread: a daemon that drains the queue until the bus stops.
private val listenerThread = new Thread(name) {
  setDaemon(true)
  override def run(): Unit = Utils.tryOrStopSparkContext(sparkContext) {
    LiveListenerBus.withinListenerThread.withValue(true) {
      while (true) {
        eventLock.acquire()   // block until an event (or a stop signal) is available
        self.synchronized {
          // Mark that this thread is now processing an event.
          processingEvent = true
        }
        try {
          // Take the next event off the queue.
          val event = eventQueue.poll
          if (event == null) {
            // A null here means the extra "stop" permit was released; exit the loop.
            if (!stopped.get) {
              throw new IllegalStateException("Polling `null` from eventQueue means" +
                " the listener bus has been stopped. So `stopped` must be true")
            }
            return
          }
          // Fan the event out to every registered listener.
          postToAll(event)
        } finally {
          self.synchronized {
            processingEvent = false
          }
        }
      }
    }
  }
}
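The interplay between eventLock (a Semaphore) and the queue can be reproduced in miniature. This is a sketch, not Spark code: String events stand in for SparkListenerEvent, and collecting into a list stands in for postToAll:

```scala
import java.util.concurrent.{CopyOnWriteArrayList, LinkedBlockingQueue, Semaphore}
import java.util.concurrent.atomic.AtomicBoolean

val eventQueue = new LinkedBlockingQueue[String](10)
val eventLock  = new Semaphore(0)          // counts pending events, plus one "stop" permit
val stopped    = new AtomicBoolean(false)
val seen       = new CopyOnWriteArrayList[String]()

val consumer = new Thread {
  setDaemon(true)
  override def run(): Unit = {
    while (true) {
      eventLock.acquire()                  // wait for an event or the stop signal
      val event = eventQueue.poll()
      if (event == null) {
        // null plus a released permit signals shutdown (stopped must be true).
        require(stopped.get)
        return
      }
      seen.add(event)                      // stand-in for postToAll(event)
    }
  }
}
consumer.start()

// Producer side: enqueue, then release one permit per event.
Seq("a", "b", "c").foreach { e => eventQueue.offer(e); eventLock.release() }

// Stop: set the flag and release one extra permit so the consumer wakes and exits.
stopped.set(true)
eventLock.release()
consumer.join()
assert(seen.toArray.toSeq == Seq("a", "b", "c"))
```

The extra `release()` with an empty queue is exactly why the real dispatch thread treats a null poll as the shutdown signal.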
postToAll comes from the parent class ListenerBus. How does it reach each listener's handler methods?
def postToAll(event: E): Unit = {
  // `listeners` is a CopyOnWriteArrayList, so this iterator sees a
  // consistent snapshot even if listeners are added concurrently.
  val iter = listeners.iterator
  while (iter.hasNext) {
    val listener = iter.next()
    try {
      doPostEvent(listener, event)
    } catch {
      case NonFatal(e) =>
        // One misbehaving listener must not break delivery to the others.
        logError(s"Listener ${Utils.getFormattedClassName(listener)} threw an exception", e)
    }
  }
}
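The CopyOnWriteArrayList choice is worth demonstrating: its iterator works over a snapshot, so a listener registered mid-dispatch neither breaks the loop nor receives the in-flight event. A small stand-alone check:

```scala
import java.util.concurrent.CopyOnWriteArrayList

val listeners = new CopyOnWriteArrayList[String]()
listeners.add("listener-1")
listeners.add("listener-2")

// Take an iterator (as postToAll does), then mutate the list concurrently.
val iter = listeners.iterator()
listeners.add("listener-3") // registration arriving during a dispatch

var visited = List.empty[String]
while (iter.hasNext) visited ::= iter.next()

// No ConcurrentModificationException; the in-flight iteration saw only the snapshot.
assert(visited.reverse == List("listener-1", "listener-2"))
assert(listeners.size == 3) // the list itself did pick up the new listener
```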
SparkListenerBus then pattern-matches on the concrete event type and invokes the matching callback:
protected override def doPostEvent(
    listener: SparkListenerInterface,
    event: SparkListenerEvent): Unit = {
  event match {
    case stageSubmitted: SparkListenerStageSubmitted =>
      listener.onStageSubmitted(stageSubmitted)
    case stageCompleted: SparkListenerStageCompleted =>
      listener.onStageCompleted(stageCompleted)
    case jobStart: SparkListenerJobStart =>
      listener.onJobStart(jobStart)
    case jobEnd: SparkListenerJobEnd =>
      listener.onJobEnd(jobEnd)
    case taskStart: SparkListenerTaskStart =>
      listener.onTaskStart(taskStart)
    case taskGettingResult: SparkListenerTaskGettingResult =>
      listener.onTaskGettingResult(taskGettingResult)
    case taskEnd: SparkListenerTaskEnd =>
      listener.onTaskEnd(taskEnd)
    case environmentUpdate: SparkListenerEnvironmentUpdate =>
      listener.onEnvironmentUpdate(environmentUpdate)
    case blockManagerAdded: SparkListenerBlockManagerAdded =>
      listener.onBlockManagerAdded(blockManagerAdded)
    case blockManagerRemoved: SparkListenerBlockManagerRemoved =>
      listener.onBlockManagerRemoved(blockManagerRemoved)
    case unpersistRDD: SparkListenerUnpersistRDD =>
      listener.onUnpersistRDD(unpersistRDD)
    case applicationStart: SparkListenerApplicationStart =>
      listener.onApplicationStart(applicationStart)
    case applicationEnd: SparkListenerApplicationEnd =>
      listener.onApplicationEnd(applicationEnd)
    case metricsUpdate: SparkListenerExecutorMetricsUpdate =>
      listener.onExecutorMetricsUpdate(metricsUpdate)
    case executorAdded: SparkListenerExecutorAdded =>
      listener.onExecutorAdded(executorAdded)
    case executorRemoved: SparkListenerExecutorRemoved =>
      listener.onExecutorRemoved(executorRemoved)
    case blockUpdated: SparkListenerBlockUpdated =>
      listener.onBlockUpdated(blockUpdated)
    case logStart: SparkListenerLogStart => // ignore event log metadata
    case _ => listener.onOtherEvent(event)
  }
}
We have now traced the path from event to listener callback. A few questions remain:
1. How does the dispatch thread get started?
def start(): Unit = {
  // CAS ensures the thread is started exactly once.
  if (started.compareAndSet(false, true)) {
    listenerThread.start()
  } else {
    throw new IllegalStateException(s"$name already started!")
  }
}
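compareAndSet(false, true) is a common start-exactly-once idiom: it succeeds for exactly one caller even under concurrency. A stripped-down sketch (the start function here is illustrative, not Spark's):

```scala
import java.util.concurrent.atomic.AtomicBoolean

val started = new AtomicBoolean(false)

// Only the first caller's CAS flips false -> true; every later call fails.
def start(): Boolean =
  if (started.compareAndSet(false, true)) true
  else throw new IllegalStateException("already started!")

assert(start()) // first call wins and would launch the thread

// A second call is rejected instead of launching a duplicate thread.
val rejected = try { start(); false } catch { case _: IllegalStateException => true }
assert(rejected)
```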
2. Where do the events come from?
Back to the source. In LiveListenerBus:
def post(event: SparkListenerEvent): Unit = {
  // Ignore events posted after the bus has stopped.
  if (stopped.get) {
    return
  }
  // offer() is non-blocking: it fails instead of waiting when the queue is full.
  val eventAdded = eventQueue.offer(event)
  if (eventAdded) {
    eventLock.release()   // signal the dispatch thread that an event is available
  } else {
    // Queue full: drop the event and count the drop.
    onDropEvent(event)
    droppedEventsCounter.incrementAndGet()
  }

  val droppedEvents = droppedEventsCounter.get
  if (droppedEvents > 0) {
    // Don't log too frequently
    if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) {
      // There may be multiple threads trying to decrease droppedEventsCounter.
      // Use "compareAndSet" to make sure only one thread can win.
      // And if another thread is increasing droppedEventsCounter, "compareAndSet" will fail and
      // then that thread will update it.
      if (droppedEventsCounter.compareAndSet(droppedEvents, 0)) {
        val prevLastReportTimestamp = lastReportTimestamp
        lastReportTimestamp = System.currentTimeMillis()
        logWarning(s"Dropped $droppedEvents SparkListenerEvents since " +
          new java.util.Date(prevLastReportTimestamp))
      }
    }
  }
}
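The drop-reporting logic is subtle: compareAndSet(droppedEvents, 0) lets exactly one thread claim the current batch of drops for logging, while concurrent increments simply land in the next batch. A single-threaded sketch of that claim step:

```scala
import java.util.concurrent.atomic.AtomicLong

val droppedEventsCounter = new AtomicLong(0)

// Simulate five dropped events counted by post().
(1 to 5).foreach(_ => droppedEventsCounter.incrementAndGet())

// Reporting: read the counter, then try to atomically swap that exact value to 0.
// If another thread incremented in between, the CAS fails and that thread
// (or a later report) picks the batch up instead -- no drops are lost.
val observed = droppedEventsCounter.get
var reported = 0L
if (droppedEventsCounter.compareAndSet(observed, 0)) reported = observed

assert(reported == 5)                 // this "thread" won and reports 5 drops
assert(droppedEventsCounter.get == 0) // counter reset; new drops start a new batch
```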
3. How do listeners get registered?
The listeners collection lives in the ListenerBus hierarchy, and ListenerBus does define a registration method. But it is declared final, so user code cannot subclass the bus and hook in directly:
final def addListener(listener: L): Unit = { listeners.add(listener) }
The caller can be found in SparkContext. So we can implement SparkListenerInterface and register it via addSparkListener:
def addSparkListener(listener: SparkListenerInterface) {
listenerBus.addListener(listener)
}
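What makes user listeners cheap to write is the adapter pattern: a base class gives every interface callback a no-op default, so a listener overrides only what it cares about (this is what org.apache.spark.scheduler.SparkListener does for SparkListenerInterface). The same pattern, reduced to a self-contained sketch with illustrative names (MiniListenerInterface, JobCounter, etc. are not Spark classes):

```scala
// The full callback interface (Spark's has ~18 methods; two suffice here).
trait MiniListenerInterface {
  def onJobStart(jobId: Int): Unit
  def onJobEnd(jobId: Int): Unit
}

// Adapter: every callback defaults to a no-op.
class MiniListener extends MiniListenerInterface {
  override def onJobStart(jobId: Int): Unit = {}
  override def onJobEnd(jobId: Int): Unit = {}
}

// A user listener overrides only the callback it needs.
class JobCounter extends MiniListener {
  var startedJobs = 0
  override def onJobStart(jobId: Int): Unit = startedJobs += 1
}

val counter = new JobCounter
Seq(1, 2, 3).foreach(counter.onJobStart)
counter.onJobEnd(1) // inherited no-op: nothing to implement, nothing breaks
assert(counter.startedJobs == 3)
```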
How does Spark load the listeners configured by the user (spark.extraListeners)?
In SparkContext:
private def setupAndStartListenerBus(): Unit = {
  try {
    // Read listener class names from configuration.
    val listenerClassNames: Seq[String] =
      conf.get("spark.extraListeners", "").split(',').map(_.trim).filter(_ != "")
    // Iterate over the configured class names.
    for (className <- listenerClassNames) {
      // Collect the constructors of the configured class.
      // A trick worth learning: select a constructor by its parameter types.
      val constructors = {
        val listenerClass = Utils.classForName(className)
        listenerClass
          .getConstructors
          .asInstanceOf[Array[Constructor[_ <: SparkListenerInterface]]]
      }
      // Prefer a constructor taking a single SparkConf parameter...
      val constructorTakingSparkConf = constructors.find { c =>
        c.getParameterTypes.sameElements(Array(classOf[SparkConf]))
      }
      // ...falling back to a zero-argument constructor.
      lazy val zeroArgumentConstructor = constructors.find { c =>
        c.getParameterTypes.isEmpty
      }
      val listener: SparkListenerInterface = {
        if (constructorTakingSparkConf.isDefined) {
          constructorTakingSparkConf.get.newInstance(conf)
        } else if (zeroArgumentConstructor.isDefined) {
          zeroArgumentConstructor.get.newInstance()
        } else {
          throw new SparkException(
            s"$className did not have a zero-argument constructor or a" +
            " single-argument constructor that accepts SparkConf. Note: if the class is" +
            " defined inside of another Scala class, then its constructors may accept an" +
            " implicit parameter that references the enclosing class; in this case, you must" +
            " define the listener as a top-level class in order to prevent this extra" +
            " parameter from breaking Spark's ability to find a valid constructor.")
        }
      }
      listenerBus.addListener(listener)
    }
  } catch {
    case e: Exception =>
      throw new SparkException("Exception when registering SparkListener", e)
  }

  // Start dispatching events.
  listenerBus.start()
  _listenerBusStarted = true
}
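The constructor-selection trick above generalizes well beyond Spark. Here is the same logic against stand-in classes (FakeConf plays the role of SparkConf; all names are illustrative, and the sketch assumes top-level class definitions so constructors carry no outer-instance parameter):

```scala
// Stand-ins: a config class and two listener-like classes with different constructors.
class FakeConf(val appName: String)
class ConfListener(conf: FakeConf) { val tag = s"conf:${conf.appName}" }
class PlainListener                { val tag = "no-args" }

// Mirror the selection logic: prefer a (FakeConf) constructor, else fall back
// to a zero-argument one, else fail with a descriptive error.
def instantiate(cls: Class[_], conf: FakeConf): Any = {
  val constructors = cls.getConstructors
  val takingConf =
    constructors.find(_.getParameterTypes.sameElements(Array(classOf[FakeConf])))
  lazy val zeroArg = constructors.find(_.getParameterTypes.isEmpty)
  takingConf.map(_.newInstance(conf))
    .orElse(zeroArg.map(_.newInstance()))
    .getOrElse(throw new IllegalArgumentException(
      s"${cls.getName} needs a zero-argument or single-FakeConf constructor"))
}

val conf = new FakeConf("demo")
assert(instantiate(classOf[ConfListener], conf).asInstanceOf[ConfListener].tag == "conf:demo")
assert(instantiate(classOf[PlainListener], conf).asInstanceOf[PlainListener].tag == "no-args")
```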
4. Scenario: when a job starts running, who sends the message?
Work backwards: there must be a listener, which leads us to JobListener, and from there naturally to the JobWaiter object.
Following that thread, DAGScheduler is where the message is sent (it posts events such as SparkListenerJobStart to the bus).
What is this, at its core?
Still inheritance plus callbacks: the bus delivers events, and listeners implement the hooks.
5. How is this listener mechanism designed?
In a sentence: one begets two, two begets three, three begets all things. One bus, many listeners, ever more kinds of events, and that is the design idea behind listeners. If that sounds too abstract, the down-to-earth version is plain software engineering (the observer pattern), which is a discipline in itself. More on that later.