Source Code Study
The comments in the Spark source code contain the following sentence:
Asynchronously passes SparkListenerEvents to registered SparkListeners
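That is, all Spark events (SparkListenerEvents) are delivered asynchronously to the SparkListeners that have been registered.
In SparkContext, a LiveListenerBus instance is created first. The main responsibilities of this class are:
- It holds the message queue and is responsible for buffering events
- It holds the registered listeners and is responsible for dispatching events to them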
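The two responsibilities above can be illustrated with a minimal sketch. Every name in it (MiniBusDemo, MiniBus, Listener, post, stop) is hypothetical and chosen only for illustration. It is not Spark's implementation, only a small model of a bus that buffers events in a queue and dispatches them on a separate thread:

```scala
import java.util.concurrent.{CopyOnWriteArrayList, LinkedBlockingQueue}
import scala.collection.mutable.ListBuffer

// Hypothetical names throughout; this is a sketch, not Spark's API.
object MiniBusDemo {
  trait Listener { def onEvent(event: String): Unit }

  class MiniBus(capacity: Int) {
    // Responsibility 2: registered listeners, used for dispatch.
    private val listeners = new CopyOnWriteArrayList[Listener]
    // Responsibility 1: bounded queue that buffers events. None marks shutdown.
    private val queue = new LinkedBlockingQueue[Option[String]](capacity)

    private val worker = new Thread("mini-bus") {
      override def run(): Unit = {
        var next = queue.take()
        while (next.isDefined) {
          val event = next.get
          val it = listeners.iterator()
          while (it.hasNext) it.next().onEvent(event)
          next = queue.take()
        }
      }
    }
    worker.setDaemon(true)

    def addListener(l: Listener): Unit = listeners.add(l)
    def start(): Unit = worker.start()
    // Non-blocking enqueue; returns false if the queue is full.
    def post(event: String): Boolean = queue.offer(Some(event))
    // Enqueue the shutdown marker and wait for the worker to drain the queue.
    def stop(): Unit = { queue.put(None); worker.join() }
  }

  def demo(): List[String] = {
    val received = ListBuffer.empty[String]
    val bus = new MiniBus(capacity = 16)
    bus.addListener(new Listener {
      def onEvent(event: String): Unit = { received += event; () }
    })
    bus.start()
    List("jobStart", "taskEnd", "jobEnd").foreach(e => bus.post(e))
    bus.stop() // join() makes the worker's writes visible to this thread
    received.toList
  }
}
```

Calling MiniBusDemo.demo() posts three events from the calling thread and returns them in the order the listener received them on the worker thread.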
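The inheritance hierarchy of this class is as follows: LiveListenerBus extends AsynchronousListenerBus and mixes in SparkListenerBus, both of which build on the ListenerBus trait.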
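The listener list is held in the ListenerBus trait. To make concurrent access safe, it is stored in Java's CopyOnWriteArrayList. Whenever the listener list is modified, CopyOnWriteArrayList first copies the entire underlying array and then applies the change to the copy. As a result, once an iterator has been obtained, it sees a consistent snapshot of the data for its whole lifetime.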
private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging {
  // Marked `private[spark]` for access in tests.
  private[spark] val listeners = new CopyOnWriteArrayList[L]

  /**
   * Add a listener to listen events. This method is thread-safe and can be called in any thread.
   */
  final def addListener(listener: L) {
    listeners.add(listener)
  }
  ...
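The snapshot behaviour of CopyOnWriteArrayList described above can be checked with a small standalone sketch. The object name SnapshotDemo and the listener strings are placeholders for illustration and do not come from Spark:

```scala
import java.util.concurrent.CopyOnWriteArrayList
import scala.collection.mutable.ListBuffer

// Hypothetical demo object; strings stand in for real listeners.
object SnapshotDemo {
  // Returns (elements seen by the earlier iterator, size of the list afterwards).
  def demo(): (List[String], Int) = {
    val listeners = new CopyOnWriteArrayList[String]
    listeners.add("listenerA")
    listeners.add("listenerB")

    // The iterator captures the backing array as it is right now.
    val it = listeners.iterator()

    // add() copies the array and modifies the copy; the iterator is unaffected
    // and no ConcurrentModificationException is thrown.
    listeners.add("listenerC")

    val seen = ListBuffer.empty[String]
    while (it.hasNext) seen += it.next()
    (seen.toList, listeners.size())
  }
}
```

The iterator obtained before the third add still yields only the first two elements, while the list itself now holds three. This is why dispatching events can iterate over the listeners without locking while another thread registers a new one.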
The message queue itself is actually held in the AsynchronousListenerBus class:
private val EVENT_QUEUE_CAPACITY = 10000
private val eventQueue = new LinkedBlockingQueue[E](EVENT_QUEUE_CAPACITY)
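The event queue has a capacity of 10000. Once the number of buffered events reaches this limit, newly arriving events are dropped. The handler invoked for a dropped event is defined in the LiveListenerBus class: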
private[spark] class LiveListenerBus
  extends AsynchronousListenerBus[SparkListener, SparkListenerEvent]("SparkListenerBus")
  with SparkListenerBus {

  private val logDroppedEvent = new AtomicBoolean(false)

  override def onDropEvent(event: SparkListenerEvent): Unit = {
    if (logDroppedEvent.compareAndSet(false, true)) {
      // Only log the following message once to avoid duplicated annoying logs.
      logError("Dropping SparkListenerEvent because no remaining room in event queue. " +
        "This likely means one of the SparkListeners is too slow and cannot keep up with " +
        "the rate at which tasks are being started by the scheduler.")
    }
  }
}
As the code above shows, the handler simply writes an error log, and the logDroppedEvent variable guarantees that the message is written only once.
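The combination of a bounded queue and a log-once flag can be sketched as follows. The excerpt above does not show how events are enqueued, so the sketch assumes a non-blocking offer() on the LinkedBlockingQueue. The names DropDemo, BoundedPoster, dropped and logLines are hypothetical and exist only to make the behaviour observable:

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical sketch; not Spark's implementation.
object DropDemo {
  class BoundedPoster(capacity: Int) {
    private val eventQueue = new LinkedBlockingQueue[String](capacity)
    private val logDroppedEvent = new AtomicBoolean(false)
    val dropped = new AtomicInteger(0)  // how many events were rejected
    val logLines = new AtomicInteger(0) // how many times the error was logged

    def post(event: String): Unit = {
      // Assumption: enqueue with offer(), which returns false instead of
      // blocking when the queue is already at capacity.
      if (!eventQueue.offer(event)) onDropEvent(event)
    }

    private def onDropEvent(event: String): Unit = {
      dropped.incrementAndGet()
      // compareAndSet flips false -> true exactly once, even under contention,
      // so only the first dropped event produces a log line.
      if (logDroppedEvent.compareAndSet(false, true)) {
        logLines.incrementAndGet()
        System.err.println(s"Dropping event $event because the queue is full")
      }
    }
  }

  def demo(): (Int, Int) = {
    val poster = new BoundedPoster(capacity = 2)
    // Nothing consumes the queue, so events 3 to 5 are rejected.
    (1 to 5).foreach(i => poster.post(s"event-$i"))
    (poster.dropped.get(), poster.logLines.get())
  }
}
```

With a capacity of 2 and five posted events, three events are dropped but the error is logged a single time. The trade-off is that later drops are silent, so a count such as the dropped counter here is useful when the number of lost events matters.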
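Let us now turn to the AsynchronousListenerBus class, which is the core of the event mechanism. Since it is built around a message queue, it involves both producing and consuming messages. First, consider how messages are consumed: AsynchronousL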