Spark Source Code: SparkListener


1. How are SparkListener events posted?

In the LiveListenerBus class, events are buffered in a bounded LinkedBlockingQueue holding SparkListenerEvent instances:
private lazy val EVENT_QUEUE_CAPACITY = validateAndGetQueueSize()
private lazy val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](EVENT_QUEUE_CAPACITY)

// Validate and return the configured queue capacity
private def validateAndGetQueueSize(): Int = {
  val queueSize = sparkContext.conf.get(LISTENER_BUS_EVENT_QUEUE_SIZE)
  if (queueSize <= 0) {
    throw new SparkException("spark.scheduler.listenerbus.eventqueue.size must be > 0!")
  }
  queueSize
}

// The config entry for the queue capacity (default: 10000)
private[spark] val LISTENER_BUS_EVENT_QUEUE_SIZE =
  ConfigBuilder("spark.scheduler.listenerbus.eventqueue.size")
    .intConf
    .createWithDefault(10000)
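To see why the bus can drop events instead of stalling the caller, here is a minimal Java sketch of the same idea (BoundedEventQueue and the String events are invented for illustration, not Spark APIs): a bounded LinkedBlockingQueue whose non-blocking offer returns false when the queue is full.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedEventQueue {
    private final LinkedBlockingQueue<String> eventQueue;

    public BoundedEventQueue(int capacity) {
        // Mirror LiveListenerBus: reject non-positive capacities up front
        if (capacity <= 0) {
            throw new IllegalArgumentException("queue size must be > 0!");
        }
        this.eventQueue = new LinkedBlockingQueue<>(capacity);
    }

    // offer() returns false instead of blocking when the queue is full,
    // which is what lets the bus count dropped events later
    public boolean post(String event) {
        return eventQueue.offer(event);
    }

    public String poll() {
        return eventQueue.poll();
    }
}
```

With capacity 2, the first two posts succeed and the third returns false; the caller is never blocked, which matters because post() runs on the scheduler's hot path.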
How are events read back out of the queue?

// The dispatch thread
private val listenerThread = new Thread(name) {
  setDaemon(true)
  override def run(): Unit = Utils.tryOrStopSparkContext(sparkContext) {
    LiveListenerBus.withinListenerThread.withValue(true) {
      while (true) {
        eventLock.acquire()
        self.synchronized {
          // Mark that this bus is currently processing an event
          processingEvent = true
        }
        try {
          // Take the next event off the queue
          val event = eventQueue.poll
          if (event == null) {
            // A null poll is only legal once the bus has been stopped
            if (!stopped.get) {
              throw new IllegalStateException("Polling `null` from eventQueue means" +
                " the listener bus has been stopped. So `stopped` must be true")
            }
            return
          }
          // Dispatch the event to every registered listener
          postToAll(event)
        } finally {
          self.synchronized {
            // Done processing this event
            processingEvent = false
          }
        }
      }
    }
  }
}
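The loop above can be sketched as a tiny self-contained JVM example. Everything here (MiniBus, the String events) is invented for illustration; it only mirrors the semaphore-plus-queue idiom: one permit is released per enqueued event, so the daemon consumer blocks when idle instead of busy-polling an empty queue.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal stand-in for LiveListenerBus's daemon dispatch thread
public class MiniBus {
    private final LinkedBlockingQueue<String> eventQueue = new LinkedBlockingQueue<>(100);
    private final Semaphore eventLock = new Semaphore(0);
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final StringBuilder log = new StringBuilder();

    private final Thread listenerThread = new Thread(() -> {
        while (true) {
            try {
                eventLock.acquire();              // sleep until an event (or stop) arrives
            } catch (InterruptedException e) {
                return;
            }
            String event = eventQueue.poll();
            if (event == null) {
                // A null poll is only legal after stop() released its extra permit
                if (!stopped.get()) {
                    throw new IllegalStateException("null event while running");
                }
                return;
            }
            log.append(event).append(';');        // stand-in for postToAll(event)
        }
    });

    public MiniBus() {
        listenerThread.setDaemon(true);
        listenerThread.start();
    }

    public void post(String event) {
        if (eventQueue.offer(event)) {
            eventLock.release();                  // one permit per enqueued event
        }
    }

    public void stop() {
        stopped.set(true);
        eventLock.release();                      // extra permit lets the loop observe shutdown
        try {
            listenerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public String processed() {
        return log.toString();
    }
}
```

The extra permit released in stop() is what lets the consumer distinguish "queue temporarily empty" from "bus shut down", exactly the role the stopped flag plays in the Spark code.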
This dispatch logic comes from the ListenerBus trait. How does an event reach the matching handler method?
def postToAll(event: E): Unit = {
// listeners is a CopyOnWriteArrayList
  val iter = listeners.iterator
// Iterate over a snapshot of the registered listeners
  while (iter.hasNext) {
    val listener = iter.next()
    try {
      doPostEvent(listener, event)
    } catch {
      case NonFatal(e) =>
        logError(s"Listener ${Utils.getFormattedClassName(listener)} threw an exception", e)
    }
  }
}
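postToAll's error isolation can be reproduced in a few lines of Java (FanOut is an invented name; the real bus dispatches through doPostEvent rather than a Consumer): CopyOnWriteArrayList gives the loop a stable snapshot even if listeners are added concurrently, and one throwing listener is logged without breaking delivery to the rest.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of the postToAll pattern
public class FanOut {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    public void addListener(Consumer<String> l) {
        listeners.add(l);
    }

    public void postToAll(String event) {
        // Iteration sees a snapshot: concurrent addListener calls cannot
        // cause a ConcurrentModificationException here
        for (Consumer<String> listener : listeners) {
            try {
                listener.accept(event);
            } catch (RuntimeException e) {
                // Log and continue: mirrors the NonFatal handling in ListenerBus
                System.err.println("Listener threw: " + e.getMessage());
            }
        }
    }
}
```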

SparkListenerBus then pattern-matches on the event type and invokes the corresponding callback:

protected override def doPostEvent(
    listener: SparkListenerInterface,
    event: SparkListenerEvent): Unit = {
  event match {
    case stageSubmitted: SparkListenerStageSubmitted =>
      listener.onStageSubmitted(stageSubmitted)
    case stageCompleted: SparkListenerStageCompleted =>
      listener.onStageCompleted(stageCompleted)
    case jobStart: SparkListenerJobStart =>
      listener.onJobStart(jobStart)
    case jobEnd: SparkListenerJobEnd =>
      listener.onJobEnd(jobEnd)
    case taskStart: SparkListenerTaskStart =>
      listener.onTaskStart(taskStart)
    case taskGettingResult: SparkListenerTaskGettingResult =>
      listener.onTaskGettingResult(taskGettingResult)
    case taskEnd: SparkListenerTaskEnd =>
      listener.onTaskEnd(taskEnd)
    case environmentUpdate: SparkListenerEnvironmentUpdate =>
      listener.onEnvironmentUpdate(environmentUpdate)
    case blockManagerAdded: SparkListenerBlockManagerAdded =>
      listener.onBlockManagerAdded(blockManagerAdded)
    case blockManagerRemoved: SparkListenerBlockManagerRemoved =>
      listener.onBlockManagerRemoved(blockManagerRemoved)
    case unpersistRDD: SparkListenerUnpersistRDD =>
      listener.onUnpersistRDD(unpersistRDD)
    case applicationStart: SparkListenerApplicationStart =>
      listener.onApplicationStart(applicationStart)
    case applicationEnd: SparkListenerApplicationEnd =>
      listener.onApplicationEnd(applicationEnd)
    case metricsUpdate: SparkListenerExecutorMetricsUpdate =>
      listener.onExecutorMetricsUpdate(metricsUpdate)
    case executorAdded: SparkListenerExecutorAdded =>
      listener.onExecutorAdded(executorAdded)
    case executorRemoved: SparkListenerExecutorRemoved =>
      listener.onExecutorRemoved(executorRemoved)
    case blockUpdated: SparkListenerBlockUpdated =>
      listener.onBlockUpdated(blockUpdated)
    case logStart: SparkListenerLogStart => // ignore event log metadata
    case _ => listener.onOtherEvent(event)
  }
}
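In Java, this Scala pattern match corresponds to dispatching on the event's runtime type. A trimmed, hypothetical analogue with just two event classes (Dispatcher, JobStart, JobEnd are invented names):

```java
// Hypothetical Java analogue of SparkListenerBus.doPostEvent: inspect the
// event's concrete type and route it to the matching callback, with a
// catch-all branch so unknown (user-defined) events still reach listeners.
public class Dispatcher {
    public static class JobStart { public final int jobId; public JobStart(int id) { jobId = id; } }
    public static class JobEnd   { public final int jobId; public JobEnd(int id)   { jobId = id; } }

    public interface Listener {
        void onJobStart(JobStart e);
        void onJobEnd(JobEnd e);
        void onOtherEvent(Object e);
    }

    public static void doPostEvent(Listener listener, Object event) {
        if (event instanceof JobStart) {
            listener.onJobStart((JobStart) event);
        } else if (event instanceof JobEnd) {
            listener.onJobEnd((JobEnd) event);
        } else {
            listener.onOtherEvent(event);  // mirrors the `case _ => onOtherEvent` branch
        }
    }
}
```

The catch-all is what makes custom SparkListenerEvent subclasses work: they flow through the same bus and arrive at onOtherEvent.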

That covers the path from posting an event to dispatching it. The remaining questions are:

1. How is the dispatch thread started?

def start(): Unit = {
// CAS check: only the first caller may start the bus
  if (started.compareAndSet(false, true)) {
// Launch the dispatch thread
    listenerThread.start()
  } else {
    throw new IllegalStateException(s"$name already started!")
  }
}
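The CAS idiom is easy to demonstrate in isolation (StartOnce is a made-up name): compareAndSet(false, true) succeeds for exactly one caller, so double-starting is impossible even when several threads race on start().

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the start() guard used by LiveListenerBus
public class StartOnce {
    private final AtomicBoolean started = new AtomicBoolean(false);

    public void start() {
        // Atomically flips false -> true; returns false for every later caller
        if (started.compareAndSet(false, true)) {
            // first caller wins: the real bus launches listenerThread here
        } else {
            throw new IllegalStateException("already started!");
        }
    }
}
```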

2. Where do events come from?

Back to the source, in the LiveListenerBus class:
def post(event: SparkListenerEvent): Unit = {
// Ignore events once the bus has been stopped
  if (stopped.get) {
    return
  }
// Try to enqueue the event without blocking
  val eventAdded = eventQueue.offer(event)
  if (eventAdded) {
    eventLock.release()
  } else {
// The queue is full: drop the event
    onDropEvent(event)
// Count the drop
    droppedEventsCounter.incrementAndGet()
  }

  val droppedEvents = droppedEventsCounter.get
// Report dropped events, if any
  if (droppedEvents > 0) {
    // Don't log too frequently
    if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) {
      // There may be multiple threads trying to decrease droppedEventsCounter.
      // Use "compareAndSet" to make sure only one thread can win.
      // And if another thread is increasing droppedEventsCounter, "compareAndSet" will fail and
      // then that thread will update it.
      if (droppedEventsCounter.compareAndSet(droppedEvents, 0)) {
        val prevLastReportTimestamp = lastReportTimestamp
        lastReportTimestamp = System.currentTimeMillis()
        logWarning(s"Dropped $droppedEvents SparkListenerEvents since " +
          new java.util.Date(prevLastReportTimestamp))
      }
    }
  }
}
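The reset trick in that drop-reporting code can be isolated into a small sketch (DropCounter is an invented name): any thread may increment the counter, but compareAndSet(dropped, 0) guarantees that only one thread resets a given batch, so each batch of drops is reported exactly once.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the droppedEventsCounter reporting pattern
public class DropCounter {
    private final AtomicLong droppedEventsCounter = new AtomicLong(0);

    public void onDrop() {
        droppedEventsCounter.incrementAndGet();
    }

    /** Returns the batch size to report, or 0 if there is nothing to report
     *  or another thread won the reset race. */
    public long reportIfAny() {
        long dropped = droppedEventsCounter.get();
        // If another thread incremented the counter in between, the CAS fails
        // and that thread (or a later call) will report the larger batch
        if (dropped > 0 && droppedEventsCounter.compareAndSet(dropped, 0)) {
            return dropped;  // only the winning thread logs the warning
        }
        return 0;
    }
}
```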

3. How are listeners registered?

The listeners collection lives in the ListenerBus trait, which does provide a registration method. But that method is declared final, so client code cannot subclass the bus just to add itself:

       final def addListener(listener: L): Unit = { listeners.add(listener) }

The public entry point is found in SparkContext: we can implement SparkListenerInterface and register it via addSparkListener:

def addSparkListener(listener: SparkListenerInterface) {
  listenerBus.addListener(listener)
}

How does Spark load its own configured listeners?

In the SparkContext class:
private def setupAndStartListenerBus(): Unit = {
  try {
// Read the listener class names from the config
    val listenerClassNames: Seq[String] =
      conf.get("spark.extraListeners", "").split(',').map(_.trim).filter(_ != "")
// Iterate over the configured class names
    for (className <- listenerClassNames) {
// Collect the candidate constructors via reflection
      val constructors = {
        val listenerClass = Utils.classForName(className)
// A pattern worth remembering: gather all constructors first,
// then select one by its parameter types
        listenerClass
            .getConstructors
            .asInstanceOf[Array[Constructor[_ <: SparkListenerInterface]]]
      }
// Look for a constructor taking a single SparkConf argument
      val constructorTakingSparkConf = constructors.find { c =>
        c.getParameterTypes.sameElements(Array(classOf[SparkConf]))
      }
      lazy val zeroArgumentConstructor = constructors.find { c =>
        c.getParameterTypes.isEmpty
      }
      val listener: SparkListenerInterface = {
        if (constructorTakingSparkConf.isDefined) {
// Prefer the SparkConf constructor if it exists
          constructorTakingSparkConf.get.newInstance(conf)
        } else if (zeroArgumentConstructor.isDefined) {
          zeroArgumentConstructor.get.newInstance()
        } else {
          throw new SparkException(
            s"$className did not have a zero-argument constructor or a" +
              " single-argument constructor that accepts SparkConf. Note: if the class is" +
              " defined inside of another Scala class, then its constructors may accept an" +
              " implicit parameter that references the enclosing class; in this case, you must" +
              " define the listener as a top-level class in order to prevent this extra" +
              " parameter from breaking Spark's ability to find a valid constructor.")
        }
      }
      listenerBus.addListener(listener)
    }
  } catch {
    case e: Exception =>
      try {
        stop()
      } finally {
        throw new SparkException(s"Exception when registering SparkListener", e)
      }
  }

// Start dispatching events
  listenerBus.start()
  _listenerBusStarted = true
}
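The constructor-selection trick can be mirrored in plain Java. This is a minimal sketch under invented names (ListenerLoader and Config are not Spark APIs): load a class by name, prefer a constructor taking a single config argument, fall back to a zero-argument one, and fail loudly otherwise.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical analogue of setupAndStartListenerBus's constructor selection
public class ListenerLoader {
    public static class Config {}

    public static Object instantiate(String className, Config conf) {
        try {
            Class<?> cls = Class.forName(className);
            Constructor<?>[] constructors = cls.getConstructors();
            // First choice: a constructor whose only parameter is Config
            Optional<Constructor<?>> takingConf = Arrays.stream(constructors)
                .filter(c -> Arrays.equals(c.getParameterTypes(), new Class<?>[]{Config.class}))
                .findFirst();
            if (takingConf.isPresent()) {
                return takingConf.get().newInstance(conf);
            }
            // Fallback: a zero-argument constructor
            Optional<Constructor<?>> zeroArg = Arrays.stream(constructors)
                .filter(c -> c.getParameterCount() == 0)
                .findFirst();
            if (zeroArg.isPresent()) {
                return zeroArg.get().newInstance();
            }
            throw new IllegalArgumentException(
                className + " has no zero-argument or Config-taking constructor");
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

Gathering all constructors first and then filtering by parameter types keeps the selection logic in one place, which is exactly what makes the Spark version easy to extend with more accepted signatures.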

4. Suppose a job starts running: how does the event get sent?

Working backwards: there must be a listener on the receiving end, which leads us to JobListener, and from there naturally to its JobWaiter implementation.

Tracing further, we find DAGScheduler posting the message at the start of the job.

What is the essence of all this?

Once again: a trait plus concrete implementations.

5. How is this listener framework designed?

  In short: one begets two, two begets three, three begets all things. A small generic trait (ListenerBus) spawns a Spark-specific bus (SparkListenerBus), which spawns the concrete one (LiveListenerBus), and from there any number of listeners. If that sounds too philosophical, the down-to-earth name for it is software engineering, a discipline in its own right. More on that later.
