MapOutputTrackerMaster

铁扇纶巾

Category: spark2.7.2 source code analysis

Article link: https://blog.csdn.net/qq_36558473/article/details/102859082

MapOutputTrackerMaster: the driver-side map task output tracker

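MapOutputTrackerMaster extends MapOutputTracker.
MapOutputTrackerMaster is the MapOutputTracker that runs on the driver. On the driver, the tracker maintains and tracks the output status of every map task, so it creates two hash maps: one for the map statuses and one for their serialized form. It also answers requests for shuffle output statuses, and once a shuffle is finished it removes that shuffle's serialized statuses and related information from the cache. Requests are handled by multiple threads for efficiency. On the executor side, the MapOutputTracker serves only as a cache used to look up output statuses.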
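The two-map layout described above can be sketched roughly as follows. This is a simplified illustration, not Spark's implementation: SimpleMapStatus, SimpleDriverTracker, isCached and the string-based serialize helper are hypothetical names invented here, and the sketch evicts stale entries directly instead of using the epoch mechanism covered in the source listing.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's MapStatus: where one map task wrote its output.
final case class SimpleMapStatus(host: String, sizes: Vector[Long])

// Simplified driver-side tracker (illustrative only, not Spark's actual class).
class SimpleDriverTracker {
  // shuffleId -> one slot per map task; None until that task reports its output
  private val mapStatuses =
    mutable.HashMap.empty[Int, Array[Option[SimpleMapStatus]]]
  // shuffleId -> serialized form of the statuses, built lazily on first request
  private val cachedSerializedStatuses = mutable.HashMap.empty[Int, Array[Byte]]

  def registerShuffle(shuffleId: Int, numMaps: Int): Unit = synchronized {
    require(!mapStatuses.contains(shuffleId), s"shuffle $shuffleId registered twice")
    mapStatuses(shuffleId) = Array.fill[Option[SimpleMapStatus]](numMaps)(None)
  }

  def registerMapOutput(shuffleId: Int, mapId: Int, status: SimpleMapStatus): Unit =
    synchronized {
      mapStatuses(shuffleId)(mapId) = Some(status)
      cachedSerializedStatuses.remove(shuffleId) // the cached bytes are now stale
    }

  // Answer a request for a shuffle's statuses, serializing at most once per change.
  def serializedStatuses(shuffleId: Int): Array[Byte] = synchronized {
    cachedSerializedStatuses.getOrElseUpdate(shuffleId, serialize(mapStatuses(shuffleId)))
  }

  // Once a shuffle is finished, drop both its statuses and their cached bytes.
  def unregisterShuffle(shuffleId: Int): Unit = synchronized {
    mapStatuses.remove(shuffleId)
    cachedSerializedStatuses.remove(shuffleId)
  }

  def isCached(shuffleId: Int): Boolean = synchronized {
    cachedSerializedStatuses.contains(shuffleId)
  }

  // Toy text encoding, assumed for illustration; Spark uses a real serializer.
  private def serialize(statuses: Array[Option[SimpleMapStatus]]): Array[Byte] =
    statuses
      .map(_.map(s => s"${s.host}:${s.sizes.mkString(",")}").getOrElse("-"))
      .mkString(";")
      .getBytes("UTF-8")
}
```

Keeping the serialized bytes in a second map means repeated requests for the same shuffle do not pay the serialization cost again, which is the reason the driver holds both maps.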

Source listing with my annotations

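The serialized shuffle output statuses are cached and refreshed by later operations.
cacheEpoch records the epoch of the shuffle output statuses currently held in the cache. epoch records the current epoch of the shuffle output statuses and is increased by 1 on every call to incrementEpoch(). When epoch is greater than cacheEpoch, the cached entries are stale.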
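The relationship between epoch and cacheEpoch can be illustrated with a small self-contained cache. This is a minimal sketch under assumptions: EpochCache, its compute parameter and its get method are hypothetical names, and unlike the real tracker it computes missing entries while holding the lock.

```scala
import scala.collection.mutable

// Illustrative epoch-invalidated cache; field names mirror the listing,
// but this is not Spark's code.
class EpochCache[V](compute: Int => V) {
  private val epochLock = new Object
  private var epoch = 0L      // current epoch, advanced by incrementEpoch()
  private var cacheEpoch = 0L // epoch at which the cached entries were built
  private val cached = mutable.HashMap.empty[Int, V]

  def incrementEpoch(): Unit = epochLock.synchronized {
    epoch += 1
  }

  def get(shuffleId: Int): V = epochLock.synchronized {
    if (epoch > cacheEpoch) {
      // The cache was built in an older epoch: drop everything and catch up.
      cached.clear()
      cacheEpoch = epoch
    }
    // Serve from the cache, computing and storing the entry on a miss.
    cached.getOrElseUpdate(shuffleId, compute(shuffleId))
  }
}
```

Two consecutive get calls for the same shuffleId invoke compute only once. After incrementEpoch(), the next get sees epoch > cacheEpoch, clears the cache, and recomputes the entry.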

  /** Cache a serialized version of the output statuses for each shuffle to send them out faster */
  private var cacheEpoch = epoch

  // Kept in sync with cachedSerializedStatuses explicitly
  // This is required so that the Broadcast variable remains in scope until we remove
  // the shuffleId explicitly or implicitly.
  // Broadcast variables holding the cached serialized statuses, keyed by shuffleId
  private val cachedSerializedBroadcast = new HashMap[Int, Broadcast[Array[Byte]]]()

  def incrementEpoch() {
    epochLock.synchronized {
      epoch += 1
      logDebug("Increasing epoch to " + epoch)
    }
  }

    // Check to see if we have a cached version, returns true if it does
    // and has side effect of setting retBytes.  If not returns false
    // with side effect of setting statuses
    // If epoch > cacheEpoch the cache is stale: clear the cached serialized statuses
    // and the cached broadcast variables, then bring cacheEpoch up to date
    def checkCachedStatuses(): Boolean = {
      epochLock.synchronized {
        if (epoch > cacheEpoch) {
          cachedSerializedStatuses.clear()
          clearCachedBroadcast()
          cacheEpoch = epoch
        }
        cachedSerializedStatuses.get(shuffleId) match {
          case Some(bytes) =>
            retBytes = bytes
            true
          case None =>
            logDebug("cached status not found for : " + shuf
