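MapOutputTrackerMaster: the driver-side tracker for map task output
MapOutputTrackerMaster extends MapOutputTracker.
MapOutputTrackerMaster is the MapOutputTracker that runs on the driver. On the driver, the tracker is responsible for maintaining and tracking the output status of every map task, so it creates one hash map for the map statuses and another for their serialized form. It also handles shuffle requests, and once a shuffle has finished it removes that shuffle's serialized statuses and related information from the cache. To improve efficiency it uses multiple threads. The executor-side MapOutputTracker, by contrast, only acts as a cache and is used to look up output statuses.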
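The two-map layout described above can be sketched as follows. This is a minimal illustration, not Spark's actual class: `SimpleDriverTracker`, `Status`, and `isCached` are hypothetical names, Java serialization stands in for whatever encoding Spark uses, and the sketch drops the serialized copy eagerly on every update instead of using epochs.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable.HashMap

// Hypothetical stand-in for a map task's output status.
final case class Status(host: String, sizeBytes: Long)

// Simplified driver-side tracker (an assumption for illustration only).
class SimpleDriverTracker {
  // shuffleId -> one status slot per map task
  private val mapStatuses = new HashMap[Int, Array[Status]]()
  // shuffleId -> serialized statuses, built lazily on first request
  private val cachedSerializedStatuses = new HashMap[Int, Array[Byte]]()

  def registerShuffle(shuffleId: Int, numMaps: Int): Unit = synchronized {
    mapStatuses(shuffleId) = new Array[Status](numMaps)
  }

  def registerMapOutput(shuffleId: Int, mapId: Int, status: Status): Unit = synchronized {
    mapStatuses(shuffleId)(mapId) = status
    // The serialized copy is now stale, so drop it.
    cachedSerializedStatuses.remove(shuffleId)
  }

  // Serve a shuffle request: reuse the cached bytes or serialize once.
  def getSerialized(shuffleId: Int): Array[Byte] = synchronized {
    cachedSerializedStatuses.getOrElseUpdate(shuffleId, serialize(mapStatuses(shuffleId)))
  }

  // Called when a shuffle is finished: forget both representations.
  def unregisterShuffle(shuffleId: Int): Unit = synchronized {
    mapStatuses.remove(shuffleId)
    cachedSerializedStatuses.remove(shuffleId)
  }

  def isCached(shuffleId: Int): Boolean = synchronized {
    cachedSerializedStatuses.contains(shuffleId)
  }

  private def serialize(statuses: Array[Status]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(statuses)
    out.close()
    bytes.toByteArray
  }
}
```

Keeping the serialized bytes next to the raw statuses means repeated requests for the same shuffle pay the serialization cost only once.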
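Source listing with my annotations
The serialized shuffle output statuses are cached and refreshed by later operations.
cacheEpoch records the epoch of the shuffle output statuses currently held in the cache, and epoch records the current epoch of the shuffle output statuses. Each call to incrementEpoch() increases epoch by one, which marks the cached statuses as stale.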
/** Cache a serialized version of the output statuses for each shuffle to send them out faster */
private var cacheEpoch = epoch
// Kept in sync with cachedSerializedStatuses explicitly
// This is required so that the Broadcast variable remains in scope until we remove
// the shuffleId explicitly or implicitly.
// Cached broadcast variables holding the serialized statuses, keyed by shuffleId
private val cachedSerializedBroadcast = new HashMap[Int, Broadcast[Array[Byte]]]()
def incrementEpoch() {
  epochLock.synchronized {
    epoch += 1
    logDebug("Increasing epoch to " + epoch)
  }
}
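The interplay between epoch and cacheEpoch can be sketched in isolation. This is a minimal sketch under stated assumptions: `EpochCache`, `put`, `get`, and `dropIfStale` are hypothetical names that do not exist in Spark, and the broadcast-variable handling is left out.

```scala
import scala.collection.mutable.HashMap

// Hypothetical, simplified epoch-guarded cache (not Spark's real class).
class EpochCache {
  private val epochLock = new AnyRef
  private var epoch: Long = 0L
  // Epoch at which the cached entries were known to be valid.
  private var cacheEpoch: Long = epoch
  private val cached = new HashMap[Int, Array[Byte]]()

  // Signals that previously cached statuses may be out of date.
  def incrementEpoch(): Unit = epochLock.synchronized { epoch += 1 }

  def put(shuffleId: Int, bytes: Array[Byte]): Unit = epochLock.synchronized {
    dropIfStale()
    cached(shuffleId) = bytes
  }

  def get(shuffleId: Int): Option[Array[Byte]] = epochLock.synchronized {
    dropIfStale()
    cached.get(shuffleId)
  }

  // Only called while holding epochLock.
  private def dropIfStale(): Unit = {
    if (epoch > cacheEpoch) {
      cached.clear()      // everything cached under the old epoch is discarded
      cacheEpoch = epoch  // the (now empty) cache is current again
    }
  }
}
```

Comparing two counters makes invalidation lazy: bumping the epoch is cheap, and the cache is cleared only when someone next reads or writes it.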
// Check to see if we have a cached version, returns true if it does
// and has side effect of setting retBytes. If not returns false
// with side effect of setting statuses
// If epoch > cacheEpoch, the cache is stale: clear the cached serialized statuses and the broadcast variables left over from earlier shuffles
def checkCachedStatuses(): Boolean = {
  epochLock.synchronized {
    if (epoch > cacheEpoch) {
      cachedSerializedStatuses.clear()
      clearCachedBroadcast()
      cacheEpoch = epoch
    }
    cachedSerializedStatuses.get(shuffleId) match {
      case Some(bytes) =>
        retBytes = bytes
        true
      case None =>
        logDebug("cached status not found for : " + shuf