Spark Cleaner (ContextCleaner)

Overview

Spark's ContextCleaner is built around a weak-reference (WeakReference) queue and is used to clean up RDD, shuffle, and broadcast state asynchronously. When one of these objects is reclaimed by GC, its reference is enqueued on referenceQueue to await cleanup; the actual cleanup work is performed on a separate daemon thread.

Creating the Cleaner

SparkContext creates and starts a cleaner during its initialization:

_cleaner =
  if (_conf.getBoolean("spark.cleaner.referenceTracking", true)) {
    Some(new ContextCleaner(this))
  } else {
    None
  }
_cleaner.foreach(_.start())
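
As a usage note, the cleaner can be switched off entirely through configuration, in which case _cleaner above is None and no asynchronous cleanup happens. A minimal sketch (the app name and master below are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cleaner-demo") // placeholder app name
  .setMaster("local[2]")      // placeholder master
  .set("spark.cleaner.referenceTracking", "false") // no ContextCleaner is created
val sc = new SparkContext(conf)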

Internally the cleaner maintains a buffer, referenceBuffer, whose purpose is to keep each CleanupTaskWeakReference from being garbage-collected before it has been processed; it holds only the references, never the actual RDD objects. The other important player is referenceQueue: when an object such as an RDD is reclaimed by GC, this queue gets notified. So when does the JVM actually run GC? The cleaner starts a dedicated single-threaded scheduler, context-cleaner-periodic-gc, that triggers garbage collection periodically.

spark.cleaner.periodicGC.interval=30min: trigger a system GC every 30 minutes
spark.cleaner.referenceTracking.blocking=true: whether the cleaning thread waits for remote operations to complete, i.e. for the RPC reply
spark.cleaner.referenceTracking.blocking.shuffle=false: whether the shuffle-cleaning path waits for remote operations to complete, i.e. for the RPC reply
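
For illustration, here is a minimal sketch of what the context-cleaner-periodic-gc thread amounts to, assuming Scala 2.12+ for the SAM conversions. The real implementation uses Spark's ThreadUtils and reads the interval from the config above; this standalone version hard-codes 30 minutes:

import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}

// A single-threaded daemon scheduler that periodically requests a full GC,
// so that unreachable RDDs/broadcasts on a long-running driver actually get
// collected and their weak references enqueued on referenceQueue.
val factory: ThreadFactory = (r: Runnable) => {
  val t = new Thread(r, "context-cleaner-periodic-gc")
  t.setDaemon(true)
  t
}
val periodicGCService = Executors.newSingleThreadScheduledExecutor(factory)
periodicGCService.scheduleAtFixedRate(
  () => System.gc(), // only a hint to the JVM, but usually honored
  30, 30, TimeUnit.MINUTES)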

Cleanup Logic

All of the cleaner's logic lives in keepCleaning(): once an RDD is reclaimed by GC, referenceQueue receives the dead object's reference; the method keeps removing references from the queue and then performs the real cleanup via the doCleanupXXX() methods.

/** Keep cleaning RDD, shuffle, and broadcast state. */
  private def keepCleaning(): Unit = Utils.tryOrStopSparkContext(sc) {
    while (!stopped) {
      try {
        val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
          .map(_.asInstanceOf[CleanupTaskWeakReference])
        // Synchronize here to avoid being interrupted on stop()
        synchronized {
          reference.foreach { ref =>
            logDebug("Got cleaning task " + ref.task)
            referenceBuffer.remove(ref)
            ref.task match {
              case CleanRDD(rddId) =>
                doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
              case CleanShuffle(shuffleId) =>
                doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
              case CleanBroadcast(broadcastId) =>
                doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
              case CleanAccum(accId) =>
                doCleanupAccum(accId, blocking = blockOnCleanupTasks)
              case CleanCheckpoint(rddId) =>
                doCleanCheckpoint(rddId)
            }
          }
        }
      } catch {
        case ie: InterruptedException if stopped => // ignore
        case e: Exception => logError("Error in cleaning thread", e)
      }
    }
  }
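
To make the WeakReference/ReferenceQueue mechanics concrete, here is a self-contained toy version; CleanupTask and CleanupTaskWeakReference below are simplified stand-ins for the private classes in ContextCleaner:

import java.lang.ref.{ReferenceQueue, WeakReference}
import java.util.concurrent.ConcurrentLinkedQueue

sealed trait CleanupTask
case class CleanRDD(rddId: Int) extends CleanupTask

// The weak reference carries the cleanup task; when its referent is
// garbage-collected, the JVM enqueues it on the given ReferenceQueue.
class CleanupTaskWeakReference(
    val task: CleanupTask,
    referent: AnyRef,
    queue: ReferenceQueue[AnyRef])
  extends WeakReference[AnyRef](referent, queue)

object WeakRefDemo {
  def main(args: Array[String]): Unit = {
    val referenceQueue = new ReferenceQueue[AnyRef]
    val referenceBuffer = new ConcurrentLinkedQueue[CleanupTaskWeakReference]

    var payload: AnyRef = new Array[Byte](1024) // stands in for an RDD
    referenceBuffer.add(new CleanupTaskWeakReference(CleanRDD(1), payload, referenceQueue))

    payload = null // drop the last strong reference
    System.gc()    // ask for a GC so the weak reference gets enqueued

    // Poll with a timeout, the same way keepCleaning() does.
    val ref = referenceQueue.remove(1000).asInstanceOf[CleanupTaskWeakReference]
    if (ref != null) {
      referenceBuffer.remove(ref)
      println(s"Got cleaning task ${ref.task}") // Got cleaning task CleanRDD(1)
    }
  }
}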

RDD Cleanup

RDD cleanup simply calls SparkContext's unpersistRDD(rddId, blocking) method and notifies the registered listeners.

A word on the blocking parameter: it decides whether BlockManagerMaster sends its message to the driver synchronously or asynchronously. When unpersistRDD runs, SparkContext calls BlockManagerMaster's removeRdd(rddId, blocking) to delete the RDD's data (from memory or disk). If the parameter is true, the call waits for the RPC result and may therefore throw an exception; otherwise it returns without waiting.

 /** Perform RDD cleanup. */
  def doCleanupRDD(rddId: Int, blocking: Boolean): Unit = {
    try {
      logDebug("Cleaning RDD " + rddId)
      sc.unpersistRDD(rddId, blocking)
      listeners.asScala.foreach(_.rddCleaned(rddId))
      logInfo("Cleaned RDD " + rddId)
    } catch {
      case e: Exception => logError("Error cleaning RDD " + rddId, e)
    }
  }
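
The blocking semantics can be modeled with a plain Future: the removal request is always issued asynchronously, and blocking = true merely waits for the reply, which is why a timeout or a remote failure can surface as an exception. A hypothetical sketch (removeRddFromExecutors and the 120-second timeout are made-up stand-ins, not Spark's API):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BlockingModel {
  // Hypothetical stand-in for the RPC that removes the RDD's blocks
  // from memory/disk on the executors.
  def removeRddFromExecutors(rddId: Int): Boolean = true

  def removeRdd(rddId: Int, blocking: Boolean): Unit = {
    val reply: Future[Boolean] = Future { removeRddFromExecutors(rddId) }
    if (blocking) {
      // Waiting surfaces RPC failures and timeouts to the caller,
      // which is why doCleanupRDD wraps the call in try/catch.
      Await.result(reply, 120.seconds)
    }
  }
}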

Shuffle Cleanup

Shuffle cleanup calls unregisterShuffle(shuffleId) on the mapOutputTrackerMaster component (on the driver) to remove Spark's internal metadata for that shuffle, and blockManagerMaster's removeShuffle(shuffleId, blocking) to delete every block belonging to it. The latter again sends a removal message to the driver, hence the blocking parameter, and the registered listeners are notified as well.

 /** Perform shuffle cleanup. */
  def doCleanupShuffle(shuffleId: Int, blocking: Boolean): Unit = {
    try {
      logDebug("Cleaning shuffle " + shuffleId)
      mapOutputTrackerMaster.unregisterShuffle(shuffleId)
      blockManagerMaster.removeShuffle(shuffleId, blocking)
      listeners.asScala.foreach(_.shuffleCleaned(shuffleId))
      logInfo("Cleaned shuffle " + shuffleId)
    } catch {
      case e: Exception => logError("Error cleaning shuffle " + shuffleId, e)
    }
  }

Broadcast Cleanup

Broadcast cleanup calls BroadcastManager's unbroadcast(broadcastId, true, blocking) to delete all persisted state of the broadcast variable; the second argument indicates whether to remove it from the driver as well. All listeners are likewise notified.

/** Perform broadcast cleanup. */
  def doCleanupBroadcast(broadcastId: Long, blocking: Boolean): Unit = {
    try {
      logDebug(s"Cleaning broadcast $broadcastId")
      broadcastManager.unbroadcast(broadcastId, true, blocking)
      listeners.asScala.foreach(_.broadcastCleaned(broadcastId))
      logDebug(s"Cleaned broadcast $broadcastId")
    } catch {
      case e: Exception => logError("Error cleaning broadcast " + broadcastId, e)
    }
  }
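
From the user's side, the same two-level semantics are visible on the Broadcast handle itself: unpersist removes the copies cached on executors, while destroy also removes the driver-side state, mirroring the removeFromDriver flag above. A usage sketch, assuming an active SparkContext sc:

val bc = sc.broadcast(Array(1, 2, 3))
println(bc.value.sum)         // 6

bc.unpersist(blocking = true) // drop executor-side copies; bc is still usable
bc.destroy()                  // drop driver-side state too; bc is unusable afterwards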

Accumulator Cleanup

Accumulator cleanup just calls AccumulatorContext's remove(accId) method to drop the accumulator from its context, so the blocking parameter is never used here.

/** Perform accumulator cleanup. */
  def doCleanupAccum(accId: Long, blocking: Boolean): Unit = {
    try {
      logDebug("Cleaning accumulator " + accId)
      AccumulatorContext.remove(accId)
      listeners.asScala.foreach(_.accumCleaned(accId))
      logInfo("Cleaned accumulator " + accId)
    } catch {
      case e: Exception => logError("Error cleaning accumulator " + accId, e)
    }
  }
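
Conceptually, AccumulatorContext is an in-process registry from accumulator id to a weak reference, so remove(accId) is a local map operation with no RPC involved. A simplified toy model (not Spark's actual class):

import java.lang.ref.WeakReference
import java.util.concurrent.ConcurrentHashMap

// Toy stand-in for AccumulatorContext: remove(id) only drops a map entry,
// which is why doCleanupAccum never uses the blocking parameter.
object ToyAccumulatorContext {
  private val originals = new ConcurrentHashMap[Long, WeakReference[AnyRef]]
  def register(id: Long, acc: AnyRef): Unit = {
    originals.put(id, new WeakReference[AnyRef](acc))
  }
  def remove(id: Long): Unit = {
    originals.remove(id)
  }
}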

Checkpoint Cleanup

This cleans up checkpoint files written to (reliable) storage. The files are deleted from the file system directly, so the blocking parameter is not needed here either.

/**
   * Clean up checkpoint files written to a reliable storage.
   * Locally checkpointed files are cleaned up separately through RDD cleanups.
   */
  def doCleanCheckpoint(rddId: Int): Unit = {
    try {
      logDebug("Cleaning rdd checkpoint data " + rddId)
      ReliableRDDCheckpointData.cleanCheckpoint(sc, rddId)
      listeners.asScala.foreach(_.checkpointCleaned(rddId))
      logInfo("Cleaned rdd checkpoint data " + rddId)
    } catch {
      case e: Exception => logError("Error cleaning rdd checkpoint data " + rddId, e)
    }
  }
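
Since a reliable checkpoint for an RDD lives under its own subdirectory of the checkpoint dir (rdd-<id>), cleanup boils down to a recursive file-system delete. A minimal sketch of the idea, assuming that standard layout (the helper name below is ours, not Spark's; the real entry point is ReliableRDDCheckpointData.cleanCheckpoint(sc, rddId)):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Delete the checkpoint directory for one RDD. A plain file-system call:
// nothing is sent over RPC, hence no blocking parameter.
def cleanCheckpointDir(checkpointDir: String, rddId: Int, hadoopConf: Configuration): Unit = {
  val path = new Path(checkpointDir, s"rdd-$rddId")
  val fs = path.getFileSystem(hadoopConf)
  fs.delete(path, true) // recursive
}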

References

ContextCleaner source code on GitHub
