A Beginner's Path Through the Spark Source Code - 6: Memory Management Source - Part 3: MemoryPool

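The previous post covered the source of MemoryManager, which tracks the usage of execution memory and storage memory and serves as the entry point for memory management. This time we dig deeper into Spark's memory management from those two directions: execution memory and storage memory.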

1. ExecutionMemoryPool

/**
  * Implements policies and bookkeeping for sharing an adjustable-sized pool of memory between tasks.
  *
  * Tries to ensure that each task gets a reasonable share of memory, instead of some task ramping up
  * to a large amount first and then causing others to spill to disk repeatedly.
  *
  * If there are N tasks, it ensures that each task can acquire at least 1 / 2N of the memory
  * before it has to spill, and at most 1 / N. Because N varies dynamically, we keep track of the
  * set of active tasks and redo the calculations of 1 / 2N and 1 / N in waiting tasks whenever this
  * set changes. This is all done by synchronizing access to mutable state and using wait() and
  * notifyAll() to signal changes to callers. Prior to Spark 1.6, this arbitration of memory across
  * tasks was performed by the ShuffleMemoryManager.
  *
  * @param lock       a [[MemoryManager]] instance to synchronize on
  * @param memoryMode the type of memory tracked by this pool (on- or off-heap)
  */
private[memory] class ExecutionMemoryPool(
    lock: Object,
    memoryMode: MemoryMode
  ) extends MemoryPool(lock) with Logging

ExecutionMemoryPool is a subclass of MemoryPool. It manages tasks and the memory they occupy. Its key data structure is memoryForTask, a HashMap that records how much memory each task is currently using.

/**
  * Map from taskAttemptId -> memory consumption in bytes
  */
@GuardedBy("lock")
private val memoryForTask = new mutable.HashMap[Long, Long]()

Several methods here are important:

1.1 Acquiring memory

First, let's look at how the execution MemoryPool acquires memory:

/**
  * Try to acquire up to `numBytes` of memory for the given task and return the number of bytes
  * obtained, or 0 if none can be allocated.
  *
  * This call may block until there is enough free memory in some situations, to make sure each
  * task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of
  * active tasks) before it is forced to spill. This can happen if the number of tasks increase
  * but an older task had a lot of memory already.
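  * (In some situations this method blocks until enough memory is free. Every task must be
  * able to obtain at least 1 / 2N of the pool before it is forced to spill.)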
  *
  * @param numBytes           number of bytes to acquire
  * @param taskAttemptId      the task attempt acquiring memory
  * @param maybeGrowPool      a callback that potentially grows the size of this pool. It takes in
  *                           one parameter (Long) that represents the desired amount of memory by
  *                           which this pool should be expanded.
  * @param computeMaxPoolSize a callback that returns the maximum allowable size of this pool
  *                           at this given moment. This is not a field because the max pool
  *                           size is variable in certain cases. For instance, in unified
  *                           memory management, the execution pool can be expanded by evicting
  *                           cached blocks, thereby shrinking the storage pool.
  * @return the number of bytes granted to the task.
  */
private[memory] def acquireMemory(
    numBytes: Long,
    taskAttemptId: Long,
    maybeGrowPool: Long => Unit = (additionalSpaceNeeded: Long) => Unit,
    computeMaxPoolSize: () => Long = () => poolSize): Long = lock.synchronized {
  assert(numBytes > 0, s"invalid number of bytes requested: $numBytes")

  // TODO: clean up this clunky method signature

  // Add this task to the taskMemory map just so we can keep an accurate count of the number
  // of active tasks, to let other tasks ramp down their memory in calls to `acquireMemory`
  // Register the task in the task -> memory map so that usage can be tracked accurately
  if (!memoryForTask.contains(taskAttemptId)) {
    memoryForTask(taskAttemptId) = 0L
    // This will later cause waiting tasks to wake up and check numTasks again
    lock.notifyAll()
  }

  // Keep looping until we're either sure that we don't want to grant this request (because this
  // task would have more than 1 / numActiveTasks of the memory) or we have enough free
  // memory to give it (we always let each task get at least 1 / (2 * numActiveTasks)).
  // TODO: simplify this to limit each task to its own slot
  while (true) {
    val numActiveTasks = memoryForTask.keys.size
    // Memory currently held by this task
    val curMem = memoryForTask(taskAttemptId)

    // In every iteration of this loop, we should first try to reclaim any borrowed execution
    // space from storage. This is necessary because of the potential race condition where new
    // storage blocks may steal the free execution memory that this task was waiting for.
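    // maybeGrowPool is the callback passed in by the MemoryManager. Every iteration tries to
    // reclaim execution memory that storage has "borrowed", because under a race new storage
    // blocks may take the free memory this task is waiting for.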
    maybeGrowPool(numBytes - memoryFree)

    // Maximum size the pool would have after potentially growing the pool.
    // This is used to compute the upper bound of how much memory each task can occupy. This
    // must take into account potential free memory as well as the amount this pool currently
    // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
    // we did not take into account space that could have been freed by evicting cached blocks.
    val maxPoolSize = computeMaxPoolSize()
    val maxMemoryPerTask = maxPoolSize / numActiveTasks
    val minMemoryPerTask = poolSize / (2 * numActiveTasks)

    // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
    // Upper bound on what can be granted, keeping the task's share within 0 <= X <= 1 / numActiveTasks
    val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
    // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
    // Amount actually granted
    val toGrant = math.min(maxToGrant, memoryFree)

    // We want to let each task get at least 1 / (2 * numActiveTasks) before blocking;
    // if we can't give it this much now, wait for other tasks to free up memory
    // (this happens if older tasks allocated lots of memory before N grew)
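    // If the grant falls short of the request and the task would still hold less than the
    // per-task floor (minMemoryPerTask), block until other tasks free memory, then retry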
    if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
      logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName pool to be free")
      lock.wait()
    } else {
      // Update the task's memory usage and return the amount granted
      memoryForTask(taskAttemptId) += toGrant
      return toGrant
    }
  }
  0L // Never reached
}
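
The decision made in each pass of the loop above can be isolated as a pure function. The following is only a sketch of that arithmetic: `GrantSketch` and `tryGrant` are hypothetical names that do not exist in Spark, the locking and `maybeGrowPool` steps are left out, and `None` stands in for the branch where the real method calls `lock.wait()`.

```scala
object GrantSketch {
  /**
   * One pass of the acquireMemory loop as a pure function (hypothetical helper).
   * Returns Some(bytesGranted), or None where the real method would wait.
   */
  def tryGrant(
      numBytes: Long,
      curMem: Long,
      memoryFree: Long,
      poolSize: Long,
      maxPoolSize: Long,
      numActiveTasks: Int): Option[Long] = {
    val maxMemoryPerTask = maxPoolSize / numActiveTasks
    val minMemoryPerTask = poolSize / (2 * numActiveTasks)
    // Cap the task's total share at 1 / N of the (possibly grown) pool
    val maxToGrant = math.min(numBytes, math.max(0L, maxMemoryPerTask - curMem))
    // Never hand out more than is currently free
    val toGrant = math.min(maxToGrant, memoryFree)
    // Wait only if the request is not fully met AND the task is still below 1 / 2N
    if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) None
    else Some(toGrant)
  }
}
```

With a 1000-byte pool and two active tasks, the ceiling is 500 bytes and the floor is 250 bytes per task. A task holding 300 bytes that asks for 400 more gets only 200, while a task holding nothing that finds only 100 bytes free has to wait.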

1.2 Releasing memory

/**
  * Release `numBytes` of memory acquired by the given task.
  */
def releaseMemory(numBytes: Long, taskAttemptId: Long): Unit = lock.synchronized {
  val curMem = memoryForTask.getOrElse(taskAttemptId, 0L)
  var memoryToFree = if (curMem < numBytes) {
    logWarning(
      s"Internal error: release called on $numBytes bytes but task only has $curMem bytes " +
        s"of memory from the $poolName pool")
    curMem
  } else {
    numBytes
  }
  // Update the task's memory usage; once it drops to 0, remove the task's record
  if (memoryForTask.contains(taskAttemptId)) {
    memoryForTask(taskAttemptId) -= memoryToFree
    if (memoryForTask(taskAttemptId) <= 0) {
      memoryForTask.remove(taskAttemptId)
    }
  }
  // Wake up tasks waiting for memory so they can retry their requests
  lock.notifyAll() // Notify waiters in acquireMemory() that memory has been freed
}


/**
  * Release all memory for the given task and mark it as inactive (e.g. when a task ends).
  *
  * @return the number of bytes freed.
  */
def releaseAllMemoryForTask(taskAttemptId: Long): Long = lock.synchronized {
  val numBytesToFree = getMemoryUsageForTask(taskAttemptId)
  releaseMemory(numBytesToFree, taskAttemptId)
  numBytesToFree
}
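
The bookkeeping in releaseMemory can be tried out on its own. This is a minimal sketch under stated assumptions: `ReleaseSketch` and `release` are hypothetical names, the lock and `notifyAll()` are omitted, and unlike the real method (which returns Unit) the sketch returns the number of bytes actually freed so the clamping is easy to observe.

```scala
import scala.collection.mutable

object ReleaseSketch {
  // Hypothetical stand-in for memoryForTask: taskAttemptId -> bytes held
  val memoryForTask = mutable.HashMap[Long, Long]()

  /** Releases up to numBytes for the task and returns the bytes actually freed. */
  def release(numBytes: Long, taskAttemptId: Long): Long = {
    val curMem = memoryForTask.getOrElse(taskAttemptId, 0L)
    // A task can never give back more than it holds
    val memoryToFree = math.min(curMem, numBytes)
    if (memoryForTask.contains(taskAttemptId)) {
      memoryForTask(taskAttemptId) -= memoryToFree
      // Drop the record once the task holds nothing, so it no longer counts as active
      if (memoryForTask(taskAttemptId) <= 0) memoryForTask.remove(taskAttemptId)
    }
    memoryToFree
  }
}
```

Removing the record matters because the number of keys in the map is what acquireMemory uses as N when it computes the 1 / 2N and 1 / N bounds.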

That covers the main functionality of execution memory management.

2. StorageMemoryPool

Storage memory is managed in much the same way as execution memory. Its key data structure is _memoryStore:

@GuardedBy("lock")
private[this] var _memoryUsed: Long = 0L

override def memoryUsed: Long = lock.synchronized {
  _memoryUsed
}

private var _memoryStore: MemoryStore = _
def memoryStore: MemoryStore = {
  if (_memoryStore == null) {
    throw new IllegalStateException("memory store not initialized yet")
  }
  _memoryStore
}

/**
 * Set the [[MemoryStore]] used by this manager to evict cached blocks.
 * This must be set after construction due to initialization ordering constraints.
 */
final def setMemoryStore(store: MemoryStore): Unit = {
  _memoryStore = store
}

The MemoryStore class itself is declared as follows:

/**
 * Stores blocks in memory, either as Arrays of deserialized Java objects or as
 * serialized ByteBuffers.
 */
private[spark] class MemoryStore(
    conf: SparkConf,
    blockInfoManager: BlockInfoManager,
    serializerManager: SerializerManager,
    memoryManager: MemoryManager,
    blockEvictionHandler: BlockEvictionHandler)

MemoryStore holds the in-memory data of blocks.

It has several data structures:

// Note: all changes to memory allocations, notably putting blocks, evicting blocks, and
// acquiring or releasing unroll memory, must be synchronized on `memoryManager`!
// Maps each block to its in-memory entry
private val entries = new LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, true)

// A mapping from taskAttemptId to amount of memory used for unrolling a block (in bytes)
// All accesses of this map are assumed to have manually synchronized on `memoryManager`
private val onHeapUnrollMemoryMap = mutable.HashMap[Long, Long]()
// Note: off-heap unroll memory is only used in putIteratorAsBytes() because off-heap caching
// always stores serialized values.
private val offHeapUnrollMemoryMap = mutable.HashMap[Long, Long]()

// Initial memory to request before unrolling any block
private val unrollMemoryThreshold: Long =
  conf.getLong("spark.storage.unrollMemoryThreshold", 1024 * 1024)
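
The third constructor argument of `entries` (`true`) switches java.util.LinkedHashMap to access order, so iteration starts at the least recently used entry. The sketch below illustrates only that JDK behavior. `AccessOrderSketch` and its string keys and values are made up for the example and are not Spark types.

```scala
import java.util.LinkedHashMap
import scala.collection.mutable.ArrayBuffer

object AccessOrderSketch {
  /** Returns the keys in iteration order after "a" has been read once. */
  def keysAfterTouchingA(): List[String] = {
    // Same constructor arguments as MemoryStore.entries: capacity, load factor, accessOrder
    val entries = new LinkedHashMap[String, String](32, 0.75f, true)
    entries.put("a", "block-a")
    entries.put("b", "block-b")
    entries.put("c", "block-c")
    entries.get("a") // a read moves "a" to the end of the iteration order
    val keys = ArrayBuffer[String]()
    val it = entries.keySet().iterator()
    while (it.hasNext) keys += it.next()
    keys.toList
  }
}
```

Because any code that walks the map from its head meets the least recently accessed blocks first, an eviction pass over `entries` behaves like LRU.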

2.1 Acquiring memory

/**
 * Acquire N bytes of memory to cache the given block, evicting existing ones if necessary.
 *
 * @return whether all N bytes were successfully granted.
 */
def acquireMemory(blockId: BlockId, numBytes: Long): Boolean = lock.synchronized {
  val numBytesToFree = math.max(0, numBytes - memoryFree)
  acquireMemory(blockId, numBytes, numBytesToFree)
}

/**
 * Acquire N bytes of storage memory for the given block, evicting existing ones if necessary.
 *
 * @param blockId the ID of the block we are acquiring storage memory for
 * @param numBytesToAcquire the size of this block
 * @param numBytesToFree the amount of space to be freed through evicting blocks
 * @return whether all N bytes were successfully granted.
 */
def acquireMemory(
    blockId: BlockId,
    numBytesToAcquire: Long,
    numBytesToFree: Long): Boolean = lock.synchronized {
  assert(numBytesToAcquire >= 0)
  assert(numBytesToFree >= 0)
  assert(memoryUsed <= poolSize)
  if (numBytesToFree > 0) {
    // Ask the memoryStore to evict some blocks and free space
    memoryStore.evictBlocksToFreeSpace(Some(blockId), numBytesToFree, memoryMode)
  }
  // NOTE: If the memory store evicts blocks, then those evictions will synchronously call
  // back into this StorageMemoryPool in order to free memory. Therefore, these variables
  // should have been updated.
  val enoughMemory = numBytesToAcquire <= memoryFree
  if (enoughMemory) {
    _memoryUsed += numBytesToAcquire
  }
  // Return whether there was enough space
  enoughMemory
}
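
The interplay between acquireMemory and eviction can be sketched with a toy pool. Everything here is an assumption made for illustration: `StoragePoolSketch.Pool` is hypothetical, the `evict` callback stands in for `memoryStore.evictBlocksToFreeSpace`, and synchronization is left out.

```scala
object StoragePoolSketch {
  /** Hypothetical, simplified storage pool; `evict` receives the pool and the bytes to free. */
  class Pool(val poolSize: Long, evict: (Pool, Long) => Unit) {
    private var used = 0L

    def memoryUsed: Long = used
    def memoryFree: Long = poolSize - used

    // Called back by the eviction logic when a block leaves memory
    def releaseMemory(size: Long): Unit = {
      used = math.max(0L, used - size)
    }

    def acquireMemory(numBytes: Long): Boolean = {
      val numBytesToFree = math.max(0L, numBytes - memoryFree)
      if (numBytesToFree > 0) evict(this, numBytesToFree)
      // Eviction calls releaseMemory synchronously, so memoryFree is already up to date
      val enoughMemory = numBytes <= memoryFree
      if (enoughMemory) used += numBytes
      enoughMemory
    }
  }
}
```

If the callback frees what was asked for, the request succeeds. If it frees nothing, the request returns false and the pool's usage stays unchanged, which mirrors the `enoughMemory` check above.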

Now let's look at memoryStore.evictBlocksToFreeSpace:

It nests two methods: one decides whether a block can be evicted, the other drops a block.

/**
 * Try to evict blocks to free up a given amount of space to store a particular block.
 * Can fail if either the block is bigger than our memory or it would require replacing
 * another block from the same RDD (which leads to a wasteful cyclic replacement pattern for
 * RDDs that don't fit into memory that we want to avoid).
 *
 * @param blockId the ID of the block we are freeing space for, if any
 * @param space the size of this block
 * @param memoryMode the type of memory to free (on- or off-heap)
 * @return the amount of memory (in bytes) freed by eviction
 */
private[spark] def evictBlocksToFreeSpace(
    blockId: Option[BlockId],
    space: Long,
    memoryMode: MemoryMode): Long = {
  assert(space > 0)
  memoryManager.synchronized {
    var freedMemory = 0L
    val rddToAdd = blockId.flatMap(getRddId)
    val selectedBlocks = new ArrayBuffer[BlockId]
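    // Nested method: decides whether a block can be evicted
    def blockIsEvictable(blockId: BlockId, entry: MemoryEntry[_]): Boolean = {
      // Evictable when the memory mode matches and either the block being stored does not
      // belong to an RDD, or this entry belongs to a different RDD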
      entry.memoryMode == memoryMode && (rddToAdd.isEmpty || rddToAdd != getRddId(blockId))
    }
    // This is synchronized to ensure that the set of entries is not changed
    // (because of getValue or getBytes) while traversing the iterator, as that
    // can lead to exceptions.
    entries.synchronized {
      // Iterate over entries
      val iterator = entries.entrySet().iterator()
      while (freedMemory < space && iterator.hasNext) {
        val pair = iterator.next()
        val blockId = pair.getKey
        val entry = pair.getValue
        if (blockIsEvictable(blockId, entry)) {
          // We don't want to evict blocks which are currently being read, so we need to obtain
          // an exclusive write lock on blocks which are candidates for eviction. We perform a
          // non-blocking "tryLock" here in order to ignore blocks which are locked for reading:
          // Blocks that are being read are skipped; blocks that are not get a write lock
          if (blockInfoManager.lockForWriting(blockId, blocking = false).isDefined) {
            // Record the block as an eviction candidate
            selectedBlocks += blockId
            freedMemory += pair.getValue.size
          }
        }
      }
    }

    def dropBlock[T](blockId: BlockId, entry: MemoryEntry[T]): Unit = {
      val data = entry match {
        case DeserializedMemoryEntry(values, _, _) => Left(values)
        case SerializedMemoryEntry(buffer, _, _) => Right(buffer)
      }
      // Call into BlockManager to drop the block
      val newEffectiveStorageLevel =
        blockEvictionHandler.dropFromMemory(blockId, () => data)(entry.classTag)
      if (newEffectiveStorageLevel.isValid) {
        // The block is still present in at least one store, so release the lock
        // but don't delete the block info
        blockInfoManager.unlock(blockId)
      } else {
        // The block isn't present in any store, so delete the block info so that the
        // block can be stored again
        blockInfoManager.removeBlock(blockId)
      }
    }
    // The selected blocks cover at least the requested space
    if (freedMemory >= space) {
      var lastSuccessfulBlock = -1
      try {
        logInfo(s"${selectedBlocks.size} blocks selected for dropping " +
          s"(${Utils.bytesToString(freedMemory)} bytes)")
        (0 until selectedBlocks.size).foreach { idx =>
          val blockId = selectedBlocks(idx)
          val entry = entries.synchronized {
            entries.get(blockId)
          }
          // This should never be null as only one task should be dropping
          // blocks and removing entries. However the check is still here for
          // future safety.
          if (entry != null) {
            // Drop the block
            dropBlock(blockId, entry)
            afterDropAction(blockId)
          }
          lastSuccessfulBlock = idx
        }
        logInfo(s"After dropping ${selectedBlocks.size} blocks, " +
          s"free memory is ${Utils.bytesToString(maxMemory - blocksMemoryUsed)}")
        freedMemory
      } finally {
        // like BlockManager.doPut, we use a finally rather than a catch to avoid having to deal
        // with InterruptedException
        if (lastSuccessfulBlock != selectedBlocks.size - 1) {
          // the blocks we didn't process successfully are still locked, so we have to unlock them
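          // Not every selected block was dropped (for example, an exception interrupted
          // the loop), so the unprocessed blocks must have their locks released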
          (lastSuccessfulBlock + 1 until selectedBlocks.size).foreach { idx =>
            val blockId = selectedBlocks(idx)
            blockInfoManager.unlock(blockId)
          }
        }
      }
    } else {
      blockId.foreach { id =>
        logInfo(s"Will not store $id")
      }
      // Release the locks on the candidate blocks
      selectedBlocks.foreach { id =>
        blockInfoManager.unlock(id)
      }
      0L
    }
  }
}
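
The candidate-selection phase of the method above can be reduced to a small pure function. This is a sketch under simplifying assumptions: `EvictionSketch`, `Block`, and `selectBlocks` are hypothetical names, a boolean flag replaces the non-blocking `lockForWriting` attempt, the memory-mode check is dropped, and nothing is actually dropped or unlocked.

```scala
object EvictionSketch {
  // Hypothetical, simplified view of a cached block
  case class Block(id: String, rddId: Option[Int], size: Long, lockedForReading: Boolean)

  /**
   * Walks blocks in eviction order and picks candidates until `space` bytes are covered.
   * Returns the selected ids, or Nil when the candidates cannot cover `space`.
   */
  def selectBlocks(blocks: Seq[Block], rddToAdd: Option[Int], space: Long): List[String] = {
    var freed = 0L
    val selected = List.newBuilder[String]
    val it = blocks.iterator
    while (freed < space && it.hasNext) {
      val b = it.next()
      // Never evict a block of the same RDD as the block being stored
      val evictable = rddToAdd.isEmpty || rddToAdd != b.rddId
      // Blocks that are being read cannot be write-locked, so they are skipped
      if (evictable && !b.lockedForReading) {
        selected += b.id
        freed += b.size
      }
    }
    // All or nothing: if the candidates fall short, no block is evicted
    if (freed >= space) selected.result() else Nil
  }
}
```

The all-or-nothing result reflects the two branches at the end of the method: blocks are dropped only when the selected set reaches `space`, otherwise every candidate is simply unlocked.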

The dropBlock method above calls dropFromMemory, which is implemented by BlockManager:

/**
 * Drop a block from memory, possibly putting it on disk if applicable. Called when the memory
 * store reaches its limit and needs to free up space.
 * (The strategy: remove the block from memory, persisting it to disk first if allowed.)
 * If `data` is not put on disk, it won't be created.
 *
 * The caller of this method must hold a write lock on the block before calling this method.
 * This method does not release the write lock.
 *
 * @return the block's new effective StorageLevel.
 */
private[storage] override def dropFromMemory[T: ClassTag](
    blockId: BlockId,
    data: () => Either[Array[T], ChunkedByteBuffer]): StorageLevel = {
  logInfo(s"Dropping block $blockId from memory")
  val info = blockInfoManager.assertBlockIsLockedForWriting(blockId)
  var blockIsUpdated = false
  val level = info.level

  // Drop to disk, if storage level requires
  // Write to disk when the storage level allows it
  if (level.useDisk && !diskStore.contains(blockId)) {
    logInfo(s"Writing block $blockId to disk")
    data() match {
      case Left(elements) =>
        diskStore.put(blockId) { channel =>
          val out = Channels.newOutputStream(channel)
          serializerManager.dataSerializeStream(
            blockId,
            out,
            elements.toIterator)(info.classTag.asInstanceOf[ClassTag[T]])
        }
      case Right(bytes) =>
        diskStore.putBytes(blockId, bytes)
    }
    blockIsUpdated = true
  }

  // Actually drop from memory store
  // Remove from memory
  val droppedMemorySize =
    if (memoryStore.contains(blockId)) memoryStore.getSize(blockId) else 0L
  val blockIsRemoved = memoryStore.remove(blockId)
  if (blockIsRemoved) {
    blockIsUpdated = true
  } else {
    logWarning(s"Block $blockId could not be dropped from memory as it does not exist")
  }

  val status = getCurrentBlockStatus(blockId, info)
  // Report the block's status to the master
  if (info.tellMaster) {
    reportBlockStatus(blockId, status, droppedMemorySize)
  }
  if (blockIsUpdated) {
    addUpdatedBlockStatusToTaskMetrics(blockId, status)
  }
  status.storageLevel
}
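
The outcome of dropFromMemory for a single block depends on two facts: whether its storage level allows disk, and whether a disk copy already exists. The sketch below captures just that decision. `DropSketch`, `DropResult`, and `drop` are hypothetical names, and the reporting to the master and the task metrics are left out.

```scala
object DropSketch {
  /** Hypothetical summary of what happens to one block when it is dropped from memory. */
  case class DropResult(writtenToDisk: Boolean, stillStored: Boolean)

  def drop(levelUsesDisk: Boolean, alreadyOnDisk: Boolean): DropResult = {
    // Spill only when the storage level allows disk and no copy is there yet
    val writtenToDisk = levelUsesDisk && !alreadyOnDisk
    // After removal from memory, the block survives only if a disk copy exists
    val stillStored = alreadyOnDisk || writtenToDisk
    DropResult(writtenToDisk, stillStored)
  }
}
```

In this sketch `stillStored` plays the role of `newEffectiveStorageLevel.isValid` in dropBlock: when it is true the caller only unlocks the block, and when it is false the block info is removed so that the block can be stored again later.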

The removal from memory goes through memoryStore's remove method, which actually frees the underlying memory:

def remove(blockId: BlockId): Boolean = memoryManager.synchronized {
  val entry = entries.synchronized {
    entries.remove(blockId)
  }
  if (entry != null) {
    entry match {
      // Entry point for physically freeing the buffer
      case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
      case _ =>
    }
    memoryManager.releaseStorageMemory(entry.size, entry.memoryMode)
    logDebug(s"Block $blockId of size ${entry.size} dropped " +
      s"from memory (free ${maxMemory - blocksMemoryUsed})")
    true
  } else {
    false
  }
}

The remaining methods of StorageMemoryPool are simple and just update the corresponding counters:

def releaseMemory(size: Long): Unit = lock.synchronized {
  if (size > _memoryUsed) {
    logWarning(s"Attempted to release $size bytes of storage " +
      s"memory when we only have ${_memoryUsed} bytes")
    _memoryUsed = 0
  } else {
    _memoryUsed -= size
  }
}

def releaseAllMemory(): Unit = lock.synchronized {
  _memoryUsed = 0
}

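Let's briefly recap:

For execution memory: each request first tries to reclaim memory that storage has borrowed from execution. If that is still not enough and the task holds less than its 1/2N share, the call blocks until other tasks release memory.

For storage memory: it first tries to acquire memory. If there is not enough, it selects all evictable blocks that are not locked for reading and frees their space. Depending on the storage level, blocks that are allowed on disk are first persisted to disk, and then memoryStore's remove method is called to free the memory.

That is the basic flow of Spark memory management. MemoryManager is a component used by BlockManager. Next time we will study the implementation of BlockManager.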

 
