Spark 2.2.0 Source Code Walkthrough (7): Spark Memory Allocation

Latest recommended article published on 2022-06-16 12:02:16
怎么全部重名了
Original link: https://blog.csdn.net/qq_21383435/article/details/78641598
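Spark's memory allocation model is shown in the figure below:
[Figure: Spark memory allocation model]
The "other" region takes 40% by default and holds user-defined data structures and Spark metadata. This share is not fixed: it is whatever `spark.memory.fraction` (default 0.6) leaves over. Unified memory takes the remaining 60%, split evenly between storage and execution by default (`spark.memory.storageFraction`, default 0.5), so each starts at 30%. These percentages apply to the heap after a 300MB reservation has been subtracted, not to the whole heap.
Execution memory holds the intermediate data produced while tasks run, for example during a shuffle; storage memory holds cached data. In older versions the two could not borrow from each other. In newer versions, when either one runs short it can use the other's free memory. The borrowing is asymmetric: memory that storage has borrowed from execution can be forcibly reclaimed by evicting cached blocks, whereas memory that execution has borrowed from storage cannot be reclaimed by force, so storage has to wait until execution releases it.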
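To make the default proportions concrete, here is a minimal sketch of the sizing arithmetic. `MemoryRegions`, `Regions` and `compute` are hypothetical names used only for illustration and are not part of Spark; the constants (300MB reserved, 0.6, 0.5) are the defaults quoted in the source file discussed in this article.

```scala
// Hypothetical helper, not part of Spark: reproduces the default sizing arithmetic.
object MemoryRegions {
  // 300MB held back for the system before any fraction is applied.
  val ReservedBytes: Long = 300L * 1024 * 1024

  final case class Regions(
      reserved: Long,   // fixed reservation
      unified: Long,    // shared by execution and storage
      storage: Long,    // initial storage region inside the unified region
      execution: Long,  // initial execution region inside the unified region
      other: Long)      // user data structures and Spark metadata

  def compute(
      heapBytes: Long,
      memoryFraction: Double = 0.6,    // spark.memory.fraction
      storageFraction: Double = 0.5    // spark.memory.storageFraction
  ): Regions = {
    val usable = heapBytes - ReservedBytes
    val unified = (usable * memoryFraction).toLong
    val storage = (unified * storageFraction).toLong
    Regions(ReservedBytes, unified, storage, unified - storage, usable - unified)
  }
}
```

For a 1GB heap, `MemoryRegions.compute(1024L * 1024 * 1024)` gives a unified region of about 434MB, which agrees with the `(1024 - 300) * 0.6 = 434MB` example in the Spark source comments.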
Now let's look at how memory allocation is designed at the source level.
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.memory

import org.apache.spark.SparkConf
import org.apache.spark.storage.BlockId

/**
 * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
 * either side can borrow memory from the other.
 *
 * The region shared between execution and storage is a fraction of (the total heap space - 300MB)
 * configurable through `spark.memory.fraction` (default 0.6). The position of the boundary
 * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
 * This means the size of the storage region is 0.6 * 0.5 = 0.3 of the heap space by default.
 *
 * Storage can borrow as much execution memory as is free until execution reclaims its space.
 * When this happens, cached blocks will be evicted from memory until sufficient borrowed
 * memory is released to satisfy the execution memory request.
 *
 * Similarly, execution can borrow as much storage memory as is free. However, execution
 * memory is *never* evicted by storage due to the complexities involved in implementing this.
 * The implication is that attempts to cache blocks may fail if execution has already eaten
 * up most of the storage space, in which case the new blocks will be evicted immediately
 * according to their respective storage levels.
 *
 * @param onHeapStorageRegionSize Size of the storage region, in bytes.
 *                          This region is not statically reserved; execution can borrow from
 *                          it if necessary. Cached blocks can be evicted only if actual
 *                          storage memory usage exceeds this region.
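 *
 * Notes: unlike the earlier design, where fixed fractions separated execution and storage,
 * this MemoryManager lets the two regions borrow memory from each other. Their combined
 * upper bound is set by `spark.memory.fraction` (default 0.6 in this version; it was 0.75
 * in Spark 1.6) and applies to (JVM heap - 300MB reserved), not to the whole heap.
 * Storage may take all of execution's free memory. When execution later needs that memory,
 * cached blocks are evicted according to their storage levels until the borrowed memory
 * has been returned. Execution can likewise borrow from storage, but when storage needs
 * memory, execution data is not spilled right away: it is in use by running computations,
 * and losing it would make the task fail. Storage can only wait for execution to release
 * the memory on its own.
 *
 * "Unified" in UnifiedMemoryManager means that execution and storage share one region
 * whose internal boundary can move.
 *
 * Reference blog: https://blog.csdn.net/qq_21383435/article/details/79108106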
 */
private[spark] class UnifiedMemoryManager private[memory] (
    conf: SparkConf,
    val maxHeapMemory: Long,
    onHeapStorageRegionSize: Long,
    numCores: Int)
  extends MemoryManager(
    conf,
    numCores,
    onHeapStorageRegionSize,
    maxHeapMemory - onHeapStorageRegionSize) {

  /**
   * Checks that, for both on-heap and off-heap memory, the execution pool size plus the
   * storage pool size equals the size of the region the two pools share.
   */
  private def assertInvariants(): Unit = {
    assert(onHeapExecutionMemoryPool.poolSize + onHeapStorageMemoryPool.poolSize == maxHeapMemory)
    assert(
      offHeapExecutionMemoryPool.poolSize + offHeapStorageMemoryPool.poolSize == maxOffHeapMemory)
  }

  assertInvariants()

  /**
   * Named `maxStorageMemory` in earlier versions.
   * The maximum on-heap storage memory is the memory shared by execution and storage
   * minus the memory execution is currently using.
   */
  override def maxOnHeapStorageMemory: Long = synchronized {
    maxHeapMemory - onHeapExecutionMemoryPool.memoryUsed
  }

  override def maxOffHeapStorageMemory: Long = synchronized {
    maxOffHeapMemory - offHeapExecutionMemoryPool.memoryUsed
  }

  /**
   * Try to acquire up to `numBytes` of execution memory for the current task and return the
   * number of bytes obtained, or 0 if none can be allocated.
   *
   * This call may block until there is enough free memory in some situations, to make sure each
   * task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of
   * active tasks) before it is forced to spill. This can happen if the number of tasks increase
   * but an older task had a lot of memory already.
   *
   * Notes: requests up to `numBytes` of memory for `taskAttemptId` and returns 0 if none
   * can be allocated. `memoryMode` selects which pools are used: ON_HEAP uses the on-heap
   * execution and storage pools, OFF_HEAP uses the off-heap ones. The logic that follows
   * is the same for both modes.
   */
  override private[memory] def acquireExecutionMemory(
      numBytes: Long,
      taskAttemptId: Long,
      memoryMode: MemoryMode): Long = synchronized {
    // `memoryMode` selects between on-heap memory and off-heap (native) memory; on-heap is
    // the default. First check that the execution and storage pool sizes still add up to
    // the size of the shared region.
    assertInvariants()
    assert(numBytes >= 0)
    val (executionPool, storagePool, storageRegionSize, maxMemory) = memoryMode match {
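      // Pick the pools for the requested mode. `maybeGrowExecutionPool`, defined below,
      // receives numBytes minus the execution pool's free memory; that value is positive
      // when the pool cannot satisfy the request on its own.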
      case MemoryMode.ON_HEAP => (
        onHeapExecutionMemoryPool,
        onHeapStorageMemoryPool,
        onHeapStorageRegionSize,
        maxHeapMemory)
      case MemoryMode.OFF_HEAP => (
        offHeapExecutionMemoryPool,
        offHeapStorageMemoryPool,
        offHeapStorageMemory,
        maxOffHeapMemory)
    }

    /**
     * Grow the execution pool by evicting cached blocks, thereby shrinking the storage pool.
     *
     * When acquiring memory for a task, the execution pool may need to make multiple
     * attempts. Each attempt must be able to evict storage in case another task jumps in
     * and caches a large block between the attempts. This is called once per attempt.
     *
     * Notes: this releases data held by storage, shrinking the storage pool so that the
     * execution pool can grow.
     *
     * When is it called? Whenever the execution pool does not have enough free memory for
     * the request. As the name says, it tries to grow the execution pool.
     *
     * How does the execution pool grow? According to the comments in the method body, in
     * two ways:
     *   1. If the storage pool has free memory, that free memory is taken.
     *   2. If the storage pool has grown beyond `storageRegionSize`, cached blocks are
     *      evicted to reclaim the memory that storage borrowed from execution.
     */
    def maybeGrowExecutionPool(extraMemoryNeeded: Long): Unit = {
      if (extraMemoryNeeded > 0) {
        // There is not enough free memory in the execution pool, so try to reclaim memory from
        // storage. We can reclaim any free memory from the storage pool. If the storage pool
        // has grown to become larger than `storageRegionSize`, we can evict blocks and reclaim
        // the memory that storage has borrowed from execution.
        val memoryReclaimableFromStorage = math.max(
          storagePool.memoryFree,
          storagePool.poolSize - storageRegionSize)
        if (memoryReclaimableFromStorage > 0) {
          // Only reclaim as much space as is necessary and available:
          val spaceToReclaim = storagePool.freeSpaceToShrinkPool(
            math.min(extraMemoryNeeded, memoryReclaimableFromStorage))
          storagePool.decrementPoolSize(spaceToReclaim)
          executionPool.incrementPoolSize(spaceToReclaim)
        }
      }
    }

    /**
     * The size the execution pool would have after evicting storage memory.
     *
     * The execution memory pool divides this quantity among the active tasks evenly to cap
     * the execution memory allocation for each task. It is important to keep this greater
     * than the execution pool size, which doesn't take into account potential memory that
     * could be freed by evicting storage. Otherwise we may hit SPARK-12155.
     *
     * Additionally, this quantity should be kept below `maxMemory` to arbitrate fairness
     * in execution memory allocation across tasks, Otherwise, a task may occupy more than
     * its fair share of execution memory, mistakenly thinking that other tasks can acquire
     * the portion of storage memory that cannot be evicted.
     *
     * In short: the size the execution pool would reach once storage has given back all
     * the memory that can be reclaimed from it.
     */
    def computeMaxExecutionPoolSize(): Long = {
      maxMemory - math.min(storagePool.memoryUsed, storageRegionSize)
    }

    executionPool.acquireMemory(
      numBytes, taskAttemptId, maybeGrowExecutionPool, computeMaxExecutionPoolSize)
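  }

  /**
   * Acquires `numBytes` of storage memory for `blockId`.
   *
   * 1. If `numBytes` exceeds the maximum storage memory (the shared region minus the memory
   *    execution is currently using), the request fails immediately and false is returned.
   * 2. If `numBytes` exceeds the storage pool's free memory, storage borrows from the
   *    execution pool. The 2.2.0 code below borrows
   *    min(execution free memory, numBytes - storage free memory), not
   *    min(execution free memory, numBytes). The execution pool size is decreased and the
   *    storage pool size is increased by that amount.
   * 3. The storage pool then performs the actual allocation.
   */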
  override def acquireStorageMemory(
      blockId: BlockId,
      numBytes: Long,
      memoryMode: MemoryMode): Boolean = synchronized {
    assertInvariants()
    assert(numBytes >= 0)
    val (executionPool, storagePool, maxMemory) = memoryMode match {
      case MemoryMode.ON_HEAP => (
        onHeapExecutionMemoryPool,
        onHeapStorageMemoryPool,
        maxOnHeapStorageMemory)
      case MemoryMode.OFF_HEAP => (
        offHeapExecutionMemoryPool,
        offHeapStorageMemoryPool,
        maxOffHeapStorageMemory)
    }
    // If the request is larger than the maximum storage memory, i.e. the region shared by
    // execution and storage minus the memory execution is using, return right away.
    // Execution and storage are considered together here.
    if (numBytes > maxMemory) {
      // Fail fast if the block simply won't fit
      logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +
        s"memory limit ($maxMemory bytes)")
      return false
    }
    // The request is larger than the free memory (memoryFree) in the storage pool.
    if (numBytes > storagePool.memoryFree) {
      // There is not enough free memory in the storage pool, so try to borrow free memory from
      // the execution pool.
      // Borrow the smaller of execution's free memory and the shortfall, i.e. numBytes
      // minus the storage pool's free memory.
      val memoryBorrowedFromExecution = Math.min(executionPool.memoryFree,
        numBytes - storagePool.memoryFree)

      // Shrink the execution pool by the borrowed amount.
      executionPool.decrementPoolSize(memoryBorrowedFromExecution)

      // Grow the storage pool by the same amount.
      storagePool.incrementPoolSize(memoryBorrowedFromExecution)
    }
    // Let the storage pool perform the actual allocation.
    storagePool.acquireMemory(blockId, numBytes)
  }

  override def acquireUnrollMemory(
      blockId: BlockId,
      numBytes: Long,
      memoryMode: MemoryMode): Boolean = synchronized {
    acquireStorageMemory(blockId, numBytes, memoryMode)
  }
}

object UnifiedMemoryManager {

  // Set aside a fixed amount of memory for non-storage, non-execution purposes.
  // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve
  // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then
  // the memory used for execution and storage will be (1024 - 300) * 0.6 = 434MB by default.
  /**
   * A field of the companion object: 300MB, reserved for the system and counted in neither
   * execution nor storage memory.
   */
  private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024

  /**
   * Instances are created through this `apply` method.
   */
  def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
    // Get the maximum memory shared by execution and storage.
    val maxMemory = getMaxMemory(conf)
    // Construct the UnifiedMemoryManager.
    new UnifiedMemoryManager(
      conf,
      maxHeapMemory = maxMemory,
      // The storage region initially gets spark.memory.storageFraction of the shared
      // memory; the default is 0.5, i.e. half.
      onHeapStorageRegionSize =
        (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
      numCores = numCores)
  }

  /**
   * Return the total amount of memory shared between execution and storage, in bytes.
   *
   * Notes: a method of the companion object that returns the total memory available to
   * execution and storage together.
   *
   * `systemMemory` is the executor's memory. It must be at least 1.5 times
   * `reservedMemory`, otherwise an exception is thrown. `reservedMemory` is the memory held
   * back for the system; it can be set through `spark.testing.reservedMemory` and defaults
   * to the 300MB above, so by default `systemMemory` must be at least 450MB.
   * `memoryFraction` is the share of the remaining memory given to execution and storage
   * together, set by `spark.memory.fraction` (default 0.6 in this version; 0.75 in
   * Spark 1.6). The remaining 0.4 is left as user memory. For an executor with 1GB of
   * memory, the default is therefore (1024 - 300) * 0.6 = 434MB.
   *
   * The steps are:
   *   1. Read `systemMemory` from `spark.testing.memory`; if unset, use the maximum memory
   *      reported by the JVM runtime.
   *   2. Read `reservedMemory` from `spark.testing.reservedMemory`; if unset, the default
   *      is 0 when `spark.testing` is set and 300MB otherwise.
   *   3. Compute `minSystemMemory` as 1.5 times `reservedMemory`.
   *   4. If `systemMemory` is below `minSystemMemory`, throw an exception asking the user
   *      to increase the JVM heap size. The same check is applied to
   *      `spark.executor.memory` when it is set.
   *   5. Compute `usableMemory` as `systemMemory` minus `reservedMemory`.
   *   6. Read the fraction from `spark.memory.fraction`, default 0.6.
   *   7. Return `usableMemory * memoryFraction`.
   *
   * In other words, under UnifiedMemoryManager the storage and execution regions each
   * start with half of the shared region, and the shared region is 60% of
   * (systemMemory - reservedMemory). The dynamic adjustment only shows up when memory is
   * actually requested.
   */
  private def getMaxMemory(conf: SparkConf): Long = {
    // Read systemMemory from spark.testing.memory; if unset, use the runtime's maximum
    // memory. spark.testing.memory is normally not set in production, so systemMemory is
    // the maximum memory available to the JVM.
    val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)

    // Read reservedMemory from spark.testing.reservedMemory. If unset, the default is 0
    // when spark.testing is set and 300MB otherwise.
    val reservedMemory = conf.getLong("spark.testing.reservedMemory",
      if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)

    // minSystemMemory is 1.5 times reservedMemory.
    val minSystemMemory = (reservedMemory * 1.5).ceil.toLong

    // If systemMemory is below minSystemMemory (450MB with the default reservation),
    // throw an exception asking the user to increase the JVM heap size.
    if (systemMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"System memory $systemMemory must " +
        s"be at least $minSystemMemory. Please increase heap size using the --driver-memory " +
        s"option or spark.driver.memory in Spark configuration.")
    }

    // SPARK-12759 Check executor memory to fail fast if memory is insufficient
    if (conf.contains("spark.executor.memory")) {
      val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
      if (executorMemory < minSystemMemory) {
        throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
          s"$minSystemMemory. Please increase executor memory using the " +
          s"--executor-memory option or spark.executor.memory in Spark configuration.")
      }
    }

    // usableMemory is systemMemory minus reservedMemory.
    val usableMemory = systemMemory - reservedMemory

    // The share of usable memory, from spark.memory.fraction (default 0.6).
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)

    // The memory shared by execution and storage is usableMemory * memoryFraction, i.e.
    // (JVM maximum memory - reserved memory) * spark.memory.fraction.
    (usableMemory * memoryFraction).toLong
  }
}
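The asymmetric borrowing between the two pools can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: `ToyUnifiedMemory` and its members are hypothetical names, it ignores tasks, blocking and the 1 / 2N fairness rule, it "evicts" by simply decreasing a counter, and unlike the real storage pool it never evicts existing cached blocks to make room for a new block. It mirrors only the pool-size arithmetic of `acquireStorageMemory` and `maybeGrowExecutionPool` shown above.

```scala
// Toy model with hypothetical names; it is not Spark's implementation.
final class ToyUnifiedMemory(val maxMemory: Long, val storageRegionSize: Long) {
  private var storagePoolSize = storageRegionSize
  private var executionPoolSize = maxMemory - storageRegionSize
  private var storageUsed = 0L
  private var executionUsed = 0L

  def storageMemoryUsed: Long = storageUsed
  def executionMemoryUsed: Long = executionUsed

  /** Returns true if the block fits. Storage may only borrow free execution memory. */
  def acquireStorage(numBytes: Long): Boolean = {
    if (numBytes > maxMemory - executionUsed) {
      false // fail fast: larger than anything storage could ever get right now
    } else {
      val storageFree = storagePoolSize - storageUsed
      if (numBytes > storageFree) {
        // Borrow the smaller of execution's free memory and the shortfall.
        val borrowed = math.min(executionPoolSize - executionUsed, numBytes - storageFree)
        executionPoolSize -= borrowed
        storagePoolSize += borrowed
      }
      if (numBytes <= storagePoolSize - storageUsed) {
        storageUsed += numBytes
        true
      } else {
        false
      }
    }
  }

  /** Returns the bytes granted. Execution may shrink storage down to storageRegionSize. */
  def acquireExecution(numBytes: Long): Long = {
    val extraNeeded = numBytes - (executionPoolSize - executionUsed)
    if (extraNeeded > 0) {
      val storageFree = storagePoolSize - storageUsed
      val reclaimable = math.max(storageFree, storagePoolSize - storageRegionSize)
      val toReclaim = math.min(extraNeeded, reclaimable)
      if (toReclaim > 0) {
        // "Evict" cached data only for the part not covered by free storage memory.
        storageUsed -= math.max(0L, toReclaim - storageFree)
        storagePoolSize -= toReclaim
        executionPoolSize += toReclaim
      }
    }
    val granted = math.min(numBytes, executionPoolSize - executionUsed)
    executionUsed += granted
    granted
  }
}
```

With `new ToyUnifiedMemory(1000L, 500L)`, caching 800 bytes succeeds by borrowing 300 from execution. A later execution request for 400 bytes is granted in full and pushes storage usage down to 600, and a further request for 300 is granted only 100, because storage cannot be pushed below its 500-byte region. A storage request made after that fails, since execution memory is never evicted.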