A Rookie's Journey Through the Spark Source Code -6 Memory Management Source -part1 Feature Overview

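In A Rookie's Journey Through the Spark Source Code -5 Executor Source, we took a close look at the source of the Spark Executor. Spark is an in-memory computing framework, so task execution depends on memory management. This time, therefore, we continue by exploring Spark's memory management.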

First, let's see what the org.apache.spark.memory package contains, starting with the package's own description:
/**
 * This package implements Spark's memory management system. This system consists of two main
 * components, a JVM-wide memory manager and a per-task manager:
 *
 *  - [[org.apache.spark.memory.MemoryManager]] manages Spark's overall memory usage within a JVM.
 *    This component implements the policies for dividing the available memory across tasks and for
 *    allocating memory between storage (memory used caching and data transfer) and execution
 *    (memory used by computations, such as shuffles, joins, sorts, and aggregations).
 *  - [[org.apache.spark.memory.TaskMemoryManager]] manages the memory allocated by individual
 *    tasks. Tasks interact with TaskMemoryManager and never directly interact with the JVM-wide
 *    MemoryManager.
 *
 * Internally, each of these components have additional abstractions for memory bookkeeping:
 *
 *  - [[org.apache.spark.memory.MemoryConsumer]]s are clients of the TaskMemoryManager and
 *    correspond to individual operators and data structures within a task. The TaskMemoryManager
 *    receives memory allocation requests from MemoryConsumers and issues callbacks to consumers
 *    in order to trigger spilling when running low on memory.
 *  - [[org.apache.spark.memory.MemoryPool]]s are a bookkeeping abstraction used by the
 *    MemoryManager to track the division of memory between storage and execution.
 *
 * Diagrammatically:
 *
 * {{{
 *       +-------------+
 *       | MemConsumer |----+                                   +------------------------+
 *       +-------------+    |    +-------------------+          |     MemoryManager      |
 *                          +--->| TaskMemoryManager |----+     |                        |
 *       +-------------+    |    +-------------------+    |     |  +------------------+  |
 *       | MemConsumer |----+                             |     |  |  StorageMemPool  |  |
 *       +-------------+         +-------------------+    |     |  +------------------+  |
 *                               | TaskMemoryManager |----+     |                        |
 *                               +-------------------+    |     |  +------------------+  |
 *                                                        +---->|  |OnHeapExecMemPool |  |
 *                                        *               |     |  +------------------+  |
 *                                        *               |     |                        |
 *       +-------------+                  *               |     |  +------------------+  |
 *       | MemConsumer |----+                             |     |  |OffHeapExecMemPool|  |
 *       +-------------+    |    +-------------------+    |     |  +------------------+  |
 *                          +--->| TaskMemoryManager |----+     |                        |
 *                               +-------------------+          +------------------------+
 * }}}
 *
 *
 * There are two implementations of [[org.apache.spark.memory.MemoryManager]] which vary in how
 * they handle the sizing of their memory pools:
 *
 *  - [[org.apache.spark.memory.UnifiedMemoryManager]], the default in Spark 1.6+, enforces soft
 *    boundaries between storage and execution memory, allowing requests for memory in one region
 *    to be fulfilled by borrowing memory from the other.
 *  - [[org.apache.spark.memory.StaticMemoryManager]] enforces hard boundaries between storage
 *    and execution memory by statically partitioning Spark's memory and preventing storage and
 *    execution from borrowing memory from each other. This mode is retained only for legacy
 *    compatibility purposes.
 */

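Memory management is split into two main parts:

1. JVM-wide memory management

2. Per-task memory management

JVM-wide memory management is handled by MemoryManager. It implements the policies for dividing the available memory across tasks, and for allocating memory between storage (caching and data transfer) and execution (computations such as shuffles, joins, sorts, and aggregations).

Per-task memory management is handled by TaskMemoryManager, which manages the memory allocated to each individual task. Tasks interact with TaskMemoryManager directly and never interact with the JVM-wide MemoryManager.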

 

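Internally, each of the two components has its own memory bookkeeping abstraction:

MemoryConsumer is the client of TaskMemoryManager. Each MemoryConsumer corresponds to an individual operator or data structure within a task. TaskMemoryManager receives memory allocation requests from MemoryConsumers and issues callbacks to them to trigger spilling when memory runs low.

MemoryPool is the bookkeeping abstraction that MemoryManager uses to track how memory is divided between storage and execution.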
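
To make the request-and-callback relationship concrete, here is a minimal toy model in Java. Every name in it (ToyConsumer, SpillableBuffer, ToyTaskMemoryManager, acquire) is hypothetical and chosen for illustration; it is not Spark's actual API. It simplifies heavily: there is a single fixed capacity per task, and only other consumers are asked to spill.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not Spark's real API.
abstract class ToyConsumer {
    protected long used = 0; // bytes currently held by this consumer

    // Callback: release up to `size` bytes (for example by writing data to
    // disk) and return how many bytes were actually freed.
    abstract long spill(long size);
}

// A consumer that can spill everything it currently holds.
class SpillableBuffer extends ToyConsumer {
    @Override
    long spill(long size) {
        long freed = Math.min(size, used);
        used -= freed;
        return freed;
    }
}

class ToyTaskMemoryManager {
    private final long capacity;   // bytes available to this task (assumed fixed)
    private long allocated = 0;    // bytes handed out so far
    private final List<ToyConsumer> consumers = new ArrayList<>();

    ToyTaskMemoryManager(long capacity) {
        this.capacity = capacity;
    }

    // Try to grant `required` bytes to `requester`. When memory is short,
    // ask the other consumers to spill. Returns the bytes actually granted,
    // which may be less than requested.
    long acquire(long required, ToyConsumer requester) {
        if (!consumers.contains(requester)) {
            consumers.add(requester);
        }
        long free = capacity - allocated;
        if (free < required) {
            for (ToyConsumer c : consumers) {
                if (c == requester) continue;
                long freed = c.spill(required - free);
                allocated -= freed;
                free += freed;
                if (free >= required) break;
            }
        }
        long granted = Math.min(required, free);
        allocated += granted;
        requester.used += granted;
        return granted;
    }
}
```

In this sketch, a second consumer asking for more than the remaining capacity forces the first consumer to spill just enough to cover the shortfall. If spilling cannot free enough, the request is only partially granted.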

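There are two implementations of MemoryManager:

1. UnifiedMemoryManager: the default in Spark 1.6+. It enforces a "soft boundary" between execution and storage memory, so a request for memory in one region can be fulfilled by "borrowing" from the other. This is what provides dynamic memory management.

2. StaticMemoryManager: retained only for compatibility with older versions. It enforces a "hard boundary" by statically partitioning memory, and storage and execution cannot borrow memory from each other.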
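
The difference between the two boundary styles can be illustrated with a small toy model in Java. The ToyPools class and its methods are hypothetical and are not Spark's real MemoryManager API. The sketch only models borrowing of unused memory; it leaves out eviction of cached blocks and any other policy the real implementations apply.

```java
// Toy model only: a hypothetical class, not Spark's real MemoryManager API.
class ToyPools {
    private final long storageCapacity;
    private final long executionCapacity;
    private final boolean softBoundary;
    private long storageUsed = 0;
    private long executionUsed = 0;

    ToyPools(long storageCapacity, long executionCapacity, boolean softBoundary) {
        this.storageCapacity = storageCapacity;
        this.executionCapacity = executionCapacity;
        this.softBoundary = softBoundary;
    }

    // Free bytes visible to a region that has capacity `own` and usage `ownUsed`.
    private long freeFor(long own, long ownUsed) {
        if (softBoundary) {
            // Soft boundary: any memory not in use by either region is available.
            return (storageCapacity + executionCapacity) - (storageUsed + executionUsed);
        }
        // Hard boundary: a region can only use its own fixed share.
        return own - ownUsed;
    }

    // Grant up to `required` bytes of execution memory; returns bytes granted.
    long acquireExecution(long required) {
        long granted = Math.min(required, freeFor(executionCapacity, executionUsed));
        executionUsed += granted;
        return granted;
    }

    // Grant up to `required` bytes of storage memory; returns bytes granted.
    long acquireStorage(long required) {
        long granted = Math.min(required, freeFor(storageCapacity, storageUsed));
        storageUsed += granted;
        return granted;
    }
}
```

With two 50-byte regions, a 70-byte execution request is capped at 50 under the hard boundary. Under the soft boundary it is fully granted by borrowing 20 bytes of unused storage memory, which leaves only 30 bytes for a later storage request.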

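We now have an overall picture of Spark's memory management. Next time we will start with MemoryManager, Spark's JVM-wide memory management component, and dig into the source code that implements it.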
