MemoryStore stores Blocks in memory, either as arrays of Java objects that have not been serialized or as serialized ByteBuffers.
Its main members are:
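1. entries: a Map that holds Block data, with BlockId as the key and MemoryEntry as the value. Because the LinkedHashMap is created with accessOrder = true (the third constructor argument), iteration follows access order, from least recently to most recently accessed, rather than insertion order;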
private val entries = new LinkedHashMap[BlockId, MemoryEntry](32, 0.75f, true)
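2. unrollMemoryMap: records the memory used to unroll Blocks in the current Driver or Executor. The key is the task attempt ID (taskAttemptId) and the value is the total memory, in bytes, that the task is using to unroll Blocks;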
// A mapping from taskAttemptId to amount of memory used for unrolling a block (in bytes)
// All accesses of this map are assumed to have manually synchronized on `memoryManager`
private val unrollMemoryMap = mutable.HashMap[Long, Long]()
3. pendingUnrollMemoryMap: records, per task, the memory of Blocks that have been unrolled but not yet put into the cache;
// Same as `unrollMemoryMap`, but for pending unroll memory as defined below.
// Pending unroll memory refers to the intermediate memory occupied by a task
// after the unroll but before the actual putting of the block in the cache.
// This chunk of memory is expected to be released *as soon as* we finish
// caching the corresponding block as opposed to until after the task finishes.
// This is only used if a block is successfully unrolled in its entirety in
// memory (SPARK-4777).
private val pendingUnrollMemoryMap = mutable.HashMap[Long, Long]()
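4. unrollMemoryThreshold: the initial amount of memory requested before unrolling a Block, set through the spark.storage.unrollMemoryThreshold property (default 1024 * 1024 bytes, i.e. 1 MB);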
// Initial memory to request before unrolling any block
private val unrollMemoryThreshold: Long =
  conf.getLong("spark.storage.unrollMemoryThreshold", 1024 * 1024)
5. maxMemory: the maximum memory available for storage in the current Driver or Executor;
/** Total amount of memory available for storage, in bytes. */
private def maxMemory: Long = memoryManager.maxStorageMemory
6. memoryUsed: the storage memory already in use in the current Driver or Executor, including unroll memory;
/** Total storage memory used including unroll memory, in bytes. */
private def memoryUsed: Long = memoryManager.storageMemoryUsed
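7. currentUnrollMemory: the total memory currently occupied by unrolling Blocks, i.e. the sum over all tasks in the current Driver or Executor of both unroll memory and pending unroll memory;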
/**
 * Return the amount of memory currently occupied for unrolling blocks across all tasks.
 */
def currentUnrollMemory: Long = memoryManager.synchronized {
  unrollMemoryMap.values.sum + pendingUnrollMemoryMap.values.sum
}
8. blocksMemoryUsed: the storage memory used by cached Blocks in the current Driver or Executor, excluding memory used for unrolling.
/**
 * Amount of storage memory, in bytes, used for caching blocks.
 * This does not include memory used for unrolling.
 */
private def blocksMemoryUsed: Long = memoryManager.synchronized {
  memoryUsed - currentUnrollMemory
}
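To make the relationship between these members concrete, here is a minimal toy model. It is a sketch under stated assumptions, not Spark's implementation: the class ToyMemoryStore, its method names, the string block IDs, and the storageMemoryUsed counter (a stand-in for memoryManager.storageMemoryUsed) are all hypothetical. It only illustrates two things: that a LinkedHashMap built with accessOrder = true iterates from least to most recently accessed, and that blocksMemoryUsed equals memoryUsed minus both kinds of unroll memory.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import scala.collection.mutable

// Toy model only: block IDs are plain strings and sizes are supplied by the caller.
final class ToyMemoryStore {
  // accessOrder = true (third argument): iteration runs from least to most recently accessed.
  private val entries = new JLinkedHashMap[String, java.lang.Long](32, 0.75f, true)
  // taskAttemptId -> bytes held while a block is being unrolled.
  private val unrollMemoryMap = mutable.HashMap[Long, Long]()
  // taskAttemptId -> bytes of blocks that are unrolled but not yet cached.
  private val pendingUnrollMemoryMap = mutable.HashMap[Long, Long]()
  // Hypothetical stand-in for memoryManager.storageMemoryUsed.
  private var storageMemoryUsed = 0L

  // Assumes blockId is not already present (a repeated put would be double counted).
  def putBlock(blockId: String, size: Long): Unit = synchronized {
    entries.put(blockId, java.lang.Long.valueOf(size))
    storageMemoryUsed += size
  }

  // A lookup counts as an access and moves the entry to the end of the iteration order.
  def getBlock(blockId: String): Option[Long] = synchronized {
    Option(entries.get(blockId)).map(_.longValue())
  }

  def blockOrder: List[String] = synchronized {
    val it = entries.keySet().iterator()
    val ids = mutable.ListBuffer[String]()
    while (it.hasNext) ids += it.next()
    ids.toList
  }

  def reserveUnrollMemory(taskAttemptId: Long, bytes: Long): Unit = synchronized {
    unrollMemoryMap(taskAttemptId) = unrollMemoryMap.getOrElse(taskAttemptId, 0L) + bytes
    storageMemoryUsed += bytes
  }

  // Simplification: moves all of the task's unroll memory to the pending map.
  def markUnrollPending(taskAttemptId: Long): Unit = synchronized {
    val bytes = unrollMemoryMap.remove(taskAttemptId).getOrElse(0L)
    pendingUnrollMemoryMap(taskAttemptId) =
      pendingUnrollMemoryMap.getOrElse(taskAttemptId, 0L) + bytes
  }

  def releasePendingUnrollMemory(taskAttemptId: Long): Unit = synchronized {
    storageMemoryUsed -= pendingUnrollMemoryMap.remove(taskAttemptId).getOrElse(0L)
  }

  def memoryUsed: Long = synchronized { storageMemoryUsed }

  def currentUnrollMemory: Long = synchronized {
    unrollMemoryMap.values.sum + pendingUnrollMemoryMap.values.sum
  }

  def blocksMemoryUsed: Long = synchronized { memoryUsed - currentUnrollMemory }
}
```

In this toy model, moving memory from the unroll map to the pending map leaves currentUnrollMemory unchanged, which is why blocksMemoryUsed subtracts the sum of both maps.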
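1. Data storage method: putBytes
If the storage level asks for deserialized storage (level.deserialized is true), the bytes are first deserialized and the resulting values are passed to putIterator; otherwise tryToPut is called to store the serialized bytes directly.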
override def putBytes(blockId: BlockId, _bytes: ByteBuffer, level: StorageLevel): PutResult = {
  // Work on a duplicate - since the original input might be used elsewhere.
  val bytes = _bytes.duplicate()
  bytes.rewind()
  if (level.deserialized) {
    val values = blockManager.dataDeserialize(blockId, bytes)
    putIterator(blockId, values, level, returnValues = true)
  } else {
    val droppedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
    tryToPut(blockId, bytes, bytes.limit, deserialized = false, droppedBlocks)
    PutResult(bytes.limit(), Right(bytes.duplicate()), droppedBlocks)
  }
}
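The first two statements of putBytes rely on how java.nio.ByteBuffer behaves: duplicate() returns a buffer that shares the same content but has its own position and limit, so rewinding the duplicate does not disturb the caller's buffer. The following minimal sketch shows only that behavior; DuplicateDemo and readAll are hypothetical names and are not part of Spark.

```scala
import java.nio.ByteBuffer

object DuplicateDemo {
  // Reads every byte from index 0 up to the limit without moving the caller's position.
  def readAll(_bytes: ByteBuffer): Array[Byte] = {
    // The duplicate shares content with _bytes but keeps an independent position and limit.
    val bytes = _bytes.duplicate()
    // rewind() resets only the duplicate's position to 0; the limit is left unchanged.
    bytes.rewind()
    val out = new Array[Byte](bytes.remaining())
    bytes.get(out)
    out
  }
}
```

A caller that has already consumed part of its buffer can therefore hand it to readAll and continue reading afterwards from where it left off.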