Spark Source Code Reading Notes: BlockStore

Under the hood, BlockManager stores data through BlockStore. BlockStore is an abstract class with three implementations: DiskStore (disk-level persistence), MemoryStore (memory-level persistence), and TachyonStore (persistence in the Tachyon in-memory distributed file system).

The BlockStore code:


/**
 * Abstract class to store blocks.
 */
private[spark] abstract class BlockStore(val blockManager: BlockManager) extends Logging {

  def putBytes(blockId: BlockId, bytes: ByteBuffer, level: StorageLevel): PutResult

  /**
   * Put in a block and, possibly, also return its content as either bytes or another Iterator.
   * This is used to efficiently write the values to multiple locations (e.g. for replication).
   *
   * @return a PutResult that contains the size of the data, as well as the values put if
   *         returnValues is true (if not, the result's data field can be null)
   */
  def putIterator(
    blockId: BlockId,
    values: Iterator[Any],
    level: StorageLevel,
    returnValues: Boolean): PutResult

  def putArray(
    blockId: BlockId,
    values: Array[Any],
    level: StorageLevel,
    returnValues: Boolean): PutResult

  /**
   * Return the size of a block in bytes.
   */
  def getSize(blockId: BlockId): Long

  def getBytes(blockId: BlockId): Option[ByteBuffer]

  def getValues(blockId: BlockId): Option[Iterator[Any]]

  /**
   * Remove a block, if it exists.
   * @param blockId the block to remove.
   * @return True if the block was found and removed, False otherwise.
   */
  def remove(blockId: BlockId): Boolean

  def contains(blockId: BlockId): Boolean

  def clear() { }
}
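
As noted above, the concrete store used for a block is chosen by BlockManager according to the block's StorageLevel. The following is only an illustrative sketch (not the actual BlockManager code) of how such routing could look, assuming memoryStore, tachyonStore and diskStore are the three BlockStore instances held by the BlockManager:

// Illustrative sketch, not actual Spark code: route a put to one of the
// three stores based on the StorageLevel flags.
def selectStore(level: StorageLevel): BlockStore = {
  if (level.useMemory) memoryStore          // e.g. MEMORY_ONLY, MEMORY_AND_DISK
  else if (level.useOffHeap) tachyonStore   // e.g. OFF_HEAP
  else if (level.useDisk) diskStore         // e.g. DISK_ONLY
  else throw new IllegalArgumentException(s"Invalid storage level: $level")
}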

BlockStore provides three methods for storing data (a usage sketch follows the list):

  • def putBytes(blockId: BlockId, bytes: ByteBuffer, level: StorageLevel): PutResult
    Stores a byte buffer (ByteBuffer) in memory or on disk.

  • def putArray(blockId: BlockId, values: Array[Any], level: StorageLevel, returnValues: Boolean): PutResult
    Stores an array (Array[Any]) in memory or on disk.

  • def putIterator(blockId: BlockId, values: Iterator[Any], level: StorageLevel, returnValues: Boolean): PutResult
    Stores an Iterator in memory or on disk. Because the Iterator may be read from disk or some other non-memory source, the memory needed to unroll it has to be taken into account.
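
A minimal usage sketch of the three put methods, assuming store is some concrete BlockStore (e.g. a MemoryStore) and that the code lives inside the org.apache.spark.storage package, since BlockStore and PutResult are private[spark]:

import java.nio.ByteBuffer

val blockId = TestBlockId("block-0")         // any BlockId subtype works here
val level   = StorageLevel.MEMORY_ONLY

// 1) Data already serialized: hand the store a ByteBuffer directly.
val fromBytes: PutResult =
  store.putBytes(blockId, ByteBuffer.wrap(Array[Byte](1, 2, 3)), level)

// 2) Deserialized records already materialized: putArray knows the full size up front.
val fromArray: PutResult =
  store.putArray(blockId, Array[Any]("a", "b", "c"), level, returnValues = true)

// 3) Records streamed from disk or the network: putIterator lets the store
//    unroll them incrementally and keep an eye on memory usage.
val fromIterator: PutResult =
  store.putIterator(blockId, Iterator("a", "b", "c"), level, returnValues = false)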

Each of these put methods returns a PutResult:


/**
 * Result of adding a block into a BlockStore. This case class contains a few things:
 *   (1) The estimated size of the put,
 *   (2) The values put if the caller asked for them to be returned (e.g. for chaining
 *       replication), and
 *   (3) A list of blocks dropped as a result of this put. This is always empty for DiskStore.
 */
private[spark] case class PutResult(
    size: Long,
    data: Either[Iterator[_], ByteBuffer],
    droppedBlocks: Seq[(BlockId, BlockStatus)] = Seq.empty)
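
A short sketch of how a caller might inspect a PutResult (here result is assumed to come from one of the put calls above with returnValues = true):

result.data match {
  case Right(buffer) => println(s"put ${result.size} bytes in serialized form")
  case Left(iter)    => println(s"put ${result.size} bytes as deserialized values")
}
// Blocks evicted to make room for this put (only the MemoryStore ever fills this):
result.droppedBlocks.foreach { case (id, status) => println(s"dropped $id: $status") }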

BlockStore provides two methods for reading data (a conversion sketch follows the list):

  • def getBytes(blockId: BlockId): Option[ByteBuffer]
    Returns the stored data, converting it to a byte buffer if necessary (i.e. when the data is stored as an Iterator).

  • def getValues(blockId: BlockId): Option[Iterator[Any]]
    Returns the stored data, converting it to an Iterator[Any] if necessary (i.e. when the data is stored as a byte buffer).
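
The conversion mentioned above typically goes through the BlockManager's serialization helpers. The sketch below shows the idea, assuming the Spark 1.x helpers blockManager.dataSerialize and blockManager.dataDeserialize are in scope (this mirrors what DiskStore does, but is not a verbatim copy):

// bytes -> values: deserialize the stored buffer into an Iterator[Any]
def valuesViaBytes(store: BlockStore, blockId: BlockId): Option[Iterator[Any]] =
  store.getBytes(blockId).map(buffer => blockManager.dataDeserialize(blockId, buffer))

// values -> bytes: serialize the stored iterator into a ByteBuffer
def bytesViaValues(store: BlockStore, blockId: BlockId): Option[ByteBuffer] =
  store.getValues(blockId).map(values => blockManager.dataSerialize(blockId, values))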

Other BlockStore methods (a toy end-to-end sketch follows the list):

  • def getSize(blockId: BlockId): Long
    Returns the size of the block identified by the given BlockId.

  • def remove(blockId: BlockId): Boolean
    Removes the block identified by the given BlockId.

  • def contains(blockId: BlockId): Boolean
    Checks whether a block with the given BlockId exists.

  • def clear()
    Clears all stored blocks.
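
To tie the interface together, here is a toy, self-contained store that mirrors the BlockStore contract for serialized data. It deliberately avoids the Spark types (BlockId, PutResult, the BlockManager constructor argument) so it compiles on its own; the real MemoryStore additionally tracks memory usage and may evict other blocks on a put:

import java.nio.ByteBuffer
import scala.collection.concurrent.TrieMap

// Toy stand-in for a BlockStore; illustrative only, not Spark code.
class ToyStore {
  private val blocks = new TrieMap[String, ByteBuffer]()

  def putBytes(blockId: String, bytes: ByteBuffer): Long = {
    blocks.put(blockId, bytes)
    bytes.remaining().toLong                // reported "size" of the put
  }

  def getBytes(blockId: String): Option[ByteBuffer] =
    blocks.get(blockId).map(_.duplicate())  // duplicate so callers cannot disturb our position

  def getSize(blockId: String): Long =
    blocks.get(blockId).map(_.remaining().toLong).getOrElse(0L)

  def remove(blockId: String): Boolean = blocks.remove(blockId).isDefined

  def contains(blockId: String): Boolean = blocks.contains(blockId)

  def clear(): Unit = blocks.clear()
}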
