Spark Source Code Reading Notes: BlockObjectWriter

In Spark's hash shuffle stage, the output of multiple map tasks can be merged into a single file to reduce the number of files produced, and this relies mainly on BlockObjectWriter. BlockObjectWriter is an interface for operating directly on the storage container backing a block (currently only disk storage is supported, i.e. data can only be appended to the block's file on disk); by appending data to the container, it effectively appends data to the corresponding block. One storage container (e.g. one file) can hold multiple blocks, which is how several blocks get merged into a single file, reducing the file count; each block then corresponds to one contiguous segment of data within that container. BlockObjectWriter supports reverting: if an error occurs while appending data, the writes can be rolled back to preserve atomicity. The interface does not support concurrent writes. BlockObjectWriter has only one implementation: DiskBlockObjectWriter. The interface's documentation reads:
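The segment bookkeeping described above can be sketched as follows. This is an illustrative Java snippet with made-up names (`SharedContainer`, `appendBlock`, `Segment`), not Spark's actual API: several blocks are appended to one shared container, and each block is remembered as a contiguous (offset, length) region of it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: multiple blocks share one container, and each
// block occupies a contiguous (offset, length) segment of it.
public class SharedContainer {
    // Minimal stand-in for a file segment: a contiguous region of the container.
    public record Segment(long offset, long length) {}

    private final ByteArrayOutputStream container = new ByteArrayOutputStream();
    private final List<Segment> segments = new ArrayList<>();

    // Append one block's bytes and record the segment it occupies.
    public Segment appendBlock(byte[] data) throws IOException {
        long start = container.size();
        container.write(data);
        Segment seg = new Segment(start, data.length);
        segments.add(seg);
        return seg;
    }

    public int containerSize() { return container.size(); }

    public static void main(String[] args) throws IOException {
        SharedContainer c = new SharedContainer();
        Segment a = c.appendBlock("block-A-data".getBytes());
        Segment b = c.appendBlock("block-B".getBytes());
        System.out.println(a + " " + b); // the two segments sit back-to-back
    }
}
```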

An interface for writing JVM objects to some underlying storage. This interface allows appending data to an existing block, and can guarantee atomicity in the case of faults as it allows the caller to revert partial writes.
This interface does not support concurrent writes. Also, once the writer has been opened, it cannot be reopened again.

Methods of BlockObjectWriter

  • open(): BlockObjectWriter
    Opens the writer's underlying output stream.

  • close()
    Closes the underlying output stream.

  • isOpen: Boolean
    Returns whether the writer is currently open.

  • commitAndClose(): Unit
    Commits the buffered content and maps the written data to the corresponding block.

    Flush the partial writes and commit them as a single atomic block.

  • revertPartialWritesAndClose()
    Reverts all write operations, restoring the file to its state before any data was written.

    Reverts writes that haven’t been flushed yet. Callers should invoke this function when there are runtime exceptions. This method will not throw, though it may be unsuccessful in truncating written data.

  • write(value: Any)
    Writes a single record.

  • fileSegment(): FileSegment
    Returns the FileSegment for the committed data; only valid after commitAndClose has been called.

    Returns the file segment of committed data that this Writer has written. This is only valid after commitAndClose() has been called.
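To make the contract above concrete, here is a hypothetical, heavily simplified Java sketch of how a DiskBlockObjectWriter-style implementation could behave. The real class is written in Scala and also manages serialization streams, buffering, and write metrics; everything below, including the name `SimpleDiskWriter`, is illustrative only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch of the writer contract: append to an existing file,
// remember where writing started, then either commit the writes as one
// segment or truncate the file back to revert them.
public class SimpleDiskWriter {
    public record FileSegment(File file, long offset, long length) {}

    private final File file;
    private long initialPosition;   // file length when the writer was opened
    private long committedLength;   // set by commitAndClose()
    private FileOutputStream out;
    private boolean open = false;

    public SimpleDiskWriter(File file) { this.file = file; }

    public SimpleDiskWriter open() throws IOException {
        initialPosition = file.length();
        out = new FileOutputStream(file, true); // append mode
        open = true;
        return this;
    }

    public boolean isOpen() { return open; }

    public void write(byte[] record) throws IOException {
        out.write(record);
    }

    // Flush the partial writes and commit them as a single atomic block.
    public void commitAndClose() throws IOException {
        out.flush();
        out.close();
        committedLength = file.length() - initialPosition;
        open = false;
    }

    // Revert uncommitted writes by truncating the file back to the
    // position recorded in open(); earlier blocks in the file are untouched.
    public void revertPartialWritesAndClose() throws IOException {
        out.close();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(initialPosition);
        }
        open = false;
    }

    // Only valid after commitAndClose() has been called.
    public FileSegment fileSegment() {
        return new FileSegment(file, initialPosition, committedLength);
    }
}
```

The key idea is that reverting only needs to truncate the file back to the offset recorded at open time, which is what makes partial writes revertible without affecting blocks written earlier into the same file.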

FileSegment represents one contiguous segment of a file:

References a particular segment of a file (potentially the entire file), based off an offset and a length.
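Given such an (offset, length) pair, retrieving one block from a shared file is just a positioned read. A small hypothetical Java sketch (`ReadSegment` is a made-up name, not part of Spark):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch: read back exactly one block's bytes from a shared
// file, using the segment's offset and length to bound the read.
public class ReadSegment {
    public static byte[] read(File file, long offset, int length) throws IOException {
        byte[] buf = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);      // jump to the segment's start
            raf.readFully(buf);    // read exactly `length` bytes
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seg", ".dat");
        java.nio.file.Files.write(f.toPath(), "aaabbbbcc".getBytes());
        System.out.println(new String(read(f, 3, 4))); // prints "bbbb"
        f.delete();
    }
}
```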

