HBase I/O: HFile

Found from behind the firewall; reposted here for safekeeping.

 

Source: http://th30z.blogspot.com/2011/02/hbase-io-hfile.html

 

In the beginning HBase used the MapFile class to store data persistently on disk; then (from version 0.20) a new file format was introduced. HFile is a specific implementation of MapFile with HBase-related features.

HFile doesn't know anything about the key and value structure/type (row key, qualifier, family, timestamp, …). As in Hadoop's SequenceFile (Block-Compressed), keys and values are grouped in blocks, and blocks contain records. Each record starts with two int values holding the key length and the value length, followed by the key and value byte arrays.
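
The record layout described above can be sketched in a few lines of Python. This is only a model of the layout, not the HBase implementation: the function names are made up for illustration, and the big-endian byte order is an assumption based on how Java's DataOutput writes ints.

```python
import struct


def encode_record(key: bytes, value: bytes) -> bytes:
    """Pack one record: key length, value length, key bytes, value bytes."""
    # ">ii" = two big-endian 32-bit ints (assumed byte order, as in Java's DataOutput).
    return struct.pack(">ii", len(key), len(value)) + key + value


def decode_records(block: bytes):
    """Yield (key, value) pairs from a buffer of back-to-back records."""
    pos = 0
    while pos < len(block):
        key_len, value_len = struct.unpack_from(">ii", block, pos)
        pos += 8  # skip the two length fields
        key = block[pos:pos + key_len]
        pos += key_len
        value = block[pos:pos + value_len]
        pos += value_len
        yield key, value
```

Since the format only sees lengths and raw bytes, whatever structure the key has (row, family, qualifier, timestamp) is entirely up to the caller.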

HFile.Writer has only a couple of append overloads, one for the KeyValue class and the other for byte arrays. As with SequenceFile, each key added must be greater than the previous one. If this condition is not satisfied, an IOException is thrown.
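
A minimal sketch of that ordering rule, assuming keys are compared as raw bytes. `SortedAppender` is a hypothetical toy class, not HFile.Writer, and Python's `IOError` stands in for Java's IOException.

```python
class SortedAppender:
    """Toy writer that accepts keys only in strictly increasing order."""

    def __init__(self):
        self.records = []
        self.last_key = None

    def append(self, key: bytes, value: bytes) -> None:
        # Python compares bytes lexicographically, byte by byte (assumed comparator).
        if self.last_key is not None and key <= self.last_key:
            raise IOError(
                f"key {key!r} is not greater than previous key {self.last_key!r}"
            )
        self.records.append((key, value))
        self.last_key = key
```

Rejecting out-of-order keys at write time is what lets a reader later rely on sorted blocks without re-sorting anything.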

By default, every 64 KB of record data (key + value) is packed into a block, and the block is written to the HFile OutputStream using the configured compression, if any. The compression algorithm and the block size are both arguments of the (long) constructor.
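
The block-flushing behaviour can be modelled roughly as follows. This is a sketch under assumptions: `BlockWriter` is a hypothetical class, zlib stands in for whatever codec is configured, the size check counts the length prefixes as well, and the real writer may check the threshold at a different point.

```python
import io
import struct
import zlib

BLOCK_SIZE = 64 * 1024  # default threshold mentioned in the text


class BlockWriter:
    """Toy writer: buffers records and emits a block once the buffer is full."""

    def __init__(self, out, block_size=BLOCK_SIZE, compress=True):
        self.out = out
        self.block_size = block_size
        self.compress = compress
        self.buffer = bytearray()
        self.blocks_written = 0

    def append(self, key: bytes, value: bytes) -> None:
        self.buffer += struct.pack(">ii", len(key), len(value)) + key + value
        if len(self.buffer) >= self.block_size:
            self.flush_block()

    def flush_block(self) -> None:
        if not self.buffer:
            return  # nothing pending
        data = bytes(self.buffer)
        if self.compress:
            data = zlib.compress(data)  # stand-in for the configured codec
        self.out.write(data)
        self.blocks_written += 1
        self.buffer.clear()


# Usage: a tiny block size makes the flushing visible.
out = io.BytesIO()
writer = BlockWriter(out, block_size=32, compress=False)
for i in range(4):
    writer.append(b"key%d" % i, b"value")
writer.flush_block()  # flush whatever is left, as close() would
```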

One thing that SequenceFile is not good at is adding metadata. Metadata can be added to a SequenceFile only through the constructor, so you need to prepare all your metadata before creating the Writer.

HFile adds two "metadata" types, one called Meta-Block and the other called FileInfo. Both are kept in memory until close() is called.

Meta-Block is designed to hold large amounts of data and its key is a String, while FileInfo is a simple Map, preferred for small pieces of information, whose keys and values are both byte arrays. The region server's StoreFile uses Meta-Blocks to store a BloomFilter, and FileInfo for the max SequenceId, the major compaction key and the time range info.

On close(), the Meta-Blocks and FileInfo are written to the OutputStream. To speed up lookups, an index is written for the Data-Blocks and another for the Meta-Blocks. These indices contain n records (where n is the number of blocks) with block information (block offset, size and first key).
At the end a Fixed File Trailer is written. This block contains the offsets and counts for all the HFile indices, the HFile version, the compression codec and a few other pieces of information.
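
To see why an index of (offset, size, first key) entries speeds up lookups, here is a small sketch. `IndexEntry` and `find_block` are illustrative names, not HBase classes; the idea is just a binary search over the first keys to pick the one block that could hold a given key.

```python
import bisect
from typing import List, NamedTuple, Optional


class IndexEntry(NamedTuple):
    offset: int       # where the block starts in the file
    size: int         # block size in bytes
    first_key: bytes  # smallest key stored in the block


def find_block(index: List[IndexEntry], key: bytes) -> Optional[IndexEntry]:
    """Return the entry of the only block that could contain key, or None."""
    first_keys = [entry.first_key for entry in index]
    # Last block whose first key is <= the key we are looking for.
    i = bisect.bisect_right(first_keys, key) - 1
    return index[i] if i >= 0 else None


index = [
    IndexEntry(0, 100, b"apple"),
    IndexEntry(100, 80, b"mango"),
    IndexEntry(180, 60, b"pear"),
]
entry = find_block(index, b"banana")  # falls in the block starting at "apple"
```

With the index in memory, a lookup reads a single block from disk instead of scanning the whole file.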

Once the file is written, the next step is reading it. You have to start by loading FileInfo: the loadFileInfo() method of HFile.Reader loads the Trailer block and all the indices into memory, which makes it easy to query keys. Through the HFileScanner you can seek to a specified key and iterate from there.
The picture above describes the internal format of HFile...
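
The seek-then-iterate pattern of a scanner can be sketched like this. `SimpleScanner` is a hypothetical in-memory toy, and its seek semantics (position at the first key greater than or equal to the target) are an assumption made for illustration; the actual HFileScanner contract may differ.

```python
import bisect


class SimpleScanner:
    """Toy scanner over (key, value) pairs that are already sorted by key."""

    def __init__(self, records):
        self.records = records
        self.keys = [key for key, _ in records]
        self.pos = 0

    def seek(self, key: bytes) -> bool:
        """Move to the first record whose key is >= key; True on an exact match."""
        self.pos = bisect.bisect_left(self.keys, key)
        return self.pos < len(self.keys) and self.keys[self.pos] == key

    def __iter__(self):
        # Iterate from the current position to the end of the records.
        while self.pos < len(self.records):
            yield self.records[self.pos]
            self.pos += 1


records = [(b"a", b"1"), (b"c", b"2"), (b"e", b"3")]
scanner = SimpleScanner(records)
found = scanner.seek(b"c")
rest = list(scanner)  # everything from "c" onwards
```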