Scanning in HBase

In HBASE-5268 I proposed a "prefix delete marker" (a delete marker that would mark a set of columns identified by a column prefix as deleted).

As it turns out, I didn't quite think through how scanning/seeking in HBase works, especially when delete markers are involved. So I thought I'd write about that here.
At the end of the article I hope you will understand why column prefix delete marker that are sorted with the KeyValues that they affect cannot work in HBase.

This blog entry describes how deletion works in HBase. One interesting piece of information not mentioned there is that version and column delete markers are ordered in line together with the KeyValues that they affect and family delete markers are always sorted to the beginning of their row.

Generally each column family is represented by a Store, which manages one or more StoreFiles. Scanning is a form of merge-sort performed by a RegionScanner, which merges results of one or more StoreScanners (one per family), who in turn merge results of one or more StoreFileScanners (one for each file for this family):


RegionScanner
/ \
StoreScanner StoreScanner
/ \ / \
StoreFileScanner StoreFileScanner StoreFileScanner StoreFileScanner
| | | |
StoreFile StoreFile StoreFile StoreFile


Say you performed the following actions (T is time):
put: row1, family, col1, value1, T
delete family: row1, family, T+1
put: row1, family, col1, value2, T+2
delete columns: row1, family, col1, T+3
put: row1, family, col1, value3, T+4

What we will find in the StoreFile for "family" is this:
family-delete row1, T+1
row1,col1,value3, T+4
column-delete row1,col1, T+3
row1,col1,value2, T+2
row1,col1,value1, T

KeyValues are ordered in reverse chronological order (within their row and column). The family delete marker, however, is always first on the row. That makes sense, because family delete marker affects potentially many columns in this row, so in order to allow scanners to scan forward-only, the family delete markers need to be seen by a scanner first.

That also means that
even if we are only looking for a specific column, we always seek to the beginning of the row to check if there is a family delete with a timestamp that is greater of equal to the versions of column that we are interested in. After that the scanner seeks to the column.
And
even if we are looking for a past version of a column we have to seek to the "beginning" of the column (i.e. the potential delete marker right before it), before we can scan forward to the version we're interested in.
My initial patch for HBASE-5268 would sort the prefix delete markers just like column delete markers.

By now it should be obvious why this does not work.

The beginning of a row is a known point, so it the "beginning" of a column. The beginning of a prefix of a column is not. So to find out whether a column is marked for deletion we would have to start at the beginning of the row and then scan forward to find all prefix delete markers. That clearly is not efficient.

My 2nd attempt placed the all prefix delete markers at the beginning of the row. That technically works. But notice that a column delete marker only has to be retained by the scanner for a short period of time (until after we scanned past all versions that it affects). For prefix delete markers we'd have to keep them into memory until we scanned past all columns that start with the prefix. In addition the number of prefix delete markers for a row is not principally limited.
Family delete markers do not have this problem because (1) the number of column families is limited for other reasons and (2) store files are per family, so all we have to remember for a family in a StoreScanner is a timestamp.

来源http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值