A bug in HBase 0.94's flush operation


Yang Butao's blog (杨步涛的博客)


Article link: https://blog.csdn.net/yangbutao/article/details/17360759



While running HBase 0.94 under highly concurrent writes, we occasionally lost data. The HBase logs contained the following message:

WARN org.apache.hadoop.hbase.regionserver.MemStore: Snapshot called again without clearing previous. Doing nothing. Another ongoing flush or did we fail last attempt?

This is a bug in HBase's flush operation. A patch is available in the HBase JIRA: HBASE-7671 (Flushing memstore again after last failure could cause data loss).

The cause is analyzed below.

HBase flushes a memstore once it reaches a certain size threshold. A flush consists of three main phases: snapshot, flush cache, and commit.

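The snapshot phase takes a snapshot of the in-memory objects in the memstore. Write operations must be blocked while the snapshot is taken; once it completes, the in-memory objects are reset.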
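The snapshot phase can be sketched with a toy model. This is not the HBase source: the class `ToyMemStore` and all of its members are hypothetical names, and the read/write lock is an assumption about how "writes are blocked during the snapshot" might be arranged. The guard in `snapshot()` mirrors the behavior described by the warning in the log above: if an earlier snapshot was never cleared, the call does nothing.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of a memstore with an active map and a snapshot map (not HBase code). */
class ToyMemStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    private volatile ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();

    /** Writers share the read lock, so puts do not block each other. */
    void put(String key, String value) {
        lock.readLock().lock();
        try {
            active.put(key, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Moves the active map into the snapshot and resets the active map.
     * Returns false and changes nothing if an earlier snapshot was never cleared.
     */
    boolean snapshot() {
        lock.writeLock().lock(); // blocks writers while the maps are swapped
        try {
            if (!snapshot.isEmpty()) {
                System.err.println("Snapshot called again without clearing previous. Doing nothing.");
                return false;
            }
            snapshot = active;
            active = new ConcurrentSkipListMap<>();
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Meant to be called once the snapshot has been persisted. */
    void clearSnapshot() {
        lock.writeLock().lock();
        try {
            snapshot = new ConcurrentSkipListMap<>();
        } finally {
            lock.writeLock().unlock();
        }
    }

    int activeSize() { return active.size(); }

    int snapshotSize() { return snapshot.size(); }
}
```

In this model, if a flush fails after `snapshot()` and `clearSnapshot()` is never reached, the next `snapshot()` call returns false: the old snapshot stays in place and any newer edits remain in the active map.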

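The flush cache phase generates a tmp HFile from the snapshot and stores a seqNum in the HFile's metadata. This seqNum is the last completed seqNum of the current region in the HLog (each key-value pair written to the HLog carries a seqNum).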
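As a rough illustration of this phase, the sketch below writes a sorted snapshot to a temporary file and records the sequence number alongside the data. It does not use the HFile format: the tab-separated text layout, the `#maxSeqId=` trailer, and the names `ToyFlusher`, `flushToTmp` and `readMaxSeqId` are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Toy flush-cache step: snapshot entries plus a sequence-number trailer (not the HFile format). */
class ToyFlusher {
    // Hypothetical trailer marker used only by this sketch.
    static final String SEQ_PREFIX = "#maxSeqId=";

    /** Writes the snapshot in key order to a temp file and appends the sequence number. */
    static Path flushToTmp(SortedMap<String, String> snapshot, long lastSeqNum) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        lines.add(SEQ_PREFIX + lastSeqNum);
        Path tmp = Files.createTempFile("toy-flush-", ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        return tmp;
    }

    /** Reads the sequence number back from the trailer line. */
    static long readMaxSeqId(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            throw new IOException("empty file");
        }
        String last = lines.get(lines.size() - 1);
        if (!last.startsWith(SEQ_PREFIX)) {
            throw new IOException("missing sequence-number trailer");
        }
        return Long.parseLong(last.substring(SEQ_PREFIX.length()));
    }
}
```

The stored number matters because it marks how far the flushed file covers the log: edits with a sequence number at or below it can be treated as already persisted. If the recorded number is newer than the data actually written, edits in between could be skipped on recovery.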

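The commit phase commits the flush: it adds the storeFile to the Store and clears the memstore snapshot, and at this point checks whether a compact operation is needed. This is done by calling Store.updateStorefiles, which takes the write lock on the store's lock, so other operations that write data, which hold the read lock, are blocked.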
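The locking in the commit phase can be sketched as follows. This is a simplified model and not the actual Store.updateStorefiles code: `ToyStore`, `commitFlush`, `tryWrite` and the `compactionThreshold` parameter are hypothetical. The write path here uses `tryLock` only so that contention is observable without hanging; a real writer would presumably wait for the lock instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy store showing the commit step under a write lock (not HBase code). */
class ToyStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile List<String> storeFiles = Collections.emptyList();
    private final int compactionThreshold; // hypothetical: file count that triggers compaction

    ToyStore(int compactionThreshold) {
        this.compactionThreshold = compactionThreshold;
    }

    /** Write path: needs the shared read lock; returns false if it is unavailable. */
    boolean tryWrite(Runnable edit) {
        if (!lock.readLock().tryLock()) {
            return false; // another thread holds the write lock (a commit is in progress)
        }
        try {
            edit.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Commit: under the write lock, publish the new file and clear the snapshot.
     * Returns true if the store now has enough files to need a compaction.
     */
    boolean commitFlush(String flushedFile, Runnable clearSnapshot) {
        lock.writeLock().lock();
        try {
            List<String> newList = new ArrayList<>(storeFiles);
            newList.add(flushedFile);
            storeFiles = Collections.unmodifiableList(newList);
            clearSnapshot.run();
        } finally {
            lock.writeLock().unlock();
        }
        return storeFiles.size() >= compactionThreshold;
    }

    List<String> storeFiles() {
        return storeFiles;
    }
}
```

Adding the file and clearing the snapshot happen under the same write lock, so no writer can observe a state in which the file is already published while the snapshot is still present.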