Analysis: why the number of MOB files increased instead of decreasing after running mob compact in HBase

The HBase table static_file stores image files and uses the MOB feature. There were 8000+ MOB files before the mob compact ran; after it finished, the count had grown to 16000+, which is strange.
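For context, a minimal sketch of how a MOB-enabled table like this might be declared through the HBase 2.x Java API (the table and family names follow the log below; the 100 KB threshold is an assumption):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateMobTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Cell values larger than the MOB threshold are written to separate MOB
      // files under /hbase/mobdir instead of the regular store files.
      admin.createTable(TableDescriptorBuilder.newBuilder(TableName.valueOf("static_file"))
          .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("data"))
              .setMobEnabled(true)
              .setMobThreshold(102400L) // assumed threshold: 100 KB
              .build())
          .build());
    }
  }
}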

1 First, look at the regionserver log:

2023-03-20 21:21:16,879 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] regionserver.HStore: Validating hfile at hdfs://mycluster/hbase/mobdir/.tmp/.bulkload/default/static_file/f033ab37c30201f73f142449d037028d20211108/data/2154061093b7460a964b2b8de3042847 for inclusion in store data region static_file,80,1530843120472.fe6e813ce1a34afdf3811da510387d71.
2023-03-20 21:21:16,882 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] regionserver.HRegion: Flushing 1/1 column families, dataSize=6.60 MB heapSize=6.00 MB
2023-03-20 21:21:16,910 INFO  [RpcServer.default.RWQ.Fifo.write.handler=52,queue=4,port=16020] regionserver.HRegion: writing data to region static_file,80,1530843120472.fe6e813ce1a34afdf3811da510387d71. with WAL disabled. Data may be lost in the event of a crash.
2023-03-20 21:21:17,518 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] regionserver.HMobStore: Renaming flushed file from hdfs://mycluster/hbase/mobdir/.tmp/f033ab37c30201f73f142449d037028d202303204a6e52548e8a4170b1c174eb02ebbcec to hdfs://mycluster/hbase/mobdir/data/default/static_file/b426cd82141c106581ecc09f598198da/data/f033ab37c30201f73f142449d037028d202303204a6e52548e8a4170b1c174eb02ebbcec
2023-03-20 21:21:17,980 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] mob.DefaultMobStoreFlusher: Mob store is flushed, sequenceid=7828890, memsize=6.6 M, hasBloomFilter=true, into tmp file hdfs://mycluster/hbase/data/default/static_file/fe6e813ce1a34afdf3811da510387d71/.tmp/data/f29614a3cda84181ae40ce138b5ecb37
2023-03-20 21:21:18,139 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] regionserver.HStore: Added hdfs://mycluster/hbase/data/default/static_file/fe6e813ce1a34afdf3811da510387d71/data/f29614a3cda84181ae40ce138b5ecb37, entries=182, sequenceid=7828890, filesize=36.1 K
2023-03-20 21:21:18,140 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] regionserver.HRegion: Finished flush of dataSize ~6.60 MB/6918478, heapSize ~6.62 MB/6938312, currentSize=536.34 KB/549217 for fe6e813ce1a34afdf3811da510387d71 in 1258ms, sequenceid=7828890, compaction requested=true
2023-03-20 21:21:18,574 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] regionserver.HStore: Loaded HFile hdfs://mycluster/hbase/staging/root__static_file__valhos8bnrsbihsmv5hs2qmhrlueesgchh17dv9n2fa1hps8bsuqk9ks29mk26ln/data/2154061093b7460a964b2b8de3042847 into store 'data' as hdfs://mycluster/hbase/data/default/static_file/fe6e813ce1a34afdf3811da510387d71/data/c40d6498f4e94fdb82ae2582199e2654_SeqId_7828890_ - updating store file list.
2023-03-20 21:21:18,578 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] regionserver.HStore: Loaded HFile hdfs://mycluster/hbase/data/default/static_file/fe6e813ce1a34afdf3811da510387d71/data/c40d6498f4e94fdb82ae2582199e2654_SeqId_7828890_ into store 'data
2023-03-20 21:21:18,578 INFO  [RpcServer.default.RWQ.Fifo.read.handler=135,queue=15,port=16020] regionserver.HStore: Successfully loaded store file hdfs://mycluster/hbase/staging/root__static_file__valhos8bnrsbihsmv5hs2qmhrlueesgchh17dv9n2fa1hps8bsuqk9ks29mk26ln/data/2154061093b7460a964b2b8de3042847 into store data (new location: hdfs://mycluster/hbase/data/default/static_file/fe6e813ce1a34afdf3811da510387d71/data/c40d6498f4e94fdb82ae2582199e2654_SeqId_7828890_)

Note that the flush is executed on a read handler thread, which is odd: a flush is normally triggered by writes.

During mob compact a bulkload is performed, which goes through:

HRegion.bulkLoadHFiles(Collection<Pair<byte[], String>> familyPaths,
      boolean assignSeqId, BulkLoadListener bulkLoadListener, boolean copyFile)

// We need to assign a sequential ID that's in between two memstores in order to preserve
// the guarantee that all the edits lower than the highest sequential ID from all the
// HFiles are flushed on disk. See HBASE-10958.  The sequence id returned when we flush is
// guaranteed to be one beyond the file made when we flushed (or if nothing to flush, it is
// a sequence id that we can be sure is beyond the last hfile written).
if (assignSeqId) {
  FlushResult fs = flushcache(true, false, FlushLifeCycleTracker.DUMMY);
  if (fs.isFlushSucceeded()) {
    seqId = ((FlushResultImpl)fs).flushSequenceId;
  } else if (fs.getResult() == FlushResult.Result.CANNOT_FLUSH_MEMSTORE_EMPTY) {
    seqId = ((FlushResultImpl)fs).flushSequenceId;
  } else if (fs.getResult() == FlushResult.Result.CANNOT_FLUSH) {
    // CANNOT_FLUSH may mean that a flush is already on-going
    // we need to wait for that flush to complete
    waitForFlushes();
  } else {
    throw new IOException("Could not bulk load with an assigned sequential ID because the "+
      "flush didn't run. Reason for not flushing: " + ((FlushResultImpl)fs).failureReason);
  }
}

This function's assignSeqId parameter is true, so it calls flushcache() and obtains a new seqId. As the HMobStore lines in the log above show, each such flush also writes out an extra MOB file, which is how the mob compact ends up adding files instead of removing them. The seqId itself is only used to build the bulkloaded file's name, so the assignment can be turned off:

HStore.preBulkLoadHFile(finalPath, seqId) → HRegionFileSystem.preCommitStoreFile → generateUniqueName((seqNum < 0) ? null : "_SeqId_" + seqNum + "_");
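A minimal sketch of what that name generation amounts to, modeled on HRegionFileSystem.generateUniqueName in the HBase source (simplified, not a verbatim copy):

import java.util.UUID;

// Modeled on HRegionFileSystem.generateUniqueName: a dash-free random UUID,
// optionally followed by a "_SeqId_<n>_" suffix built from the flush seqId.
static String generateUniqueName(final String suffix) {
  String name = UUID.randomUUID().toString().replaceAll("-", "");
  if (suffix != null) {
    name += suffix; // e.g. "..._SeqId_7828890_", as seen in the log above
  }
  return name;
}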

2 PartitionedMobCompactor.bulkloadRefFile() creates a new LoadIncrementalHFiles object, roughly as sketched below.
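For reference, a simplified sketch of that call path, based on the HBase 2.x sources (signature trimmed, error handling shortened; the org.apache.hadoop.hbase.tool package path assumes HBase 2.x):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;

// Simplified from PartitionedMobCompactor.bulkloadRefFile(): after the MOB
// files are compacted, the newly written ref files are bulkloaded back into
// the table, and each of those bulkloads can trigger the flush shown above.
void bulkloadRefFile(Configuration conf, Connection connection, Table table,
    Path bulkloadDirectory) throws IOException {
  try {
    LoadIncrementalHFiles bulkload = new LoadIncrementalHFiles(conf);
    bulkload.doBulkLoad(bulkloadDirectory, connection.getAdmin(), table,
        connection.getRegionLocator(table.getName()));
  } catch (Exception e) {
    throw new IOException(e);
  }
}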

3 class LoadIncrementalHFiles: 

private static final String ASSIGN_SEQ_IDS = "hbase.mapreduce.bulkload.assign.sequenceNumbers";

assignSeqIds = conf.getBoolean(ASSIGN_SEQ_IDS, true);
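So besides the server-side setting described below, the flag could also be disabled per client when constructing the loader; a minimal sketch (package path again assumes HBase 2.x):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;

// Disable seqId assignment for bulkloads issued through this loader, so each
// bulkload no longer forces a memstore flush just to obtain a sequence id.
Configuration conf = HBaseConfiguration.create();
conf.setBoolean("hbase.mapreduce.bulkload.assign.sequenceNumbers", false);
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);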

Therefore, set hbase.mapreduce.bulkload.assign.sequenceNumbers to false in the HMaster's configuration file; turning it off resolves the problem.
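For example, in the HMaster's hbase-site.xml (restarting the master afterwards is my assumption, as with most hbase-site.xml changes):

<property>
  <name>hbase.mapreduce.bulkload.assign.sequenceNumbers</name>
  <value>false</value>
</property>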

Author: 伍增田
