Notes on Troubleshooting Slow HBase Writes

Version: 1.2.6

Symptoms: HBase writes take a long time, the slow-put count on a single machine rises, and flushes on that machine are slow. When memstore usage reaches the high watermark, requests are blocked until usage drops back below it (once usage reaches the low watermark, flushes run continuously until it falls below the low watermark). The heap is currently 24 GB, so the high watermark is 9.6 GB (24 × 0.4) and the low watermark is 7.68 GB (9.6 × 0.8).

Analysis:

1 Slow flushes
Writing to HDFS is not the slow part; the suspicion is that the scan side of the flush takes too long, and may even have deadlocked.
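As a rough illustration of that suspicion, a minimal sketch (not the actual DefaultStoreFlusher code): a flush drains the memstore snapshot through a scanner, so a slow scanner, e.g. one crawling past many deleted cells, stalls the flush even though each HDFS append is fast. InternalScanner is the real HBase interface; appendToTmpHFile is a hypothetical stand-in for the HFile writer.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

class FlushScanSketch {
    // Drain the memstore snapshot through a scanner, as a flush conceptually does.
    void drainSnapshot(InternalScanner scanner) throws IOException {
        List<Cell> batch = new ArrayList<>();
        boolean more;
        do {
            more = scanner.next(batch);   // the suspected slow step
            for (Cell cell : batch) {
                appendToTmpHFile(cell);   // the fast step: appending to the .tmp HFile
            }
            batch.clear();
        } while (more);
    }

    void appendToTmpHFile(Cell cell) { /* hypothetical writer hook */ }
}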

2 Large flush sizes
This is easy to explain: reaching 128 MB does not trigger an immediate flush; it only puts the region on the flush queue. That queue serves both ordinary region flushes (FIFO) and watermark flushes (which prioritize the regions occupying the most memory). If flushing is slow, your region's memstore may already be far past the threshold by the time it gets processed.
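A minimal sketch of those two paths, assuming the behavior described above (the Region interface here is a stand-in for this sketch, not the HBase class):

import java.util.Collection;
import java.util.Comparator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class FlushQueueSketch {
    // Hypothetical minimal region abstraction, just for this sketch.
    interface Region {
        long memstoreSize();
    }

    // Ordinary flushes are FIFO: crossing 128 MB only enqueues the region.
    private final BlockingQueue<Region> flushQueue = new LinkedBlockingQueue<>();

    void requestFlush(Region r) {
        flushQueue.offer(r);   // asynchronous: the flush happens later
    }

    // Under global heap pressure the flusher instead picks the region with the
    // largest memstore, which is why the watermark flush in the log below reports
    // a 9.4 GB total while flushing a single region's 168 MB.
    Region pickUnderPressure(Collection<Region> onlineRegions) {
        return onlineRegions.stream()
                .max(Comparator.comparingLong(Region::memstoreSize))
                .orElseThrow(IllegalStateException::new);
    }
}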

3 Rapidly growing HLog count
Possibly a problem with the WAL deletion mechanism?

Fix: raise the high watermark to 14.4 GB (24 × 0.6) so that requests are blocked as rarely as possible.
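For reference, a sketch of that change expressed through the Configuration API; in practice these properties go into hbase-site.xml followed by a RegionServer restart. The property names are the HBase 1.x ones; verify them against 1.2.6 before applying.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

class WatermarkConfigSketch {
    static Configuration raisedWatermarks() {
        Configuration conf = HBaseConfiguration.create();
        // High watermark as a fraction of the heap: 24 GB * 0.6 = 14.4 GB.
        conf.setFloat("hbase.regionserver.global.memstore.size", 0.6f);
        // Low watermark as a fraction of the high one: 14.4 GB * 0.8 = 11.52 GB.
        conf.setFloat("hbase.regionserver.global.memstore.size.lower.limit", 0.8f);
        return conf;
    }
}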

Logs:
A long watermark-triggered flush — flushes all of the region's stores:
Flush of region hubble_app_log_diagnosis,574D9B34-A876-4A4B-A846-B5D0873BA2CC92233704017307906970718,1635282389698.d0b92fc4724e4f230616b26d03cffa17. due to global heap pressure. Total Memstore size=9.4 G, Region memstore size=168.0 M
2022-01-18 08:32:09,640 INFO [MemStoreFlusher.1] regionserver.HRegion: Flushing 2/2 column families, memstore=168.04 MB
2022-01-18 08:33:12,603 INFO [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=2549251141, memsize=168.0 M, hasBloomFilter=true, into tmp file hdfs://hubble:8020/hbase_data/data/default/hubble_app_log_diagnosis/d0b92fc4724e4f230616b26d03cffa17/.tmp/81e64cda1e87440dbfbd168bb4f9b318
2022-01-18 08:33:12,610 INFO [MemStoreFlusher.1] regionserver.HStore: Added hdfs://hubble:8020/hbase_data/data/default/hubble_app_log_diagnosis/d0b92fc4724e4f230616b26d03cffa17/cf/81e64cda1e87440dbfbd168bb4f9b318, entries=520165, sequenceid=2549251141, filesize=19.6 M
2022-01-18 08:33:12,611 INFO [MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~168.04 MB/176205272, currentsize=4.94 MB/5180784 for region hubble_app_log_diagnosis,574D9B34-A876-4A4B-A846-B5D0873BA2CC92233704017307906970718,1635282389698.d0b92fc4724e4f230616b26d03cffa17. in 62971ms, sequenceid=2549251141, compaction requested=true

Ordinary flush — when a region's total memstore size exceeds 128 MB, a flush of that region is triggered; the default flushPolicy flushes the stores larger than 16 MB. If the region's total memstore size reaches 128 MB × 4, writes are blocked and a RegionTooBusyException is thrown; note this differs from the slow puts above (blocked until memory falls below the high watermark). See the sketch after the log excerpt.
Flushing 1/2 column families, memstore=128.00 MB; cf=128.01 MB
2022-01-20 22:32:39,330 INFO [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=2366406713, memsize=128.0 M, hasBloomFilter=true, into tmp file hdfs://hubble:8020/hbase_data/data/default/hubble_app_log_diagnosis/da586b925c3f41515c0d6870e01371f5/.tmp/b932318cba5948b9baa0a92fc9763ed6
2022-01-20 22:32:39,342 INFO [MemStoreFlusher.1] regionserver.HStore: Added hdfs://hubble:8020/hbase_data/data/default/hubble_app_log_diagnosis/da586b925c3f41515c0d6870e01371f5/cf/b932318cba5948b9baa0a92fc9763ed6, entries=396461, sequenceid=2366406713, filesize=14.5 M
2022-01-20 22:32:39,343 INFO [MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~128.01 MB/134223640, currentsize=58.96 KB/60376 for region hubble_app_log_diagnosis,4352a67c50b1ebf3e70b712c75d4bb3992233704432624727940749,1624348860256.da586b925c3f41515c0d6870e01371f5. in 547ms, sequenceid=2366406713, compaction requested=true
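A simplified sketch of the per-region checks just described (not the real HRegion code), with the default 1.2.x values; requestFlush is a hypothetical hook into the flush queue:

import org.apache.hadoop.hbase.RegionTooBusyException;

class RegionFlushCheckSketch {
    static final long FLUSH_SIZE = 128L << 20;   // hbase.hregion.memstore.flush.size
    static final int BLOCK_MULTIPLIER = 4;       // hbase.hregion.memstore.block.multiplier

    void checkResources(long regionMemstoreBytes) throws RegionTooBusyException {
        if (regionMemstoreBytes >= FLUSH_SIZE * BLOCK_MULTIPLIER) {
            // 128 MB * 4 = 512 MB: the write is rejected with an exception,
            // unlike the global high watermark, which blocks the caller instead.
            throw new RegionTooBusyException("Above memstore limit");
        }
        if (regionMemstoreBytes >= FLUSH_SIZE) {
            requestFlush();   // only enqueues the region; the flush runs asynchronously
        }
    }

    void requestFlush() { /* hypothetical hook into the flush queue */ }
}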

Related issues:
https://issues.apache.org/jira/browse/HBASE-2087
1 The method MemStoreFlusher#checkStoreFileCount is called from flushRegion. flushRegion is called by the MemStoreFlusher#run thread. If checkStoreFileCount finds too many store files, it'll stick around waiting on a compaction to happen. While it's hanging, MemStoreFlusher#run is held up. No other region can flush. Meantime WALs will be rolling and memory will be accumulating writes.
2 The issue of delaying memstore flush still occurs after backporting HBASE-15871. Reverse scan takes a long time to seek the previous row in a memstore full of deleted cells.
3 Memstore flush doesn't finish because of backwardSeek() in the memstore scanner.
4 In a write-heavy scenario, one of the regions does not get flushed, causing RegionTooBusyException.
