[HBase事故] HBase regionserver被强制关闭问题分析

[HBase命案系列] HBase regionserver被强制关闭问题分析

一.案发现场

2018-09-06 15:51:59,685 ERROR [regionserver/node-210:16020.logRoller] regionserver.LogRoller: Log rolling failed
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1967)
    at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWALPrefixFromWALName(AbstractFSWALProvider.java:515)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.preLogRoll(ReplicationSourceManager.java:555)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener.preLogRoll(ReplicationSourceWALActionListener.java:50)
    at org.apache.hadoop.hbase.wal.DisabledWALProvider$DisabledWAL.rollWriter(DisabledWALProvider.java:126)
    at org.apache.hadoop.hbase.wal.DisabledWALProvider$DisabledWAL.rollWriter(DisabledWALProvider.java:144)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:184)
    at java.lang.Thread.run(Thread.java:748)
2018-09-06 15:51:59,707 ERROR [regionserver/node-210:16020.logRoller] regionserver.HRegionServer: ***** ABORTING region server node-210,16020,1536216717222: Log rolling failed *****

二、案情分析:在使用多线程对HBase进行高频数据插入时,RegionServer无故被停止服务!需要找出真凶!从日志当中发现疑犯LogRoller,RegionServer被杀死前有疑犯留下的痕迹“String index out of range: -1”。对此进行进一步推理。
三、案情推理:
1.根据痕迹,找到帮凶,源码 AbstractFSWALProvider.java

  .....
  @VisibleForTesting
  public static final String META_WAL_PROVIDER_ID = ".meta";
  .....
  public static String getWALPrefixFromWALName(String name) {
    int endIndex = name.replaceAll(META_WAL_PROVIDER_ID, "").lastIndexOf(".");
    return name.substring(0, endIndex);
  }

进一步确定,endIndex出问题了,死亡的信息来源于此。
2.查看其老大ReplicationSourceManager.java

 public void preLogRoll(Path newLog) throws IOException {
    String logName = newLog.getName();
    String logPrefix = AbstractFSWALProvider.getWALPrefixFromWALName(logName);
    ......

发现凶器可能来源于logName,因为log name不存在,而导致字符数组越界!

3.追踪凶器来源

 public void preLogRoll(Path oldPath, Path newPath) throws IOException {
    manager.preLogRoll(newPath);
  }

再查看其死者生前的日志记录hbase-hadoop-regionserver-node-218.log

2018-09-04 18:22:23,832 INFO  [RS_OPEN_META-regionserver/node-210:16020-0] wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128 MB, prefix=node-210%2C16020%2C1536056537213.meta, suffix=.meta, logDir=hdfs://node-210:9000/hbase/WALs/node-210,16020,1536056537213, archiveDir=hdfs://node-210:9000/hbase/oldWALs
2018-09-04 18:22:23,847 INFO  [RS_OPEN_META-regionserver/node-210:16020-0] wal.AbstractFSWAL: New WAL /hbase/WALs/node-210,16020,1536056537213/node-210%2C16020%2C1536056537213.meta.1536056543835.meta

初步判断.meta的WAL文件可能出问题,导致数组越界。

四、案情复现
因HDFS上面的.meta文件出问题(被删除?memstore flush时被改名?)
1.使用ycsb工具对HBase不间断进行上亿条数据插入

../bin/ycsb load hbase20 -P workloada -threads 100 -p table=t1 -p columnfamily=family -p recordcount=100000000 -s

2.删除HDFS的hbase的.meta表

]# su - hadoop -c "/usr/local/hadoop/bin/hdfs dfs -rm /hbase/WALs/node-218,16020,1536235084000/node-218%2C16020%2C1536235084000.meta.1536237574487.meta"
18/09/06 20:41:13 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /hbase/WALs/node-218,16020,1536235084000/node-218%2C16020%2C1536235084000.meta.1536237574487.meta

查看regionserver日志

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /hbase/WALs/gz-52-218,16020,1536235084000/gz-52-218%2C16020%2C1536235084000.meta.1536237591217.meta (inode 22978): File does not exist. [Lease.  Holder: DFSClient_NONMAPREDUCE_95794702_1, pendingcreates: 3]

······
2018-09-06 20:41:14,281 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted

案发现场初步复现,因.meta.文件出错,导致HRegionServer被杀死。

后续需要对.meta.凶器进行进一步分析,达到最终结案。
【未完持续】

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值