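Today, while starting HBase in a CDH environment, the HBase Master ran into a problem during startup. The cluster runs two HMasters as an HA pair. After the CDH management console reported that the start had succeeded, the HBase Master status still did not look quite right (see the screenshot below).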
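The screenshot above looks normal at first glance, but it is not. When both HMasters in an active/backup pair are available, each Master entry should be followed by a label showing whether it is the active or the backup instance; in the screenshot above that status information is missing.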
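So I checked the logs of both HMasters. The log of one HBase Master looked normal, while the other kept printing SplitLogManager-related messages and, after a while, reported the following errors: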
2020-06-20 20:00:54,345 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://nameservice1/hbase/WALs/zfnode05.esgyn.cn,60020,1592556004866-splitting] installed = 1 but only 0 done
2020-06-20 20:00:54,345 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://nameservice1/hbase/WALs/zfnode07.esgyn.cn,60020,1592556014755-splitting] installed = 1 but only 0 done
2020-06-20 20:00:54,346 WARN org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Failed serverName=zfnode05.esgyn.cn,60020,1592556004866, state=SERVER_CRASH_SPLIT_LOGS; retry
java.io.IOException: error or interrupted while splitting logs in [hdfs://nameservice1/hbase/WALs/zfnode05.esgyn.cn,60020,1592556004866-splitting] Task = installed = 1 done = 0 error = 0
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:291)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:436)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:326)
    at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:449)
    at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.