[Troubleshooting & Ops] hadoop-3.3.1 HDFS: "Operation category JOURNAL is not supported in state standby"

I. Logs

NameNode log

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2092)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1543)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5088)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1356)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:148)
        at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:16688)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:600)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:568)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:552)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2966)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1573)
        at org.apache.hadoop.ipc.Client.call(Client.java:1519)
        at org.apache.hadoop.ipc.Client.call(Client.java:1416)

namenode.out.1

Continuing with the other logs:

grep -C20 "ERROR" hadoop-xxx-namenode.out.1
  WARNING: A sub-resource method, public javax.ws.rs.core.Response org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.deleteRoot(org.apache.hadoop.security.UserGroupInformation,org.apache.hadoop.hdfs.web.resources.DelegationParam,org.apache.hadoop.hdfs.web.resources.UserParam,org.apache.hadoop.hdfs.web.resources.DoAsParam,org.apache.hadoop.hdfs.web.resources.DeleteOpParam,org.apache.hadoop.hdfs.web.resources.RecursiveParam,org.apache.hadoop.hdfs.web.resources.SnapshotNameParam) throws java.io.IOException,java.lang.InterruptedException, with URI template, "/", is treated as a resource method
log4j:ERROR Failed to flush writer,
java.io.IOException: 设备上没有空间 (No space left on device)
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
        at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
        at org.apache.hadoop.ipc.Server.logException(Server.java:3020)
        at org.apache.hadoop.ipc.Server$RpcCall.populateResponseParamsOnError(Server.java:1074)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1038)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2966)

ZKFC log

grep -C10 "ERROR" hadoop-uoms-zkfc-docp1.out.2
max memory size         (kbytes, -m) unlimited
open files                      (-n) 655350
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31598
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
log4j:ERROR Failed to flush writer,
java.io.IOException: 设备上没有空间 (No space left on device)
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
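
Both the NameNode and the ZKFC are failing with the same "No space left on device" error, which points to a full disk on that host. A quick way to confirm it, as a sketch (the log directory below is illustrative, not from the article):

# Confirm which filesystem is full
df -h

# Find the largest entries under the Hadoop log directory (path is an assumption)
du -sh /opt/hadoop/logs/* | sort -rh | head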


II. Fix

1. The problem in this article

During a NameNode failover, the ZKFC host ran out of disk space, so ZKFC could not provide the failover service and HDFS broke.
Fix: free up disk space on the ZKFC host, then restart ZKFC and HDFS; a sketch follows.
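
A minimal sketch of the cleanup and restart, assuming a Hadoop 3.x deployment; the log path and the choice of what to delete are illustrative and must be adapted to your cluster:

# Free space, e.g. by removing rotated daemon logs that are no longer needed
rm -f /opt/hadoop/logs/*.out.[1-9]

# Restart ZKFC on each NameNode host (Hadoop 3.x per-daemon syntax)
hdfs --daemon stop zkfc
hdfs --daemon start zkfc

# Restart HDFS
stop-dfs.sh
start-dfs.sh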


2. Other possible causes (for reference)

Cause: both NameNodes are in standby state and no node is active.
Fix: hdfs haadmin -transitionToActive --forcemanual nna (where nna is the service ID of the NameNode you want to make active); see the sketch below.
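
Before forcing a transition, it helps to confirm each NameNode's HA state. A minimal sketch; nna is the service ID from the command above, and nnb is a hypothetical second ID (use the IDs configured in dfs.ha.namenodes.<nameservice>):

# Query the HA state of each NameNode (nnb is hypothetical)
hdfs haadmin -getServiceState nna
hdfs haadmin -getServiceState nnb

# Force nna to active; --forcemanual bypasses ZKFC coordination,
# so use it only when automatic failover is genuinely broken
hdfs haadmin -transitionToActive --forcemanual nna

When the root cause is unreachable JournalNodes, the NameNode log instead shows edit-log writes timing out against the quorum, as in the excerpt below: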

2022-08-23 16:40:02,092 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.0.8.31:18138]
2022-08-23 16:40:03,094 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.0.8.31:18138]
2022-08-23 16:40:04,095 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17012 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.0.8.31:18138]
2022-08-23 16:40:05,097 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.0.8.31:18138]
2022-08-23 16:40:05,622 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: docp4/10.0.8.34:18138. Already tried 32 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
2022-08-23 16:40:06,098 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.0.8.31:18138]
2022-08-23 16:40:07,084 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.0.8.34:18138, 10.0.8.31:18138, 10.0.8.32:18138], stream=QuorumOutputStream starting at txid 125954))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:113)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:115)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:109)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:529)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:389)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:54)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:525)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:714)