Investigating an abnormal namenode shutdown


  • The journalnodes did not respond, which caused the namenode to shut down
2017-10-03 09:48:15,982 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 77072390
2017-10-03 09:48:34,996 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19015 ms (timeout=20000 ms) for a response for startLogSegment(77072390). Succeeded so far: [10.20.9.35:8485]
2017-10-03 09:48:36,133 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2017-10-03 09:48:36,477 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: starting log segment 77072390 failed for required journal (JournalAndStream(mgr=QJM to [10.20.9.35:8485, 10.20.9.42:8485, 10.20.9.17:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:403)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$3.apply(JournalSet.java:222)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1206)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1175)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6441)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

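However, a normally written log segment is present on all three nodes. Only one of the three journalnodes returned in time; the other two returned nothing within the 20-second timeout.
In fact, those two nodes had already created the segment file by 09:48:16. During the 20 seconds from 09:48:17 to 09:48:36 they logged nothing and did not answer the namenode's request.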
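To make the failure condition concrete: the quorum journal manager needs acknowledgements from a strict majority of journalnodes, so with three nodes a single success is not enough. The following is only a minimal sketch of that arithmetic. The function names are hypothetical and this is not Hadoop's own code; the addresses are copied from the namenode log above.

```python
def majority_size(num_journalnodes: int) -> int:
    """Smallest number of acknowledgements that forms a strict majority."""
    return num_journalnodes // 2 + 1


def quorum_reached(acked: list, journalnodes: list) -> bool:
    """True if enough distinct, known journalnodes acknowledged the call."""
    distinct_acks = set(acked) & set(journalnodes)
    return len(distinct_acks) >= majority_size(len(journalnodes))


# Addresses copied from the FATAL line in the namenode log.
journalnodes = ["10.20.9.35:8485", "10.20.9.42:8485", "10.20.9.17:8485"]
# "Succeeded so far" from the WARN line, shortly before the timeout expired.
acked = ["10.20.9.35:8485"]

print(majority_size(len(journalnodes)))     # 2
print(quorum_reached(acked, journalnodes))  # False
```

With only one of the two required acknowledgements, startLogSegment cannot complete, which matches the "Timed out waiting 20000ms for a quorum of nodes to respond" exception.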

[root@namenode current]# stat edits_inprogress_0000000000077072390
  File: `edits_inprogress_0000000000077072390'
  Size: 1048576     Blocks: 2048       IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 22308845    Links: 1
Access: (0644/-rw-r--r--)  Uid: (  495/    hdfs)   Gid: (  492/    hdfs)
Access: 2017-10-03 09:48:16.021692269 +0800
Modify: 2017-10-03 09:48:16.022692269 +0800
Change: 2017-10-03 09:48:16.022692269 +0800

[root@datanode1 current]# stat edits_inprogress_0000000000077072390
  File: `edits_inprogress_0000000000077072390'
  Size: 1048576     Blocks: 2048       IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 48772669    Links: 1
Access: (0644/-rw-r--r--)  Uid: (  495/    hdfs)   Gid: (  492/    hdfs)
Access: 2017-10-03 09:48:16.006940368 +0800
Modify: 2017-10-03 09:48:16.008940368 +0800
Change: 2017-10-03 09:48:16.008940368 +0800

[root@datanode2 current]# stat edits_inprogress_0000000000077072390
  File: `edits_inprogress_0000000000077072390'
  Size: 1048576     Blocks: 2048       IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 56632186    Links: 1
Access: (0644/-rw-r--r--)  Uid: (  495/    hdfs)   Gid: (  492/    hdfs)
Access: 2017-10-03 09:49:47.126971673 +0800
Modify: 2017-10-03 09:49:47.304971702 +0800
Change: 2017-10-03 09:49:47.304971702 +0800
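
The three Modify times can be compared with the moment the namenode logged "Starting log segment" (09:48:15,982). The sketch below does that subtraction. It assumes the clocks on all hosts are synchronized and that the namenode log uses the same +0800 local time as stat; neither is confirmed by the original post, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_stat_time(text: str) -> datetime:
    """Parse a stat timestamp, truncating nanoseconds to microseconds."""
    date_part, time_part, _tz = text.split()
    seconds, nanos = time_part.split(".")
    return datetime.strptime(f"{date_part} {seconds}.{nanos[:6]}",
                             "%Y-%m-%d %H:%M:%S.%f")


def parse_namenode_log_time(text: str) -> datetime:
    """Parse a namenode log timestamp such as '2017-10-03 09:48:15,982'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


# "Starting log segment at 77072390" in the namenode log.
segment_start = parse_namenode_log_time("2017-10-03 09:48:15,982")

# Modify times copied from the three stat outputs above.
modify_times = {
    "namenode": "2017-10-03 09:48:16.022692269 +0800",
    "datanode1": "2017-10-03 09:48:16.008940368 +0800",
    "datanode2": "2017-10-03 09:49:47.304971702 +0800",
}

# Seconds between the start of the segment and each file's last modification.
offsets = {
    host: (parse_stat_time(stamp) - segment_start).total_seconds()
    for host, stamp in modify_times.items()
}

for host, seconds in offsets.items():
    print(f"{host}: modified {seconds:.3f} s after segment start")
```

Under those assumptions, the files on namenode and datanode1 were last modified within about 0.04 s of the segment start, while the file on datanode2 was last modified roughly 91 s later.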

journalnode log:

17/10/03 09:46:05 INFO namenode.FileJournalManager: Finalizing edits file /data/namenode/jn/pasc/current/edits_inprogress_0000000000077072134 -> /data/namenode/jn/pasc/current/edits_0000000000077072134-0000000000077072387
17/10/03 09:48:15 INFO namenode.FileJournalManager: Finalizing edits file /data/namenode/jn/pasc/current/edits_inprogress_0000000000077072388 -> /data/namenode/jn/pasc/current/edits_0000000000077072388-0000000000077072389
17/10/03 09:49:39 WARN ipc.Server: IPC Server handler 4 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.startLogSegment from 10.20.9.42:44554 Call#101643 Retry#0: output error
17/10/03 09:49:40 INFO ipc.Server: IPC Server handler 4 on 8485 caught an exception
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2621)
        at org.apache.hadoop.ipc.Server.access$1900(Server.java:134)
        at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:989)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1054)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2141)

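The file operations all succeeded, so it looks as if the QJournalProtocol.startLogSegment RPC itself is what failed: the journalnode only tried to send its response at 09:49:39, long after the namenode's 20000 ms timeout, and by then the channel was already closed (ClosedChannelException).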
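
The silent period in the journalnode log can be measured by taking the gap between consecutive timestamped entries. This is a rough sketch only: the log lines are abbreviated copies of the entries shown above, stack-trace lines without a timestamp are skipped, and the helper name is hypothetical.

```python
import re
from datetime import datetime

# Abbreviated copies of the journalnode log entries shown above.
JN_LOG_LINES = [
    "17/10/03 09:46:05 INFO namenode.FileJournalManager: Finalizing edits file",
    "17/10/03 09:48:15 INFO namenode.FileJournalManager: Finalizing edits file",
    "17/10/03 09:49:39 WARN ipc.Server: IPC Server handler 4 on 8485, call",
    "17/10/03 09:49:40 INFO ipc.Server: IPC Server handler 4 on 8485 caught an exception",
    "java.nio.channels.ClosedChannelException",
]

# Journalnode entries start with a 'yy/MM/dd HH:mm:ss' timestamp.
TIMESTAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")


def entry_times(lines):
    """Return the timestamps of all lines that start with one."""
    times = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S"))
    return times


times = entry_times(JN_LOG_LINES)
# Seconds between each pair of consecutive entries.
gaps = [(later - earlier).total_seconds()
        for earlier, later in zip(times, times[1:])]

print(gaps)  # [130.0, 84.0, 1.0]
```

The 84-second gap after the 09:48:15 entry covers the namenode's entire 20-second wait, which is consistent with the response being written only after the namenode had given up and closed the connection.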
