NameNode stops with "Error: flush failed for required journal"

The active NameNode of our Hadoop cluster suddenly stopped, with the following errors in its log:

2016-03-23 17:12:25,877 INFO  namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(1153)) - Ending log segment 574144342
2016-03-23 17:12:26,350 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 19047 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [192.168.14.16:8485]
2016-03-23 17:12:27,304 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(364)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.14.14:8485, 192.168.14.15:8485, 192.168.14.16:8485], stream=QuorumOutputStream starting at txid 574144342))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:499)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:359)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:495)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:623)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:3188)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3149)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:701)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:523)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2016-03-23 17:12:27,304 WARN  client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 574144342
2016-03-23 17:12:27,308 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-03-23 17:12:27,313 INFO  namenode.NameNode (StringUtils.java:run(640)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn01
************************************************************/

A cluster with HA enabled must have a JournalNode quorum available in order to work at all.

In this respect the dependency is somewhat like HBase's dependency on ZooKeeper. If the NameNode cannot obtain a JournalNode quorum, HDFS can neither be formatted nor started, and it reports an error like the one above.

The JournalNodes themselves carry little load, so the usual advice is to run them on the same machines as the master daemons.

Configuration-wise, beyond the usual HA settings you must specify the JournalNode quorum and the directory each JournalNode uses for storage. These are set with "dfs.namenode.shared.edits.dir" and "dfs.journalnode.edits.dir" respectively.
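A minimal hdfs-site.xml sketch of those two properties, reusing the JournalNode addresses from the log above; the journal ID "mycluster" and the local path /data/hadoop/journal are assumed examples, not values taken from this cluster:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://192.168.14.14:8485;192.168.14.15:8485;192.168.14.16:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/journal</value>
</property>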

Judging from the log, the error here is exactly that: the NameNode could not reach a JournalNode quorum. Of the three JournalNodes, only 192.168.14.16:8485 acknowledged sendEdits within the 20 s timeout, one short of the required majority (2 of 3).

When the write to the JournalNodes times out, the terminate method of the ExitUtil class is triggered, which kills the current process:

In the JournalSet class (mapJournalsAndReportErrors, the frame visible in the stack trace above):

for (JournalAndStream jas : journals) {
  try {
    closure.apply(jas);
  } catch (Throwable t) {
    if (jas.isRequired()) {
      final String msg = "Error: " + status + " failed for required journal ("
          + jas + ")";
      LOG.fatal(msg, t);
      // If we fail on *any* of the required journals, then we must not
      // continue on any of the other journals. Abort them to ensure that
      // retry behavior doesn't allow them to keep going in any way.
      abortAllJournals();
      // the current policy is to shutdown the NN on errors to shared edits
      // dir. There are many code paths to shared edits failures - syncs,
      // roll of edits etc. All of them go through this common function
      // where the isRequired() check is made. Applying exit policy here
      // to catch all code paths.
      terminate(1, msg);
    } else {
      LOG.error("Error: " + status + " failed for (journal " + jas + ")", t);
      badJAS.add(jas);
    }
  }
}

The terminate method of ExitUtil in turn calls System.exit:

/**
 * Terminate the current process. Note that terminate is the *only* method
 * that should be used to terminate the daemon processes.
 * @param status exit code
 * @param msg message used to create the ExitException
 * @throws ExitException if System.exit is disabled for test purposes
 */
public static void terminate(int status, String msg) throws ExitException {
  LOG.info("Exiting with status " + status);
  if (systemExitDisabled) {
    ExitException ee = new ExitException(status, msg);
    LOG.fatal("Terminate called", ee);
    if (null == firstExitException) {
      firstExitException = ee;
    }
    throw ee;
  }
  System.exit(status);
}
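As the Javadoc notes, System.exit can be disabled for test purposes, in which case terminate throws an ExitException instead. A minimal sketch of that behavior (assuming Hadoop's org.apache.hadoop.util.ExitUtil is on the classpath; the demo class and message are made up for illustration):

import org.apache.hadoop.util.ExitUtil;

public class TerminateDemo {
    public static void main(String[] args) {
        // Make terminate() throw ExitException instead of calling System.exit
        ExitUtil.disableSystemExit();
        try {
            ExitUtil.terminate(1, "simulated journal flush failure");
        } catch (ExitUtil.ExitException ee) {
            // On a live NameNode this point is never reached; the JVM exits
            System.out.println("terminate() requested exit status " + ee.status);
        }
    }
}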

In the end, simply starting the NameNode again fixed things. This is also why HA in Hadoop is well worth having: one NameNode stopping does not take the cluster down.
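For reference, on a Hadoop 2.x tarball install the NameNode can be brought back on the affected host with (assuming $HADOOP_HOME points at the install root):

$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode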

Optional tuning (found by searching online; not applied here):

1) Increase the JournalNode write timeout, dfs.qjournal.write-txns.timeout.ms (see the example after this list).

2) Tune the NameNode's JVM parameters so that full GC is triggered earlier; each full GC pause then stays shorter.

3) By default the NameNode's full GC uses the parallel collector, which is entirely stop-the-world; switch it to CMS. Adjust the NameNode's startup parameters (typically set via HADOOP_NAMENODE_OPTS in hadoop-env.sh):

-XX:+UseCompressedOops
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75
-XX:SoftRefLRUPolicyMSPerMB=0
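As an illustration of option 1, the timeout can be raised in hdfs-site.xml above the 20000 ms default seen in the log; the 60000 ms value here is only an example, not a tested recommendation:

<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>
</property>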
