Fixing a checkpoint failure caused by inconsistent edits between the HDFS active and standby nodes

This article walks through the investigation and resolution of frequent NameNode crashes in an HDFS environment. The failure showed up as communication timeouts between the NameNode and the JournalNodes, with delayed responses to edit-log sends. Adjusting configuration parameters and then rebuilding the JournalNodes and the standby NameNode resolved the problem, after which the system ran stably for more than 24 hours.

Background

One of our project's HDFS environments started to suffer occasional NameNode crashes in April. The crashes became more and more frequent, until eventually the NameNode would go down four or five minutes after startup. I was asked to investigate.

Troubleshooting

Errors in the logs

2023-05-27 22:50:21,844 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.xxx.xxx.xxx:8485, 192.xxx.xxx.yyy:8485, 192.xxx.xxx.zzz:8485], stream=QuorumOutputStream starting at txid 8169027))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:109)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:710)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:188)
        at java.lang.Thread.run(Thread.java:748)

The error suggested that communication between the NameNode and the JournalNodes was timing out.
Following suggestions from other blog posts, we raised dfs.qjournal.write-txns.timeout.ms from its default of 20 s to one minute. The NameNode crash frequency dropped to once every 4 to 6 hours, so this was a mitigation rather than a fix.
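
For reference, a minimal sketch of that change, assuming the property is managed in hdfs-site.xml; on an Ambari-managed cluster such as this one the same key is set on the HDFS configuration page and pushed out to both NameNodes before they are restarted:

# Raise the QJM write timeout from the default 20000 ms to 60000 ms by adding or
# overriding the property in hdfs-site.xml (shown here as a comment for reference):
#
#   <property>
#     <name>dfs.qjournal.write-txns.timeout.ms</name>
#     <value>60000</value>
#   </property>
#
# After restarting the NameNodes, confirm the effective value:
hdfs getconf -confKey dfs.qjournal.write-txns.timeout.ms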

Second round of troubleshooting

2023-05-29 22:00:25,517 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 58044 ms (timeout=60000 ms) for a response for sendEdits. Succeeded so far: [192.xxx.xxx.xxx:8485]
2023-05-29 22:00:26,518 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 59045 ms (timeout=60000 ms) for a response for sendEdits. Succeeded so far: [192.xxx.xxx.xxx:8485]
2023-05-29 22:00:27,474 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.xxx.xxx.xxx:8485, 192.xxx.xxx.yyy:8485, 192.xxx.xxx.zzz:8485], stream=QuorumOutputStream starting at txid 8180981))
java.io.IOException: Timed out waiting 60000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:109)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:710)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:188)
        at java.lang.Thread.run(Thread.java:748)
2023-05-29 22:00:27,477 WARN  client.QuorumJournalManager (QuorumOutputStream.java:abort(74)) - Aborting QuorumOutputStream starting at txid 8180981
2023-05-29 22:00:27,482 INFO  util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.xxx.xxx.xxx:8485, 192.xxx.xxx.yyy:8485, 192.xxx.xxx.zzz:8485], stream=QuorumOutputStream starting at txid 8180981))

This showed that the NameNode timed out while waiting for responses after sending the edit log to the JournalNodes: only one JournalNode returned success and the other two never responded, which tripped the NameNode's shutdown hook and terminated the NameNode process.

We then went through the JournalNode logs and found that all three nodes had in fact completed the edits file writes normally:

2023-05-29 22:01:00,255 INFO  server.Journal (Journal.java:scanStorageForLatestEdits(211)) - Latest log is EditLogFile(file=/hadoop/hdfs/journal/journalid/current/edits_inprogress_0000000000008180981,first=0000000000008180981,last=0000000000008180985,inProgress=true,hasCorruptHeader=false) ; journal id: journalid
2023-05-29 22:01:00,308 INFO  server.Journal (Journal.java:getSegmentInfo(740)) - getSegmentInfo(8180981): EditLogFile(file=/hadoop/hdfs/journal/journalid/current/edits_inprogress_0000000000008180981,first=0000000000008180981,last=0000000000008180985,inProgress=true,hasCorruptHeader=false) -> startTxId: 8180981 endTxId: 8180985 isInProgress: true ; journal id: journalid
2023-05-29 22:01:00,308 INFO  server.Journal (Journal.java:prepareRecovery(784)) - Prepared recovery for segment 8180981: segmentState { startTxId: 8180981 endTxId: 8180985 isInProgress: true } lastWriterEpoch: 92 lastCommittedTxId: 8180984 ; journal id: journalid
2023-05-29 22:01:00,378 INFO  server.Journal (Journal.java:getSegmentInfo(740)) - getSegmentInfo(8180981): EditLogFile(file=/hadoop/hdfs/journal/journalid/current/edits_inprogress_0000000000008180981,first=0000000000008180981,last=0000000000008180985,inProgress=true,hasCorruptHeader=false) -> startTxId: 8180981 endTxId: 8180985 isInProgress: true ; journal id: journalid
2023-05-29 22:01:00,378 INFO  server.Journal (Journal.java:acceptRecovery(872)) - Skipping download of log startTxId: 8180981 endTxId: 8180985 isInProgress: true: already have up-to-date logs ; journal id: journalid
2023-05-29 22:01:00,386 WARN  util.AtomicFileOutputStream (AtomicFileOutputStream.java:close(96)) - Unable to delete tmp file /hadoop/hdfs/journal/journalid/current/paxos/8180981.tmp
2023-05-29 22:01:00,386 INFO  server.Journal (Journal.java:acceptRecovery(905)) - Accepted recovery for segment 8180981: segmentState { startTxId: 8180981 endTxId: 8180985 isInProgress: true } acceptedInEpoch: 93 ; journal id: journalid
2023-05-29 22:01:00,424 INFO  server.Journal (Journal.java:finalizeLogSegment(634)) - Validating log segment /hadoop/hdfs/journal/journalid/current/edits_inprogress_0000000000008180981 about to be finalized ; journal id: journalid
2023-05-29 22:01:00,426 INFO  namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(143)) - Finalizing edits file /hadoop/hdfs/journal/journalid/current/edits_inprogress_0000000000008180981 -> /hadoop/hdfs/journal/journalid/current/edits_0000000000008180981-0000000000008180985
2023-05-29 22:01:00,566 INFO  namenode.TransferFsImage (TransferFsImage.java:copyFileToStream(396)) - Sending fileName: /hadoop/hdfs/journal/journalid/current/edits_0000000000008180981-0000000000008180985, fileSize: 1048576. Sent total: 1048576 bytes. Size of last segment intended to send: -1 bytes.

Further digging showed that the edits files on the three JournalNodes were not consistent with one another. The field team had mentioned that the NameNodes had been restarted many times earlier on, and Ambari had been alerting that no checkpoint had run for several months, so our hypothesis was that the edits no longer matched the fsimage and that this mismatch was behind the failures. We therefore decided to manually rebuild the standby NameNode and the JournalNodes.
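
Two checks that helped confirm this picture; a rough sketch only, in which the hostnames (jn1, jn2, jn3, active-nn), the journal directory and the NameNode web port 50070 are assumptions taken from the log paths above and the Hadoop 2.x defaults:

# Compare the edits segments present on each JournalNode:
for h in jn1 jn2 jn3; do
  echo "== $h =="
  ssh "$h" 'ls /hadoop/hdfs/journal/journalid/current/ | tail -n 5'
done

# Check how far behind checkpointing is; the FSNamesystem bean on the active NameNode
# exposes LastCheckpointTime and TransactionsSinceLastCheckpoint:
curl -s 'http://active-nn:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
  | grep -E 'LastCheckpointTime|TransactionsSinceLastCheckpoint'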

Rebuilding the JournalNodes

  • Stop both NameNodes and all three JournalNodes
  • Clear all edits files from each JournalNode's current directory, both the finalized segments and the in-progress one (see the sketch after this list)
  • Start the JournalNodes again
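
A sketch of the cleanup step, assuming the journal directory seen in the logs above (dfs.journalnode.edits.dir=/hadoop/hdfs/journal, journal id journalid); run it on every JournalNode only after all NameNode and JournalNode processes have been stopped, and keep a backup in case the change has to be rolled back:

# Back up the journal directory, then remove every edits segment,
# both the finalized edits_* files and the edits_inprogress_* file:
JN_DIR=/hadoop/hdfs/journal/journalid/current
tar czf /tmp/journal-backup-$(hostname).tar.gz "$JN_DIR"
rm -f "$JN_DIR"/edits_*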

Rebuilding the standby NameNode

  • Clear the standby NameNode's current directory, keeping only the seen_txid and VERSION files
  • Copy the latest fsimage and its .md5 file over from the active NameNode, and change their owner and group to hdfs:hadoop (see the sketch after this list)
  • Start the standby NameNode and wait for it to sync the edits files from the JournalNodes and merge them into the fsimage automatically
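
A sketch of the two preparation steps, assuming an HDP-style layout with dfs.namenode.name.dir=/hadoop/hdfs/namenode and active-nn as a placeholder for the active NameNode's hostname; run on the standby NameNode while both NameNodes are stopped:

# Keep only seen_txid and VERSION in the standby's current directory (back it up first):
NN_DIR=/hadoop/hdfs/namenode/current
tar czf /tmp/standby-nn-backup.tar.gz "$NN_DIR"
find "$NN_DIR" -maxdepth 1 -type f ! -name seen_txid ! -name VERSION -delete

# Pull the newest fsimage and its .md5 companion from the active NameNode and fix ownership:
LATEST=$(ssh active-nn 'ls /hadoop/hdfs/namenode/current/fsimage_* | grep -v md5 | sort | tail -n 1')
scp "active-nn:${LATEST}" "active-nn:${LATEST}.md5" "$NN_DIR"/
chown hdfs:hadoop "$NN_DIR"/fsimage_*

Once the standby NameNode is started again, it loads the copied fsimage and replays the edits it pulls from the JournalNodes on top of it.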

Resolution

After rebuilding the JournalNodes and the standby NameNode, the NameNode has been running normally for more than 24 hours; the problem is resolved.
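
Two quick checks that the cluster is healthy and checkpointing again; nn1 and nn2 are the NameNode IDs from dfs.ha.namenodes.<nameservice> and standby-nn is a placeholder hostname, so both need to be adjusted to the actual cluster:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2    # the two calls should report one "active" and one "standby"
# Fresh fsimage_* files should keep appearing on the standby as new checkpoints complete:
ssh standby-nn 'ls -lt /hadoop/hdfs/namenode/current/fsimage_* | head -n 4'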
