When deploying HDFS 3.x in HA mode, the standby NameNode failed to start. The log showed the following error:
[root@centos62 logs]# pwd
/usr/local/hadoop-3.1.2/logs
[root@centos62 logs]# cat hadoop-root-namenode-centos62.log
2019-07-30 17:39:39,766 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:158)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:672)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1097)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
2019-07-30 17:39:39,768 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.FileNotFoundException: No valid image files found
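The exception says that the NameNode found no fsimage file in its metadata directory. A minimal sketch of a check for this condition follows. The has_fsimage helper is hypothetical (it is not part of Hadoop), and it assumes the standard layout in which image files are named fsimage_<txid> and sit under current/ inside the metadata directory:

```shell
#!/bin/sh
# Hypothetical helper, not a Hadoop command.
# Returns 0 if <dir>/current contains at least one fsimage_<txid> file,
# and 1 otherwise. The .md5 checksum files are not counted as images.
has_fsimage() {
    name_dir="$1"
    for f in "$name_dir"/current/fsimage_*; do
        case "$f" in
            *.md5) continue ;;
        esac
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

On the standby node this could be called as has_fsimage /hadoop/namenode || echo "no fsimage found", where /hadoop/namenode is the metadata directory used in this post.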
Fix: copy the active NameNode's persisted metadata files into the standby NameNode's metadata directory, then start the standby NameNode again.
[root@centos60 namenode]# pwd
/hadoop/namenode
[root@centos60 namenode]# ls
current
[root@centos60 namenode]# scp -r current/ root@centos62:/hadoop/namenode/
[root@centos62 sbin]# pwd
/usr/local/hadoop-3.1.2/sbin
[root@centos62 sbin]# start-dfs.sh
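Root cause: during setup, the initialization command hdfs namenode -bootstrapStandby was never run on the standby NameNode node (it requires the primary NameNode to be running already: hdfs namenode). What hdfs namenode -bootstrapStandby does is pull the latest FSImage from the primary NameNode and synchronize the primary's metadata, which is why copying the primary NameNode's metadata directory to the standby node directly also works.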
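For reference, the bootstrap route can be sketched as follows. This is only a sketch under assumptions: the run and bootstrap_standby helpers and the DRY_RUN variable are made up for illustration, the script is meant to be run on the standby host (centos62 in this post) while the active NameNode is up, and hdfs --daemon start namenode is the Hadoop 3.x way of starting a single NameNode daemon.

```shell
#!/bin/sh
# Hypothetical wrapper: print each command, and execute it only when
# DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Hypothetical helper: initialize and start the standby NameNode.
# Run on the standby host while the active NameNode is already running.
bootstrap_standby() {
    # Pull the latest fsimage from the active NameNode into the local
    # metadata directory.
    run hdfs namenode -bootstrapStandby
    # Start the NameNode daemon on this host (Hadoop 3.x syntax).
    run hdfs --daemon start namenode
}
```

Calling bootstrap_standby with DRY_RUN=1 only prints the two commands, which is a safe way to review them before running them for real.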