Common Errors When Configuring Hadoop HA (High-Availability Cluster) and How to Fix Them

 

   Nothing is settled yet; any of us could still be the dark horse.

 

 

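While learning Hadoop, I ran into the following errors when configuring an HA cluster. Here is how I resolved each one.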

 

1. An error occurs when starting the NameNode on the second node.

 

InconsistentFSStateException: Directory /opt/modules/hadoopha/hadoop-2.5.2/data/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

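Cause: the version IDs in the VERSION file under the storage location defined in core-site.xml do not match. For example, suppose the location is configured like this: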

<configuration>
  <property>
    <!-- HDFS address; in HA this points to the nameservice -->
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <!-- base directory for Hadoop temporary files -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoopha/hadoop-2.5.2/data/tmp</value>
  </property>
</configuration>

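Go into the name directory under tmp and delete the VERSION file. Then, from the hadoop-2.5.2/bin directory, reformat:

hadoop namenode -format

Restart the NameNode afterwards and the problem is resolved.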
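Before deleting anything, it can help to confirm what state the storage directory is actually in, since the exception above mentions a directory that "does not exist or is not accessible". The following is a minimal sketch, not part of the original procedure: the function name check_name_dir is hypothetical, and it assumes the usual Hadoop 2.x layout in which the VERSION file sits in a current/ subdirectory of the name directory.

```shell
# Report whether a NameNode storage directory is usable.
# Usage: check_name_dir /path/to/dfs/name
# (check_name_dir is a hypothetical helper, not a Hadoop command.)
check_name_dir() {
  dir="$1"
  # The directory must exist.
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # The current user must be able to read, write, and enter it.
  if [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
    echo "not accessible: $dir"
    return 1
  fi
  # Assumed layout: a formatted directory contains current/VERSION.
  if [ -f "$dir/current/VERSION" ]; then
    echo "formatted: $dir"
  else
    echo "unformatted: $dir"
  fi
}
```

With the configuration shown above, it would be called as check_name_dir /opt/modules/hadoopha/hadoop-2.5.2/data/tmp/dfs/name, run as the same user that starts the NameNode.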

 

2. The NameNode fails to start with the following error:

.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown:

Check the following setting in hdfs-site.xml:

  <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3/ns1</value>
  </property>

Then stop the Hadoop cluster with sbin/stop-all.sh and restart the NameNode.
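When checking dfs.namenode.shared.edits.dir, it is easy to overlook an entry that is written differently from the others. As a rough sketch, the hypothetical helper list_journalnodes below splits a qjournal URI into its JournalNode entries and marks any entry that has no explicit port. It only inspects the string; whether a missing port is the cause of the error in a given setup is not something this check can decide.

```shell
# Print each JournalNode entry of a qjournal URI, one per line,
# flagging entries that do not carry an explicit port.
# (list_journalnodes is a hypothetical helper, not a Hadoop command.)
list_journalnodes() {
  uri="$1"
  # Strip the scheme and the trailing /journalId.
  hosts="${uri#qjournal://}"
  hosts="${hosts%%/*}"
  # Split on ';' and inspect each entry.
  printf '%s\n' "$hosts" | tr ';' '\n' | while IFS= read -r entry; do
    case "$entry" in
      *:*) echo "$entry" ;;
      *)   echo "$entry (no explicit port)" ;;
    esac
  done
}
```

For example, list_journalnodes 'qjournal://hadoop1:8485;hadoop2:8485;hadoop3/ns1' prints the first two entries unchanged and marks hadoop3 as having no explicit port.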

 

3. An error occurs when switching the NameNode state to Active


Operation failed: Failed on local exception: java.io.EOFException; Host Details : local host is:destination host is

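At first I deleted the Hadoop folder, extracted it again, and reconfigured the environment variables, but the problem was still there.

The error is caused by formatting the node multiple times. I am not sure of the exact underlying reason.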

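Solution:

Go to the NameNode directory you configured, enter data/dfs/..., and delete the name folder.

Then reformat: bin/hdfs namenode -format

After formatting, restart the virtual machine with reboot and start the NameNode again. Do not format a second time; go straight to

bin/hdfs haadmin -transitionToActive nn1

to change the state.

Check the web UI on port 50070: success!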
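Besides looking at the web UI, the state can also be confirmed from the command line with hdfs haadmin -getServiceState, which prints active or standby. A minimal sketch follows; the function name activate_and_verify is hypothetical, and the path to the hdfs launcher and the NameNode ID are passed in as parameters because they depend on your installation.

```shell
# Switch a NameNode to active and confirm the result.
# $1: path to the hdfs launcher (for example bin/hdfs)
# $2: NameNode ID from dfs.ha.namenodes.* (for example nn1)
# (activate_and_verify is a hypothetical helper.)
activate_and_verify() {
  hdfs_bin="$1"
  nn_id="$2"
  # Request the transition; give up if the command itself fails.
  "$hdfs_bin" haadmin -transitionToActive "$nn_id" || return 1
  # Ask the NameNode which state it is in now.
  state="$("$hdfs_bin" haadmin -getServiceState "$nn_id")"
  echo "$nn_id is $state"
  # Succeed only if the NameNode reports itself as active.
  [ "$state" = "active" ]
}
```

From the Hadoop installation directory this would be run as activate_and_verify bin/hdfs nn1.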

 

4. Formatting the NameNode fails with a connection-refused error, as shown below:

org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown:
192.168.129.128:8485: Call From bigdata-senior01/192.168.129.128 to bigdata-senior01:8485 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
	at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:875)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:922)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1354)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1473)
19/04/04 19:28:36 INFO ipc.Client: Retrying connect to server: bigdata-senior02/192.168.129.130:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:28:36 INFO ipc.Client: Retrying connect to server: bigdata-senior03/192.168.129.133:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:28:36 FATAL namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown:
192.168.129.128:8485: Call From bigdata-senior01/192.168.129.128 to bigdata-senior01:8485 failed on connection exception: java.n

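   I first deleted the data files under the temporary directory tmp in the Hadoop directory, then deleted the directory that hadoop.tmp.dir points to in core-site.xml. (I also cleared the logs, although I think that step is optional.) I then ran bin/hdfs namenode -format again and still got the error. After some reading, I found that the JournalNodes must be started before formatting: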

 sbin/hadoop-daemon.sh start journalnode

Then run the format command again. This time the format succeeds.
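Since the format step fails when the JournalNode port refuses connections, it may be worth checking that port on every JournalNode host before formatting. The sketch below is one way to do that; the function names port_open and check_journalnodes are hypothetical, and it assumes bash and the coreutils timeout command are available, because it relies on bash's /dev/tcp redirection.

```shell
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Relies on bash's /dev/tcp feature (assumption: bash and timeout exist).
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print the reachability of one port on each given host.
# $1: port, remaining arguments: host names
# (check_journalnodes is a hypothetical helper.)
check_journalnodes() {
  port="$1"
  shift
  for host in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port unreachable"
    fi
  done
}
```

For the cluster in the log above, the call would be check_journalnodes 8485 bigdata-senior01 bigdata-senior02 bigdata-senior03. Any host reported as unreachable still needs sbin/hadoop-daemon.sh start journalnode, or has a firewall in the way.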



5. An error occurs when running -bootstrapStandby on the second NameNode

  

:57:35 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
19/04/04 19:57:35 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
19/04/04 19:57:37 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:38 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:39 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:40 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:41 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:42 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:43 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:44 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:45 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:46 INFO ipc.Client: Retrying connect to server: bigdata-senior01/192.168.129.128:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/04/04 19:57:46 FATAL ha.BootstrapStandby: Unable to fetch namespace information from active NN at bigdata-senior01/192.168.129.128:8020: Call From bigdata-senior02/192.168.129.130 to bigdata-senior01:8020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
19/04/04 19:57:46 INFO util.ExitUtil: Exiting with status 2
19/04/04 19:57:46 INFO namenode.NameNode: SHUTDOWN_MSG: 

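  First check whether the firewall is turned off; it is usually configured to stay disabled at boot. The other likely cause is that the NameNode on the primary machine has not been started.

 So start the NameNode on the first node, and then run the bootstrap on the second node: bin/hdfs namenode -bootstrapStandby

When you see Storage directory /home/xxx/xxx/name has been successfully formatted, the bootstrap has succeeded.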
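The bootstrap output is long, so the success line is easy to miss. As a small sketch, the hypothetical wrapper bootstrap_standby below runs the command and reports whether that line appeared. It captures both stdout and stderr in case the log is written to stderr, and it takes the path to the hdfs launcher as a parameter.

```shell
# Run -bootstrapStandby and report whether the success message appeared.
# $1: path to the hdfs launcher (for example bin/hdfs)
# (bootstrap_standby is a hypothetical helper.)
bootstrap_standby() {
  hdfs_bin="$1"
  # Capture both streams; keep going on a nonzero exit so the
  # output can still be inspected.
  output="$("$hdfs_bin" namenode -bootstrapStandby 2>&1)" || true
  if printf '%s\n' "$output" | grep -q "has been successfully formatted"; then
    echo "bootstrap succeeded"
  else
    echo "bootstrap failed"
    return 1
  fi
}
```

On the second NameNode this would be run as bootstrap_standby bin/hdfs, after the first NameNode is up.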
