项目场景:
配置YARN的HA模式,单独启动Resourcemanager进程,感受YARN HA模式的故障自动转移功能
问题描述:
配置好YARN的HA模式,并且使用脚本同步后,在hadoop102和hadoop103节点中启动Resourcemanager进程都失败
[henry@hadoop103 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-henry-resourcemanager-hadoop103.out
[henry@hadoop103 ~]$ jps
2486 DataNode
2428 ZooKeeperMain
2844 Jps
2589 JournalNode
2350 QuorumPeerMain
#可以看到,没有Resourcemanager进程。
原因分析:
yarn-site.xml 文件配置有问题
解决方案:
①查看日志信息,看看是什么问题。2021-09-05 20:38:16,339 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.ConnectException: Call From hadoop102/192.168.128.102 to hadoop102:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor9.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy14.sendHeartbeat(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:554)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
可以看到,提示是连接失败异常
②尝试启动namenode,同样没有进程,在浏览器也无法以主机名+端口的形式登录查看。
③查看namenode的启动日志:
[henry@hadoop102 ~]$ cat /opt/module/hadoop-2.7.2/logs/hadoop-henry-namenode-hadoop102.out
[Fatal Error] yarn-site.xml:17:52: The string "--" is not permitted within comments.
(kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 3816
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
提示yarn-site.xml 文件中注释的书写有误!
⑤把yarn-site.xml 文件中注释的书写改正确,重新启动Resourcemanager,成功。