1. NodeManager fails to start
2013-07-25 20:06:22,266 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:196)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:329)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:351)
Caused by: org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.start(ContainerManagerImpl.java:248)
at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
... 3 more
Caused by: org.apache.hadoop.yarn.YarnException: Failed to check for existence of remoteLogDir [/yarn/apps]
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:179)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.start(LogAggregationService.java:132)
at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
... 5 more
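The /yarn/apps directory actually does exist.
After a restart the NodeManager came up again, for no obvious reason.
This sometimes happens because the IP address is wrong: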
SHUTDOWN_MSG: Shutting down NodeManager at localhost.localdomain/192.168.1.109
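The IP in the log is not the machine's current IP. Once the IP has been configured correctly, manually or automatically, restart the NodeManager.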
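The comparison between the logged address and the machine's real address can be sketched in a few lines of Python. This is only an illustration: `logged_address` and `ip_is_stale` are hypothetical helper names, and the set of current IPs is assumed to be supplied by the caller (for example, copied from the output of the system's network tools).

```python
import re

# Matches lines such as:
#   SHUTDOWN_MSG: Shutting down NodeManager at localhost.localdomain/192.168.1.109
SHUTDOWN_RE = re.compile(
    r"SHUTDOWN_MSG: Shutting down \w+ at (\S+)/(\d{1,3}(?:\.\d{1,3}){3})"
)


def logged_address(line):
    """Return (hostname, ip) from a SHUTDOWN_MSG line, or None if it does not match."""
    match = SHUTDOWN_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


def ip_is_stale(line, current_ips):
    """True if the line carries an IP that is not among the machine's current IPs."""
    address = logged_address(line)
    return address is not None and address[1] not in current_ips


if __name__ == "__main__":
    msg = "SHUTDOWN_MSG: Shutting down NodeManager at localhost.localdomain/192.168.1.109"
    # Assumption for the demo: the machine currently holds 192.168.0.137.
    print(logged_address(msg), ip_is_stale(msg, {"192.168.0.137"}))
```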
2. NodeManager fails to start again; this is a more common error
Caused by: java.net.ConnectException: Call From localhost.localdomain/192.168.1.109 to localhost.localdomain:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
SHUTDOWN_MSG: Shutting down NodeManager at localhost.localdomain/192.168.1.109
************************************************************/
Check the hosts file:
192.168.1.109 localhost localhost.localdomain
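To see at a glance which IP each name in the hosts file maps to, the entries can be parsed with a small script. A minimal sketch: `parse_hosts` is a hypothetical helper, and keeping only the first mapping seen for a name is an assumption of this sketch, not a statement about how the system resolver behaves.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the IP on its line."""
    mapping = {}
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Sketch assumption: keep the first mapping seen for a name.
            mapping.setdefault(name, ip)
    return mapping


if __name__ == "__main__":
    sample = "192.168.1.109 localhost localhost.localdomain\n"
    for name, ip in parse_hosts(sample).items():
        print(name, "->", ip)
```

On a real node the text would come from reading /etc/hosts instead of the inline sample.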
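Check the YARN monitoring page http://192.168.1.109:8088/: it cannot be reached.
The system does have an RM process running.
Checked the RM log: there is no startup log. Whenever Debug options are added to the RM process, it stops producing logs, so the options were evidently still not set correctly.
After adjusting the options and starting again, once Eclipse is attached to the debug port, jps no longer reports the cannot sync .. error.
But the NodeManager still did not start, and the log shows the same error as above, port 8031 again: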
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>127.0.0.1:8031</value>
<description>
host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.
</description>
</property>
The YARN monitoring page is now reachable, and port 8031 is in fact being listened on:
[root@localhost yuming]# netstat -tln | grep 8031
tcp6 0 0 192.168.1.109:8031 :::* LISTEN
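netstat only shows that something is listening. Whether a client can actually connect to a given address and port can be checked with a short probe. This is a sketch assuming Python is available on the node; `can_connect` is a hypothetical helper and not part of Hadoop. The host and port in the demo are the values from the log above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Probe the same name the NodeManager used, then the address netstat reported.
    for host in ("localhost.localdomain", "192.168.1.109"):
        print(host, 8031, can_connect(host, 8031))
```

If the two probes disagree, the name and the listening address do not point at the same endpoint, which is worth ruling out before looking elsewhere.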
3. After adding debug options to the RM, the NM fails to start yet again:
RM log:
2013-07-29 09:40:48,750 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031
NM log:
2013-07-29 09:36:17,783 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
Caused by: java.net.ConnectException: Call From localhost.localdomain/192.168.0.137 to localhost.localdomain:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
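The timestamps show that when the NM tried to connect to port 8031, the port was not up yet; the gap was 4 seconds, because the RM was waiting for the debugger to attach.
Starting the NM again on its own works.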
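Rather than starting the NM by hand a second time, one option is to wait until the RM's port 8031 accepts connections and only then start the NM. A minimal sketch of that wait loop: `wait_for_port` is a hypothetical helper, and the timeout and polling interval are arbitrary values chosen for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll (host, port) until a TCP connection succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port("localhost.localdomain", 8031, timeout=60.0):
        print("8031 is accepting connections; the NM can be started now")
    else:
        print("8031 never came up; check the RM first")
```

The loop only checks that the port is open, so the NM still has to be started afterwards in whatever way is normally used.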