NodeManager fails to start on a CDH cluster

Summary of a NodeManager startup failure:

The error is as follows:

11:36:07.339 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource
Resource hdfs://ns1/user/hdfs/.staging/job_1451440500748_16896/libjars/htrace-core-2.04.jar(->/dfs/data3/yarn/nm/usercache/hdfs/filecache/1168/htrace-core-2.04.jar) transitioned from INIT to LOCALIZED
11:36:07.339 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
Recovering localized resource { hdfs://ns1/user/hdfs/.staging/job_1451440500748_16896/libjars/hbase-hadoop-compat.jar, 1452201034053, FILE, null } at /dfs/data3/yarn/nm/usercache/hdfs/filecache/1170/hbase-hadoop-compat.jar
11:36:07.339 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource
Resource hdfs://ns1/user/hdfs/.staging/job_1451440500748_16896/libjars/hbase-hadoop-compat.jar(->/dfs/data3/yarn/nm/usercache/hdfs/filecache/1170/hbase-hadoop-compat.jar) transitioned from INIT to LOCALIZED
11:36:07.369 PM INFO org.apache.hadoop.service.AbstractService
Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
11:36:07.380 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
11:36:07.381 PM INFO org.apache.hadoop.service.AbstractService
Service NodeManager failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
11:36:07.382 PM INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl
Stopping NodeManager metrics system...
11:36:07.383 PM INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl
NodeManager metrics system stopped.
11:36:07.383 PM INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl
NodeManager metrics system shutdown complete.
11:36:07.383 PM FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager
Error starting NodeManager
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
11:36:07.386 PM INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager
SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at DN-BJ-MXY-2-220/10.26.2.220
************************************************************/



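The likely cause is that jobs were still running in the cluster when YARN was stopped. This can happen when the machine hosting the cluster's NameNode fails, or when someone stops YARN manually. The stack trace shows that the NullPointerException is thrown in ContainerManagerImpl.recoverContainer, that is, while the NodeManager is recovering container state during startup.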
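To confirm that a failed start matches this case, the log can be checked for the two markers seen above. The following is a minimal Python sketch, not part of the original troubleshooting; the function name `is_recovery_npe` and the regular expression are illustrative assumptions based only on the log excerpt in this post.

```python
import re

# Stack frame that appears in the trace quoted in this post.
RECOVERY_FRAME = "ContainerManagerImpl.recoverContainer"

# Matches lines such as
# "Service NodeManager failed in state INITED; cause: java.lang.NullPointerException".
FAILURE_RE = re.compile(
    r"Service \S+ failed in state INITED; cause: java\.lang\.NullPointerException"
)


def is_recovery_npe(log_text: str) -> bool:
    """Return True if the log shows an INITED-state NPE together with the recoverContainer frame."""
    return bool(FAILURE_RE.search(log_text)) and RECOVERY_FRAME in log_text
```

If the function returns False for your log, the failure probably has a different cause and the cleanup described in this post may not help.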

The fix is to delete the files left behind by the jobs that were running.

Delete the two directories under the local path /tmp/hadoop-yarn/yarn-nm-recovery.
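The cleanup step can be sketched in Python as follows. This is only an illustration under stated assumptions: the path is the one given in this post and may differ on your cluster, the helper name `clear_recovery_dir` is hypothetical, and the dry-run default is a precaution added here, not something the original fix describes.

```python
import shutil
from pathlib import Path

# Path taken from this post; adjust it if your cluster uses a different recovery directory.
RECOVERY_DIR = Path("/tmp/hadoop-yarn/yarn-nm-recovery")


def clear_recovery_dir(recovery_dir: Path = RECOVERY_DIR, dry_run: bool = True) -> list:
    """Remove every subdirectory of recovery_dir and return the paths that were found.

    With dry_run=True (the default) nothing is deleted; the paths are only listed.
    Plain files and symlinks are left untouched.
    """
    if not recovery_dir.is_dir():
        return []
    subdirs = sorted(
        p for p in recovery_dir.iterdir() if p.is_dir() and not p.is_symlink()
    )
    if not dry_run:
        for path in subdirs:
            shutil.rmtree(path)
    return subdirs


if __name__ == "__main__":
    # List what would be removed; pass dry_run=False to actually delete.
    for path in clear_recovery_dir(dry_run=True):
        print("would remove:", path)
```

Run it on the affected host while the NodeManager is down, check the listed paths, then rerun with dry_run=False and start the NodeManager again. Removing these directories discards the saved recovery state, so containers from the interrupted jobs will not be recovered.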

Reference: http://stackoverflow.com/questions/27065011/cdh-5-2-error-starting-nodemanager-service-nodemanager-failed-in-state-inited-c

