Error starting NodeManager-Service NodeManager failed in state INITED; cause: java.lang.Null


Version: CDH 5.2

1. Exception log

2014-11-25 14:21:27,873 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Adding container_1415870304563_7717_01_000011 to recently stopped containers
2014-11-25 14:21:27,874 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
2014-11-25 14:21:27,883 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
2014-11-25 14:21:27,884 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
2014-11-25 14:21:27,886 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2014-11-25 14:21:27,886 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2014-11-25 14:21:27,886 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2014-11-25 14:21:27,886 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
2014-11-25 14:21:27,889 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************


2. Analysis:

From the LevelDB documentation:
LevelDB never writes in place: it always appends to a log file, or merges existing files together to produce new ones. So an OS crash will cause a partially written log record (or a few partially written log records). LevelDB recovery code uses checksums to detect this and will skip the incomplete records.

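Based on the above, this issue occurs when the incomplete record is the CONTAINER_REQUEST_KEY_SUFFIX record, which stores the container's startRequest: LevelDB skips the record during recovery, so the recovered container has no startRequest and recoverContainer throws the NPE. The NM cannot protect against an OS crash, so error-handling code must be added to keep the NM from shutting down because of the NPE. This justifies the patch.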
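
The error handling described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: RecoveredState and RecoverySketch are hypothetical stand-ins for the NodeManager's recovery classes, and the start request is modeled as a plain String. The idea is that a container whose start request record is missing gets skipped with a warning instead of being dereferenced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one container's recovered state (not the real YARN class).
final class RecoveredState {
    final String containerId;
    final String startRequest; // null when LevelDB skipped the request record

    RecoveredState(String containerId, String startRequest) {
        this.containerId = containerId;
        this.startRequest = startRequest;
    }
}

final class RecoverySketch {
    // Returns the IDs of the containers that were recovered; incomplete ones are skipped.
    static List<String> recoverAll(List<RecoveredState> states) {
        List<String> recovered = new ArrayList<>();
        for (RecoveredState state : states) {
            if (state.startRequest == null) {
                // Without this guard, the dereference below would throw the NPE.
                System.err.println("Skipping " + state.containerId
                        + ": start request record is missing");
                continue;
            }
            int requestLength = state.startRequest.length(); // safe: checked above
            System.out.println("Recovered " + state.containerId
                    + " (request length " + requestLength + ")");
            recovered.add(state.containerId);
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<RecoveredState> states = Arrays.asList(
                new RecoveredState("container_A", "launch-context"),
                new RecoveredState("container_B", null)); // simulates the lost record
        System.out.println("Recovered containers: " + recoverAll(states));
    }
}
```

With a guard of this kind, one damaged record costs a single container's recovery rather than the whole NodeManager startup.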


3. Solution
Delete /tmp/hadoop-yarn/ and then restart the NodeManager.
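
A more cautious variant of this step is to move the directory aside rather than delete it, so the damaged state can still be inspected later. The sketch below assumes the NodeManager is already stopped and that the process may rename the directory; StateDirReset and moveAside are hypothetical names introduced only for this illustration, and the default path is the one given above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class StateDirReset {
    // Renames dir to a sibling named "<name>.bak-<suffix>" and returns the new path.
    // Returns null when dir does not exist or is not a directory.
    static Path moveAside(Path dir, String suffix) throws IOException {
        if (!Files.isDirectory(dir)) {
            return null;
        }
        Path backup = dir.resolveSibling(dir.getFileName() + ".bak-" + suffix);
        return Files.move(dir, backup);
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the solution above; pass another path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/hadoop-yarn");
        Path backup = moveAside(dir, Long.toString(System.currentTimeMillis()));
        System.out.println(backup == null
                ? "Nothing to move at " + dir
                : "Moved " + dir + " to " + backup);
    }
}
```

After the directory is moved (or deleted), restart the NodeManager so that it starts with a fresh state directory.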
