NodeManager and ResourceManager fail to start in a CDH environment


1. Possible causes of NodeManager failing to start

1.1 Other NodeManagers may have been added to the cluster while this NodeManager was stopped, so a check fails when this NodeManager starts up again

Possible error message:
org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/xxxxx.sst

2022-05-05 11:24:11,415 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:281)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:354)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
	at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
	at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
	at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:1517)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1504)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:342)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	... 5 more

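Solution: the stack trace shows the NodeManager failing while it opens its LevelDB recovery store, so delete everything under /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state on the machine that hosts this NodeManager. This discards the saved recovery state, so run it only while the NodeManager is stopped.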

rm -rf /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/*
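
If deleting the state outright feels too risky, a more cautious variant is to move the files aside so they can be inspected or restored later. The following is only a minimal sketch under the assumption that the NodeManager is already stopped and GNU coreutils are available; the function name backup_nm_state and the ".bak.<timestamp>" directory naming are hypothetical and are not something CDH provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CDH): move the NodeManager recovery state
# aside instead of deleting it. Assumes the NodeManager on this host is stopped.
backup_nm_state() {
    local state_dir="$1"
    # Sibling directory with a timestamp suffix, e.g. yarn-nm-state.bak.20220505112411
    local backup_dir="${state_dir%/}.bak.$(date +%Y%m%d%H%M%S)"

    [ -d "$state_dir" ] || { echo "no such directory: $state_dir" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1

    # Move every top-level entry (including dotfiles) out of the state directory.
    # "mv -t" is a GNU coreutils option.
    find "$state_dir" -mindepth 1 -maxdepth 1 -exec mv -t "$backup_dir" {} +

    # Print where the old state went so it can be removed later.
    echo "$backup_dir"
}

# Example, using the path from the error message above:
# backup_nm_state /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state
```

The state directory itself is left in place and empty, which has the same effect on the next start as the rm command above. Once the NodeManager is running normally again, the backup directory can be deleted.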

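Before starting the NodeManager, check whether port 8041 is in use. No output means the port is free. If the port is held by a NodeManager process, kill that process. If it is held by some other process, find out which one and either shut it down or give the NodeManager a different port. To change the default port, search the configuration for yarn.nodemanager.address.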

lsof -i:8041

1.2 The port the NodeManager listens on may already be in use

Possible error message:
java.net.BindException: Address already in use;

INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master01:8041] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException

Solution: as described in section 1.1, check whether the port configured in yarn.nodemanager.address is in use, then restart the affected NodeManager from the CDH web UI.
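
The port to check does not have to be guessed, because the BindException message names it. As a rough sketch, the snippet below pulls the port out of a "Problem binding to [host:port]" line and then asks lsof who is listening on it. The function names extract_bind_port and who_listens are made up for this illustration, and the example assumes lsof is installed on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the port from a
# "Problem binding to [host:port]" log line. Prints nothing if no match.
extract_bind_port() {
    printf '%s\n' "$1" |
        sed -n 's/.*Problem binding to \[[^]]*:\([0-9][0-9]*\)].*/\1/p'
}

# Hypothetical helper: list the process, if any, listening on a TCP port.
# -n and -P skip host and port name lookups; requires lsof.
who_listens() {
    lsof -nP -iTCP:"$1" -sTCP:LISTEN
}

# Example:
# line=$(grep -h 'Problem binding to' /var/log/hadoop-yarn/*NODEMANAGER* | tail -n 1)
# who_listens "$(extract_bind_port "$line")"
```

The same two functions work for the ResourceManager error, since that message has the same "[host:port]" shape.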

2. ResourceManager fails to start

The errors I ran into here are shown below. They were all port-in-use errors, so the fix is the same as in the previous section.

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master01:8031] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException

3. How to view CDH logs

3.1 View the service's logs in the web UI. This sometimes does not work because of a problem with the service itself.

3.2 View the log files on the server

Log in to the machine that runs the service you want to inspect.

# Log files for services installed by CDH are mostly located here
[root@slave01 ~]# cd /var/log/
[root@slave01 log]# ll
total 3160
......
drwxrwxr-x  3 hdfs         hadoop             4096 May  5 14:29 hadoop-hdfs
drwxrwxr-x  3 yarn         hadoop             4096 May  5 14:31 hadoop-yarn
......
# The NodeManager discussed above belongs to YARN, so go into hadoop-yarn
[root@slave01 log]# cd hadoop-yarn/
[root@slave01 hadoop-yarn]# ll
total 2868
-rw-r--r-- 1 yarn yarn   2925441 May  5 14:31 hadoop-cmf-yarn-NODEMANAGER-slave01.log.out
-rw-r--r-- 1 yarn yarn         0 May  4 12:59 SecurityAuth-yarn.audit
drwxr-xr-x 2 yarn hadoop    4096 May  4 14:11 stacks

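In /var/log/hadoop-yarn there is a log file whose name contains NODEMANAGER. Reading that file shows exactly why the error was thrown, and you can then apply the matching fix. Logs for other services can be found the same way, by looking for the service name in the log file name.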
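
These log files can be several megabytes, so it may help to filter for the lines that matter instead of reading from the top. The following is a small sketch; the function name recent_failures and the choice of search patterns are my own assumptions, picked to match the error lines quoted earlier in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last few lines of a log that look like failures.
# Usage: recent_failures <log_file> [max_lines]   (max_lines defaults to 20)
recent_failures() {
    local log_file="$1"
    local max_lines="${2:-20}"
    # Patterns chosen to match the messages shown in sections 1 and 2.
    grep -E 'ERROR|FATAL|failed in state|Caused by:' "$log_file" | tail -n "$max_lines"
}

# Example, using the file name from the listing above:
# recent_failures /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-slave01.log.out
```

The patterns are deliberately broad. If the output is too noisy, narrowing them to "failed in state" alone usually points at the service that could not start.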
