YARN: cluster nodes lost, and no usable nodes even after a restart

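Troubleshooting steps:

The YARN web page showed 0 for every available resource. (The screenshot in the original post was taken after the fix.)

I searched through a lot of material online. Some posts suggest setting yarn.nodemanager.vmem-check-enabled to false, which turns off the virtual-memory check on containers. I tried it and it did not help.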

I then watched the Hadoop logs:

tail -f hadoop-root-nodemanager-craw-node212.log

tail -f hadoop-root-resourcemanager-craw-node212.log

hadoop-root-nodemanager-craw-node212.log contained the following:

2022-04-08 14:00:50,775 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir error, used space above threshold of 90.0%, removing from list of valid directories
2022-04-08 14:00:50,775 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /home/data/software/hadoop-3.2.2/logs/userlogs error, used space above threshold of 90.0%, removing from list of valid directories
2022-04-08 14:00:50,776 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed: 1/1 local-dirs usable space is below configured utilization percentage/no more usable space [ /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir : used space above threshold of 90.0% ] ; 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /home/data/software/hadoop-3.2.2/logs/userlogs : used space above threshold of 90.0% ] 
2022-04-08 14:00:50,776 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs usable space is below configured utilization percentage/no more usable space [ /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir : used space above threshold of 90.0% ] ; 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /home/data/software/hadoop-3.2.2/logs/userlogs : used space above threshold of 90.0% ] 
2022-04-08 14:00:50,797 INFO org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.ResourceCalculatorPlugin@54e041a4
2022-04-08 14:00:50,798 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
2022-04-08 14:00:50,800 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
2022-04-08 14:00:50,800 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: AMRMProxyService is disabled

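The cause: the NodeManager detected that local disk usage had exceeded 90%. With 1/1 local-dirs and 1/1 log-dirs rejected, the node had no usable directories left.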
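
To confirm this on a node, the flagged directories can be pulled out of the NodeManager log and compared against what df reports. The following is a minimal sketch, not part of the original fix: the function names flagged_dirs and disk_used_pct are hypothetical, and df's percentage may differ slightly from the figure the NodeManager computes internally.

```shell
#!/bin/sh

# Read NodeManager log text on stdin and print, once each, every directory
# that the disk health checker reported as being above its usage threshold.
flagged_dirs() {
  sed -n -E 's/.*DirectoryCollection: Directory ([^ ]+) error, used space above threshold.*/\1/p' | sort -u
}

# Print the used-space percentage (integer, without the % sign) of the
# filesystem that holds the given path, as reported by df.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage (log file name taken from this post):
#   flagged_dirs < hadoop-root-nodemanager-craw-node212.log |
#     while read -r dir; do
#       echo "$dir: $(disk_used_pct "$dir")% used"
#     done
```

If the percentages printed are above the threshold named in the log (90.0% here), the disk itself is the problem rather than any memory setting.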

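Solution (either of the following):

1 Delete whatever is no longer needed on the node until disk usage drops below 90%.

2 Add the following to yarn-site.xml and restart the NodeManager. It lowers the minimum fraction of healthy disks to 0 and raises the per-disk utilization limit to 100%, so a disk is rejected only once it is completely full: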

  <property>
    <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
    <value>0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>100.0</value>
  </property>
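
For the first option, it helps to know roughly how much space has to be freed before the directory passes the check again. The sketch below estimates that from df's block counts. It assumes the check passes once free space is at least (100 - threshold)% of the filesystem's total size, which is an approximation of the NodeManager's own calculation; the function names kib_to_free and kib_to_free_for_path are hypothetical.

```shell
#!/bin/sh

# Given total and available space in KiB and a usage threshold in percent,
# print how many KiB must be freed so that usage drops to the threshold.
# Prints 0 when usage is already at or below the threshold.
kib_to_free() {
  awk -v total="$1" -v avail="$2" -v pct="$3" 'BEGIN {
    need = total * (100 - pct) / 100 - avail
    if (need < 0) need = 0
    printf "%.0f\n", need
  }'
}

# Apply kib_to_free to the filesystem that holds a path, using df's
# 1024-byte block counts (column 2 = total, column 4 = available).
kib_to_free_for_path() {
  space=$(df -P -k "$1" | awk 'NR == 2 { print $2, $4 }')
  kib_to_free "${space% *}" "${space#* }" "$2"
}

# Usage (path taken from the log above; 90 is the threshold it reports):
#   kib_to_free_for_path /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir 90
# To see where the space went, largest subdirectories first:
#   du -k -d 1 /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir | sort -rn | head
```

Freeing space this way keeps the health check active, whereas the configuration in option 2 only moves the limit.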

 

In addition, the error above also causes the following error. Fixing the disk usage problem resolves it as well.

2022-04-08 14:06:51,033 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Could not carry out resource dir checks for /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir, which was marked as good
java.io.FileNotFoundException: File /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir/filecache does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:668)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:989)
