Most of the disks failed. 1/1 local-dirs have errors: [ /opt/ha/hadoop/data/nm-local-dir : Cannot

Project scenario:

ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs have errors: [ /opt/ha/hadoop/data/nm-local-dir : Cannot create directory : /opt/ha/hadoop/data/nm-local-dir, error mkdir of /opt/ha/hadoop/data/nm-local-dir failed ]

Problem description

While configuring Kerberos for Hadoop, the NodeManager logged the following error, and for a long time I could not resolve it:

2023-06-20 16:00:01,301 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2023-06-20 16:00:01,350 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2023-06-20 16:00:01,350 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system started
2023-06-20 16:00:01,385 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user nm/hadoop103@EXAMPLE.COM using keytab file /etc/security/keytab/nm.service.keytab
2023-06-20 16:00:01,402 INFO org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2023-06-20 16:00:01,412 INFO org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2023-06-20 16:00:01,462 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Unable to create directory /opt/ha/hadoop/data/nm-local-dir error mkdir of /opt/ha/hadoop/data/nm-local-dir failed, removing from the list of valid directories.
2023-06-20 16:00:01,464 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs have errors:  [ /opt/ha/hadoop/data/nm-local-dir : Cannot create directory : /opt/ha/hadoop/data/nm-local-dir, error mkdir of /opt/ha/hadoop/data/nm-local-dir failed ] 
2023-06-20 16:00:01,485 INFO org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.ResourceCalculatorPlugin@62656be4
2023-06-20 16:00:01,487 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
2023-06-20 16:00:01,488 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
2023-06-20 16:00:01,489 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: AMRMProxyService is disabled
2023-06-20 16:00:01,489 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: per directory file limit = 8192
2023-06-20 16:00:01,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2023-06-20 16:00:01,497 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
2023-06-20 16:00:01,530 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding auxiliary service mapreduce_shuffle, "mapreduce_shuffle"
2023-06-20 16:00:01,571 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.ResourceCalculatorPlugin@68f1b

Cause analysis:

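The log keeps reporting that the directory cannot be created, even though I had already created it and granted permissions on it. I have not pinned down the exact cause, but the likely culprit is the parent directory: /opt/ha/hadoop/data is owned by hdfs with mode drwx------ (see the directory listing in the solution), so the user running the NodeManager (presumably yarn) cannot traverse it to reach nm-local-dir.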
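
To check this kind of suspicion, it helps to look at the permissions of every directory on the way down to the target, not only the target itself. The following is a minimal sketch under that assumption; `ancestors` and `show_ancestor_perms` are hypothetical helper names and are not part of Hadoop or of the original troubleshooting session.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post).

# Print every ancestor of an absolute path, from / down to the path itself.
ancestors() {
    case "$1" in
        /*) ;;
        *) echo "absolute path required: $1" >&2; return 1 ;;
    esac
    if [ "$1" = "/" ]; then
        echo "/"
        return 0
    fi
    ancestors "$(dirname "$1")" || return 1
    echo "$1"
}

# Show mode, owner and group for each ancestor, so that a directory
# lacking search (x) permission for the NodeManager user stands out.
show_ancestor_perms() {
    ancestors "$1" | while IFS= read -r dir; do
        if [ -e "$dir" ]; then
            ls -ld -- "$dir"
        else
            echo "missing or unreachable: $dir"
        fi
    done
}

# Example (run as root on a NodeManager host):
#   show_ancestor_perms /opt/ha/hadoop/data/nm-local-dir
```

If one of the printed directories is owned by another user and has no x bit for group or others, a command such as `sudo -u yarn ls /opt/ha/hadoop/data/nm-local-dir` would be expected to fail with "Permission denied", assuming the NodeManager runs as yarn.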

Solution:

On every Hadoop node, create a new nm-local-dir directory outside data and grant permissions on it:

[root@hadoop104 logs]# mkdir /opt/ha/hadoop/nm-local-dir/
[root@hadoop104 logs]# chown -R yarn:hadoop /opt/ha/hadoop/nm-local-dir/
[root@hadoop104 logs]# chmod -R 775 /opt/ha/hadoop/nm-local-dir/
[root@hadoop104 logs]# cd ..
[root@hadoop104 hadoop]# ll
total 212
drwxr-xr-x 2 sarah sarah    4096 Sep 12  2019 bin
drwx------ 6 hdfs  hadoop   4096 Jun 20 14:51 data
drwxr-xr-x 3 root  hadoop   4096 Sep 12  2019 etc
drwxr-xr-x 2 sarah sarah    4096 Sep 12  2019 include
drwxr-xr-x 3 sarah sarah    4096 Sep 12  2019 lib
drwxr-xr-x 4 sarah sarah    4096 Sep 12  2019 libexec
-rw-rw-r-- 1 sarah sarah  147145 Sep  4  2019 LICENSE.txt
drwxrwxr-x 3 hdfs  hadoop   4096 Jun 21 08:20 logs
drwxrwxr-x 2 yarn  hadoop   4096 Jun 21 08:59 nm-local-dir
-rw-rw-r-- 1 sarah sarah   21867 Sep  4  2019 NOTICE.txt
-rw-rw-r-- 1 sarah sarah    1366 Sep  4  2019 README.txt
drwxr-xr-x 3 sarah sarah    4096 Jun 20 16:08 sbin
drwxr-xr-x 4 sarah sarah    4096 Sep 12  2019 share

Edit yarn-site.xml and set yarn.nodemanager.local-dirs to /opt/ha/hadoop/nm-local-dir:

<property>
    <description>List of directories to store localized files in. An
        application's localized file directory will be found in:
        ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
        Individual containers' work directories, called container_${contid}, will
        be subdirectories of this.
    </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/ha/hadoop/nm-local-dir</value>
</property>
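
Because the file has to be identical on every node, it can be useful to read the value back from each copy after editing. The sketch below does that with plain awk. `get_property` is a hypothetical helper name, and it assumes that `<name>` and `<value>` each sit on one line with `<name>` first, as in the property above. It is not a general XML parser.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the <value> of a
# property in a Hadoop *-site.xml file. Assumes <name> and <value> each sit
# on a single line and that <name> comes before <value>.
get_property() {
    # $1 = path to the XML file, $2 = property name
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Strip the 7-character opening tag and the 8-character closing tag.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Example; the config path is an assumption based on the layout in this post:
#   get_property /opt/ha/hadoop/etc/hadoop/yarn-site.xml yarn.nodemanager.local-dirs
```

The function prints nothing when the property is absent, so an empty result on a node suggests that its copy of the file was not updated.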

Distribute yarn-site.xml to all nodes and restart the YARN cluster.
All nodes went from unhealthy to active. Done.
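
The node state can also be checked from the command line instead of the web UI. The sketch below filters the output of `yarn node -list -all` for nodes still marked UNHEALTHY. `unhealthy_nodes` is a hypothetical helper name, and the column layout (Node-Id first, Node-State second, whitespace-separated) is an assumption about the CLI output. Healthy nodes are normally reported there as RUNNING rather than "active".

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read the output of
# `yarn node -list -all` on stdin and print the Node-Id of every node
# whose state column is UNHEALTHY.
# Assumes whitespace-separated columns: Node-Id first, Node-State second.
unhealthy_nodes() {
    awk '$2 == "UNHEALTHY" { print $1 }'
}

# Example (on a Kerberos-secured cluster this needs a valid ticket first):
#   yarn node -list -all | unhealthy_nodes
```

An empty result after the restart would be consistent with every NodeManager having accepted the new local directory.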
