Hadoop Troubleshooting

Error: org.apache.hadoop.security.AccessControlException: Permission denied
Cause: the current user does not have permission.
Solution: add the following to hdfs-site.xml:
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
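
Note that dfs.permissions=false turns off permission checking for the whole cluster and needs a NameNode restart to take effect. A narrower alternative is to grant the user access to just the paths it needs; the user name and path below are only examples:
hdfs dfs -chown -R someuser /user/someuser
hdfs dfs -chmod -R 755 /user/someuser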

Error: org.apache.hadoop.hdfs.server.namenode.SafeModeException
Cause: the NameNode enters safe mode when it first starts up.
Solution: if this is the cause, simply wait for startup to finish and rerun the program.
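
If you would rather check or leave safe mode explicitly instead of waiting (forcing an exit before the NameNode has received enough block reports is risky), dfsadmin can do it:
hdfs dfsadmin -safemode get     # check whether safe mode is still on
hdfs dfsadmin -safemode leave   # force the NameNode out of safe mode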

Errors when accessing Hadoop on a remote server from Windows:

Problem:
INFO client.RMProxy: Connecting to ResourceManager at namenode:8032
Exception in thread "main" java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "namenode":8032; java.net.UnknownHostException;
Solution: change the value of yarn.resourcemanager.hostname in yarn-site.xml from the hostname to a reachable IP address. If Hadoop is deployed inside a NAT subnet, you also need to set up port mapping.
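
For reference, a minimal yarn-site.xml snippet for that fix; 192.168.1.100 is a placeholder for your ResourceManager's reachable IP:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.1.100</value>
</property>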

Problem:
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=xxx, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------
Solution: run hadoop fs -rm -R /tmp/hadoop-yarn and rerun the job.

Problem:
ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Solution: download hadoop-windows-native-master.zip and copy the files under hadoop-windows-native-master\2.5.2\bin\VS2013\x64\bin into the bin directory of your local Hadoop installation.
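
The null in null\bin\winutils.exe usually means HADOOP_HOME / hadoop.home.dir is not set, so besides copying the binaries you may also need to point it at the local Hadoop directory; the path below is a placeholder, e.g. in a Windows console:
set HADOOP_HOME=D:\hadoop-2.5.2
set PATH=%PATH%;%HADOOP_HOME%\bin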

Problem:
INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/zhangchen/.staging/job_1457095941103_0036
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/zhangchen/.staging/job_1457095941103_0036/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
There are 0 datanode(s) running and no node(s) are excluded in this operation.
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
Solution: run stop-all.sh, delete everything under /home/hadoop/hd_space/hdfs/ on the namenode and on all datanodes, run hdfs namenode -format, then start-dfs.sh and start-yarn.sh.
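
A sketch of that reset sequence, assuming the scripts are run from $HADOOP_HOME and the storage path matches this cluster's hdfs-site.xml:
sbin/stop-all.sh
rm -rf /home/hadoop/hd_space/hdfs/*   # on the namenode and on every datanode
bin/hdfs namenode -format             # DANGEROUS: wipes all HDFS metadata
sbin/start-dfs.sh
sbin/start-yarn.sh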

Problem:
Failed to APPEND_FILE test.txt for DFSClient_NONMAPREDUCE_-656499820_1 on 127.0.0.1 because lease recovery is in progress.
Failed to APPEND_FILE test1.txt for DFSClient_NONMAPREDUCE_1211927326_1 on 127.0.0.1 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-90637176_1 on 127.0.0.1
Solution: the lease hard limit expires after one hour, so either wait an hour or delete the file manually, and remember to close the output stream next time.
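
If waiting is not an option, recent Hadoop versions can also force lease recovery on the stuck file; the path below matches the first error above:
hdfs debug recoverLease -path /test.txt -retries 3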

Problem:
Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
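A commonly cited workaround for small clusters is to relax the HDFS client's replace-datanode policy in hdfs-site.xml; use it with care, since it trades away some write reliability:
<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
</property>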

ssh localhost asks for a password on macOS
reason: ~/.ssh/config specifies an IdentityFile other than ~/.ssh/id_rsa
solution: add a line at the end of the config file: IdentityFile ~/.ssh/id_rsa
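
For example:
echo "IdentityFile ~/.ssh/id_rsa" >> ~/.ssh/config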

Directory /private/tmp/hadoop-zhangchen/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
Reason: I followed the setup guide from Hadoop's official documentation, which does not tell you to set the name and data directories explicitly, so they default to the /tmp directory on my Mac. After a reboot, everything went wrong.
Solution:
add the following config to etc/hadoop/core-site.xml

<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/share/data/hadoop</value>
</property>

re-format the namenode (DANGEROUS!!!)
bin/hdfs namenode -format
restart hdfs
sbin/start-dfs.sh

ssh: Could not resolve hostname 186.3.168.191.isp.timbrasil.com.br: nodename nor servname provided, or not known

solution: just fix your hostname and add it to /etc/hosts
e.g. (on macOS)
scutil --set HostName mbp-zc
then add 127.0.0.1 mbp-zc to /etc/hosts

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
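
This warning is usually harmless, since Hadoop falls back to the built-in Java classes. To see which native libraries Hadoop can actually load on your platform:
hadoop checknative -a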

java.io.IOException: Incompatible clusterIDs in /private/tmp/hadoop-zhangchen/dfs/data: namenode clusterID = CID-92121909-a8e9-4802-8021-0b52ec8ba1d6; datanode clusterID = CID-c7c456df-8de1-44eb-aa0a-6da8fd5e7aa0
solution: make the datanode's clusterID match the namenode's, either by editing the VERSION file under the datanode data directory or by deleting the datanode data directory and restarting the datanode.
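
One way to do that, assuming the default name/data directories from the error above:
cat /private/tmp/hadoop-zhangchen/dfs/name/current/VERSION   # note the namenode's clusterID
vi  /private/tmp/hadoop-zhangchen/dfs/data/current/VERSION   # set clusterID to the same value, then restart the datanode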


Port 9000: connection refused
Stop the firewall on CentOS 7:
systemctl stop firewalld
systemctl disable firewalld
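
If you would rather not disable the firewall entirely, opening just the NameNode port should also work:
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --reload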


hadoop fs -ls / reports an error

GSSException: no valid credentials provided (Mechanism level: Failed to find any kerberos tgt)

reason: Kerberos is a network authentication protocol. Hadoop uses Kerberos as the basis for strong authentication and identity propagation for both users and services.

solution:
run this command, replacing <keytab file> with your keytab file path and <principal> with your principal name:
kinit -k -t <keytab file> <principal>
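
Afterwards you can confirm that the ticket was obtained with:
klist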


Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///user/ds/Wikipedia
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:143)

solution: if running the code in IntelliJ IDEA, just copy core-site.xml into the resources directory:
cp $HADOOP_HOME/etc/hadoop/core-site.xml .

which should include the defaultFS host config

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    ...
</configuration>
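
To check which value the client actually picks up, hdfs getconf can read it back:
hdfs getconf -confKey fs.defaultFS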

running sbin/start-yarn.sh reports an error:
localhost: ERROR: Cannot set priority of nodemanager process 16149

this error occurred after I added the following config to yarn-site.xml

<property>
    <name>yarn.nodemanager.resource-plugins</name>
    <value>yarn.io/gpu</value>
</property>

when I checked the nodemanager log, I found the root cause:

2021-06-15 10:15:58,500 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer: Failed to locate GPU device discovery binary, tried paths: [/bin/nvidia-smi, /usr/bin/nvidia-smi, /usr/local/nvidia/bin/nvidia-smi]! Please double check the value of config yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables. Using default binary: nvidia-smi
2021-06-15 10:15:58,501 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnException: Failed to find GPU discovery executable, please double check yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables setting. Also tried to find the executable in the default directories: [/usr/bin, /bin, /usr/local/nvidia/bin]
...
2021-06-15 10:15:58,502 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to find GPU discovery executable, please double check yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables setting. Also tried to find the executable in the default directories: [/usr/bin, /bin, /usr/local/nvidia/bin]

Because I was testing YARN's GPU scheduling on a laptop without an NVIDIA GPU, I had to remove the resource plugin config above to let YARN start up normally.

Another workaround is to cheat YARN with a mocked nvidia-smi output:

cat /usr/local/nvidia/bin/nvidia-smi

#!/bin/sh
sample_output=/Users/zhangchen/code/hadoop-rel-release-3.2.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
cat ${sample_output}
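
Presumably the mock also has to be executable so the discovery code can run it:
chmod +x /usr/local/nvidia/bin/nvidia-smi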

ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Failed to bootstrap configured resource subsystems!

Because YARN in non-Docker mode uses cgroups for resource isolation, and there are no cgroups on macOS, this has to be done on CentOS (or another Linux) instead.

