Error: org.apache.hadoop.security.AccessControlException: Permission denied
Cause: the current user has no permission on the target path.
Solution: add the following to hdfs-site.xml (note this disables HDFS permission checking entirely):
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
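On Hadoop 2.x and later the property was renamed dfs.permissions.enabled; a sketch of the same switch under the newer name (turning permission checking off affects the whole cluster, so granting access to the specific path with hadoop fs -chown/-chmod is usually safer):

```xml
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
```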
Error: org.apache.hadoop.hdfs.server.namenode.SafeModeException
Cause: the NameNode enters safe mode while it is starting up.
Solution: if this is the cause, wait until startup finishes and then rerun the program.
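The safe-mode state can also be checked, or left manually once block reports are in, with dfsadmin; these commands need a running cluster, so this is only a sketch:

```shell
hdfs dfsadmin -safemode get    # reports whether safe mode is ON or OFF
hdfs dfsadmin -safemode leave  # force the NameNode out of safe mode
```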
Errors when accessing Hadoop on a remote server from Windows:
problem | solution |
---|---|
INFO client.RMProxy: Connecting to ResourceManager at namenode:8032 Exception in thread "main" java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "namenode":8032; java.net.UnknownHostException; | In yarn-site.xml, change yarn.resourcemanager.hostname from the hostname to a reachable IP address; if Hadoop is deployed inside a NAT subnet, port mapping is also required |
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=xxx, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------ | Run hadoop fs -rm -R /tmp/hadoop-yarn, then rerun the job |
ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. | Download hadoop-windows-native-master.zip and copy the files under hadoop-windows-native-master\2.5.2\bin\VS2013\x64\bin into the bin directory of the local Hadoop installation |
INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/zhangchen/.staging/job_1457095941103_0036 Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/zhangchen/.staging/job_1457095941103_0036/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation. | |
There are 0 datanode(s) running and no node(s) are excluded in this operation. | |
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: | Run stop-all.sh, delete all files under /home/hadoop/hd_space/hdfs/ on the namenode and every datanode, then run hdfs namenode -format, followed by start-dfs.sh and start-yarn.sh |
Failed to APPEND_FILE test.txt for DFSClient_NONMAPREDUCE_-656499820_1 on 127.0.0.1 because lease recovery is in progress. Failed to APPEND_FILE test1.txt for DFSClient_NONMAPREDUCE_1211927326_1 on 127.0.0.1 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-90637176_1 on 127.0.0.1 | The lease hard limit expires after one hour, so either wait an hour or delete the file manually; next time, remember to close the output stream |
Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. | |
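For the UnknownHostException row above, it is worth confirming from the client machine that the ResourceManager address actually resolves and its port is reachable before editing yarn-site.xml; a quick probe, assuming the host namenode and port 8032 from the error message:

```shell
ping -c 1 namenode    # does the name resolve and answer?
nc -vz namenode 8032  # is the ResourceManager port open?
```

(On Windows the equivalents are `ping -n 1 namenode` and PowerShell's `Test-NetConnection namenode -Port 8032`.)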
ssh localhost asks for a password on macOS
reason: ~/.ssh/config specifies an IdentityFile other than ~/.ssh/id_rsa
solution: add a new line at the end of the config file: IdentityFile ~/.ssh/id_rsa
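ssh offers every IdentityFile listed for a matching host, so appending the default key keeps the existing one working too; a minimal sketch of the resulting ~/.ssh/config (the first IdentityFile entry is a hypothetical example of what was already there):

```
Host *
  IdentityFile ~/.ssh/some_other_key
  IdentityFile ~/.ssh/id_rsa
```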
Directory /private/tmp/hadoop-zhangchen/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
Reason: I followed the Hadoop setup guide in the official documentation, which never says to set the name and data directories explicitly, so they default to the /tmp directory on my Mac. After a reboot, everything broke.
Solution:
add following config to etc/hadoop/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/share/data/hadoop</value>
</property>
re-format the namenode (DANGEROUS!!!)
bin/hdfs namenode -format
restart hdfs
sbin/start-dfs.sh
ssh: Could not resolve hostname 186.3.168.191.isp.timbrasil.com.br: nodename nor servname provided, or not known
solution: just fix your hostname and add it to /etc/hosts
e.g.(on mac)
scutil --set HostName mbp-zc
then add 127.0.0.1 mbp-zc
to /etc/hosts
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(this warning is harmless: Hadoop just falls back to the builtin Java implementations; installing native libraries built for your platform silences it)
java.io.IOException: Incompatible clusterIDs in /private/tmp/hadoop-zhangchen/dfs/data: namenode clusterID = CID-92121909-a8e9-4802-8021-0b52ec8ba1d6; datanode clusterID = CID-c7c456df-8de1-44eb-aa0a-6da8fd5e7aa0
solution: make the datanode's clusterID match the namenode's, either by deleting the datanode data directory (here /private/tmp/hadoop-zhangchen/dfs/data) and restarting, or by editing the clusterID in its current/VERSION file; the mismatch usually comes from re-formatting the namenode while old datanode data is still around
Connection refused on port 9000
Disable the firewall on CentOS 7:
systemctl stop firewalld
systemctl disable firewalld
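If disabling the firewall entirely is too drastic, opening just the NameNode port is an alternative (assuming firewalld and the default fs.defaultFS port 9000):

```shell
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --reload
```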
hadoop fs -ls /
reports the error:
GSSException: no valid credentials provided (Mechanism level: Failed to find any kerberos tgt)
reason: Kerberos is a network authentication protocol; Hadoop uses Kerberos as the basis for strong authentication and identity propagation for both users and services.
solution:
run this command, replacing <ktab file> with the path to your keytab file and <principal> with your principal name:
kinit -k -t <ktab file> <principal>
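After kinit succeeds, the ticket cache can be inspected to confirm a TGT was actually obtained (klist ships with the same Kerberos client tools as kinit):

```shell
klist   # the krbtgt/REALM@REALM entry in the listed tickets is the TGT
```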
Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///user/ds/Wikipedia
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:143)
solution: if running the code in IntelliJ IDEA, copy core-site.xml into the resources directory:
cp $HADOOP_HOME/etc/hadoop/core-site.xml .
which should include the fs.defaultFS host config:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
...
</configuration>
run sbin/start-yarn.sh
it reports the error:
localhost: ERROR: Cannot set priority of nodemanager process 16149
this error occurred after I added the following config to yarn-site.xml:
<property>
<name>yarn.nodemanager.resource-plugins</name>
<value>yarn.io/gpu</value>
</property>
when I checked the nodemanager log, I found the root cause:
2021-06-15 10:15:58,500 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer: Failed to locate GPU device discovery binary, tried paths: [/bin/nvidia-smi, /usr/bin/nvidia-smi, /usr/local/nvidia/bin/nvidia-smi]! Please double check the value of config yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables. Using default binary: nvidia-smi
2021-06-15 10:15:58,501 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnException: Failed to find GPU discovery executable, please double check yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables setting. Also tried to find the executable in the default directories: [/usr/bin, /bin, /usr/local/nvidia/bin]
...
2021-06-15 10:15:58,502 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to find GPU discovery executable, please double check yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables setting. Also tried to find the executable in the default directories: [/usr/bin, /bin, /usr/local/nvidia/bin]
Because I was testing YARN's GPU scheduling on a laptop without an NVIDIA GPU, I had to remove the resource plugin config above so YARN could start normally.
Another workaround is to cheat YARN with a mocked nvidia-smi output (remember to chmod +x the script):
cat /usr/local/nvidia/bin/nvidia-smi
#!/bin/sh
sample_output=/Users/zhangchen/code/hadoop-rel-release-3.2.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
cat ${sample_output}
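With the mock script in place and made executable, YARN can also be pointed at it explicitly through the config key named in the log above; a sketch for yarn-site.xml:

```xml
<property>
  <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
  <value>/usr/local/nvidia/bin/nvidia-smi</value>
</property>
```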
ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Failed to bootstrap configured resource subsystems!
Because YARN in non-Docker mode uses cgroups for resource isolation, and macOS has no cgroups, this can only be done on CentOS (or another Linux).