Summary of Common Hadoop Problems

1:
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
Answer:
The job needs to open and parse many files. The system default limit is usually 1024 (you can see it with ulimit -a), which is enough for normal use but far too low for Hadoop.
Fix:
Modify two files.
/etc/security/limits.conf
vi /etc/security/limits.conf
Add:

* soft nofile 102400
* hard nofile 409600

$ cd /etc/pam.d/
$ sudo vi login
Add: session required /lib/security/pam_limits.so

A correction to this first answer:
The error is raised during the shuffle phase, before the reduce runs, when the number of failed fetches of completed map output exceeds the limit (5 by default). Many things can trigger it: unstable network connections, connection timeouts, low bandwidth, blocked ports, and so on. With a healthy network inside the cluster this error normally does not appear.
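For reference, a minimal shell sketch of checking and raising the limit (the 102400/409600 values are just the ones used above; choose values that suit your machines):

ulimit -n                         # show the current per-process open-file limit (default 1024)
sudo vi /etc/security/limits.conf # add the two nofile lines shown above
sudo vi /etc/pam.d/login          # add: session required /lib/security/pam_limits.so
# log out, log back in, then confirm the new limit took effect
ulimit -n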
2:
Too many fetch-failures
Answer:
This usually means connectivity between the nodes is incomplete.
1. Check /etc/hosts (a hypothetical layout is sketched after this list)
   Make sure the local IP maps to the server's hostname.
   Make sure it lists the IP and hostname of every server.
2. Check .ssh/authorized_keys
   Make sure it contains the public keys of all servers, including the node itself.
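A hypothetical /etc/hosts layout (the IPs and hostnames below are made up; substitute your own and keep the file identical on every node):

192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2

You can then confirm passwordless SSH from the master with something like: ssh slave1 hostname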
3:
Processing is extremely slow: the map phase finishes quickly, but the reduce phase is very slow and reduce=0% keeps reappearing.
Answer:
In addition to the checks in item 2, modify conf/hadoop-env.sh and set:
export HADOOP_HEAPSIZE=4000
4:
The DataNodes can be started, but they cannot be accessed and cannot be shut down cleanly.
When reformatting a new distributed file system, you must delete the path configured as dfs.name.dir on the NameNode (the local filesystem path where the NameNode persistently stores the namespace and its transaction log), and also delete the dfs.data.dir paths on every DataNode (the local filesystem paths where DataNodes store block data). In this configuration that means deleting /home/hadoop/NameData on the NameNode and /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes. The reason is that whenever Hadoop formats a new distributed file system, each stored namespace is stamped with the version of its creation time (see the VERSION file under /home/hadoop/NameData/current, which records this version information). When reformatting, delete the NameData directory first, and you must also delete every DataNode's dfs.data.dir, so that the version information recorded by the NameNode and the DataNodes matches.
Warning: deletion is dangerous. Do not delete anything you cannot account for, and back everything up before you delete it!
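A rough sketch of that procedure with the paths used in this configuration (stop the cluster and back everything up before deleting anything):

bin/stop-all.sh
# back up first -- deletion cannot be undone
cp -r /home/hadoop/NameData /home/hadoop/NameData.bak
# on the NameNode: remove the dfs.name.dir path
rm -rf /home/hadoop/NameData
# on every DataNode: remove the dfs.data.dir paths
rm -rf /home/hadoop/DataNode1 /home/hadoop/DataNode2
# reformat and restart
bin/hadoop namenode -format
bin/start-all.sh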
    5:
    java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
This is usually caused by a node going down or losing its connection.
    6:
    java.lang.OutOfMemoryError: Java heap space
This error clearly means the JVM is running out of memory; increase the JVM heap size on all DataNodes, e.g.
java -Xms1024m -Xmx4096m
As a rule of thumb, the JVM's maximum heap should be about half of total memory. Our machines have 8 GB, so we set 4096m, though this may still not be the optimal value.
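One way to apply this is in conf/hadoop-env.sh, the same file used in item 3. HADOOP_DATANODE_OPTS is assumed here as the place to pass DataNode-specific JVM flags; adjust the numbers to your hardware:

# conf/hadoop-env.sh
export HADOOP_HEAPSIZE=4000
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"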

7:
Namenode in safe mode
Solution:
bin/hadoop dfsadmin -safemode leave
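You can check the state before and after with dfsadmin as well, roughly:

bin/hadoop dfsadmin -safemode get    # reports whether safe mode is ON or OFF
bin/hadoop dfsadmin -safemode leave  # force the NameNode to leave safe mode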
8:
java.net.NoRouteToHostException: No route to host
Solution:
sudo /etc/init.d/iptables stop
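Stopping iptables this way only lasts until the next reboot. If disabling the firewall is really what you want, a sketch for a RedHat-style system (chkconfig assumed to be available):

sudo /etc/init.d/iptables stop   # stop the firewall now
sudo chkconfig iptables off      # keep it from starting at boot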
9:
After changing the NameNode, SELECT statements run in Hive still point to the old NameNode address.
The reason: when you create a table, Hive actually stores the location of the table (e.g.
hdfs://ip:port/user/root/…) in the SDS and DBS tables in the metastore. So when I bring up a new cluster the master has a new IP, but Hive's metastore is still pointing to the locations within the old
cluster. I could modify the metastore to update with the new IP every time I bring up a cluster. But the easier and simpler solution was to just use an elastic IP for the master.
So replace every old NameNode address stored in the metastore with the current NameNode address.
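As a rough illustration of editing the metastore directly, assuming a MySQL metastore and the usual LOCATION / DB_LOCATION_URI columns (the database name, user and hostnames below are hypothetical; verify the real table and column names in your metastore and back it up before touching it):

mysql -u hive -p hive_metastore <<'EOF'
-- rewrite table/partition locations that still point at the old NameNode
UPDATE SDS SET LOCATION = REPLACE(LOCATION, 'hdfs://old-namenode:9000', 'hdfs://new-namenode:9000');
-- rewrite database locations as well (column name assumed)
UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, 'hdfs://old-namenode:9000', 'hdfs://new-namenode:9000');
EOF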

10:
Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into HDFS (e.g., when you run a command like bin/hadoop dfs -put).
Solution:
Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port, where namenode is the hostname of your NameNode and dfs_info_port is the port you chose for dfs.info.port; if you followed the QuickStart on your personal computer, this URL will be http://localhost:50070). Once on that page, click on the number that tells you how many DataNodes you have to see the list of DataNodes in your cluster.
If it says you have used 100% of your space, then you need to free up room on local disk(s) of the DataNode(s).
If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin’s df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and then it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
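The same information is available from the command line, roughly:

bin/hadoop dfsadmin -report   # capacity and usage of each DataNode as seen by the NameNode
df -h                         # local disk usage on a DataNode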
11:
Your DataNodes won't start, and you see something like this in logs/datanode:
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
Cause:
Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.
Solution:
You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format
12:
You can run Hadoop jobs written in Java (like the grep example), but your Hadoop Streaming jobs (such as the Python example that fetches web page titles) won't work.
Cause:
You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
Solution:
Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
-mapper $HOME/proj/hadoop/multifetch.py \
-reducer $HOME/proj/hadoop/reducer.py \
-input urls/* \
-output titles

13:
ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : "PARTITIONS" in Catalog "" Schema "". JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "org.jpox.autoCreateTables"
Cause:
org.jpox.fixedDatastore was set to true in hive-default.xml.
14:
The DataNodes fail to join after switching from IP addresses to hostnames.
Solution:
Delete the temp files and restart the Hadoop cluster.
This happens because repeated deployments have left the temp files inconsistent with the NameNode.

15:
Chinese characters parsed out of a URL still print as garbage in Hadoop's output. We used to assume Hadoop simply did not support Chinese; after reading the source we found that Hadoop merely cannot output Chinese in GBK encoding.
The code below is from TextOutputFormat.class. Hadoop's default outputs all inherit from FileOutputFormat, which has two subclasses: one for binary stream output and the text-based one, TextOutputFormat.
public class TextOutputFormat extends FileOutputFormat {
  protected static class LineRecordWriter implements RecordWriter {
    private static final String utf8 = "UTF-8"; // the encoding is hard-coded to UTF-8 here
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }

    protected DataOutputStream out;
    private final byte[] keyValueSeparator;

    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }

    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength()); // this also needs to change
      } else {
        out.write(o.toString().getBytes(utf8));
      }
    }

    // ... remaining methods of LineRecordWriter omitted ...
  }
}
As you can see, Hadoop's default output is hard-wired to UTF-8, so as long as the Chinese text was decoded correctly, setting the Linux client's character set to UTF-8 will display it properly, because Hadoop writes the Chinese out as UTF-8.
Most databases, however, define their fields in GBK. What if you want Hadoop to output Chinese in GBK so that it is compatible with the database?
We can define a new class:
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.RecordWriter;

public class GbkOutputFormat extends FileOutputFormat {
  protected static class LineRecordWriter implements RecordWriter {
    // simply switch the encoding to GBK
    private static final String gbk = "gbk";
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }

    protected DataOutputStream out;
    private final byte[] keyValueSeparator;

    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }

    private void writeObject(Object o) throws IOException {
      // unlike TextOutputFormat, do not copy Text bytes directly (those are UTF-8);
      // re-encode every value through toString().getBytes(gbk)
      out.write(o.toString().getBytes(gbk));
    }

    // ... remaining methods of LineRecordWriter omitted ...
  }
}
Then add conf1.setOutputFormat(GbkOutputFormat.class) to your MapReduce driver code,
and the Chinese output will be written in GBK.
16:
While running a MapReduce example that normally works, the job threw:
java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
java.io.IOException: Could not get block locations. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
Investigation showed that the cause was too many open files on the Linux machines. ulimit -n shows the Linux default limit of 1024 open files. Edit /etc/security/limits.conf and add a line such as "hadoop soft nofile 65535",
then rerun the program (ideally make this change on all DataNodes). That resolves the problem.
17:
After running for a while, stop-all.sh can no longer stop Hadoop, and it reports:
no tasktracker to stop, no datanode to stop
The reason is that when Hadoop stops, it relies on the PID files of the mapred and dfs processes on the DataNodes. By default these PID files are kept under /tmp, and Linux periodically (roughly once a month, or every 7 days or so) cleans out files in that directory. Once hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid have been deleted, the NameNode naturally can no longer find those processes on the DataNodes.
Setting export HADOOP_PID_DIR in the configuration file solves this, as sketched below.
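A minimal sketch, assuming conf/hadoop-env.sh is where you keep the setting and /var/hadoop/pids is a hypothetical directory that nothing cleans up automatically:

# conf/hadoop-env.sh
export HADOOP_PID_DIR=/var/hadoop/pids

# create it once on every node, owned by the user that runs Hadoop
sudo mkdir -p /var/hadoop/pids
sudo chown hadoop:hadoop /var/hadoop/pids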

18:
Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: namenode namespaceID = 405233244966; datanode namespaceID = 33333244
Cause:
Every time hadoop namenode -format is run, a new namespaceID is generated for the NameNode, but the DataNode data under hadoop.tmp.dir still carries the previous namespaceID. The mismatch prevents the DataNodes from starting. So before each hadoop namenode -format, first delete the hadoop.tmp.dir directory and the DataNodes will start successfully. Note that this means deleting the local directory that hadoop.tmp.dir points to, not a directory in HDFS.
19:
After starting Hadoop with bin/hadoop, jps reports the following error:
Exception in thread "main" java.lang.NullPointerException
at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
at sun.tools.jps.Jps.main(Jps.java:45)
Cause:
The system /tmp directory was deleted. Recreate the /tmp directory and the problem goes away.
The "unable to create log directory /tmp/…" error seen when running bin/hive can also have this cause.
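When recreating /tmp, remember that it is normally world-writable with the sticky bit set, roughly:

sudo mkdir /tmp
sudo chmod 1777 /tmp   # world-writable plus sticky bit, the usual mode for /tmp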

// The above is collected from the write-ups of many predecessors online.
