Summary of Hadoop errors and fixes

1. The namenode repeatedly logs the following errors:
2012-08-21 09:20:24,486 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log, edits.new files already exists in all healthy directories:
/data/work/hdfs/name/current/edits.new
/backup/current/edits.new
2012-08-21 09:20:25,357 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.net.ConnectException: Connection refused
2012-08-21 09:20:25,357 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.net.ConnectException: Connection refused
2012-08-21 09:20:25,359 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.net.ConnectException: Connection refused

The secondarynamenode logs related errors as well.
One explanation found online attributes the cause to the following:
With 1.0.2, only one checkpoint process is executed at a time. When the namenode receives an overlapping checkpoint request, it checks for edits.new in its storage directories. If all of them contain this file, the namenode concludes that the previous checkpoint is not yet finished and prints the warning message shown above.

Given that, if you can confirm the edits.new files are useless leftovers from an earlier failed operation, you can delete them and then check whether the problem recurs.
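If you do delete them, it is safer to stop HDFS first and move the files aside rather than remove them outright; a minimal sketch using the paths from the log above (adjust them to your own dfs.name.dir entries):

bin/stop-dfs.sh                                        # stop HDFS so the edit logs are not in use
mv /data/work/hdfs/name/current/edits.new /tmp/        # keep a backup instead of deleting
mv /backup/current/edits.new /tmp/edits.new.backup
bin/start-dfs.sh                                       # restart and watch the namenode log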
Also make sure the namenode's hdfs-site.xml contains the following property:
<property>
  <name>dfs.secondary.http.address</name>
  <value>0.0.0.0:50090</value>
</property>

Replace the 0.0.0.0 above with the hostname of the machine where the secondarynamenode is deployed.
Likewise, the secondarynamenode's hdfs-site.xml should contain:
<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>
</property>

Replace the 0.0.0.0 above with the hostname of the machine where the namenode is deployed.
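As a concrete illustration (nn-host and snn-host below are placeholder hostnames, substitute your own), the two files would end up as follows; this is what lets the checkpoint-related /getimage requests between the two daemons succeed instead of failing with the Connection refused seen in the log:

On the namenode, hdfs-site.xml:
<property>
  <name>dfs.secondary.http.address</name>
  <value>snn-host:50090</value>
</property>

On the secondarynamenode, hdfs-site.xml:
<property>
  <name>dfs.http.address</name>
  <value>nn-host:50070</value>
</property>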


2. The namenode repeatedly logs the following errors:
2014-06-18 09:28:12,280 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive1 cause:java.io.IOException: File /tmp/hive-hive1/hive_2014-06-18_09-28-11_245_4808015493735494213-1/-mr-10003/b47c8250-3fea-4b5f-b015-3a62cf97ce53/reduce.xml could only be replicated to 0 nodes, instead of 1
2014-06-18 09:28:12,280 INFO org.apache.hadoop.ipc.Server: IPC Server handler 33 on 54310, call addBlock(/tmp/hive-hive1/hive_2014-06-18_09-28-11_245_4808015493735494213-1/-mr-10003/b47c8250-3fea-4b5f-b015-3a62cf97ce53/reduce.xml, DFSClient_NONMAPREDUCE_-430436880_1, null) from 192.168.49.94:45419: error: java.io.IOException: File /tmp/hive-hive1/hive_2014-06-18_09-28-11_245_4808015493735494213-1/-mr-10003/b47c8250-3fea-4b5f-b015-3a62cf97ce53/reduce.xml could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hive-hive1/hive_2014-06-18_09-28-11_245_4808015493735494213-1/-mr-10003/b47c8250-3fea-4b5f-b015-3a62cf97ce53/reduce.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
2014-06-18 09:28:12,286 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Increasing replication for /tmp/hive-hive1/hive_2014-06-18_09-28-11_245_4808015493735494213-1/-mr-10003/b47c8250-3fea-4b5f-b015-3a62cf97ce53/reduce.xml. New replication is 10
2014-06-18 09:28:12,456 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 3 to reach 3
Not able to place enough replicas


The jobtracker log, checked at the same time, shows the same exception thrown in a loop:
2014-06-18 09:49:20,725 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/sa/hadoop-1.2.1/mapred/sys/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)

at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy7.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy7.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)

Solution: I tried the fixes that turned up online, but none of them worked. Reading the log more carefully, the key line is "Not able to place enough replicas", so I checked the datanodes' disk usage and found they all still had some free space. Suspecting the configured block size was too large, I looked at the Hadoop configuration:
<property>
  <name>dfs.block.size</name>
  <value>268435456</value>
</property>
A 256 MB block size is not particularly large and should not by itself cause a shortage of space. There is, however, one more relevant setting:
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10474836480</value>
</property>
This reserves about 10 GB (10474836480 bytes) of disk space per volume for non-HDFS use, which was quite large here: once a datanode's free space falls below the reserved amount plus one 256 MB block, it can no longer accept any block, and the namenode reports "Not able to place enough replicas" even though the disks are not actually full. So I removed this property, and to save space also set dfs.replication to 1. After restarting the cluster, the errors were gone.
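To confirm before changing anything whether reserved space is the culprit, you can compare each datanode's remaining capacity, as HDFS sees it, against dfs.datanode.du.reserved plus the block size. A minimal check with the standard Hadoop 1.x admin command (/data is an example mount point):

hadoop dfsadmin -report | grep -E 'Name:|DFS Remaining'   # usable space per datanode as HDFS sees it
df -h /data                                               # raw free space on the data volume

The dfs.replication change mentioned above goes into hdfs-site.xml as:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

Note that a replication factor of 1 trades durability for space: losing a single datanode then means losing data, so it is only appropriate where the data is reproducible.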