A Summary of Hadoop Pseudo-Distributed Mode Errors

I've recently been implementing the TF-IDF algorithm on a Hadoop platform and ran into a few errors, which I've organized below.

First, the environment: Ubuntu 12.04, Hadoop 1.1.2 in pseudo-distributed mode, processing over 4,000 input files and producing over 4,000 output files.

First error:

14/03/26 23:47:11 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/A200905-1821-r-00000 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)

14/03/26 23:47:11 WARN hdfs.DFSClient: Error Recovery for block blk_2691864029834286785_35283 bad datanode[0] nodes == null
14/03/26 23:47:11 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/201312-330-r-00000 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)

14/03/26 23:47:11 WARN hdfs.DFSClient: Error Recovery for block blk_-2788550865263927851_35283 bad datanode[0] nodes == null
14/03/26 23:47:11 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/201312-330-r-00000" - Aborting... 

Error in the namenode log:

ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:zcf cause:org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/C201303-897-r-00000 File does not exist. Holder DFSClient_NONMAPREDUCE_-827696163_1 does not have any open files.
2014-03-26 23:47:36,972 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000, call addBlock(/user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/C201303-897-r-00000, DFSClient_NONMAPREDUCE_-827696163_1, null) from 127.0.0.1:45363: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/C201303-897-r-00000 File does not exist. Holder DFSClient_NONMAPREDUCE_-827696163_1 does not have any open files.
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/C201303-897-r-00000 File does not exist. Holder DFSClient_NONMAPREDUCE_-827696163_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1720)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1711)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1619)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
2014-03-26 23:47:36,973

datanode log:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-301331302-127.0.1.1-50010-1394090873214, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Too many open files
    at java.io.

 

The cause: the Ubuntu firewall had not been disabled. To fix it:

Run sudo ufw disable

Run bin/stop-all.sh

Run bin/start-all.sh
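To confirm the fix took effect, a quick check (a minimal sketch; jps ships with the JDK and dfsadmin -report is a standard Hadoop 1.x command):

sudo ufw status               # should print "Status: inactive"
jps                           # NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker should all be listed
bin/hadoop dfsadmin -report   # should show one live datanode in pseudo-distributed mode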

Second error:

14/03/27 00:55:23 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/C201008-447-r-00000 File does not exist. Holder DFSClient_NONMAPREDUCE_-1418615021_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1720)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1711)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1619)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)

14/03/27 00:55:23 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
14/03/27 00:55:23 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/C201008-447-r-00000" - Aborting...
14/03/27 00:55:23 ERROR hdfs.DFSClient: Failed to close file /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/C201008-447-r-00000
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/C201008-447-r-00000 File does not exist. Holder DFSClient_NONMAPREDUCE_-1418615021_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1720)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1711)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1619)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)

 

datanode log:

 

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-301331302-127.0.1.1-50010-1394090873214, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Too many open files
    at sun.nio.ch.IOUtil.initPipe(Native Method)
    at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
    at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:292)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:339)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:403)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:581)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:406)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
    at java.lang.Thread.run(Thread.java:662)
2014-03-27 00:55:22,067 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_3521061117842042385_39404 src: /127.0.0.1:33711 dest: /127.0.0.1:50010
2014-03-27 00:55:22,064

Tentative solution (tried, pending confirmation): raise the maximum number of files the HDFS datanode handles concurrently, or increase the task timeout.

1.

Modify the Hadoop configuration file conf/hdfs-site.xml.

Fix: adjust the xcievers parameter. The default is 4096; change it to 8192:

vi conf/hdfs-site.xml

<property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
</property>

Notes on the dfs.datanode.max.xcievers parameter:

A Hadoop HDFS datanode has an upper bound on the number of files it can serve at the same time. The parameter is called xcievers (the Hadoop authors misspelled the word). Before loading data, make sure the xceivers parameter in conf/hdfs-site.xml is set to at least 4096. The original configuration was:

<property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
</property>
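The datanode only reads this value at startup, so restart HDFS after editing the file (same commands as in the firewall fix above; this assumes the standard Hadoop 1.x layout):

bin/stop-all.sh
bin/start-all.sh
bin/hadoop dfsadmin -report   # confirm the datanode has re-registered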

2.

Cause: mapred.task.timeout was set too short. If a task shows no progress for about 200 seconds, Hadoop kills it and cleans up its temporary directory, after which later stages can no longer find the temporary data.

Change the parameter. The original configuration:

<property>
    <name>mapred.task.timeout</name>
    <value>200000</value>
    <description>The number of milliseconds before a task will be
    terminated if it neither reads an input, writes an output, nor
    updates its status string.
    </description>
</property>
Setting mapred.task.timeout to 10 minutes (600000 ms) is sufficient, as shown below.
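The updated entry, for reference (in Hadoop 1.x this property belongs in conf/mapred-site.xml):

<property>
    <name>mapred.task.timeout</name>
    <value>600000</value>
</property>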

Third error:

14/03/27 21:22:31 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-2042385761318660035_55886java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3127)

14/03/27 21:22:31 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_7875931486506714650_55886java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3127)
Checking the datanode log revealed the following error:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-301331302-127.0.1.1-50010-1394090873214, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Too many open files
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:883)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:482)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:453)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1554)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1440)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:113)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:304)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
    at java.lang.Thread.run(Thread.java:662)
2014-03-27 21:22:32,443 INFO org.mortbay.log: Completed FSVolumeSet.checkDirs. Removed=0volumes. List of current volumes: /usr/local/hadoop/hdfsconf/data/current
2014-03-27 21:22:32,443 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8569122199629694446_55886 received exception java.io.IOException: Too many open files
2014-03-27 21:22:32,443

The error is obvious: too many files are open, and sometimes the datanode even dies. This happens because the job clients open too many files at once, while Linux by default limits a single process to 1024 open files; you can check this with the ulimit -a command.

 

1. Use ps -ef | grep java (substitute your own program) to find its process ID and note it down; suppose the PID is 12.

2. Use lsof -p 12 | wc -l to count the files that process 12 currently has open. (A combined one-liner follows after this list.)
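The two steps can be combined into a single command for the datanode (a sketch; it assumes jps is on the PATH and a DataNode process is running):

lsof -p $(jps | awk '/DataNode/ {print $1}') | wc -l   # open-file count of the DataNode process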

 

The fix is to edit /etc/security/limits.conf and append the following at the end:

 

* soft nofile 8192

* hard nofile 8192

These limits only take effect on a fresh login, so reboot (or log out and back in), then check with ulimit -a:

...
open files                      (-n) 8192
...

Problem solved.

 

Other references say you can use ulimit -n 4096 to change the limit without a reboot, but the value reverts to the default once the user logs back in. I haven't tried it.
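For reference, that temporary approach would look like this (untested here, as noted above; the value cannot exceed the hard limit and applies only to the current shell and its children):

ulimit -n 8192   # raise the open-file limit for this shell session only
ulimit -n        # verify: should print 8192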

Finally, after reverting the parameters changed for errors one and two, testing produced no further errors, which shows the third issue (the open-file limit) was the real root cause.

 
