Hadoop configuration: fixing the "file could only be replicated to 0 nodes, instead of 1" error

Today, while running bin/hadoop fs -copyFromLocal /Users/hadoop/Weibo/input/FavoriteFile.txt /user/hadoop/FavoriteFile.txt, I ran into the following error:

file "*********" could only be replicated to 0 nodes, instead of 1

I then checked dfshealth and found that the datanode had gone down.


Next I googled the official advice, only to find that it does not offer a real solution.

http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo

A common message people see is "could only be replicated to 0 nodes, instead of ...".

What does this mean? It means that the Block Replication mechanism of HDFS could not make any copies of a file it wanted to create. This can be caused by

  • No DataNode instances being up and running. Action: look at the servers, see if the processes are running.

  • The DataNode instances cannot talk to the server, through networking or Hadoop configuration problems. Action: look at the logs of one of the DataNodes.

  • Your DataNode instances have no hard disk space in their configured data directories. Action: look at the dfs.data.dir list in the node configurations, verify that at least one of the directories exists, and is writeable by the user running the Hadoop processes. Then look at the logs.

  • Your DataNode instances have run out of space. Look at the disk capacity via the Namenode web pages. Delete old files. Compress under-used files. Buy more disks for existing servers (if there is room), upgrade the existing servers to bigger drives, or add some more servers.

  • The reserved space for a DN (as set in dfs.datanode.du.reserved) is greater than the remaining free space, so the DN thinks it has no free space.

  • You may also get this message due to permissions, e.g. if JT can not create jobtracker.info on startup.

This is not a problem in Hadoop, it is a problem in your cluster that you are going to have to fix on your own. Sorry.
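
The third cause in the list above (no usable directory in dfs.data.dir) is easy to check by hand. As a minimal sketch, assuming the comma-separated value of dfs.data.dir has already been copied out of the node's configuration, the following Java snippet reports which entries exist and are writable. The names DataDirCheck and usableDirs are hypothetical and not part of Hadoop, and the result only means something when the program runs as the same user as the Hadoop processes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class DataDirCheck {

    // Returns the entries of a comma-separated directory list that exist,
    // are directories, and are writable by the user running this JVM.
    static List<String> usableDirs(String commaSeparatedDirs) {
        List<String> usable = new ArrayList<>();
        for (String entry : commaSeparatedDirs.split(",")) {
            String path = entry.trim();
            if (path.isEmpty()) {
                continue;
            }
            File dir = new File(path);
            // isDirectory() is false when the path does not exist.
            if (dir.isDirectory() && dir.canWrite()) {
                usable.add(path);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        // Pass the dfs.data.dir value as the first argument; the fallback
        // below is only a placeholder, not a path taken from the post.
        String dirs = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        List<String> usable = usableDirs(dirs);
        System.out.println("Usable data directories: " + usable);
        if (usable.isEmpty()) {
            System.out.println("No usable directory: the DataNode cannot store blocks.");
        }
    }
}
```

If the list comes back empty, the DataNode log is the next place to look, as the wiki suggests.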


Digging into the source, the ReplicationTargetChooser#isGoodTarget method on the NameNode side explains what is going on:

    // check the communication traffic of the target machine
    if (considerLoad) {
      double avgLoad = 0;
      int size = clusterMap.getNumOfLeaves();
      if (size != 0) {
        avgLoad = (double)fs.getTotalLoad()/size;
      }
      if (node.getXceiverCount() > (2.0 * avgLoad)) {
        logr.debug("Node "+NodeBase.getPath(node)+
                  " is not chosen because the node is too busy");
        return false;
      }
    }

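The isGoodTarget method passes the final verdict on each candidate datanode. Besides having usable disk space, a node must also be under a bounded load, and the measure used here is the number of connections accepted by the XceiverServer in the Datanode. This value is easy to overlook when working with Hadoop because it is not convenient to collect. The code above says that a node's connection count must not exceed twice the average connection count across all nodes in the cluster. To keep my system as self-contained as possible, I printed each node's connection count on the dfshealth.jsp page, and the numbers matched the check in the code above exactly.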
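
The load check can be isolated into a few lines. The sketch below mirrors the comparison in the excerpt above, with plain parameters standing in for fs.getTotalLoad(), clusterMap.getNumOfLeaves() and node.getXceiverCount(). The class name, method name and sample numbers are illustrative assumptions, not Hadoop code.

```java
public class LoadCheck {

    // True when a node would be rejected as "too busy": its connection
    // count exceeds twice the cluster-wide average.
    static boolean isTooBusy(int nodeXceiverCount, int totalLoad, int numNodes) {
        double avgLoad = 0;
        if (numNodes != 0) {
            avgLoad = (double) totalLoad / numNodes;
        }
        return nodeXceiverCount > 2.0 * avgLoad;
    }

    public static void main(String[] args) {
        // Made-up cluster: 4 nodes with 100 connections in total (average 25).
        System.out.println(isTooBusy(40, 100, 4)); // 40 > 50 is false
        System.out.println(isTooBusy(60, 100, 4)); // 60 > 50 is true
    }
}
```

Because the comparison is strict, a node sitting exactly at twice the average is still accepted.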

 

For example, if ReplicationTargetChooser picks node13, then even though node13 has plenty of writable space, the code above will still judge it to be an unsuitable node:
    157 > ((27 + 45 + 44 + 54 + 35 + 50 + 104 + 55 + 73 + 69 + 157 + 146)/12 * 2)

The usual fix for this kind of exception is to add nodes or, if the nodes can take the load, to raise the threshold in this algorithm.
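
To see what raising the threshold would do, the sketch below replays the check over the twelve connection counts listed above. The multiplier 2.0 comes from the source excerpt; the raised value 2.5 is an arbitrary illustration, and LoadFactorDemo and rejected are hypothetical names. In the excerpt the factor is a literal in the code, so changing it in practice would presumably mean patching and rebuilding Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadFactorDemo {

    // Per-node connection counts quoted in the post.
    static final int[] COUNTS = {27, 45, 44, 54, 35, 50, 104, 55, 73, 69, 157, 146};

    // Returns the counts that exceed factor * average, i.e. the nodes
    // the load check would reject.
    static List<Integer> rejected(int[] counts, double factor) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double avgLoad = counts.length == 0 ? 0 : (double) total / counts.length;
        List<Integer> result = new ArrayList<>();
        for (int c : counts) {
            if (c > factor * avgLoad) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rejected(COUNTS, 2.0)); // [157, 146]
        System.out.println(rejected(COUNTS, 2.5)); // []
    }
}
```

With the factor at 2.0 the threshold is about 143.2, so the nodes with 157 and 146 connections are both rejected. At 2.5 it rises to about 179 and every node passes.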

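In the end I manually deleted all the files under /hadoop/tmp/ (there is no need to delete the directory itself) and restarted hadoop; jps then showed that the namenode was up again.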

