could only be replicated to 1 nodes instead of minReplication (=2). There are 3 datanode(s) running

I chained a long series of Hive scripts in one bash script, and after running for a while it always fails with:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/mqq/.staging/job_1540025341471_0619/libjars/mail-1.4.1.jar could only be replicated to 1 nodes instead of minReplication (=2).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3112)

The next run sometimes works fine.

could only be replicated to 1 nodes instead of minReplication (=2).  There are 3 datanode(s) running and no node(s) are excluded in this operation.

This means all 3 datanodes are healthy, yet the data could only be replicated to 1 datanode, which falls short of the required minimum of 2. minReplication is the minimum number of replicas HDFS must successfully write before a write call is allowed to succeed; my understanding is that a distributed file system wants at least 2 copies.
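
A quick way to check what the cluster is actually configured with (hdfs getconf -confKey is a standard subcommand; dfs.namenode.replication.min is the current name for the minimum, with dfs.replication.min as an older alias):

hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.namenode.replication.min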

 

Checking the corresponding job error in hive.log:

2018-10-25 20:13:14,880 WARN  [Thread-209]: hdfs.DFSClient (DFSOutputStream.java:run(557)) - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/mqq/.staging/job_1540025341471_0619/libjars/mail-1.4.1.jar could only be replicated to 1 nodes instead of minReplication (=2).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
 

That gave no additional information.

Next, check the Hadoop logs on the machine running the Hive script:

2018-10-25 20:10:54,813 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: svr1.master.hadoop.xx/10.28.0.23:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-10-25 20:10:55,813 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: svr1.master.hadoop.xx/10.28.0.23:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-10-25 20:10:56,814 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: svr1.master.hadoop.xx/10.28.0.23:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-10-25 20:10:57,319 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.28.0.21:50010, datanodeUuid=10d02f06-b673-4bbb-afe8-c560f4cecdd7, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ac6b0b1a-6fb2-434d-b5d2-d8051a1ba3d5;nsid=1362816802;c=0) Starting thread to transfer BP-843453834-10.28.0.22-1540029719983:blk_1074105134_367475 to 10.28.0.23:50010
2018-10-25 20:10:57,319 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.28.0.21:50010, datanodeUuid=10d02f06-b673-4bbb-afe8-c560f4cecdd7, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ac6b0b1a-6fb2-434d-b5d2-d8051a1ba3d5;nsid=1362816802;c=0) Starting thread to transfer BP-843453834-10.28.0.22-1540029719983:blk_1074105135_367476 to 10.28.0.23:50010
2018-10-25 20:10:57,320 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-843453834-10.28.0.22-1540029719983:blk_1074105134_367475 (numBytes=6664) to /10.28.0.23:50010
2018-10-25 20:10:57,320 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-843453834-10.28.0.22-1540029719983:blk_1074105135_367476 (numBytes=4377) to /10.28.0.23:50010
2018-10-25 20:10:57,814 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: svr1.master.hadoop.xx/10.28.0.23:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-10-25 20:10:57,815 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: svr1.master.hadoop.xx/10.28.0.23:9000
 

In other words, connections to svr1.master.hadoop.xx/10.28.0.23:9000 are failing.

 

From the current machine (10.28.0.21) I tried telnet 10.28.0.23 9000, and sure enough it could not connect; telnet 10.28.0.22 9000 connected fine.
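
A quick sketch to probe port 9000 on all three nodes at once (assumes nc is installed; the IPs are this cluster's nodes):

for ip in 10.28.0.21 10.28.0.22 10.28.0.23; do
  # -z: only scan for a listener, -w 3: three-second timeout
  nc -z -w 3 "$ip" 9000 && echo "$ip:9000 OK" || echo "$ip:9000 UNREACHABLE"
done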

That roughly locates the problem. Answers online say to check the slaves and masters configuration and the firewall.

I ran service iptables status on both machines; both reported nothing, meaning no firewall is running.
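
For good measure, the rules can also be listed directly (needs root; on newer systemd-based distros the service would be firewalld rather than iptables):

iptables -L -n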

 

Check which machines are the namenodes:

hdfs getconf -namenodes
svr1.master.hadoop.xx (23) svr2.master.hadoop.xx (22)

So the current machine is not a namenode; the .23 machine is one.

Going to that machine, jps shows no NameNode process running.

Check each namenode's state, active or standby:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

These likewise fail to respond, again indicating the namenode is not started.

I tried running the bash Hive script on the .23 machine instead; it worked at first, but after a while the same error appeared again:

could only be replicated to 1 nodes instead of minReplication (=2).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
 

I was going crazy at this point.

So I ran hadoop namenode directly on that machine to try starting the NameNode in the foreground.

It failed to start.

The reason:

java.io.IOException: NameNode is not formatted.
 

According to answers online, this is the usual cluster-ID inconsistency between the namenode and the datanodes, and the suggested fix is to

delete the old metadata directories and re-run the namenode format.

Been burned before: formatting the namenode deletes all the existing Hive tables, so that is not an option.
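
The cluster ID those answers refer to lives in the VERSION file under each node's metadata directory, so a mismatch can at least be verified without formatting anything (the paths below are assumptions; the real ones are whatever dfs.namenode.name.dir and dfs.datanode.data.dir point to in hdfs-site.xml):

# on a namenode machine (hypothetical path)
grep clusterID /data/hadoop/dfs/name/current/VERSION
# on a datanode machine (hypothetical path)
grep clusterID /data/hadoop/dfs/data/current/VERSION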

To make the namenode directory contents consistent across the machines, I copied the current folder from the other namenode machine (.22) over to the .23 machine.
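
Roughly like this, assuming the metadata directory is /data/hadoop/dfs/name (a made-up path; use whatever dfs.namenode.name.dir points to):

NN_DIR=/data/hadoop/dfs/name   # hypothetical; read the real value from hdfs-site.xml
scp -r svr2.master.hadoop.xx:"$NN_DIR/current" "$NN_DIR/"

For an HA pair, the supported way to rebuild a namenode from its peer is hdfs namenode -bootstrapStandby, run on the machine being rebuilt.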

The result:

hadoop namenode now started, but it still kept looping with errors:

 

18/10/25 23:50:27 INFO ipc.Server: IPC Server handler 14 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from 10.28.0.21:41567 Call#25812 Retry#13: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby
18/10/25 23:50:27 INFO ipc.Client: Retrying connect to server: svr2.master.hadoop.xx/10.28.0.22:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/10/25 23:50:28 INFO ipc.Server: IPC Server handler 13 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from 10.28.0.21:41567 Call#25813 Retry#7: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
18/10/25 23:50:28 INFO ipc.Client: Retrying connect to server: svr2.master.hadoop.xx/10.28.0.22:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/10/25 23:50:29 INFO ipc.Client: Retrying connect to server: svr2.master.hadoop.xx/10.28.0.22:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/10/25 23:50:30 INFO ipc.Client: Retrying connect to server: svr2.master.hadoop.xx/10.28.0.22:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/10/25 23:50:31 INFO ipc.Client: Retrying connect to server: svr2.master.hadoop.xx/10.28.0.22:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/10/25 23:50:32 INFO ipc.Server: IPC Server handler 13 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from 10.28.0.23:46477 Call#20330 Retry#11: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby
18/10/25 23:50:32 INFO ipc.Client: Retrying connect to server: svr2.master.hadoop.xx/10.28.0.22:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

 

It still looks like the connection to the .22 machine is the problem: telnet to .22 on port 9000 does not go through from here. The StandbyException lines also show this namenode came up in standby state, and since it cannot reach its peer on .22, clients are still left without an active namenode.

No idea how to get that connection open.

 

Forget it.

 

minReplication is 2 and I can only ever get 1 replica written, so what if I just set minReplication to 1?

Then search the Hadoop conf folder for which file contains the replication setting:

fgrep "replication" -n *.xml

hdfs-site.xml has it:

<name>dfs.replication.min</name> 

Change its value to 1.

Do this on all three machines.
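
The resulting block in hdfs-site.xml looks like this (a value of 1 means a write succeeds as soon as one replica lands; the namenode should still bring under-replicated blocks up to the dfs.replication target in the background):

<property>
  <name>dfs.replication.min</name>
  <value>1</value>
</property>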

 

Then restart the cluster with stop-all.sh and start-all.sh.
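
That is:

stop-all.sh
start-all.sh
jps   # check on each node that the expected daemons came back

(stop-all.sh/start-all.sh restart everything; for HDFS alone, stop-dfs.sh/start-dfs.sh would also do.)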

 

Run the bash Hive script again.

Success.