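I recently deployed a Hadoop 2 cluster on VMware, and the datanodes kept dying on their own shortly after starting. I found the cause online and am writing it down here.
Use bin/hadoop dfsadmin -report to check the cluster status: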
[root@crxy1 hadoop]# bin/hadoop dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
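To detect this condition from a script, the last summary line of the report is the easiest thing to check. The following is a minimal Python sketch. It assumes the Hadoop 2 report format shown above. The helper names datanode_counts and no_live_datanodes are made up for illustration. Later Hadoop releases word this summary line differently, so the regex may need adjusting.

```python
import re
from typing import Optional, Tuple

# Matches the summary line printed by `dfsadmin -report` in this post, e.g.
# "Datanodes available: 0 (0 total, 0 dead)".
# Assumption: Hadoop 2 wording as shown above; other releases may differ.
_DN_LINE = re.compile(
    r"Datanodes available:\s*(\d+)\s*\((\d+)\s+total,\s*(\d+)\s+dead\)"
)


def datanode_counts(report: str) -> Optional[Tuple[int, int, int]]:
    """Return (available, total, dead) from a dfsadmin report, or None
    if the summary line is not present."""
    match = _DN_LINE.search(report)
    if match is None:
        return None
    available, total, dead = (int(g) for g in match.groups())
    return available, total, dead


def no_live_datanodes(report: str) -> bool:
    """True when the report says no datanode is available."""
    counts = datanode_counts(report)
    return counts is not None and counts[0] == 0


if __name__ == "__main__":
    sample = (
        "Configured Capacity: 0 (0 B)\n"
        "DFS Used%: NaN%\n"
        "-------------------------------------------------\n"
        "Datanodes available: 0 (0 total, 0 dead)\n"
    )
    print(datanode_counts(sample))    # (0, 0, 0)
    print(no_live_datanodes(sample))  # True
```

All zeros in the capacity lines together with "0 total" means no datanode has registered with the namenode at all. That points at the datanode side rather than at disk space.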
The datanode log shows the following error:
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/tmp/dfs/data: namenode clusterID = CID-9cf35cc0-1da5-4e44-9770-b0d3911f9426; datanode clusterID = CID-18bf8735-ef35-4fb3-be4f-62258dc5aa09
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:477)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:226)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:254)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:974)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:945)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:278)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
at java.lang.Thread.run(Thread.java:745)
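The datanode's clusterID does not match the namenode's. This happened because I formatted the namenode several times. Each format gives the namenode a new clusterID, but the datanode keeps the clusterID from the earlier format, so the two no longer agree and the datanode refuses to start.
Solution: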
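The storage directory and both clusterIDs can be pulled straight out of the log message above. This is a minimal sketch. It assumes the exact message wording quoted in the log. find_mismatch and ClusterIdMismatch are hypothetical names for illustration, not part of Hadoop.

```python
import re
from typing import NamedTuple, Optional

# Pattern built from the datanode log message quoted above.
# Assumption: other Hadoop versions may word this message differently.
_INCOMPATIBLE = re.compile(
    r"Incompatible clusterIDs in (?P<dir>\S+): "
    r"namenode clusterID = (?P<nn>\S+); "
    r"datanode clusterID = (?P<dn>\S+)"
)


class ClusterIdMismatch(NamedTuple):
    storage_dir: str  # datanode storage directory named in the message
    namenode_id: str  # clusterID reported by the namenode
    datanode_id: str  # clusterID stored on the datanode


def find_mismatch(log_text: str) -> Optional[ClusterIdMismatch]:
    """Return the first clusterID mismatch reported in a datanode log, or None."""
    m = _INCOMPATIBLE.search(log_text)
    if m is None:
        return None
    return ClusterIdMismatch(m.group("dir"), m.group("nn"), m.group("dn"))


if __name__ == "__main__":
    line = (
        "java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/tmp/dfs/data: "
        "namenode clusterID = CID-9cf35cc0-1da5-4e44-9770-b0d3911f9426; "
        "datanode clusterID = CID-18bf8735-ef35-4fb3-be4f-62258dc5aa09"
    )
    mismatch = find_mismatch(line)
    if mismatch is not None:
        print("storage dir:", mismatch.storage_dir)
        print("namenode   :", mismatch.namenode_id)
        print("datanode   :", mismatch.datanode_id)
```

The storage_dir value identifies the datanode directory that holds the stale VERSION file.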
1. hdfs-site.xml configures dfs.namenode.name.dir. On the master, that directory contains a current folder with a VERSION file like this:
#Thu Mar 13 10:51:23 CST 2014
namespaceID=1615021223
clusterID=CID-8e201022-6faa-440a-b61c-290e4ccfb006
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1257313099-10.10.208.38-1394679083528
layoutVersion=-40
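2. core-site.xml configures hadoop.tmp.dir (mine is /usr/local/hadoop/tmp). On the slave, that directory contains dfs/data/current, which also holds a VERSION file. This is the datanode's default storage location when dfs.datanode.data.dir is not set. Its contents: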
#Wed Mar 12 17:23:04 CST 2014
storageID=DS-414973036-10.10.208.54-50010-1394616184818
clusterID=clustername
cTime=0
storageType=DATA_NODE
layoutVersion=-40
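3. The clusterID values in the two files clearly differ, and that is the cause. Delete the stale data on the slave, which is its dfs/data directory (here /usr/local/hadoop/tmp/dfs/data). Any blocks stored there belong to the namespace from before the reformat anyway. Then restart the datanode. It picks up the namenode's clusterID and registers normally. Done!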
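The comparison in step 3 can also be scripted. The sketch below uses a simplified parser. It handles only the key=value lines and the # comment lines seen in the two VERSION files above, not the full java.util.Properties syntax. The function names are hypothetical, and read_version assumes the directory layout described in steps 1 and 2.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def parse_version(text: str) -> Dict[str, str]:
    """Parse a Hadoop storage VERSION file into a dict.

    Simplified (assumption): only 'key=value' lines and '#' comment lines,
    as in the VERSION files shown above; not full java.util.Properties syntax.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the timestamp comment
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def read_version(current_dir: Union[str, Path]) -> Dict[str, str]:
    """Read and parse <current_dir>/VERSION, e.g. <dfs.namenode.name.dir>/current
    on the master or <hadoop.tmp.dir>/dfs/data/current on the slave."""
    return parse_version((Path(current_dir) / "VERSION").read_text(encoding="utf-8"))


def cluster_ids_match(namenode: Dict[str, str], datanode: Dict[str, str]) -> bool:
    """True only if both VERSION files carry the same, non-missing clusterID."""
    nn_id: Optional[str] = namenode.get("clusterID")
    return nn_id is not None and nn_id == datanode.get("clusterID")


if __name__ == "__main__":
    # VERSION contents from steps 1 and 2 above (abridged).
    master = parse_version(
        "#Thu Mar 13 10:51:23 CST 2014\n"
        "clusterID=CID-8e201022-6faa-440a-b61c-290e4ccfb006\n"
        "storageType=NAME_NODE\n"
    )
    slave = parse_version(
        "#Wed Mar 12 17:23:04 CST 2014\n"
        "clusterID=clustername\n"
        "storageType=DATA_NODE\n"
    )
    print(master["clusterID"], slave["clusterID"])
    print(cluster_ids_match(master, slave))  # False
```

On a real cluster you would call read_version on the two current directories instead of using inline strings. A False result means the slave's data directory is stale.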
Reference: http://blog.csdn.net/wodeyuer125/article/details/21666937
Reference: http://blog.csdn.net/wanghai__/article/details/5752199