今天启动Hadoop2.2.0集群后,发现datanode进程没启动,查看日志发现如下报错:
2014-05-15 14:46:50,788 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-2020521428-192.168.0.166-1397704506565 (storage id DS-432251277-192.168.0.166-50010-1397704557407) service to singlehadoop/192.168.0.166:8020
java.io.IOException:
Incompatible clusterIDs in /home/casliyang/hadoop2/hadoop-2.2.0/metadata/data:
namenode clusterID = CID-2cc69ada-3730-4c79-8384-c725fa85859a
;
datanode clusterID = CID-3e649eb6-cdb3-4a0c-aad8-5948c66bf282
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:722)
上网查了下,有些文章说的解决办法是删掉数据文件,格式化,重启集群,但这办法实在太暴力,根本无法在生产环境实施,所以还是参考另一类文章的解决办法,修改clusterID:
step1:
查看hdfs-site.xml,找到存namenode元数据和datanode元数据的路径:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/casliyang/hadoop2/hadoop-2.2.0/metadata/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/casliyang/hadoop2/hadoop-2.2.0/metadata/data</value>
</property>
step2:
打开namenode路径下的current/VERSION文件:
casliyang@singlehadoop:~/hadoop2/hadoop-2.2.0/metadata/name/current$ cat VERSION
#Thu May 15 14:46:39 CST 2014
namespaceID=1252551786
clusterID=
CID-2cc69ada-3730-4c79-8384-c725fa85859a
cTime=0
storageType=NAME_NODE
blockpoolID=BP-2020521428-192.168.0.166-1397704506565
layoutVersion=-47
打开datanode路径下的current/VERSION文件:
casliyang@singlehadoop:~/hadoop2/hadoop-2.2.0/metadata/data/current$ cat VERSION
#Thu Apr 17 11:15:57 CST 2014
storageID=DS-432251277-192.168.0.166-50010-1397704557407
clusterID=
CID-3e649eb6-cdb3-4a0c-aad8-5948c66bf282
cTime=0
storageType=DATA_NODE
layoutVersion=-47
我们可以看到,name节点元数据的clusterID和data节点元数据的clusterID不一致了,并且和报错信息完全对应上!
接下来
将data节点的clusterID修改成和name节点的clusterID一致,重启集群即可。