First, check the NameNode log.
It contains a large number of identical error messages:
ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode:
Data node DatanodeRegistration(192.168.216.102:9866, datanodeUuid=d789da46-1139-4fbe-94a6-a4efdb7ae1dc,
infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-0b72ffbb-0179-41ee-905e-fd097c164726;nsid=327303387;c=1602073784643)
is attempting to report storage ID d789da46-1139-4fbe-94a6-a4efdb7ae1dc. Node 192.168.216.104:9866 is expected to serve this storage.
ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode:
Data node DatanodeRegistration(192.168.216.104:9866, datanodeUuid=d789da46-1139-4fbe-94a6-a4efdb7ae1dc,
infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-0b72ffbb-0179-41ee-905e-fd097c164726;nsid=327303387;c=1602073784643)
is attempting to report storage ID d789da46-1139-4fbe-94a6-a4efdb7ae1dc. Node 192.168.216.102:9866 is expected to serve this storage.
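When the log is long, the conflicting nodes can be pulled out mechanically. The following is a minimal sketch, not part of the original troubleshooting: the function name find_uuid_conflicts is hypothetical, and the regular expression is an assumption derived only from the log excerpt above.

```python
import re
from collections import defaultdict

# Matches the reporting node's address and its datanodeUuid inside a
# DatanodeRegistration(...) entry. The pattern is an assumption based on the
# log excerpt above, not an official description of the log format.
REGISTRATION_RE = re.compile(
    r"DatanodeRegistration\((?P<addr>[\d.]+:\d+),\s*datanodeUuid=(?P<uuid>[0-9a-f-]+)"
)


def find_uuid_conflicts(log_text):
    """Return {datanodeUuid: sorted addresses} for UUIDs reported by more than one address."""
    seen = defaultdict(set)
    for match in REGISTRATION_RE.finditer(log_text):
        seen[match.group("uuid")].add(match.group("addr"))
    # Keep only UUIDs that were reported by at least two different addresses.
    return {uuid: sorted(addrs) for uuid, addrs in seen.items() if len(addrs) > 1}
```

Feeding it the text of the NameNode log (read with open(...).read()) should yield one entry whose value lists both 192.168.216.102:9866 and 192.168.216.104:9866.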
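The cause may be the several earlier NameNode reformats, or some other unknown mistake. Either way, the storageID and datanodeUuid of the DataNode in /opt/module/hadoop-3.1.3/data/dfs/data/current/VERSION ended up identical on node 102 and node 104.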
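To confirm this, the two VERSION files can be compared key by key. A minimal sketch, assuming the contents of both files have already been copied to one machine and read into strings; parse_version and shared_identity_keys are hypothetical helper names, not Hadoop APIs.

```python
def parse_version(text):
    """Parse the key=value lines of a VERSION file, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def shared_identity_keys(version_a, version_b, keys=("storageID", "datanodeUuid")):
    """Return the identity keys whose values are the same in both VERSION files."""
    a, b = parse_version(version_a), parse_version(version_b)
    return [k for k in keys if k in a and a.get(k) == b.get(k)]
```

A non-empty result means the two DataNodes share at least one identifier. clusterID is deliberately left out of the comparison, because it is expected to be the same on every node of one cluster.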
As a result, the NameNode keeps removing one node and adding the other:
2020-12-23 13:02:14,963 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.216.104:9866
2020-12-23 13:02:14,963 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.216.102:9866
2020-12-23 13:02:14,886 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.216.102:9866
2020-12-23 13:02:14,887 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.216.104:9866
So only one of 102 and 104 can be registered with the NameNode at any given time!
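Solution: edit the VERSION file under $HADOOP_HOME/data/dfs/data/current so that storageID and datanodeUuid differ between 102 and 104. For example, a VERSION file after the edit (datanodeUuid now ends in dd instead of dc):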
#Wed Dec 23 15:25:55 CST 2020
storageID=DS-1c05934e-0889-4d3d-9adb-130ee65c7e2d
clusterID=CID-0b72ffbb-0179-41ee-905e-fd097c164726
cTime=0
datanodeUuid=d789da46-1139-4fbe-94a6-a4efdb7ae1dd
storageType=DATA_NODE
layoutVersion=-57
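The listing above changes only the last character of datanodeUuid by hand. As an alternative, a fresh random UUID can be generated. The following is a minimal sketch under stated assumptions: set_property and with_new_datanode_uuid are hypothetical helpers that work on the file's text only, and the use of a version 4 UUID is a choice made here, not something the original fix prescribes. Backing up VERSION and editing it while the DataNode is stopped is presumably the safer way to apply the result.

```python
import re
import uuid


def set_property(version_text, key, value):
    """Return version_text with the single 'key=...' line replaced by 'key=value'."""
    pattern = re.compile(r"^" + re.escape(key) + r"=.*$", re.MULTILINE)
    # A function replacement avoids any escape processing of the new value.
    updated, count = pattern.subn(lambda _m: key + "=" + value, version_text)
    if count != 1:
        raise ValueError("expected exactly one %r line, found %d" % (key, count))
    return updated


def with_new_datanode_uuid(version_text):
    """Return version_text with datanodeUuid replaced by a freshly generated UUID."""
    return set_property(version_text, "datanodeUuid", str(uuid.uuid4()))
```

The function returns the new file content and leaves writing it back to the caller, so the change can be reviewed first. If storageID also collides, set_property can be reused for that key with a value of the operator's choosing.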