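I logged in to the node and found that all of its services were up and running. After I restarted the DataNode service on that node, it communicated normally with the current NameNode again.
Log from the current active NameNode after the failover: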
2016-06-18 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-1651824977329564981_102396163 on 172.x.x.x:50010 size 496 does not belong to any file.
2016-06-18 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-1651824977329564981 to 172.x.x.x:50010
2016-06-18 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-8075220412997159517_101639855 on 172.x.x.x:50010 size 496 does not belong to any file.
2016-06-18 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-8075220412997159517 to 172.x.x.x:50010
2016-06-18 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_2245696672665686485_98393215 on 172.x.x.x:50010 size 496 does not belong to any file.
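Log for the current standby NameNode after the failover (these entries are written on the DataNode side, which ignores commands sent by the standby NN):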
2016-06-19 13:03:56,787 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:03:59,788 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:02,787 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:02,799 INFO datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(429)) - Verification succeeded for BP-334845286-172.16.8.4-1418890858930:blk_1161466841_87797749
2016-06-19 13:04:05,788 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:08,787 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:11,788 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:14,787 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:17,787 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:20,788 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:23,787 WARN datanode.DataNode (BPOfferService.java:processCommandFromStandby(675)) - Got a command from standby NN - ignoring command:2
2016-06-19 13:04:24,198 INFO datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(429)) - Verification succeeded for BP-334845286-172.16.8.4-1418890858930:blk_1131917253_58214721
2016-06-19 13:04:25,198 INFO datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(429)) - Verification succeeded for BP-334845286-172.16.8.4-1418890858930:blk_1166753034_93084509
2016-06-19 13:04:25,211 INFO datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(429)) - Verification succeeded for BP-334845286-172.16.8.4-1418890858930:blk_1173433733_99772109
HDFS UI screenshot:
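So after an automatic HA failover, check in the HDFS UI whether every DataNode is communicating normally with the NameNode, so that this kind of problem is caught early.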
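The same check can be scripted instead of being read off the UI. The following is a minimal sketch, not a definitive tool: it assumes the NameNode exposes its JMX servlet on the Hadoop 2.x default HTTP port 50070 and that the `NameNodeInfo` bean carries `LiveNodes` and `DeadNodes` as JSON strings with a per-node `lastContact` value in seconds. The host name `namenode.example.com`, the 30-second threshold, and the function names are hypothetical and chosen only for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: NameNode web address; 50070 is the Hadoop 2.x default HTTP port.
NAMENODE_HTTP = "http://namenode.example.com:50070"  # hypothetical host
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"


def fetch_namenode_info(base_url=NAMENODE_HTTP):
    """Fetch the NameNodeInfo bean from the NameNode's JMX servlet."""
    with urlopen(base_url + JMX_QUERY, timeout=10) as resp:
        return json.load(resp)


def stale_datanodes(jmx_payload, max_last_contact_s=30):
    """Return (dead, stale) DataNode names from a NameNodeInfo JMX payload.

    A live node counts as stale when its last heartbeat is older than
    max_last_contact_s seconds (an arbitrary threshold for this sketch).
    """
    bean = jmx_payload["beans"][0]
    # LiveNodes and DeadNodes are JSON documents encoded as strings.
    live = json.loads(bean.get("LiveNodes", "{}"))
    dead = json.loads(bean.get("DeadNodes", "{}"))
    stale = sorted(
        name for name, info in live.items()
        if info.get("lastContact", 0) > max_last_contact_s
    )
    return sorted(dead), stale
```

Running `dead, stale = stale_datanodes(fetch_namenode_info())` against the currently active NameNode right after a failover would list the DataNodes that need attention, such as the one that had to be restarted here.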
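Once troubleshooting is finished, I recommend switching the NameNode service back so that the original active NameNode serves requests again. The standby NameNode host may also be running other services, and switching back avoids putting too much load on a single machine.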
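Switching back can be done with the `hdfs haadmin` command. The sketch below wraps it in Python under some assumptions: the NameNode service IDs `nn1` (original active) and `nn2` are hypothetical placeholders for whatever is configured in `dfs.ha.namenodes.<nameservice>`, the `hdfs` binary is on the `PATH`, and the helper names are made up for this example.

```python
import subprocess


def get_service_state(service_id):
    """Return the state printed by `hdfs haadmin -getServiceState`."""
    result = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", service_id],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def failback_command(original_active, current_active):
    """Build the argv for `hdfs haadmin -failover <from> <to>`."""
    return ["hdfs", "haadmin", "-failover", current_active, original_active]


def fail_back(original_active="nn1", other="nn2"):
    """Hand the active role back to original_active if it is not active now.

    "nn1" and "nn2" are hypothetical service IDs; replace them with the
    IDs configured for your nameservice.
    """
    if get_service_state(original_active) == "active":
        return False  # nothing to do
    subprocess.run(failback_command(original_active, other), check=True)
    return True
```

It is worth confirming with `hdfs haadmin -getServiceState` on both IDs afterwards, and re-checking the DataNode list, since the failback is itself another failover.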