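Background:
During a cluster upgrade, an alert on the UnderReplicationBlocks count fired.
The NameNode monitoring showed that at around 14:45 the values of several queues spiked and then dropped back down,
for example pendingReconstruction and neededReconstruction.
Since I suspected the spike was caused by an active/standby failover, I went looking for the corresponding logic.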
The method call chain is as follows:
ActiveState#enterState -> startActiveServices -> initializeReplQueues -> processMisReplicatedBlocks -> processMisReplicatesAsync
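processMisReplicatesAsync iterates over every block in blocksMap, determines whether each block is over-replicated or under-replicated, and puts it into the corresponding queue according to its state.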
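To make the scan easier to picture, here is a minimal sketch of the classification idea. It is not the actual Hadoop implementation: the class `MisReplicationScanSketch`, the `BlockState` type, and the `needed` and `excess` queues are hypothetical names for illustration, and the real BlockManager also considers factors such as decommissioning nodes, corrupt replicas, and rack placement.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

public class MisReplicationScanSketch {

    /** Hypothetical, simplified view of one block's replication state. */
    static final class BlockState {
        final long blockId;
        final int liveReplicas;
        final int expectedReplicas;

        BlockState(long blockId, int liveReplicas, int expectedReplicas) {
            this.blockId = blockId;
            this.liveReplicas = liveReplicas;
            this.expectedReplicas = expectedReplicas;
        }
    }

    enum Result { UNDER_REPLICATED, OVER_REPLICATED, OK }

    // Hypothetical stand-ins for the queues that the scan fills.
    final Queue<BlockState> needed = new ArrayDeque<>();
    final Queue<BlockState> excess = new ArrayDeque<>();

    /** Compare live replicas with the expected count (assumed rule, simplified). */
    static Result classify(BlockState b) {
        if (b.liveReplicas < b.expectedReplicas) {
            return Result.UNDER_REPLICATED;
        }
        if (b.liveReplicas > b.expectedReplicas) {
            return Result.OVER_REPLICATED;
        }
        return Result.OK;
    }

    /** Walk every block once and route it to the queue matching its state. */
    void scan(Map<Long, BlockState> blocksMap) {
        for (BlockState b : blocksMap.values()) {
            Result r = classify(b);
            if (r == Result.UNDER_REPLICATED) {
                needed.add(b);
            } else if (r == Result.OVER_REPLICATED) {
                excess.add(b);
            }
            // Correctly replicated blocks are left alone.
        }
    }

    public static void main(String[] args) {
        Map<Long, BlockState> blocksMap = new LinkedHashMap<>();
        blocksMap.put(1L, new BlockState(1L, 1, 3)); // too few replicas
        blocksMap.put(2L, new BlockState(2L, 3, 3)); // fine
        blocksMap.put(3L, new BlockState(3L, 4, 3)); // too many replicas
        blocksMap.put(4L, new BlockState(4L, 2, 3)); // too few replicas

        MisReplicationScanSketch sketch = new MisReplicationScanSketch();
        sketch.scan(blocksMap);
        System.out.println("needed=" + sketch.needed.size()
                + " excess=" + sketch.excess.size()); // prints "needed=2 excess=1"
    }
}
```

Because a scan like this touches every block in one pass, any blocks that look mis-replicated at that moment are enqueued together, which would be consistent with the short spike seen in the monitoring.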
/*
* Since the BlocksMapGset does not throw the ConcurrentModificationException
* and supports further iteration after modification to list, there is a
* chance of missing the newly added block while iterating. Since every
* addition to blocksMap will check for mis-replication,