前言
前一篇文章中刚刚分析完HDFS的异构存储以及相关的存储类型选择策略,浏览量还是不少的,说明大家对于HDFS的异构存储方面的功能还是很感兴趣的.但是其实一个文件Block块从最初的产生到最后的落盘,存储类型选择策略只是其中1步,因为存储类型选择策略只是帮你先筛选了一些符合存储类型要求的存储节点目录位置列表,通过这些候选列表,你还需要做进一步的筛选,这就是本文所准备阐述的另外一个主题,HDFS的副本放置策略.在写本文之前,我搜过网上关于此方面的资料与文章,还是有许多文章写的非常不错的,所以我会在本文中涉及到其他相关方面的个人感觉有用的知识点与大家分享,不至于文章显得太千篇一律了.
何为副本放置策略
首先这里要花一些篇幅来介绍什么是副本放置策略, 有人也会叫他为副本选择策略,这源于此策略的名称, BlockPlacementPolicy.所以这个策略类重在block placement.先来看下这个策略类的功能说明:
This interface is used for choosing the desired number of targets for placing block replicas.
大意就是说选择期望的目标节点供副本block存放.
现有副本放置策略
目前在HDFS中现有的副本防止策略类有2大继承子类,分别为BlockPlacementPolicyDefault, BlockPlacementPolicyWithNodeGroup,其中继承关系如下所示:
我们日常生活中提到最经典的3副本策略用的就是BlockPlacementPolicyDefault策略类.3副本如何存放在这个策略中得到了非常完美的实现.在BlockPlacementPolicyDefault类中的注释具体解释了3个副本的存放位置:
The class is responsible for choosing the desired number of targets
for placing block replicas.
The replica placement strategy is that if the writer is on a datanode,
the 1st replica is placed on the local machine,
otherwise a random datanode. The 2nd replica is placed on a datanode
that is on a different rack. The 3rd replica is placed on a datanode
which is on a different node of the rack as the second replica.
简要概况起来3点:
- 1st replica. 如果写请求方所在机器是其中一个datanode,则直接存放在本地,否则随机在集群中选择一个datanode.
- 2nd replica. 第二个副本存放于不同第一个副本的所在的机架.
- 3rd replica.第三个副本存放于第二个副本所在的机架,但是属于不同的节点.
所以总的存放效果图如下所示
这里橙色区域表示的就是writer写请求者,绿色区域就是1个副本.从这里可以看出,HDFS在容错性的设计上还是做了很多的思考的.从下文开始主要分析的就是BlockPlacementPolicyDefault默认放置策略,至于BlockPlacementPolicyWithNodeGroup也会稍微提一提,但是二者主要区别其实不大.
BlockPlacementPolicyDefault默认副本放置策略的分析
BlockPlacementPolicyDefault这个类中的选择目标节点的处理逻辑还是有些复杂的,我会尽量讲的简单化,如有不理解之处,读者可以自己对照源码进行进一步的学习.
策略核心方法chooseTargets
在默认放置策略方法类中,核心方法就是chooseTargets,但是在这里有2种同名实现方法,唯一的区别是有无favoredNodes参数.favoredNodes的意思是偏爱,喜爱的节点.这2个方法的介绍如下
/**
* choose <i>numOfReplicas</i> data nodes for <i>writer</i>
* to re-replicate a block with size <i>blocksize</i>
* If not, return as many as we can.
*
* @param srcPath the file to which this chooseTargets is being invoked.
* @param numOfReplicas additional number of replicas wanted.
* @param writer the writer's machine, null if not in the cluster.
* @param chosen datanodes that have been chosen as targets.
* @param returnChosenNodes decide if the chosenNodes are returned.
* @param excludedNodes datanodes that should not be considered as targets.
* @param blocksize size of the data to be written.
* @return array of DatanodeDescriptor instances chosen as target
* and sorted as a pipeline.
*/
public abstract DatanodeStorageInfo[] chooseTarget(String srcPath,
int numOfReplicas,
Node writer,
List<DatanodeStorageInfo> chosen,
boolean returnChosenNodes,
Set<Node> excludedNodes,
long blocksize,
BlockStoragePolicy storagePolicy);
/**
* Same as {@link #chooseTarget(String, int, Node, Set, long, List, StorageType)}
* with added parameter {@code favoredDatanodes}
* @param favoredNodes datanodes that should be favored as targets. This
* is only a hint and due to cluster state, namenode may