Hadoop balancer, as introduced on the official website:
HDFS data might not always be placed uniformly across the DataNodes. One common reason is the addition of new DataNodes to an existing cluster. When placing new blocks (the data for a file is stored as a series of blocks), the NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:
Policy to keep one of the replicas of a block on the same node as the node that is writing the block.
Need to spread different replicas of a block across the racks so that the cluster can survive the loss of a whole rack.
One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.
Spread HDFS data uniformly across the DataNodes in the cluster.
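The four considerations above can be combined into a small placement function. This is a toy model written for illustration, not HDFS's real placement code. `DataNode`, `choose_targets`, and the rule that the least-utilized eligible node wins are hypothetical. Current Hadoop releases also document a somewhat different default order, with the second and third replicas on one remote rack.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    rack: str
    used: int      # data already stored on the node (any unit, e.g. GB)
    capacity: int  # total space on the node, same unit

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def choose_targets(nodes, writer, replicas=3):
    """Pick DataNodes for one block's replicas (hypothetical toy model)."""
    chosen = []

    def pick(candidates):
        # Skip nodes that already hold a replica, then prefer the least
        # utilized node so that new blocks help spread data evenly.
        taken = {c.name for c in chosen}
        free = [n for n in candidates if n.name not in taken]
        if free:
            chosen.append(min(free, key=lambda n: n.utilization))

    writer_node = next((n for n in nodes if n.name == writer), None)
    if writer_node is not None and replicas > 0:
        # One replica on the node that is writing the block.
        chosen.append(writer_node)
        # One replica on the writer's rack, to reduce cross-rack I/O.
        if len(chosen) < replicas:
            pick([n for n in nodes if n.rack == writer_node.rack])

    # Remaining replicas go to racks not used yet, so losing one rack
    # cannot destroy every copy. Fall back to any node if racks run out.
    while len(chosen) < replicas:
        before = len(chosen)
        used_racks = {c.rack for c in chosen}
        pick([n for n in nodes if n.rack not in used_racks] or nodes)
        if len(chosen) == before:
            break  # fewer distinct nodes than requested replicas
    return [n.name for n in chosen]


if __name__ == "__main__":
    cluster = [
        DataNode("dn1", "r1", 50, 100),
        DataNode("dn2", "r1", 70, 100),
        DataNode("dn3", "r1", 10, 100),
        DataNode("dn4", "r2", 60, 100),
        DataNode("dn5", "r2", 0, 100),   # newly added, still empty
        DataNode("dn6", "r3", 30, 100),
    ]
    print(choose_targets(cluster, "dn1"))  # ['dn1', 'dn3', 'dn5']
```

The rules pull in different directions. The writer's node is chosen even though it is already half full, while the empty new node `dn5` is chosen only because its rack is still free. This is the kind of competition between considerations that can leave data unevenly placed.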
Due to multiple competing considerations, data might not be uniformly placed across the DataNodes. HDFS provides a tool for administrators that analyzes block placement and rebalances data across the DataNodes. A brief administrator's guide for the balancer is available at HADOOP-1652.
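The balancer's goal can be sketched in the same spirit. Here a node counts as balanced when its utilization is within `threshold` percentage points of the whole cluster's utilization. This mirrors the balancer's `-threshold` option, whose default is 10. `plan_moves` and its `(used, capacity)` input format are hypothetical. The real balancer moves whole blocks between DataNodes and throttles its own bandwidth. This sketch instead moves arbitrary amounts and pulls each chosen node all the way to the cluster average.

```python
def plan_moves(usage, threshold=10.0):
    """Plan moves until every node is within `threshold` percentage points
    of the cluster-wide utilization (hypothetical toy model).

    usage: {node_name: (used, capacity)} in any consistent unit, e.g. GB.
    Returns a list of (source, target, amount) moves.
    """
    total_used = sum(u for u, _ in usage.values())
    total_cap = sum(c for _, c in usage.values())
    avg = total_used / total_cap  # cluster utilization as a fraction

    used = {n: float(u) for n, (u, _) in usage.items()}
    cap = {n: c for n, (_, c) in usage.items()}

    def util(n):
        return used[n] / cap[n]

    def target(n):
        # Amount node n would hold at exactly the cluster average.
        return cap[n] * total_used / total_cap

    moves = []
    # Each move leaves the source or the target exactly at the average,
    # so fewer moves than nodes are needed. The bound is a safety net.
    for _ in range(len(usage)):
        src = max(used, key=util)  # most utilized node
        dst = min(used, key=util)  # least utilized node
        if (util(src) - avg) * 100 <= threshold and (avg - util(dst)) * 100 <= threshold:
            break  # every node is within the threshold: balanced
        amount = min(used[src] - target(src), target(dst) - used[dst])
        if amount <= 0:
            break
        used[src] -= amount
        used[dst] += amount
        moves.append((src, dst, amount))
    return moves


if __name__ == "__main__":
    # dn4 was just added to the cluster and is still empty.
    cluster = {"dn1": (80, 100), "dn2": (70, 100), "dn3": (60, 100), "dn4": (0, 100)}
    for src, dst, amount in plan_moves(cluster, threshold=10.0):
        print(f"move {amount} GB from {src} to {dst}")
    # move 27.5 GB from dn1 to dn4
    # move 17.5 GB from dn2 to dn4
```

The cluster average is 52.5%. After the two moves, `dn3` (60%) and `dn4` (45%) are both within 10 points of it, so no further data has to move.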
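Translation:
HDFS data may not always be placed evenly across the DataNodes. A common reason is that new DataNodes are added to an existing cluster. When placing new blocks (a file's data is stored as a series of blocks), the NameNode considers various parameters before choosing the DataNodes that receive these blocks. Some of the considerations are:
A policy of keeping one of a block's replicas on the same node as the node writing the block.
The need to spread a block's different replicas across racks so that the cluster can survive the loss of an entire rack.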
One of the replicas is usually placed together with the node writing the file in the same