A brief introduction to the Hadoop balancer and tuning balancer speed

Latest recommended article published on 2024-05-20 11:46:11

流一恩典

Category: Hadoop

Article link: https://blog.csdn.net/czz1141979570/article/details/96020681

The official Hadoop documentation introduces the balancer as follows:

HDFS data might not always be placed uniformly across the DataNodes. One common reason is the addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), the NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:

Policy to keep one of the replicas of a block on the same node as the node that is writing the block.

Need to spread different replicas of a block across the racks so that the cluster can survive the loss of a whole rack.

One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.

Spread HDFS data uniformly across the DataNodes in the cluster.
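The considerations above can be illustrated with a toy sketch. This is not the NameNode's actual placement algorithm; the `Node` type, the `choose_targets` function, and the sample cluster are hypothetical names introduced only to show how the four considerations might interact when picking three replica targets.

```python
from collections import namedtuple

# Hypothetical minimal model of a DataNode: a name, its rack, and bytes used.
Node = namedtuple("Node", ["name", "rack", "used"])


def choose_targets(writer, nodes):
    """Pick up to three replica targets for a block (toy model, not HDFS code)."""
    # Consideration 1: keep one replica on the node that is writing the block.
    targets = [writer]
    # Consideration 3: one replica on the writer's rack limits cross-rack I/O.
    same_rack = [n for n in nodes if n.rack == writer.rack and n != writer]
    # Consideration 2: one replica on another rack survives the loss of a rack.
    other_rack = [n for n in nodes if n.rack != writer.rack]
    for candidates in (same_rack, other_rack):
        if candidates:
            # Consideration 4: prefer the least-used eligible node to even out data.
            targets.append(min(candidates, key=lambda n: n.used))
    return targets


# Assumed sample cluster: two racks with two nodes each.
cluster = [
    Node("a", "r1", 50),
    Node("b", "r1", 10),
    Node("c", "r2", 70),
    Node("d", "r2", 20),
]
chosen = choose_targets(cluster[0], cluster)
print([n.name for n in chosen])
```

Because the first three considerations constrain where replicas may go, the fourth (uniform spread) is only applied among the remaining candidates, which is one way to see why data can end up unevenly distributed.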

Due to multiple competing considerations, data might not be uniformly placed across the DataNodes. HDFS provides a tool for administrators that analyzes block placement and rebalances data across the DataNodes. A brief administrator’s guide for the balancer is available at HADOOP-1652.
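To make the idea of "analyzing block placement" concrete, here is a minimal sketch of the balance criterion as it is commonly described for the balancer's threshold option: a node counts as balanced when its utilization is within a threshold (in percentage points, assumed here to default to 10) of the overall cluster utilization. The function name `classify_nodes` and the sample figures are hypothetical, and the sketch does not model how blocks are actually moved.

```python
def classify_nodes(nodes, threshold=10.0):
    """Classify DataNodes relative to the cluster's average utilization.

    nodes: dict mapping node name -> (used_bytes, capacity_bytes).
    Returns (cluster_utilization_percent, over_utilized, under_utilized).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity

    over, under = [], []
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > cluster_util + threshold:
            over.append(name)    # candidate source of blocks
        elif util < cluster_util - threshold:
            under.append(name)   # candidate destination for blocks
    return cluster_util, over, under


# Assumed example: two full-ish nodes plus a newly added, empty DataNode.
sample = {"dn1": (80, 100), "dn2": (70, 100), "dn3": (0, 100)}
cluster_util, over, under = classify_nodes(sample)
print(cluster_util, over, under)
```

In this example the newly added node is far below the cluster average, which matches the scenario described above: adding DataNodes to an existing cluster is a typical reason to run the balancer.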

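Paraphrased:

HDFS data is not always distributed evenly across DataNodes. A common cause is adding new DataNodes to an existing cluster. When placing new blocks (a file's data is stored as a series of blocks), the NameNode weighs several factors before choosing which DataNodes receive them. Some of these factors are:

A policy of keeping one replica of a block on the same node that is writing the block.

The need to spread a block's replicas across racks so that the cluster survives the loss of an entire rack.

One of the replicas is usually placed on the same rack as the node writing the file.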
