CCAH-500 Question 48: Choose three reasons why you should run the HDFS balancer periodically

48. Choose three reasons why you should run the HDFS balancer periodically. (Choose three)

A. To ensure that there is capacity in HDFS for additional data

B. To ensure that all blocks in the cluster are 128MB in size

C. To help HDFS deliver consistent performance under heavy loads

D. To ensure that there is consistent disk utilization across the DataNodes

E. To improve data locality for MapReduce

Answer: C,D,E

Explanation:

https://www.quora.com/It-is-recommended-that-you-run-the-HDFS-balancer-periodically-Why-Choose-3

E: The balancer does not take data locality into consideration unless it is already moving a block. In a cluster that is balanced up to its threshold, it will not move a block just because that block violates the locality policy. (Use a setrep +/setrep - process instead, that is, temporarily raise the replication factor and then lower it again.)
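To make the threshold point concrete, here is a minimal Python sketch of how a balancer might decide which DataNodes to touch. It assumes the commonly documented rule: cluster utilization is total used space over total capacity, a node is over-utilized if it sits more than `threshold` percentage points above that average, and under-utilized if it sits more than `threshold` points below. The `DataNode` class, node names and sizes are hypothetical. Note that nothing in the decision looks at where blocks are relative to their readers, which is why a balanced cluster leaves locality problems alone.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str        # hypothetical node name
    capacity: int    # any consistent unit, e.g. GB
    used: int

    @property
    def utilization(self) -> float:
        """Used space as a percentage of this node's capacity."""
        return 100.0 * self.used / self.capacity


def cluster_average(nodes: list[DataNode]) -> float:
    """Cluster-wide utilization: total used over total capacity, in percent."""
    return 100.0 * sum(n.used for n in nodes) / sum(n.capacity for n in nodes)


def classify(nodes: list[DataNode], threshold: float) -> tuple[list[str], list[str]]:
    """Return (over_utilized, under_utilized) node names; threshold is in percentage points."""
    avg = cluster_average(nodes)
    over = [n.name for n in nodes if n.utilization > avg + threshold]
    under = [n.name for n in nodes if n.utilization < avg - threshold]
    return over, under


if __name__ == "__main__":
    nodes = [
        DataNode("dn1", capacity=1000, used=550),
        DataNode("dn2", capacity=1000, used=500),
        DataNode("dn3", capacity=1000, used=450),
    ]
    # The average is 50% and every node is within 10 points of it, so no node
    # is selected, regardless of how well blocks line up with their readers.
    print(classify(nodes, threshold=10.0))  # ([], [])
```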

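A brief note of my own: going by the O'Reilly excerpt below (an unbalanced cluster can affect locality for MapReduce), option E is still a reasonable answer.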
A: Think of the Towers of Hanoi puzzle. If you move a ring from one peg to another, has the total number of rings changed? No. The same holds for HDFS: balancing has no impact on total capacity, which is the same whether you balance or not. It does help the NameNode (NN) place new blocks, because it leaves more possible places to put data and lets the NN replicate blocks more evenly; that is, it helps you better utilize the capacity you already have. But capacity doesn't magically become available by moving blocks around, unless your systems are so broken that they lose blocks during the move.
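The Towers of Hanoi argument can be checked with a toy model. This is a minimal Python sketch, not HDFS code: `DataNode` is a hypothetical class with made-up capacities, and `move_block` stands in for a single balancer transfer. Moving blocks leaves the cluster's total used and total free space unchanged; only how evenly the free space is spread across nodes changes.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    capacity: int        # total disk space (arbitrary units)
    blocks: list[int]    # sizes of the blocks stored on this node

    @property
    def used(self) -> int:
        return sum(self.blocks)

    @property
    def free(self) -> int:
        return self.capacity - self.used


def move_block(src: DataNode, dst: DataNode) -> None:
    """Move one block from src to dst, like a single balancer transfer."""
    size = src.blocks.pop()
    if size > dst.free:
        src.blocks.append(size)  # undo: the target has no room for this block
        raise ValueError(f"{dst.name} cannot hold a block of size {size}")
    dst.blocks.append(size)


def totals(nodes: list[DataNode]) -> tuple[int, int]:
    """Return (total used, total free) across the cluster."""
    return sum(n.used for n in nodes), sum(n.free for n in nodes)


if __name__ == "__main__":
    full = DataNode("dn1", capacity=10, blocks=[1] * 9)
    empty = DataNode("dn2", capacity=10, blocks=[1] * 1)
    before = totals([full, empty])
    for _ in range(4):
        move_block(full, empty)
    after = totals([full, empty])
    print(before, after)          # (10, 10) (10, 10)
    print(full.free, empty.free)  # 5 5
```

The cluster had 10 units free before and after; the balancer only turned "1 free on dn1, 9 free on dn2" into "5 and 5", which is what makes new block placement easier.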
C: Under heavy loads, rack locality is more important than node locality because newer data is more likely to be read than older data (see studies by Yahoo! and others). Given A above, running the balancer is unlikely to have any significant impact on performance unless the DataNodes are extremely unbalanced (for example, after a major delete or after new nodes are added).
B: Clearly balancer doesn't change the block size.  distcp, however, can.
D: This was the sole reason the balancer was written. Balancing utilization across disks goes back to the explanation given in A. Source: I was there.


O'Reilly:

Over time, the distribution of blocks across datanodes can become unbalanced. An unbalanced cluster can affect locality for MapReduce, and it puts a greater strain on the highly utilized datanodes, so it's best avoided.


The balancer program is a Hadoop daemon that redistributes blocks by moving them from overutilized datanodes to underutilized datanodes, while adhering to the block replica placement policy that makes data loss unlikely by placing block replicas on different racks.
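The redistribution described above can be sketched as a greedy loop. This is a minimal Python illustration under simplifying assumptions, not the actual Hadoop Balancer: it always pairs the most-utilized node with the least-utilized one, stops once every node is within `threshold` percentage points of the cluster average, and models the replica placement policy only as "a move must not reduce the number of racks a block's replicas are spread across". The class, node names, rack names and block sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    rack: str
    capacity: int
    blocks: dict[str, int] = field(default_factory=dict)  # block id -> size

    @property
    def used(self) -> int:
        return sum(self.blocks.values())

    @property
    def utilization(self) -> float:
        return 100.0 * self.used / self.capacity


def average(nodes: list[DataNode]) -> float:
    """Cluster utilization in percent: total used over total capacity."""
    return 100.0 * sum(n.used for n in nodes) / sum(n.capacity for n in nodes)


def keeps_rack_spread(block: str, src: DataNode, dst: DataNode, nodes: list[DataNode]) -> bool:
    """True if moving `block` from src to dst does not reduce the racks holding it."""
    holders = [n for n in nodes if block in n.blocks]
    racks_before = {n.rack for n in holders}
    racks_after = {n.rack for n in holders if n is not src} | {dst.rack}
    return len(racks_after) >= len(racks_before)


def balance(nodes: list[DataNode], threshold: float = 10.0, max_moves: int = 1000):
    """Greedily move blocks from the fullest node to the emptiest one."""
    moves = []
    for _ in range(max_moves):
        avg = average(nodes)
        if all(abs(n.utilization - avg) <= threshold for n in nodes):
            break  # every node is within the threshold: the cluster counts as balanced
        src = max(nodes, key=lambda n: n.utilization)
        dst = min(nodes, key=lambda n: n.utilization)
        candidates = [
            b for b, size in src.blocks.items()
            if b not in dst.blocks                    # never two replicas on one node
            and dst.used + size <= dst.capacity       # the target must have room
            and keeps_rack_spread(b, src, dst, nodes)
        ]
        if not candidates:
            break  # this sketch gives up; a real balancer would try other node pairs
        block = candidates[0]
        dst.blocks[block] = src.blocks.pop(block)
        moves.append((block, src.name, dst.name))
    return moves


if __name__ == "__main__":
    ids = [f"b{i}" for i in range(6)]
    nodes = [
        DataNode("dn1", "r1", 100, {b: 10 for b in ids}),
        DataNode("dn2", "r2", 100, {b: 10 for b in ids}),
        DataNode("dn3", "r3", 100),  # a newly added, empty node
    ]
    for move in balance(nodes, threshold=10.0):
        print(move)
    print([round(n.utilization) for n in nodes])  # [40, 50, 30]
```

With a newly added empty node on its own rack, the sketch moves three blocks (b0, b1, b2) onto dn3 and stops once all nodes are within 10 points of the 40% average, while every block still has replicas on two racks. If dn3 were on the same rack as dn2 instead, no move from dn1 would be allowed, since it would collapse a block's two racks into one. On a real cluster, the corresponding step is running `hdfs balancer -threshold 10`.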
