HDFS Disk Writes and Balancing

1. HDFS Write Policy

The first replica is written to the local node, the second to a node on a different rack, and the third to a different node on the same rack as the second.
Goal: maximize fault tolerance, guarding not only against a single machine failing but also against a whole rack going down, while keeping writes fast (the local node is fastest).

  • The class is responsible for choosing the desired number of targets for placing block replicas.
  • The replica placement strategy is that if the writer is on a datanode, the 1st replica is placed on the local machine, otherwise a random datanode.
  • The 2nd replica is placed on a datanode that is on a different rack.
  • The 3rd replica is placed on a datanode which is on a different node of the same rack as the second replica.

1.1. Local-Node Preference Configuration

dfs.namenode.block-placement-policy.default.prefer-local-node
Defaults to true. When data is put from a machine that also runs a DataNode, the local node is preferred as the first-replica target, so over time that DataNode's storage utilization grows higher than its peers'.

Controls how the default block placement policy places the first replica of a block. When true, it will prefer the node where the client is running. When false, it will prefer a node in the same rack as the client. Setting to false avoids situations where entire copies of large files end up on a single node, thus creating hotspots.
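If hot-spotting on the writing node is a concern, the preference can be disabled in hdfs-site.xml (illustrative fragment):

```xml
<property>
  <name>dfs.namenode.block-placement-policy.default.prefer-local-node</name>
  <value>false</value>
</property>
```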

2. Disk Volume Selection

Configuration key: dfs.datanode.fsdataset.volume.choosing.policy

  • RoundRobinVolumeChoosingPolicy, choose volumes with the same storage type in round-robin order
  • AvailableSpaceVolumeChoosingPolicy, A DN volume choosing policy which takes into account the amount of free space on each of the available volumes when considering where to assign a new replica allocation. By default this policy prefers assigning replicas to those volumes with more available free space, so as to over time balance the available space of all the volumes within a DN
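For example, to switch a DataNode to the available-space policy, set the key in hdfs-site.xml (illustrative fragment; the class name is the one shipped with Hadoop):

```xml
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```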

2.1. Round-Robin Policy

In essence, volumes are simply polled in round-robin order.
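The round-robin behavior can be sketched as follows (a minimal illustration, not Hadoop's actual RoundRobinVolumeChoosingPolicy; the class and method names here are invented):

```java
import java.util.List;

// Minimal sketch of round-robin volume selection: a cursor advances
// through the volume list, skipping volumes too full for the block.
public class RoundRobinSketch {
    private int next = 0; // index of the volume to try next

    // Pick the next volume whose free space can hold blockSize,
    // scanning at most one full round.
    public String choose(List<String> volumes, long[] freeSpace, long blockSize) {
        for (int i = 0; i < volumes.size(); i++) {
            int idx = (next + i) % volumes.size();
            if (freeSpace[idx] >= blockSize) {
                next = (idx + 1) % volumes.size(); // advance cursor past the chosen volume
                return volumes.get(idx);
            }
        }
        throw new IllegalStateException("no volume has enough space");
    }

    public static void main(String[] args) {
        RoundRobinSketch p = new RoundRobinSketch();
        List<String> vols = List.of("/data1", "/data2", "/data3");
        long[] free = {100L, 100L, 100L};
        System.out.println(p.choose(vols, free, 10)); // /data1
        System.out.println(p.choose(vols, free, 10)); // /data2
        System.out.println(p.choose(vols, free, 10)); // /data3
        System.out.println(p.choose(vols, free, 10)); // back to /data1
    }
}
```

Note the drawback this implies: round-robin ignores how full each volume is, so a newly added empty disk fills no faster than old, nearly full ones.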

2.2. Available Space Policy

  • If none of the volumes with low free space have enough space for the replica, always try to choose a volume with a lot of free space
  • Weight calculation: (high-free-space volume count * 0.75) / (high-free-space volume count * 0.75 + low-free-space volume count * 0.25); a random number is drawn and compared against this weight, so the choice between volume groups is made probabilistically
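The weighting above can be sketched in Java (an illustration only; the class structure is invented, though the 0.75 default corresponds to dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction):

```java
import java.util.Random;

// Sketch of the weighting in the available-space policy: volumes are
// split into a "high free space" group and a "low free space" group,
// and the high group is chosen with probability
//   w = (nHigh * f) / (nHigh * f + nLow * (1 - f)),  f = 0.75 by default.
public class AvailableSpaceSketch {
    static final double PREFERENCE = 0.75;

    // Probability of choosing from the high-free-space group.
    public static double highGroupWeight(int nHigh, int nLow) {
        return (nHigh * PREFERENCE) / (nHigh * PREFERENCE + nLow * (1 - PREFERENCE));
    }

    // Draw a random number and compare it against the weight.
    public static boolean pickHighGroup(int nHigh, int nLow, Random rng) {
        return rng.nextDouble() < highGroupWeight(nHigh, nLow);
    }

    public static void main(String[] args) {
        // 3 volumes with plenty of space, 1 nearly full:
        System.out.println(highGroupWeight(3, 1)); // 0.9
        // equal counts: the high group is still favored 3:1
        System.out.println(highGroupWeight(2, 2)); // 0.75
    }
}
```

So emptier volumes receive new replicas more often, but fuller ones are not starved entirely, which avoids hammering a single new disk with all writes.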

3. Balancer Command

# hdfs balancer --help
Usage: hdfs balancer
	[-policy <policy>]	the balancing policy: datanode or blockpool
	[-threshold <threshold>]	Percentage of disk capacity
	[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]	Excludes the specified datanodes.
	[-include [-f <hosts-file> | <comma-separated list of hosts>]]	Includes only the specified datanodes.
	[-source [-f <hosts-file> | <comma-separated list of hosts>]]	Pick only the specified datanodes as source nodes.
	[-idleiterations <idleiterations>]	Number of consecutive idle iterations (-1 for Infinite) before exit.
	[-runDuringUpgrade]	Whether to run the balancer during an ongoing HDFS upgrade. This is usually not desired since it will not affect used space on over-utilized machines.
  • -threshold: disk-utilization threshold, valid range [0, 100]; nodes above the threshold migrate data to nodes below it (while still honoring HDFS replica-placement rules)
  • -exclude: exclude the specified DataNodes
  • -include: operate only on the specified DataNodes
  • -exclude and -include cannot be specified at the same time

3.1. How the Balancer Works

Compute each DataNode's disk utilization and, using the cluster-wide average utilization together with the configured threshold, classify the DataNodes into four tiers: over-utilized (above average + threshold), above-average, below-average, and under-utilized (below average - threshold).

Cluster average utilization = sum(DFS Used) * 100 / sum(Capacity)

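The classification can be sketched as follows (illustrative Java, not the Balancer's actual code; class and method names are invented):

```java
// Sketch of the Balancer's grouping: given a node's utilization (%),
// the cluster average, and the threshold, the node falls into one of
// four tiers. Over-utilized nodes are sources; under-utilized, targets.
public class BalancerGroups {
    public static String classify(double util, double avg, double threshold) {
        if (util > avg + threshold)  return "over-utilized";  // must shed data
        if (util > avg)              return "above-average";  // candidate source
        if (util >= avg - threshold) return "below-average";  // candidate target
        return "under-utilized";                              // should receive data
    }

    // Cluster average utilization = sum(DFS Used) * 100 / sum(Capacity)
    public static double clusterAverage(long[] dfsUsed, long[] capacity) {
        long used = 0, cap = 0;
        for (int i = 0; i < dfsUsed.length; i++) {
            used += dfsUsed[i];
            cap += capacity[i];
        }
        return used * 100.0 / cap;
    }

    public static void main(String[] args) {
        double avg = clusterAverage(new long[]{80, 20}, new long[]{100, 100});
        System.out.println(avg);                   // 50.0
        System.out.println(classify(80, avg, 10)); // over-utilized
        System.out.println(classify(20, avg, 10)); // under-utilized
    }
}
```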

Related parameters

  • dfs.balancer.max-size-to-move: maximum amount of data to migrate per iteration, default 10 GB
  • dfs.balancer.moverThreads: number of mover threads, default 1000
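Both knobs live in hdfs-site.xml (illustrative fragment showing the defaults, with 10 GB expressed in bytes):

```xml
<property>
  <name>dfs.balancer.max-size-to-move</name>
  <value>10737418240</value> <!-- 10 GB -->
</property>
<property>
  <name>dfs.balancer.moverThreads</name>
  <value>1000</value>
</property>
```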