HDFS Disk Writes and Balancing

1. HDFS Write Policy

The first replica is written to the local node, the second to a node on a different rack, and the third to a different node on the same rack as the second.
Goal: maximize fault tolerance, guarding not only against a single machine failing but also against a whole rack going down, while keeping writes fast (the local node is fastest).

  • The class is responsible for choosing the desired number of targets for placing block replicas.
  • The replica placement strategy is that if the writer is on a datanode, the 1st replica is placed on the local machine, otherwise a random datanode.
  • The 2nd replica is placed on a datanode that is on a different rack.
  • The 3rd replica is placed on a datanode which is on a different node of the same rack as the second replica.

1.1. Local-Node Preference Configuration

dfs.namenode.block-placement-policy.default.prefer-local-node
Defaults to true. When data is put from a machine that also runs a DataNode, the local node is preferred as the first-replica target, so over time that DataNode's storage utilization grows higher than its peers'.

Controls how the default block placement policy places the first replica of a block. When true, it will prefer the node where the client is running. When false, it will prefer a node in the same rack as the client. Setting to false avoids situations where entire copies of large files end up on a single node, thus creating hotspots.
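If hot-spotting on the writing node is a concern, the preference can be disabled in hdfs-site.xml (illustrative fragment):

```xml
<property>
  <name>dfs.namenode.block-placement-policy.default.prefer-local-node</name>
  <value>false</value>
</property>
```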

2. Disk Volume Selection

Configuration key: dfs.datanode.fsdataset.volume.choosing.policy

  • RoundRobinVolumeChoosingPolicy, choose volumes with the same storage type in round-robin order
  • AvailableSpaceVolumeChoosingPolicy, A DN volume choosing policy which takes into account the amount of free space on each of the available volumes when considering where to assign a new replica allocation. By default this policy prefers assigning replicas to those volumes with more available free space, so as to over time balance the available space of all the volumes within a DN
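For example, to switch a DataNode to the available-space policy, set the key in hdfs-site.xml (illustrative fragment; the class name is the one shipped with Hadoop):

```xml
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```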

2.1. Round-Robin Policy

In essence, volumes are simply polled in round-robin order.
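The round-robin behavior can be sketched as follows (a minimal illustration, not Hadoop's actual RoundRobinVolumeChoosingPolicy; the class and method names here are invented):

```java
import java.util.List;

// Minimal sketch of round-robin volume selection: a cursor advances
// through the volume list, skipping volumes too full for the block.
public class RoundRobinSketch {
    private int next = 0; // index of the volume to try next

    // Pick the next volume whose free space can hold blockSize,
    // scanning at most one full round.
    public String choose(List<String> volumes, long[] freeSpace, long blockSize) {
        for (int i = 0; i < volumes.size(); i++) {
            int idx = (next + i) % volumes.size();
            if (freeSpace[idx] >= blockSize) {
                next = (idx + 1) % volumes.size(); // advance cursor past the chosen volume
                return volumes.get(idx);
            }
        }
        throw new IllegalStateException("no volume has enough space");
    }

    public static void main(String[] args) {
        RoundRobinSketch p = new RoundRobinSketch();
        List<String> vols = List.of("/data1", "/data2", "/data3");
        long[] free = {100L, 100L, 100L};
        System.out.println(p.choose(vols, free, 10)); // /data1
        System.out.println(p.choose(vols, free, 10)); // /data2
        System.out.println(p.choose(vols, free, 10)); // /data3
        System.out.println(p.choose(vols, free, 10)); // back to /data1
    }
}
```

Note the drawback this implies: round-robin ignores how full each volume is, so a newly added empty disk fills no faster than old, nearly full ones.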

2.2. Available Space Policy

  • If none of the volumes with low free space have enough space for the replica, always try to choose a volume with a lot of free space
  • Weight calculation: (high-free-space volume count * 0.75) / (high-free-space volume count * 0.75 + low-free-space volume count * 0.25); a random number is drawn and compared against this weight, so the choice between volume groups is made probabilistically
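The weighting above can be sketched in Java (an illustration only; the class structure is invented, though the 0.75 default corresponds to dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction):

```java
import java.util.Random;

// Sketch of the weighting in the available-space policy: volumes are
// split into a "high free space" group and a "low free space" group,
// and the high group is chosen with probability
//   w = (nHigh * f) / (nHigh * f + nLow * (1 - f)),  f = 0.75 by default.
public class AvailableSpaceSketch {
    static final double PREFERENCE = 0.75;

    // Probability of choosing from the high-free-space group.
    public static double highGroupWeight(int nHigh, int nLow) {
        return (nHigh * PREFERENCE) / (nHigh * PREFERENCE + nLow * (1 - PREFERENCE));
    }

    // Draw a random number and compare it against the weight.
    public static boolean pickHighGroup(int nHigh, int nLow, Random rng) {
        return rng.nextDouble() < highGroupWeight(nHigh, nLow);
    }

    public static void main(String[] args) {
        // 3 volumes with plenty of space, 1 nearly full:
        System.out.println(highGroupWeight(3, 1)); // 0.9
        // equal counts: the high group is still favored 3:1
        System.out.println(highGroupWeight(2, 2)); // 0.75
    }
}
```

So emptier volumes receive new replicas more often, but fuller ones are not starved entirely, which avoids hammering a single new disk with all writes.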

3. Balancer Command

# hdfs balancer --help
Usage: hdfs balancer
	[-policy <policy>]	the balancing policy: datanode or blockpool
	[-threshold <threshold>]	Percentage of disk capacity
	[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]	Excludes the specified datanodes.
	[-include [-f <hosts-file> | <comma-separated list of hosts>]]	Includes only the specified datanodes.
	[-source [-f <hosts-file> | <comma-separated list of hosts>]]	Pick only the specified datanodes as source nodes.
	[-idleiterations <idleiterations>]	Number of consecutive idle iterations (-1 for Infinite) before exit.
	[-runDuringUpgrade]	Whether to run the balancer during an ongoing HDFS upgrade. This is usually not desired since it will not affect used space on over-utilized machines.
  • -threshold: disk-utilization threshold, valid range [0, 100]; nodes above the threshold migrate data to nodes below it (while still honoring HDFS replica-placement rules)
  • -exclude: exclude the specified DataNodes
  • -include: operate only on the specified DataNodes
  • -exclude and -include cannot be specified at the same time

3.1. How the Balancer Works

Compute each DataNode's disk utilization and, using the cluster-wide average utilization together with the configured threshold, classify the DataNodes into four tiers: over-utilized (above average + threshold), above-average, below-average, and under-utilized (below average - threshold).

Cluster average utilization = sum(DFS Used) * 100 / sum(Capacity)

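The classification can be sketched as follows (illustrative Java, not the Balancer's actual code; class and method names are invented):

```java
// Sketch of the Balancer's grouping: given a node's utilization (%),
// the cluster average, and the threshold, the node falls into one of
// four tiers. Over-utilized nodes are sources; under-utilized, targets.
public class BalancerGroups {
    public static String classify(double util, double avg, double threshold) {
        if (util > avg + threshold)  return "over-utilized";  // must shed data
        if (util > avg)              return "above-average";  // candidate source
        if (util >= avg - threshold) return "below-average";  // candidate target
        return "under-utilized";                              // should receive data
    }

    // Cluster average utilization = sum(DFS Used) * 100 / sum(Capacity)
    public static double clusterAverage(long[] dfsUsed, long[] capacity) {
        long used = 0, cap = 0;
        for (int i = 0; i < dfsUsed.length; i++) {
            used += dfsUsed[i];
            cap += capacity[i];
        }
        return used * 100.0 / cap;
    }

    public static void main(String[] args) {
        double avg = clusterAverage(new long[]{80, 20}, new long[]{100, 100});
        System.out.println(avg);                   // 50.0
        System.out.println(classify(80, avg, 10)); // over-utilized
        System.out.println(classify(20, avg, 10)); // under-utilized
    }
}
```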

Related parameters

  • dfs.balancer.max-size-to-move: maximum amount of data to migrate per iteration, default 10 GB
  • dfs.balancer.moverThreads: number of mover threads, default 1000
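Both knobs live in hdfs-site.xml (illustrative fragment showing the defaults, with 10 GB expressed in bytes):

```xml
<property>
  <name>dfs.balancer.max-size-to-move</name>
  <value>10737418240</value> <!-- 10 GB -->
</property>
<property>
  <name>dfs.balancer.moverThreads</name>
  <value>1000</value>
</property>
```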