HDFS

最新推荐文章于 2023-10-09 11:42:28 发布

小张不嚣张@

最新推荐文章于 2023-10-09 11:42:28 发布

阅读量134

点赞数

本文链接：https://blog.csdn.net/weixin_44311552/article/details/103463688

版权

HDFS

HDFS是一种能够运行在商业硬件上的分布式文件系统，与目前市面上文件系统有很多相似之处，但是又是不同的软件系统。在HDFS中，使用的架构是主从架构（Aactive|Standby），针对的是实现NameNode的高可用。

NameNode会启动一个（或者说在一个集群中，只有一个NameNode在运行），管理文件系统的命名空间和控制外界客户端对文件系统的访问；

DataNode是负责管理在每一个节点上存储的文件。

NameNode：存储系统的元数据（用于描述数据的数据|比如说块到DataNode的映射的数据），负责管理DataNode

DataNode:用于存储的数据块的节点，负责响应客户端对块的读写请求，向NameNode汇报块信息

block:数据块，是对文件拆分的最小单元，默认情况下文件拆分大小为128MB 每个数据块有3个副本

rack:机架使用机架对存储节点的进行物理编排用于优化存储和计算
什么是Block块

默认为128MB 为一个块是HDFS中对文件拆分的最小单元

在这里插入图片描述

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
  <description>
      The default block size for new files, in bytes.
      You can use the following suffix (case insensitive):
      k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),
      Or provide complete size in bytes (such as 134217728 for 128 MB).
  </description>
</property>

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication. 
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

（1）为什么块的大小为128MB

在2.X以前，HDFS的块的大小默认为64MB

工业限制，硬件限制（在06年到10年间）

软件优化通常认为最佳状态是：寻址时间为传输时间的100分之一

（2）Block块能否随意设置？

答案当然是不能如果Block块设置的过小，必然会导致集群中存在几百万个小文件，增加寻址时间，效率低下。

如果设置的块过大，会造成剩余空间的浪费，还会造成存取的时间过长。

什么是机架

机架感知

For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.

The current, default replica placement policy described here is a work in progress.

在默认情况下，当副本因子为3时，第一个副本存在本地一个机架上，第二存在本地的机架的另外一个节点上，第三个存储除本地机架以外的节点上。

此操作减少了机架间的数据流量传出，降低了聚合带宽的压力。

在这里插入图片描述

小张不嚣张@

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
打赏
0
评论
HDFS

HDFSHDFS是一种能够运行在商业硬件上的分布式文件系统，与目前市面上文件系统有很多相似之处，但是又是不同的软件系统。在HDFS中，使用的架构是主从架构（Aactive|Standby），针对的是实现NameNode的高可用。NameNode会启动一个（或者说在一个集群中，只有一个NameNode在运行），管理文件系统的命名空间和控制外界客户端对文件系统的访问；DataNode是负责管理...
复制链接

扫一扫