1. Question
How large should the NameNode heap be?
2. Analysis
2.1. Premises
The guidelines below are quoted from Cloudera (Sizing NameNode Heap Memory | 5.16.x | Cloudera Documentation) for reference; please verify their accuracy yourself.
- In HDFS, data and metadata are decoupled. Data files are split into block files that are stored, and replicated, on DataNodes across the cluster. The filesystem namespace tree and associated metadata are stored on the NameNode.
- Namespace objects are file inodes and blocks that point to block files on the DataNodes. These namespace objects are stored as a file system image (fsimage) in the NameNode’s memory and also persist locally.
- Each namespace object on the NameNode consumes approximately 150 bytes.
- Replication affects disk space but not memory consumption.
- A more conservative estimate is 1 GB of memory for every million blocks.
2.2. Estimating NameNode heap memory usage
Assume 1024 MB of data in total and an HDFS block size of 128 MB.
2.2.1. Scenario 1: one 1024 MB file
1 file = 1 file inode, occupying 1024 / 128 = 8 blocks = 8 blocks of metadata
NameNode heap consumed: 150 bytes * (1 + 8) = 1350 bytes
2.2.2. Scenario 2: eight 128 MB files
8 files = 8 file inodes, occupying 8 blocks = 8 blocks of metadata
NameNode heap consumed: 150 bytes * (8 + 8) = 2400 bytes
2.2.3. Scenario 3: 1024 files of 1 MB each
1024 files = 1024 file inodes, occupying 1024 blocks = 1024 blocks of metadata
NameNode heap consumed: 150 bytes * (1024 + 1024) = 307200 bytes
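All three scenarios follow the same formula; a minimal Python sketch (the 150-bytes-per-object constant comes from the Cloudera guideline quoted above):

```python
import math

BYTES_PER_OBJECT = 150   # approx. heap cost per namespace object (Cloudera guideline)
BLOCK_SIZE_MB = 128      # HDFS block size

def heap_bytes(num_files: int, file_size_mb: int) -> int:
    """Estimate NameNode heap consumed by num_files files of file_size_mb each."""
    blocks_per_file = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    inodes = num_files                    # one inode per file
    blocks = num_files * blocks_per_file  # block metadata objects
    return BYTES_PER_OBJECT * (inodes + blocks)

print(heap_bytes(1, 1024))   # scenario 1: 1350
print(heap_bytes(8, 128))    # scenario 2: 2400
print(heap_bytes(1024, 1))   # scenario 3: 307200
```

Note how scenarios 1 and 3 store the same 1024 MB of data, yet the small-file case consumes over 200 times more heap.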
2.3. Estimating the required NameNode heap size
Assume 100 storage hosts with 12 TB of storage each, an HDFS block size of 128 MB, and a replication factor of 3.
- Maximum data volume: 100 * 12 TB = 1200 TB
- Block count: 1200 TB / (128 MB * 3) = 3,125,000 (raw capacity is divided by the replication factor because each unique block is stored three times; replication costs disk, not NameNode memory)
- Required heap: 3,125,000 / 1,000,000 ≈ 3 GB
- On average, 1 GB of heap covers 400 TB of data
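The cluster-level estimate can be reproduced the same way; a sketch under the assumptions above (1 TB is taken as 10^6 MB to match the figures in this section):

```python
HOSTS = 100
TB_PER_HOST = 12         # storage capacity per host (TB)
BLOCK_SIZE_MB = 128      # HDFS block size
REPLICATION = 3          # replicas per block
MB_PER_TB = 1_000_000    # decimal units, matching the estimate above

# Raw capacity divided by replication gives the unique (logical) block count,
# since replication affects disk space but not NameNode memory.
raw_mb = HOSTS * TB_PER_HOST * MB_PER_TB
unique_blocks = raw_mb // (BLOCK_SIZE_MB * REPLICATION)
print(unique_blocks)     # 3125000

# Conservative rule of thumb: 1 GB of heap per million blocks.
heap_gb = unique_blocks / 1_000_000
print(heap_gb)           # 3.125
```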
In practice, files never split perfectly into 128 MB blocks, and the presence of small files pushes the NameNode's heap usage higher than this estimate suggests.
If the average block size is 1 MB rather than the ideal 128 MB, memory consumption grows 128-fold, so 1 GB of heap really corresponds to only about 3 TB of data (400 TB / 128).
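The small-file penalty is a linear scaling of the block count; a quick sketch:

```python
IDEAL_TB_PER_GB_HEAP = 400   # ideal case from above: 1 GB of heap per 400 TB
IDEAL_BLOCK_MB = 128         # HDFS block size the ideal case assumes

def effective_tb_per_gb_heap(avg_block_mb: float) -> float:
    """Data (TB) addressable per GB of heap as the average block size shrinks."""
    return IDEAL_TB_PER_GB_HEAP * avg_block_mb / IDEAL_BLOCK_MB

print(effective_tb_per_gb_heap(1))   # 3.125 TB per GB of heap
```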
3. Open question
The official guideline allows 1 GB of heap per one million blocks.
By the premises above, one million blocks actually consume only about 300 MB (150 bytes * (1,000,000 + 1,000,000)), far less than 1 GB. Of course, besides storing metadata, the NameNode heap also has to serve the process's own needs.
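The discrepancy can be checked directly (assuming, as the figure in parentheses does, the worst case of one inode per block):

```python
BYTES_PER_OBJECT = 150       # approx. heap cost per namespace object
blocks = 1_000_000
inodes = 1_000_000           # worst case: one file per block

heap_mb = BYTES_PER_OBJECT * (inodes + blocks) / 1_000_000
print(heap_mb)               # 300.0 MB -- well under the 1 GB rule of thumb
```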