1. Question
How large should the NameNode heap be?
2. Analysis
2.1. Premises
The guidelines below are quoted from Cloudera (Sizing NameNode Heap Memory | 5.16.x | Cloudera Documentation) for reference; please verify their accuracy yourself.
- In HDFS, data and metadata are decoupled. Data files are split into block files that are stored, and replicated, on DataNodes across the cluster. The filesystem namespace tree and associated metadata are stored on the NameNode.
- Namespace objects are file inodes and blocks that point to block files on the DataNodes. These namespace objects are stored as a file system image (fsimage) in the NameNode’s memory and also persist locally.
- Each namespace object on the NameNode consumes approximately 150 bytes.
- Replication affects disk space but not memory consumption.
- A more conservative estimate is 1 GB of memory for every million blocks.
2.2. Estimating NameNode heap memory usage
Assume 1024 MB of data in total and an HDFS block size of 128 MB.
2.2.1. Scenario 1: one 1024 MB file
1 file = 1 file inode, occupying 1024 / 128 = 8 blocks = 8 blocks of metadata
NameNode heap consumed: 150 bytes * (1 + 8) = 1350 bytes
2.2.2. Scenario 2: eight 128 MB files
8 files = 8 file inodes, occupying 8 blocks = 8 blocks of metadata
NameNode heap consumed: 150 bytes * (8 + 8) = 2400 bytes
2.2.3. Scenario 3: 1024 files of 1 MB each
1024 files = 1024 file inodes, occupying 1024 blocks = 1024 blocks of metadata
NameNode heap consumed: 150 bytes * (1024 + 1024) = 307200 bytes
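All three scenarios follow the same formula; a minimal Python sketch (the 150-bytes-per-object constant comes from the Cloudera guideline quoted above):

```python
import math

BYTES_PER_OBJECT = 150   # approx. heap cost per namespace object (Cloudera guideline)
BLOCK_SIZE_MB = 128      # HDFS block size

def heap_bytes(num_files: int, file_size_mb: int) -> int:
    """Estimate NameNode heap consumed by num_files files of file_size_mb each."""
    blocks_per_file = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    inodes = num_files                    # one inode per file
    blocks = num_files * blocks_per_file  # block metadata objects
    return BYTES_PER_OBJECT * (inodes + blocks)

print(heap_bytes(1, 1024))   # scenario 1: 1350
print(heap_bytes(8, 128))    # scenario 2: 2400
print(heap_bytes(1024, 1))   # scenario 3: 307200
```

Note how scenarios 1 and 3 store the same 1024 MB of data, yet the small-file case consumes over 200 times more heap.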
2.3. Estimating the required NameNode heap size
Assume 100 storage hosts with 12 TB of storage each, an HDFS block size of 128 MB, and a replication factor of 3.
- Maximum data volume: 100 * 12 TB = 1200 TB
- Block count: 1200 TB / (128 MB * 3) = 3,125,000 (raw capacity is divided by the replication factor because each unique block is stored three times; replication costs disk, not NameNode memory)
- Required heap: 3,125,000 / 1,000,000 ≈ 3 GB
- On average, 1 GB of heap covers 400 TB of data
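The cluster-level estimate can be reproduced the same way; a sketch under the assumptions above (1 TB is taken as 10^6 MB to match the figures in this section):

```python
HOSTS = 100
TB_PER_HOST = 12         # storage capacity per host (TB)
BLOCK_SIZE_MB = 128      # HDFS block size
REPLICATION = 3          # replicas per block
MB_PER_TB = 1_000_000    # decimal units, matching the estimate above

# Raw capacity divided by replication gives the unique (logical) block count,
# since replication affects disk space but not NameNode memory.
raw_mb = HOSTS * TB_PER_HOST * MB_PER_TB
unique_blocks = raw_mb // (BLOCK_SIZE_MB * REPLICATION)
print(unique_blocks)     # 3125000

# Conservative rule of thumb: 1 GB of heap per million blocks.
heap_gb = unique_blocks / 1_000_000
print(heap_gb)           # 3.125
```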
In practice, files never split perfectly into 128 MB blocks, and the presence of small files pushes the NameNode's heap usage higher than this estimate suggests.
If the average block size is 1 MB rather than the ideal 128 MB, memory consumption grows 128-fold, so 1 GB of heap really corresponds to only about 3 TB of data (400 TB / 128).
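The small-file penalty is a linear scaling of the block count; a quick sketch:

```python
IDEAL_TB_PER_GB_HEAP = 400   # ideal case from above: 1 GB of heap per 400 TB
IDEAL_BLOCK_MB = 128         # HDFS block size the ideal case assumes

def effective_tb_per_gb_heap(avg_block_mb: float) -> float:
    """Data (TB) addressable per GB of heap as the average block size shrinks."""
    return IDEAL_TB_PER_GB_HEAP * avg_block_mb / IDEAL_BLOCK_MB

print(effective_tb_per_gb_heap(1))   # 3.125 TB per GB of heap
```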
3. Open question
The official guideline allows 1 GB of heap per one million blocks.
By the premises above, one million blocks actually consume only about 300 MB (150 bytes * (1,000,000 + 1,000,000)), far less than 1 GB. Of course, besides storing metadata, the NameNode heap also has to serve the process's own needs.
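The discrepancy can be checked directly (assuming, as the figure in parentheses does, the worst case of one inode per block):

```python
BYTES_PER_OBJECT = 150       # approx. heap cost per namespace object
blocks = 1_000_000
inodes = 1_000_000           # worst case: one file per block

heap_mb = BYTES_PER_OBJECT * (inodes + blocks) / 1_000_000
print(heap_mb)               # 300.0 MB -- well under the 1 GB rule of thumb
```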