Hadoop commands - the Hadoop Non DFS Used concept

Run hadoop dfsadmin -report; the output is as follows:



[grid@h1 hadoop]$ bin/hadoop dfsadmin -report
Configured Capacity: 33518518272 (31.22 GB)
Present Capacity: 17089126400 (15.92 GB)
DFS Remaining: 17088819200 (15.92 GB)
DFS Used: 307200 (300 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)
Name: 192.168.4.123:50010
Decommission Status : Normal
Configured Capacity: 11172839424 (10.41 GB)
DFS Used: 102400 (100 KB)
Non DFS Used: 5521485824 (5.14 GB)
DFS Remaining: 5651251200 (5.26 GB)
DFS Used%: 0%
DFS Remaining%: 50.58%
Last contact: Thu Nov 22 15:02:18 CST 2012

Name: 192.168.4.122:50010
Decommission Status : Normal
Configured Capacity: 11172839424 (10.41 GB)
DFS Used: 102400 (100 KB)
Non DFS Used: 5407141888 (5.04 GB)
DFS Remaining: 5765595136 (5.37 GB)
DFS Used%: 0%
DFS Remaining%: 51.6%
Last contact: Thu Nov 22 15:02:18 CST 2012

Name: 192.168.4.124:50010
Decommission Status : Normal
Configured Capacity: 11172839424 (10.41 GB)
DFS Used: 102400 (100 KB)
Non DFS Used: 5500764160 (5.12 GB)
DFS Remaining: 5671972864 (5.28 GB)
DFS Used%: 0%
DFS Remaining%: 50.77%
Last contact: Thu Nov 22 15:02:18 CST 2012

Note: my slaves file contains h1, h2, and h3. The report above lists all three machines, each with roughly 5 GB of DFS Remaining.

I point this out here so it can be compared with the results below.


Modify hdfs-site.xml as shown in the figure below, configuring two name directories and two data directories; all other settings stay unchanged.

(screenshot: the modified hdfs-site.xml with two name/data directories)
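Since the screenshot is lost, here is a minimal sketch of the kind of change it showed. The property names assume Hadoop 0.20/1.x (dfs.name.dir / dfs.data.dir; on 2.x+ they are dfs.namenode.name.dir / dfs.datanode.data.dir). The data paths match the df output later in this post; the name paths are illustrative assumptions:

```xml
<!-- hdfs-site.xml: a comma-separated list configures each directory twice.
     Data paths taken from the df output below; name paths are assumed. -->
<property>
  <name>dfs.name.dir</name>
  <value>/home/grid/hadoop_dfs/name1,/home/grid/hadoop_dfs/name2</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/grid/hadoop_dfs/data1,/home/grid/hadoop_dfs/data2</value>
</property>
```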


Then perform the following steps in order:

bin/stop-all.sh

Delete the tmp directory and the name/data directories

bin/hadoop namenode -format

bin/start-all.sh

Then run the report command again; the output is as follows:

[grid@h1 hadoop]$ bin/hadoop dfsadmin -report
Configured Capacity: 67037036544 (62.43 GB)
Present Capacity: 34189770752 (31.84 GB)
DFS Remaining: 34189586432 (31.84 GB)
DFS Used: 184320 (180 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)
Name: 192.168.4.124:50010
Decommission Status : Normal
Configured Capacity: 22345678848 (20.81 GB)
DFS Used: 61440 (60 KB)
Non DFS Used: 11001843712 (10.25 GB)
DFS Remaining: 11343773696 (10.56 GB)
DFS Used%: 0%
DFS Remaining%: 50.76%
Last contact: Thu Nov 22 16:08:18 CST 2012

Name: 192.168.4.123:50010
Decommission Status : Normal
Configured Capacity: 22345678848 (20.81 GB)
DFS Used: 61440 (60 KB)
Non DFS Used: 11043319808 (10.28 GB)
DFS Remaining: 11302297600 (10.53 GB)
DFS Used%: 0%
DFS Remaining%: 50.58%
Last contact: Thu Nov 22 16:08:18 CST 2012

Name: 192.168.4.122:50010
Decommission Status : Normal
Configured Capacity: 22345678848 (20.81 GB)
DFS Used: 61440 (60 KB)
Non DFS Used: 10802102272 (10.06 GB)
DFS Remaining: 11543515136 (10.75 GB)
DFS Used%: 0%
DFS Remaining%: 51.66%
Last contact: Thu Nov 22 16:08:17 CST 2012

Note that the DFS Remaining on every machine has doubled.

In fact, when hdfs-site.xml is changed to configure three name/data directories, the DFS Remaining in the report grows to three times the original, far exceeding the actual maximum storage available on any single slave machine.

Having stumbled on this curious phenomenon by accident, I wondered: if space runs short, can't we expand it without limit simply by adding directories?

Obviously that is impossible, so I decided to look for material that gives a reasonable explanation.

The following material comes from the web.


Configured Capacity = DFS Used + Non DFS Used + DFS Remaining
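Plugging the per-datanode figures from the two reports above into this identity confirms it exactly:

```python
# Check the identity against the report figures (all values in bytes).
# Datanode 192.168.4.123, first report:
assert 102400 + 5521485824 + 5651251200 == 11172839424
# Datanode 192.168.4.124, second report (two data directories):
assert 61440 + 11001843712 + 11343773696 == 22345678848
print("Configured Capacity = DFS Used + Non DFS Used + DFS Remaining holds")
```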

Non DFS Used - it contains all the operating system files plus non-HDFS data, i.e. the space on the disk occupied by other files.

Below were screenshots of the relevant source code from the HDFS API (images omitted).


Very little material describes how DFS Remaining is actually calculated; the following passage, found on the Apache site, may count as a reasonable explanation:

1.    The total capacity of dfs. This is a sum of all datanodes' capacities, each of which is calculated by the datanode summing the disk space of all its data directories.

2.    The total remaining space of dfs. This is a sum of all datanodes' remaining space. Each datanode's remaining space is calculated using the following formula: remaining space = unused space - capacity * unusableDiskPercentage - reserved space. So the remaining space shows how much space the dfs can still use, but it does not show the size of unused space.
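Point 2's formula can be sketched in code as below. The variable names mirror the quoted description; note that the reserved space (dfs.datanode.du.reserved) defaults to 0, and with both knobs at zero the remaining space simply equals the unused space, which matches the figures in my reports:

```python
def datanode_remaining(unused_space, capacity,
                       unusable_disk_percentage=0.0, reserved_space=0):
    """remaining = unused - capacity * unusableDiskPercentage - reserved,
    per the quoted Apache description (defaults assumed to be 0)."""
    return unused_space - capacity * unusable_disk_percentage - reserved_space

# With the defaults, remaining equals unused space
# (figures for datanode 192.168.4.123, first report):
print(datanode_remaining(unused_space=5651251200, capacity=11172839424))
```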

I ran the following test based on the literal meaning of the two points above:

[grid@h1 ~]$ df -ah /home/grid/hadoop_dfs/data1
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_master-lv_root
                       11G  5.0G  5.4G  48% /
[grid@h1 ~]$ df -ah /home/grid/hadoop_dfs/data2
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_master-lv_root
                       11G  5.0G  5.4G  48% /

If each datanode simply sums the disk space of every data directory, then configuring two data directories will indeed double the reported figures, including Capacity and DFS Remaining.

A particular reason this happens in my case is that I put all the data directories on a single disk; if the data directories were on different disks, this artificially inflated figure would not appear.
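The double-counting is easy to reproduce outside Hadoop. The sketch below mimics the datanode's naive per-directory summation using two directories created on the same filesystem, as in my setup:

```python
import os
import shutil
import tempfile

# Two data directories on the same filesystem (as in my cluster).
base = tempfile.mkdtemp()
data1 = os.path.join(base, "data1")
data2 = os.path.join(base, "data2")
os.makedirs(data1)
os.makedirs(data2)

# Naive per-directory summation, as the datanode capacity calculation does:
naive_capacity = sum(shutil.disk_usage(d).total for d in (data1, data2))

# Actual capacity of the single underlying filesystem:
actual_capacity = shutil.disk_usage(base).total

# Both directories share one disk, so the naive sum counts it twice:
assert naive_capacity == 2 * actual_capacity
print("naive sum double-counts the shared disk")
```

With the directories on separate disks, the same summation would be correct, which is why the inflation only shows up in single-disk setups like mine.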
