Hadoop: Understanding the Size of Non DFS Used

Non DFS Used is the space consumed by things outside the Hadoop file system, for example space used by the Linux system itself or other files stored on the same disk.

Formula: Non DFS Used = (Total Capacity - Reserved Space) - Remaining Space - DFS Used

Detailed calculation:

Non DFS Used = Configured Capacity - Remaining Space - DFS Used
Configured Capacity = Total Disk Space - Reserved Space
Non DFS Used = (Total Disk Space - Reserved Space) - Remaining Space - DFS Used
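
To make the formula easy to check, here is a minimal shell sketch of the same arithmetic as a function; the name non_dfs_used is just for illustration, and all arguments are assumed to be in the same unit (e.g. GB).

# Minimal sketch: the formula above as a shell function.
# Usage: non_dfs_used TOTAL RESERVED REMAINING DFS_USED
non_dfs_used() {
    echo $(( ($1 - $2) - $3 - $4 ))
}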

Let's take an example. Assume I have a 500 GB disk and I set the reserved space (dfs.datanode.du.reserved) to 50 GB.
On this disk, the system and other files take up 120 GB, and DFS Used is 100 GB. If you run df -h, you will see that the available space on that disk volume is 280 GB.
In the HDFS web UI, it will show:
Non DFS Used = 500 GB (Total) - 50 GB (Reserved) - 100 GB (DFS Used) - 280 GB (Remaining) = 70 GB
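
Plugging the numbers from this example into the sketch above confirms the figure reported by the web UI:

non_dfs_used 500 50 280 100    # prints 70, i.e. 70 GB of Non DFS Used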

So it actually means: you initially configured 50 GB to be reserved for non-DFS usage and 450 GB for HDFS. However, it turns out that non-DFS usage exceeds the 50 GB reservation and eats up 70 GB of space that should belong to HDFS!
The term "Non DFS Used" should really be renamed to something like "how much of the configured DFS capacity is occupied by non-DFS use".

One useful command is "lsof | grep delete", which helps you identify open files that have already been deleted. Sometimes Hadoop processes (such as hive, yarn, mapred, and hdfs) may hold references to files that have already been deleted, and those references keep occupying disk space.
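
As a rough follow-up sketch (assuming the default lsof output format, where SIZE/OFF is the 7th column), you can also estimate how much space those deleted-but-open files still hold:

# List files that are deleted but still held open by some process.
lsof | grep '(deleted)'
# Rough estimate of the space they still hold. SIZE/OFF ($7) can be an offset rather than
# a size for some descriptors, so treat the total only as an approximation.
lsof | grep '(deleted)' | awk '{ total += $7 } END { print total/1024/1024 " MB" }'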


The command "du -hsx * | sort -rh | head -10" helps list the ten largest folders under the current directory.
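
A minimal usage sketch, assuming the DataNode volume is mounted at /data1 (a hypothetical path); rerun the command inside the largest directory each time to drill down to whatever is eating the space:

cd /data1    # hypothetical mount point of the DataNode data volume
du -hsx * | sort -rh | head -10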

 
