[HDFS Operations] The HDFS Trash Mechanism: How It Works, How to Configure It, and Problems the Configuration Can Cause

Latest recommended article published 2024-07-26 11:42:11

roman_日积跬步-终至千里

Category: hadoop operations. Tags: hdfs, operations, hadoop

Article link: https://blog.csdn.net/hiliang521/article/details/134802513

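Table of Contents

I. How the HDFS Trash Mechanism Works
- 1. Basic logic
- 2. Example
II. Configuration and Testing
- 1. Configuration
- 2. Trash commands
III. Other Issues
- 1. The API does not go through the trash
- 2. NameNode becomes unresponsive because of the Trash configuration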

I. How the HDFS Trash Mechanism Works

1. Basic logic

If trash configuration is enabled, files removed by FS Shell is not immediately removed from HDFS. Instead, HDFS moves it to a trash directory (each user has its own trash directory under /user/<username>/.Trash). The file can be restored quickly as long as it remains in trash.

Most recent deleted files are moved to the current trash directory (/user/<username>/.Trash/Current), and in a configurable interval, HDFS creates checkpoints (under /user/<username>/.Trash/) for files in current trash directory and deletes old checkpoints when they are expired. See expunge command of FS shell about checkpointing of trash.

After the expiry of its life in trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.

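In short: once trash is enabled in HDFS, a file deleted through the FS shell is moved to a trash directory instead of being removed. Each user has a separate trash directory, /user/<username>/.Trash, and a file can be restored at any time while it is still in the trash.

Deleted files land in /user/<username>/.Trash/Current. At the configured interval, HDFS creates a checkpoint (under /user/<username>/.Trash/) for the files in the current trash directory and deletes old checkpoints once they expire.

When a file's time in the trash runs out, the NameNode deletes it from the namespace and the blocks associated with it are freed. The increase in free disk space lags behind the deletion.

Reference: File Deletes and Undeletes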

2. Example

This will result in deleted files being moved to trash and retained in trash for 4 days (i.e. fs.trash.interval); every 12 hours (i.e. fs.trash.checkpoint.interval) the Trash is scanned and files older than 4 days will be deleted (“expunged”) from Trash.

For example, with fs.trash.interval set to 5760 minutes (4 days) and fs.trash.checkpoint.interval set to 720 minutes (12 hours), a deleted file is kept in the trash for 4 days; every 12 hours the trash is scanned and files that have expired (older than 4 days) are deleted.
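The timing in this example can be worked through with a small sketch. It uses a simplified model, which is an assumption and not the NameNode's actual code: checkpoint runs are taken to happen at exact multiples of the checkpoint interval, and a checkpoint is removed by the first run at which it is at least fs.trash.interval old. The helper names `minutes` and `expunge_time` are hypothetical.

```python
import math

def minutes(days=0, hours=0):
    """Convert days/hours to minutes, the unit used by the fs.trash.* settings."""
    return days * 24 * 60 + hours * 60

def expunge_time(deleted_at, trash_interval, checkpoint_interval):
    """Minute at which a file deleted at minute `deleted_at` leaves the trash.

    Simplified model (assumption): runs happen at multiples of the
    checkpoint interval, counted from minute 0.
    """
    if checkpoint_interval == 0:
        # A value of zero means "use fs.trash.interval".
        checkpoint_interval = trash_interval
    # The first run at or after the deletion rolls Current into a checkpoint.
    checkpointed_at = math.ceil(deleted_at / checkpoint_interval) * checkpoint_interval
    # The checkpoint is removed by the first run at which it is old enough.
    runs_needed = math.ceil(trash_interval / checkpoint_interval)
    return checkpointed_at + runs_needed * checkpoint_interval

interval = minutes(days=4)      # fs.trash.interval
checkpoint = minutes(hours=12)  # fs.trash.checkpoint.interval
print(interval, checkpoint)
print(expunge_time(100, interval, checkpoint))
```

Under this model a file stays in the trash for at least fs.trash.interval and for at most one extra checkpoint interval, because it first has to wait for the next checkpoint before its retention clock starts.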

II. Configuration and Testing

1. Configuration

Trash is disabled by default in HDFS. To enable it, add the following to Hadoop's configuration file core-site.xml:

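<property>
     <name>fs.trash.interval</name>
     <value>360</value>
     <description>Number of minutes after which a checkpoint gets deleted. If zero, the trash feature is disabled.
     This option may be configured on both the server and the client. If trash is disabled on the server side, the client-side configuration is checked.
     If trash is enabled on the server side, the value configured on the server is used and the client configuration value is ignored.
     </description>
</property>
 
<property>
     <name>fs.trash.checkpoint.interval</name>
     <value>0</value>
     <description>Number of minutes between trash checkpoints. Should be smaller than or equal to fs.trash.interval.
     If zero, the value is set to the value of fs.trash.interval. Every time the checkpointer runs,
     it creates a new checkpoint out of Current and removes checkpoints created more than fs.trash.interval minutes ago.
     </description>
</property>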
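As a rough illustration of how the two properties interact, the following sketch parses a core-site.xml fragment and resolves the effective values according to the descriptions above. The function name `effective_trash_settings` and the returned keys are hypothetical, the `<configuration>` wrapper is added only so the fragment parses as one document, and raising an error for an oversized checkpoint interval is a choice made for this sketch, not something the description prescribes.

```python
import xml.etree.ElementTree as ET

# Sample fragment with the same values as the configuration above.
CORE_SITE = """<configuration>
  <property>
    <name>fs.trash.interval</name>
    <value>360</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>0</value>
  </property>
</configuration>"""

def effective_trash_settings(xml_text):
    """Resolve the trash settings described by a core-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    interval = int(props.get("fs.trash.interval", "0"))
    checkpoint = int(props.get("fs.trash.checkpoint.interval", "0"))
    if checkpoint > interval:
        # Sketch-only policy: flag the misconfiguration loudly.
        raise ValueError("fs.trash.checkpoint.interval should not exceed fs.trash.interval")
    if checkpoint == 0:
        # Zero means "same as fs.trash.interval".
        checkpoint = interval
    return {
        "enabled": interval > 0,
        "trash_interval_min": interval,
        "checkpoint_interval_min": checkpoint,
    }

settings = effective_trash_settings(CORE_SITE)
print(settings)
```

With the values shown in the configuration, trash is enabled, files are retained for 360 minutes, and a checkpoint is taken every 360 minutes.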

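No restart is needed for the client-side setting: the FS shell reads core-site.xml on every invocation, so the commands can be run right away. Automatic expiry on the NameNode side only picks up a new value when the NameNode itself loads the configuration.

2. Trash commands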

# Delete a file (it is moved to the trash)
bin/hdfs dfs -rm /conf.tar.gz
2023-12-05 14:54:43,989 INFO fs.TrashPolicyDefault: Moved: 'hdfs://xxx/conf.tar.gz' to trash at: hdfs://xmanhdfs3/user/taiyi/.Trash/Current/conf.tar.gz

# List the file in the trash
bin/hdfs dfs -ls hdfs://xxx/user/taiyi/.Trash/Current/conf.tar.gz
-rw-r--r--   3 taiyi supergroup       7605 2023-12-05 14:54 hdfs://xxx/user/taiyi/.Trash/Current/conf.tar.gz


# Restore the file: simply move it back out of the trash
bin/hdfs dfs -mv  hdfs://xxx/user/taiyi/.Trash/Current/conf.tar.gz /

# Expunge the trash: delete checkpoints older than the retention period and create a new checkpoint
bin/hdfs dfs -expunge

# Delete directly, bypassing the trash
hdfs dfs -rm -r -skipTrash /user/root/123123
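Since the trash keeps a deleted file under its original absolute path inside the user's trash directory, the restore target can be derived from the trash path itself. The sketch below builds the corresponding `hdfs dfs -mv` command without running it. The function name `restore_command` and the regular expression are hypothetical helpers written for this illustration, and the parent directory of the target is assumed to still exist.

```python
import re

# Matches an optional hdfs://<nameservice> prefix, then
# /user/<user>/.Trash/<Current or checkpoint>/<original absolute path>.
_TRASH_RE = re.compile(
    r"^(?:hdfs://[^/]+)?/user/(?P<user>[^/]+)/\.Trash/(?P<checkpoint>[^/]+)(?P<orig>/.+)$"
)

def restore_command(trash_path):
    """Return the argv list that would move `trash_path` back to its original location."""
    match = _TRASH_RE.match(trash_path)
    if match is None:
        raise ValueError(f"not a path inside a trash directory: {trash_path}")
    return ["hdfs", "dfs", "-mv", trash_path, match.group("orig")]

cmd = restore_command("hdfs://xxx/user/taiyi/.Trash/Current/conf.tar.gz")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` on a machine with the Hadoop client installed; keeping it as a list avoids shell quoting problems with unusual file names.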

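III. Other Issues

1. The API does not go through the trash

A delete issued directly through the Hadoop delete API bypasses the trash by default. If snapshots have not been configured either, the block data belonging to the file starts being physically removed from the underlying file system right away, so a decision about recovery has to be made quickly.

Recovery requires stopping the data services (NameNode and DataNodes), so the value of the recovered data has to be weighed against the impact that the outage will have on production services.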
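One way to avoid this situation is for application code to move files into the trash itself instead of deleting them. The sketch below illustrates the idea against a hypothetical in-memory stand-in for a file system client; `InMemoryFS`, `trash_path`, and `safe_delete` are illustrative names and not part of any Hadoop library. In Java code, the Hadoop class `org.apache.hadoop.fs.Trash` provides `moveToAppropriateTrash` for the same purpose.

```python
import posixpath

class InMemoryFS:
    """Hypothetical stand-in for an HDFS client; maps path -> file content."""

    def __init__(self):
        self.files = {}

    def delete(self, path):
        # Mirrors a raw API delete: the data is gone immediately.
        del self.files[path]

    def rename(self, src, dst):
        self.files[dst] = self.files.pop(src)

def trash_path(user, path):
    """Trash location for `path`: the original absolute path under .Trash/Current."""
    return posixpath.join("/user", user, ".Trash", "Current", path.lstrip("/"))

def safe_delete(fs, user, path):
    """Move `path` into the user's trash instead of deleting it."""
    dst = trash_path(user, path)
    fs.rename(path, dst)
    return dst

fs = InMemoryFS()
fs.files["/conf.tar.gz"] = b"data"
moved_to = safe_delete(fs, "taiyi", "/conf.tar.gz")
print(moved_to)
```

The content survives under the trash path and can be renamed back, whereas calling `delete` on the same object would have discarded it with no way to recover.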

References:

Recovering data
 How to Effectively Recover Accidentally Deleted HDFS Files (如何有效恢复误删的HDFS文件)

2. NameNode becomes unresponsive because of the Trash configuration

Hadoop NameNode becomes un-responsive due to Trash configuration

Resolving The Problem
In order to prevent the NameNode having to perform an extreme amount of file to block map maintenance (which will also impact the DataNode(s)), the settings for fs.trash.interval and fs.trash.checkpoint.interval should be set so that the amount of data to be expunged at a single point of time is within the capability of the environment; a suggestion being under 10GB.

In other words, the amount of trash data the NameNode expunges in a single pass should stay below roughly 10 GB.
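A back-of-envelope check of this guideline can be sketched as follows. It assumes a steady deletion rate and that each pass expunges about one checkpoint's worth of data; both are simplifying assumptions, and the function names and the sample rate of 0.5 GB per hour are hypothetical.

```python
def expunged_per_pass_gb(delete_rate_gb_per_hour, trash_interval_min, checkpoint_interval_min):
    """Estimate the data removed in one pass: one checkpoint's worth of deletions."""
    if checkpoint_interval_min == 0:
        # Zero means the checkpoint interval equals fs.trash.interval.
        checkpoint_interval_min = trash_interval_min
    return delete_rate_gb_per_hour * checkpoint_interval_min / 60

def max_checkpoint_interval_min(delete_rate_gb_per_hour, limit_gb=10):
    """Largest checkpoint interval (minutes) that keeps one pass under `limit_gb`."""
    return int(limit_gb / delete_rate_gb_per_hour * 60)

rate = 0.5  # GB deleted per hour (sample value)
print(expunged_per_pass_gb(rate, 5760, 720))
print(expunged_per_pass_gb(rate, 5760, 0))
print(max_checkpoint_interval_min(rate))
```

With these sample numbers a 12-hour checkpoint interval stays under the 10 GB suggestion, while leaving fs.trash.checkpoint.interval at 0 with a 4-day retention does not, because four days of deletions would then expire in a single pass.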