HDFS Trash (garbage collection mechanism)
Principle: when a file or directory is deleted from the cluster, HDFS automatically moves it into the /user/zkpk/.Trash/Current directory on the cluster and keeps it there for a configurable period (set by the fs.trash.interval parameter in core-site.xml). Within that period the file can be restored; once the period expires, it can no longer be recovered.
A Hadoop cluster has the following four configuration files, each backed by a default file as shown below. When the cluster starts (or restarts), the *-default.xml files are loaded first and the *-site.xml files second; where the two conflict, the *-site.xml value takes precedence.
core-site.xml-->core-default.xml
hdfs-site.xml-->hdfs-default.xml
mapred-site.xml-->mapred-default.xml
yarn-site.xml-->yarn-default.xml
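The override order above can be sketched with a small, hypothetical lookup helper (this is not a Hadoop tool; the demo files and the values 0 and 1140 are illustration only): a key found in the site file wins, otherwise the default file's value applies.

```shell
# Simulate Hadoop's *-site.xml over *-default.xml precedence (illustrative only).
cat > demo-default.xml <<'EOF'
<property>
<name>fs.trash.interval</name>
<value>0</value>
</property>
EOF
cat > demo-site.xml <<'EOF'
<property>
<name>fs.trash.interval</name>
<value>1140</value>
</property>
EOF

# lookup <key> <site-file> <default-file>: site value if present, else default.
lookup() {
  v=$(grep -A1 "<name>$1</name>" "$2" 2>/dev/null | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')
  if [ -n "$v" ]; then
    echo "$v"
  else
    grep -A1 "<name>$1</name>" "$3" | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
  fi
}

lookup fs.trash.interval demo-site.xml demo-default.xml   # → 1140 (site value wins)
lookup fs.trash.interval /dev/null demo-default.xml       # → 0 (falls back to default)
```

On a live cluster you can query the effective value directly with `hdfs getconf -confKey fs.trash.interval` instead of reading the XML by hand.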
1. Search for the XML configuration files under hadoop-2.5.2
[zkpk@master ~]$ cd /home/zkpk/hadoop-2.5.2
[zkpk@master hadoop-2.5.2]$ find . |grep xml
Output (partial):
./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
2. Open core-default.xml
[zkpk@master hadoop-2.5.2]$ vim ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
Copy the following property:
<property>
<name>fs.trash.interval</name>
<value>1140</value>
<description>Number of minutes after which the checkpoint
gets deleted. If zero, the trash feature is disabled.
This option may be configured both on the server and the
client. If trash is disabled server side then the client
side configuration is checked. If trash is enabled on the
server side then the value configured on the server is
used and the client configuration value is ignored.
</description>
</property>
Paste it into core-site.xml.
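After the copy, core-site.xml might look like the sketch below. The fs.defaultFS entry and its hdfs://master:9000 address are assumptions about this cluster; keep whatever properties your site file already contains.

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- assumed existing setting; keep your cluster's actual value -->
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <!-- trash retention in minutes; 0 disables the trash feature -->
        <value>1140</value>
    </property>
</configuration>
```

Since trash is primarily a server-side setting, HDFS generally needs a restart (stop-dfs.sh, then start-dfs.sh) before the new interval takes effect.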
3. Test:
(1) Create a directory on the cluster
[zkpk@master ~]$ hdfs dfs -mkdir /input
(2) Load data into it, e.g. by uploading the sample file (the local sogou.10k used later in this walkthrough):
[zkpk@master ~]$ hdfs dfs -put sogou.10k /input
(3) Delete the directory
[zkpk@master ~]$ hdfs dfs -rm -r /input
After the delete, the data is not removed immediately; it is moved into the trash under the user's home directory on the cluster: /user/zkpk/.Trash/Current/input/sogou.10k
It is retained there for the number of minutes specified by fs.trash.interval in core-site.xml.
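Since the interval is expressed in minutes, a quick conversion shows how long the 1140-minute window used in this example actually is:

```shell
# fs.trash.interval is in minutes; convert the example value for a sanity check.
interval=1140
echo "$(( interval / 60 ))h $(( interval % 60 ))m"   # → 19h 0m
```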
[zkpk@master ~]$ hadoop fs -ls -R /user/zkpk
Output:
-rw-r--r-- 3 zkpk supergroup 1141839 2015-11-02 01:54 /user/zkpk/.Trash/Current/input/sogou.10k
4. Restore: move the directory out of the trash
[zkpk@master ~]$ hadoop fs -mv /user/zkpk/.Trash/Current/input /input
5. Verify the restore
[zkpk@master ~]$ hadoop fs -ls /input
Output:
-rw-r--r-- 3 zkpk supergroup 1141839 2015-11-02 01:54 /input/sogou.10k
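Two related commands are worth knowing when working with the trash (shown here for reference; they must be run against a live cluster): -skipTrash deletes permanently without going through .Trash, and -expunge force-empties the current user's trash ahead of the interval.

```shell
# Delete permanently, bypassing .Trash entirely (unrecoverable!)
hadoop fs -rm -r -skipTrash /input

# Force a trash checkpoint/cleanup for the current user
hadoop fs -expunge
```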
Related link: http://www.cnblogs.com/ggjucheng/archive/2012/04/18/2454683.html