1. Introduction
①: hdfs fsck /path
Checks the health of the files under /path.
②: hdfs fsck /path -files -blocks -locations
Prints the location of each block (-locations); must be used together with -files and -blocks.
③: hdfs fsck /path -list-corruptfileblocks
Lists the corrupt blocks and the files they belong to (-list-corruptfileblocks).
④: hdfs fsck /path -delete
Deletes the corrupt files (the files on HDFS itself).
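The first three checks above can be wrapped in a small helper. A minimal dry-run sketch: the function only prints the command lines so you can review them before running (the hdfs CLI and the sample path are assumptions about your environment, and -delete is deliberately left out because it destroys data):

```shell
# Dry-run sketch: print the fsck invocations for a given path.
fsck_cmds() {
  p="$1"
  echo "hdfs fsck $p"                             # overall health
  echo "hdfs fsck $p -files -blocks -locations"   # per-block locations
  echo "hdfs fsck $p -list-corruptfileblocks"     # corrupt blocks only
}

fsck_cmds /blocktest
```

Piping the printed lines to sh would execute the checks for real.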
2. Practice
① Create a directory on HDFS, upload a test file, and check its health
[hadoop@hadoop001 ~]$ hdfs dfs -mkdir /blocktest
[hadoop@hadoop001 ~]$ hdfs dfs -put testHdfsFile.txt /blocktest/
[hadoop@hadoop001 ~]$ hdfs dfs -ls /blocktest/
Found 1 items
-rw-r--r-- 3 hadoop hadoop 60 2019-08-21 14:23 /blocktest/testHdfsFile.txt
[hadoop@hadoop001 ~]$ hdfs fsck /blocktest/
Connecting to namenode via http://hadoop001:50070/fsck?ugi=hadoop&path=%2Fblocktest
FSCK started by hadoop (auth:SIMPLE) from /172.19.252.139 for path /blocktest at Wed Aug 21 14:25:05 CST 2019
.Status: HEALTHY
Total size: 60 B
Total dirs: 1
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 60 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Wed Aug 21 14:25:05 CST 2019 in 1 milliseconds
The filesystem under path '/blocktest' is HEALTHY
[hadoop@hadoop001 ~]$
② Find the block's locations, then delete one block replica and its block metadata file
[root@hadoop001 subdir0]# hdfs fsck /blocktest/testHdfsFile.txt -files -blocks -locations
-bash: hdfs: command not found
[root@hadoop001 subdir0]# su - hadoop
Last login: Wed Aug 21 14:12:44 CST 2019 on pts/0
[hadoop@hadoop001 ~]$ hdfs fsck /blocktest/testHdfsFile.txt -files -blocks -locations
Connecting to namenode via http://hadoop001:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&path=%2Fblocktest%2FtestHdfsFile.txt
FSCK started by hadoop (auth:SIMPLE) from /172.19.252.139 for path /blocktest/testHdfsFile.txt at Wed Aug 21 14:47:17 CST 2019
/blocktest/testHdfsFile.txt 60 bytes, 1 block(s): OK
0. BP-577895678-172.19.252.139-1566271200217:blk_1073741826_1002 len=60 Live_repl=3 [DatanodeInfoWithStorage[172.19.252.141:50010,DS-ffd3fa19-ddbb-4f5a-b487-d1ecb6a6d95b,DISK], DatanodeInfoWithStorage[172.19.252.140:50010,DS-ce5c4933-ca59-4955-bfcd-b1c6c0276f1f,DISK], DatanodeInfoWithStorage[172.19.252.139:50010,DS-afdf9c32-a7f5-4b9b-b9ff-32bf4ea876e2,DISK]]
Status: HEALTHY
Total size: 60 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 60 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Wed Aug 21 14:47:17 CST 2019 in 1 milliseconds
The filesystem under path '/blocktest/testHdfsFile.txt' is HEALTHY
[hadoop@hadoop001 ~]$ logout
[root@hadoop001 subdir0]# find / -name "*blk_1073741826_1002*"
/home/hadoop/data/dfs/data/current/BP-577895678-172.19.252.139-1566271200217/current/finalized/subdir0/subdir0/blk_1073741826_1002.meta
[root@hadoop001 subdir0]# cd /home/hadoop/data/dfs/data/current/BP-577895678-172.19.252.139-1566271200217/current/finalized/subdir0/subdir0/
[root@hadoop001 subdir0]# ll
total 20
-rw-rw-r-- 1 hadoop hadoop 4233 Aug 20 12:32 blk_1073741825
-rw-rw-r-- 1 hadoop hadoop 43 Aug 20 12:32 blk_1073741825_1001.meta
-rw-rw-r-- 1 hadoop hadoop 60 Aug 21 14:23 blk_1073741826
-rw-rw-r-- 1 hadoop hadoop 11 Aug 21 14:23 blk_1073741826_1002.meta
# delete the block file and its meta file
[root@hadoop001 subdir0]# rm -rf blk_1073741826*
[root@hadoop001 subdir0]# ll
total 12
-rw-rw-r-- 1 hadoop hadoop 4233 Aug 20 12:32 blk_1073741825
-rw-rw-r-- 1 hadoop hadoop 43 Aug 20 12:32 blk_1073741825_1001.meta
[root@hadoop001 subdir0]#
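Pulling the block file name out of the fsck -files -blocks -locations output can be scripted instead of read by eye. A minimal sketch, using the location line from the transcript above (shortened) as sample input; the field positions are an assumption about this fsck output format:

```shell
# Sample line copied (shortened) from the fsck output above.
line='0. BP-577895678-172.19.252.139-1566271200217:blk_1073741826_1002 len=60 Live_repl=3'

# Field 2 is "BP-...:blk_...": take the part after the colon,
# then strip the generation stamp (_1002) to get the block file name.
blk_gs=$(echo "$line" | awk '{print $2}' | awk -F: '{print $2}')
blk=${blk_gs%_*}
echo "$blk"    # blk_1073741826

# On a DataNode, the replica and its meta file can then be located with:
# find /home/hadoop/data/dfs/data -name "${blk}*"
```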
③ Restart HDFS so the corruption takes effect, then detect it with hdfs fsck /path
[hadoop@hadoop001 subdir0]$ hdfs fsck /
Connecting to namenode via http://hadoop001:50070
FSCK started by hadoop (auth:SIMPLE) from /127.0.0.1 for path / at Mon Apr 29 18:51:06 CST 2019
..
/blockrecover/testcorruptfiles.txt: CORRUPT blockpool BP-2041209051-127.0.0.1-1556350579057 block blk_1073741890
/blockrecover/testcorruptfiles.txt: MISSING 1 blocks of total size 51 B.............Status: CORRUPT
Total size: 654116 B
Total dirs: 12
Total files: 14
Total symlinks: 0
Total blocks (validated): 14 (avg. block size 46722 B)
********************************
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 51 B
CORRUPT BLOCKS: 1
********************************
Minimally replicated blocks: 13 (92.85714 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 0.9285714
Corrupt blocks: 1
Missing replicas: 0 (0.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Mon Apr 29 18:51:06 CST 2019 in 41 milliseconds
The filesystem under path '/' is CORRUPT
[hadoop@hadoop001 subdir0]$
Corrupt blocks: 1
One block is corrupt.
(This simulation ran on a pseudo-distributed setup. On a real cluster, restarting HDFS may already trigger automatic repair, so you might not see the corrupt block at all.)
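For monitoring, the corrupt-block count can be read out of the fsck summary rather than scanned by hand. A sketch using an abridged copy of the summary above as sample input; in real use you would feed it the output of hdfs fsck / (assumption: the "Corrupt blocks:" summary line keeps this format):

```shell
# Abridged sample of the fsck summary from the run above.
summary='Corrupt blocks: 1
The filesystem under path / is CORRUPT'

# Extract the number after "Corrupt blocks:".
corrupt=$(echo "$summary" | awk -F: '/Corrupt blocks/ {gsub(/ /, "", $2); print $2}')
if [ "$corrupt" -gt 0 ]; then
  echo "ALERT: $corrupt corrupt block(s) found"
fi
```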
3. Repair
① Manual repair with hdfs debug (recommended)
First delete the corrupt block by hand. Remember: delete the corrupt block file and its meta file on the DataNode's local disk, not the file on HDFS. Then repair the file with:
[hadoop@hadoop001 subdir0]$ hdfs debug recoverLease -path /blocktest/testHdfsFile.txt -retries 10
-retries: the number of retries
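When several files are corrupt, the -list-corruptfileblocks output can be fed into recoverLease one file at a time. A dry-run sketch: the leading "echo" only prints the commands, and the sample line plus its "blockname path" layout is an assumption about this fsck option's output format:

```shell
# Hypothetical sample line in the assumed "blockname path" layout.
corrupt_list='blk_1073741890 /blockrecover/testcorruptfiles.txt'

# Field 2 is the file path; drop the leading "echo" to run for real.
echo "$corrupt_list" | awk '{print $2}' | while read -r f; do
  echo hdfs debug recoverLease -path "$f" -retries 10
done
```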
② Manual repair, method two
First download the file from HDFS to local disk, then delete the corresponding file on HDFS, and finally re-upload it:
hdfs dfs -ls /xxxx
hdfs dfs -get /xxxx ./
hdfs dfs -rm /xxx
hdfs dfs -put xxx / # after the put, HDFS automatically replicates the file to 3 copies
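The get / rm / put sequence above can be collected into a script. A minimal dry-run sketch: RUN=echo makes it only print the commands; setting RUN to empty would execute them (assumptions: the hdfs CLI is on PATH and the path below is your own file):

```shell
# Re-upload repair: get the file, remove it from HDFS, put it back.
RUN=echo                            # set RUN= (empty) to execute for real
SRC=/blocktest/testHdfsFile.txt     # example path, replace with yours
LOCAL=./$(basename "$SRC")

$RUN hdfs dfs -get "$SRC" "$LOCAL"
$RUN hdfs dfs -rm "$SRC"
$RUN hdfs dfs -put "$LOCAL" "$SRC"  # re-replicated to 3 copies by default
```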
③ Automatic repair
A corrupt block is not detected until the DataNode runs its directory scan;
the directory scan runs every 6 hours by default:
dfs.datanode.directoryscan.interval : 21600
The block is not recovered until the DataNode sends its block report to the NameNode;
the block report is also sent every 6 hours by default:
dfs.blockreport.intervalMsec : 21600000
Only after the NameNode receives the block report does it start the recovery.
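On a test cluster, both intervals can be lowered in hdfs-site.xml so the automatic path reacts faster. A sketch showing the two properties; the values are the 6-hour defaults quoted above:

```xml
<!-- hdfs-site.xml: defaults shown; lower them only for testing. -->
<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <value>21600</value>      <!-- seconds: DataNode directory scan interval -->
</property>
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value>   <!-- milliseconds: block report interval -->
</property>
```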
4. Summary
①: Understand the relationship between an HDFS file and its blocks (by default, each block has 3 replicas).
②: In production I generally prefer the manual repair approach, but only after deleting the corrupt block by hand.
Remember: delete the corrupt block file and its meta file, not the file on HDFS.
Alternatively, you can get the file to local disk first, delete it from HDFS, then re-upload it.
Never casually run hdfs fsck / -delete: it deletes the corrupt files outright, and that data is simply gone. Do it only if losing the data is acceptable, or you are confident you can backfill it into HDFS from elsewhere.