Cluster restarted without stopping Flume collection
Error log
Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-1528014083-172.17.2.50-1661927643625:blk_1073741895_1071; getBlockSize()=21485; corrupt=false; offset=134217728; locs=[DatanodeInfoWithStorage[172.17.2.54:9866,DS-ed1a9a41-4914-414d-9573-90deac8ab373,DISK], DatanodeInfoWithStorage[172.17.2.52:9866,DS-92b73c05-93dc-40c6-87bf-f530909b03e3,DISK]]}
Resolution steps
- Write all failed files (files still open for write) into a text file
hdfs fsck / -openforwrite >> out.txt
- Filter the failed files by table name
1. grep 'table_name' out.txt >> out1.txt
2. cat out1.txt | awk -F ":" '{print $1}' > out_sys1.txt
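The two filtering commands can also be combined into one helper. This is only a sketch: the function name `extract_open_paths` is made up for illustration, and it assumes each open file appears in the fsck output as a line that starts with the path, followed by a space, and contains `OPENFORWRITE`. It also assumes paths contain no spaces.

```shell
# Hypothetical helper: read fsck output on stdin and print the HDFS path
# of every OPENFORWRITE line that mentions the given table name.
# Assumption: the path is the first whitespace-separated field of the line.
extract_open_paths() {
  table="$1"
  grep 'OPENFORWRITE' | grep -F -- "$table" | awk '{print $1}'
}
```

Possible usage, with `table_name` as a placeholder: `hdfs fsck / -openforwrite | extract_open_paths 'table_name' > out_sys1.txt`. Check the first few lines of the result against the real paths before running the repair.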
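- Write a script to run the repair
# Try lease recovery on each listed file, up to 3 retries per file.
# Only the first whitespace-separated field of each line is used as the path.
while read -r path _
do
  hdfs debug recoverLease -path "${path}" -retries 3
done < /opt/module/flume/out_sys1.txt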
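4. Notes
If there are many damaged files, step 3 can be run under nohup so that it keeps running after the terminal session ends;
Remember: always stop Flume before restarting the cluster, otherwise the files Flume is writing are left open for write and later reads fail with the error above.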
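For the nohup case, the repair loop could be wrapped in a function and saved as a script. This is a sketch under assumptions: the function name `recover_leases` and the file names `recover_lease.sh` and `recover_lease.log` are made up for illustration, and the list file is expected to hold one path per line.

```shell
# Hypothetical wrapper: run lease recovery for every path in a list file.
recover_leases() {
  list_file="$1"
  while read -r path _
  do
    # Skip blank lines
    [ -n "$path" ] || continue
    hdfs debug recoverLease -path "$path" -retries 3
  done < "$list_file"
}
```

If this is saved as `recover_lease.sh` with a final line `recover_leases /opt/module/flume/out_sys1.txt`, it could be started in the background with `nohup sh recover_lease.sh > recover_lease.log 2>&1 &`. Afterwards, running `hdfs fsck / -openforwrite` again shows whether any files are still open.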