Shell script to delete HDFS data older than n days

Antgeek

Last modified 2022-03-20 18:30:14

Categories: Hadoop, hive, Everyday Errors
Tags: hdfs, hadoop, big data, shell

First published 2022-03-17 16:52:00

Article link: https://blog.csdn.net/weixin_44745147/article/details/123554636

Problem description

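At work today I suddenly noticed that the cluster had only 50 GB of memory left.
The management UI showed that 9 machines had gone down,
reporting the error "log-dir are bad".
My guess was that the disks had filled up,
and sure enough, storage on all 9 machines was above 90%.
A quick Baidu search showed that once the cluster's storage usage passes 90%, it becomes unavailable.
The likely cause is the daily full-data import with sqoop.
So the fix is to delete old data, but deleting it by hand every day would be far too tedious.
Let's write a script to handle it.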
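
To see which disks are close to that 90% mark, a small check like the one below may help. It is only a sketch and not part of the original script: the function name `over_threshold` is made up for illustration, and it assumes the POSIX `df -P` column layout and mount points without spaces. In YARN, the 90% figure usually comes from the NodeManager disk health checker setting `yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage`, which defaults to 90.

```shell
#!/bin/bash
# Hypothetical helper: print "mount usage%" for every mount at or above a threshold.
# It reads `df -P` formatted text on stdin, so it can be tried on saved output too.
over_threshold() {
    local threshold=$1
    awk -v t="$threshold" '
        NR > 1 {                      # skip the df header line
            use = $5                  # "Capacity" column, e.g. 95%
            sub(/%/, "", use)
            if (use + 0 >= t) print $6 " " use "%"
        }'
}

# Typical use on a worker node:
# df -P | over_threshold 90
```

Running it on each affected node shows whether the data disks or the log disks are the ones that filled up.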

Solution

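#!/bin/bash
# Cutoff date: partitions dated 8 days ago or earlier count as expired
cutoff=$(date -d "8 days ago" +%Y-%m-%d)
cutoff_num=$(date -d "$cutoff" +%s)
# Quote the URL so the glob is expanded by HDFS, not by the local shell
hdfs dfs -ls 'hdfs://xxxxxxx/user/hive/warehouse/ods.db/*/' | grep 'dt=' | awk '{print $NF}' | while read -r file
do
    # Cut the date out of the path (everything after the first "=")
    file_dt=${file#*=}
    #echo "$file_dt"
    # Convert the date to epoch seconds; skip values that are not dates
    file_dt_num=$(date -d "$file_dt" +%s 2>/dev/null) || continue
    #echo "$file_dt_num"
    # Data older than 7 days (dated on or before the cutoff) is to be deleted
    if [ "$file_dt_num" -le "$cutoff_num" ]; then
        # Dry run: only reports the path. To really delete it: hdfs dfs -rm -r "$file"
        echo "$file would be deleted"
    fi
done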

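Remember to also add a command that empties the trash, otherwise the deleted data still takes up space.
While testing, do not empty the trash, otherwise anything deleted by mistake cannot be recovered.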
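
One way to follow that advice is to route every destructive command through a dry-run switch. The sketch below is an illustration under assumptions, not the author's script: the names `run`, `delete_partition`, `empty_trash` and the `DRY_RUN` variable are hypothetical. `hdfs dfs -rm -r` and `hdfs dfs -expunge` are standard HDFS shell commands. Depending on the Hadoop version and the `fs.trash.interval` setting, `-expunge` may only remove trash checkpoints older than the retention interval, so space may not be freed immediately; `hdfs dfs -rm -r -skipTrash` bypasses the trash entirely and is correspondingly riskier.

```shell
#!/bin/bash
# DRY_RUN=1 (the default) only prints commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY-RUN: $*"
    else
        "$@"
    fi
}

# Move one partition directory to the trash.
delete_partition() {
    run hdfs dfs -rm -r "$1"
}

# Empty the trash; call this only after the deletions have been verified.
empty_trash() {
    run hdfs dfs -expunge
}
```

With `DRY_RUN=1`, calling `delete_partition` and then `empty_trash` just lists what would happen, which makes it safe to review the output before switching to `DRY_RUN=0`.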

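Optimized version (collects the distinct expired dates first, then issues one delete per date across all tables):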

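#!/bin/bash
cutoff=$(date -d "8 days ago" +%Y-%m-%d)
cutoff_num=$(date -d "$cutoff" +%s)
# Start from an empty temp file so a leftover from an earlier run is not reused
: > file_dt_temp.txt
hdfs dfs -ls 'hdfs://xxxxxx/user/hive/warehouse/ods.db/*/' | grep 'dt=' | awk '{print $NF}' | while read -r file
do
    # Cut the date out of the path (everything after the first "=")
    file_dt=${file#*=}
    #echo "$file_dt"
    # Convert the date to epoch seconds; skip values that are not dates
    file_dt_num=$(date -d "$file_dt" +%s 2>/dev/null) || continue
    #echo "$file_dt_num"
    # Data older than 7 days (dated on or before the cutoff) is to be deleted
    if [ "$file_dt_num" -le "$cutoff_num" ]; then
        echo "$file"
        echo "$file_dt" >> file_dt_temp.txt
    fi
done
# One delete per distinct date; the glob covers every table
sort -u file_dt_temp.txt | while read -r file_dt
do
    # Dry run: prints the command. -rmr is deprecated, so -rm -r is used instead.
    echo "hadoop fs -rm -r 'hdfs://xxxxxx/user/hive/warehouse/ods.db/*/dt=${file_dt}'"
done
rm -f file_dt_temp.txt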
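
The temp file in the script above can be avoided by piping the loop straight into `sort -u`. The following is a minimal sketch of that idea, assuming GNU `date` and partition directories named `dt=YYYY-MM-DD`; the function name `expired_dates` is hypothetical. It reads paths on stdin, so the filtering logic can be checked without touching a cluster.

```shell
#!/bin/bash
# Hypothetical helper: read paths on stdin and print the distinct dt values
# that fall on or before the cutoff date given as $1 (YYYY-MM-DD).
expired_dates() {
    local cutoff_num file dt dt_num
    cutoff_num=$(date -d "$1" +%s) || return 1
    while read -r file; do
        # Ignore entries that are not dt= partitions
        case $file in *dt=*) ;; *) continue ;; esac
        dt=${file##*dt=}     # keep what follows the last "dt="
        dt=${dt%%/*}         # drop any deeper path components
        # Skip values that are not dates, e.g. __HIVE_DEFAULT_PARTITION__
        dt_num=$(date -d "$dt" +%s 2>/dev/null) || continue
        if [ "$dt_num" -le "$cutoff_num" ]; then
            echo "$dt"
        fi
    done | sort -u
}

# Typical use, feeding it the same listing as the script above:
# hdfs dfs -ls '<warehouse path>/*/' | awk '{print $NF}' | expired_dates "$(date -d '8 days ago' +%Y-%m-%d)"
```

Each date it prints can then be handed to a single delete command, as in the second loop of the optimized script.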