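Preface:
My company needed to measure how much disk space the Hive tables in the treasury database occupy, so I wrote a small script, shown below.
First, dump the HDFS size of each table in the Hive warehouse to a file: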
hadoop fs -du -h /user/hive/warehouse/treasury.db > treasury.txt
Script:
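#!/bin/bash
# Print tables larger than the given size in GB, plus any line whose size is in TB.
# [[ ]] and =~ are bash features, hence the bash shebang.
size=$1                                     # threshold in GB
while read -r line
do
    # $line is left unquoted on purpose so that echo collapses repeated spaces
    num=$(echo $line | cut -d " " -f 1)     # number in the first size column
    unit=$(echo $line | cut -d " " -f 2)    # unit of the first size column
    num1=$(echo $num | cut -d "." -f 1)     # integer part, because -gt only compares integers
    #echo $num1
    if [[ $unit == "G" && $num1 -gt $size || $line =~ " T" ]] ; then
        echo $line
    fi
done < treasury.txt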
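The same filter can also be written as a single awk pass, which avoids spawning echo and cut for every line. This is only a minimal sketch: filter_big_tables is a hypothetical helper name, and it assumes input lines look like the -h output shown in this post (number, unit, number, unit, path). Unlike the loop above it compares the full decimal value, so 100.5 G passes a threshold of 100.

```shell
#!/bin/bash
# filter_big_tables: hypothetical helper, not part of Hadoop.
# $1    = threshold in GB
# stdin = lines in the "115.2 G 345.5 G /path" format
filter_big_tables() {
    awk -v size="$1" '
        # keep the line if the first size column exceeds the threshold in G,
        # or if either size column is expressed in T
        ($2 == "G" && $1 + 0 > size + 0) || $2 == "T" || $4 == "T"
    '
}
```

Usage, assuming treasury.txt was produced by the hadoop fs -du -h command above: filter_big_tables 100 < treasury.txt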
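Invocation:
bash filter2.sh 100 (the script uses [[ ]] and =~, so run it with bash rather than a plain POSIX sh); this prints the tables larger than 100 GB, plus any table whose size is reported in TB:
Result: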
115.2 G 345.5 G /user/hive/warehouse/treasury.db/dm_user_semester
1.5 T 4.4 T /user/hive/warehouse/treasury.db/ods_user_day
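An alternative that avoids parsing units altogether is to drop -h, so that hadoop fs -du reports sizes in bytes, and compare numerically. The following is a sketch under stated assumptions: tables_over_gb is a hypothetical helper name, the first field of each line is taken to be the size in bytes and the last field the path, paths are assumed to contain no spaces, and 1 G is taken to be 1024^3 bytes.

```shell
#!/bin/bash
# tables_over_gb: hypothetical helper, not part of Hadoop.
# $1    = threshold in GB (assumed to mean 1024^3 bytes)
# stdin = raw `hadoop fs -du` lines without -h: "<bytes> ... <path>"
tables_over_gb() {
    awk -v gb="$1" '
        BEGIN { unit = 1024 * 1024 * 1024 }
        # first field: size in bytes; last field: path (assumes no spaces in paths)
        $1 + 0 > gb * unit { printf "%.1f G\t%s\n", $1 / unit, $NF }
    '
}
```

Usage: hadoop fs -du /user/hive/warehouse/treasury.db | tables_over_gb 100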