Shell script to delete user-specified multi-level partition data from a Hive external table (removing the data files on HDFS)

herry_liang

Article link: https://blog.csdn.net/hetry_liang/article/details/115408932

1. First, create a multi-level partitioned external table, tmp.clear_external_mulpartitiondata_test2, with the partition columns year, month, and day, and load data into it.

-- Create the source table tmp.cleardata_test2 and insert rows manually
drop table if exists tmp.cleardata_test2;
create table if not exists tmp.cleardata_test2(
    stuid   string,
    stuname string,
    age     int,
    `year`  string,
    `month` string,
    `day`   string
)
row format delimited fields terminated by ',';

insert overwrite table tmp.cleardata_test2 values
('1','zhangsan',14,'2007','4','25'),
('2','lisi',12,'2009','5','15'),
('3','wangwu',14,'2007','8','10'),
('4','zhaoliu',15,'2006','10','9'),
('5','sunqi',12,'2009','5','12'),
('6','liuba',14,'2007','2','22'),
('7','yiyi',13,'2008','8','9'),
('8','ersha',15,'2006','11','13'),
('9','sande',11,'2010','3','30'),
('10','sige',14,'2007','4','16'),
('11','zhaoqian',14,'2007','6','7'),
('12','zhengwu',13,'2008','1','13'),
('13','zhouwang',13,'2008','12','1'),
('14','shangguan',15,'2006','9','10'),
('15','ruqian',14,'2007','5','5'),
('16','lilin',12,'2009','4','1'),
('17','wangjing',14,'2007','8','12'),
('18','zhaosi',15,'2006','7','9'),
('19','tangkai',14,'2007','8','10'),
('20','wangliang',14,'2007','8','10');

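-- Create the target table tmp.clear_external_mulpartitiondata_test2 and populate it with insert ... select
drop table if exists tmp.clear_external_mulpartitiondata_test2;
create external table if not exists tmp.clear_external_mulpartitiondata_test2(
    stuid   string,
    stuname string,
    age     int
) partitioned by(`year` string,`month` string,`day` string)
row format delimited fields terminated by ',';
-- All three partition columns are filled dynamically, which needs nonstrict mode
-- on Hive versions where hive.exec.dynamic.partition.mode defaults to strict
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tmp.clear_external_mulpartitiondata_test2 partition (`year`,`month`,`day`)
select stuid, stuname, age, `year`, `month`, `day`
from tmp.cleardata_test2;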

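2. View the table data
[Screenshot: contents of the table]
3. Write a shell script that takes the partition values as arguments and deletes that partition's data by removing the corresponding directory on HDFS.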

#!/bin/bash
# Script name: partition_test.sh

# Require at least one argument: the table name in database.table form
if [ $# -lt 1 ]; then
    echo "Please provide at least the database.table name" >&2
    exit 1
fi

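# Read the table's HDFS location from the "Location:" line of "desc formatted"
info=$(hive -e "desc formatted $1" | awk '$1 == "Location:" {print $2; exit}')
if [ -z "$info" ]; then
    echo "Could not determine the HDFS location of $1" >&2
    exit 1
fi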

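# Argument list: DB.TABLE, first-level partition value, second-level partition value, ...
ps=("$@")
# Partition column names, in partition order
pp=("year" "month" "day")
pi=0
# Drop DB.TABLE from the argument array
unset 'ps[0]'
# Append one path segment per partition level, e.g. /year=2007/month=5/day=5
for p in "${ps[@]}"; do
    info="$info/${pp[pi]}=$p"
    ((pi++))
done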

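# "hdfs dfs -rm -r -skipTrash <path>" deletes permanently; use it with caution
# Delete the directory (it is moved to the trash when HDFS trash is enabled)
hdfs dfs -rm -r "$info"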
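
The path-building step in the script can be made a little more defensive. The following is only a sketch, assuming the same three partition columns; `build_partition_path` is a hypothetical helper name, not part of the original script. It rejects more values than there are partition columns, and values that are empty or contain `/` or `..`, before anything is passed to `hdfs dfs -rm`.

```shell
#!/bin/bash
# Sketch only: build_partition_path is a hypothetical helper, not part of
# the original script. It prints the partition directory for the given
# table location and partition values, or fails on suspicious input.
build_partition_path() {
    local base="$1"; shift
    local cols=("year" "month" "day")   # partition columns, in table order
    if [ -z "$base" ]; then
        echo "empty table location" >&2
        return 1
    fi
    if [ "$#" -gt "${#cols[@]}" ]; then
        echo "too many partition values (max ${#cols[@]})" >&2
        return 1
    fi
    local dir="${base%/}"   # strip one trailing slash, if present
    local i=0 value
    for value in "$@"; do
        # Reject empty values and anything that could leave the table directory
        case "$value" in
            ""|*/*|*..*)
                echo "invalid partition value: '$value'" >&2
                return 1
                ;;
        esac
        dir="$dir/${cols[i]}=$value"
        i=$((i + 1))
    done
    printf '%s\n' "$dir"
}
```

Calling `build_partition_path "$info" 2007 5 5` prints the table location followed by `/year=2007/month=5/day=5`. Note that with no partition values it prints the table location itself, so a caller that wants to protect the whole table should also require at least one value.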

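4. Run the script on Linux: first grant it execute permission, then run it with the required arguments.
[Screenshot: granting execute permission and running the script]
5. In the run shown in the screenshot, the script was made executable and then invoked with the following arguments: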

database.table=tmp.clear_external_mulpartitiondata_test2
year=2007
month=5
day=5
The script then deletes the matching path on HDFS, so Hive queries no longer return the rows of that partition.
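
Removing the directory on HDFS does not by itself remove the partition entry from the Hive metastore, so `show partitions` can still list the deleted partition. A minimal sketch of one way to clean that up, assuming the same partition columns: the hypothetical helper `build_drop_partition_sql` (not part of the original script) turns the same arguments into an `ALTER TABLE ... DROP IF EXISTS PARTITION` statement.

```shell
#!/bin/bash
# Sketch only: build_drop_partition_sql is a hypothetical helper.
# It prints a HiveQL statement that drops the partition metadata for the
# given table and partition values (year, then month, then day).
build_drop_partition_sql() {
    local table="$1"; shift
    local cols=("year" "month" "day")   # partition columns, in table order
    local spec="" i=0 value
    for value in "$@"; do
        if [ "$i" -ge "${#cols[@]}" ]; then
            echo "too many partition values" >&2
            return 1
        fi
        # Join the column='value' pairs with ", "
        spec="${spec:+$spec, }\`${cols[i]}\`='$value'"
        i=$((i + 1))
    done
    if [ -z "$spec" ]; then
        echo "no partition values given" >&2
        return 1
    fi
    printf 'ALTER TABLE %s DROP IF EXISTS PARTITION (%s);\n' "$table" "$spec"
}
```

The statement could then be run with, for example, `hive -e "$(build_drop_partition_sql tmp.clear_external_mulpartitiondata_test2 2007 5 5)"`. For an external table this normally drops only the metadata, which is why the HDFS directory is still removed separately.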

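6. Verify that the data is gone from both Hive and HDFS
[Screenshot: verification in Hive and on HDFS]

The HDFS path for 2007-05-05 no longer exists, which confirms that the partition's data has been deleted.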
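
The HDFS side of this check can also be scripted. The sketch below uses `hdfs dfs -test -e`, which exits with status 0 when the path exists; `check_deleted` is a hypothetical helper name. Keep in mind that a non-zero status can also come from other failures, such as an unreachable NameNode, so a "deleted" message is only as reliable as the cluster connection.

```shell
#!/bin/bash
# Sketch only: check_deleted is a hypothetical helper.
# It reports whether the given HDFS path still exists.
check_deleted() {
    local target="$1"
    # "hdfs dfs -test -e" returns 0 if the path exists
    if hdfs dfs -test -e "$target"; then
        echo "still exists: $target"
        return 1
    fi
    echo "deleted: $target"
}
```

It would be called with the same partition path that the deletion script removed, that is, the table location followed by `/year=2007/month=5/day=5`.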
