We recently hit an error in production: Spark failed to write data into Hive, reporting the following error
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid partition for table orc_report_behavior
at org.apache.hadoop.hive.ql.metadata.Partition.initialize(Partition.java:208)
at org.apache.hadoop.hive.ql.metadata.Partition.<init>(Partition.java:106)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllPartitionsOf(Hive.java:2103)
... 194 more
Caused by: MetaException(message:Invalid partition key & values; keys [day, ], values [])
at org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:550)
at org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:483)
at org.apache.hadoop.hive.ql.metadata.Partition.initialize(Partition.java:192)
... 196 more
My first suspicion was a problem with the Hive process itself, so I queried the table directly through Hive, but it failed with the same error. Inspecting the data source revealed garbled characters in the partition values. After some searching online I found a solution, which I am recording here.
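Before dropping anything, it helps to list the table's partitions and confirm which entry carries the garbled value. This is standard HiveQL; bigdata and orc_report_behavior are the database and table from this incident:

hive -e "use bigdata; show partitions orc_report_behavior;"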
Solution
1. Drop all table partitions
#!/bin/bash
source /etc/profile

# Expect a start date and an end date in YYYYMMDD format.
if [ $# -ne 2 ]; then
    echo "Two arguments are required, e.g.: 20180801 20180811"
    exit 1
fi

startDate=$1
endDate=$2

# YYYYMMDD strings compare correctly as integers.
if [ "$startDate" -gt "$endDate" ]; then
    echo "The start date must not be later than the end date"
    exit 1
fi

# Convert both dates to epoch seconds and step through the range one day (86400 s) at a time.
startSec=$(date -d "$startDate" "+%s")
endSec=$(date -d "$endDate" "+%s")

for ((i = startSec; i <= endSec; i += 86400)); do
    day=$(date -d "@$i" "+%Y%m%d")
    hive -e "use bigdata; alter table orc_report_behavior drop partition(day='$day')"
done
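The script is then invoked with the date range as arguments; the file name drop_partitions.sh below is just an illustrative choice:

sh drop_partitions.sh 20180801 20180811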
2. Delete the metadata from MySQL
Log in to the MySQL instance that stores the Hive metastore metadata:
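For example (the host, user, and schema name here are placeholders; substitute your own metastore connection settings):

mysql -h metastore-host -u hive -p hive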
SELECT * FROM TBLS WHERE TBL_NAME='orc_report_behavior';
Note the TBL_ID, then use it to look up the partition records:
select * from PARTITIONS where tbl_id='6552';
Examine the records for one of the partitions:
select * from PARTITION_KEY_VALS where part_id=11426;
select * from PARTITION_PARAMS where part_id=11426;
Delete them:
delete from PARTITION_KEY_VALS where part_id=11426;
delete from PARTITION_PARAMS where part_id=11426;
delete from PARTITIONS where tbl_id='6552' and part_id=11426;
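If many partitions are corrupted, deleting them one PART_ID at a time gets tedious. A MySQL multi-table DELETE can remove every partition row for the table in a single statement. This is only a sketch: back up the metastore first, and note that some Hive versions have additional tables (e.g. PART_COL_STATS) referencing PART_ID; 6552 is the TBL_ID found above.

-- Remove all partition metadata for TBL_ID 6552 in one pass (sketch; back up first).
DELETE p, pkv, pp
FROM PARTITIONS p
LEFT JOIN PARTITION_KEY_VALS pkv ON pkv.PART_ID = p.PART_ID
LEFT JOIN PARTITION_PARAMS pp ON pp.PART_ID = p.PART_ID
WHERE p.TBL_ID = 6552;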
3. Recover the data
MSCK REPAIR TABLE scans the table's directory on HDFS and re-registers any partitions that are missing from the metastore:
MSCK REPAIR TABLE orc_report_behavior;
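As a final sanity check, querying a known partition should return data again; the day value here is just an example from the range used above:

hive -e "use bigdata; select count(*) from orc_report_behavior where day='20180801';"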