1. Repairing a partition whose data was deleted while its metadata remains
1.1. Delete one partition's data
hadoop dfs -rmr /user/hive/warehouse/xxxxxx_uid_online/pt_day=2017-12-28
1.2. Run the partition repair command:
hive> MSCK REPAIR TABLE xxxxxx_uid_online;
OK
Partitions missing from filesystem: xxxxxx_uid_online:pt_day=2017-12-28
Time taken: 0.515 seconds, Fetched: 1 row(s)
1.3. Copy the original data back:
hadoop distcp hdfs://10.28.28.177:9000/user/hive/warehouse/xxxxxx_uid_online/2017-12-28 hdfs://emr-cluster/user/hive/warehouse/xxxxxx_uid_online/pt_day=2017-12-28
1.4. Run the partition repair command again:
hive> MSCK REPAIR TABLE xxxxxx_uid_online;
OK
Time taken: 0.106 seconds
1.5. Takeaway: when partition data is missing, this command only reports the partitions as missing from the filesystem; it does not drop them from the metastore. When partitions and metadata match, it reports nothing.
2. Partition repair for an HDFS external table
2.1. Create the external table
drop table if exists xxxxxx_uid_online_external;
CREATE external TABLE `xxxxxx_uid_online_external`(
`datehour` string,
`halfhourtype` string,
`uid` string,
`roomid` string,
`roomcreatoruid` string,
`staytime` string)
PARTITIONED BY (
`pt_day` string)
location '/user/hive/warehouse/xxxxxx_uid_online';
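Because the `location` above already contains `pt_day=<date>` directories, `MSCK REPAIR TABLE` can discover partitions by scanning that `key=value` directory layout. A minimal sketch of that discovery step in Python (a local temp directory stands in for the HDFS warehouse path; `discover_partitions` is a hypothetical helper for illustration, not part of Hive):

```python
import tempfile
from pathlib import Path

def discover_partitions(table_location, key="pt_day"):
    """List the key=value partition directories under a table location --
    the same directory layout MSCK REPAIR TABLE scans (illustrative sketch)."""
    return sorted(
        p.name.split("=", 1)[1]
        for p in Path(table_location).iterdir()
        if p.is_dir() and p.name.startswith(key + "=")
    )

# Hypothetical local directory standing in for the HDFS table location.
root = Path(tempfile.mkdtemp())
for day in ("2017-12-28", "2017-12-29"):
    (root / f"pt_day={day}").mkdir()

parts = discover_partitions(root)
print(parts)  # ['2017-12-28', '2017-12-29']
```

Each discovered directory that has no matching metastore entry is what the repair output below labels "Partitions not in metastore".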
2.2. View the partitions:
hive> show partitions xxxxxx_uid_online_external;
OK
Time taken: 0.101 seconds
2.3. Repair the partitions:
hive> MSCK REPAIR TABLE xxxxxx_uid_online_external;
OK
Partitions not in metastore: xxxxxx_uid_online_external:pt_day=2017-12-28 xxxxxx_uid_online_external:pt_day=2017-12-29 xxxxxx_uid_online_external:pt_day=2017-12-30 xxxxxx_uid_online_external:pt_day=2017-12-31 xxxxxx_uid_online_external:pt_day=2018-01-01
Repair: Added partition to metastore xxxxxx_uid_online_external:pt_day=2017-12-28
Repair: Added partition to metastore xxxxxx_uid_online_external:pt_day=2017-12-29
Repair: Added partition to metastore xxxxxx_uid_online_external:pt_day=2017-12-30
Repair: Added partition to metastore xxxxxx_uid_online_external:pt_day=2017-12-31
Repair: Added partition to metastore xxxxxx_uid_online_external:pt_day=2018-01-01
Time taken: 0.234 seconds, Fetched: 6 row(s)
2.4. View the partitions again:
hive> show partitions xxxxxx_uid_online_external;
OK
pt_day=2017-12-28
pt_day=2017-12-29
pt_day=2017-12-30
pt_day=2017-12-31
pt_day=2018-01-01
Time taken: 0.045 seconds, Fetched: 5 row(s)
2.5. Inspect the data:
hive> select * from xxxxxx_uid_online_external limit 10;
OK
2017-12-28 00 0~29 -1 1039079 317125 2 2017-12-28
2017-12-28 00 0~29 -1 1041389 17320298 4 2017-12-28
2017-12-28 00 0~29 -1 23405439 1154360798 5 2017-12-28
2017-12-28 00 0~29 -1 23886701 1155886975 5 2017-12-28
2017-12-28 00 0~29 -1 24682061 1160460359 1 2017-12-28
2017-12-28 00 0~29 -1 45191246 1256186906 1 2017-12-28
2017-12-28 00 0~29 -1 65532546 1186844167 33 2017-12-28
2017-12-28 00 0~29 -1 65898988 1331472417 198 2017-12-28
2017-12-28 00 0~29 1001388813 1604049 33486478 1 2017-12-28
2017-12-28 00 0~29 1003639852 1340051 37107892 4 2017-12-28
Time taken: 0.148 seconds, Fetched: 10 row(s)
3. Notes
For partitioned tables, this command only adds metastore entries for partition directories that already exist on the filesystem; it does not remove metastore entries for partitions whose data has been lost.
The command also works for HDFS external tables; it has not been tested against external tables backed by OSS or the local filesystem.
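The one-way behavior described above can be modeled as a set difference: under the assumption that `MSCK REPAIR TABLE` only ever adds metastore entries and merely reports the rest, a sketch looks like this (`msck_repair` is an illustrative model, not Hive's actual implementation):

```python
def msck_repair(fs_partitions, metastore_partitions):
    """Model MSCK REPAIR TABLE's one-way sync (illustrative sketch)."""
    # Partitions found on the filesystem but absent from the metastore are added.
    missing_in_metastore = sorted(set(fs_partitions) - set(metastore_partitions))
    # Partitions whose data is gone are only *reported*, never dropped.
    missing_on_fs = sorted(set(metastore_partitions) - set(fs_partitions))
    repaired = sorted(set(metastore_partitions) | set(missing_in_metastore))
    return repaired, missing_on_fs

repaired, reported = msck_repair(
    fs_partitions=["pt_day=2017-12-29", "pt_day=2017-12-30"],
    metastore_partitions=["pt_day=2017-12-28", "pt_day=2017-12-29"],
)
print(repaired)  # ['pt_day=2017-12-28', 'pt_day=2017-12-29', 'pt_day=2017-12-30']
print(reported)  # ['pt_day=2017-12-28']
```

Note how `pt_day=2017-12-28` stays in the repaired metastore list even though its data is missing, matching the behavior observed in section 1.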