When the data volume is large, use the officially recommended tool, DistCp.
1. Create the target database
CREATE DATABASE IF NOT EXISTS xxxxxx LOCATION '/xxx/xxx/xxxx/xxxx.db';
2. Create the target table, keeping its definition identical to the source table
CREATE [EXTERNAL] TABLE `xxxx`(
  `uid` string,
  `channel` string)
PARTITIONED BY (
  `date` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/xxx/xxxx/xxx.db/xxxx';
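Rather than writing the DDL by hand, you can dump it from the source cluster so the two definitions cannot drift apart. A minimal sketch, assuming the Hive CLI is available; `src_db.src_table` is a placeholder name:

```sql
-- Run on the SOURCE cluster to dump the exact table definition:
SHOW CREATE TABLE src_db.src_table;
-- Copy the output, adjust the LOCATION clause to the target cluster's path,
-- then execute the edited statement on the target cluster.
```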
3. Migrate the data with DistCp
./hadoop distcp hdfs://source/table/dir hdfs://target/table/dir
For detailed usage, see the official documentation: https://hadoop.apache.org/docs/r3.1.1/hadoop-distcp/DistCp.html
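In practice DistCp is usually run with a few extra flags for incremental copies and attribute preservation. The sketch below only assembles and prints the command line; the namenode addresses `nn1`/`nn2` and the warehouse paths are placeholder assumptions, not real hosts:

```shell
# Placeholder namenode addresses and warehouse paths -- adjust to your clusters.
SRC="hdfs://nn1:8020/warehouse/src.db/mytable"
DST="hdfs://nn2:8020/warehouse/dst.db/mytable"
# -update: copy only files that differ from the target (safe to re-run);
# -p:      preserve block size, replication, user, group, and permissions;
# -m 20:   cap the number of parallel map tasks to limit cluster load.
CMD="hadoop distcp -update -p -m 20 $SRC $DST"
echo "$CMD"   # review the command, then run it on a cluster node
```

Re-running with `-update` makes the migration resumable: interrupted copies pick up where they left off instead of recopying everything.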
4. Restore the metadata
(1) External tables are straightforward: MSCK REPAIR TABLE xxxxx; the partitions are recovered automatically.
(2) Managed (internal) tables:
LOAD DATA INPATH '/xxx/xxx/xxx' OVERWRITE INTO TABLE xxx;
LOAD DATA INPATH '/xxx/xxxx/xxx' OVERWRITE INTO TABLE xxx PARTITION (xxxx);
ALTER TABLE xxxx ADD PARTITION (xxxx) LOCATION '/xxx/xxx/xxxx/xxxx';
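After the metadata is restored, it is worth sanity-checking the result. A hedged sketch with placeholder table names; the exact tables to compare are whatever you migrated:

```sql
-- Run on the TARGET cluster; the results should match the source cluster.
SHOW PARTITIONS xxxx;          -- every source partition should be listed
SELECT COUNT(*) FROM xxxx;     -- row count should equal the source table's
```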