Usage of the hadoop distcp command:

hadoop distcp -update -skipcrccheck -m $num_map $old_table_location $new_table_location

A brief introduction: http://blog.csdn.net/stark_summer/article/details/45869945
How do we copy table data between two clusters?
- Copy the table structure;
- Get the old table's Location and the new table's Location, then copy the data with the distcp command above;
- Run the msck repair table new_table command to repair the new table's partition metadata (required for partitioned tables).
Below I walk through these operations.
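As a sketch, the second and third steps can be assembled on the client like this; the Locations and table names are placeholders matching the walkthrough below, and -m 1 is only suitable for a tiny table:

```shell
# Sketch only: build the copy and repair commands from the two table
# Locations (placeholder values matching the example walkthrough below).
SRC_LOC="hdfs://hadoop:9000/warehouse/fdm.db/t441"
DST_LOC="hdfs://hadoop60:9000/warehouse/fdm.db/t444"

# Step 2: copy the table files; -m 1 is enough for a tiny table.
COPY_CMD="hadoop distcp -update -skipcrccheck -m 1 $SRC_LOC $DST_LOC"

# Step 3: repair partition metadata (needed for partitioned tables only).
REPAIR_CMD='hive -e "msck repair table fdm.t444;"'

echo "$COPY_CMD"
echo "$REPAIR_CMD"
```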
hive> select *
> from t441;
OK
30 beijing dongdong man
40 shanghai lisi woman
Time taken: 0.078 seconds
hive> desc formatted t441;
OK
# col_name data_type comment
id int None
city string None
name string None
sex string None
# Detailed Table Information
Database: fdm
Owner: root
CreateTime: Mon May 01 09:09:36 PDT 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop:9000/warehouse/fdm.db/t441
Table Type: MANAGED_TABLE
From the output above we can get the old table's Location.
Next we get the new table's Location.
hive> desc formatted t444;
OK
# col_name data_type comment
id int None
city string None
name string None
sex string None
# Detailed Table Information
Database: fdm
Owner: root
CreateTime: Mon May 01 09:56:57 PDT 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop60:9000/warehouse/fdm.db/t444
Table Type: MANAGED_TABLE
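The Location line can also be extracted non-interactively. Below is a hypothetical helper (not a Hive feature, just grep/awk over the desc formatted output), fed here with the sample line from above:

```shell
# Hypothetical helper: pull the HDFS path out of the "Location:" line
# printed by "desc formatted". In real use you would pipe hive output in:
#   hive -e "desc formatted fdm.t441;" | get_location
get_location() { grep -m1 'Location:' | awk '{print $2}'; }

# Fed with the sample line from the output above:
loc=$(echo 'Location:            hdfs://hadoop:9000/warehouse/fdm.db/t441' | get_location)
echo "$loc"
```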
Finally, we copy the data:
[root@hadoop local]# hadoop distcp -update -skipcrccheck hdfs://hadoop:9000/warehouse/fdm.db/t441 hdfs://hadoop60:9000/warehouse/fdm.db/t444 ;
17/05/01 10:09:10 INFO tools.DistCp: srcPaths=[hdfs://hadoop:9000/warehouse/fdm.db/t441]
17/05/01 10:09:10 INFO tools.DistCp: destPath=hdfs://hadoop60:9000/warehouse/fdm.db/t444
17/05/01 10:09:10 INFO tools.DistCp: sourcePathsCount=2
17/05/01 10:09:10 INFO tools.DistCp: filesToCopyCount=1
17/05/01 10:09:10 INFO tools.DistCp: bytesToCopyCount=47.0
17/05/01 10:09:11 INFO mapred.JobClient: Running job: job_201705010710_0010
17/05/01 10:09:12 INFO mapred.JobClient: map 0% reduce 0%
17/05/01 10:09:17 INFO mapred.JobClient: map 100% reduce 0%
17/05/01 10:09:17 INFO mapred.JobClient: Job complete: job_201705010710_0010
17/05/01 10:09:17 INFO mapred.JobClient: Counters: 22
17/05/01 10:09:17 INFO mapred.JobClient: Map-Reduce Framework
17/05/01 10:09:17 INFO mapred.JobClient: Spilled Records=0
17/05/01 10:09:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=289374208
17/05/01 10:09:17 INFO mapred.JobClient: Map input records=1
17/05/01 10:09:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=152
17/05/01 10:09:17 INFO mapred.JobClient: Map output records=0
17/05/01 10:09:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=38797312
17/05/01 10:09:17 INFO mapred.JobClient: Map input bytes=130
17/05/01 10:09:17 INFO mapred.JobClient: CPU time spent (ms)=130
17/05/01 10:09:17 INFO mapred.JobClient: Total committed heap usage (bytes)=16252928
17/05/01 10:09:17 INFO mapred.JobClient: distcp
17/05/01 10:09:17 INFO mapred.JobClient: Bytes copied=47
17/05/01 10:09:17 INFO mapred.JobClient: Bytes expected=47
17/05/01 10:09:17 INFO mapred.JobClient: Files copied=1
17/05/01 10:09:17 INFO mapred.JobClient: File Input Format Counters
17/05/01 10:09:17 INFO mapred.JobClient: Bytes Read=230
17/05/01 10:09:17 INFO mapred.JobClient: FileSystemCounters
17/05/01 10:09:17 INFO mapred.JobClient: HDFS_BYTES_READ=429
17/05/01 10:09:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=53786
17/05/01 10:09:17 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=47
17/05/01 10:09:17 INFO mapred.JobClient: File Output Format Counters
17/05/01 10:09:17 INFO mapred.JobClient: Bytes Written=0
17/05/01 10:09:17 INFO mapred.JobClient: Job Counters
17/05/01 10:09:17 INFO mapred.JobClient: Launched map tasks=1
17/05/01 10:09:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
17/05/01 10:09:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/05/01 10:09:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=4939
17/05/01 10:09:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
Since t441 is not a partitioned table, the msck repair step can be skipped; we then check the data:
hive> select *
> from t444;
OK
30 beijing dongdong man
40 shanghai lisi woman
Time taken: 0.069 seconds
OK, the data has been copied successfully.
Of course, the source and target table names do not have to match. The larger copy below also uses -m 200 (maximum map tasks), -pb (preserve block size), -bandwidth 40 (MB/s per map), and -delete (with -update, remove target files that are missing from the source):
hadoop distcp -m 200 -pb -bandwidth 40 -update -delete hdfs://10.198.21.227:8020/user/mart_mobile/app.db/app_jxkh_county_tmp hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county
[bdp_client@BJLFRZ-Client-50-10 total_env]$ hadoop fs -ls hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county
Found 2 items
-rw-r--r-- 3 mart_cfo mart_cfo 57340 2019-04-20 05:41 hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county/000000_0
-rw-r--r-- 3 mart_cfo mart_cfo 57770 2019-04-20 05:41 hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county/part-m-00000
[bdp_client@BJLFRZ-Client-50-10 total_env]$ hadoop fs -ls hdfs://10.198.21.227:8020/user/mart_mobile/app.db/app_jxkh_county_tmp
Found 2 items
-rwxrwxrwx 3 mart_mobile mart_mobile 57340 2019-01-17 11:01 hdfs://10.198.21.227:8020/user/mart_mobile/app.db/app_jxkh_county_tmp/000000_0
-rwxrwxrwx 3 mart_mobile mart_mobile 57770 2019-04-01 11:03 hdfs://10.198.21.227:8020/user/mart_mobile/app.db/app_jxkh_county_tmp/part-m-00000
[bdp_client@BJLFRZ-Client-50-10 total_env]$ hadoop fs -du -s hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county
115110 hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county
[bdp_client@BJLFRZ-Client-50-10 total_env]$ hadoop fs -du -s hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county
115110 hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county
[bdp_client@BJLFRZ-Client-50-10 total_env]$ hadoop fs -du -s hdfs://10.198.21.227:8020/user/mart_mobile/app.db/app_jxkh_county_tmp
115110 hdfs://10.198.21.227:8020/user/mart_mobile/app.db/app_jxkh_county_tmp
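The size check above can be scripted. A small sketch that parses the byte count out of hadoop fs -du -s output and compares source and target; the sample lines are copied from the listing above:

```shell
# Sketch: compare the byte counts reported by "hadoop fs -du -s" on the
# source and target directories. In real use, capture the command output:
#   src_du=$(hadoop fs -du -s "$SRC_LOC")
# Here the sample lines are copied from the listing above.
src_du="115110  hdfs://10.198.21.227:8020/user/mart_mobile/app.db/app_jxkh_county_tmp"
dst_du="115110  hdfs://172.21.2.137:8020/user/mart_cfo/app.db/app_jxkh_county"

src_bytes=$(echo "$src_du" | awk '{print $1}')
dst_bytes=$(echo "$dst_du" | awk '{print $1}')

if [ "$src_bytes" = "$dst_bytes" ]; then
  echo "sizes match: $src_bytes bytes"
else
  echo "size mismatch: src=$src_bytes dst=$dst_bytes"
fi
```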