distcp
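DistCp (distributed copy) runs a MapReduce job to copy files and directories in parallel, within one HDFS cluster or between clusters. The session below copies a Hive warehouse directory to another path on the same cluster: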

[hadoop@hadoopmaster test]$ hadoop distcp hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir
15/11/18 05:39:30 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:39:30 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:39:31 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/11/18 05:39:31 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/11/18 05:39:31 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:39:32 INFO mapreduce.JobSubmitter: number of splits:2
15/11/18 05:39:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447853441917_0001
15/11/18 05:39:32 INFO impl.YarnClientImpl: Submitted application application_1447853441917_0001
15/11/18 05:39:33 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1447853441917_0001/
15/11/18 05:39:33 INFO tools.DistCp: DistCp job-id: job_1447853441917_0001
15/11/18 05:39:33 INFO mapreduce.Job: Running job: job_1447853441917_0001
15/11/18 05:39:41 INFO mapreduce.Job: Job job_1447853441917_0001 running in uber mode : false
15/11/18 05:39:41 INFO mapreduce.Job: map 0% reduce 0%
15/11/18 05:39:48 INFO mapreduce.Job: map 50% reduce 0%
15/11/18 05:39:50 INFO mapreduce.Job: map 100% reduce 0%
15/11/18 05:39:50 INFO mapreduce.Job: Job job_1447853441917_0001 completed successfully
15/11/18 05:39:50 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=216204
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1220
HDFS: Number of bytes written=24
HDFS: Number of read operations=31
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=10356
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=10356
Total vcore-seconds taken by all map tasks=10356
Total megabyte-seconds taken by all map tasks=10604544
Map-Reduce Framework
Map input records=3
Map output records=0
Input split bytes=272
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=156
CPU time spent (ms)=1320
Physical memory (bytes) snapshot=342798336
Virtual memory (bytes) snapshot=1753182208
Total committed heap usage (bytes)=169869312
File Input Format Counters
Bytes Read=924
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=24
BYTESEXPECTED=24
COPY=3
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir/jacktest.db
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir/jacktest.db
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir/jacktest.db/test1
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir/jacktest.db/test1
Found 1 items
-rw-r--r-- 1 hadoop supergroup 24 2015-11-18 05:39 /jacktest/todir/jacktest.db/test1/test.body
[hadoop@hadoopmaster test]$ hadoop fs -cat /jacktest/todir/jacktest.db/test1/test.body
1,jack
2,josson
3,gavin
[hadoop@hadoopmaster test]$
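Note that because the target /jacktest/todir already existed (targetPathExists=true in the options above), DistCp nested the source directory jacktest.db under it rather than copying its contents directly into todir. As a minimal follow-up sketch (not part of the original session), a later re-sync could use the -update flag, which copies only files whose size or checksum differs and maps the source's contents onto the target instead of nesting:

hadoop distcp -update hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir/jacktest.db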


hive> create table test1(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.454 seconds
hive> select * from test1;
OK
Time taken: 0.65 seconds
hive> show create table test1;
OK
CREATE TABLE `test1`(
`id` int,
`name` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db/test1'
TBLPROPERTIES (
'transient_lastDdlTime'='1447853584')
Time taken: 0.152 seconds, Fetched: 13 row(s)


[hadoop@hadoopmaster test]$ vi test.body

1,jack
2,josson
3,gavin
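The session does not show how test.body reached the table's storage; a hypothetical upload step, using the LOCATION printed by show create table above, would be:

hadoop fs -put test.body /user/hive/warehouse/jacktest.db/test1/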


About protocols
If the two clusters run different Hadoop versions, copying over hdfs:// may fail because their RPC systems are incompatible. In that case you can read the source over hftp, an HTTP-based protocol, but the destination must still be hdfs://, like this:
hadoop distcp hftp://namenode:50070/user/hadoop/input hdfs://namenode:9000/user/hadoop/input1
The recommended replacement for hftp is webhdfs: both the source and the destination address can use webhdfs, and it is fully compatible across versions.
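For example, a webhdfs-to-webhdfs copy might look like the sketch below (assuming webhdfs is enabled via dfs.webhdfs.enabled and served on the NameNode HTTP port; the target directory todir2 is hypothetical):

hadoop distcp webhdfs://hadoopmaster:50070/user/hive/warehouse/jacktest.db webhdfs://hadoopmaster:50070/jacktest/todir2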


hadoop distcp hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1


[hadoop@hadoopmaster test]$ hadoop fs -mkdir /jacktest/todir1
[hadoop@hadoopmaster test]$ hadoop distcp hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
15/11/18 05:44:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir1, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:44:32 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:44:33 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db doesn't exist
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
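The listing fails because hftp is served from the NameNode's HTTP port (dfs.namenode.http-address, 50070 by default), not the RPC port 9000, so the source path cannot be resolved. Retrying with port 50070 succeeds: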
[hadoop@hadoopmaster test]$ hadoop distcp hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
15/11/18 05:45:10 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir1, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:45:10 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:45:11 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/11/18 05:45:11 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/11/18 05:45:11 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:45:11 INFO mapreduce.JobSubmitter: number of splits:2
15/11/18 05:45:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447853441917_0002
15/11/18 05:45:11 INFO impl.YarnClientImpl: Submitted application application_1447853441917_0002
15/11/18 05:45:12 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1447853441917_0002/
15/11/18 05:45:12 INFO tools.DistCp: DistCp job-id: job_1447853441917_0002
15/11/18 05:45:12 INFO mapreduce.Job: Running job: job_1447853441917_0002
15/11/18 05:45:18 INFO mapreduce.Job: Job job_1447853441917_0002 running in uber mode : false
15/11/18 05:45:18 INFO mapreduce.Job: map 0% reduce 0%
15/11/18 05:45:24 INFO mapreduce.Job: map 50% reduce 0%
15/11/18 05:45:26 INFO mapreduce.Job: map 100% reduce 0%
15/11/18 05:45:26 INFO mapreduce.Job: Job job_1447853441917_0002 completed successfully
15/11/18 05:45:26 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=216208
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1200
HDFS: Number of bytes written=24
HDFS: Number of read operations=25
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
HFTP: Number of bytes read=0
HFTP: Number of bytes written=0
HFTP: Number of read operations=0
HFTP: Number of large read operations=0
HFTP: Number of write operations=0
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=10014
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=10014
Total vcore-seconds taken by all map tasks=10014
Total megabyte-seconds taken by all map tasks=10254336
Map-Reduce Framework
Map input records=3
Map output records=0
Input split bytes=272
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=104
CPU time spent (ms)=2240
Physical memory (bytes) snapshot=345600000
Virtual memory (bytes) snapshot=1751683072
Total committed heap usage (bytes)=169869312
File Input Format Counters
Bytes Read=928
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=24
BYTESEXPECTED=24
COPY=3
[hadoop@hadoopmaster test]$
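As a verification step (not part of the original session), the copied file could be inspected just as before:

hadoop fs -cat /jacktest/todir1/jacktest.db/test1/test.body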