Source: http://blog.csdn.net/azhao_dn/article/details/7054286#
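The hadoop distcp command copies data between two different clusters. Its advantage is that the copy runs as a MapReduce job, so the work is spread across map tasks and the copy is much faster. Keep the following points in mind when using distcp: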
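1) Every node in the source cluster must be able to resolve the IP-to-hostname mapping of every node in the destination cluster
2) The destination path must exist
3) The command must use hostnames, not IP addresses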
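Point 3 can be checked before the job is submitted. The following is a minimal sketch, not part of distcp itself: the helper names `uri_host`, `is_ipv4` and `check_distcp_uris` are hypothetical, and the check simply treats any host made up only of digits and dots as an IP address.

```shell
# Extract the host part from a URI such as hdfs://host:port/path.
uri_host() {
  _rest="${1#*://}"            # drop the scheme
  _rest="${_rest%%/*}"         # drop the path
  printf '%s\n' "${_rest%%:*}" # drop the port, if any
}

# Succeed when the argument looks like a dotted IPv4 address
# (assumption: only digits and dots, with at least three dots).
is_ipv4() {
  case "$1" in
    ''|*[!0-9.]*) return 1 ;;
    *.*.*.*)      return 0 ;;
    *)            return 1 ;;
  esac
}

# Fail if any of the given URIs addresses its cluster by IP.
check_distcp_uris() {
  for _uri in "$@"; do
    if is_ipv4 "$(uri_host "$_uri")"; then
      echo "uses an IP address, expected a hostname: $_uri" >&2
      return 1
    fi
  done
  return 0
}
```

A possible use is `check_distcp_uris "$SRC" "$DST" && bin/hadoop distcp "$SRC" "$DST"`, where `SRC` and `DST` are placeholder variables holding the two hdfs:// URIs.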
The test results are as follows:
- bin/hadoop distcp hdfs://hadoopmaster:9000/data/dw/vv/20111208/vv_20111208_05_part-00000.lzo hdfs://hadoopmaster2:9000/user/rsync/test1
- 11/12/08 17:23:43 INFO tools.DistCp: srcPaths=[hdfs://hadoopmaster:9000/data/dw/vv/20111208/vv_20111208_05_part-00000.lzo]
- 11/12/08 17:23:43 INFO tools.DistCp: destPath=hdfs://hadoopmaster2:9000/user/rsync/test1
- 11/12/08 17:23:44 INFO tools.DistCp: sourcePathsCount=1
- 11/12/08 17:23:44 INFO tools.DistCp: filesToCopyCount=1
- 11/12/08 17:23:44 INFO tools.DistCp: bytesToCopyCount=30.2m
- 11/12/08 17:23:45 INFO mapred.JobClient: Running job: job_201112081643_0027
- 11/12/08 17:23:46 INFO mapred.JobClient: map 0% reduce 0%
- 11/12/08 17:24:08 INFO mapred.JobClient: map 100% reduce 0%
- 11/12/08 17:24:13 INFO mapred.JobClient: Job complete: job_201112081643_0027
- 11/12/08 17:24:13 INFO mapred.JobClient: Counters: 18
- 11/12/08 17:24:13 INFO mapred.JobClient: Job Counters
- 11/12/08 17:24:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16764
- 11/12/08 17:24:13 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 11/12/08 17:24:13 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 11/12/08 17:24:13 INFO mapred.JobClient: Launched map tasks=1
- 11/12/08 17:24:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
- 11/12/08 17:24:13 INFO mapred.JobClient: File Input Format Counters
- 11/12/08 17:24:13 INFO mapred.JobClient: Bytes Read=270
- 11/12/08 17:24:13 INFO mapred.JobClient: File Output Format Counters
- 11/12/08 17:24:13 INFO mapred.JobClient: Bytes Written=0
- 11/12/08 17:24:13 INFO mapred.JobClient: FileSystemCounters
- 11/12/08 17:24:13 INFO mapred.JobClient: HDFS_BYTES_READ=31682544
- 11/12/08 17:24:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22361
- 11/12/08 17:24:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=31682124
- 11/12/08 17:24:13 INFO mapred.JobClient: distcp
- 11/12/08 17:24:13 INFO mapred.JobClient: Files copied=1
- 11/12/08 17:24:13 INFO mapred.JobClient: Bytes copied=31682124
- 11/12/08 17:24:13 INFO mapred.JobClient: Bytes expected=31682124
- 11/12/08 17:24:13 INFO mapred.JobClient: Map-Reduce Framework
- 11/12/08 17:24:13 INFO mapred.JobClient: Map input records=1
- 11/12/08 17:24:13 INFO mapred.JobClient: Spilled Records=0
- 11/12/08 17:24:13 INFO mapred.JobClient: Map input bytes=170
- 11/12/08 17:24:13 INFO mapred.JobClient: Map output records=0
- 11/12/08 17:24:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=150
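In the log above, the distcp counters `Bytes copied` and `Bytes expected` both read 31682124 (about 30.2 MB, matching `bytesToCopyCount`), which suggests the whole file was transferred. That comparison can be automated roughly as follows. This is only a sketch: the function names `counter_value` and `verify_copy` are hypothetical, and it assumes the job output was saved in the same `name=value` form shown above.

```shell
# Print the numeric value of a "name=value" counter found in a
# DistCp job log read from stdin (last occurrence wins).
counter_value() {
  sed -n "s/.*$1=\([0-9][0-9]*\).*/\1/p" | tail -n 1
}

# Succeed only if the log on stdin reports "Bytes copied" equal
# to "Bytes expected".
verify_copy() {
  _log="$(cat)"
  _copied="$(printf '%s\n' "$_log" | counter_value 'Bytes copied')"
  _expected="$(printf '%s\n' "$_log" | counter_value 'Bytes expected')"
  [ -n "$_copied" ] && [ "$_copied" = "$_expected" ]
}
```

For example, if the console output was captured to a file named `distcp.log` (a placeholder name), `verify_copy < distcp.log && echo "copy looks complete"` reports success only when the two counters agree.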