1. Check that the source file exists, and inspect its permissions and other metadata
[hadoop@hadoop1 conf]$ hadoop fs -ls hdfs://192.168.2.31:9000/tmp/pageview.log
Found 1 items
-r--r--r-- 2 hc supergroup 187764409 2013-06-07 14:21 /tmp/pageview.log
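To act on this listing in a script rather than by eye, its columns can be split into named fields. The following is a minimal sketch, assuming the eight-column layout shown above (permissions, replication, owner, group, size, date, time, path) and a path without spaces; the function name `describe_ls_line` is made up for illustration and is not part of Hadoop.

```shell
#!/bin/sh
# describe_ls_line (hypothetical helper): read `hadoop fs -ls` output on
# stdin and print the fields of each file line by name.
# Assumes the 8-column layout seen above; lines such as
# "Found 1 items" do not have 8 columns and are skipped.
describe_ls_line() {
    awk 'NF == 8 {
        printf "perm=%s repl=%s owner=%s group=%s size=%s path=%s\n",
               $1, $2, $3, $4, $5, $8
    }'
}

# Example (needs a reachable cluster, so it is left commented out):
# hadoop fs -ls hdfs://192.168.2.31:9000/tmp/pageview.log | describe_ls_line
```

For the file above, the permission field `-r--r--r--` shows it is read-only for everyone, and the size field (187764409) is the number to compare against after the copy.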
2. Copy the file with the distcp command
[hadoop@hadoop1 conf]$ hadoop distcp -overwrite hdfs://192.168.2.31:9000/tmp/pageview.log hdfs://hadoop1:9000/tmp/pageview.log
13/06/13 13:29:55 INFO tools.DistCp: srcPaths=[hdfs://192.168.2.31:9000/tmp/pageview.log]
13/06/13 13:29:55 INFO tools.DistCp: destPath=hdfs://hadoop1:9000/tmp/pageview.log
13/06/13 13:29:57 INFO tools.DistCp: hdfs://hadoop1:9000/tmp/pageview.log does not exist.
13/06/13 13:29:57 INFO tools.DistCp: sourcePathsCount=1
13/06/13 13:29:57 INFO tools.DistCp: filesToCopyCount=1
13/06/13 13:29:57 INFO tools.DistCp: bytesToCopyCount=179.1m
13/06/13 13:29:57 INFO mapred.JobClient: Running job: job_201305151449_0064
13/06/13 13:29:58 INFO mapred.JobClient: map 0% reduce 0%
13/06/13 13:30:30 INFO mapred.JobClient: map 100% reduce 0%
13/06/13 13:30:35 INFO mapred.JobClient: Job complete: job_201305151449_0064
13/06/13 13:30:35 INFO mapred.JobClient: Counters: 22
13/06/13 13:30:35 INFO mapred.JobClient: Job Counters
13/06/13 13:30:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34609
13/06/13 13:30:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/13 13:30:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/06/13 13:30:35 INFO mapred.JobClient: Launched map tasks=1
13/06/13 13:30:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/06/13 13:30:35 INFO mapred.JobClient: File Input Format Counters
13/06/13 13:30:35 INFO mapred.JobClient: Bytes Read=222
13/06/13 13:30:35 INFO mapred.JobClient: File Output Format Counters
13/06/13 13:30:35 INFO mapred.JobClient: Bytes Written=0
13/06/13 13:30:35 INFO mapred.JobClient: FileSystemCounters
13/06/13 13:30:35 INFO mapred.JobClient: HDFS_BYTES_READ=187764783
13/06/13 13:30:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22404
13/06/13 13:30:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=187764409
13/06/13 13:30:35 INFO mapred.JobClient: distcp
13/06/13 13:30:35 INFO mapred.JobClient: Files copied=1
13/06/13 13:30:35 INFO mapred.JobClient: Bytes copied=187764409
13/06/13 13:30:35 INFO mapred.JobClient: Bytes expected=187764409
13/06/13 13:30:35 INFO mapred.JobClient: Map-Reduce Framework
13/06/13 13:30:35 INFO mapred.JobClient: Map input records=1
13/06/13 13:30:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=59801600
13/06/13 13:30:35 INFO mapred.JobClient: Spilled Records=0
13/06/13 13:30:35 INFO mapred.JobClient: CPU time spent (ms)=12550
13/06/13 13:30:35 INFO mapred.JobClient: Total committed heap usage (bytes)=7864320
13/06/13 13:30:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=880443392
13/06/13 13:30:35 INFO mapred.JobClient: Map input bytes=122
13/06/13 13:30:35 INFO mapred.JobClient: Map output records=0
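A quick sanity check on the job output above is that `Bytes copied` equals `Bytes expected` (both are 187764409 here). A minimal sketch of automating that check, assuming the distcp console output has been saved to a file or is piped in; the function name `distcp_bytes_ok` and the file name `distcp.log` are hypothetical.

```shell
#!/bin/sh
# distcp_bytes_ok (hypothetical helper): read distcp console output on
# stdin and succeed only if the "Bytes copied" and "Bytes expected"
# counters are both present and equal.
distcp_bytes_ok() {
    awk -F= '
        /Bytes copied=/   { copied = $2 }
        /Bytes expected=/ { expected = $2 }
        END { exit !(copied != "" && copied == expected) }
    '
}

# Example (needs a reachable cluster, so it is left commented out;
# 2>&1 captures both output streams into the log file):
# hadoop distcp -overwrite hdfs://192.168.2.31:9000/tmp/pageview.log \
#     hdfs://hadoop1:9000/tmp/pageview.log 2>&1 | tee distcp.log
# distcp_bytes_ok < distcp.log && echo "byte counts match"
```

The function returns a non-zero status when either counter is missing, so a log from a failed or interrupted job is not mistaken for a successful copy.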
3. Verify that the source and destination files match
[hadoop@hadoop1 conf]$ hadoop fs -ls hdfs://192.168.2.31:9000/tmp/pageview.log
Found 1 items
-r--r--r-- 2 hc supergroup 187764409 2013-06-07 14:21 /tmp/pageview.log
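The listing above shows only the source file. To confirm the copy, the destination (`hdfs://hadoop1:9000/tmp/pageview.log`) would be listed as well and the two sizes compared. A minimal sketch under that assumption; `ls_size` and `same_size` are hypothetical helper names, and a size match is a weaker check than comparing file contents.

```shell
#!/bin/sh
# ls_size (hypothetical helper): print the size column (5th field) of
# the first file line in `hadoop fs -ls` output read from stdin.
ls_size() {
    awk 'NF == 8 { print $5; exit }'
}

# same_size (hypothetical helper): succeed if two ls outputs, passed as
# arguments, report the same non-empty size.
same_size() {
    src_size=$(printf '%s\n' "$1" | ls_size)
    dst_size=$(printf '%s\n' "$2" | ls_size)
    [ -n "$src_size" ] && [ "$src_size" = "$dst_size" ]
}

# Example (needs both clusters reachable, so it is left commented out):
# src=$(hadoop fs -ls hdfs://192.168.2.31:9000/tmp/pageview.log)
# dst=$(hadoop fs -ls hdfs://hadoop1:9000/tmp/pageview.log)
# same_size "$src" "$dst" && echo "sizes match"
```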
From the ITPUB blog: http://blog.itpub.net/23721637/viewspace-1060592/. Please credit the source when reposting; otherwise legal liability will be pursued.