distcp
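DistCp (distributed copy) runs a MapReduce job to copy files and directories in parallel, within one HDFS cluster or between clusters. The session below copies a Hive warehouse directory to another path on the same cluster: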

[hadoop@hadoopmaster test]$ hadoop distcp hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir
15/11/18 05:39:30 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:39:30 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:39:31 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/11/18 05:39:31 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/11/18 05:39:31 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:39:32 INFO mapreduce.JobSubmitter: number of splits:2
15/11/18 05:39:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447853441917_0001
15/11/18 05:39:32 INFO impl.YarnClientImpl: Submitted application application_1447853441917_0001
15/11/18 05:39:33 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1447853441917_0001/
15/11/18 05:39:33 INFO tools.DistCp: DistCp job-id: job_1447853441917_0001
15/11/18 05:39:33 INFO mapreduce.Job: Running job: job_1447853441917_0001
15/11/18 05:39:41 INFO mapreduce.Job: Job job_1447853441917_0001 running in uber mode : false
15/11/18 05:39:41 INFO mapreduce.Job: map 0% reduce 0%
15/11/18 05:39:48 INFO mapreduce.Job: map 50% reduce 0%
15/11/18 05:39:50 INFO mapreduce.Job: map 100% reduce 0%
15/11/18 05:39:50 INFO mapreduce.Job: Job job_1447853441917_0001 completed successfully
15/11/18 05:39:50 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=216204
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1220
HDFS: Number of bytes written=24
HDFS: Number of read operations=31
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=10356
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=10356
Total vcore-seconds taken by all map tasks=10356
Total megabyte-seconds taken by all map tasks=10604544
Map-Reduce Framework
Map input records=3
Map output records=0
Input split bytes=272
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=156
CPU time spent (ms)=1320
Physical memory (bytes) snapshot=342798336
Virtual memory (bytes) snapshot=1753182208
Total committed heap usage (bytes)=169869312
File Input Format Counters
Bytes Read=924
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=24
BYTESEXPECTED=24
COPY=3
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir/jacktest.db
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir/jacktest.db
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir/jacktest.db/test1
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir/jacktest.db/test1
Found 1 items
-rw-r--r-- 1 hadoop supergroup 24 2015-11-18 05:39 /jacktest/todir/jacktest.db/test1/test.body
[hadoop@hadoopmaster test]$ hadoop fs -cat /jacktest/todir/jacktest.db/test1/test.body
1,jack
2,josson
3,gavin
[hadoop@hadoopmaster test]$
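Note that because the target /jacktest/todir already existed (targetPathExists=true in the options above), DistCp nested the source directory jacktest.db under it rather than copying its contents directly into todir. As a minimal follow-up sketch (not part of the original session), a later re-sync could use the -update flag, which copies only files whose size or checksum differs and maps the source's contents onto the target instead of nesting:

hadoop distcp -update hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir/jacktest.db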


hive> create table test1(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.454 seconds
hive> select * from test1;
OK
Time taken: 0.65 seconds
hive> show create table test1;
OK
CREATE TABLE `test1`(
`id` int,
`name` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db/test1'
TBLPROPERTIES (
'transient_lastDdlTime'='1447853584')
Time taken: 0.152 seconds, Fetched: 13 row(s)


[hadoop@hadoopmaster test]$ vi test.body

1,jack
2,josson
3,gavin
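The session does not show how test.body reached the table's storage; a hypothetical upload step, using the LOCATION printed by show create table above, would be:

hadoop fs -put test.body /user/hive/warehouse/jacktest.db/test1/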


About protocols
If the two clusters run different Hadoop versions, copying over hdfs:// may fail because their RPC systems are incompatible. In that case you can read the source over hftp, an HTTP-based protocol, but the destination must still be hdfs://, like this:
hadoop distcp hftp://namenode:50070/user/hadoop/input hdfs://namenode:9000/user/hadoop/input1
The recommended replacement for hftp is webhdfs: both the source and the destination address can use webhdfs, and it is fully compatible across versions.
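For example, a webhdfs-to-webhdfs copy might look like the sketch below (assuming webhdfs is enabled via dfs.webhdfs.enabled and served on the NameNode HTTP port; the target directory todir2 is hypothetical):

hadoop distcp webhdfs://hadoopmaster:50070/user/hive/warehouse/jacktest.db webhdfs://hadoopmaster:50070/jacktest/todir2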


hadoop distcp hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1


[hadoop@hadoopmaster test]$ hadoop fs -mkdir /jacktest/todir1
[hadoop@hadoopmaster test]$ hadoop distcp hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
15/11/18 05:44:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir1, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:44:32 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:44:33 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db doesn't exist
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
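The listing fails because hftp is served from the NameNode's HTTP port (dfs.namenode.http-address, 50070 by default), not the RPC port 9000, so the source path cannot be resolved. Retrying with port 50070 succeeds: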
[hadoop@hadoopmaster test]$ hadoop distcp hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
15/11/18 05:45:10 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir1, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:45:10 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:45:11 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/11/18 05:45:11 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/11/18 05:45:11 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:45:11 INFO mapreduce.JobSubmitter: number of splits:2
15/11/18 05:45:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447853441917_0002
15/11/18 05:45:11 INFO impl.YarnClientImpl: Submitted application application_1447853441917_0002
15/11/18 05:45:12 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1447853441917_0002/
15/11/18 05:45:12 INFO tools.DistCp: DistCp job-id: job_1447853441917_0002
15/11/18 05:45:12 INFO mapreduce.Job: Running job: job_1447853441917_0002
15/11/18 05:45:18 INFO mapreduce.Job: Job job_1447853441917_0002 running in uber mode : false
15/11/18 05:45:18 INFO mapreduce.Job: map 0% reduce 0%
15/11/18 05:45:24 INFO mapreduce.Job: map 50% reduce 0%
15/11/18 05:45:26 INFO mapreduce.Job: map 100% reduce 0%
15/11/18 05:45:26 INFO mapreduce.Job: Job job_1447853441917_0002 completed successfully
15/11/18 05:45:26 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=216208
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1200
HDFS: Number of bytes written=24
HDFS: Number of read operations=25
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
HFTP: Number of bytes read=0
HFTP: Number of bytes written=0
HFTP: Number of read operations=0
HFTP: Number of large read operations=0
HFTP: Number of write operations=0
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=10014
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=10014
Total vcore-seconds taken by all map tasks=10014
Total megabyte-seconds taken by all map tasks=10254336
Map-Reduce Framework
Map input records=3
Map output records=0
Input split bytes=272
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=104
CPU time spent (ms)=2240
Physical memory (bytes) snapshot=345600000
Virtual memory (bytes) snapshot=1751683072
Total committed heap usage (bytes)=169869312
File Input Format Counters
Bytes Read=928
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=24
BYTESEXPECTED=24
COPY=3
[hadoop@hadoopmaster test]$
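As a verification step (not part of the original session), the copied file could be inspected just as before:

hadoop fs -cat /jacktest/todir1/jacktest.db/test1/test.body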