hadoop 2.0.1 使用distcp问题解决

幸幸苦苦安装好了新版hadoop,
然后辛辛苦苦调通,可以跑mr了。

然后用distcp从1.0.3的集群拷数据到2.0.1的集群中。
首先是由于版本不同,不能用hdfs协议直接考,需要用http协议。
即不能用 distcp hdfs://src:54310/foo hdfs://dst:54310/
而要用 distcp hftp://src:50070/foo hdfs://dst:54310/
注意端口号哦。

然后,要在目的集群上执行该命令,也就是在2.0.1的集群上执行。

最后,尼玛碰到一个checksum mismatch的错误。
Caused by: java.io.IOException: Check-sum mismatch between hftp://src:50070/foo/yyy.yy and hdfs://dst:54310/foo/xxx.xx
网上搜了半天,这个问题发生的概率真是太小,几乎没有人问。
后来终于在cloudera的网站上找到了解决方法:
[quote]
— Distcp using MRv2 (YARN) from a CDH3 cluster to a CDH4 cluster may fail with CRC mismatch errors

Running distcp on a CDH4 YARN cluster with a CDH3 hftp source will fail if the CRC checksum type being used is the CDH4 default (CRC32C). This is because the default checksum type was changed in CDH4 from the CDH3 default of CRC32.

Bug: HADOOP-8060
Severity: Medium
Anticipated Resolution: To be fixed in an upcoming release
Workaround: You can work around this issue by changing the CRC checksum type on the CDH4 cluster to the CDH3 default, CRC32. To do this set dfs.checksum.type to CRC32 in hdfs-site.xml.

[/quote]
https://ccp.cloudera.com/display/CDH4DOC/Known+Issues+and+Work+Arounds+in+CDH4

然后照做,修改hdfs-site.xml,分发集群中,搞定。


2012.08.26----
今天又碰到一个问题,当路径中存在中文字符或者其他非英文字符时,靠谱会报错。


Error: java.io.IOException: File copy failed: hftp://xxxxxx:50070/tmp/中文路径测试/part-r-00017 --> hdfs://aaaaaa:54310/tmp/distcp_test14/part-r-00017
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://xxxxxx:50070/tmp/中文路径测试/part-r-00017 to hdfs://aaaaaa:54310/tmp/distcp_test14/part-r-00017
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
... 10 more
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: HTTP_OK expected, received 500
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:201)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:167)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToTmpFile(RetriableFileCopyCommand.java:112)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:90)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:71)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
Caused by: java.io.IOException: HTTP_OK expected, received 500
at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderInputStream.checkResponseCode(HftpFileSystem.java:381)
at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:121)
at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:158)
at java.io.DataInputStream.read(DataInputStream.java:132)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:70)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:198)



目前还没有找到好解决办法。
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值