Error:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at org.apache.hadoop.tools.CopyListingFileStatus.readFields(CopyListingFileStatus.java:366)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2332)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2305)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:78)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:388)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:302)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Description:
hadoop --config xxxx_dir -put / get both work fine; only distcp fails. The logs are sparse, so the failure can only be narrowed down to the read-file stage.
Versions:
source: Hadoop 2.6.0-cdh5.16.2
sink: Hadoop 2.6.0-cdh5.5.0
Cause: the transfer crosses clusters running different Hadoop versions, and the read API of the two YARN releases is not consistent. Concretely, the DistCp listing file written by one version cannot be deserialized by the other (note CopyListingFileStatus.readFields in the trace), which produces the EOFException above.
The error above was produced by running the command on the sink side.
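The failure mode can be reproduced in miniature. The sketch below is illustrative, not Hadoop's actual CopyListingFileStatus code: a writer serializes a record with one field (the "old" format), and a reader that expects an additional field reads past the end of the buffer, hitting the same EOFException seen in the trace.

```java
import java.io.*;

public class DistcpSkewDemo {
    // "Old" writer: the record carries a single long field
    // (hypothetical stand-in for the older version's format).
    public static byte[] writeOldRecord() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(1024L); // e.g. a file-length field
        out.close();
        return buf.toByteArray();
    }

    // "New" reader: expects an extra long field the old writer never
    // serialized, so the second readLong() runs off the end of the record.
    public static boolean readNewRecord(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        in.readLong(); // first field deserializes fine
        try {
            in.readLong(); // second field does not exist in the old format
            return false;
        } catch (EOFException e) {
            return true; // same failure mode as the stack trace above
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("EOFException hit: " + readNewRecord(writeOldRecord()));
    }
}
```

The key point: Writable-style serialization carries no schema, so a version mismatch between writer and reader surfaces only as a low-level read error like this one.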
The failing path was: source hdfs => source-yarn => sink hdfs
The correct path should be: source hdfs => sink-yarn => sink hdfs
That is, in the --config directory, use the sink side's core-site.xml and other config files, keeping only hdfs-site.xml from the source side.
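The fix above can be sketched as a script that assembles a mixed --config directory. All paths here are hypothetical placeholders (the real directories would come from each cluster's deployed configs); the script uses empty stand-in files purely to illustrate which file comes from which side.

```shell
# Stand-ins for the two clusters' config dirs (hypothetical; in practice
# these would be the deployed /etc/hadoop/conf of each cluster).
SRC_CONF=$(mktemp -d)
SINK_CONF=$(mktemp -d)
MIXED_CONF=$(mktemp -d)
touch "$SRC_CONF"/hdfs-site.xml
touch "$SINK_CONF"/core-site.xml "$SINK_CONF"/yarn-site.xml \
      "$SINK_CONF"/mapred-site.xml "$SINK_CONF"/hdfs-site.xml

# Take everything except hdfs-site.xml from the sink side, so the
# DistCp job is submitted to the sink cluster's YARN.
cp "$SINK_CONF"/core-site.xml "$SINK_CONF"/yarn-site.xml \
   "$SINK_CONF"/mapred-site.xml "$MIXED_CONF"/

# Keep the source side's hdfs-site.xml so the source HDFS resolves.
cp "$SRC_CONF"/hdfs-site.xml "$MIXED_CONF"/

ls "$MIXED_CONF"
# Then run DistCp against the mixed config (cluster addresses hypothetical):
# hadoop --config "$MIXED_CONF" distcp hdfs://source-nn:8020/src hdfs://sink-nn:8020/dst
```

With this layout the MapReduce job runs on the sink cluster's YARN, so reader and writer versions match, which is the corrected path described above.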
Appendix: reference articles