.gz文件上载到hdfs中

4 篇文章 0 订阅
3 篇文章 0 订阅

.gz文件上载到hdfs中

用 dfs -copyFormLocal的方式,上载后的文件正常,可以用mapreduce直接读取;

终于找到原因了:一个配置问题,

HdfsSink中默认的serializer会每写一行在行尾添加一个换行符,这样会导致每条日志后面多一个空行,修改配置不要自动添加换行符;

agentb2.sinks.hdfs_sink2.serializer.appendNewline = false

OK


用flume的方式,datastream类型,上载后mapreduce操作异常,为何呢?

Error: java.io.EOFException: Unexpected end of input stream
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

设定为hdfs.filetype为sequencefile,然后mr中用sequencefile的文件输入方式也不行:

job.setInputFormatClass(SequenceFileInputFormat.class);




Error: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text
    at com.gzmrdemo.GzFileMapper.map(GzFileMapper.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


2023-07-07 20:11:52,076 INFO [upload-pool-40] c.e.d.j.DataUnitService.DataUnitService#uploadFileToHdfs[DataUnitService.java:98] 本次文件上传HDFS用时:18s 2023-07-07 20:11:52,077 INFO [upload-pool-40] c.e.d.j.DataUnitService.DataUnitService#uploadFileToHdfs[DataUnitService.java:98] 本次文件上传HDFS用时:0s 2023-07-07 20:11:52,514 INFO [upload-pool-35] c.e.d.j.DataUnitService.DataUnitService#tohiveWy[DataUnitService.java:172] /u01/tarsftp//2023070719575912003640001.txt.gz解压>>>>>>/u01/untarsftp/ 2023-07-07 20:11:52,520 WARN [Thread-4655232] o.a.h.h.DFSClient.DFSOutputStream$DataStreamer#run[DFSOutputStream.java:558] DataStreamer Exception org.apache.hadoop.ipc.RemoteException: File /dataunit/cu_access_log/10/2023070719575912003640001.txt could only be written to 0 of the 1 minReplication nodes. There are 11 datanode(s) running and no node(s) are excluded in this o peration. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.securi
最新发布
07-13
评论 4
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值