An error when analyzing LZO data with Hive

I had previously set up a map-only job that used CombineInputFormat to merge small text files and compress the result into .lzo files. The job settings were:
        conf.setInt("mapred.min.split.size", 1);
        conf.setLong("mapred.max.split.size", 600000000); // ~600 MB of text per split, so each compressed output file comes out around 120 MB
        conf.set("mapred.output.compression.codec", "com.hadoop.compression.lzo.LzopCodec");
        conf.set("mapred.output.compression.type", "BLOCK");
        conf.setBoolean("mapred.output.compress", true);
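
For context, a minimal sketch of what such a driver can look like in the old mapred API. This is illustrative, not my exact code: the class names CompactToLzo and PassThroughMapper are made up, and it assumes a Hadoop version that ships org.apache.hadoop.mapred.lib.CombineTextInputFormat (on older versions you would subclass CombineFileInputFormat yourself):

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.CombineTextInputFormat;

    public class CompactToLzo {

        // Map-only identity pass: each input line is written back out unchanged.
        // NullWritable keys make TextOutputFormat emit only the value, so the
        // compaction does not alter the records themselves.
        public static class PassThroughMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, NullWritable, Text> {
            public void map(LongWritable offset, Text line,
                            OutputCollector<NullWritable, Text> out,
                            Reporter reporter) throws IOException {
                out.collect(NullWritable.get(), line);
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(CompactToLzo.class);
            conf.setJobName("compact-small-files-to-lzo");

            // Pack many small files into a few large splits -> few output files.
            conf.setInt("mapred.min.split.size", 1);
            conf.setLong("mapred.max.split.size", 600000000); // ~600 MB per split

            // Compress each map's output file with lzop (~120 MB per .lzo file).
            conf.setBoolean("mapred.output.compress", true);
            conf.set("mapred.output.compression.type", "BLOCK");
            conf.set("mapred.output.compression.codec",
                     "com.hadoop.compression.lzo.LzopCodec");

            conf.setInputFormat(CombineTextInputFormat.class);
            conf.setMapperClass(PassThroughMapper.class);
            conf.setNumReduceTasks(0); // map-only: no shuffle, no reduce
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputValueClass(Text.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }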

Running a Hive query over the LZO directory then failed with:

2014-03-03 17:00:01,494 WARN com.hadoop.compression.lzo.LzopInputStream: IOException in getCompressedData; likely LZO corruption.
java.io.IOException: Compressed length 2004251197 exceeds max block size 67108864 (probably corrupt file)
	at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:286)
	at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:256)
	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
	at java.io.InputStream.read(InputStream.java:82)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:308)
	at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:64)
	at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:158)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:355)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:316)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:430)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
2014-03-03 17:00:01,501 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-03-03 17:00:01,503 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: java.lang.reflect.InvocationTargetException
2014-03-03 17:00:01,503 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:369)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:316)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:430)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:355)
	... 10 more
Caused by: java.io.IOException: Compressed length 2004251197 exceeds max block size 67108864 (probably corrupt file)
	at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:286)
	at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:256)
	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
	at java.io.InputStream.read(InputStream.java:82)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:308)
	at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:64)
	at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:158)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
	... 15 more
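
Despite the "probably corrupt file" wording, a quick sanity check is worth doing before blaming the data: copy one output file out of HDFS and test it with lzop locally. If the test passes, the files themselves are intact and the "corruption" must come from how they are being read (the path below is illustrative):

    hadoop fs -get /output/lzo/part-00000.lzo .
    lzop -t part-00000.lzo    # exit status 0 means the archive is intact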

After digging through a lot of articles, I finally noticed these settings in the job's job.xml:
mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
hive.hadoop.supports.splittable.combineinputformat=true

I set hive.hadoop.supports.splittable.combineinputformat to false, and the query ran normally.

The root cause: lzop-compressed files are not natively splittable; making them splittable requires building an index first. With the setting above enabled, CombineHiveInputFormat cuts splits at arbitrary byte offsets inside the unindexed .lzo files, so a record reader that starts mid-stream interprets whatever bytes it lands on as an LZO block header. That is where the absurd "Compressed length 2004251197" and the spurious corruption warning come from. Since each of my .lzo files is only about 120 MB anyway, there is no need to build indexes; leaving them unsplittable is perfectly fine.
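
For reference, the same fix can also be applied per session, before the query (the table name here is illustrative):

    set hive.hadoop.supports.splittable.combineinputformat=false;
    select count(*) from lzo_logs;

And if the files ever grow large enough that splitting matters, the usual hadoop-lzo route is to build an index first, after which each .lzo file can be split on block boundaries (the jar path and directory below are illustrative):

    hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
        com.hadoop.compression.lzo.DistributedLzoIndexer /output/lzo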

