Hive ORCFile Partitioned Write Error

Error message

2022-03-01 10:53:14,868 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
	at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewInputBuffer(OutStream.java:107)
	at org.apache.hadoop.hive.ql.io.orc.OutStream.write(OutStream.java:140)
	at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
	at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
	at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:80)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:724)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1609)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1991)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2283)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.abortWriters(FileSinkOperator.java:252)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1026)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2022-03-01 10:53:14,893 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: The client is stopped
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1534)
	at org.apache.hadoop.ipc.Client.call(Client.java:1478)
	at org.apache.hadoop.ipc.Client.call(Client.java:1439)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:243)
	at com.sun.proxy.$Proxy9.statusUpdate(Unknown Source)
	at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:790)
	at java.lang.Thread.run(Thread.java:748)

Cause

-- View the full table definition
desc extended `table_name`;

.......
 location:hdfs://solway-ha/user/hive/warehouse/xx.db/xxx_table_name, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:date, type:string, comment:null)], parameters:{orc.compress=ZLIB, transient_lastDdlTime=1645002562}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
 

When writing ORC data, Hive buffers each column's compressed stream in memory before flushing a stripe to disk; the buffer size is controlled by the orc.compress.size table property, which defaults to 256 KB (262144 bytes). In the desc extended output above, the table only sets orc.compress=ZLIB and never overrides orc.compress.size, so the default applies. With a dynamic-partition insert, one ORC writer stays open per partition being written, so the per-stream buffers are multiplied across every open writer, which is what pushed the map tasks past their heap limit here. Shrinking the buffer with alter table table_name set tblproperties("orc.compress.size"="65536") lowers that memory footprint; I cut it to a quarter of the default (64 KB) as a first tuning step.
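
Before altering the table, it can help to confirm what buffer size is actually in effect. A minimal sketch, reusing the table name from the desc output above; hive.exec.orc.default.buffer.size is the standard Hive setting that supplies the default when the table does not set orc.compress.size itself:

-- Does the table already override the buffer size? (it does not, per the output above)
show tblproperties xx.xxx_table_name;
-- Print the session default for new ORC writers (262144 bytes = 256 KB out of the box)
set hive.exec.orc.default.buffer.size;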

Fix

>alter table xx.xxx_table_name set tblproperties("orc.compress.size"="65536");
-- Rerun the SQL; this time it completes without errors
>insert overwrite table xx.xxx_table_name partition(date) select * from xx.xxx_tmp_table;
>
......
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 19   Cumulative CPU: 1089.54 sec   HDFS Read: 5407892294 HDFS Write: 312056735 SUCCESS
Stage-Stage-3: Map: 139   Cumulative CPU: 308.08 sec   HDFS Read: 332061143 HDFS Write: 311133769 SUCCESS
Total MapReduce CPU Time Spent: 23 minutes 17 seconds 620 msec
OK
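
After the rerun, it is worth confirming that the override actually stuck to the table. A minimal sketch; show tblproperties with a key is standard HiveQL, and the two dynamic-partition settings are stock Hive options that a fully dynamic insert like the one above may require depending on cluster defaults (they are not from the original run):

-- Should print 65536 after the alter table above
show tblproperties xx.xxx_table_name('orc.compress.size');
-- Enable fully dynamic partition inserts if the cluster default is strict mode
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;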