Component versions:
Flink 1.13.2
CDH 6.3.2
Hive 2.1.1
Problem description:
Flink reads log data in real time and writes it to HDFS as ORC files.
Flink output file format:
Hive table DDL:
Querying the Hive table fails with:
org.apache.hive.service.cli.HiveSQLException:
java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
The YARN logs show:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:74)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:385)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:62)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:89)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat.getRecordReader(VectorizedOrcInputFormat.java:186)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createVectorizedReader(OrcInputFormat.java:1672)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1683)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
... 16 more
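The index 7 in the exception is the writer-version id that Flink's newer ORC writer stamps into the file footer; Hive 2.1.1's bundled ORC reader only knows ids 0 through 6, so the array lookup in WriterVersion.from overruns. The following is a minimal, self-contained sketch of that failure mode (the enum here is a hypothetical mirror, not the actual Hive source):

```java
// Sketch of the failure: an older reader's WriterVersion enum only
// covers ids 0..6, so looking up id 7 (written by a newer ORC writer)
// indexes past the end of the values array.
public class WriterVersionLookup {
    // Hypothetical stand-in for the enum in the old reader.
    enum WriterVersion { V0, V1, V2, V3, V4, V5, V6 }

    static final WriterVersion[] values = WriterVersion.values();

    // Unpatched lookup, shaped like the one in Hive 2.1.1's bundled ORC reader.
    static WriterVersion from(int val) {
        return values[val]; // throws ArrayIndexOutOfBoundsException for val >= 7
    }

    public static void main(String[] args) {
        System.out.println(from(6)); // known version: fine
        try {
            from(7); // id written by the newer Flink/ORC writer
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```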
Searching online pointed to the cause: the Hive source needs to be patched.
Check out the 2.1.1 branch of the Hive source: git checkout rel/release-2.1.1
Then modify org.apache.orc.OrcFile$WriterVersion.from:
public static WriterVersion from(int val) {
  if (val >= values.length) {
    // Unknown (newer) writer versions fall back to FUTURE instead of
    // indexing past the end of the values array.
    return FUTURE;
  }
  return values[val];
}
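The effect of the bounds check can be exercised with a small runnable sketch (again using a hypothetical enum standing in for the Hive one, with a FUTURE catch-all constant):

```java
// Sketch of the patched lookup: ids beyond what this reader understands
// map to a FUTURE constant instead of throwing.
public class PatchedWriterVersionLookup {
    // Hypothetical mirror of the patched enum.
    enum WriterVersion { V0, V1, V2, V3, V4, V5, V6, FUTURE }

    // Only the known versions participate in the index lookup; FUTURE is
    // the fallback for anything newer than this reader.
    static final WriterVersion[] values = {
        WriterVersion.V0, WriterVersion.V1, WriterVersion.V2,
        WriterVersion.V3, WriterVersion.V4, WriterVersion.V5,
        WriterVersion.V6
    };

    static WriterVersion from(int val) {
        if (val >= values.length) {
            return WriterVersion.FUTURE; // unknown newer writer version
        }
        return values[val];
    }

    public static void main(String[] args) {
        System.out.println(from(3)); // known version
        System.out.println(from(7)); // falls back to FUTURE, no exception
    }
}
```

With this change, files stamped by newer writers are read with FUTURE semantics rather than crashing the query.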
Then rebuild hive-exec.jar and hive-orc.jar and replace the cluster's copies.
Note:
CDH does not publish the source for its Hive build. Rebuilding the Apache Hive source and wholesale replacing CDH's Hive jars breaks connections to Hive, and HiveServer2 starts raising alerts within a few minutes of startup.
Solution:
Build the Apache Hive source, download hive-exec.jar and hive-orc.jar from CDH, and compare the two.
Actual CDH jar path: /opt/cloudera/parcels/CDH/jars
(The files under /opt/cloudera/parcels/CDH/lib/hive/lib are symlinks into /opt/cloudera/parcels/CDH/jars.)
Comparing org/apache/orc/OrcFile$WriterVersion.class showed no difference between the two builds, so it is enough to replace only that one class.
Commands to patch the jars (run them from the directory containing the recompiled class, since jar adds entries by their path relative to the working directory):
jar uvf hive-exec-2.1.1-cdh6.3.2.jar org/apache/orc/OrcFile\$WriterVersion.class
jar uvf hive-orc-2.1.1-cdh6.3.2.jar org/apache/orc/OrcFile\$WriterVersion.class
Upload the patched jars back to /opt/cloudera/parcels/CDH/jars and restart Hive to verify.
Note: the patched jars must be uploaded to every node in the CDH cluster.