Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ParquetHiveRecord cannot be cast to org.apache.hadoop.io.BytesWritable
    at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:91)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:717)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
    ... 7 more
Table a is stored as TEXTFILE.
Table b is stored as Parquet.
The two tables have exactly the same columns and column types.
insert overwrite table a select * from b;
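Data read phase: OrcInputFormat produces rows of type OrcStruct, which are passed as input to the deserialize method of LazySimpleSerDe. The type cast inside deserialize is clearly where the exception is thrown. Here is the source of LazySimpleSerDe's doDeserialize method: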
@Override
public Object doDeserialize(Writable field) throws SerDeException {
  if (byteArrayRef == null) {
    byteArrayRef = new ByteArrayRef();
  }
  // OrcStruct -> BinaryComparable
  BinaryComparable b = (BinaryComparable) field;
  byteArrayRef.setData(b.getBytes());
  cachedLazyStruct.init(byteArrayRef, 0, b.getLength());
  lastOperationSerialize = false;
  lastOperationDeserialize = true;
  return cachedLazyStruct;
}
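The failure mode can be reproduced without a Hadoop cluster. The following is a minimal sketch, not Hive code: `Writable`, `BinaryRow` and `StructRow` are hypothetical stand-ins for `org.apache.hadoop.io.Writable`, a byte-backed row such as `Text` or `BytesWritable`, and a struct-style row such as `OrcStruct` or `ParquetHiveRecord`. It only illustrates why an unchecked cast in a text-oriented code path fails as soon as a columnar row object reaches it.

```java
import java.nio.charset.StandardCharsets;

public class SerDeCastDemo {
    /** Hypothetical stand-in for org.apache.hadoop.io.Writable (no Hadoop dependency). */
    interface Writable {}

    /** Hypothetical stand-in for a byte-backed row, like Text or BytesWritable. */
    static class BinaryRow implements Writable {
        private final byte[] bytes;
        BinaryRow(String line) { this.bytes = line.getBytes(StandardCharsets.UTF_8); }
        byte[] getBytes() { return bytes; }
        int getLength() { return bytes.length; }
    }

    /** Hypothetical stand-in for a struct-style row, like OrcStruct or ParquetHiveRecord. */
    static class StructRow implements Writable {
        final Object[] fields;
        StructRow(Object... fields) { this.fields = fields; }
    }

    /** Mirrors the unchecked cast in doDeserialize: assumes every row is byte-backed. */
    static String deserializeAsText(Writable field) {
        BinaryRow b = (BinaryRow) field; // throws ClassCastException for a StructRow
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    /** Returns true when the text-oriented code path rejects the given row. */
    static boolean failsWithClassCast(Writable field) {
        try {
            deserializeAsText(field);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A byte-backed row passes through the text path unchanged.
        System.out.println(deserializeAsText(new BinaryRow("1,alice")));
        // A struct-style row fails at the cast, as in the stack trace.
        System.out.println(failsWithClassCast(new StructRow(1, "alice")));
    }
}
```

The same reasoning applies in either direction: the SerDe and the input/output format declared for a table have to agree on the row object type, otherwise the first cast in the mismatched component throws.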
Modify the Parquet format definition of table b:
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';