In a recent project I hit a problem where Spark could not read bigint columns from a Hive table over JDBC: whenever the table contains a bigint field, a read like the following throws the exception shown below:
sparkSession.read().jdbc(url,"(select id form t1) t1",ops)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:36) ~[hive-serde-2.3.0.jar!/:2.3.0]
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getLong(PrimitiveObjectInspectorUtils.java:779) ~[hive-serde-2.3.0.jar!/:2.3.0]
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$LongConverter.convert(PrimitiveObjectInspectorConverter.java:183) ~[hive-serde-2.3.0.jar!/:2.3.0]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:421) ~[hive-serde-2.3.0.jar!/:2.3.0]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:594) ~[na:na]
... 29 common frames omitted
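For context, here is a minimal, self-contained sketch of the failing read. The HiveServer2 URL, driver class, and credentials are assumptions for illustration; only the table t1, the column id, and the jdbc() call come from the case above. Note the cast in the trace: Hive pushes a LongWritable (bigint) value through a WritableIntObjectInspector (int), which suggests a type-mapping mismatch somewhere in the fetch path.

import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveJdbcBigintRepro {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
                .appName("hive-jdbc-bigint-repro")
                .getOrCreate();

        // Assumed HiveServer2 endpoint; host, port, and database are placeholders.
        String url = "jdbc:hive2://hive-host:10000/default";

        Properties ops = new Properties();
        ops.setProperty("driver", "org.apache.hive.jdbc.HiveDriver");
        ops.setProperty("user", "hive");   // placeholder credentials
        ops.setProperty("password", "");

        // Fails with the ClassCastException above when t1 has a bigint column.
        Dataset<Row> df = sparkSession.read().jdbc(url, "(select id from t1) t1", ops);
        df.show();
    }
}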
After some inspection, analysis, and experimentation, I found that wrapping the SQL with a (row_number() over()) window function makes the read succeed:
sparkSession.read().jdbc(url,"(select (row_number() over()) rn , t.* from (
select id form t1) t ) t1",ops)
Given my limited knowledge here, I have not been able to pin down the root cause. I suspect it is related to Kerberos being installed in the production environment, since reading the same Hive database in the development environment does not trigger this error.