Preface
Today I used Spark to import a CSV file into Hive, and when I then ran `select * from test.tablett limit 10`, it immediately threw an error:
cannot find field flow from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6cfac647, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@bf7d971, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6e3b632c, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@564523d5, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6cb71fa4, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@59be7f12, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6c29550a, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@130d98d1, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@4793ff2b, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6d7f5e24, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@4 ... The gist of the error is that a field cannot be found, which left me baffled, so I took a look at the column names in my CSV.
It turned out that the column names contained special characters such as `.` and `_`. The fix is simple: just remove those characters. Spark offers several ways to rename a column, but my CSV has more than 50 columns, so batch-renaming them in a loop is quicker:
// Strip '.' and '_' from every column name and lower-case it
// so that Hive can resolve the fields.
String[] columns = df.columns();
for (int i = 0; i < columns.length; i++) {
    df = df.withColumnRenamed(
            columns[i],
            columns[i].replace(".", "").replace("_", "").toLowerCase()
    );
}
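The renaming rule itself is plain string manipulation, so it can be pulled out into a small helper that is easy to sanity-check without a Spark session. This is just a sketch; `ColumnNameSanitizer` and `sanitize` are my own hypothetical names, not Spark APIs, and the example column name is made up:

```java
public class ColumnNameSanitizer {

    // Remove '.' and '_' and lower-case the result, mirroring the
    // replace(...).replace(...).toLowerCase() chain used in the loop above.
    public static String sanitize(String column) {
        return column.replace(".", "").replace("_", "").toLowerCase();
    }

    public static void main(String[] args) {
        // e.g. a header like "Flow.Total_Bytes" becomes "flowtotalbytes"
        System.out.println(sanitize("Flow.Total_Bytes"));
    }
}
```

Passing each element of `df.columns()` through such a helper inside the `withColumnRenamed` loop gives the same result as the inline version above.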