Hive reads an ORC table and columns come back null: a solution

Latest recommended article published on 2023-09-02 09:41:11

功夫猫熊yeah

Categories: Big Data, hive

Article link: https://blog.csdn.net/weixin_39031707/article/details/103671743

in case of orc data reader schema passed by hive are all small cases and if
the column name stored in the file has any uppercase, it will return null
values for those columns even if the data is present in the file.
Column name matching while schema evolution should be case unaware.
we need to pass config for same from hive. the
config(orc.schema.evolution.case.sensitive) in orc will be exposed by
https://issues.apache.org/jira/browse/ORC-264

The issue description above is excerpted from https://www.mail-archive.com/issues@hive.apache.org/msg99436.html
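The behavior described in the excerpt can be illustrated with a toy model. The sketch below is not the ORC reader's actual code. It only mimics name-based column lookup, and the function name `read_row`, the column names, and the values are all made up for illustration. `None` stands in for SQL NULL.

```python
# Toy model of name-based column matching; NOT the actual ORC reader code.

def read_row(file_row, reader_columns, case_sensitive=True):
    """Look up each reader column by name in the row stored in the file."""
    if case_sensitive:
        lookup = dict(file_row)
    else:
        # Case-unaware matching: compare lowercased names on both sides.
        lookup = {name.lower(): value for name, value in file_row.items()}
    result = {}
    for col in reader_columns:
        key = col if case_sensitive else col.lower()
        result[col] = lookup.get(key)  # None plays the role of NULL
    return result


# Hypothetical file contents: the writer kept a mixed-case column name.
file_row = {"userId": 42, "city": "Hangzhou"}
# Hive passes the reader an all-lowercase schema.
hive_columns = ["userid", "city"]

strict = read_row(file_row, hive_columns, case_sensitive=True)
relaxed = read_row(file_row, hive_columns, case_sensitive=False)
print(strict)   # {'userid': None, 'city': 'Hangzhou'}
print(relaxed)  # {'userid': 42, 'city': 'Hangzhou'}
```

With strict matching the data for `userId` is present in the file but never found, which is the null-column symptom.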

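In short, the English excerpt above says that from Hive 2.x onward, if the files of an ORC table were written with a schema in which some field names contain uppercase letters, Hive reads those fields as null. The suggested remedy is therefore to set a configuration parameter.
As shown:
[image]
To inspect the schema of an ORC file: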

hive --orcfiledump oss://day=20191220/hour=03/part-00000-aaef271c-dabf-49fb-9899-2960bad0a341-c000.snappy.orc

As shown:
[image]
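Spotting mixed-case names by eye in a long dump is error-prone, so a small helper can do the check. The sketch below assumes the dump output contains a type string of the form `struct<name:type,...>` that you have copied into a Python string. The helper names and the sample type string are hypothetical, not taken from the file above.

```python
def top_level_fields(struct_type):
    """Split 'struct<a:int,b:struct<c:int>>' into (name, type) pairs."""
    prefix = "struct<"
    if not (struct_type.startswith(prefix) and struct_type.endswith(">")):
        raise ValueError("expected a struct<...> type string")
    body = struct_type[len(prefix):-1]
    fields, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch in "<(":
            depth += 1          # entering a nested type or decimal(p,s)
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            fields.append(body[start:i])
            start = i + 1
    if body[start:]:
        fields.append(body[start:])
    return [tuple(field.split(":", 1)) for field in fields]


def mixed_case_columns(struct_type):
    """Return top-level column names that contain uppercase letters."""
    return [name for name, _ in top_level_fields(struct_type)
            if name != name.lower()]


# Hypothetical type string, for illustration only.
sample = "struct<requestId:string,price:decimal(10,2),ext:struct<Key:string>>"
print(mixed_case_columns(sample))  # ['requestId']
```

Only top-level names are checked here. Names nested inside a struct, such as `Key` in the sample, are not reported.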

orc.schema.evolution.case.sensitive

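However, I could not find a single example online, in Chinese or English, of how to actually set it,
so I took a guess and tried the following: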

hive --hiveconf orc.schema.evolution.case.sensitive

That did not work,
so I then tried:

hive --hiveconf orc.schema.evolution.case.sensitive=true

That still did not work.

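I then found that when the ORC file was written,
the DataFrame's schema contained mixed-case column names, which is why the columns could not be matched against the metadata. Once the cause was known, the fix was easy:
converting the schema's column names to lowercase with toLowerCase made the problem go away.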
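A minimal sketch of that renaming step, written for PySpark rather than the toLowerCase call mentioned above. It assumes `df` is a Spark DataFrame and relies only on `df.columns` and `df.toDF(*names)`. The helper name `lowercase_columns` and the duplicate check are my own additions, not part of the original fix.

```python
def lowercase_columns(df):
    """Return df with every top-level column name lowercased.

    Raises ValueError if two columns would collide after lowercasing,
    e.g. 'Id' and 'id'.
    """
    lowered = [name.lower() for name in df.columns]
    duplicates = sorted({n for n in lowered if lowered.count(n) > 1})
    if duplicates:
        raise ValueError(
            f"lowercasing would create duplicate columns: {duplicates}"
        )
    # toDF renames columns positionally and returns a new DataFrame.
    return df.toDF(*lowered)


# Usage with a real Spark session (the output path is hypothetical):
#   lowercase_columns(df).write.orc("/path/to/output")
```

This renames top-level columns only. Field names nested inside struct columns keep their original case and would need separate handling.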
Running the following against the rewritten ORC file

hive --orcfiledump oss://gateway/t_dsp_bid_detail_tbl/day=20191220/hour=04/part-00000-aaef271c-dabf-49fb-9899-2960bad0a341-c000.snappy.orc

shows this result:
[image]

Queries no longer return null columns.
As shown:
[image]
