After renaming a column of a Parquet-format table created with Hive, querying the renamed column returns NULL for every row. The rename was done with:
ALTER TABLE edw.dim_own_info_snp CHANGE userid user_id bigint COMMENT '用户id'
Query result:
0: jdbc:hive2://hadoopcbd008098.ppdgdsl.com:2> select user_id from edw.dim_own_info_snp where dt='2020-12-28' limit 1;
INFO : Compiling command(queryId=hive_20201229102317_642a58c9-b99b-4002-a6ac-730e141629bc): select user_id from edw.dim_own_info_snp where dt='2020-12-28' limit 1
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:user_id, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20201229102317_642a58c9-b99b-4002-a6ac-730e141629bc); Time taken: 1.168 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20201229102317_642a58c9-b99b-4002-a6ac-730e141629bc): select user_id from edw.dim_own_info_snp where dt='2020-12-28' limit 1
INFO : Completed executing command(queryId=hive_20201229102317_642a58c9-b99b-4002-a6ac-730e141629bc); Time taken: 0.001 seconds
INFO : OK
+----------+
| user_id  |
+----------+
| NULL     |
+----------+
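Cause: by default, Hive's Parquet reader matches table columns to the columns stored in the Parquet files by name. Files written before the rename still store the column as userid, so looking up user_id finds no match and the value comes back as NULL.
Solution: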
1. Set the parquet.column.index.access=true property in the current Hive session (temporary; it applies only to this session)
set parquet.column.index.access=true;
2. Set the property on the table itself (it persists across sessions)
ALTER TABLE edw.dim_own_info_snp SET TBLPROPERTIES ('parquet.column.index.access'='true');
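To check that the fix took effect, something like the following could be run afterwards. This is only a sketch: it assumes the property was set at table level on edw.dim_own_info_snp (the table queried above) and reuses the same partition value.

```sql
-- Assumption: the table-level property was set on edw.dim_own_info_snp.
-- Show the value recorded for the property in the table metadata.
SHOW TBLPROPERTIES edw.dim_own_info_snp("parquet.column.index.access");

-- Re-run the original query; with index-based access the renamed
-- column is resolved by position rather than by name.
SELECT user_id FROM edw.dim_own_info_snp WHERE dt = '2020-12-28' LIMIT 1;

-- List the table's columns in order, to compare against the order
-- in which the existing Parquet files were written.
DESCRIBE edw.dim_own_info_snp;
```

With index-based access, columns are matched by position, so the column order in the table schema needs to stay the same as the order in the existing Parquet files. Reordering columns, or adding one in the middle, can then cause values to be read from the wrong column.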
Reference: