Hive分区表新增字段+重刷历史方法(避免旧分区新增字段为NULL)

1-1.建立测试表

hive> CREATE  EXTERNAL TABLE table_for_test_add_column(
    >    original_column string COMMENT '原始数据'
    >  )
    >  COMMENT 'add_column的测试表'
    >  PARTITIONED BY (
    >  `dt` string COMMENT '日'
    >  )
    >  row format delimited fields terminated by '\t'
    >  STORED AS textfile;
OK
Time taken: 0.734 seconds


1-2.插入数据

hive> insert into table hive_test.table_for_test_add_column partition(dt='20180114')  select '测试数据_0114';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = xxx_20180114135008_cd63d301-89d5-461d-b0c0-be292b308e24
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
....
Loading data to table hive_test.table_for_test_add_column partition (dt=20180114)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   HDFS Read: 4304 HDFS Write: 121 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 31.641 seconds


1-3.查询现有数据

hive> select * from hive_test.table_for_test_add_column;
OK
测试数据_0114   20180114
Time taken: 0.125 seconds, Fetched: 1 row(s)


1-4.添加列(alter table add column)

ALTER TABLE table_name 
  [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)

否则新增的列在旧分区中不可见,查询数据时为NULL,重新刷数据时仍为NULL 
相似问题见 hive分区表增加字段新增字段值为空的bug

hive> alter table hive_test.table_for_test_add_column add columns (added_column string COMMENT '新添加的列') CASCADE;
OK
Time taken: 0.236 seconds

查看建表语句,已新增一列

hive> show create table  hive_test.table_for_test_add_column ;
OK
CREATE EXTERNAL TABLE `hive_test.table_for_test_add_column`(
  `original_column` string COMMENT '原始数据',
  `added_column` string COMMENT '新添加的列')
COMMENT 'add_column的测试表'
PARTITIONED BY (
  `dt` string COMMENT '日')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/hive_test.db/table_for_test_add_column'
TBLPROPERTIES (
  'last_modified_by'='xxx',
  'last_modified_time'='1515909925',
  'transient_lastDdlTime'='1515909925')
Time taken: 0.125 seconds, Fetched: 21 row(s)
hive> select * from hive_test.table_for_test_add_column;
OK
测试数据_0114   NULL    20180114
Time taken: 0.121 seconds, Fetched: 1 row(s)


1-5.新增数据

hive>  insert into table hive_test.table_for_test_add_column partition(dt='20180115')  select '测试数据_0115', '新增列数据_0115';
hive> select * from hive_test.table_for_test_add_column where dt >= 20180114;
OK
测试数据_0114   NULL    20180114
测试数据_0115   新增列数据_0115  20180115
Time taken: 0.453 seconds, Fetched: 2 row(s)


1-6.重刷旧分区数据

hive>  insert overwrite table hive_test.table_for_test_add_column partition(dt='20180114')  select original_column, '新增列数据_0114' from hive_test.table_for_ted_column where dt = '20180114';
OK
测试数据_0114   新增列数据_0114  20180114
测试数据_0115   新增列数据_0115  20180115
Time taken: 0.073 seconds, Fetched: 2 row(s)


1-7.Hive 修改字段
参考hive wiki

  • 0
    点赞
  • 10
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值