1-1.建立测试表
hive> CREATE EXTERNAL TABLE table_for_test_add_column(
> original_column string COMMENT '原始数据'
> )
> COMMENT 'add_column的测试表'
> PARTITIONED BY (
> `dt` string COMMENT '日'
> )
> row format delimited fields terminated by '\t'
> STORED AS textfile;
OK
Time taken: 0.734 seconds
1-2.插入数据
hive> insert into table hive_test.table_for_test_add_column partition(dt='20180114') select '测试数据_0114';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = xxx_20180114135008_cd63d301-89d5-461d-b0c0-be292b308e24
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
....
Loading data to table hive_test.table_for_test_add_column partition (dt=20180114)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 4304 HDFS Write: 121 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 31.641 seconds
1-3.查询现有数据
hive> select * from hive_test.table_for_test_add_column;
OK
测试数据_0114 20180114
Time taken: 0.125 seconds, Fetched: 1 row(s)
1-4.添加列(alter table add column)
ALTER TABLE table_name
[PARTITION partition_spec] -- (Note: Hive 0.14.0 and later)
ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
[CASCADE|RESTRICT] -- (Note: Hive 1.1.0 and later)
否则新增的列在旧分区中不可见,查询数据时为NULL,重新刷数据时仍为NULL
相似问题见 hive分区表增加字段新增字段值为空的bug
hive> alter table hive_test.table_for_test_add_column add columns (added_column string COMMENT '新添加的列') CASCADE;
OK
Time taken: 0.236 seconds
查看建表语句,已新增一列
hive> show create table hive_test.table_for_test_add_column ;
OK
CREATE EXTERNAL TABLE `hive_test.table_for_test_add_column`(
`original_column` string COMMENT '原始数据',
`added_column` string COMMENT '新添加的列')
COMMENT 'add_column的测试表'
PARTITIONED BY (
`dt` string COMMENT '日')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://localhost:9000/user/hive/warehouse/hive_test.db/table_for_test_add_column'
TBLPROPERTIES (
'last_modified_by'='xxx',
'last_modified_time'='1515909925',
'transient_lastDdlTime'='1515909925')
Time taken: 0.125 seconds, Fetched: 21 row(s)
hive> select * from hive_test.table_for_test_add_column;
OK
测试数据_0114 NULL 20180114
Time taken: 0.121 seconds, Fetched: 1 row(s)
1-5.新增数据
hive> insert into table hive_test.table_for_test_add_column partition(dt='20180115') select '测试数据_0115', '新增列数据_0115';
hive> select * from hive_test.table_for_test_add_column where dt >= 20180114;
OK
测试数据_0114 NULL 20180114
测试数据_0115 新增列数据_0115 20180115
Time taken: 0.453 seconds, Fetched: 2 row(s)
1-6.重刷旧分区数据
hive> insert overwrite table hive_test.table_for_test_add_column partition(dt='20180114') select original_column, '新增列数据_0114' from hive_test.table_for_ted_column where dt = '20180114';
OK
测试数据_0114 新增列数据_0114 20180114
测试数据_0115 新增列数据_0115 20180115
Time taken: 0.073 seconds, Fetched: 2 row(s)