一、新建表
1、基本建表语句
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name //EXTERNAL:外部表
[(col_name data_type [COMMENT col_comment], …)]
[COMMENT table_comment] //表描述
[PARTITIONED BY (col_name data_type [COMMENT col_comment], …)] //分区
[STORED AS file_format]
[LOCATION hdfs_path] //表在HDFS的存储路径
其中:
ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES
(property_name=property_value, property_name=property_value, ...)]
用户在建表的时候可以自定义 SerDe 或者使用自带的 SerDe。如果没有指定 ROW FORMAT 或者 ROW FORMAT DELIMITED,将会使用自带的 SerDe。在建表的时候,
用户还需要为表指定列,用户在指定表的列的同时也会指定自定义的 SerDe,Hive 通过 SerDe 确定表的具体的列的数据。
STORED AS
SEQUENCEFILE //序列化文件
| TEXTFILE //普通的文本文件格式
| RCFILE //行列存储相结合的文件
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname //自定义文件格式
如果文件数据是纯文本,可以使用 STORED AS TEXTFILE。如果数据需要压缩,使用 STORED AS SEQUENCE 。
2、内部表
create table student
(id int, name string, sex string, age int,department string)
row format delimited fields terminated by ",";
3、外部表
create external table student_ext
(id int, name string, sex string, age int,department string)
row format delimited fields terminated by ","
location "/hive/student";
4、分区表
create external table student_pt
(id int, name string, sex string, age int,department string)
partitioned by (city string)
row format delimited fields terminated by ","
location "/hive/student_ptn";
alter table student_ptn add partition(city="beijing");
如果是分区表。那么数据文件一定要存储在某个分区中,而不能直接存储在表中。
5、分桶表
create external table student_bck
(id int, name string, sex string, age int,department string)
clustered by (id) sorted by (id asc, name desc) into 4 buckets
row format delimited fields terminated by ","
location "/hive/student_bck";
6、根据其他表(带数据)
create table student_ctas
as
select * from student where id < 95012;
7、复制表结构
create table student_copy like student;
如果在table的前面没有加external关键字,那么复制出来的新表。无论如何都是内部表
如果在table的前面有加external关键字,那么复制出来的新表。无论如何都是外部表
二、修改表
1、修改表名
alter table student rename to new_student;
2、增加一个字段
alter table new_student add columns (score int);
3、修改一个字段的定义
alter table new_student change name new_name string;
4、替换所有字段
alter table new_student replace columns (id int, name string, address string);
5、 添加分区
静态分区
alter table student_ptn add partition(city="chongqing");
alter table student_ptn
add partition(city="chongqing2") partition(city="chongqing3") partition(city="chongqing4");
动态分区
load data local inpath
"/home/hadoop/student.txt" into table student_ptn partition(city="beijing");
create table student_ptn_age
(id int,name string,sex string,department string) partitioned by (age int);
insert overwrite table student_ptn_age partition(age);
select id,name,sex,department,age from student_ptn;
5、 修改分区
修改分区,一般来说,都是指修改分区的数据存储目录
在添加分区的时候,直接指定当前分区的数据存储目录
alter table student_ptn add if not exists partition(city='beijing')
location '/student_ptn_beijing' partition(city='cc') location '/student_cc';
修改已经指定好的分区的数据存储目录
alter table student_ptn
partition (city='beijing') set location '/student_ptn_beijing';
此时原先的分区文件夹仍存在,但是在往分区添加数据时,只会添加到新的分区目录
6、 删除分区
alter table student_ptn drop partition (city='beijing');
7、删除表
drop table new_student;
8、清空表
truncate table student_ptn;