数据格式:
1,小明1,reading-basketball-football,shanghai:pudong-hubei:wuhan
2,小明2,reading-basketball-football,shanghai:pudong-hubei:wuhan
3,小明3,reading-basketball-football,shanghai:pudong-hubei:wuhan
4,小明4,reading-basketball-football,shanghai:pudong-hubei:wuhan
5,小明5,reading-basketball-football,shanghai:pudong-hubei:wuhan
6,小明6,reading-basketball-football,shanghai:pudong-hubei:wuhan
7,小明7,reading-basketball-football,shanghai:pudong-hubei:wuhan
8,小明8,reading-basketball-football,shanghai:pudong-hubei:wuhan
9,小明9,reading-basketball-football,shanghai:pudong-hubei:wuhan
1.创建单分区
语法:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[ROW FORMAT row_format]
实例:
create table person3
(
id int,
name string,
hobby array<string>,
address map<string,string>
)
partitioned by (age int)
ROW FORMAT delimited
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':';
desc formatted person3;
数据导入:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
实例:
load data local inpath '/date/date' into table person3 partition (age=10);
查询person3:
在MySQL中元数据存储的分区内容
在hdfs中对应的目录
2.创建双分区(多分区)
语法:
create table person4
(
id int,
name string,
hobby array<string>,
address map<string,string>
)
partitioned by (age int,sex string)
ROW FORMAT delimited
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':';
数据导入:
load data local inpath '/date/date' into table person4 partition (age=20,sex='man');
在MySQL中元数据存储的分区内容
在hdfs创建的目录
注意partition里面的age和sex属性顺序可以颠倒,因为匹配数据时按照字段名进行匹配的,无关字段的位置
3.修改分区的值
分区属于元数据的一部分,一旦创建成功,将不会进行修改,只能修改分区字段对应的值
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
[IGNORE PROTECTION] [PURGE]; -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)
实例:
alter table person4 add partition (sex='girl',age=10);
4.删除分区
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
[IGNORE PROTECTION] [PURGE]; -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)
实例:
alter table person4 drop partition (sex='man');
注意:在添加分区值时,partition后面必须跟上定义分区的所有字段的值,不然报错。在删除时,只需在partition后面跟上所要删除字段的值
5.在外部表创建分区
在hdfs的/test目录下新建age=30/sex=man目录
create EXTERNAL table person6
(
id int,
name string,
hobby array<string>,
address map<string,string>
)
partitioned by(age int,sex string)
ROW FORMAT delimited
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':'
LOCATION '/test/';
利用此方法创建的外部表将不会读取到文件内的数据,因为没有创建分区成功,无法读取数据。
对person6 元数据修复前:
必须对元数据进行修复
hive 元数据修复命令
msck repair table xxx;
对person6 元数据修复后:
查询person6: