相关知识
为了完成本关任务,你需要掌握: 1.如何创建分区表; 2.如何添加分区; 3.如何重命名表分区; 4.如何交换表分区; 5.如何表分区信息持久化; 6.如何删除表分区。
创建分区表
使用shopping
数据库创建一张商品信息分区表items_info2
,按商品品牌p_brand
和商品分类p_category
进行分区:
CREATE TABLE IF NOT EXISTS shopping.items_info2(
name STRING COMMENT 'item name',
price FLOAT COMMENT 'item price',
category STRING COMMENT 'item category',
brand STRING COMMENT 'item brand',
type STRING COMMENT 'item type',
stock INT COMMENT 'item stock',
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'item sales address')
COMMENT 'goods information table'
PARTITIONED BY (p_category STRING,p_brand STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
TBLPROPERTIES ('creator'='Xiaoming','date'='2019-01-01');
添加分区
向表items_info2
添加两个分区:
ALTER TABLE items_info2 ADD PARTITION (p_category='clothes',p_brand='playboy') LOCATION '/hive/shopping/items_info2/playboy/clothes'
PARTITION (p_category='shoes',p_brand='playboy') LOCATION '/hive/shopping/items_info2/playboy/shoes';
注意:PARTITIONED BY
中的列p_category
和p_brand
为伪列,不能与表中的实体列名相同,否则hive表创建操作报错(p_category
和p_brand
分别对应表中的实体列category
、brand
)。
重命名表分区
重命名表分区的语法为:
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
- 重命名
items_info2
表的表分区playboy/clothes
为playboy/T-shirt
:
ALTER TABLE items_info2 PARTITION (p_category='clothes',p_brand='playboy') RENAME TO PARTITION (p_category='T-shirt',p_brand='playboy');
交换表分区
交换表分区的语法为:
ALTER TABLE table_name_1 EXCHANGE PARTITION (partition_spec) WITH TABLE table_name_2;
该操作移动表table_name_1
中特定分区下的数据到具有相同表模式且不存储在相应分区的table_name_2
中。
表分区信息持久化
表分区信息持久化的语法为:
MSCK REPAIR TABLE table_name;
该操作作用于同步表table_name
在HDFS
上的分区信息到Hive
位于RDBMS
的metastore
中。
删除表分区
删除表分区的语法为:
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec,PARTITION partition_spec,…;
该操作会删除与特定分区相关的数据以及metadata
。
- 删除表
items_info2
的playboy/shoes
表分区:
ALTER TABLE items_info2 DROP IF EXISTS PARTITION (p_category='shoes',p_brand='playboy');
编程要求
请根据右侧命令行内的提示,在Begin - End
区域内进行sql
语句代码补充,具体任务如下: student
表结构:
INFO | TYPE | COMMENT |
---|---|---|
Sno | INT | student sno |
name | STRING | student name |
age | INT | student age |
sex | STRING | student sex |
score | STRUCT <Chinese:FLOAT,Math:FLOAT,English:FLOAT> | student score |
- 创建数据库
test4
- 在数据库
tets4
中,创建分区表student
,表结构如上所示,和第二、三关相同,设置分区列为:stu_year
类型STRING
、subject
类型STRING
- 添加两个分区:
stu_year='2018',subject='Chinese'
和stu_year='2018',subject='Math'
- 重命名表分区:将
2018/Math
分区重命名为2018/English
- 删除表分区:将
2018/Chinese
分区删除
#********* Begin *********#
echo "
drop database if exists test4 cascade;
CREATE DATABASE IF NOT EXISTS test4
LOCATION '/hive/test4'
WITH DBPROPERTIES('creator'='John','date'='2022-04-20');
CREATE TABLE IF NOT EXISTS test4.student(
Sno INT COMMENT 'student sno',
name STRING COMMENT 'student name',
age INT COMMENT 'student age',
sex STRING COMMENT 'student sex',
score STRUCT <Chinese:FLOAT,Math:FLOAT,English:FLOAT> COMMENT 'student score')
PARTITIONED BY (stu_year STRING,subject STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',' ;
ALTER TABLE student ADD PARTITION (stu_year='2018',subject='Chinese') LOCATION '/hive/test4/student/Chinese/2018'
PARTITION (stu_year='2018',subject='Math') LOCATION '/hive/test4/student/Math/2018';
ALTER TABLE student PARTITION (stu_year='2018',subject='Math') RENAME TO PARTITION (stu_year='2018',subject='English');
ALTER TABLE student DROP IF EXISTS PARTITION (stu_year='2018',subject='Chinese');
"
#********* End *********#