1. 建表语句
Hive官网有详细的建表语句。
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
2. 创建内部表
2.1 创建dept表
create table myhive.dept
(id int,
dept_name string)
row format delimited fields terminated by ' ';
查看内部表属性,可以发现,内部表为:MANAGED_TABLE
hive (myhive)> describe formatted dept;
col_name data_type comment
# col_name data_type comment
id int
dept_name string
# Detailed Table Information
Database: myhive
Owner: hadoop
CreateTime: Fri Jun 12 05:57:53 CST 2020
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://node01:8020/user/hive/warehouse/myhive.db/dept
Table Type: **MANAGED_TABLE**
Table Parameters:
transient_lastDdlTime 1591912673
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
2.2 导入数据
[hadoop@node03 ~]$ cat data/hiveData/dept.txt
1 sales
2 product
3 financial
##导入数据
hive (myhive)> load data local inpath '/home/hadoop/data/hiveData/dept.txt' into table dept;
##查询数据
hive (myhive)> select * from dept;
dept.id dept.dept_name
1 sales
2 product
3 financial
3. 创建外部表
3.1 创建employee表
create external table myhive.employee
(id int,
name string)
row format delimited fields terminated by ' ';
查看内部表属性,可以发现,外部表为:EXTERNAL_TABLE
hive (myhive)> describe formatted employee;
col_name data_type comment
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: myhive
Owner: hadoop
CreateTime: Fri Jun 12 06:05:00 CST 2020
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://node01:8020/user/hive/warehouse/myhive.db/employee
Table Type: **EXTERNAL_TABLE**
Table Parameters:
EXTERNAL TRUE
transient_lastDdlTime 1591913100
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
3.2 导入数据
[hadoop@node03 ~]$ cat data/hiveData/employee.txt
1 lisa
2 tom
3 john
##导入数据
hive (myhive)> load data local inpath '/home/hadoop/data/hiveData/employee.txt' into table employee;
##查询数据
hive (myhive)> select * from employee;
employee.id employee.name
1 lisa
2 tom
3 john
4. 内部表和外部表相互转换
4.1 内部表转换为外部表
alter table dept set tblproperties('EXTERNAL'='TRUE');
4.2 外部表转换为内部表
alter table employee set tblproperties ('EXTERNAL'='FALSE');
5. 内部表和外部表区别
- 建表语法有区别,外部表在建表时有“EXTERNAL” 关键字;
- 删表后数据文件状态有区别,删除内部表后,相关数据也会被删除,删除外部表只是删除了表结构,数据文件还是存在。
6. 内部表和外部表的选择时机
6.1 何时选择使用内部表
- 数据和表结构有相同的生命周期,也就是删表时也想把数据删除,此时可以选择使用内部表;
- 数据只有本系统一张表使用,如数仓的dw层。
6.2 何时选择使用外部表
- 有多个系统或集群同时访问一份数据;
- 希望一份数据被多个表加载;
- 删除表结构无需担心数据被删除。
总结
内部表和外部表的建表语句区别不大,实际生产环境中,一般推荐使用外部表多一些。