【Hive】Hive内部表/外部表

1. 建表语句

Hive官网有详细的建表语句。

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
2. 创建内部表
2.1 创建dept表
create table myhive.dept 
(id int,
 dept_name string)
row format delimited fields terminated by ' ';

查看内部表属性,可以发现,内部表为:MANAGED_TABLE

hive (myhive)> describe formatted dept;
col_name        data_type       comment
# col_name              data_type               comment             
                 
id                      int                                         
dept_name               string                                      
                 
# Detailed Table Information             
Database:               myhive                   
Owner:                  hadoop                   
CreateTime:             Fri Jun 12 05:57:53 CST 2020     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://node01:8020/user/hive/warehouse/myhive.db/dept    
Table Type:             **MANAGED_TABLE**            
Table Parameters:                
        transient_lastDdlTime   1591912673          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        field.delim                                 
        serialization.format 
2.2 导入数据
[hadoop@node03 ~]$ cat data/hiveData/dept.txt 
1 sales
2 product
3 financial

##导入数据
hive (myhive)> load data local inpath '/home/hadoop/data/hiveData/dept.txt' into table dept;

##查询数据
hive (myhive)> select * from dept;
dept.id dept.dept_name
1       sales
2       product
3       financial
3. 创建外部表
3.1 创建employee表
create external table myhive.employee
(id int,
 name string)
row format delimited fields terminated by ' ';

查看内部表属性,可以发现,外部表为:EXTERNAL_TABLE

hive (myhive)> describe formatted employee;
col_name        data_type       comment
# col_name              data_type               comment             
                 
id                      int                                         
name                    string                                      
                 
# Detailed Table Information             
Database:               myhive                   
Owner:                  hadoop                   
CreateTime:             Fri Jun 12 06:05:00 CST 2020     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://node01:8020/user/hive/warehouse/myhive.db/employee        
Table Type:             **EXTERNAL_TABLE**           
Table Parameters:                
        EXTERNAL                TRUE                
        transient_lastDdlTime   1591913100          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        field.delim                                 
        serialization.format
3.2 导入数据
[hadoop@node03 ~]$ cat data/hiveData/employee.txt 
1 lisa
2 tom
3 john

##导入数据
hive (myhive)> load data local inpath '/home/hadoop/data/hiveData/employee.txt' into table employee;

##查询数据
hive (myhive)> select * from employee;
employee.id     employee.name
1       lisa
2       tom
3       john
4. 内部表和外部表相互转换
4.1 内部表转换为外部表
alter table dept set tblproperties('EXTERNAL'='TRUE');
4.2 外部表转换为内部表
alter table employee set tblproperties ('EXTERNAL'='FALSE');
5. 内部表和外部表区别
  1. 建表语法有区别,外部表在建表时有“EXTERNAL” 关键字;
  2. 删表后数据文件状态有区别,删除内部表后,相关数据也会被删除,删除外部表只是删除了表结构,数据文件还是存在。
6. 内部表和外部表的选择时机
6.1 何时选择使用内部表
  1. 数据和表结构有相同的生命周期,也就是删表时也想把数据删除,此时可以选择使用内部表;
  2. 数据只有本系统一张表使用,如数仓的dw层。
6.2 何时选择使用外部表
  1. 有多个系统或集群同时访问一份数据;
  2. 希望一份数据被多个表加载;
  3. 删除表结构无需担心数据被删除。
总结

内部表和外部表的建表语句区别不大,实际生产环境中,一般推荐使用外部表多一些。

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值