Hive内部表和外部表的区别

在了解内部表和外部表区别前,我们需要先了解一下hive的架构
这里写图片描述

大家可以简单看一下这个架构图,我介绍其中要点:

  • hive的数据分为两种,一种为普通数据,一种为元数据。
  • 元数据存储着表的基本信息,增删改查记录,类似于Hadoop架构中的namespace。普通数据就是表中的详细数据。
  • hive的元数据默认存储在derby中,但大多数情况下存储在MySQL中。普通数据如架构图所示存储在hdfs中。

下面我们来介绍表的两种类型:内部表和外部表

  • 内部表(MANAGED):hive在hdfs中存在默认的存储路径,即default数据库。之后创建的数据库及表,如果没有指定路径应都在/user/hive/warehouse下,所以在该路径下的表为内部表。
  • 外部表(EXTERNAL):指定了/user/hive/warehouse以外路径所创建的表

而内部表和外部表的主要区别就是
- 内部表:当删除内部表时,MySQL的元数据与HDFS上的普通数据都会删除
- 外部表:当删除外部表时,MySQL的元数据会被删除,HDFS上的数据不会被删除

下面我实际操作一下
- 创建内部表:

 hive> create table emp_mag like emp;
OK
Time taken: 1.703 seconds
hive> LOAD DATA LOCAL INPATH "/home/hadoop/data/emp.txt" OVERWRITE INTO TABLE emp_mag;
Table default.emp_mag stats: [numFiles=1, totalSize=820]
OK
Time taken: 3.912 seconds
hive> desc formatted emp_mag; 
OK
# col_name              data_type               comment             

empno                   int                                         
ename                   string                                      
job                     string                                      
mgr                     int                                         
hiredate                string                                      
sal                     double                                      
comm                    double                                      
deptno                  int                                         

# Detailed Table Information         
Database:               default                  
Owner:                  hadoop                   
CreateTime:             Mon Jan 08 09:19:56 EST 2018     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop:8020/user/hive/warehouse/emp_mag   
Table Type:             MANAGED_TABLE    //type类型为内部表        
Table Parameters:        
    COLUMN_STATS_ACCURATE   true                
    numFiles                1                   
    totalSize               820                 
    transient_lastDdlTime   1515421333          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    field.delim             \t                  
    serialization.format    \t                  
Time taken: 0.889 seconds, Fetched: 37 row(s)
  • 创建外部表
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /ext/emp_ext
18/01/08 09:27:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 hadoop supergroup        820 2018-01-08 07:23 /ext/emp_ext/emp.sql
hive> create external table emp_ext(
    > empno int,
    > ename string,
    > job string,
    > mgr int,
    > hiredate string,
    > sal double,
    > comm double,
    > deptno int
    > )row format delimited fields terminated by "\t" location '/ext/emp_ext';
OK
Time taken: 0.4 seconds
hive> desc formatted emp_ext;
OK
# col_name              data_type               comment             

empno                   int                                         
ename                   string                                      
job                     string                                      
mgr                     int                                         
hiredate                string                                      
sal                     double                                      
comm                    double                                      
deptno                  int                                         

# Detailed Table Information         
Database:               default                  
Owner:                  hadoop                   
CreateTime:             Mon Jan 08 09:28:57 EST 2018     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop:8020/ext/emp_ext   
Table Type:             EXTERNAL_TABLE      //type类型为外部表
Table Parameters:        
    COLUMN_STATS_ACCURATE   false               
    EXTERNAL                TRUE                
    numFiles                0                   
    numRows                 -1                  
    rawDataSize             -1                  
    totalSize               0                   
    transient_lastDdlTime   1515421737          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    field.delim             \t                  
    serialization.format    \t                  
Time taken: 0.28 seconds, Fetched: 40 row(s)
  • 删除内部表,查看结果
hive> drop table emp_mag
    > ;
OK
Time taken: 1.096 seconds
hive> desc formatted emp_mag;
FAILED: SemanticException [Error 10001]: Table not found emp_mag
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /user/hive/warehouse
18/01/08 09:38:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 8 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-08 06:31 /user/hive/warehouse/emp
drwxr-xr-x   - hadoop supergroup          0 2018-01-08 06:39 /user/hive/warehouse/emp2
drwxr-xr-x   - hadoop supergroup          0 2018-01-08 06:44 /user/hive/warehouse/emp3
drwxr-xr-x   - hadoop supergroup          0 2018-01-06 20:41 /user/hive/warehouse/hello
drwxr-xr-x   - hadoop supergroup          0 2018-01-06 10:40 /user/hive/warehouse/hivedb3.db
drwxr-xr-x   - hadoop supergroup          0 2018-01-03 02:44 /user/hive/warehouse/wc
drwxr-xr-x   - hadoop supergroup          0 2018-01-06 22:10 /user/hive/warehouse/wlpart
drwxr-xr-x   - hadoop supergroup          0 2018-01-06 22:52 /user/hive/warehouse/wlpart2
  • 删除外部表
hive> drop table emp_ext
    > ;
OK
Time taken: 0.332 seconds
hive> desc formatted emp_ext;
FAILED: SemanticException [Error 10001]: Table not found emp_ext
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /ext/emp_ext;
18/01/08 09:52:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 hadoop supergroup        820 2018-01-08 07:23 /ext/emp_ext/emp.sql

若泽大数据交流群:671914634

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值