在了解内部表和外部表区别前,我们需要先了解一下hive的架构
大家可以简单看一下这个架构图,我介绍其中要点:
- hive的数据分为两种,一种为普通数据,一种为元数据。
- 元数据存储着表的基本信息,增删改查记录,类似于Hadoop架构中的namespace。普通数据就是表中的详细数据。
- hive的元数据默认存储在derby中,但大多数情况下存储在MySQL中。普通数据如架构图所示存储在hdfs中。
下面我们来介绍表的两种类型:内部表和外部表
- 内部表(MANAGED):hive在hdfs中存在默认的存储路径,即default数据库。之后创建的数据库及表,如果没有指定路径应都在/user/hive/warehouse下,所以在该路径下的表为内部表。
- 外部表(EXTERNAL):指定了/user/hive/warehouse以外路径所创建的表
而内部表和外部表的主要区别就是
- 内部表:当删除内部表时,MySQL的元数据与HDFS上的普通数据都会删除
- 外部表:当删除外部表时,MySQL的元数据会被删除,HDFS上的数据不会被删除
下面我实际操作一下
- 创建内部表:
hive> create table emp_mag like emp;
OK
Time taken: 1.703 seconds
hive> LOAD DATA LOCAL INPATH "/home/hadoop/data/emp.txt" OVERWRITE INTO TABLE emp_mag;
Table default.emp_mag stats: [numFiles=1, totalSize=820]
OK
Time taken: 3.912 seconds
hive> desc formatted emp_mag;
OK
# col_name data_type comment
empno int
ename string
job string
mgr int
hiredate string
sal double
comm double
deptno int
# Detailed Table Information
Database: default
Owner: hadoop
CreateTime: Mon Jan 08 09:19:56 EST 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop:8020/user/hive/warehouse/emp_mag
Table Type: MANAGED_TABLE //type类型为内部表
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
totalSize 820
transient_lastDdlTime 1515421333
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.889 seconds, Fetched: 37 row(s)
- 创建外部表
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /ext/emp_ext
18/01/08 09:27:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 hadoop supergroup 820 2018-01-08 07:23 /ext/emp_ext/emp.sql
hive> create external table emp_ext(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double,
> deptno int
> )row format delimited fields terminated by "\t" location '/ext/emp_ext';
OK
Time taken: 0.4 seconds
hive> desc formatted emp_ext;
OK
# col_name data_type comment
empno int
ename string
job string
mgr int
hiredate string
sal double
comm double
deptno int
# Detailed Table Information
Database: default
Owner: hadoop
CreateTime: Mon Jan 08 09:28:57 EST 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop:8020/ext/emp_ext
Table Type: EXTERNAL_TABLE //type类型为外部表
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
transient_lastDdlTime 1515421737
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.28 seconds, Fetched: 40 row(s)
- 删除内部表,查看结果
hive> drop table emp_mag
> ;
OK
Time taken: 1.096 seconds
hive> desc formatted emp_mag;
FAILED: SemanticException [Error 10001]: Table not found emp_mag
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /user/hive/warehouse
18/01/08 09:38:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 8 items
drwxr-xr-x - hadoop supergroup 0 2018-01-08 06:31 /user/hive/warehouse/emp
drwxr-xr-x - hadoop supergroup 0 2018-01-08 06:39 /user/hive/warehouse/emp2
drwxr-xr-x - hadoop supergroup 0 2018-01-08 06:44 /user/hive/warehouse/emp3
drwxr-xr-x - hadoop supergroup 0 2018-01-06 20:41 /user/hive/warehouse/hello
drwxr-xr-x - hadoop supergroup 0 2018-01-06 10:40 /user/hive/warehouse/hivedb3.db
drwxr-xr-x - hadoop supergroup 0 2018-01-03 02:44 /user/hive/warehouse/wc
drwxr-xr-x - hadoop supergroup 0 2018-01-06 22:10 /user/hive/warehouse/wlpart
drwxr-xr-x - hadoop supergroup 0 2018-01-06 22:52 /user/hive/warehouse/wlpart2
- 删除外部表
hive> drop table emp_ext
> ;
OK
Time taken: 0.332 seconds
hive> desc formatted emp_ext;
FAILED: SemanticException [Error 10001]: Table not found emp_ext
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /ext/emp_ext;
18/01/08 09:52:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 hadoop supergroup 820 2018-01-08 07:23 /ext/emp_ext/emp.sql
若泽大数据交流群:671914634