Managed and External Tables(内部表、外部表)
By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. For details on the differences between managed and external table see Managed vs. External Tables.
Storage Formats
Hive supports built-in and custom-developed file formats. See CompressedStorage for details on compressed table storage.
The following are some of the formats built-in to Hive:
测试两者区别:
一、创建一张内部表(managed table):
create table emp_managed(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
数据加载进emp_managed表:
1、load data local inpath ‘/home/hadoop/data/emp.txt’ overwrite into table emp_managed;
2、select * from emp_managed;
再去MySQL中查询:
我们使用hive-site.xml中配置的那个数据库,show tables;
select * from tbls \G; //查看元数据信息
查看内部表对应hdfs的目录下的文件:
hive下详细查看该表信息
hive (ruozeg6)> desc formatted emp_managed;
OK
# Detailed Table Information
Database: ruozeg6
Owner: hadoop
CreateTime: Mon Jul 01 16:46:42 HKT 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop004:9000/user/hive/warehouse/ruozeg6.db/emp_managed
Table Type: MANAGED_TABLE
其中location就是它在hdfs对应的文件目录:
hdfs dfs -ls /user/hive/warehouse/ruozeg6.db/ //查询该目录下文件
2、hdfs目录能够得到该表信息:
[hadoop@hadoop004 data]$ hdfs dfs -ls /user/hive/warehouse/ruozeg6.db
19/07/01 16:54:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2019-07-01 14:09 /user/hive/warehouse/ruozeg6.db/emp
drwxr-xr-x - hadoop supergroup 0 2019-07-01 16:47 /user/hive/warehouse/ruozeg6.db/emp_managed
3、在hive下执行删除表操作:
hive:show tbales;
mysql:select * from tbls \G;
hdfs:hdfs dfs -ls /user/hive/warehouse
这三个文件目录下都没有信息数据、元数据信息。
删除表
使用 drop table emp_managed;
查看元数据信息 selet * from tbls \G;
已经没有emp_managed这个表信息;
so drop命令删除hive数据库和元数据库信息。
插曲:出了个错:drop(删除表时报错)
问题已解决,原因是hive的lib目录下对应的mysql驱动包不正确:https://blog.csdn.net/zhikanjiani/article/details/94544721
二、创建外部表(external table)
1、在hive下创建外部表
create EXTERNAL table emp_external(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
location '/hive/external' //指定hdfs目录
2、hive外部表创建完成后查看有无数据:
hive (ruozeg6)> select * from emp_external;
OK
Time taken: 1.309 seconds
3、mysql中去查询元数据信息
mysql> select * from tbls \G;
*************************** 1. row ***************************
TBL_ID: 1
CREATE_TIME: 1561961175
DB_ID: 6
LAST_ACCESS_TIME: 0
OWNER: hadoop
RETENTION: 0
SD_ID: 1
TBL_NAME: emp
TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
*************************** 2. row ***************************
TBL_ID: 21
CREATE_TIME: 1561970003
DB_ID: 6
LAST_ACCESS_TIME: 0
OWNER: hadoop
RETENTION: 0
SD_ID: 21
TBL_NAME: emp_external
TBL_TYPE: EXTERNAL_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
2 rows in set (0.00 sec)
3、在创建hive表的指定目录上上载emp.txt文件:
hdfs dfs -put /home/hadoop/data/emp.txt /hive/external
4、上载完文件后,再去hive中查看员工表:
hive (ruozeg6)> select * from emp_external;
OK
7369 SMITH CLERK 7902 1980-02-15 200.0 NULL 20
7566 WARD SALESMAN 7695 1981-05-25 1600.0 20.0 NULL
7698 BLADE ANALYST 7658 1992-05-25 1300.0 NULL 20
7515 JOHN ANALYST 7658 1993-05-15 4500.0 NULL 30
7621 SAIL CLERK 7902 1982-06-15 2600.0 NULL 30
Time taken: 0.294 seconds, Fetched: 5 row(s)
ERROR:
No query specified
5、此时执行drop table emp_external这个操作后:
hive下执行:show tables; 已无该表信息;
mysql元数据库下执行:select * from tbls \G; //也无emp_external表信息
但是:hdfs dfs -ls /hive/external 下还是有该份数据信息
三、总结:
外部表创建完成后,在hive下执行该语句,查询为空。
select * from emp_external;
在hdfs目录下查询/d6_hive/external 下有无数据,put上传上去
hdfs dfs -ls /d6_hive/external
hdfs dfs -put emp.txt /d6_hive/external
hdfs上传完数据后在hive中查询当前表内容,发现是有内容的,
执行这个操作:drop table emp_external;
完成后去mysql下查看,发现也没有这个表格信息。
删除内部表:数据+元数据都会删除
删除外部表:数据不删,元数据会删除