HIVE--DDL
一.DDL概念介绍
DDL:数据库模式定义语言DDL(Data Definition Language),是用于描述数据库中要存储的现实世界实体的语言。
我个人理解DDL就是对表的Metadata(元数据)操作的语言
HIVE官方的是DDL地址:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
二.DDL--Database
Create Database:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
IF NOT EXISTS:是否存在就不需要建,如果不存在就建数据库
COMMENT database_comment:数据库的备注信息
LOCATION hdfs_path:可以指定数据存储地址,一般数据库默认存放在hdfs:/user/hive/warehouse
WITH DBPROPERTIES (property_name=property_value, ...):可以加一些默认的参数
eg:
hive (default)> CREATE DATABASE IF NOT EXISTS xkwarehouse
> COMMENT 'xkwarehouse'
> LOCATION '/user/hive/xkwarehouse';
OK
Time taken: 0.022 seconds
hive (default)> show databases;
OK
database_name
default
xkdata
xkhive
xkwarehouse
Time taken: 0.025 seconds, Fetched: 4 row(s)
hive (default)> desc database xkwarehouse;
OK
db_name comment location owner_name owner_type parameters
xkwarehouse xkwarehouse hdfs://xkhadoop:9000/user/hive/xkwarehouse hadoop USER
Time taken: 0.029 seconds, Fetched: 1 row(s)
因为刚创建的数据库没有数据,所以需要把先创建一个table,然后查看下HDFS:/user/hive/xkwarehouse这个路径下是否有数据产生
hive (xkwarehouse)> create table stu (id int ,age int);
OK
Time taken: 0.096 seconds
hive (xkwarehouse)> show tables;
OK
tab_name
stu
Time taken: 0.029 seconds, Fetched: 1 row(s)
Drop Database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
CASCADE:级联删除,如果删除一个库的时候,数据库里面有表,则会报错叫你先删除表再删除库,加上这个参数,则不管是否有表存在,表+库一起删除(慎用)
eg:
hive (xkwarehouse)> show databases;
OK
database_name
default
xkdata
xkhive
xkwarehouse
Time taken: 0.014 seconds, Fetched: 4 row(s)
hive (xkwarehouse)> DROP DATABASE IF EXISTS xkdata;
OK
Time taken: 0.064 seconds
hive (xkwarehouse)> show databases;
OK
database_name
default
xkhive
xkwarehouse
Time taken: 0.018 seconds, Fetched: 3 row(s)
Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);
eg:
hive (xkwarehouse)> ALTER DATABASE xkwarehouse SET DBPROPERTIES ('creator'='xk');
OK
Time taken: 0.053 seconds
hive (xkwarehouse)> desc database xkwarehouse;
OK
db_name comment location owner_name owner_type parameters
xkwarehouse xkwarehouse hdfs://xkhadoop:9000/user/hive/xkwarehouse hadoop USER
Time taken: 0.017 seconds, Fetched: 1 row(s)
三.DDL--Create Table
查看表详细数据
hive (default)> desc formatted stu;
OK
col_name data_type comment
# col_name data_type comment
id int
age int
# Detailed Table Information
Database: default
OwnerType: USER
Owner: hadoop
CreateTime: Mon Dec 16 22:20:46 CST 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://xkhadoop:9000/user/hive/warehouse/stu
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
numRows 1
rawDataSize 4
totalSize 5
transient_lastDdlTime 1576506895
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.109 seconds, Fetched: 33 row(s)
1.外部表与内部表
外部表:
[hadoop@xkhadoop data]$ cat order_created.txt
10703007267488 2014-05-01 06:01:12.334+01
10101043505096 2014-05-01 07:28:12.342+01
10103043509747 2014-05-01 07:50:12.33+01
10103043501575 2014-05-01 09:27:12.33+01
10104043514061 2014-05-01 09:03:12.324+01
-------HIVE-------------
hive (xkhive)> CREATE EXTERNAL TABLE IF NOT EXISTS order_created (
> id string,
> create_time string
> )ROW FORMAT delimited FIELDS TERMINATED BY '/t'
> LOCATION '/test';
OK
Time taken: 0.14 seconds
---------load data ------------
hive (xkhive)> LOAD DATA LOCAL INPATH '/home/hadoop/data/order_created.txt' INTO TABLE order_created;
Loading data to table xkhive.order_created
Table xkhive.order_created stats: [numFiles=0, totalSize=0]
OK
Time taken: 0.583 seconds
hive (xkhive)> select * from order_created;
OK
order_created.id order_created.create_time
10703007267488 2014-05-01 06:01:12.334+01 NULL
10101043505096 2014-05-01 07:28:12.342+01 NULL
10103043509747 2014-05-01 07:50:12.33+01 NULL
10103043501575 2014-05-01 09:27:12.33+01 NULL
10104043514061 2014-05-01 09:03:12.324+01 NULL
Time taken: 0.253 seconds, Fetched: 5 row(s)
--------MySQL查询metadata-------
mysql> select * from tbls \G;
*************************** 4. row ***************************
TBL_ID: 11
CREATE_TIME: 1577457570
DB_ID: 2
LAST_ACCESS_TIME: 0
OWNER: hadoop
OWNER_TYPE: USER
RETENTION: 0
SD_ID: 11
TBL_NAME: order_created
TBL_TYPE: EXTERNAL_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
4 rows in set (0.00 sec)
--------drop table -------------
hive (xkhive)> drop table order_created;
OK
Time taken: 0.552 seconds
hive (xkhive)> show tables;
OK
tab_name
Time taken: 0.021 seconds
------------查询MySQL metadata----
mysql> select * from tbls \G;
-- 没有刚才那个表数据
-----------查询之前table创建指定HDFS地址------
[hadoop@xkhadoop data]$ hadoop fs -ls /test/
19/12/27 22:49:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 213 2019-12-27 22:41 /test/order_created.txt
内部表:
hive (xkhive)> CREATE TABLE IF NOT EXISTS order_created_managed (
> id string,
> create_time string
> )ROW FORMAT delimited FIELDS TERMINATED BY '/t'
> LOCATION '/test';
OK
Time taken: 0.076 seconds
-------load data------------
hive (xkhive)> LOAD DATA LOCAL INPATH '/home/hadoop/data/order_created.txt' INTO TABLE order_created_managed;
Loading data to table xkhive.order_created_managed
Table xkhive.order_created_managed stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
OK
Time taken: 0.23 seconds
hive (xkhive)> select * from order_created_managed;
OK
order_created_managed.id order_created_managed.create_time
10703007267488 2014-05-01 06:01:12.334+01 NULL
10101043505096 2014-05-01 07:28:12.342+01 NULL
10103043509747 2014-05-01 07:50:12.33+01 NULL
10103043501575 2014-05-01 09:27:12.33+01 NULL
10104043514061 2014-05-01 09:03:12.324+01 NULL
10703007267488 2014-05-01 06:01:12.334+01 NULL
10101043505096 2014-05-01 07:28:12.342+01 NULL
10103043509747 2014-05-01 07:50:12.33+01 NULL
10103043501575 2014-05-01 09:27:12.33+01 NULL
10104043514061 2014-05-01 09:03:12.324+01 NULL
Time taken: 0.059 seconds, Fetched: 10 row(s)
----------查询HDFS--------
[hadoop@xkhadoop data]$ hadoop fs -ls /test/
19/12/27 22:56:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rwxr-xr-x 1 hadoop supergroup 213 2019-12-27 22:41 /test/order_created.txt
-rwxr-xr-x 1 hadoop supergroup 213 2019-12-27 22:55 /test/order_created_copy_1.txt
--------发现多了一份数据---------
----------查询MySQL metadata-----------
mysql> select * from tbls \G;
*************************** 4. row ***************************
TBL_ID: 12
CREATE_TIME: 1577458497
DB_ID: 2
LAST_ACCESS_TIME: 0
OWNER: hadoop
OWNER_TYPE: USER
RETENTION: 0
SD_ID: 12
TBL_NAME: order_created_managed
TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
4 rows in set (0.00 sec)
----------drop table--------
hive (xkhive)> drop table order_created_managed;
OK
Time taken: 0.125 seconds
hive (xkhive)> show tables;
OK
tab_name
Time taken: 0.029 seconds
--------查询MySQL metadata-----
---发现没有order_created_managed表元数据信息----
---查询HDFS---------
[hadoop@xkhadoop data]$ hadoop fs -ls /test/
19/12/27 23:06:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `/test/': No such file or directory
内部表与外部表区别:
内部表drop table的时候会删除元数据信息和HDFS里面数据文件,外部表drop table 的时候只删除元数据信息,生产推荐使用外部表