Hive - Hive DDL (Database + Table + Partition)

1. Hive DDL: Databases

Official docs: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DDLOperations

1.1 Creating a Database
  • Syntax for creating a database (square brackets denote optional clauses):
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
  • Hands-on
hive (default)> CREATE DATABASE IF NOT EXISTS ruozedata_hive;  # create the database
OK
Time taken: 2.689 seconds
hive (default)> exit;
[ruoze@rzdata001 ~]$
[ruoze@rzdata001 ~]$ hadoop fs -ls /user/hive/warehouse
19/12/15 13:38:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x   - ruoze supergroup          0 2019-12-15 13:38 /user/hive/warehouse/ruozedata_hive.db
drwxr-xr-x   - ruoze supergroup          0 2019-12-15 12:36 /user/hive/warehouse/stu
[ruoze@rzdata001 ~]$ 

The storage directory for a Hive database you create follows the rule {hive.metastore.warehouse.dir}/dbname.db; a different directory can be specified at creation time:

[ruoze@rzdata001 ~]$ hadoop fs -ls /user/ruoze
19/12/15 13:44:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - ruoze supergroup          0 2019-12-01 18:52 /user/ruoze/data
[ruoze@rzdata001 ~]$ hive
which: no hbase in (/home/ruoze/app/hive/bin:/home/ruoze/app/hadoop/bin:/home/ruoze/app/hadoop/sbin:/home/hadoop/app/hive-1.1.0-cdh5.16.2/bin:/home/ruoze/app/hadoop/bin:/home/ruoze/app/hadoop/sbin:/home/ruoze/app/hadoop/bin:/home/ruoze/app/hadoop/sbin:/usr/java/jdk1.8.0_121/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ruoze/.local/bin:/home/ruoze/bin)
19/12/15 13:46:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Logging initialized using configuration in file:/home/ruoze/app/hive-1.1.0-cdh5.16.2/conf/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive (default)> CREATE DATABASE IF NOT EXISTS ruoze_hivedb                # create database ruoze_hivedb
              > LOCATION '/user/ruoze/hivedb';                            # specify its location
OK
Time taken: 2.963 seconds
hive (default)> desc database ruoze_hivedb;                               # show info for database ruoze_hivedb
OK
db_name comment location        owner_name      owner_type      parameters
ruoze_hivedb            hdfs://rzdata001:9000/user/ruoze/hivedb ruoze   USER
Time taken: 0.131 seconds, Fetched: 1 row(s)
hive (default)> CREATE DATABASE IF NOT EXISTS ruoze_hivedb2               # clauses out of order, so this fails
              > LOCATION '/user/ruoze/hivedb2'
              > COMMENT 'this is ruozedata database'
              > WITH DBPROPERTIES('creator'='cong', 'date'='2100-12-01'); 
FAILED: ParseException line 1:75 missing EOF at 'COMMENT' near ''/user/ruoze/hivedb2''
hive (default)> CREATE DATABASE IF NOT EXISTS ruoze_hivedb2               # create database ruoze_hivedb2
              > COMMENT 'this is ruozedata database'                      # add a comment
              > LOCATION '/user/ruoze/hivedb2'                            # specify its location
              > WITH DBPROPERTIES('creator'='cong', 'date'='2100-12-01'); # set custom properties
OK
Time taken: 0.046 seconds
hive (default)> desc database ruoze_hivedb2;                              # show info for database ruoze_hivedb2
OK
db_name comment location        owner_name      owner_type      parameters
ruoze_hivedb2   this is ruozedata database      hdfs://rzdata001:9000/user/ruoze/hivedb2        ruoze   USER
Time taken: 0.024 seconds, Fetched: 1 row(s)
hive (default)> exit;
[ruoze@rzdata001 ~]$ hadoop fs -ls /user/ruoze                            # check the directories on HDFS
19/12/15 13:52:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - ruoze supergroup          0 2019-12-01 18:52 /user/ruoze/data
drwxr-xr-x   - ruoze supergroup          0 2019-12-15 13:46 /user/ruoze/hivedb
drwxr-xr-x   - ruoze supergroup          0 2019-12-15 13:50 /user/ruoze/hivedb2
[ruoze@rzdata001 ~]$ 
1.2 Altering a Database

For reference only:

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
 
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)
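A minimal sketch of the first two statements, run against the ruoze_hivedb2 database created in 1.1 (the property values are only examples; SET LOCATION is not available on the Hive 1.1.0-cdh5.16.2 used above):

ALTER DATABASE ruoze_hivedb2 SET DBPROPERTIES ('creator'='cong', 'edited-by'='cong');  -- add or overwrite properties
ALTER DATABASE ruoze_hivedb2 SET OWNER USER ruoze;                                     -- change the owner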
1.3 Dropping a Database

Syntax

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

DROP DATABASE IF EXISTS ruozedata_hive2; fails if the database still contains tables. Adding CASCADE (use with caution) cascades the drop, deleting the tables inside the database and then the database itself.
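A minimal sketch of both behaviours, using the hypothetical ruozedata_hive2 database from the note above:

DROP DATABASE IF EXISTS ruozedata_hive2;          -- fails if the database still contains tables
DROP DATABASE IF EXISTS ruozedata_hive2 CASCADE;  -- also drops every table inside, then the database itself (use with caution)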

1.4 Switching Databases
USE database_name;
USE DEFAULT;

2. Table DDL

Official docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

2.1 Creating Tables
  • Syntax:
###########################################################
# Create-table form 1:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

# Create-table form 2:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
  
###########################################################
# Data types (data_type)
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)

# Primitive data types
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)

###########################################################
# Row format
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

###########################################################
# File formats
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

###########################################################

column_constraint_specification:
  : [ PRIMARY KEY|UNIQUE|NOT NULL|DEFAULT [default_value]|CHECK  [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]

default_value:
  : [ LITERAL|CURRENT_USER()|CURRENT_DATE()|CURRENT_TIMESTAMP()|NULL ]  

###########################################################

constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
  • A full create-table example

Example:

CREATE TABLE page_view(viewTime INT,
                       userid BIGINT,
                       page_url STRING,
                       referrer_url STRING,
                       ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\001'
    COLLECTION ITEMS TERMINATED BY '\002'
    MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE
LOCATION '/user/hive/warehouse/page_view';
  • Create Table As Select (CTAS)
    Restrictions:
    • The target table cannot be an external table
    • The target table cannot be a list-bucketing table

Example

CREATE TABLE new_key_value_store
   ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
   STORED AS RCFile
   AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;
  • Create Table Like
CREATE TABLE empty_key_value_store
LIKE key_value_store [TBLPROPERTIES (property_name=property_value, ...)];
  • Managed and External Tables
    Hive has two common table types:

Type           | Keyword        | Behaviour on DROP TABLE                             | Notes
Managed table  | MANAGED TABLE  | Both the HDFS data and the metadata are deleted     | Created by default
External table | EXTERNAL TABLE | The HDFS data is kept; only the metadata is deleted | Created with the EXTERNAL keyword

Managed and external tables can be converted into each other by setting TBLPROPERTIES: 'EXTERNAL'='TRUE' (or 'FALSE'), as shown in the sketch below.
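A minimal sketch of the conversion, reusing the page_view table from the example in 2.1 (the property value must be the string 'TRUE' or 'FALSE'):

ALTER TABLE page_view SET TBLPROPERTIES ('EXTERNAL'='TRUE');   -- managed  -> external
ALTER TABLE page_view SET TBLPROPERTIES ('EXTERNAL'='FALSE');  -- external -> managed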

Creating an external table:

CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User',
     country STRING COMMENT 'country of origination')
 COMMENT 'This is the staging page view table'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
 STORED AS TEXTFILE
 LOCATION '/tmp/page_view';
2.2 Drop Table

Syntax

DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)
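A short sketch, reusing the page_view table from 2.1:

DROP TABLE IF EXISTS page_view;        -- for a managed table, the data is moved to the HDFS trash if trash is enabled
DROP TABLE IF EXISTS page_view PURGE;  -- skip the trash: the data is removed immediately and cannot be recovered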
2.3 Truncate Table

Keeps the table definition but removes all of its data.
Syntax:

TRUNCATE TABLE table_name [PARTITION partition_spec];

partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
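A short sketch, again assuming the partitioned page_view table from 2.1 (TRUNCATE only works on managed tables):

TRUNCATE TABLE page_view;                                            -- clear the whole table, keep its definition
TRUNCATE TABLE page_view PARTITION (dt='2008-08-08', country='us'); -- clear a single partition only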
2.4 Alter Table
  • Rename Table (a combined sketch of these ALTER TABLE statements follows this list):
ALTER TABLE table_name RENAME TO new_table_name;
  • Alter Table properties
    ALTER TABLE table_name SET TBLPROPERTIES table_properties;
    
    table_properties:
      : (property_name = property_value, property_name = property_value, ... )
    
  • Alter Table Comment
    ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
    
  • Alter Table Storage Properties
    ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
      INTO num_buckets BUCKETS;
    
  • Alter Table Skewed or Stored as Directories
  • Alter Table Skewed
  • Alter Table Not Skewed
  • Alter Table Not Stored as Directories
  • Alter Table Set Skewed Location
  • Alter Table Constraints
  • Additional Alter Table Statements
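A combined sketch of the rename/properties/comment statements listed above (see the note at Rename Table); the table names are hypothetical:

ALTER TABLE page_view_stg RENAME TO page_view_stage;                             -- rename the table
ALTER TABLE page_view_stage SET TBLPROPERTIES ('creator'='cong');                -- set a table property
ALTER TABLE page_view_stage SET TBLPROPERTIES ('comment'='staging page views');  -- change the table comment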

3. Partition DDL

  • Alter Partition
    Partitioned tables: a partition simply corresponds to a folder/directory on HDFS. A partition column is not a "real" table column; it is a subfolder under the table's folder on HDFS. When loading data into a partitioned table, you must always specify all of the partition columns.
    If data is only written directly into HDFS, SQL queries will not see it by default, because the metastore has no record of it; in that case the partition must also be added to the metadata: ALTER TABLE order_partition ADD IF NOT EXISTS PARTITION(event_month='2014-07');
    The MSCK command is not recommended.
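    For reference, the MSCK command mentioned above looks like this; it scans the table's directory on HDFS and registers any partitions missing from the metastore:

    MSCK REPAIR TABLE order_partition;  -- order_partition is the example table from the note above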
3.1 Add Partitions
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];

partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)

Example:

ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808'
                        PARTITION (dt='2008-08-09', country='us') location '/path/to/us/part080809';

ALTER TABLE table_name ADD PARTITION (partCol = 'value1') location 'loc1';
ALTER TABLE table_name ADD PARTITION (partCol = 'value2') location 'loc2';
...
ALTER TABLE table_name ADD PARTITION (partCol = 'valueN') location 'locN';
3.2 Drop Partition
ALTER TABLE table_name DROP PARTITION (partCol = 'value1') ;
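A short sketch with the optional clauses, reusing the partitioned page_view table from 2.1 (PURGE on partitions is only available in newer Hive releases):

ALTER TABLE page_view DROP IF EXISTS PARTITION (dt='2008-08-08', country='us');        -- data goes to the HDFS trash if enabled
ALTER TABLE page_view DROP IF EXISTS PARTITION (dt='2008-08-08', country='us') PURGE;  -- skip the trash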
3.3 Dynamic Partitions
3.4 Rename Partition
3.5 Exchange Partition
3.6 Discover Partitions
3.7 Partition Retention
3.8 Recover Partitions (MSCK REPAIR TABLE)