数据库
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
comment 数据库注释
location 数据库数据存放在hdfs上的位置
dbproperties 数据库附带属性
样例:
create database if not exists db
comment 'test database'
location 'hdfs://node1:8020/tmp/db'
with dbproperties ('key'='key');
在mysql中验证元数据
mysql> select * from DBS;
+-------+-----------------------+---------------------------------------+---------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+---------------------------------------+---------+------------+------------+
| 1 | Default Hive database | hdfs://node1:8020/user/hive/warehouse | default | public | ROLE |
| 7 | t_d | hdfs://node1:8020/tmp/t_d | t_d | root | USER |
| 88 | test database | hdfs://node1:8020/tmp/db | db | hdfs | USER |
+-------+-----------------------+---------------------------------------+---------+------------+------------+
3 rows in set (0.00 sec)
在这个表中可以看到数据库id,名称,以及存放数据在hdfs的目录位置,数据库所有者
mysql> select * from DATABASE_PARAMS;
+-------+-----------+-------------+
| DB_ID | PARAM_KEY | PARAM_VALUE |
+-------+-----------+-------------+
| 7 | key | key |
| 88 | key | key |
+-------+-----------+-------------+
2 rows in set (0.00 sec)
这里可以看到数据库的附带属性
hive修改数据库
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, …);
– (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
– (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path;
– (Note: Hive 2.2.1, 2.4.0 and later)
The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. ALTER SCHEMA was added in Hive 0.14 (HIVE-6601).
The ALTER DATABASE … SET LOCATION statement does not move the contents of the database’s current directory to the newly specified location. It does not change the locations associated with any tables/partitions under the specified database. It only changes the default parent-directory where new tables will be added for this database. This behaviour is analogous to how changing a table-directory does not move existing partitions to a different location.
No other metadata about a database can be changed.
以上是官网i描述
1.当更新location时,不会对现有表数据库和存储目录做任何调整,只是新添加的表的存放目录会保存到新的location目录下
alter database db set location 'hdfs://node1:8020/tmp/db1';
alter database db set dbproperties ('key'='key1');
alter database db set owner user root
其他数据库信息属于元数据信息,没有办法更改
hive 修改数据库名
可以在mysql 数据库中直接修改元数据,hive似乎没有提供修改语法
表
表的创建方式:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| JSONFILE -- (Note: Available in Hive 4.0.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE
表明和列名要求
1.表名和列名不区分大小写, 但是属性和serDe区分带下先
2.表名在如果在数据中以及存在了, 那么创建表示如果不使用 【if not exists】将会抛出异常
3.在hive 0.12版本以前,表名和列名只允许用阿拉伯数字和字符组成
4.在hive0.13版本后,表名和列名允许使用unicode字符,但是【.】,【:】 依旧不允许出现在表明和列名中
内部表(Managed tables)和 外表表(external tables)
1.内部表的元数据将有hive 自行维护,默认情况下,表数据将会保存到【/user/hive/warehouse/databasename.db/tablename/】下,但是如果在创建表的时候添加了location属性, 那么数据将会保存到location指定目录下;但是外部表数据将不由hive维护,我们可以将数据保存到远程的hdfs上,aws等其他平台中
2.表的创建方式不同,默认情况下, 我们创建的表时内部表, 但是如果加上关键字【EXTERNAL】则表示我们创建的表示外部表;但是如果不指定location, 那么默认会将数据保存的hdfs的【/user/hive/warehouse/databasename.db/tablename/】目录下
3.ARCHIVE/UNARCHIVE/TRUNCATE/MERGE/CONCATENATE这些命令只能用在内部表中
4.当删除表时,内部表会清除元数据信息以及表数据,但是外部表则只清除元数据,表数据将不会被清除
5.ACID/Transactional只能对内部表有效
6.查询缓存只对内部表有效
7.只有外部表允许RELY
8.如果表数据不需要保存的当前hdfs,或者已经存在了表数据,或者是表删除时需要保留表数据,那么可以需用外部表,否则一般使用内部表
复合结构:
struct:
hive> create table t_struct(
> id int,
> info struct<name:string, age:int>)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ',';
数据样例:
1 xiaoming,12
2 sanguiyao,34
array:
hive> create table t_array(
> id int,
> phone array<string>)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ',';
数据样例:
1 13511214321,18709082313
2 15643152365,18746753321
map:
hive> create table t_map(
> id int,
> imap map<string, string>)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ','
> map keys terminated by ':';
数据样例:
1 name:xiaoming,phone:197232142121
2 name:zhangdong,phone:19721212112
更改表
改表名
ALTER TABLE table_name RENAME TO new_table_name;
改表属性
ALTER TABLE table_name SET TBLPROPERTIES table_properties;
table_properties:
: (property_name = property_value, property_name = property_value, ... )
更改表注解
ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
更改serde(序列号与反序列号)属性
ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
serde_properties:
: (property_name = property_value, property_name = property_value, ... )
加载数据:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
插入数据:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT ... FROM ...
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
分区插入数据
insert into table_name select username,sum(amount) from data DISTRIBUTE by (name)