Hive notes

Databases

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];

comment: a comment describing the database
location: where the database's data is stored on HDFS
dbproperties: additional key/value properties attached to the database

Example:
create database if not exists db 
comment 'test database'
location 'hdfs://node1:8020/tmp/db'
with dbproperties ('key'='key');

Verify the metadata in MySQL:
mysql> select * from DBS;
+-------+-----------------------+---------------------------------------+---------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                       | NAME    | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+---------------------------------------+---------+------------+------------+
|     1 | Default Hive database | hdfs://node1:8020/user/hive/warehouse | default | public     | ROLE       |
|     7 | t_d                   | hdfs://node1:8020/tmp/t_d             | t_d     | root       | USER       |
|    88 | test database         | hdfs://node1:8020/tmp/db              | db      | hdfs       | USER       |
+-------+-----------------------+---------------------------------------+---------+------------+------------+
3 rows in set (0.00 sec)
In this table you can see each database's ID, name, owner, and the HDFS directory where its data is stored.

mysql> select * from DATABASE_PARAMS;
+-------+-----------+-------------+
| DB_ID | PARAM_KEY | PARAM_VALUE |
+-------+-----------+-------------+
|     7 | key       | key         |
|    88 | key       | key         |
+-------+-----------+-------------+
2 rows in set (0.00 sec)
This shows each database's additional properties.

Altering a database in Hive

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);
-- (Note: SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
-- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path;
-- (Note: Hive 2.2.1, 2.4.0 and later)

The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. ALTER SCHEMA was added in Hive 0.14 (HIVE-6601).

The ALTER DATABASE … SET LOCATION statement does not move the contents of the database’s current directory to the newly specified location. It does not change the locations associated with any tables/partitions under the specified database. It only changes the default parent-directory where new tables will be added for this database. This behaviour is analogous to how changing a table-directory does not move existing partitions to a different location.

No other metadata about a database can be changed.

The above is the description from the official documentation.

1. When updating the location, existing tables and their storage directories are not adjusted in any way; only tables created afterwards will have their data stored under the new location.
alter database db set location 'hdfs://node1:8020/tmp/db1';
alter database db set dbproperties ('key'='key1');
alter database db set owner user root;

Other database information is part of the metadata and cannot be changed through these statements.

Renaming a database in Hive

Hive does not seem to provide syntax for renaming a database, but the metadata can be modified directly in the MySQL metastore.
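
A minimal sketch of such a direct metastore edit, assuming a MySQL metastore with the standard DBS table shown above (back up the metastore first; this is not an officially supported operation, and the HDFS directory itself is not renamed):

mysql> update DBS set NAME='db_new' where NAME='db';

The DB_LOCATION_URI column may still point at a path containing the old name; whether to adjust it depends on where the data should live.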

Table creation syntax:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
 
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
  : ARRAY < data_type >
 
map_type
  : MAP < primitive_type, data_type >
 
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
 
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
 
constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
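
As a concrete illustration of a few of these clauses (the table, columns, and properties are invented for this sketch):

create table if not exists t_orders (
  id int comment 'order id',
  amount double)
comment 'order fact table'
partitioned by (dt string)
clustered by (id) into 4 buckets
stored as orc
tblproperties ('creator'='hive_notes');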

Requirements for table and column names:
1. Table and column names are case-insensitive, but SerDe and property names are case-sensitive.
2. If a table with the same name already exists in the database, creating it without IF NOT EXISTS will raise an error.
3. In Hive 0.12 and earlier, table and column names may only contain alphanumeric and underscore characters.
4. In Hive 0.13 and later, table and column names may contain Unicode characters, but dot (.) and colon (:) are still not allowed (see the small example after this list).
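
A small illustration of rule 4 (the table and column names are invented); in Hive 0.13 and later, quoting a column name with backticks lets it contain Unicode characters:

create table t_names (id int, `用户名` string);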

Managed tables and external tables
1. A managed table's metadata is maintained by Hive. By default its data is stored under /user/hive/warehouse/databasename.db/tablename/, but if a LOCATION is given at creation time the data is stored in that directory instead. External table data is not managed by Hive; it can live on a remote HDFS cluster, AWS, or other storage platforms.
2. The creation syntax differs: by default a table is managed, while adding the EXTERNAL keyword creates an external table. If no LOCATION is specified, the data still defaults to the /user/hive/warehouse/databasename.db/tablename/ directory on HDFS.
3. ARCHIVE/UNARCHIVE/TRUNCATE/MERGE/CONCATENATE only work on managed tables.
4. Dropping a managed table removes both the metadata and the table data; dropping an external table removes only the metadata and leaves the data in place.
5. ACID/transactional features only work on managed tables.
6. Query result caching only works on managed tables.
7. Only external tables allow the RELY constraint.
8. If the data should not be stored on the current HDFS, already exists elsewhere, or must be preserved when the table is dropped, use an external table; otherwise a managed table is usually the right choice (a small sketch follows this list).
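
For instance, a minimal sketch of an external table (the path and columns are illustrative); dropping it later removes only the metadata, while the files under the LOCATION remain:

create external table if not exists t_ext (
  id int,
  name string)
row format delimited
fields terminated by '\t'
location 'hdfs://node1:8020/tmp/t_ext';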

Complex types:

struct:

hive> create table t_struct(
    > id int,
    > info struct<name:string, age:int>)
    > row format delimited 
    > fields terminated by '\t'
    > collection items terminated by ',';

Sample data:

1	xiaoming,12
2	sanguiyao,34

array:

hive> create table t_array(
    > id int,
    > phone array<string>)
    > row format delimited 
    > fields terminated by '\t'
    > collection items terminated by ',';

Sample data:

1	13511214321,18709082313
2	15643152365,18746753321

map:

hive> create table t_map(
    > id int,
    > imap map<string, string>)
    > row format delimited 
    > fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':';

Sample data:

1	name:xiaoming,phone:197232142121
2	name:zhangdong,phone:19721212112
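
Assuming the sample files above have been loaded into the three tables, the complex fields can be accessed roughly like this:

select id, info.name, info.age from t_struct;
select id, phone[0] from t_array;
select id, imap['name'], imap['phone'] from t_map;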

Altering tables

Rename a table
ALTER TABLE table_name RENAME TO new_table_name;

Change table properties
ALTER TABLE table_name SET TBLPROPERTIES table_properties;
table_properties:
  : (property_name = property_value, property_name = property_value, ... )

Change the table comment
ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);

Change SerDe (serialization/deserialization) properties
ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
serde_properties:
  : (property_name = property_value, property_name = property_value, ... )
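
A few illustrative calls against the tables created above (the new name, property values, and SerDe properties are placeholders):

alter table t_struct rename to t_struct_bak;
alter table t_map set tblproperties ('comment' = 'map demo table');
alter table t_array set serdeproperties ('field.delim' = '\t');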

Loading data:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
 
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
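
For example, loading the sample files from above (the paths are illustrative):

load data local inpath '/tmp/t_struct.txt' into table t_struct;
load data inpath 'hdfs://node1:8020/tmp/t_array.txt' overwrite into table t_array;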

Inserting data:

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
 
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
 
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
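
A sketch of a dynamic-partition insert, assuming a table t_logs partitioned by (dt string) and a staging table t_logs_src that has a dt column; non-strict mode has to be enabled first:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table t_logs partition (dt)
select id, msg, dt from t_logs_src;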


Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...
 
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
 
  
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
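
For example, exporting a query result to a local directory (the directory path is illustrative; the directory's contents are overwritten):

insert overwrite local directory '/tmp/export_t_struct'
row format delimited fields terminated by '\t'
select * from t_struct;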


Inserting data distributed by a key (DISTRIBUTE BY)

insert into table table_name
select username, sum(amount) from data
group by username
distribute by username;