文章内容输出来源:拉勾教育大数据训练营
可能是因为学的比较快的缘故,有太多大数据相关的内容需要学习,而且自己最近沉迷spark的源码和一些工作流程的钻研,hive确实记忆有些模糊,虽然面向文档可以完成正常的的hive相关的开发,但是在实习生机试的时候问了两道关于hive相关函数或者指令的选择题,一下子冷汗下来的,虽然由于题目比较基础我答得出,但我深刻意识到了自己hive方面有些模糊了,所以才有了这一篇帮助自己归纳的笔记,以后哪块内容有些模糊了直接看笔记就可以了。
文档
https://www.docs4dev.com/docs/zh/apache-hive/3.1.1/reference
hive命令
# -e:不进入hive交互窗口,执行sql语句
hive -e "select * from users"
# 执行文件中的SQL语句
hive -f hqlfile1.sql
# 执行文件中的SQL语句,将结果写入文件
hive -f hqlfile1.sql >> result1.log
数据类型
如图 随时查阅
数据类型的显示转换
hive> select cast('1111s' as int);
OK
NULL
hive> select cast('1111' as int);
OK
1111
集合数据类型
DDL
https://www.docs4dev.com/docs/zh/apache-hive/3.1.1/reference/LanguageManual_DDL.html
Overview
HiveQL DDL 语句在此处记录,包括:
- 创建数据库/架构,表,视图,函数,索引
- 删除数据库/架构,表,视图,索引
- TRUNCATE TABLE
- ALTER DATABASE/SCHEMA,表格,视图
- MSCK 修复表(或 ALTER TABLE RECOVER 分区)
- 显示数据库/架构,表,TBL 属性,视图,分区,函数,索引,列,创建表
- DESCRIBE DATABASE/SCHEMA,表名,视图名,实体化视图名
除 SHOW PARTITIONS 外,PARTITION 语句通常是 TABLE 语句的选项。
Create Database
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[MANAGEDLOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
Drop Database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
默认行为是 RESTRICT,如果数据库不为空,则 DROP DATABASE 将失败。
Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)
ALTER (DATABASE|SCHEMA) database_name SET MANAGEDLOCATION hdfs_path; -- (Note: Hive 4.0.0 and later)
Use Database
USE database_name;
USE DEFAULT;
Create Table
create [external] table [IF NOT EXISTS] table_name
[(colName colType [comment 'comment'], ...)]
[comment table_comment]
[partition by (colName colType [comment col_comment], ...)]
[clustered BY (colName, colName, ...)
[sorted by (col_name [ASC|DESC], ...)] into num_buckets buckets]
[row format row_format]
[stored as file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement];
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS]
[db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
exp1
-- 创建表
create EXTERNAL table if not exists t3(
id int
,name string
,hobby array<string>
,addr map<String,string>
)
partitioned by (dt string)
row format delimited
fields terminated by ';'
collection items terminated by ','
map keys terminated by ':'
STORED AS SEQUENCEFILE;
exp2
CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;
exp3
CREATE TABLE empty_key_value_store
LIKE key_value_store [TBLPROPERTIES (property_name=property_value, ...)]