Hive Command Operations (Detailed)

Author: 这条gai最靓的华哥

Category: hive. Tags: hive, big data, database

Article link: https://blog.csdn.net/hua_ge_zui_liang/article/details/107139200

Hive Operations

Hive DDL
Hive DML
Hive shell

This article covers DDL operations, DML operations, and shell operations for Hive tables.

Hive DDL

1. Creating tables

The syntax for creating a table in Hive is as follows:

create [temporary] [external] table [if not exists] [db_name.]table_name
[(col_name data_type [comment col_comment], ...)]
[comment table_comment]
[partitioned by (col_name data_type [comment col_comment], ...)]
[clustered by (col_name, col_name, ...) [sorted by (col_name [ASC|DESC], ...)] into num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...)
 on ((col_value, col_value, ...), (col_value, col_value, ...), ...)
 [STORED AS directories]]
[
 [ROW FORMAT row_format]
 [STORED AS file_format]
  | STORED BY 'storage.handler.class.name' [with serdeproperties (...)]
]
[location hdfs_path]
[tblproperties (property_name=property_value, ...)]
[AS select_statement];

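Notes on the create table statement:

Parameter	Description
create table	Creates a table with the specified name
external	The external keyword creates an external table; a path to the actual data (location) is specified when the table is created. Dropping an external table removes only its metadata, not the data
like	Copies the structure of an existing table without copying its data (written as create table new_table like existing_table)
Storage format	STORED AS sequencefile/textfile/rcfile. If the data is plain text, use STORED AS TEXTFILE; more advanced storage formats such as ORC and Parquet are also available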

Example of creating an internal (managed) table:

create table emp(
	empno int,
	ename string,
	job string,
	sal double,
	comm double,
	deptno int
)row format delimited fields terminated by '\t';
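
The notes above mention like and the stored as formats without showing them in use. The following is a minimal sketch, assuming the emp table just created exists; the table names emp_like and emp_orc are hypothetical and only serve as illustration.

```sql
-- Copy only the schema of emp; no rows are copied.
create table if not exists emp_like like emp;

-- Create an ORC-backed copy of emp with a create-table-as-select.
-- Unlike "like", this also copies the rows.
create table if not exists emp_orc
stored as orc
as select * from emp;

-- Inspect the resulting definitions, including the storage format.
desc formatted emp_like;
desc formatted emp_orc;
```

A table created with like keeps the storage format of the source table, while the second statement changes it to ORC.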

Example of creating an external table:

create external table emp_external(
	empno int,
	ename string,
	job string,
	sal double,
	comm double
)row format delimited fields terminated by '\t'
location 'hive_external/emp/';
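
To make the difference between internal and external tables concrete, here is a short sketch that assumes the emp_external table above has been created. It is meant as an illustration, not as a required step.

```sql
-- The "Table Type" row shows EXTERNAL_TABLE for an external table
-- and MANAGED_TABLE for an internal one.
desc formatted emp_external;

-- Dropping an external table removes only its metadata;
-- the files under its location are left in place.
drop table if exists emp_external;
```

Dropping an internal table, by contrast, also deletes the table's data.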

Example of creating a partitioned table:

create table order_partition(
	orderNumber string,
	event_time string
)
partitioned by (event_month string)
row format delimited fields terminated by '\t';
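
A partitioned table stores each partition in its own subdirectory, so queries that filter on the partition column only read the matching directories. The sketch below assumes the order_partition table above; the partition value '2020-08' is purely illustrative.

```sql
-- Register an empty partition explicitly.
alter table order_partition add if not exists partition (event_month='2020-08');

-- List the partitions of the table.
show partitions order_partition;

-- Filtering on the partition column restricts the scan to that partition.
select orderNumber, event_time
from order_partition
where event_month = '2020-08';
```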

2. Altering tables

This covers renaming a table, adding columns, and replacing columns.
Syntax for renaming a table:

alter table table_name rename to new_table_name

Example:

-- Rename the emp table to emp_new
alter table emp rename to emp_new;

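Syntax for adding or replacing columns:

alter table table_name add|replace columns (col_name data_type [comment col_comment], ...)

Note:
add: appends the new columns after all existing columns (and before the partition columns);
replace: replaces all columns of the table with the given list.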

Examples:

-- Create a table to work with
create table student(id int,age int,name string) row format delimited fields terminated by '\t';

-- Show the table structure
desc student;

-- Add a column named address
alter table student add columns(address string);

-- Show the table structure again; the address column has been added
desc student;

-- Replace all columns
alter table student replace columns(id int,name string);

-- Show the table structure; student now has only the id and name columns
desc student;
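
add and replace operate on the column list as a whole. To change a single column, Hive also provides alter table ... change column. The sketch below assumes the student table in the state left by the example above (columns id and name); the new column name student_name is hypothetical.

```sql
-- Rename the name column to student_name, keeping its type.
alter table student change column name student_name string;

-- Verify the change.
desc student;
```

Like add and replace, this changes the table metadata only; the underlying data files are not rewritten.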

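3. Show commands

Show commands let you look up information about Hive databases and tables.

-- List all databases
show databases;

-- List all tables in the current database
show tables;

-- List all partitions of a table
show partitions table_name;

-- List all functions supported by Hive
show functions;

-- Show information about a table
desc extended table_name;

-- Show more detailed, formatted information about a table
desc formatted table_name;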
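
A few common variations of these commands, given as a sketch. It assumes a table named emp exists; default is the database Hive creates out of the box.

```sql
-- List the tables of a specific database.
show tables in default;

-- Filter table names with a wildcard pattern.
show tables like 'emp*';

-- Print the statement that would recreate the table.
show create table emp;

-- Show usage information for a single function.
desc function upper;
```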

Hive DML

1. load

load reads data from a file into a Hive table. The syntax is as follows:

load data [local] inpath 'filepath' [overwrite] into table table_name [partition (partcol1=val1,partcol2=val2,...)]

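Parameter	Description
local	filepath refers to the local file system; without local, it refers to HDFS
overwrite	Deletes the existing data in the target table (or partition) before loading; without overwrite, the new file is added to the existing data

Example: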

load data local inpath '/home/hadoop/data/order.txt' overwrite into table order_partition partition(event_month='2020-07');
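
For comparison, here is a sketch of loading from HDFS without local and without overwrite. The HDFS path and the partition value are hypothetical.

```sql
-- Without LOCAL, filepath refers to HDFS, and the file is moved (not copied)
-- into the table's directory.
-- Without OVERWRITE, the file is added alongside any existing data.
load data inpath '/user/hadoop/data/order_08.txt'
into table order_partition partition(event_month='2020-08');
```

After this statement the source file no longer exists at its original HDFS path, which is worth keeping in mind when other jobs read from the same location.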

2. insert

insert writes the result of a query into a Hive table. The syntax is as follows:

insert overwrite table table_name [partition(partcol1=val1,partcol2=val2,...)] select_statement1 from from_statement

Multi-table insert:

from from_statement
insert overwrite table table_name1 [partition(partcol1=val1,partcol2=val2,...)] select_statement1
[insert overwrite table table_name2 [partition(partcol1=val1,partcol2=val2,...)] select_statement2] ...
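
A multi-table insert scans the source once and feeds several targets. The sketch below assumes an emp table with empno, ename, and sal columns; the target tables emp_high and emp_low and the threshold of 2000 are hypothetical.

```sql
-- Hypothetical target tables for the illustration.
create table if not exists emp_high(empno int, ename string, sal double);
create table if not exists emp_low(empno int, ename string, sal double);

-- One pass over emp, two outputs.
from emp
insert overwrite table emp_high select empno, ename, sal where sal > 2000
insert overwrite table emp_low select empno, ename, sal where sal <= 2000;
```

Each insert clause has its own select list and where condition, while the from clause is shared.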

Dynamic partition insert:

insert overwrite table table_name partition(partcol1[=val1],partcol2[=val2],...) select_statement1 from from_statement

Examples:

-- Copy selected columns from the source table
create table emp2 as select empno,ename,job,deptno from emp;

-- Staging table holding the raw data for the partitioned-table test
drop table if exists order_4_partition;
create table order_4_partition(
	orderNumber string,
	event_time string
)
row format delimited fields terminated by '\t';

load data local inpath '/home/hadoop/data/order.txt' overwrite into table order_4_partition;
-- Replace the contents of the 2020-07 partition with the staged rows
insert overwrite table order_partition partition(event_month='2020-07') select * from order_4_partition;

-- Append the query result to the specified partition of the Hive table
insert into table order_partition partition(event_month='2020-07') select * from order_4_partition;
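
The examples above use a static partition value. A dynamic partition insert could look like the sketch below. It assumes that event_time begins with a 'yyyy-MM' prefix (for example '2020-07-15 10:00:00'), which the source data may or may not satisfy.

```sql
-- Allow dynamic partitioning without requiring any static partition column.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column is left without a value in the partition clause;
-- Hive takes it from the last column of the select list.
insert overwrite table order_partition partition(event_month)
select orderNumber, event_time, substr(event_time, 1, 7) as event_month
from order_4_partition;
```

Each distinct value of the last select column produces its own partition, so a single statement can fill many partitions at once.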

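3. Exporting table data

Data in a Hive table can be exported to a file system (local or HDFS). Because of overwrite, any existing content in the target directory is replaced. The syntax is as follows:

insert overwrite [local] directory directory1 select ... from ...

Examples:

-- Export to the local file system
insert overwrite local directory '/home/hadoop/hivetmp'
row format delimited fields terminated by '\t' lines terminated by '\n'
select * from emp;

-- Export to HDFS
insert overwrite directory '/hivetmp/' select * from emp;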

4. select

The syntax is as follows:

select [all|distinct] select_expr, select_expr, ...
from table_reference
[where where_condition]
[group by col_list [having condition]]
[cluster by col_list
 | [distribute by col_list] [sort by|order by col_list]
]
[limit number]

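The sorting clauses behave as follows:

Parameter	Description
order by	Sorts the whole input globally, so only one reducer is used; with large inputs this can take a long time
sort by	Not a global sort; rows are sorted before they enter each reducer. If mapred.reduce.tasks>1, sort by only guarantees that each reducer's output is sorted, not that the overall result is
distribute by	Sends rows with the same value of the specified columns to the same reducer
cluster by	Does what distribute by does and also sorts by the same columns, so it is commonly described as cluster by = distribute by + sort by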
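
The four clauses can be compared side by side. This is a sketch that assumes an emp table with ename, job, and sal columns; the reducer count of 3 is arbitrary.

```sql
-- Use several reducers so the difference becomes visible.
set mapred.reduce.tasks=3;

-- Global order: the complete result is sorted by sal.
select ename, sal from emp order by sal desc;

-- Per-reducer order: each reducer's output is sorted, the overall result is not.
select ename, sal from emp sort by sal desc;

-- Rows with the same job go to the same reducer and are sorted by sal there.
select ename, job, sal from emp distribute by job sort by sal desc;

-- Shorthand for "distribute by job sort by job" (ascending order only).
select ename, job, sal from emp cluster by job;
```

Because cluster by always sorts in ascending order on the distribution columns, the explicit distribute by plus sort by form is needed whenever the sort column or direction differs.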

Examples (filtering with conditions):

-- Equality filter
select * from emp where deptno=10 or ename='scott';

-- Comparison filter (>=, <=)
select * from emp where empno>=750;

-- Range filter with between ... and (both bounds are inclusive)
select ename,sal from emp where sal between 80 and 100;

-- limit caps the number of rows returned
select * from emp limit 2;

-- in / not in
select ename,sal,comm from emp where ename in('smith','king');

-- is null / is not null
select ename,sal,comm from emp where comm is null;

-- having filters groups after aggregation (where filters individual rows)
select avg(sal),deptno from emp group by deptno having avg(sal)>2000;

-- case when then
select ename,sal,
case
when sal>1 and sal<=1000 then 'lower'
when sal>1000 and sal<=2000 then 'middle'
when sal>2000 and sal<=4000 then 'high'
else 'highest' end
from emp;

5. join

Hive uses the join keyword for queries across multiple tables. The syntax is as follows:

join_table:
table_reference join table_factor [join_condition]
 | table_reference {left|right|full} [outer] join table_reference join_condition
 | table_reference left semi join table_reference join_condition

Hive supports equi-joins (equality joins) and outer joins (left, right, and full), as well as left semi joins.

Examples:

select a.* from a join b on (a.id=b.id);

-- More than two tables can be joined
select a.val,b.val,c.val from a join b on (a.key=b.key1) join c on (c.key=b.key2);
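
The outer and semi join forms from the syntax can be sketched with the same placeholder tables a and b, assuming both have an id column.

```sql
-- Left outer join: every row of a is kept;
-- b's columns are NULL where no matching row exists.
select a.id, b.id from a left outer join b on (a.id=b.id);

-- Left semi join: returns the rows of a that have at least one match in b.
-- Each row of a appears at most once, and columns of b may be
-- referenced only in the on clause, not in the select list.
select a.* from a left semi join b on (a.id=b.id);
```

The left semi join behaves like an in/exists subquery on b, which is why only columns of a can be selected.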

Hive shell

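Syntax of the Hive command line:

hive [-hiveconf x=y]* [<-i filename>]* [<-f filename>|<-e query-string>] [-S]

Options:

Parameter	Description
-i	Runs an HQL (Hive QL) initialization file
-e	Executes the HQL given on the command line
-f	Executes an HQL script file
-S	Silent mode; suppresses log output so that only results are printed
-v	Echoes the executed HQL statements to the console
-p	Port of the Hive Server to connect to
-hiveconf x=y	Sets a Hive/Hadoop configuration parameter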

Examples:

# Run a single query
hive -e 'select count(*) from emp';

# Run a script file:
# (1) Put the SQL in a file named query.hql:
select count(*) from emp;
# (2) Pass the file to hive -f:
hive -f query.hql

P.S. Thanks for your support; more updates to come.
