Summary of Common Hive Statements: Advanced Queries

白修修

Category: Big Data Operation Statements. Tags: hive, big data

Article link: https://blog.csdn.net/weixin_41639302/article/details/107372840

Loading Data: Inserting into Tables with INSERT

INSERT OVERWRITE TABLE test select 'hello'; -- marked by the author as unsupported: a SELECT with no FROM clause (FROM is optional only from Hive 0.13.0)
insert into employee select * from ctas_employee; -- insert from a query
-- multi-insert: scan the source once, write to several tables
from ctas_employee
insert overwrite table employee select *
insert overwrite table employee_internal select *;
-- insert into partitions
from ctas_patitioned
insert overwrite table employee PARTITION (year, month)
select *,'2018','09';
-- insert into specified columns (the TABLE keyword is optional with INSERT INTO)
insert into employee(name) select 'John' from test limit 1;
-- insert specified values
insert into employee(name) values ('Judy'),('John');
-- from one data source, insert into a local file, an HDFS file, and a table
from ctas_employee
insert overwrite local directory '/tmp/out1'  select *
insert overwrite directory '/tmp/out1' select *
insert overwrite table employee_internal select *;
-- insert data in a specified format
insert overwrite directory '/tmp/out3'
row format delimited fields terminated by ','
select * from ctas_employee;
-- another way to get files out of a table (a shell command, not HiveQL)
hdfs dfs -getmerge <table_file_path> <local_file_path>

Hive Data Exchange - IMPORT/EXPORT

Exporting data with EXPORT

EXPORT TABLE employee TO '/tmp/output3';
EXPORT TABLE employee_partitioned partition (year=2014, month=11) TO '/tmp/output5';

Importing data with IMPORT

IMPORT TABLE employee FROM '/tmp/output3';
IMPORT TABLE employee_partitioned partition (year=2014, month=11) FROM '/tmp/output5';

Hive Data Sorting - ORDER BY

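ORDER BY (ASC|DESC) works much as it does in standard SQL
Performs a global sort using only one Reducer
Slow on large inputs, so filter the data first
Supports CASE WHEN and other expressions
Supports sorting by column position number once the following property is enabled:
set hive.groupby.orderby.position.alias=true;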

select * from offers order by case when offerid = 1 then 1 else 0 end;
select * from offers order by 1;
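
To make the CASE WHEN ordering concrete, here is a minimal Python model (not HiveQL) of the first query above. The rows are hypothetical sample data, and the sort key mirrors the CASE expression, so the row with offerid = 1 sorts last in ascending order.

```python
# Hypothetical rows standing in for the offers table.
offers = [
    {"offerid": 1, "category": 100},
    {"offerid": 2, "category": 200},
    {"offerid": 3, "category": 300},
]


def case_key(row):
    # Mirrors: case when offerid = 1 then 1 else 0 end
    return 1 if row["offerid"] == 1 else 0


# sorted() is stable, so rows with equal keys keep their input order here.
# Hive makes no such promise for ties.
ordered = sorted(offers, key=case_key)
ordered_ids = [row["offerid"] for row in ordered]
print(ordered_ids)
```

Because the key is 0 for every row except offerid = 1, the effect is to push that one row to the end of the result.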

Hive Data Sorting - SORT BY/DISTRIBUTE BY

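SORT BY sorts the rows within each Reducer
-With the number of Reducers set to 1, it is equivalent to ORDER BY
-The sort columns must appear in the SELECT column list

DISTRIBUTE BY resembles GROUP BY in standard SQL in that rows with the same key value are sent to the same Reducer, although it performs no aggregation
-Distributes rows according to the given columns and the number of Reducers
  -Uses a hash algorithm by default
  -Takes the hash code of the distribution column modulo the number of Reducers
-Usually placed before a SORT BY clause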

SELECT department_id , name, employee_id, evaluation_score
FROM employee_hr 
DISTRIBUTE BY department_id SORT BY evaluation_score DESC;
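
The hash-and-modulo distribution followed by a per-Reducer sort can be sketched in Python (not HiveQL). This is only a model under stated assumptions: the rows, the reducer count of 2, and the helper names `distribute` and `sort_within` are all hypothetical, and Python's built-in `hash()` stands in for Hive's own hash function.

```python
from collections import defaultdict

NUM_REDUCERS = 2  # assumed reducer count, chosen for illustration

# Hypothetical rows: (department_id, name, employee_id, evaluation_score)
rows = [
    (1, "Ann", 101, 3),
    (2, "Bob", 102, 5),
    (1, "Cid", 103, 4),
    (2, "Dee", 104, 2),
    (3, "Eve", 105, 1),
]


def distribute(rows, key_index, num_reducers):
    """Model of DISTRIBUTE BY: route each row by hash(key) modulo reducer count."""
    buckets = defaultdict(list)
    for row in rows:
        # Python's hash() of a small int is the int itself; Hive hashes differently.
        buckets[hash(row[key_index]) % num_reducers].append(row)
    return buckets


def sort_within(buckets, key_index, descending=False):
    """Model of SORT BY: sort each reducer's rows independently, with no global order."""
    return {
        reducer: sorted(bucket, key=lambda row: row[key_index], reverse=descending)
        for reducer, bucket in buckets.items()
    }


buckets = distribute(rows, key_index=0, num_reducers=NUM_REDUCERS)
result = sort_within(buckets, key_index=3, descending=True)
for reducer in sorted(result):
    print(reducer, result[reducer])
```

Rows sharing a department_id always land on the same reducer, and each reducer's output is ordered by score, but the concatenated output of all reducers is not globally sorted.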

Hive Data Sorting - CLUSTER BY

CLUSTER BY = DISTRIBUTE BY + SORT BY on the same columns
Does not support ASC|DESC
The sort columns must appear in the SELECT column list
To use all Reducers for a global sort, apply CLUSTER BY first and then ORDER BY

SELECT name, employee_id FROM employee_hr CLUSTER BY name;

Hive Aggregation - GROUP BY

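GROUP BY groups rows
Hive's basic built-in aggregate functions are used together with GROUP BY
Without a GROUP BY clause, the aggregation runs over the whole table by default
Apart from aggregate functions, every other selected column must also appear in GROUP BY
GROUP BY supports CASE WHEN and other expressions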

select category, max(offervalue) from offers group by category;
-- group by with an expression
select if(category > 4000, 'GOOD', 'BAD') as newcat, max(offervalue) from offers group by if(category > 4000, 'GOOD', 'BAD');
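
Grouping by an expression means the expression's value, not the raw column, is the group key. The Python sketch below (not HiveQL) models that idea with hypothetical sample rows; `newcat` mirrors the `if(category > 4000, 'GOOD', 'BAD')` expression and the loop plays the role of `max(offervalue)`.

```python
# Hypothetical rows standing in for the offers table.
offers = [
    {"category": 5000, "offervalue": 10},
    {"category": 3000, "offervalue": 7},
    {"category": 4500, "offervalue": 12},
    {"category": 4000, "offervalue": 9},
]


def newcat(row):
    # Mirrors: if(category > 4000, 'GOOD', 'BAD')
    return "GOOD" if row["category"] > 4000 else "BAD"


# Keep the running maximum of offervalue for each computed group key.
max_by_cat = {}
for row in offers:
    key = newcat(row)
    value = row["offervalue"]
    max_by_cat[key] = max(max_by_cat.get(key, value), value)

print(max_by_cat)
```

Note that category 4000 falls into the BAD group because the comparison is strictly greater than.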
