Hive Usage Notes

项哥

Category: Databases. Tags: hive, big data

Article link: https://blog.csdn.net/liufang1991/article/details/118938018

Hive official documentation
Presto official manual

I. Common Hive Usage

1. Extracting the needed data from a JSON array string

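For example, the keywords field in temp.test_table holds a JSON string: [{"name":"福奈特","type":"brand"},{"name":"邓紫棋","type":"person"}], and I need to pull out the keywords whose type is person, i.e. the person names.
Option 1, expand everything in one pass: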

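-- f_json_parser is not a Hive built-in; it is presumably a custom UDF that
-- turns the JSON array string into an array of JSON object strings.
select t1.* from temp.test_table a
lateral view explode(f_json_parser(a.keywords, array())) t as item
lateral view json_tuple(t.item, 'name', 'type') t1 as name, type
where t1.type = 'person';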

Option 2, via a temporary table: expand first, then query:

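-- Step 1: strip the outer [ ], turn each "},{" separator into "};{", then split on ";" so every row holds one JSON object
drop table if exists temp.test_table1;
create table temp.test_table1
as
select b.keywordjson from temp.test_table a
lateral view explode(split(regexp_replace(regexp_replace(a.keywords , '\\]|\\[' ,'') ,'\\}\\,\\{','\\}\\;\\{' ),'\\;') ) b as keywordjson;
-- Step 2: query the expanded rows (the column is keywordjson, as named in step 1)
select get_json_object(keywordjson, '$.name') as name, get_json_object(keywordjson, '$.type') as type
from temp.test_table1 where get_json_object(keywordjson, '$.type')='person';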
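The split-based approach can also be tried without a temporary table or a custom UDF. The following is only a sketch: it feeds the sample string from above through an inline subquery (the alias src is made up for illustration) and uses built-in functions only. It assumes that no value in the JSON contains "},{" or ";", since either would break the split.

```sql
-- Sketch: expand the JSON array string and filter in a single statement.
-- "src" is a hypothetical inline source standing in for temp.test_table.
select get_json_object(b.item, '$.name') as name,
       get_json_object(b.item, '$.type') as type
from (
  select '[{"name":"福奈特","type":"brand"},{"name":"邓紫棋","type":"person"}]' as keywords
) src
-- strip [ and ], rewrite "},{" as "};{", then split on ";" into one object per row
lateral view explode(
  split(
    regexp_replace(regexp_replace(src.keywords, '\\]|\\[', ''), '\\}\\,\\{', '\\}\\;\\{'),
    '\\;'
  )
) b as item
where get_json_object(b.item, '$.type') = 'person';
```

Replacing the inline subquery with temp.test_table gives the same logic as option 2, minus the intermediate table.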

II. Pitfalls

1. ORDER BY has no effect

ORDER BY has no effect in the following two cases:
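-- A possible workaround (a sketch, not a verified fix): a table has no inherent
-- row order, so apply the sort when reading table1 rather than when writing it:
--   SELECT * FROM table1 ORDER BY create_time DESC;
-- The two statements where the ORDER BY was lost: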
INSERT INTO table1 SELECT * FROM table2 ORDER BY create_time desc
CREATE table table1 AS SELECT * FROM table2 ORDER BY create_time desc

2. LEFT SEMI JOIN can only select columns from the left-hand table

select * from t1 left semi join t2 on t1.id = t2.id is equivalent to select * from t1 where exists (select 1 from t2 where t2.id = t1.id)
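To make the restriction concrete, here is a short sketch. The column t1.name is hypothetical and only serves the illustration.

```sql
-- Allowed: only columns of the left table (t1) appear in the select list.
select t1.id, t1.name
from t1
left semi join t2 on t1.id = t2.id;

-- Same result written with a correlated EXISTS subquery.
select t1.id, t1.name
from t1
where exists (select 1 from t2 where t2.id = t1.id);

-- Not allowed: t2 columns are not visible outside the ON clause.
-- select t1.id, t2.id from t1 left semi join t2 on t1.id = t2.id;
```

If columns from t2 are needed in the output, a regular inner join is the usual alternative, bearing in mind that it can return duplicate t1 rows when t2 has several matches.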

3. Hive cannot drop a single column directly; you can only use REPLACE COLUMNS
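A minimal sketch of dropping a column this way, assuming a hypothetical table demo_table(id int, name string, obsolete_col string): list every column you want to keep and leave out the one to remove.

```sql
-- Hypothetical table: demo_table(id int, name string, obsolete_col string).
-- REPLACE COLUMNS rewrites the whole column list, so omitting obsolete_col drops it.
alter table demo_table replace columns (
  id   int,
  name string
);
```

As far as I know, this only changes the table metadata and leaves the data files untouched, and it is limited to tables using a native SerDe, so it is worth checking against the Hive DDL documentation before running it on a real table.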

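4. ORDER BY easily causes data skew: the global sort runs in a single reducer, so on large tables the reduce stage may never produce a result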
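Two common ways to ease the single-reducer pressure are sketched below, reusing table2 and create_time from the earlier example. Whether either fits depends on whether a full global ordering is really required.

```sql
-- Option A: SORT BY orders rows within each reducer only (no global order),
-- so the work is spread across several reducers.
select * from table2
sort by create_time desc;

-- Option B: keep the global ORDER BY but cap the output with LIMIT,
-- which is usually enough for "latest N rows" style queries.
select * from table2
order by create_time desc
limit 100;
```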

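III. Optimization

See the article "A Comprehensive Summary of Common Hive/HiveSQL Optimization Methods" (Hive/HiveSQL常用优化方法全面总结).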
