一、hive 常用
1.从json数组字符串中提取出需要的数据
例如 temp.test_table中 keywords字段是一个json字符串:[{“name”:“福奈特”,“type”:“brand”},{“name”:“邓紫棋”,“type”:“person”}],我需要取出type=person人名的关键词
方式一,一次性展开:
select t1.* from temp.test_table a
lateral view explode(f_json_parser(keywords, array())) t as item
lateral view json_tuple(item, 'name', 'type') t1 as name, type
where type='person'
方式二, 借用临时表, 先展开再查询:
drop table if exists temp.test_table1;
create table temp.test_table1
as
select b.keywordjson from temp.test_table a
lateral view explode(split(regexp_replace(regexp_replace(a.keywords , '\\]|\\[' ,'') ,'\\}\\,\\{','\\}\\;\\{' ),'\\;') ) b as keywordjson;
select get_json_object(keyword, '$.name') as name, get_json_object(keyword, '$.type') as type
from temp.test_table1 where get_json_object(keyword, '$.type')='person';
二、踩过的坑
1.order by 失效
下面两种情况下order by 无效
INSERT INTO table1 SELECT * FROM table2 ORDER BY create_time desc
CREATE table table1 AS SELECT * FROM table2 ORDER BY create_time desc
2.left semi join 只能查询前面表的字段
select * from t1 left semi join t2 on t1.id = t2.id
相当于, select * from t1 where exists (select id from t2)