hive sql 第四天

最新推荐文章于 2024-05-19 14:54:20 发布

逗×

最新推荐文章于 2024-05-19 14:54:20 发布

阅读量109

点赞数

文章标签： hive sql 大数据 Powered by 金山文档

本文链接：https://blog.csdn.net/peimina/article/details/129370260

版权

https://blog.csdn.net/helloHbulie/article/details/115376657

继续学习

lateral view 、explode、reflect与窗口函数

使用explode函数将hive表中的Map与Array字段数据进行拆分

lateral view用于和split、explode等UDTF一起使用，能将一行数据拆分成多行数据，在此基础上可以对拆分的数据进行聚合，lateral view首先为原始表的每行调用UDTF，UDTF会把一行拆分成一行或者多行，lateral view再把结果组合，产生一个支持别名表的虚拟表

其中explode还可以用于将hive一列中复杂的array或者map结构拆分成多行

现有数据格式如下，字段之间使用\t分割

zhangsan child1,child2,child3,child4 k1:v1,k2:v2
lisi child5,child6,child7,child8 k3:v3,k4:v4

需求：

将所有的child进行拆开成为一列

将map的key和value也进行拆开

创建hive数据库

create database hive_explode;
use hive_explode

创建hive表，然后使用explode拆分map和array

create table t3(name string,children array<string>,address Map<string,string>)

加载数据

mkdir -p /export/servers/hivedatas/
cd /export/servers/hivedatas/
vim maparray
zhangsan child1,child2,child3,child4 k1:v1,k2:v2
lisi child5,child6,child7,child8 k3:v3,k4:v4

hive表中加载数据

load data local inpath '/export/servers/hivedatas/maparray' into table t3;

使用explode将hive当中的数据拆开

将array当中的数据拆分开

select explode(children) as myChild from t3;

将map当中的数据拆分开

select explode(address) as (myMapKey,myMapValue) from t3;

使用explode拆分json字符串

需求：现在有一些数据格式如下：

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

其中字段与字段之间的分隔符是|

需要解析得到所有的monthSales对应的值为一下这一列（行转列）

4900

2090

6987

创建hive表

create table explode_lateral_view('area' string,'goods_id' string,'sale_info' string) row format delimited fields terminated by '|' stored as textfile;

准备数据并加载数据

cd /export/servers/hivedatas
vim explode_json

加载数据到hive表当中

load data local inpath '/export/servers/hivedatas/explode_json' overwrite into table explode_lateral_view;

使用explode拆分Array

select explode(split(goods_id,',')) as good_id from explode_lateral_view;

使用explode拆分Map

select explode(split(area,',')) as area from explode_lateral_view;

拆解json字段

select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as sale_info from explode_lateral_view;

然后用get_json_object来获取key为monthSales的数据：

select get_json_object(explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')),$.monthSales') as sale_info from explode_lateral_view（报错）

使用UDTF的时候，只支持一个字段，这个时候就需要lateral view出厂

配合lateral view使用

配合lateral view查询多个字段

select goods_id2,sale_info from explode_lateral_view lateral view explode(split(goods_id,',')) goods as goods_id2;

其中lateral view explode(split(goods_id,',')) goods 相当于一个虚拟表，与原表explode_lateral_view笛卡尔积关联

也可以多重使用

select goods_id2,sale_info,area2 from explode_lateral_view lateral view explode(split(goods_id,',')) goods as goods_id2 lateral view explode(split(area,',')) area as area2;(三个表笛卡尔积的结果)

最终我们可以把这个json格式的一行数据，完全转换成二维表的方式展现

select get_json_object(concat('{',sale_info_1,'}'),'$.source') as source,
get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales,
get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as monthSales,
get_json_object(concat('{',sale_info_1,'}'),$.score') as monthSales from explode_lateral_view lateral view explode(split(regexp_replace(regexp_replace(sale_info,'\[\{',''),'}]',''),'},\{'))sale_info as sale_info_1;

总结

lateral view通常和UDTF一起出现，为了解决UDTF不允许在select字段的问题。Multiple Lateral View可以实现类似打卡耳机乘积。Outer关键字可以把不输出的UDTF的空结果，输出成NULL，防止丢失数据。

行转列

列转行

所需函数：

EXPLODE(col):将hive一列中复杂的array或者map结构拆分成多行

LATERAL VIEW 用法：LATERAL VIEW udtf（expression) tableAlias AS columnAlias

解释：用于和split，explode等UDTF一起使用，它能将一列数据拆分成多行数据，在此基础上可以对拆分后的数据进行聚合

数据准备

cd /export/servers/hivedatas
vim movie.txt

文件内容如下：数据字段之间使用\t进行分割

疑犯追踪》悬疑,动作,科幻,剧情

《Lie to me》悬疑,警匪,动作,心理,剧情

《战狼2》战争,动作,灾难

需求：将电影分类中的数组数据展开。结果如下：

《疑犯追踪》悬疑

《疑犯追踪》动作

《疑犯追踪》科幻

《疑犯追踪》剧情

《Lie to me》悬疑

《Lie to me》警匪

《Lie to me》动作

《Lie to me》心理

《Lie to me》剧情

《战狼2》战争

《战狼2》动作

《战狼2》灾难

创建hive表

create table movie_info(movie string,category array<string>)
row format delimited fields terminated by "\t" collection items terminated by ",";

加载数据

load data local inpath "/export/servers/hivedatas/movie.txt" into table movie_info ;

按需求查询数据

select movie,category_name from movie_info lateral view explode(category) table_tmp as category_name;

reflect函数

reflect函数可以支持在sql中调用java的自带函数，秒杀一切udf函数

需求：使用java.lang.Math当中的Max求两列中最大值

实现步骤

创建hive

create table test_udf(col1 int,col2 int) row format delimited fields terminated by ',';

准备数据并加载数据

cd /expoort/servers/hivedatas
vim test_udf

文件内容如下

1,2

4,3

6,4

7,5

5,6

加载数据

load data local inpath '/export/servers/hivedatas/test_udf' overwrite into table test_udf;

使用java.lang.Math当中的Max求两列当中的最大值

select reflect("java.lang.Math","max",col1,col2) from test_udf

需求2：文件中不同的记录来执行不同的java的内置函数

实现步骤：

创建hive表

create table test_udf2(class_name string,method_name string ,col1 int,col2 int) row format delimited fileds terminated by ',';

准备数据

cd /export/servers/hivedatas
vim test_udf2

文件内容如下

java.lang.Math,min,1,2

java.lang.Math,max,2,3

加载数据

load data local inpath '/export/servers/hivedatas/test_udf2'  overwrite into table test_udf2;

执行查询

select reflect (class_name,method_name,col1,col2) from test_udf2;

需求3：判断是否为数字

实现方式：使用apache commons 中的函数，commons下的jar已经包含在hadoop的classpath中，所以可以直接使用

select reflect("org.apache.commons.lang.math.NumberUtils","isNumber","123")

逗×

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
1
评论
hive sql 第四天

hive sql
复制链接

扫一扫

name	constellation	blood_type
孙悟空	白羊座	A
老王	射手座	A
宋宋	白羊座	B
猪八戒	白羊座	A
凤姐	射手座	A

hive sql 第四天

lateral view 、explode、reflect与窗口函数

使用explode函数将hive表中的Map与Array字段数据进行拆分

使用explode拆分json字符串

配合lateral view使用

行转列

列转行

“相关推荐”对你有帮助么？