Hive中的lateral view 、explode、reflect、行转列、列转行、窗口函数与分析函数使用与区别详解

最新推荐文章于 2024-07-07 09:46:26 发布

皮哥四月红

最新推荐文章于 2024-07-07 09:46:26 发布

阅读量6.5k

点赞数 5

分类专栏： Hive 文章标签： hive 大数据

本文链接：https://blog.csdn.net/weixin_43230682/article/details/108096441

版权

本文详细介绍了Hive中的explode函数在处理Map和Array字段、解析JSON数据以及配合LATERAL VIEW的使用方法。同时，探讨了列转行、行转列的操作，以及如何利用reflect函数调用Java内置函数。最后，讲解了Hive的窗口函数和分析函数，包括FIRST_VALUE、LAST_VALUE、LEAD、LAG等，以及GROUPING SETS、CUBE和ROLLUP的应用。

摘要由CSDN通过智能技术生成

一、使用explode函数将hive表中的Map和Array字段数据进行拆分

二、使用explode函数拆分json字符串数据

5、增强的聚合Cube和Grouping和Rollup

6、使用cube 和ROLLUP 根据GROUP BY的维度的所有组合进行聚合。

一、使用explode函数将hive表中的Map和Array字段数据进行拆分

lateral view用于和split、explode等UDTF一起使用的，能将一行数据拆分成多行数据，在此基础上可以对拆分的数据进行聚合，lateral view首先为原始表的每行调用UDTF，UDTF会把一行拆分成一行或者多行，lateral view再把结果组合，产生一个支持别名表的虚拟表。

其中explode还可以用于将hive一列中复杂的array或者map结构拆分成多行

需求：现在有数据格式如下

zhangsan      child1,child2,child3,child4     k1:v1,k2:v2
lisi  child5,child6,child7,child8     k3:v3,k4:v4

字段之间使用\t分割，需求将所有的child进行拆开成为一列

+----------+--+
| mychild   |
+----------+--+
| child1    |
| child2    |
| child3    |
| child4    |
| child5    |
| child6    |
| child7    |
| child8    |
+----------+--+

将map的key和value也进行拆开，成为如下结果

+-------------------+---------------+
| mymapkey          | mymapvalue    |
+----------------+------------------+
| k1                | v1            |
| k2                | v2            |
| k3                | v3            |
| k4                | v4            |
+-------------------+---------------+

第一步：创建hive数据库

创建hive数据库

hive (default)> create database hive_explode;
hive (default)> use hive_explode;

第二步：创建hive表，然后使用explode拆分map和array

hive (hive_explode)> create  table t3(name string,children array<string>,address Map<string,string>)
                     row format delimited fields terminated by '\t'
                     collection items terminated by ','
                     map keys terminated by ':' stored as textFile;

第三步：加载数据

node03执行以下命令创建表数据文件

mkdir -p /export/servers/hivedatas/
cd /export/servers/hivedatas/
vim maparray

zhangsan   child1,child2,child3,child4 k1:v1,k2:v2
lisi  child5,child6,child7,child8 k3:v3,k4:v4

hive表当中加载数据

hive (hive_explode)> load data local inpath '/export/servers/hivedatas/maparray' into table t3;

第四步：使用explode将hive当中数据拆开

将array当中的数据拆分开

hive (hive_explode)> SELECT explode(children) AS myChild FROM t3;

将map当中的数据拆分开

hive (hive_explode)> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;

0: jdbc:hive2://node03:10000> select explode(children) as mychild from t3;

+----------+--+
| mychild  |
+----------+--+
| child1   |
| child2   |
| child3   |
| child4   |
| child5   |
| child6   |
| child7   |
| child8   |
+----------+--+
8 rows selected (0.128 seconds)

0: jdbc:hive2://node03:10000> select name,explode(children) as mychild from t3;
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)

0: jdbc:hive2://node03:10000> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;

+-----------+-------------+--+
| mymapkey  | mymapvalue  |
+-----------+-------------+--+
| k1        | v1          |
| k2        | v2          |
| k3        | v3          |
| k4        | v4          |
+-----------+-------------+--+

二、使用explode函数拆分json字符串数据

需求：现在有一些数据格式如下：

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

其中字段与字段之间的分隔符是 “ | ”

我们要解析得到所有的monthSales对应的值为以下这一列（行转列）

4900
2090
6987

第一步：创建hive表

hive (hive_explode)> create table explode_lateral_view
                    (`area` string,
                    `goods_id` string,
                    `sale_info` string)
                    ROW FORMAT DELIMITED
                    FIELDS TERMINATED BY '|'
                    STORED AS textfile;

第二步：准备数据并加载数据

准备数据如下

cd /export/servers/hivedatas
vim explode_json

加载数据到hive表当中去

hive (hive_explode)> load data local inpath '/export/servers/hivedatas/explode_json' overwrite into table explode_lateral_view;

第三步：使用explode拆分Array

hive (hive_explode)> select explode(split(goods_id,',')) as goods_id from explode_lateral_view;

第四步：使用explode拆解Map

hive (hive_explode)> select explode(split(area,',')) as area from explode_lateral_view;

第五步：拆解json字段

hive (hive_explode)> select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as  sale_info from explode_lateral_view;

然后我们想用get_json_object来获取key为monthSales的数据：

hive (hive_explode)> select get_json_object(explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')),'$.monthSales') as sale_info from explode_lateral_view;

然后出现异常FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

UDTF explode不能写在别的函数内

如果你这么写，想查两个字段，select explode(split(area,',')) as area,good_id from explode_lateral_view;

会报错FAILED: SemanticException 1:40 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'good_id'

使用UDTF的时候，只支持一个字段，这时候就需要LATERAL VIEW出场了

三、配合LATERAL VIEW使用

配合lateral view查询多个字段

hive (hive_explode)> select goods_id2,sale_info from explo

最低0.47元/天解锁文章

皮哥四月红

关注

5
点赞
踩
80

收藏

觉得还不错? 一键收藏
0
评论
Hive中的lateral view 、explode、reflect、行转列、列转行、窗口函数与分析函数使用与区别详解

目录一、使用explode函数将hive表中的Map和Array字段数据进行拆分二、使用explode函数拆分json字符串数据三、配合LATERAL VIEW使用四、列转行五、行转列六、reflect函数七、窗口函数与分析函数1、创建hive表并加载数据2、窗口函数3、OVER从句4、分析函数5、增强的聚合Cube和Grouping和Rollup6、使用cube和ROLLUP根据GROUP BY的维度的所有组合进行聚合。一、使用explode函...
复制链接

扫一扫

专栏目录