参考文章:
1.【学习笔记】hive 之行拆列explode
https://yq.aliyun.com/articles/654743
HIVE 中内置了 EXPLODE 和 LATERAL VIEW,这两个 Function 非常的有用。尤其是在行转列的情形下。
2.hive中的lateral view 与 explode函数的使用
https://blog.csdn.net/guodong2k/article/details/79459282
本篇文章我们也是用了 str_to_map 函数,不熟悉的同学,可以先看下我的 str_to_map 详解。
https://blog.csdn.net/u010003835/article/details/106632517
构建测试数据
USE data_warehouse_test;
CREATE TABLE IF NOT EXISTS explode_laterview_org(
day1_num BIGINT
,day2_num BIGINT
,day3_num BIGINT
,day4_num BIGINT
,day5_num BIGINT
,day6_num BIGINT
,day7_num BIGINT
,campaign_name STRING
,campaign_id BIGINT
);
INSERT OVERWRITE TABLE explode_laterview_org VALUES
(40, 20, 10, 4, 4, 2, 1, 'zoo', 2 )
,(100, 80, 53, 40, 7, 6, 5, 'moji', 3)
;
EXPLODE
EXPLODE 主要用来拆解列或者行的,下面我们看下函数介绍:
desc function explode;
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns |
+----------------------------------------------------+
函数解释 :
可以看到EXPLODE 函数一共处理两种情况,可以把 Array 切分成 单列多行,也可以把 Map 切分成 key,value 组成的多列多行。
案例
下面我们就上面的数据做一个测试。SQL 如下
SELECT EXPLODE(
STR_TO_MAP(
CONCAT(
'day1_num=',CAST (day1_num AS STRING),
'&day2_num=',CAST (day2_num AS STRING),
'&day3_num=',CAST (day3_num AS STRING),
'&day4_num=',CAST (day4_num AS STRING),
'&day5_num=',CAST (day5_num AS STRING),
'&day6_num=',CAST (day6_num AS STRING),
'&day7_num=',CAST (day7_num AS STRING)
)
,'&', '=')
)
FROM explode_laterview_org
;
SELECT EXPLODE(
SPLIT(
CONCAT(
CAST (day1_num AS STRING),
',',CAST (day2_num AS STRING),
',',CAST (day3_num AS STRING),
',',CAST (day4_num AS STRING),
',',CAST (day5_num AS STRING),
',',CAST (day6_num AS STRING),
',',CAST (day7_num AS STRING)
)
,',')
)
FROM explode_laterview_org
;
输出
SQL1
+-----------+--------+
| key | value |
+-----------+--------+
| day1_num | 40 |
| day2_num | 20 |
| day3_num | 10 |
| day4_num | 4 |
| day5_num | 4 |
| day6_num | 2 |
| day7_num | 1 |
| day1_num | 100 |
| day2_num | 80 |
| day3_num | 53 |
| day4_num | 40 |
| day5_num | 7 |
| day6_num | 6 |
| day7_num | 5 |
+-----------+--------+
SQL2
+------+
| col |
+------+
| 40 |
| 20 |
| 10 |
| 4 |
| 4 |
| 2 |
| 1 |
| 100 |
| 80 |
| 53 |
| 40 |
| 7 |
| 6 |
| 5 |
+------+
注意
使用 EXPLODE ,需要注意一下几点
1、No other expressions are allowed in SELECT
如果只是单一使用了EXPLODE, SELECT 不可以使用其他的表达式
SELECT pageid, explode(adid_list) AS myCol... is not supported
2、UDTF's can't be nested
EXPLODE 类似的 UDTF 函数不能嵌套
SELECT explode(explode(adid_list)) AS myCol... is not supported
3.GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported
EXPLODE 不可以聚合函数,排序等组合
SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
LATERAL VIEW
既然EXPLODE 限制这么多,那么很多情况下是不能满足我们需求的。 这个时候使用 LATERAL VIEW 可以去除 EXPLODE 的这些限制。
LATERAL VIEW 被解释为 侧视图,先看下LATERAL VIEW的介绍
LATERAL 按照我的理解,其实是和原来的其他列做JOIN。 因为原来的一行还有其他列,对其中一行进行了拆分,还希望带着其他列的信息,这个时候就可以使用 LATERAL VIEW。
比如针对上面的测试数据,想转换为以下的数据形式
campagin_id, campagin_name, type, num
1,'zoo','day1_num ',40
1,'zoo' ,'day2_num',30
...
这个时候,我们就可以结合上面的 EXPLODE 写出以下的语句
SELECT
campaign_id, campaign_name, type, num
FROM explode_laterview_org
LATERAL VIEW
EXPLODE(
STR_TO_MAP(
CONCAT(
'day1_num=',CAST (day1_num AS STRING),
'&day2_num=',CAST (day2_num AS STRING),
'&day3_num=',CAST (day3_num AS STRING),
'&day4_num=',CAST (day4_num AS STRING),
'&day5_num=',CAST (day5_num AS STRING),
'&day6_num=',CAST (day6_num AS STRING),
'&day7_num=',CAST (day7_num AS STRING)
)
,'&', '=')
) lateral_table AS type, num
;
结果
+--------------+----------------+-----------+------+
| campaign_id | campaign_name | type | num |
+--------------+----------------+-----------+------+
| 2 | zoo | day1_num | 40 |
| 2 | zoo | day2_num | 20 |
| 2 | zoo | day3_num | 10 |
| 2 | zoo | day4_num | 4 |
| 2 | zoo | day5_num | 4 |
| 2 | zoo | day6_num | 2 |
| 2 | zoo | day7_num | 1 |
| 3 | moji | day1_num | 100 |
| 3 | moji | day2_num | 80 |
| 3 | moji | day3_num | 53 |
| 3 | moji | day4_num | 40 |
| 3 | moji | day5_num | 7 |
| 3 | moji | day6_num | 6 |
| 3 | moji | day7_num | 5 |
+--------------+----------------+-----------+------+
额外提示:
如果有多个拆分列,需要进行笛卡尔积的组合。 可以多重使用 LATERAL VIEW
select goods_id2,sale_info,area2
from explode_lateral_view
LATERAL VIEW explode(split(goods_id,','))goods as goods_id2
LATERAL VIEW explode(split(area,','))area as area2;