背景引入
首先,行转列列转行是数仓工作中还比较常见的问题,通常与Hive中的collect_list``collect_set``explode
等相关知识挂钩。
其次,行转列的sql编写也是面试常考题之一,面试者需要重点关注。
不知道什么是行转列的话,可以看个简陋的例子:
table01:
uid score uid score_list
01 88 >>>>> 01 88,85,92
01 85 >>>>> 02 76,70
01 92
02 76
02 70
模拟数据
with data as (
select "Tom" as uname, "传奇" as gamename, "20240203" as `date` union all
select "Tom" as uname, "英雄联盟" as gamename, "20240203" as `date` union all
select "Jack" as uname, "穿越火线" as gamename, "20240203" as `date` union all
select "Jack" as uname, "穿越火线" as gamename, "20240203" as `date` union all
select "Jack" as uname, "CSGO" as gamename, "20240203" as `date`
)
行转列解决
collect_list():将分组的某一列转换为一个数组返回,不进行去重
select
uname,
collect_list(gamename) as gamelist
from data
group by uname
collect_set():将分组的某一列转换为一个数组返回,进行去重
select
uname,
collect_set(gamename) as gamelist
from data
group by uname
那我不想要返回列是数组,怎么办
使用concat()``concat_ws()
函数就好(对数组内的元素做一个拼接),两者差异简单来说concat会把"null"值已关注进去,而ws则不会
select
uname,
concat_ws(',',collect_list(gamename)) as gamelist
from data
group by uname
列转行解决
explode()炸裂函数
具体介绍,后面会有新文章。只需要记住explode函数不要直接在select子句中使用(如果select 中 仅有一个 explode函数的字段,则允许这样的操作)。在标准SQL中,不允许在投影阶段(select子句)阶段修改行的数量,因此需要借助lateral view来实现行拓展的操作。lateral view允许你与用户定义的表生成函数(UDTF)一起工作,explode就是这样一个UDTF函数。
解决
data2 as (
select
uname,
collect_list(gamename) as gamelist
from data
group by uname
)
select
uname,
game as gamename
from data2
lateral view explode(gamelist) a as game;
拓展
实际场景中,比如数仓表开发的时候,维度值是游戏名称,里面有穿越火线啊CSGO啊之类的,但是最终我们还希望得到一个all值代表全部游戏,方便下游表的统计分析,这时候可能就得在explode函数的传入参数中动点手脚了。
with data as (
select "Tom" as uname, "传奇" as gamename, "20240203" as `date` union all
select "Tom" as uname, "英雄联盟" as gamename, "20240203" as `date` union all
select "Jack" as uname, "穿越火线" as gamename, "20240203" as `date` union all
select "Jack" as uname, "穿越火线" as gamename, "20240203" as `date` union all
select "Jack" as uname, "CSGO" as gamename, "20240203" as `date`
)
select
uname,
game as gamename
from data
lateral view explode(array("all",gamename)) a as game
group by uname, game