Hive中怎么行转列?怎么列转行?

背景引入

首先,行转列列转行是数仓工作中还比较常见的问题,通常与Hive中的collect_list``collect_set``explode等相关知识挂钩。
其次,行转列的sql编写也是面试常考题之一,面试者需要重点关注。
不知道什么是行转列的话,可以看个简陋的例子:

table01:
uid		score						 uid		score_list
01			88			>>>>>		01			88,85,92
01			85			>>>>>		02			76,70
01			92
02			76
02			70

模拟数据

with data as (
  select "Tom" as uname, "传奇" as gamename, "20240203" as `date` union all
  select "Tom" as uname, "英雄联盟" as gamename, "20240203" as `date` union all
  select "Jack" as uname, "穿越火线" as gamename, "20240203" as `date` union all
  select "Jack" as uname, "穿越火线" as gamename, "20240203" as `date` union all
  select "Jack" as uname, "CSGO" as gamename, "20240203" as `date`
)

行转列解决

collect_list():将分组的某一列转换为一个数组返回,不进行去重

select
    uname,
    collect_list(gamename) as gamelist
from data
group by uname

image.png

collect_set():将分组的某一列转换为一个数组返回,进行去重

select
    uname,
    collect_set(gamename) as gamelist
from data
group by uname

image.png

那我不想要返回列是数组,怎么办

使用concat()``concat_ws()函数就好(对数组内的元素做一个拼接),两者差异简单来说concat会把"null"值已关注进去,而ws则不会

select
    uname,
    concat_ws(',',collect_list(gamename)) as gamelist
from data
group by uname

image.png

列转行解决

explode()炸裂函数

具体介绍,后面会有新文章。只需要记住explode函数不要直接在select子句中使用(如果select 中 仅有一个 explode函数的字段,则允许这样的操作)。在标准SQL中,不允许在投影阶段(select子句)阶段修改行的数量,因此需要借助lateral view来实现行拓展的操作。lateral view允许你与用户定义的表生成函数(UDTF)一起工作,explode就是这样一个UDTF函数。

解决

data2 as (
  select
  uname,
  collect_list(gamename) as gamelist
  from data
  group by uname
)

select
    uname,
    game as gamename
from data2
lateral view explode(gamelist) a as game;

image.png

拓展

实际场景中,比如数仓表开发的时候,维度值是游戏名称,里面有穿越火线啊CSGO啊之类的,但是最终我们还希望得到一个all值代表全部游戏,方便下游表的统计分析,这时候可能就得在explode函数的传入参数中动点手脚了。

with data as (
    select "Tom" as uname, "传奇" as gamename, "20240203" as `date` union all
    select "Tom" as uname, "英雄联盟" as gamename, "20240203" as `date` union all
    select "Jack" as uname, "穿越火线" as gamename, "20240203" as `date` union all
    select "Jack" as uname, "穿越火线" as gamename, "20240203" as `date` union all
    select "Jack" as uname, "CSGO" as gamename, "20240203" as `date`
)
select
    uname,
    game as gamename
from data
lateral view explode(array("all",gamename)) a as game
group by uname, game

在这里插入图片描述

  • 5
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值