hive场景题之行列转换

最新推荐文章于 2024-05-30 07:15:48 发布

qianchang_na

最新推荐文章于 2024-05-30 07:15:48 发布

阅读量791

点赞数

分类专栏：大数据学习之路文章标签： hive 场景题行列转换

本文链接：https://blog.csdn.net/qianchang_na/article/details/101172600

版权

大数据学习之路专栏收录该内容

4 篇文章 0 订阅

订阅专栏

2.行列转换

行转列：

1、使用case when 查询出多列即可，即可增加列。

列转行:

1、lateral view explode()，使用炸裂函数可以将1列转成多行，被转换列适用于array、map等类型。
lateral view posexplode(数组)，如有排序需求，则需要索引。将数组炸开成两行(索引 , 值),需要
as 两个别名。
2、case when 结合concat_ws与collect_set/collect_list实现。内层用case when，外层用
collect_set/list收集，对搜集完后用concat_ws分割连接形成列。

2.1 编写sql实现行列互换

id userid subject score
1 001 语文 90
2 001 数学 92
3 001 英语 80
4 002 语文 88
5 002 数学 90
6 002 英语 75.5
7 003 语文 70
8 003 数学 85
9 003 英语 90
10 003 政治 82

create table if not exists student1(
id int,
userid string,
subject string,
score double
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/student1.txt' into table student1;

select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid
union
select
if(1=1,'total','t'),
sum(s1),
sum(s2),
sum(s3),
sum(s4),
sum(s5)
from
(select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid) tmp
;

在这里插入图片描述

2.2编写sql实现如下

数据：
t1表
uid tags
1 1,2,3
2 2,3
3 1,2
编写sql实现如下结果：
uid tag
1 1
1 2
1 3
2 2
2 3
3 1
3 2

create table if not exists t1(
uid int,
tags string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/t1.txt' into table t1;

select
uid,
tag
from
t1
lateral view explode(split(tags,",")) t2 as tag
;

在这里插入图片描述

2.3 用户标签连接查询

数据：
T1表:
Tags
1,2,3
1,2
2,3
T2表:
Id lab
1 A
2 B
3 C
根据T1和T2表的数据，编写sql实现如下结果：
ids tags
1,2,3 A,B,C
1,2 A,B
2,3 B,C

create table if not exists T1(
tags string
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/T1.txt' into table T1;

create table if not exists T2(
id int,
lab string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/T2.txt' into table T2;

select
t2.tags,
concat_ws(',',collect_list(t2.lab))
from
(select
t1.tags,
t2.lab
from
(select
tags,
tag
from T1 lateral view explode(split(tags,","))tags as tag) t1
left join T2 t2
on t1.tag=t2.id) t2
group by t2.tags
;

在这里插入图片描述

2.4 用户标签组合

t1表：
id tag flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8
编写sql实现如下结果：
id tag flag
a b 1|2|3
c d 6|8

create table if not exists t3(
id string,
tag string,
flag string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/t3.txt' into table t3;

select
id,
tag,
concat_ws('|',collect_list(distinct flag))
from t3
group by id,tag
;

在这里插入图片描述

2.5 用户标签行列互换

t1表
uid name tags
1 goudan chihuo,huaci
2 mazi sleep
3 laotie paly
编写sql实现如下结果：
uid name tag
1 goudan chihuo
1 goudan huaci
2 mazi sleep
3 laotie paly

create table if not exists t4(
id string,
name string,
tags string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/t4.txt' into table t4;

select
id,
name,
tag
from t4 lateral view explode(split(tags,","))tag as tag
;

在这里插入图片描述

2.6 hive实现词频统计

![2.6](C:\Users\pc\Desktop\hive场景题\img\2.6.png)t1表：
uid contents
1 i|love|china
2 china|is|good|i|i|like
统计结果如下,如果出现次数一样，则按照content名称排序：
content cnt
i 3
china 2
good 1
like 1
love 1
is 1

create table if not exists wc(
uid string,
contents string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/wc.txt' into table wc;

select
word word,
count(1) c
from wc lateral view explode(split(contents,'\\|')) wc as word
group by word
order by c desc,word asc
;

在这里插入图片描述

2.7 课程行转列

t1表
id course
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e
根据编写sql，得到结果如下(表中的1表示选修，表中的0表示未选修)：
id a b c d e f
1 1 1 1 0 1 0
2 1 0 1 1 0 1
3 1 1 1 0 1 0

create table if not exists course(
id string,
course string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/course.txt' into table course;

select
id,
sum(case when course="a" then 1 else 0 end),
sum(case when course="b" then 1 else 0 end),
sum(case when course="c" then 1 else 0 end),
sum(case when course="d" then 1 else 0 end),
sum(case when course="e" then 1 else 0 end),
sum(case when course="f" then 1 else 0 end)
from course
group by id
;

在这里插入图片描述

2.8 兴趣行转列

t1表
name sex hobby
janson 男 打乒乓球、游泳、看电影
tom 男 打乒乓球、看电影
hobby最多3个值，使用hql实现结果如下:
name sex hobby1 hobby2 hobby3
janson 男 打乒乓球 游泳 看电影
tom 男 打乒乓球 看电影

create table if not exists hobby(
name string,
sex string,
hobby string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/hobby.txt' into table hobby;

select
name,
sex,
concat_ws(' ',collect_list(hob))
from
(select
name,
sex,
hob
from hobby lateral view explode(split(hobby,'、')) hobs as hob) t1
group by name,sex
;

在这里插入图片描述

2.9 用户商品行列互换

t1表：
用户 商品
A P1
B P1
A P2
B P3
请你使用hql变成如下结果:
用户 P1 P2 P3
A 1 1 0
B 1 0 1

create table if not exists sales(
name string,
sale string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/sales.txt' into table sales;

select
name,
sum(case when sale='P1' then 1 else 0 end) s1,
sum(case when sale='P2' then 1 else 0 end) s2,
sum(case when sale='P3' then 1 else 0 end) s3
from sales
group by name
;

在这里插入图片描述

2.10 成绩课程行列互换

t1表：
name course score
aa English 75
bb math 85
aa math 90
使用hql输出以下结果
name English math
aa 75 90
bb 0 85

create table if not exists score(
name string,
course string,
score double
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/score.txt' into table score;

select
name,
sum(case when course='English' then score else 0 end),
sum(case when course='math' then score else 0 end)
from score
group by name
;

在这里插入图片描述

2.11 求top3英雄及其pick率

pick表
id names
1 亚索,挖掘机,艾瑞莉娅,洛,卡莎
2 亚索,盖伦,奥巴马,牛头,皇子
3 亚索,盖伦,艾瑞莉娅,宝石,琴女
4 亚索,盖伦,赵信,老鼠,锤石

create table if not exists pick(
id string,
names string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;

load data local inpath '/root/hivedata/pick.txt' into table pick;

select
name,
count(1) c
from
(select
name
from pick lateral view explode(split(names,',')) names as name) t1
group by name
order by c desc,name asc
limit 3
;

在这里插入图片描述

qianchang_na

关注

0
点赞
踩
3

收藏

觉得还不错? 一键收藏
0
评论
hive场景题之行列转换

2.行列转换行转列：1、使用case when 查询出多列即可，即可增加列。列转行:1、lateral view explode()，使用炸裂函数可以将1列转成多行，被转换列适用于array、map等类型。lateral view posexplode(数组)，如有排序需求，则需要索引。将数组炸开成两行(索引 , 值),需要as 两个别名。2、case when 结合concat_...
复制链接

扫一扫

专栏目录