Row/Column Conversion in Hive at Work

Author: 吗达拉

Original link: https://blog.csdn.net/weixin_42656794/article/details/91564367


As I understand it, row/column conversion in Hive is the process of turning one row into many rows, or many rows into one.
1. Row-to-column conversion (one row into many)
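In a project I ran into the following problem. After processing, the raw data looked like this:
[Figure: sample of the processed data]
The first field is a topic ID, followed by the IDs of the 30 topics most similar to it.
Because topic IDs the user has already interacted with must be filtered out, the data has to be reshaped into one (topic ID, similar topic ID) pair per row: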
17796 16224
17796 17385
…
Create an external table in Hive that maps onto the data:

create external table category_result(topicid int,simtopicid array<string> ) 
row format delimited fields terminated by '\001' 
collection items terminated by ',' 
location '/recs/category/category_result';
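The following Python sketch shows how the two delimiters work. It is not Hive code. It splits one line of a file under /recs/category/category_result into the two columns, assuming one record per line in the plain delimited text format declared above. The function name parse_category_line is hypothetical.

```python
# Illustrative sketch only: split one raw line the way the DDL above describes.
FIELD_DELIM = "\x01"      # fields terminated by '\001'
COLLECTION_DELIM = ","    # collection items terminated by ','


def parse_category_line(line: str) -> tuple[int, list[str]]:
    """Return (topicid, simtopicid) for one raw line."""
    # First field is topicid (int); everything after the first '\001' is the array.
    topic_field, sim_field = line.rstrip("\n").split(FIELD_DELIM, 1)
    # Array items stay strings, matching array<string> in the table definition.
    return int(topic_field), sim_field.split(COLLECTION_DELIM)


if __name__ == "__main__":
    print(parse_category_line("17796\x0116224,17385\n"))
    # (17796, ['16224', '17385'])
```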

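Row-to-column conversion with LATERAL VIEW explode, which emits one row per element of the simtopicid array: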

create table tmp as
select topicid, simtopic
from category_result
lateral view explode(simtopicid) tmp1 as simtopic;

The result:
[Figure: result of the explode query, one (topicid, simtopic) pair per row]
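The explode step can be mimicked in Python as shown below. This only illustrates the one-row-to-many-rows behavior. lateral_view_explode is a hypothetical helper, not a Hive API.

```python
from typing import Iterable, Iterator


def lateral_view_explode(
    rows: Iterable[tuple[int, list[str]]],
) -> Iterator[tuple[int, str]]:
    """Yield one (topicid, simtopic) row per element of each simtopicid array."""
    for topicid, simtopicids in rows:
        # An empty array produces no output rows, as with explode without OUTER.
        for simtopic in simtopicids:
            yield topicid, simtopic


if __name__ == "__main__":
    for row in lateral_view_explode([(17796, ["16224", "17385"])]):
        print(row)
    # (17796, '16224')
    # (17796, '17385')
```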
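Next, filter out the topics each user has already interacted with. Here user_recs_categroy_tmp1 holds each user's candidate (userinfoid, simtopicid) pairs; the step that builds it is not shown in this post.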

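create table user_recs_categroy as
select a.userinfoid, a.simtopicid
from (select userinfoid, simtopicid from user_recs_categroy_tmp1) a
left join (select userinfoid, topicid from recs_userRalationTopic) b
  -- match on both user and topic, so a topic is removed only for the user who saw it
  on a.userinfoid = b.userinfoid
 and a.simtopicid = b.topicid
-- rows with no match in b are topics this user has not interacted with
where b.topicid is null;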

This produces the following table:
[Figure: filtered (userinfoid, simtopicid) pairs]
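The LEFT JOIN ... WHERE b.topicid IS NULL pattern is a left anti-join. Below is a rough Python equivalent. It assumes both inputs have already been loaded as (userinfoid, topic ID) pairs, with topic IDs as strings. It keeps only the candidate pairs that do not appear in the interaction history. Because it matches on user and topic together, a topic seen by one user is removed only for that user. filter_seen and the sample history are hypothetical.

```python
def filter_seen(
    candidates: list[tuple[int, str]],
    interactions: list[tuple[int, str]],
) -> list[tuple[int, str]]:
    """Keep candidate (userinfoid, topic) pairs the user has not interacted with."""
    seen = set(interactions)  # pairs that would find a match on the right side
    # Unmatched left rows are exactly those where b.topicid would be NULL.
    return [pair for pair in candidates if pair not in seen]


if __name__ == "__main__":
    candidates = [(152541, "15478"), (152541, "17385"), (152541, "15435")]
    interactions = [(152541, "17385"), (999, "15478")]  # hypothetical history
    print(filter_seen(candidates, interactions))
    # [(152541, '15478'), (152541, '15435')]
```

The (999, "15478") row does not remove user 152541's candidate 15478, because the match requires the same user.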
2. Column-to-row conversion (many rows into one)
Processing the table above produces one row per user, with the recommended topic IDs joined by commas:
152541 15478,15435

select userinfoid,
       -- collect_set gathers each user's distinct topic IDs; concat_ws joins them with commas
       concat_ws(',', collect_set(simtopicid)) simtopics
from user_recs_categroy
group by userinfoid
limit 20;

The result:
[Figure: aggregated result, one row per userinfoid]
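GROUP BY combined with concat_ws(',', collect_set(...)) can be sketched in Python as follows. collect_topics is a hypothetical helper. Hive's collect_set removes duplicates but does not guarantee element order. This sketch keeps first-seen order so that its output is deterministic.

```python
from collections import defaultdict


def collect_topics(rows: list[tuple[int, str]]) -> dict[int, str]:
    """GROUP BY userinfoid, then concat_ws(',', collect_set(simtopicid))."""
    groups = defaultdict(list)
    for userinfoid, simtopicid in rows:
        groups[userinfoid].append(simtopicid)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return {user: ",".join(dict.fromkeys(topics)) for user, topics in groups.items()}


if __name__ == "__main__":
    rows = [(152541, "15478"), (152541, "15435"), (152541, "15478")]
    print(collect_topics(rows))
    # {152541: '15478,15435'}
```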
