transpose and pivot data in hive

1. 键-值对的形式存储数据

有时候,我们在hive中编写hql的时候,可以用键-值对的形式存储结果。比如有一些同学在一些课程中的学习行为,我们可以记录成如下形式:

studentclass_namekeyvalue
Aenglishanswer_num9
Achineseanswer_num15
Benglishanswer_num12
Aenglishhomework_num5
Benglishhomework_num7
Bchinesehomework_num10

这样做的好处有以下几点。一是可以避免hql中出现大量的join连接,使得hql的冗长;二是便于修改表,比如说,在有个同学的学习特征提取错误的时候,可以直接删除这个key,而不用再重写修改hql,重新跑hql。在将数据存储成这种形式后,我们需要将其转换成普通的形式。本文主要记录如何通过hive中的内置函数,来进行数据的转换。

假设上述的表格存储在tmp的表中。

create table tmp as
select 'A' as student, 'english' as class_name, 'answer_num' as key, 9 as value from dual
union all
select 'A' as student, 'chinese' as class_name, 'answer_num' as key, 15 as value from dual
union all
select 'B' as student, 'english' as class_name, 'answer_num' as key, 12 as value from dual
union all
select 'A' as student, 'english' as class_name, 'homework_num' as key, 5 as value from dual
union all
select 'B' as student, 'english' as class_name, 'homework_num' as key, 7 as value from dual
union all
select 'B' as student, 'chinese' as class_name, 'homework_num' as key, 10 as value from dual;

则转化步骤如下:

2. step1 –> map()函数

用map()函数来将key,value映射成<键,值>对的形式。

select student, class_name,
       map(key,value) as group_map
from   tmp
;

运行结果为:

A       english {"answer_num":9}
A       chinese {"answer_num":15}
B       english {"answer_num":12}
A       english {"homework_num":5}
B       english {"homework_num":7}
B       chinese {"homework_num":10}

2. step2 –> collect_list()函数

用collect_list() 函数将key值拆解出来作为字段,而相应的value值转换为list。

select t1.student, t1.class_name,
       collect_list(t1.group_map['answer_num']) as answer_num,
       collect_list(t1.group_map['homework_num']) as homework_num
from   ( -- step1
         select student, class_name,
                map(key,value) as group_map
         from   tmp
       ) t1
group by t1.student, t1.class_name
;

运行结果为

A       chinese [15]    []
A       english [9]     [5]
B       chinese []      [10]
B       english [12]    [7]

另外,也可以用 collect_set() 函数。

3. step3 –> 处理step2的结果

set hive.cli.print.header = true
;

select t1.student,
       t1.class_name,
       answer_num[0] as answer_num,
       homework_num[0] as homework_num
from   ( -- step2
         select t1.student, t1.class_name,
                collect_list(t1.group_map['answer_num']) as answer_num,
                collect_list(t1.group_map['homework_num']) as homework_num
         from   ( -- step1
                  select student, class_name,
                         map(key,value) as group_map
                  from   tmp
                ) t1
         group by t1.student, t1.class_name
       ) t1
;

运行结果为

t1.student  t1.class_name   answer_num     homework_num
A             chinese          15              NULL
A             english          9               5
B             chinese          NULL            10
B             english          12              7
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值