hive使用collect与explode

一、多行转单行

样例:表test存储了以下数据

hotel           search  id              poi  
90663029        5       15306667        巫山县教委关心下一代工作委员会
90663029        2       15381967        文体广场
90663029        2       15381967        文体广场
90663029        1       15433199        巫山翠屏汽车站
90808282        9       15451636        巫山县教育委员会
90808282        5       15306667        巫山县教委关心下一代工作委员会
90808282        1       15367082        南峰小学

1.collect_set(x)

        返回值: 字段的罗列

        说明:对聚合范围内某字段所有值的非重复罗列。

select mhotel,concat_ws(',',collect_set(id))
  from ggg group by mhotel;

2.collect_list(x)

        返回值: 字段的罗列

        说明:对聚合范围内某字段所有值的罗列(可重复)。

select mhotel,concat_ws(',',collect_list(id)) 
  from ggg group by mhotel;

二、单行转多行(Array或map)

    explode(ARRAY):参数列为数组类型,将数组数据中的每个元素做为一行返回

    explode(MAP):将输入map中的每个键值对转换为两列,然后返回新行

    posexplode(ARRAY):与explode(ARRAY)相似,但包含元素在数组中的位置(pos,value)

样例:

id      mvt
3       NULL
2       [{"val":"1","vid":"142","eid":"154"},{"val":"0","vid":"141","eid":"152"},{"val":"0","vid":"190","eid":"201"},{"val":"1","vid":"147","eid":"160"},{"val":"0","vid":"302","eid":"315"},{"val":"0","vid":"211","eid":"224"},{"val":"1","vid":"133","eid":"144"},{"val":"0","vid":"268","eid":"281"},{"val":"0","vid":"271","eid":"284"},{"val":"0","vid":"258","eid":"271"},{"val":"1","vid":"91","eid":"104"},{"val":"1","vid":"231","eid":"244"}]
1       [{"val":"1","vid":"142","eid":"154"},{"val":"0","vid":"141","eid":"152"},{"val":"0","vid":"190","eid":"201"},{"val":"1","vid":"147","eid":"160"},{"val":"0","vid":"302","eid":"315"},{"val":"0","vid":"211","eid":"224"},{"val":"1","vid":"133","eid":"144"},{"val":"0","vid":"268","eid":"281"},{"val":"0","vid":"271","eid":"284"},{"val":"0","vid":"258","eid":"271"},{"val":"1","vid":"91","eid":"104"},{"val":"1","vid":"231","eid":"244"}]

1.explode(x)

        mvt:array<map<string,string>>

select explode(mvt) from hhh;

2.lateral VIEW explode

        mvt:array<map<string,string>>

select id,mvt_info from hhh 
lateral VIEW explode(mvt) mvts AS mvt_info;

解析ARRAY并获取map的value:

select id,mvt_info['eid'],mvt_info['val'],mvt_info['vid'] from hhh 
lateral VIEW explode(mvt) mvts AS mvt_info

解析ARRAY并解析包含的map:

select id,explode(mvt_info)
 from hhh lateral VIEW explode(mvt) mvts AS mvt_info

3.lateral VIEW posexplode

        mvt:array<map<string,string>>

select id,i,mvt_info from hhh 
lateral VIEW posexplode(mvt) mvts AS i,mvt_info;

4.lateral VIEW outer explode

        mvt:array<map<string,string>>

explode的列为空时,lateral VIEW不会生成行。为避免该数据行不显示可使用outer关键字,它能够保留原数据行内容并将扩展列的列值置为空。

select id,mvt_info from hhh 
lateral VIEW outer explode(mvt) mvts AS mvt_info;

三、单行转多行(string)

样例:  pois:string

mhotel          pois 
90663029        [{"search": 5, "id": 15306667, "poi": "巫山县教委关心下一代工作委员会"}, {"search": 2, "id": 15381967, "poi": "文体广场"}, {"search": 1, "id": 15433199, "poi": "巫
山翠屏汽车站"}]
90808282        [{"search": 9, "id": 15451636, "poi": "巫山县教育委员会"}, {"search": 5, "id": 15306667, "poi": "巫山县教委关心下一代工作委员会"}, {"search": 1, "id": 15367082, "poi": "南峰小学"}]

1.hive String处理

SELECT
  mhotel,
  json_tuple(poi_json,'search','id','poi') AS (search,id,poi)
FROM
  (SELECT
     mhotel,
     regexp_replace(regexp_replace(pois, '\\[|\\]', ''), '\\}, \\{', '\\}&\\{') pois
   FROM iii) t
  lateral VIEW explode(split(pois, '\\&')) pois AS poi_json

2.Shell String处理

#!/bin/bash

y='2018-09-15'

for i in $(seq 1 6)
do
  x=`date -d "$i day $dt" "+%Y-%m-%d"`
  y=$y,$x
done

echo $y
hive -e "select mhotel,dt
from iii lateral view explode(split('$y',',')) dts as dt"

转载于:https://my.oschina.net/puwenchao/blog/1863087

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值