hive之案例分析(grouping sets,lateral view explode, concat_ws)

有这样一组搜索结果数据:

租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List

{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

 这里最关键的是sku对应到命中的关键词:

操作步骤1: 

将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。

    select tenantcode,
        nvl(platform,0) as platform,
        keywords,
        'day' as dim_code,
        '20181221' as dim_value,
        gl['skucode'] as skucode,
        gl['skuname'] as skuname,
        gl['spucode'] as spucode,
        gl['spuname'] as spuname 
    from dw_mdl.m_search_result2
    lateral view explode(goodsList) gl as gl
    where dt = '20181221';

显示如下:

 

操作步骤2:

根据商品,汇总关键词列,这里考虑到平台,时间维度等。

grouping sets 分组汇总数据

collect_set 多行合并并且去重

collect_list 多行合并不去重

with tmp_a as (
    select tenantcode,
        nvl(platform,0) as platform,
        keywords,
        'day' as dim_code,
        '20181221' as dim_value,
        gl['skucode'] as skucode,
        gl['skuname'] as skuname,
        gl['spucode'] as spucode,
        gl['spuname'] as spuname 
    from dw_mdl.m_search_result2
    lateral view explode(goodsList) gl as gl
    where dt = '20181221'
)

select tenantcode, 
    nvl(platform,'all') as platform,
    skucode,
    dim_code,
    dim_value,
    count(skuname) as search_times, 
    collect_set(keywords) as keywords
from tmp_a 
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

 

操作步骤3:

数组转字符串: concat_ws('分隔符',数组)

with tmp_a as (
    select tenantcode,
        nvl(platform,0) as platform,
        keywords,
        'day' as dim_code,
        '20181221' as dim_value,
        gl['skucode'] as skucode,
        gl['skuname'] as skuname,
        gl['spucode'] as spucode,
        gl['spuname'] as spuname 
    from dw_mdl.m_search_result2
    lateral view explode(goodsList) gl as gl
    where dt = '20181221'
),
tmp_b as (
    select tenantcode, 
        nvl(platform,'all') as platform,
        skucode,
        dim_code,
        dim_value,
        count(skuname) as search_times, 
        concat_ws(',',collect_set(keywords)) as keywords
    from tmp_a 
    group by tenantcode,platform,skucode,dim_code,dim_value
    grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
)
select * from tmp_b;

 

是不是太简单了。

 

  • 2
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值