hive 通过split和collect_list进行特殊行列转换

我想通过SQL将下面这种情况
这里写图片描述
转换为下面这种情况
这里写图片描述
最后的实现SQL是

select bb.rq,
       split(concat_ws(',',collect_list(bb.wdd)),',')[0] as `到餐`,
       split(concat_ws(',',collect_list(bb.wdd)),',')[1] as `外卖`,
       split(concat_ws(',',collect_list(bb.wdd)),',')[2] as `电影`
  from (
        select aa.rq,
               aa.yw,
               aa.wdd
          from (
                select '2018-01-01' as rq,
                       '外卖' as yw,
                       '1.1' as wdd
             union all select '2018-01-02' as rq,
                       '到餐' as yw,
                       '1.2' as wdd
             union all select '2018-01-01' as rq,
                       '电影' as yw,
                       '1.3' as wdd
             union all select '2018-01-02' as rq,
                       '外卖' as yw,
                       '1.1' as wdd
             union all select '2018-01-01' as rq,
                       '到餐' as yw,
                       '2.2' as wdd
             union all select '2018-01-02' as rq,
                       '电影' as yw,
                       '2.3' as wdd
               ) aa
         order by aa.yw
         limit 100000
       ) bb
 group by bb.rq

这里有一个问题就是,不使用split,也就是使用以下SQL跑时

select bb.rq,
       concat_ws(',',collect_list(bb.wdd)) as address_list
  from (
        select aa.rq,
               aa.yw,
               aa.wdd
          from (
                select '2018-01-01' as rq,
                       '外卖' as yw,
                       '1.1' as wdd
             union all select '2018-01-02' as rq,
                       '到餐' as yw,
                       '1.2' as wdd
             union all select '2018-01-01' as rq,
                       '电影' as yw,
                       '1.3' as wdd
             union all select '2018-01-02' as rq,
                       '外卖' as yw,
                       '1.1' as wdd
             union all select '2018-01-01' as rq,
                       '到餐' as yw,
                       '2.2' as wdd
             union all select '2018-01-02' as rq,
                       '电影' as yw,
                       '2.3' as wdd
               ) aa
       ) bb
 group by bb.rq

结果为下图,出来的结果是没办法区分concat_ws()完的字段哪一个对应哪个业务
这里写图片描述

这时候需要在内层将yw字段排序,我排序的结果是下图(emmm,很奇怪这是怎么个排序规则)。所以在上面的代码中split[0]为到餐,[1]为外卖,[2]为电影
这里写图片描述

ps:区分一下collect_set,collect_list,concat_ws(和lateral view explode 后补)的区别
collect_set() 结果去重
collect_list() 结果不去重
concat_ws() 可将数组按指定分隔符链接
运行图后补

阅读更多
版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/qq_20745901/article/details/79977291
个人分类: hive
想对作者说点什么? 我来说一句

没有更多推荐了,返回首页

关闭
关闭
关闭