hive 高级数据类型使用

hive 高级数据类型使用

用了许久的hive,但是一直都是简单的sql join,sort, order by等,今天有一个业务场景需要使用array数据类型存储数据并进行横表转纵表的转换。mark下以后用了可以查询。 
数据样子是这样的。

ID type_flag tags
10001 3 11_20_30,11_22_34,12_23_30,13_24_36
10002 2 11_20,11_22,12_23,13_24
10003 1 11,12

表格1

需要转化成的样子如下:

ID type_flag tag1 tag2 tag3
10001 3 11 20 30
10001 3 11 22 34
10001 3 12 23 30
10001 3 13 24 36
10002 2 11 20  
10002 2 11 22  
10002 2 12 23  
10002 2 13 24  
10003 1 11    
10003 1 12    

表格2

  • 创建表tmp_type_tags存储表格1的数据
<code class="hljs cpp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">create table tmp.tmp_type_tags
(
        id bigint       ,
        type_flag <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span>,
        tags <span class="hljs-stl_container" style="box-sizing: border-box;"><span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">array</span> < <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> ></span>
)
row format delimited fields terminated by <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'\t'</span> 
COLLECTION ITEMS TERMINATED BY <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">','</span> stored as textfile;</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li></ul>
  • 创建表tmp_type_tag_split存储转化中间的数据
<code class="hljs cpp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">create table tmp.tmp_type_tag_split
(
        id bigint       ,
        type_flag <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span>,
        tag <span class="hljs-stl_container" style="box-sizing: border-box;"><span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">array</span> < <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> ></span>
)
row format delimited fields terminated by <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'\t'</span> 
COLLECTION ITEMS TERMINATED BY <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'_'</span> stored as textfile;</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li></ul>
  • 第一步横转纵 
    将数据按照第一层分隔符,转化成数据size那么多的行数。sql如下
<code class="hljs sql has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-operator" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">insert</span> overwrite <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">table</span> tmp.tmp_type_tag_split
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">select</span>
        id       ,
        type_flag,
        split(tag0, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'_'</span>)
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span>
        tmp.tmp_type_tags lateral <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">view</span> explode(tags) r1 <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag0</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li></ul>

数据结果: 
tmp_type_tag_split数据

  • 第二步按照类型进行分列
<code class="hljs sql has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"> <span class="hljs-operator" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">create</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">table</span> tmp.tmp_type_tag_split_info <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> 
 <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">select</span>
  id       ,
  type_flag,
  (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag1,
  (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag2,
  (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag3
 <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span>
     tmp.tmp_type_tag_split</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li></ul>

最终表数据结果如下:转换完成 
最终结果

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值