hive 高级数据类型使用
用了许久的hive,但是一直都是简单的sql join,sort, order by等,今天有一个业务场景需要使用array数据类型存储数据并进行横表转纵表的转换。mark下以后用了可以查询。
数据样子是这样的。
ID | type_flag | tags |
---|---|---|
10001 | 3 | 11_20_30,11_22_34,12_23_30,13_24_36 |
10002 | 2 | 11_20,11_22,12_23,13_24 |
10003 | 1 | 11,12 |
表格1
需要转化成的样子如下:
ID | type_flag | tag1 | tag2 | tag3 |
---|---|---|---|---|
10001 | 3 | 11 | 20 | 30 |
10001 | 3 | 11 | 22 | 34 |
10001 | 3 | 12 | 23 | 30 |
10001 | 3 | 13 | 24 | 36 |
10002 | 2 | 11 | 20 | |
10002 | 2 | 11 | 22 | |
10002 | 2 | 12 | 23 | |
10002 | 2 | 13 | 24 | |
10003 | 1 | 11 | ||
10003 | 1 | 12 |
表格2
- 创建表tmp_type_tags存储表格1的数据
<code class="hljs cpp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">create table tmp.tmp_type_tags ( id bigint , type_flag <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span>, tags <span class="hljs-stl_container" style="box-sizing: border-box;"><span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">array</span> < <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> ></span> ) row format delimited fields terminated by <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'\t'</span> COLLECTION ITEMS TERMINATED BY <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">','</span> stored as textfile;</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li></ul>
- 创建表tmp_type_tag_split存储转化中间的数据
<code class="hljs cpp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">create table tmp.tmp_type_tag_split ( id bigint , type_flag <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span>, tag <span class="hljs-stl_container" style="box-sizing: border-box;"><span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">array</span> < <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> ></span> ) row format delimited fields terminated by <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'\t'</span> COLLECTION ITEMS TERMINATED BY <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'_'</span> stored as textfile;</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li></ul>
- 第一步横转纵
将数据按照第一层分隔符,转化成数据size那么多的行数。sql如下
<code class="hljs sql has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-operator" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">insert</span> overwrite <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">table</span> tmp.tmp_type_tag_split <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">select</span> id , type_flag, split(tag0, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'_'</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> tmp.tmp_type_tags lateral <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">view</span> explode(tags) r1 <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag0</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li></ul>
数据结果:
- 第二步按照类型进行分列
<code class="hljs sql has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"> <span class="hljs-operator" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">create</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">table</span> tmp.tmp_type_tag_split_info <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">select</span> id , type_flag, (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag1, (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag2, (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag3 <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> tmp.tmp_type_tag_split</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li></ul>
最终表数据结果如下:转换完成