Hive: converting columns to rows and rows to columns

1. explode

explode splits a complex array or map value in a single Hive row into multiple rows.

The Hive wiki explains explode as follows:

explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows.

UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.

eg1.

myTable has a single column (myCol) of type array<int>, with two rows:

[100, 200, 300]
[400, 500, 600]

Then running the query:

SELECT explode(myCol) AS myNewCol FROM myTable;

will produce: 

100
200
300
400
500
600

eg2.
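explode also accepts a map, emitting one row per key-value pair with two columns. A minimal sketch (the table and column names here are made up for illustration, assuming score_map is a MAP<string, int>):

-- hypothetical table: student_scores(name string, score_map map<string, int>)
SELECT name, subject, score
FROM student_scores
LATERAL VIEW explode(score_map) t AS subject, score;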

 

2. lateral view

Note that a UDTF such as explode() cannot be combined with other columns in the same SELECT list; to keep the original columns next to the generated rows you need LATERAL VIEW. The Hive wiki explains it as follows:

Lateral View Syntax

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*

Description

Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

An example table pageAds with two columns, pageid (string) and adid_list (array<int>), and two rows:

front_page      [1, 2, 3]
contact_page    [3, 4, 5]

SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

The resulting output will be:

front_page      1
front_page      2
front_page      3
contact_page    3
contact_page    4
contact_page    5
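One caveat: if the array being exploded is empty or NULL, a plain LATERAL VIEW produces no output row at all for that input row. Hive's LATERAL VIEW OUTER keyword keeps such rows, filling the generated columns with NULL, for example:

SELECT pageid, adid
FROM pageAds LATERAL VIEW OUTER explode(adid_list) adTable AS adid;
-- rows whose adid_list is empty are kept, with adid = NULL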

 

----- Rows to columns
create table test_1102 ( name string,  marks string );

insert into test_1102 values ( 'jason', 'english'); 
insert into test_1102 values ( 'jason', 'chinese');
insert into test_1102 values ( 'jason', 'math');
insert into test_1102 values ( 'tracy', 'jjj');
insert into test_1102 values ( 'tracy', 'kkkkk');

select
   name,
   concat_ws(',', collect_set(marks))
from test_1102
group by name;
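Given the inserts above, this should return one row per name with the marks collapsed into a comma-separated string (the element order produced by collect_set is not guaranteed):

jason    english,chinese,math
tracy    jjj,kkkkk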

 

inline:

inline() explodes an array of structs, producing one row per struct and one column per struct field.

SELECT *
from (
select array(struct('2019-07-01','2019-07-10')) as date_array
)a
lateral view inline(date_array) tf as start_dt, end_dt
The output has one row and three columns:

a.date_array                                   tf.start_dt    tf.end_dt
[{"col1":"2019-07-01","col2":"2019-07-10"}]    2019-07-01     2019-07-10

 

SELECT *
from (
select  array(struct('2019-06-25',last_day('2019-06-25')),struct(date_add(last_day('2019-06-25'),1),'2019-07-03')) as date_array
)a
lateral view inline(date_array) tf as start_dt, end_dt
Here last_day('2019-06-25') evaluates to '2019-06-30', and adding one day to it gives '2019-07-01', so the range 2019-06-25 to 2019-07-03 is split at the month boundary and inline() produces two rows:

a.date_array                                                                             tf.start_dt    tf.end_dt
[{"col1":"2019-06-25","col2":"2019-06-30"},{"col1":"2019-07-01","col2":"2019-07-03"}]    2019-06-25     2019-06-30
[{"col1":"2019-06-25","col2":"2019-06-30"},{"col1":"2019-07-01","col2":"2019-07-03"}]    2019-07-01     2019-07-03

 

posexplode is handy for generating a sequence of integers: space(n-1) builds a string of n-1 spaces, split(..., ' ') turns it into an array of n empty strings, and posexplode returns each element together with its 0-based position.

SELECT posexplode(split(space(7-1), ' ')) as (pos, val)

pos    val
0
1
2
3
4
5
6

(val is an empty string in every row.)
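Combined with datediff and date_add, the same trick can expand a date range into one row per day. A minimal sketch (the dates are arbitrary; depending on the Hive version you may need a real one-row table instead of the (SELECT 1) subquery):

SELECT date_add('2019-07-01', i.pos) AS dt
FROM (SELECT 1) dummy
LATERAL VIEW posexplode(split(space(datediff('2019-07-05', '2019-07-01')), ' ')) i AS pos, val;
-- yields 2019-07-01, 2019-07-02, ..., 2019-07-05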

 

Built-in Table-Generating Functions (UDTF)

Normal user-defined functions, such as concat(), take in a single input row and output a single output row. In contrast, table-generating functions transform a single input row to multiple output rows.

Row-set column types | Name (Signature) | Description

T | explode(ARRAY<T> a) | Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.

Tkey,Tvalue | explode(MAP<Tkey,Tvalue> m) | Explodes a map to multiple rows. Returns a row-set with two columns (key, value), one row for each key-value pair from the input map. (As of Hive 0.8.0.)

int,T | posexplode(ARRAY<T> a) | Explodes an array to multiple rows with an additional positional column of int type (the position of items in the original array, starting with 0). Returns a row-set with two columns (pos, val), one row for each element from the array.

T1,...,Tn | inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a) | Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top-level elements in the struct), one row per struct from the array. (As of Hive 0.10.)

T1,...,Tn/r | stack(int r, T1 V1, ..., Tn Vn) | Breaks up the n values V1, ..., Vn into r rows. Each row will have n/r columns. r must be constant.

string1,...,stringn | json_tuple(string jsonStr, string k1, ..., string kn) | Takes a JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the get_json_object UDF because it can get multiple keys with just one call.

string1,...,stringn | parse_url_tuple(string urlStr, string p1, ..., string pn) | Takes a URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the parse_url() UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:<KEY>.
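stack() is the usual way to do the reverse of the collect_set example above, i.e. turning columns into rows (unpivot). A minimal sketch against a made-up table with one column per quarter:

-- hypothetical table: sales(product string, q1 int, q2 int, q3 int, q4 int)
SELECT product, quarter, amount
FROM sales
LATERAL VIEW stack(4,
    'q1', q1,
    'q2', q2,
    'q3', q3,
    'q4', q4) t AS quarter, amount;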

From: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

 

ref: 

https://blog.csdn.net/bitcarmanlee/article/details/51926530

http://www.cnblogs.com/judylucky/p/3713774.html

http://chengjianxiaoxue.iteye.com/blog/2268371

https://stackoverflow.com/questions/38064412/converting-columns-to-rows-unpivot-in-hiveql

https://stackoverflow.com/questions/39353425/how-to-convert-columns-into-rows-in-hive
