1. explode
explode就是将hive一行中复杂的 array 或者 map 结构拆分成多行。
hive wiki对于expolde的解释如下:
explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows.
UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.
eg1.
myTable
Then running the query:
SELECT explode(myCol) AS myNewCol FROM myTable;
will produce:
eg2.
2.lateral view
hive wiki 上的解释如下:
Lateral View Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (‘,’ columnAlias)*
fromClause: FROM baseTable (lateralView)*
Description
Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
An example table with two rows:
SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
The resulting output will be
-----行转列
create table test_1102 ( name string, marks string );
insert into test_1102 values ( 'jason', 'english');
insert into test_1102 values ( 'jason', 'chinese');
insert into test_1102 values ( 'jason', 'math');
insert into test_1102 values ( 'tracy', 'jjj');
insert into test_1102 values ( 'tracy', 'kkkkk');
select
name,
concat_ws(',', collect_set(concat(marks,'""""" """"ooooo')))
from test_1102
group by name;
inline:
SELECT * from ( select array(struct('2019-07-01','2019-07-10')) as date_array )a lateral view inline(date_array) tf as start_dt, end_dt
a.date_array
| tf.start_dt
| tf.end_dt
|
---|---|---|
[{"col1":"2019-07-01","col2":"2019-07-10"}] | 2019-07-01 | 2019-07-10 |
SELECT * from ( select array(struct('2019-06-25',last_day('2019-06-25')),struct(date_add(last_day('2019-06-25'),1),'2019-07-03')) as date_array )a lateral view inline(date_array) tf as start_dt, end_dt
a.date_array
| tf.start_dt
| tf.end_dt
|
---|---|---|
[{"col1":"2019-06-25","col2":"2019-06-30"},{"col1":"2019-07-01","col2":"2019-07-03"}] | 2019-06-25 | 2019-06-30 |
[{"col1":"2019-06-25","col2":"2019-06-30"},{"col1":"2019-07-01","col2":"2019-07-03"}] | 2019-07-01 | 2019-07-03 |
SELECT posexplode(split(space(7-1),' ')) as (pos,val)
pos
| val
|
---|---|
0 | |
1 | |
2 | |
3 | |
4 | |
5 | |
6 |
Built-in Table-Generating Functions (UDTF)
Normal user-defined functions, such as concat(), take in a single input row and output a single output row. In contrast, table-generating functions transform a single input row to multiple output rows.
Row-set columns types | Name(Signature) | Description |
---|---|---|
T | explode(ARRAY<T> a) | Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array. |
Tkey,Tvalue | explode(MAP<Tkey,Tvalue> m) | Explodes a map to multiple rows. Returns a row-set with a two columns (key,value) , one row for each key-value pair from the input map. (As of Hive 0.8.0.). |
int,T | posexplode(ARRAY<T> a) | Explodes an array to multiple rows with additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array. |
T1,...,Tn | inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a) | Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive 0.10.) |
T1,...,Tn/r | stack(int r,T1 V1,...,Tn/r Vn) | Breaks up n values V1,...,Vn into r rows. Each row will have n/r columns. r must be constant. |
string1,...,stringn | json_tuple(string jsonStr,string k1,...,string kn) | Takes JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the |
string 1,...,stringn | parse_url_tuple(string urlStr,string p1,...,string pn) | Takes URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the |
from : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
ref:
https://blog.csdn.net/bitcarmanlee/article/details/51926530
http://www.cnblogs.com/judylucky/p/3713774.html
http://chengjianxiaoxue.iteye.com/blog/2268371
https://stackoverflow.com/questions/38064412/converting-columns-to-rows-unpivot-in-hiveql
https://stackoverflow.com/questions/39353425/how-to-convert-columns-into-rows-in-hive