Hive版本: hive-1.1.0-cdh5.14.2
1. 列转行函数:explode
(1) 输入类型为array
语法:explode(ARRAY a)
返回值:T
描述:展开array,其中每个元素占一行
0: jdbc:hive2://node03:10000> select explode(array('A', 'B', 'C'));
+------+--+
| col |
+------+--+
| A |
| B |
| C |
+------+--+
(2) 输入类型为map
语法:explode(MAP<Tkey,Tvalue> m)
返回值: Tkey,Tvalue
描述:展开map,每个k-v对占一行
0: jdbc:hive2://node03:10000> select explode(map('A',10, 'B',20, 'C',30));
+------+--------+--+
| key | value |
+------+--------+--+
| A | 10 |
| B | 20 |
| C | 30 |
+------+--------+--+
2. 列转行函数(增加行号):posexplode
语法: posexplode(ARRAY a)
返回值: int,T
描述: 展开array,并返回元素在array中的索引号
0: jdbc:hive2://node03:10000> select posexplode(array('A','B','C'));
+------+------+--+
| pos | val |
+------+------+--+
| 0 | A |
| 1 | B |
| 2 | C |
+------+------+--+
3. 列转行函数:inline
语法:inline(ARRAY<STRUCTf1:T1,...,fn:Tn> a)
返回值: T1,…,Tn
描述: 展开嵌套struct的array,每个struct占一行
0: jdbc:hive2://node03:10000> select inline(array(struct('A',10,date '2015-01-01'), struct('B', 20, date '2016-02-02')));
+-------+-------+-------------+--+
| col1 | col2 | col3 |
+-------+-------+-------------+--+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-02-02 |
+-------+-------+-------------+--+
4. json解析函数: json_tuple
语法:json_tuple(string jsonStr,string k1,…,string kn)
返回值: string1,…,stringn
描述:解析jsonstr,并按指定的key返回结果。一次调用,可以返回多个键值,所以比get_json_object高效。
0: jdbc:hive2://node03:10000> set hivevar:jsonstr = '{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} },"email":"amy@only_for_json_udf_test.net","owner":"amy"}';
0: jdbc:hive2://node03:10000> select b.* from
. . . . . . . . . . . . . . > (select ${hivevar:jsonstr} as jsonstr) a
. . . . . . . . . . . . . . > lateral view json_tuple(a.jsonstr, 'email', 'owner') b as email, owner;
+---------------------------------+----------+--+
| b.email | b.owner |
+---------------------------------+----------+--+
| amy@only_for_json_udf_test.net | amy |
+---------------------------------+----------+--+
5. url解析函数: parse_url_tuple
语法: parse_url_tuple(string urlStr,string p1,…,string pn)
返回值: string1,…,stringn
描述:解析urlstr,返回指定部分的值。不同于parse_url(),parse_url_tuple可以一次返回url多个部分的值。可以指定的有效值为: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:
0: jdbc:hive2://node03:10000> set hivevar:urlstr = 'https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1';
0: jdbc:hive2://node03:10000> select parse_url_tuple(${hivevar:urlstr},'HOST','QUERY:k1','REF');
+--------------------+-----+-------+--+
| c0 | c1 | c2 |
+--------------------+-----+-------+--+
| www.tableName.com | v1 | Ref1 |
+--------------------+-----+-------+--+
0: jdbc:hive2://node03:10000> select b.* from
. . . . . . . . . . . . . . > (select ${hivevar:urlstr} as urlstr) a
. . . . . . . . . . . . . . > lateral view parse_url_tuple(a.urlstr,'HOST','QUERY:k1','REF') b as host,k1,ref;
+--------------------+-------+--------+--+
| b.host | b.k1 | b.ref |
+--------------------+-------+--------+--+
| www.tableName.com | v1 | Ref1 |
+--------------------+-------+--------+--+