1、json_tuple
hive 0.7中引入了一个新的UDTF。它接受一组名称(键)和一个json字符串,并使用一个函数返回一个值元组。
只解析一次JSON字符串,比调用get_json_object()从单个JSON字符串中检索多个键(多次解析)要高效得多。
由于JSON_元组是一个UDTF,因此需要搭配lateral view语法来使用。
select a.timestamp, get_json_object(a.appevents, '$.eventid'), get_json_object(a.appenvets, '$.eventname') from log a;
应该写成
select a.timestamp, b.f1, b.f2
from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2;
2、parse_url
SELECT
parse_url(url, 'PROTOCOL') as `协议`,
parse_url(url, 'HOST') as `主机名`,
parse_url(url, 'PATH') as `路径`,
parse_url(url, 'QUERY') as `参数列表`,
parse_url(url, 'QUERY', 'k1') as `参数值`
FROM
(
select "http://facebook.com/path1/path2/p.php?k1=v1&k2=v2#Ref1" as url
) t;
+-------+---------------+---------------------+--------------+--------+
| 协议 | 主机名 | 路径 | 参数列表 | 参数值 |
+-------+---------------+---------------------+--------------+--------+
| http | facebook.com | /path1/path2/p.php | k1=v1&k2=v2 | v1 |
+-------+---------------+---------------------+--------------+--------+
同理,改成
SELECT b.host, b.path, b.query, b.query_id FROM
(
select "http://facebook.com/path1/path2/p.php?k1=v1&k2=v2#Ref1" as url
) t
LATERAL VIEW parse_url_tuple(url, 'HOST', 'PATH', 'QUERY', 'QUERY:k1') b as host, path, query, query_id;
+---------------+---------------------+--------------+-------------+
| b.host | b.path | b.query | b.query_id |
+---------------+---------------------+--------------+-------------+
| facebook.com | /path1/path2/p.php | k1=v1&k2=v2 | v1 |
+---------------+---------------------+--------------+-------------+
3、explode
1)select列中不能udtf和非udtf列混用(可以用LATERAL VIEW explode实现) Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
2)udtf不能嵌套
3)不支持 GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY
select explode(array(1,3,5)) as id;
+-----+
| id |
+-----+
| 1 |
| 3 |
| 5 |
+-----+
select explode(map(1, '张三', 2, '李四')) as (map_key, map_value);
+----------+------------+
| map_key | map_value |
+----------+------------+
| 1 | 张三 |
| 2 | 李四 |
+----------+------------+
select posexplode(array(1,3,5)) as (pos, id);
+------+-----+
| pos | id |
+------+-----+
| 0 | 1 |
| 1 | 3 |
| 2 | 5 |
+------+-----+
4、LATERAL VIEW
Multiple Lateral View可以实现类似笛卡尔乘积
SELECT id, name
FROM
(
select
array(10, 11) as id_array,
array('tom','mary') as name_array
) t
LATERAL VIEW explode(id_array) r1 AS id
LATERAL VIEW explode(name_array) r2 AS name;
+-----+-------+
| id | name |
+-----+-------+
| 10 | tom |
| 10 | mary |
| 11 | tom |
| 11 | mary |
+-----+-------+
UDTF的结果为空则不会输出,造成数据丢失
SELECT id, name
FROM
(
select
1 as id,
array() as name_array
) t
LATERAL VIEW explode(name_array) r AS name;
+-----+-------+
| id | name |
+-----+-------+
+-----+-------+
Outer关键字可以把不输出的UDTF的空结果,输出成NULL,防止丢失数据
SELECT id, name
FROM
(
select
1 as id,
array() as name_array
) t
LATERAL VIEW OUTER explode(name_array) r AS name;
+-----+-------+
| id | name |
+-----+-------+
| 1 | NULL |
+-----+-------+