Hive UDTF

1、json_tuple

        hive 0.7中引入了一个新的UDTF。它接受一组名称(键)和一个json字符串,并使用一个函数返回一个值元组。
        只解析一次JSON字符串,比调用get_json_object()从单个JSON字符串中检索多个键(多次解析)要高效得多。
        由于JSON_元组是一个UDTF,因此需要搭配lateral view语法来使用。

select a.timestamp, get_json_object(a.appevents, '$.eventid'), get_json_object(a.appenvets, '$.eventname') from log a;

应该写成

select a.timestamp, b.f1, b.f2
from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2;


2、parse_url

SELECT 
  parse_url(url, 'PROTOCOL')    as `协议`,
  parse_url(url, 'HOST')        as `主机名`,
  parse_url(url, 'PATH')        as `路径`,
  parse_url(url, 'QUERY')       as `参数列表`,
  parse_url(url, 'QUERY', 'k1') as `参数值`
FROM
(
  select "http://facebook.com/path1/path2/p.php?k1=v1&k2=v2#Ref1" as url
) t;

+-------+---------------+---------------------+--------------+--------+
| 协议  |    主机名     |        路径         |   参数列表   | 参数值 |
+-------+---------------+---------------------+--------------+--------+
| http  | facebook.com  | /path1/path2/p.php  | k1=v1&k2=v2  | v1     |
+-------+---------------+---------------------+--------------+--------+

同理,改成


SELECT b.host, b.path, b.query, b.query_id FROM
(
  select "http://facebook.com/path1/path2/p.php?k1=v1&k2=v2#Ref1" as url
) t
LATERAL VIEW parse_url_tuple(url, 'HOST', 'PATH', 'QUERY', 'QUERY:k1') b as host, path, query, query_id;

+---------------+---------------------+--------------+-------------+
|    b.host     |       b.path        |   b.query    | b.query_id  |
+---------------+---------------------+--------------+-------------+
| facebook.com  | /path1/path2/p.php  | k1=v1&k2=v2  | v1          |
+---------------+---------------------+--------------+-------------+

3、explode

1)select列中不能udtf和非udtf列混用(可以用LATERAL VIEW explode实现) Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
2)udtf不能嵌套
3)不支持 GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY

select explode(array(1,3,5)) as id;

+-----+
| id  |
+-----+
| 1   |
| 3   |
| 5   |
+-----+
select explode(map(1, '张三', 2, '李四')) as (map_key, map_value);

+----------+------------+
| map_key  | map_value  |
+----------+------------+
| 1        | 张三       |
| 2        | 李四       |
+----------+------------+
select posexplode(array(1,3,5)) as (pos, id);
+------+-----+
| pos  | id  |
+------+-----+
| 0    | 1   |
| 1    | 3   |
| 2    | 5   |
+------+-----+

4、LATERAL VIEW

Multiple Lateral View可以实现类似笛卡尔乘积

SELECT id, name
FROM 
(
  select 
    array(10, 11)       as id_array, 
    array('tom','mary') as name_array
) t
LATERAL VIEW explode(id_array) r1 AS id
LATERAL VIEW explode(name_array) r2 AS name;

+-----+-------+
| id  | name  |
+-----+-------+
| 10  | tom   |
| 10  | mary  |
| 11  | tom   |
| 11  | mary  |
+-----+-------+

UDTF的结果为空则不会输出,造成数据丢失

SELECT id, name
FROM 
(
  select 
    1 as id,
    array() as name_array
) t
LATERAL VIEW explode(name_array) r AS name;

+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+

Outer关键字可以把不输出的UDTF的空结果,输出成NULL,防止丢失数据

SELECT id, name
FROM 
(
  select 
    1 as id,
    array() as name_array
) t
LATERAL VIEW OUTER explode(name_array) r AS name;

+-----+-------+
| id  | name  |
+-----+-------+
| 1   | NULL  |
+-----+-------+


 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值