hive和presto的求数组长度函数区别及注意事项

1、任务

获取邮箱字符串’@'后字符串 ,求长度

2、hive & spark-sql 求数组长度的函数 size


hive & spark-sql 求数组长度的函数 size


select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
FROM 
(select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;

select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
FROM 
(select 'jack@126.com' as email union select 'tom@126.com.cn' as email) tb_mid;


2	["tom","126.com.cn"]	tom	126.com.cn
2	["jack","126.com"]	jack	126.com
Time taken: 0.723 seconds, Fetched 2 row(s)

3、presto  求数组长度的函数 cardinality

presto  求数组长度的函数 cardinality

select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM 
(select 'jack@126.com' as email union select 'tom@126.com.cn' as email) tb_mid;

_col0 |       _col1       | _col2 |   _col3    
-------+-------------------+-------+------------
     2 | [tom, 126.com.cn] | tom   | 126.com.cn 
     2 | [jack, 126.com]   | jack  | 126.com    
(2 rows)


select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM 
(select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;


Query 20231019_070945_20009_n9u2s failed: line 3:9: Column 'jack@126.com' cannot be resolved
select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM
(select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid

4、注意事项

1)、在计算数组长度的时候,hive和presto的函数不同
  其中hive的size函数默认数组的下标从0开始
  presto的cardinality函数默认数组的下标从1开始

2)、presto 不支持双引号 ,而hive 既支持单引号,也支持双引号

presto> SELECT 
     -> email,
     -> (case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
     -> FROM 
     -> (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;
Query 20231016_070153_17958_p9f2s failed: line 5:9: Column 'jack@126.com' cannot be resolved
SELECT
email,
(case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
FROM
(select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
Hive中,要一个数组的长度可以使用函数size()来实现。然而,在某些情况下,当数组中的元素为空时,使用size()函数可能会出现问题。为了解决这个问题,可以使用if语句来判断数组是否为空,然后根据情况返回相应的值。例如,可以使用以下语句来计算数组的长度: select rg, sum(if(length(p_9)==0, 0, size(split(p_9, ",")))) from ttengine_api_data where dt='2017-08-07' group by rg; 这个查询语句中,使用了if语句来判断数组p_9是否为空。如果为空,则返回0,否则使用size(split(p_9, ","))来计算数组的长度。通过这种方式,可以正确地计算数组的长度。\[3\] #### 引用[.reference_title] - *1* [Hive获取array数组长度](https://blog.csdn.net/qq_31573519/article/details/77542756)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v91^insertT0,239^v3^insert_chatgpt"}} ] [.reference_item] - *2* *3* [hive size计算数组长度的一个坑](https://blog.csdn.net/liuxiao723846/article/details/76977365)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v91^insertT0,239^v3^insert_chatgpt"}} ] [.reference_item] [ .reference_list ]

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值