Input type restrictions of the collect_set() function in SparkSQL

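This post looks at a restriction that SparkSQL's collect_set() function places on the types of its aggregation input: aggregating map-type data directly raises an error. It offers two workarounds: convert the map column to a string before applying collect_set, or reimplement collect_set as a custom UDAF.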
  1. In HQL, collect_set() can aggregate fields of any type;
-- The following query runs in HQL
SELECT	ssoid,
	collect_set(nickname)[0] AS nickname,
	collect_set(nat_code)[0] AS nat_code,
	collect_set(reg_brand)[0] AS reg_brand,
	collect_set(reg_date)[0] AS reg_date,
	collect_set(ip)[0] AS ip,
	collect_set(ip_tags)[0] AS ip_tags, -- map type
	collect_set(ip_info)[0] AS ip_info, -- map type
	collect_set(source)[0] AS source,
	collect_set(md5_imei)[0] AS md5_imei,
	collect_set(model)[0] AS model,
	collect_set(android_version)[0] AS android_version,
	collect_set(os_version)[0] AS os_version,
	collect_set(is_phno_bound)[0] AS is_phno_bound,
	collect_set(phno_tags)[0] AS phno_tags, -- map type
	collect_set(phno_location)[0] AS phno_location,
	collect_set(md5_phno)[0] AS md5_phno,
	collect_set(is_sec_email_bound)[0] AS is_sec_email_bound,
	collect_set(md5_email)[0] AS md5_email,
	collect_set(email_tags)[0] AS email_tags, -- map type
	collect_set(is_certificated)[0] AS is_certificated,
	collect_set(certificate_type)[0] AS certificate_type,
	collect_set(md5_certification)[0] AS md5_certification,
	collect_set(emer)[0] AS emer,
	collect_set(emer_cnt)[0] AS emer_cnt,
	collect_set(extr)[0] AS extr -- map type
FROM (SELECT * FROM orisk_db.ads_orisk_uc_account_inc_d WHERE dayno=${today} LIMIT 100) t1
GROUP BY ssoid
  2. In SparkSQL, collect_set() restricts the types of input fields it can aggregate: map-type columns cannot be aggregated directly.
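
The string-conversion workaround mentioned in the summary can be sketched roughly as follows. This is a minimal illustration, not the author's exact query: the inline VALUES rows and the alias ip_tags_json are hypothetical stand-ins for the real table, and it assumes a Spark version in which to_json accepts map input (CAST(ip_tags AS STRING) is another option, though its text format differs between Spark versions).

```sql
-- Hypothetical inline rows standing in for the real table; ip_tags is a map column.
SELECT	ssoid,
	-- Serialize the map to a JSON string first, so collect_set() sees a string column.
	collect_set(to_json(ip_tags))[0] AS ip_tags_json
FROM VALUES
	(1, map('isp', 'a')),
	(1, map('isp', 'a')),
	(2, map('isp', 'b')) AS t(ssoid, ip_tags)
GROUP BY ssoid;
```

The aggregated column is then a JSON string rather than a map, so any downstream step that expects a map would need to parse it again.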