Hive中常用函数 wm_concat & concat_ws & collect_set & sort_array总结

最新推荐文章于 2024-08-25 03:52:23 发布

Lens5935

最新推荐文章于 2024-08-25 03:52:23 发布

阅读量6.6k

点赞数 2

分类专栏： Hive 数据库数据分析文章标签： hive 大数据 sql 数据库 postgresql

本文链接：https://blog.csdn.net/weixin_40983094/article/details/112220844

版权

Hive 同时被 3 个专栏收录

34 篇文章 13 订阅

订阅专栏

数据库

29 篇文章 1 订阅

订阅专栏

数据分析

13 篇文章 2 订阅

订阅专栏

concat_ws & collect_set:

应用1:

user	each_id
user_id1	id1
user_id1	id2
user_id2	id1
user_id2	id2
user_id2	id3
user_id3	id1

user	ids
user_id1	id1,id2
user_id2	id1,id2,id3
user_id3	id1

select 
	dt,
	user_id, 
	concat_ws(',',collect_set(each_id)) as ids
from table_a  
group by dt,user_id

应用2：

画像格式【0.96:时尚:text1,0.0:音乐:text2,1.0:游戏:text3】

select
	uuid,
	concat_ws(',',collect_set( concat(  cast(weight_score as string), ':' , cast(first_cate as string) , ':' , nvl(first_text,'-') ) ) ) as first_category
from table_temp

collect_set的作用：

（1）去重，对group by后面的user_id进行去重

（2）对group by以后属于同一user_id的形成一个集合，结合concat_ws对集合中元素使用，进行分隔形成字符串

wm_concat

select dt
        ,appid
        ,wm_concat(concat(cnt,':',uv),'|','asc',cnt) as uv_str
from(
        select dt
                ,appid
                ,cnt
                ,count(distinct user_id) as uv
        from test_table
        group by dt,appid,cnt
)tmp
group by dt,appid

wm_concat功能：把聚合数据拼接成一个字符串，一般搭配group by 使用。
参数：

col1 : columnName,合并字段
splitstr : string,分隔符
asc | desc : string,升降序
col2 : columnName,排序字段

sort_array & collect_set & concat_ws（等同wm_concat ）

select dt
		,appid
		,concat_ws('|',sort_array(collect_set(concat_ws(':',cnt,uv))))as uv_str
from(
		select dt
				,appid
				,cnt
				,count(distinct user_id) as uv
		from test_table
		group by dt,appid,cnt
)tmp
group by dt,appid

concat_ws功能：使用给定的分隔符将多个输入字符串列连接到一个字符串列中
collect_set功能：聚合函数，返回去除重复元素后的集合。
sort_array功能：集合函数，对输入数组元素按升序或降序排序。空元素将按升序放在返回数组的开头，或按降序放在返回数组的末尾。