concat_ws & collect_set:
应用1:
user | each_id |
user_id1 | id1 |
user_id1 | id2 |
user_id2 | id1 |
user_id2 | id2 |
user_id2 | id3 |
user_id3 | id1 |
user | ids |
user_id1 | id1,id2 |
user_id2 | id1,id2,id3 |
user_id3 | id1 |
select
dt,
user_id,
concat_ws(',',collect_set(each_id)) as ids
from table_a
group by dt,user_id
应用2:
画像格式【0.96:时尚:text1,0.0:音乐:text2,1.0:游戏:text3】
select
uuid,
concat_ws(',',collect_set( concat( cast(weight_score as string), ':' , cast(first_cate as string) , ':' , nvl(first_text,'-') ) ) ) as first_category
from table_temp
collect_set的作用:
(1)去重,对group by后面的user_id进行去重
(2)对group by以后属于同一user_id的形成一个集合,结合concat_ws对集合中元素使用,进行分隔形成字符串
wm_concat
select dt
,appid
,wm_concat(concat(cnt,':',uv),'|','asc',cnt) as uv_str
from(
select dt
,appid
,cnt
,count(distinct user_id) as uv
from test_table
group by dt,appid,cnt
)tmp
group by dt,appid
wm_concat功能:把聚合数据拼接成一个字符串,一般搭配group by 使用。
参数:
- col1 : columnName,合并字段
- splitstr : string,分隔符
- asc | desc : string,升降序
- col2 : columnName,排序字段
sort_array & collect_set & concat_ws(等同wm_concat )
select dt
,appid
,concat_ws('|',sort_array(collect_set(concat_ws(':',cnt,uv))))as uv_str
from(
select dt
,appid
,cnt
,count(distinct user_id) as uv
from test_table
group by dt,appid,cnt
)tmp
group by dt,appid
concat_ws功能:使用给定的分隔符将多个输入字符串列连接到一个字符串列中
collect_set功能:聚合函数,返回去除重复元素后的集合。
sort_array功能:集合函数,对输入数组元素按升序或降序排序。空元素将按升序放在返回数组的开头,或按降序放在返回数组的末尾。