sort_array sorts in ascending order by default. To get a descending sort, three transformations are needed:
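(1) Map the strings to numbers: ROW_NUMBER() OVER(PARTITION BY user_id, cate_level1 ORDER BY date desc)
(2) Map those numbers to decimals while preserving their order: 1-1/rnk. The results all lie in [0, 1), so they compare the same way as strings and as numbers. (A sigmoid transform has a problem here: once the number exceeds 36 the result is essentially always 1.0 and the values can no longer be told apart; it only works for numbers up to 36.)
(3) Add the decimal as an auxiliary column at the head of each string, sort, then use REGEXP_REPLACE to strip it and recover the original sequence. An example follows: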
INSERT OVERWRITE TABLE palgo_feeds.taolive_user_click_lifelong_account_seq PARTITION (ds='20210403')
SELECT user_id, cate_level1_id_1_1,
       -- prefix each item with '!<1-1/rnk>!', sort ascending, join with ';', then strip the prefixes
       REGEXP_REPLACE(CONCAT_WS(';', SORT_ARRAY(COLLECT_SET(CONCAT('!', 1-1/rnk, '!', side_info)))), '!(.*?)!', '') AS side_info_seq,
       COUNT(*) AS true_seq_length
FROM
(
    -- rnk = 1 is the row that comes first in descending order of SPLIT(side_info,'#')[9]
    SELECT user_id, cate_level1_id_1_1, side_info,
           ROW_NUMBER() OVER(PARTITION BY user_id, cate_level1_id_1_1 ORDER BY SPLIT(side_info,'#')[9] DESC) AS rnk
    FROM palgo_feeds.taolive_long_life_user_click_account
    WHERE ds='20210403'
) a
GROUP BY user_id, cate_level1_id_1_1;
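The three steps can be tried outside Hive with a small Python sketch. It is only an emulation under stated assumptions: the names sigmoid and desc_sequence and the sample data are hypothetical, Python's float-to-string conversion stands in for Hive's, and side_info is assumed not to contain the '!' marker (otherwise the final replace would also eat part of the payload).

```python
import math
import re


def sigmoid(x: float) -> float:
    """Plain logistic function, used only to show where it saturates."""
    return 1.0 / (1.0 + math.exp(-x))


def desc_sequence(items):
    """Emulate the query for one group.

    items: list of (date_string, side_info) pairs with distinct dates.
    Returns side_info values joined by ';', newest date first.
    """
    # Step 1: ROW_NUMBER() ... ORDER BY date DESC (rnk starts at 1).
    ranked = sorted(items, key=lambda it: it[0], reverse=True)
    # Steps 2 and 3: prefix each item with '!' + (1 - 1/rnk) + '!'.
    tagged = {f"!{1 - 1 / rnk}!{info}"
              for rnk, (_, info) in enumerate(ranked, start=1)}
    # SORT_ARRAY sorts the strings ascending; CONCAT_WS joins them with ';'.
    joined = ";".join(sorted(tagged))
    # REGEXP_REPLACE(..., '!(.*?)!', '') removes the helper prefixes.
    return re.sub(r"!(.*?)!", "", joined)


if __name__ == "__main__":
    sample = [("20210401", "a"), ("20210403", "c"), ("20210402", "b")]
    print(desc_sequence(sample))
    # In double precision the sigmoid stops distinguishing ranks above 36.
    print(sigmoid(36) < 1.0, sigmoid(37) == 1.0)
```

The helper prefix matters because a plain string sort of the integer ranks would place "10" before "2", whereas every value of 1-1/rnk has the form 0.xxx and keeps its numeric order when compared as text.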