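Background:
Play data comes from two sources. For each user, the larger of the two play durations is kept as the final result.
sort_array does not support descending order, so the maximum is read from arr[1]. This only works because there are exactly two sources.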
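select
  name,
  arr,
  arr[0] as min_play_duration_ms,
  arr[1] as max_play_duration_ms  -- largest play duration (valid only with exactly two sources)
from (
  select
    name,
    collect_list(play_duration_ms) as raw_list,
    -- cast to bigint so values sort numerically rather than as strings
    sort_array(collect_list(cast(play_duration_ms as bigint))) as arr
    -- variant when both durations are columns of the same row:
    -- sort_array(array(play_duration_ms, heart_passby_duration)) arr
  from (
    select '1' as name, '200' as play_duration_ms
    union all
    select '1' as name, '205' as play_duration_ms
  ) t
  group by name
) t1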
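arr[1] holds the maximum only when every user has exactly two values; a user present in only one source gets NULL there. A minimal sketch of two more general options, reusing the same inline test data: index the ascending array with size(arr) - 1, or skip the array and aggregate with max(). The cast to bigint assumes the durations are integer milliseconds stored as strings; without it the values are compared as strings, so '1000' would sort before '205'.

```sql
-- Sketch: per-user maximum without assuming exactly two sources.
-- Assumes play_duration_ms holds integer milliseconds stored as strings.
select
  name,
  arr[size(arr) - 1] as max_by_sort,  -- last element of the ascending array
  max_direct                          -- plain aggregate, no array needed
from (
  select
    name,
    sort_array(collect_list(cast(play_duration_ms as bigint))) as arr,
    max(cast(play_duration_ms as bigint)) as max_direct
  from (
    select '1' as name, '200' as play_duration_ms
    union all
    select '1' as name, '205' as play_duration_ms
  ) t
  group by name
) t1;
-- expected for this data: name = '1', max_by_sort = 205, max_direct = 205
```

If only the maximum is needed, max() is the simpler choice; the sorted array is worth keeping only when the other values are also used.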
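The order of a list built by collect_list is not guaranteed, so joining it with concat_ws gives a non-deterministic order. To make the order stable, sort the list with sort_array right after it is built and before it is joined.
Example: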
select
  memberid,
  regexp_replace(
    concat_ws('-',
      sort_array(
        collect_list(
          -- prefix each airline with its legcount so sorting orders by legcount
          concat_ws(':', cast(legcount as string), airways)
        )
      )
    ),
    '\\d\:', ''  -- strip the "<digit>:" sort prefix after joining
  ) hs
from (
  select 1 as memberid, 'A' as airways, 2 as legcount
  union all
  select 1 as memberid, 'B' as airways, 3 as legcount
  union all
  select 1 as memberid, 'C' as airways, 4 as legcount
  union all
  select 1 as memberid, 'D' as airways, 1 as legcount
  union all
  select 1 as memberid, 'E' as airways, 8 as legcount
) as t
group by memberid
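Test data: memberid is the member ID, airways is the airline the member chose when booking, and legcount is the number of flight legs in the order.
Result: memberid = 1, hs = D-A-B-C-E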
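The list elements are strings, so sort_array compares them character by character, and the trick above relies on every legcount being a single digit. A row with legcount 10 would produce '10:F', which sorts before '2:A', and the pattern \d: would strip only '0:', leaving '1F'. A minimal sketch of one workaround, assuming legcount is non-negative and below 1000: zero-pad the sort key with lpad and strip the whole numeric prefix with \d+:. The row with airways 'F' is hypothetical, added only to show a two-digit legcount.

```sql
-- Sketch: zero-pad the sort key so string order matches numeric order.
-- Assumes 0 <= legcount < 1000 (lpad truncates longer strings to 3 chars).
select
  memberid,
  regexp_replace(
    concat_ws('-',
      sort_array(
        collect_list(
          concat_ws(':', lpad(cast(legcount as string), 3, '0'), airways)
        )
      )
    ),
    '\\d+:', ''  -- drop every "<digits>:" prefix after joining
  ) as hs
from (
  select 1 as memberid, 'A' as airways, 2 as legcount
  union all
  select 1 as memberid, 'F' as airways, 10 as legcount  -- hypothetical row
) t
group by memberid;
-- expected: memberid = 1, hs = A-F (without padding the order would be F-A)
```

This also assumes airline codes never contain a digit followed by a colon, since the regular expression would strip that too.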