原groupby 逻辑:
根据 user_id 聚合拼接tag的值
SELECT
user_id_value,
concat_ws(';',collect_set(tag_id))
FROM inputTable
group by user_id_value
现在需要在此基础上, 根据score字段取topn:
SELECT
user_id_value,
concat_ws(';',collect_set(tag_id))
FROM (
SELECT
user_id_value,
tag_id,
row_number() over (partition by user_id_value order by score desc ) id_rank,
score
FROM (
SELECT user_id_value, tag_id, score FROM inputTable
)a
)b
WHERE id_rank < 10
group by user_id_value
参考:
hive分组排序 取top N_longshenlmj的专栏-CSDN博客_hive分组排序取top
user defined functions - Hive getting top n records in group by query - Stack Overflow