I have following table in hive
user-id, user-name, user-address,clicks,impressions,page-id,page-name
I need to find out top 5 users[user-id,user-name,user-address] by clicks for each page [page-id,page-name]
I understand that we need to first group by [page-id,page-name] and within each group I want to orderby [clicks,impressions] desc and then emit only top 5 users[user-id, user-name, user-address] for each page but I am finding it difficult to construct the query.
How can we do this using HIve UDF ?
解决方案SELECT page-id, user-id, clicks
FROM (
SELECT page-id, user-id, rank(user-id) as rank, clicks
FROM mytable
DISTRIBUTE BY page-id, user-id
SORT BY page-id, user-id, clicks desc
) a
WHERE rank < 5
ORDER BY page-id, rank