您可以使用以下方式获得仅观看一个类别的用户:
select user_id, min(category) as category, sum(view_hour)
from kcp_01
group by user_id
having min(category) = max(category);
请注意 min(category) 是类别,因为只有一个类别 .
在Hive中,您可能需要执行以下操作:
select user_id, min(category) as min_category,
max(category) as max_category, sum(view_hour)
from kcp_01
group by user_id
having min_category = max_category;
如果您想要每个类别的此类用户数,那么子查询会很有帮助:
select category, count(*) as number_single_users
from (select user_id, min(category) as min_category, max(category) as max_category, sum(view_hour) as view_hours
from kcp_01
group by user_id
having min_category) = max_category
) u
group by category;
另一种方法是使用 not exists :
select k.category, count(distinct k.user_id)
from kcp_01 k
where not exists (select 1
from kcp_01 k2
where k2.user_id = k.user_id and
k2.category <> k.category
)
group by k.category;
我记得 - 曾几何时 - Hive对 count(distinct) 有困难,所以两个 group by 方法可能会更好 .