关于HQL使用count()报错的坑

报错信息:AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT user_id)
deviating function: count(DISTINCT CASE WHEN status = 1 THEN user_id
ELSE NULL END) Consider using NDV() instead of COUNT(DISTINCT)
if estimated counts are acceptable. Enable the APPX_COUNT_DISTINCT
query option to perform this rewrite automatically.

报错代码如下

select 
	grad_id,
    count(distinct user_id) as total_users,
    count(distinct case when status = 1 then user_id else null end) as distinct_total_users
from 
    student_center_info
group by 
    grad_id;

运行报错信息如下:
AnalysisException: all DISTINCT aggregate functions need to have the
same set of parameters as count(DISTINCT user_id)
deviating function: count(DISTINCT CASE WHEN status = 1 THEN user_id
ELSE NULL END) Consider using NDV() instead of COUNT(DISTINCT)
if estimated counts are acceptable. Enable the APPX_COUNT_DISTINCT
query option to perform this rewrite automatically.

出现这个个报错的原因是因为count()按某个列统计时,只能使用一次,如上面的代码,使用两次count()对user_id统计时,就会报错,解决办法如下:

select 
    a.grad_id,
    a.total_user,
    b.total_users
from
    (select 
        grad_id,
        count(distinct user_id) as total_users
    from 
        student_center_info
    group by 
        grad_id) as a
 left join
     (select 
            grad_id,
            count(distinct case when status = 1 then user_id else null end) as total_users
        from 
            student_center_info
    group by 
        grad_id) as b
 on 
     a.grad_id = b.grad_id 

如果用impala的话,可以用ndv()函数来代替,但是ndv()这个去重不是准确的,如果对数据要求精度比较高的,不建议使用ndv()。而且hive对它的支持也不是很友好。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值