hive中count(distinct)的相关优化

本文探讨了在SQL查询中优化单个及多个`count(distinct)`的方法。针对单个`count(distinct)`,提出了使用分组、加盐和去盐策略来提升性能。对于多`count(distinct)`的情况,通过子查询和`grouping sets`来减少复杂性和提高效率。这些技巧有助于在大数据场景下更高效地执行聚合查询。
摘要由CSDN通过智能技术生成

1、单个count(distinct)

例子:

select count(distinct id) as num from table_name

方法1(简单,但很多时候效果不明显)

select count(id) as num from (select id from table_name group by id) a;

方法2(对分组字段进行加盐操作)

select sum(tag_num)  as num
from
    (select  tag
            ,count(id)  as tag_num--每个标记下的个数
    from
        (select  id
                ,CAST(RAND() * 100 AS BIGINT) tag --随机打上标记,标记为:0-100之间的整数
        from table_name 
        group by id
        ) t
    group by tag
    ) t
;

方法3(对分组字段进行去盐操作)

select sum(part_num)  as num
from
    (select  substr(id, 1, 3)              as id_part  --截取id部分做维度
            ,COUNT(DISTINCT substr(id, 4)) AS part_num --每个被截取的id下的个数
    from table_name
    group by substr(id, 1, 3)
    ) t
;

2、多count(distinct)

例子:

SELECT   id
        ,count(distinct id1)   as num1
        ,count(distinct id2)   as num2
        ,count(distinct id3)   as num3
FROM table_name
group by id
;

方法1(书写复杂,并多次读表,效率低)

select   id
        ,sum(if(logo='col1',num,0))  as num1
        ,sum(if(logo='col2',num,0))  as num2
        ,sum(if(logo='col3',num,0))  as num3
from
    (select id,'col1' as logo,count(id1) as num 
    from
        (SELECT  id,id1   
        FROM table_name
        group by id,id1
        ) t
    group by id
    union all
    select id,'col2' as logo,count(id2) as num 
    from
        (SELECT  id,id2   
        FROM table_name
        group by id,id2
        ) t
    group by id
    union all
    select id,'col3' as logo,count(id3) as num 
    from
        (SELECT  id,id3   
        FROM table_name
        group by id,id3
        ) t
    group by id
    ) t
group by id
;

方法2

SELECT   id
        ,count(id1)   as num1
        ,count(id2)   as num2
        ,count(id3)   as num3
from
    (SELECT   id,id1,id2,id3
    FROM table_name
    group by id,id1,id2,id3
    grouping sets((id,id1),(id,id2),(id,id3))
    ) t
group by id
;

多count(distinct)的情况下,可以结合但单count(distinct)的解决方法就行解决

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值