说明:一个公司有多个业务biz_id,每个业务收款的时候通过不同或者相同的account去收款
业务需要 需要计算出该公司有过滤重复后biz_count 和过滤重复后的account_count
1先来个distinct ,占用内存,容易oom,所以一般都会把distinct改为group by
select company_name,count(distinct biz_id),count(distinct account_id)
from odsiadata.ia_fdw_b_profile_account_info
where company_id ='1001000000113619'
group by company_name
2不用distinct,想到了 collect_set ,个人感觉应该和distinct差不多也会占用内存,有大佬知道哪里有distinct占内存或者collect_set占内存出处留言让我学习下
select company_name,size(collect_set(biz_id)),size(collect_set(account_id))
from odsiadata.ia_fdw_b_profile_account_info
where company_id ='1001000000113619'
group by company_name
3用group by 借鉴https://blog.csdn.net/henrrywan/article/details/86597468 但是他的sql不完整感觉像复制的。。。。不管
select company_name ,
sum(IF(t_type='biz',1,0)) biz_count, --记住用sum。。。之前用count浪费了会时间
sum(if(t_type='account',1,0)) account_num , --记住用sum。。。之前用count浪费了会时间
MIN(created_time)
from(
select company_name,biz_id as col ,"biz" as t_type ,min(created_time) created_time from odsiadata.ia_fdw_b_profile_account_info
group by company_name,biz_id
union
select company_name,account_id as col ,"account" as t_type,min(created_time) created_time from odsiadata.ia_fdw_b_profile_account_info
group by company_name,account_id
)t
GROUP by company_name