distinct去重多个字段,其他字段不去重,一起输出
例如:
select AA, BB, CC
from tableName;
要求是对AA,BB这两个字段都去重,CC是不需要去重的,最后一起输出
求解!!!
下面的内容全部是在Hive/Sparl SQL中实现的, 因为MySQL5.7这种没有窗口函数
关于distinct需要去重多个字段的问题, 下面的例子希望可以给你们带来启发, 大家一起讨论一下
distinct去重多个字段可以考虑 row_number() over(partition by 字段1[,字段2] order by 日期) 这种形式:
原数据:
-- 原表中中的数据:
select id
,substring(recode_date, 1, 10) as create_date
,device_id
from dwd.dwd_ce_user_log
where substring(recode_date, 1, 10) = '2020-08-24'
and device_id = '00de6cbcf0bbc87f';
id create_date device_id
76052968 2020-08-24 00de6cbcf0bbc87f
76052973 2020-08-24 00de6cbcf0bbc87f
76061303 2020-08-24 00de6cbcf0bbc87f
76061309 2020-08-24 00de6cbcf0bbc87f
76090080 2020-08-24 00de6cbcf0bbc87f
76129054 2020-08-24 00de6cbcf0bbc87f
76139129 2020-08-24 00de6cbcf0bbc87f
76039128 2020-08-24 00de6cbcf0bbc87f
76039129 2020-08-24 00de6cbcf0bbc87f
76042362 2020-08-24 00de6cbcf0bbc87f
76042363 2020-08-24 00de6cbcf0bbc87f
用distinct去重多个字段
-- 用distinct去重多个字段
select distinct substring(recode_date, 1, 10) as create_date
,device_id
from dwd.dwd_ce_user_log
where substring(recode_date, 1, 10) = '2020-08-24'
and device_id = '00de6cbcf0bbc87f';
create_date device_id
2020-08-24 00de6cbcf0bbc87f
用row_number() over(partition by 字段1[,字段2] order by 字段)
-- 用row_number() over(partition by 字段1[,字段2] order by 字段)
select substring(recode_date, 1, 10) as create_date
,device_id
from (
select *, row_number() over(partition by device_id, substring(recode_date,1,10) order by recode_date desc) rank
from dwd.dwd_ce_user_log
) tmp
where tmp.rank =1
and substring(recode_date, 1, 10) = '2020-08-24'
and device_id = '00de6cbcf0bbc87f';
create_date device_id
2020-08-24 00de6cbcf0bbc87f