mysql 数据查重与查重分页

起因是公司的crm录入不规范,有重复数据。

之后考虑到需要手动处理,首先需要自动找出重复的数据

查重要求:

存在多个不允许重复的字段,任一字段重复,则判断为同一个客户。划分到同一重复组中。

查重sql如下

SELECT 
    CONCAT('组', dense_rank() OVER (ORDER BY group_key)) AS group_info,
    id, name, tel
FROM (
    SELECT 
        id, name, tel,
        CONCAT_WS('-', 
            IF(name IS NULL, '', name),  
            IF(tel IS NULL, '', '1')
        ) AS group_key,
        COUNT(*) OVER (PARTITION BY CONCAT_WS('-', 
            IF(name IS NULL, '', name),  
            IF(tel IS NULL, '', '1')
        )) AS group_count
    FROM customer
) AS subquery
WHERE group_count > 1
ORDER BY group_key

通过上述sql可以获得一个查重的sql。

那么接下来的问题是如果重复数据大,我们是需要一个分页的。首先我们需要知道重复了多少组。

用以下sql获取:

SELECT COUNT(*) AS total_count 
FROM (
    SELECT DISTINCT group_info
    FROM (

SELECT 
    CONCAT('组', dense_rank() OVER (ORDER BY group_key)) AS group_info,
    id, name, tel
FROM (
    SELECT 
        id, name, tel,
        CONCAT_WS('-', 
            IF(name IS NULL, '', name),  
            IF(tel IS NULL, '', '1')
        ) AS group_key,
        COUNT(*) OVER (PARTITION BY CONCAT_WS('-', 
            IF(name IS NULL, '', name),  
            IF(tel IS NULL, '', '1')
        )) AS group_count
    FROM customer
) AS subquery
WHERE group_count > 1
ORDER BY group_key


    ) AS data_counts
) AS distinct_groups;

这里的分页应当按照重复组分页,传统的分页就失效了。所以我们就需要手动计算分页。

把分组字段group_info中的组字去掉,按group_info进行大于,小于判断

SELECT 
    dense_rank() OVER (ORDER BY group_key)  AS group_info,
    id, name, tel
FROM (
    SELECT 
        id, name, tel,
        CONCAT_WS('-', 
            IF(name IS NULL, '', name),  
            IF(tel IS NULL, '', '1')
        ) AS group_key,
        COUNT(*) OVER (PARTITION BY CONCAT_WS('-', 
            IF(name IS NULL, '', name),  
            IF(tel IS NULL, '', '1')
        )) AS group_count
    FROM customer
) AS subquery
WHERE group_count > 1
AND group_info>0 AND group_info<11
ORDER BY group_key

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

孤独白鲸

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值