【MySQL】记一次数据去重

有时候,因为开发时,考虑不当,会造成数据重复。

背景:

前几天做项目的时候,有一个给ip打标签的操作,但是当数据数量多了以后,发现,同一个ip被打了两次相同的标签,导致数据显示重复,然而,已经使用了一段时间了,只能在原有数据库上删除重复数据,然后修改添加标签的逻辑了,去重操作如下:

mysql> select * from vip_port_tag;
+-------+----------------+-------+-------------------+
| id    | vip            | port  | tag               |
+-------+----------------+-------+-------------------+
| 32446 | 10.172.95.38   | 19527 | ttttttttttttttttt |
| 32447 | 111.206.52.82  |    80 | ttttttttttttttttt |
| 32448 | 221.181.72.212 |   443 | ttttttttttttttttt |
| 32449 | 182.118.31.28  |  2229 | ttttttttttttttttt |
| 32450 | 10.172.95.38   | 19527 | ttttttttttttttttt |
| 32451 | 111.206.52.82  |    80 | ttttttttttttttttt |
+-------+----------------+-------+-------------------+
6 rows in set

mysql> 

可以很明显的看到,vip为10.172.95.38,port为19527还有vip为111.206.52.82,port为90的被连续两次打了相同的标签。

去重思想,根据同一tag,还有vip,port,分组,然后统计每组的个数,大于1的,即为有重复的。

mysql> select id,count(*) as cnt,tag,vip,port from vip_port_tag group by tag,vip,port;
+-------+-----+-------------------+----------------+-------+
| id    | cnt | tag               | vip            | port  |
+-------+-----+-------------------+----------------+-------+
| 32446 |   2 | ttttttttttttttttt | 10.172.95.38   | 19527 |
| 32447 |   2 | ttttttttttttttttt | 111.206.52.82  |    80 |
| 32449 |   1 | ttttttttttttttttt | 182.118.31.28  |  2229 |
| 32448 |   1 | ttttttttttttttttt | 221.181.72.212 |   443 |
+-------+-----+-------------------+----------------+-------+
4 rows in set

cnt那一列,大于1的就为重复列

mysql> select id,count(*) as cnt,tag,vip,port from vip_port_tag where id>32445 group by tag,vip,port having cnt>1;
+-------+-----+-------------------+---------------+-------+
| id    | cnt | tag               | vip           | port  |
+-------+-----+-------------------+---------------+-------+
| 32446 |   2 | ttttttttttttttttt | 10.172.95.38  | 19527 |
| 32447 |   2 | ttttttttttttttttt | 111.206.52.82 |    80 |
+-------+-----+-------------------+---------------+-------+
2 rows in set

检验

mysql> select * from vip_port_tag where vip="10.172.95.38" and port="19527";
+-------+--------------+-------+-------------------+
| id    | vip          | port  | tag               |
+-------+--------------+-------+-------------------+
| 32446 | 10.172.95.38 | 19527 | ttttttttttttttttt |
| 32450 | 10.172.95.38 | 19527 | ttttttttttttttttt |
+-------+--------------+-------+-------------------+
3 rows in set

然后就可以考虑删除哪一项,比如保留id较小的那个,则选出id最大的那一项

mysql> select max(id) as id,count(*) as cnt,tag,vip,port from vip_port_tag where id>32445 group by tag,vip,port having cnt>1;
+-------+-----+-------------------+---------------+-------+
| id    | cnt | tag               | vip           | port  |
+-------+-----+-------------------+---------------+-------+
| 32450 |   2 | ttttttttttttttttt | 10.172.95.38  | 19527 |
| 32451 |   2 | ttttttttttttttttt | 111.206.52.82 |    80 |
+-------+-----+-------------------+---------------+-------+
2 rows in set

然后根据id执行删除操作即可

mysql> delete from vip_port_tag where id in (select id from (select max(id) as id,count(*) as cnt,tag,vip,port from vip_port_tag group by tag,vip,port having cnt>1) as t);
Query OK, 2 rows affected

再查看原数据

mysql> select * from vip_port_tag;
+-------+----------------+-------+-------------------+
| id    | vip            | port  | tag               |
+-------+----------------+-------+-------------------+
| 32446 | 10.172.95.38   | 19527 | ttttttttttttttttt |
| 32447 | 111.206.52.82  |    80 | ttttttttttttttttt |
| 32448 | 221.181.72.212 |   443 | ttttttttttttttttt |
| 32449 | 182.118.31.28  |  2229 | ttttttttttttttttt |
+-------+----------------+-------+-------------------+
4 rows in set

OK,重复数据,就这么被干掉了。

知识点总结

  • group by语句,可以为多个字段
  • having 选出符合条件的分组
  • count 统计分组个数
  • max 选出分组中id最大的那个
  • 临时表
  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值