How to deduplicate data in a MySQL table with 4 million rows

First, use SELECT COUNT(*) to check how many rows the table holds:

mysql> select count(*) from zyads_integral ;
+----------+
| count(*) |
+----------+
|  4130473 |
+----------+
1 row in set (0.01 sec)

Use DESC to inspect the table structure:

mysql> desc zyads_integral;
+-------+---------+------+-----+---------+----------------+
| Field | Type    | Null | Key | Default | Extra          |
+-------+---------+------+-----+---------+----------------+
| id    | int(11) | NO   | PRI | NULL    | auto_increment |
| hash  | text    | YES  |     | NULL    |                |
| sha1  | text    | NO   |     | NULL    |                |
| name  | text    | NO   |     | NULL    |                |
| index | text    | YES  |     | NULL    |                |
| size  | text    | YES  |     | NULL    |                |
+-------+---------+------+-----+---------+----------------+
6 rows in set (0.01 sec)

Sample data:

mysql> select * from zyads_integral limit 1\G
*************************** 1. row ***************************
   id: 6721212
 hash: 0FA565EEFA9E688B1F87640815EE090C7326725D
 sha1: 8c907b045bb7905cf2a63f0b1208eeb3bca857d6
 name: [invalid link]xxxxxx.html
index: 107
 size: 78110108
1 row in set (0.01 sec)

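Next, look for the duplicates by grouping the rows on sha1 and counting each group. Because sha1 is an unindexed TEXT column (see the DESC output), every GROUP BY below scans the whole table, which is why these queries take from about 25 seconds to more than 3 minutes. Also note that the id shown next to each group comes from an arbitrary row of that group; MySQL 5.7.5 and later reject a non-grouped, non-aggregated column like this under the default ONLY_FULL_GROUP_BY mode, in which case MIN(id) or ANY_VALUE(id) has to be selected instead.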

mysql> select id, sha1, count(*) from zyads_integral group by sha1 limit 10;
+---------+------------------------------------------+----------+
| id      | sha1                                     | count(*) |
+---------+------------------------------------------+----------+
|    7696 |                                          |        1 |
| 5137851 | 0000000000000000000000000000000005325911 |        2 |
| 5363699 | 00000000000000000000000000000000097ecf88 |        5 |
| 4826139 | 000000000000000000000000000000000fd81983 |        1 |
| 6250586 | 000000000000000000000000000000001b41f909 |        1 |
| 5597063 | 000000000000000000000000000000001d385b7c |        2 |
| 5281295 | 000000000000000000000000000000002a91e078 |        2 |
| 6331972 | 000000000000000000000000000000003488380d |        2 |
| 4774906 | 00000000000000000000000000000000397db43d |        1 |
| 4550736 | 00000000000000000000000000000000494ec71f |        1 |
+---------+------------------------------------------+----------+
10 rows in set (24.71 sec)
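To see what the grouping query does on a small scale, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. It assumes a toy table with only id and sha1; ROWS and group_counts are hypothetical names for illustration. It selects MIN(id) rather than a bare id, so the id reported for each group is deterministic, and that form is also accepted by MySQL under ONLY_FULL_GROUP_BY.

```python
import sqlite3

# Hypothetical toy data: (id, sha1) pairs standing in for zyads_integral.
ROWS = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "ccc"), (5, "aaa"), (6, "bbb")]


def group_counts(rows):
    """Return (min_id, sha1, count) for every sha1, ordered by sha1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE zyads_integral (id INTEGER PRIMARY KEY, sha1 TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO zyads_integral (id, sha1) VALUES (?, ?)", rows)
    # MIN(id) is an aggregate, so the query stays valid under ONLY_FULL_GROUP_BY
    # and always reports the same id for a given group.
    result = conn.execute(
        "SELECT MIN(id), sha1, COUNT(*) FROM zyads_integral "
        "GROUP BY sha1 ORDER BY sha1"
    ).fetchall()
    conn.close()
    return result


counts = group_counts(ROWS)
print(counts)  # [(1, 'aaa', 3), (2, 'bbb', 2), (4, 'ccc', 1)]
```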

mysql> select count(*) from zyads_integral where sha1= '0000000000000000000000000000000005325911';
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (1.03 sec)

mysql> select id, count(*) from zyads_integral group by sha1 having count(*) > 1;
+---------+----------+
| id      | count(*) |
+---------+----------+
| 5137851 |        2 |
| 5363699 |        5 |
| 5597063 |        2 |
| 5281295 |        2 |
...
| 4712249 |        6 |
| 1581236 |        3 |
| 5126827 |        2 |
| 1872277 |        7 |
+---------+----------+
836343 rows in set (33.77 sec)

mysql> select id from zyads_integral group by sha1 having count(*) >= 1;
+---------+
| id      |
+---------+
|    7696 |
| 5137851 |
| 5363699 |
| 4826139 |
| 6250586 |
...
| 5126827 |
|  570573 |
| 1872277 |
| 4514446 |
+---------+
2466076 rows in set (3 min 36.80 sec)
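The number of rows a deduplication has to remove equals the total row count minus the number of distinct sha1 values, which is also the sum of (count - 1) over the groups that HAVING COUNT(*) > 1 returns. The sketch below checks that identity on toy data; it assumes SQLite via Python's sqlite3 module in place of MySQL, and duplicate_stats and ROWS are hypothetical names.

```python
import sqlite3

# Hypothetical toy data: (id, sha1). "aaa" occurs 3 times, "bbb" twice.
ROWS = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "ccc"), (5, "aaa"), (6, "bbb")]


def duplicate_stats(rows):
    """Return (total_rows, distinct_sha1, duplicated_groups, surplus_rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE zyads_integral (id INTEGER PRIMARY KEY, sha1 TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO zyads_integral (id, sha1) VALUES (?, ?)", rows)
    total = conn.execute("SELECT COUNT(*) FROM zyads_integral").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(DISTINCT sha1) FROM zyads_integral"
    ).fetchone()[0]
    # Same filter as HAVING COUNT(*) > 1; SUM(c - 1) counts the extra copies.
    # COALESCE covers the case where no group is duplicated (SUM gives NULL).
    dup_groups, surplus = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(c - 1), 0) FROM ("
        " SELECT COUNT(*) AS c FROM zyads_integral"
        " GROUP BY sha1 HAVING COUNT(*) > 1) AS g"
    ).fetchone()
    conn.close()
    return total, distinct, dup_groups, surplus


stats = duplicate_stats(ROWS)
print(stats)  # (6, 3, 2, 3)
```

The last element (rows to delete) matches total minus distinct, here 6 - 3 = 3.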

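Deleting the data

The last query returns one row per distinct sha1 (HAVING count(*) >= 1 is true for every group, so it filters nothing), giving 2,466,076 distinct values out of 4,130,473 rows. That means 4,130,473 - 2,466,076 = 1,664,397 rows are redundant copies, spread over the 836,343 sha1 values that appear more than once. A first attempt is to delete them directly: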

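mysql> delete from zyads_integral where id in (select a.id from (select id from zyads_integral group by sha1 having count(*) > 1) a);

The inner query is wrapped in the derived table a because MySQL does not let a DELETE select from its own target table in a subquery (error 1093). Even so, this statement removes only one row per duplicated sha1 each time it runs: a sha1 with five copies still has four afterwards, so it would have to be repeated until the inner query returns nothing, and every run repeats the slow GROUP BY over an unindexed column. A more practical route is to copy one row per sha1 into a new table and then swap the tables.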
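The following sketch contrasts the two behaviours on toy data, again with SQLite (Python's sqlite3) standing in for MySQL. delete_one_per_group mimics the statement above, picking MIN(id) per group so the result is deterministic, and counts how many passes actually delete something; delete_keep_min removes every row except the smallest id of each sha1 in a single statement. All function names are hypothetical. In MySQL, the keep-MIN(id) subquery would also need the derived-table wrapper to avoid error 1093. Since the grouped output earlier shows a sha1 with 7 copies, the one-row-per-pass approach would need at least 6 passes on the real table.

```python
import sqlite3

# Hypothetical toy data: (id, sha1). "aaa" has 3 copies, "bbb" has 2.
ROWS = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "ccc"), (5, "aaa"), (6, "bbb")]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE zyads_integral (id INTEGER PRIMARY KEY, sha1 TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO zyads_integral (id, sha1) VALUES (?, ?)", rows)
    return conn


def delete_one_per_group(conn):
    """Delete one row per duplicated sha1 per pass, as the article's DELETE does.

    Returns the number of passes that removed at least one row.
    """
    passes = 0
    while True:
        cur = conn.execute(
            "DELETE FROM zyads_integral WHERE id IN ("
            " SELECT MIN(id) FROM zyads_integral"
            " GROUP BY sha1 HAVING COUNT(*) > 1)"
        )
        if cur.rowcount == 0:
            return passes
        passes += 1


def delete_keep_min(conn):
    """Single pass: keep the smallest id of every sha1, delete the rest."""
    cur = conn.execute(
        "DELETE FROM zyads_integral WHERE id NOT IN ("
        " SELECT MIN(id) FROM zyads_integral GROUP BY sha1)"
    )
    return cur.rowcount


a = make_db(ROWS)
passes = delete_one_per_group(a)
remaining_a = a.execute(
    "SELECT sha1, COUNT(*) FROM zyads_integral GROUP BY sha1 ORDER BY sha1"
).fetchall()

b = make_db(ROWS)
deleted = delete_keep_min(b)
remaining_b = b.execute("SELECT id, sha1 FROM zyads_integral ORDER BY id").fetchall()

print(passes, deleted, remaining_b)  # 2 3 [(1, 'aaa'), (2, 'bbb'), (4, 'ccc')]
```

Both end with one row per sha1, but the first needs (largest group size - 1) passes, each with a full GROUP BY.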

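Create a new table with the same columns and a UNIQUE key on sha1. The TEXT columns become VARCHAR because MySQL can only index a TEXT column with an explicit prefix length:

CREATE TABLE `zyads_integral_tmp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `hash` varchar(100),
  `sha1` varchar(100) NOT NULL,
  `name` varchar(1000) NOT NULL,
  `index` varchar(10),
  `size` varchar(10),
  UNIQUE KEY `sha1` (`sha1`),
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=6756155 DEFAULT CHARSET=gbk;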

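Then copy the rows over, one per sha1. The id column is not copied, so the new table assigns fresh ids starting at 6756155:

INSERT INTO zyads_integral_tmp (`hash`,`sha1`,`name`,`index`,`size`) SELECT `hash`,`sha1`,`name`,`index`,`size` from zyads_integral group by sha1 having count(*)>=1;

This GROUP BY selects columns that are neither grouped nor aggregated, so it only runs when ONLY_FULL_GROUP_BY is disabled. Because the new table already has a UNIQUE key on sha1, an INSERT IGNORE of the same columns without the GROUP BY also keeps exactly one row per sha1, skipping later rows whose sha1 is already present (note that IGNORE also silently truncates values too long for the new VARCHAR columns).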
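A minimal sketch of the copy step, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and a reduced three-column schema. The UNIQUE constraint on sha1 does the deduplication, and SQLite's INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE, skipping any row whose sha1 is already present. Reading the source ORDER BY id makes the smallest-id copy the one that is kept. ROWS and the sample file names are made up.

```python
import sqlite3

# Hypothetical sample rows: (id, sha1, name). Reduced schema, not the real table.
ROWS = [
    (1, "aaa", "a1.html"), (2, "bbb", "b1.html"), (3, "aaa", "a2.html"),
    (4, "ccc", "c1.html"), (5, "aaa", "a3.html"), (6, "bbb", "b2.html"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zyads_integral ("
    " id INTEGER PRIMARY KEY, sha1 TEXT NOT NULL, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO zyads_integral (id, sha1, name) VALUES (?, ?, ?)", ROWS
)

# The new table carries the UNIQUE key that enforces one row per sha1.
conn.execute(
    "CREATE TABLE zyads_integral_tmp ("
    " id INTEGER PRIMARY KEY,"
    " sha1 TEXT NOT NULL UNIQUE,"
    " name TEXT NOT NULL)"
)
# id is not copied, so the new table assigns fresh ids.
# OR IGNORE skips rows whose sha1 is already present; ORDER BY id means
# the smallest-id copy of each sha1 is the one that gets inserted.
conn.execute(
    "INSERT OR IGNORE INTO zyads_integral_tmp (sha1, name) "
    "SELECT sha1, name FROM zyads_integral ORDER BY id"
)
conn.commit()

kept = conn.execute(
    "SELECT sha1, name FROM zyads_integral_tmp ORDER BY sha1"
).fetchall()
print(kept)  # [('aaa', 'a1.html'), ('bbb', 'b1.html'), ('ccc', 'c1.html')]
```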

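Finally, swap the tables. RENAME TABLE can rename several tables in one atomic statement, so the deduplicated table takes over the original name while the old data is kept as zyads_integral_tmp_1:

mysql> rename table zyads_integral to zyads_integral_tmp_1, zyads_integral_tmp to zyads_integral;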
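SQLite has no multi-table RENAME TABLE, so the sketch below approximates the atomic swap with two ALTER TABLE ... RENAME TO statements inside one explicit transaction (DDL is transactional in SQLite). It is an illustration on toy tables, assuming Python's sqlite3 module, not MySQL's own mechanism; swap_tables is a hypothetical name.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE zyads_integral (id INTEGER PRIMARY KEY, sha1 TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE zyads_integral_tmp (id INTEGER PRIMARY KEY, sha1 TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO zyads_integral (sha1) VALUES (?)", [("aaa",), ("aaa",), ("bbb",)]
)
conn.executemany("INSERT INTO zyads_integral_tmp (sha1) VALUES (?)", [("aaa",), ("bbb",)])


def swap_tables(conn):
    """Rename old -> zyads_integral_tmp_1 and tmp -> zyads_integral in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE zyads_integral RENAME TO zyads_integral_tmp_1")
        conn.execute("ALTER TABLE zyads_integral_tmp RENAME TO zyads_integral")
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Undo the first rename if the second one fails.
        conn.execute("ROLLBACK")
        raise


swap_tables(conn)
tables = [
    r[0]
    for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
]
new_count = conn.execute("SELECT COUNT(*) FROM zyads_integral").fetchone()[0]
print(tables, new_count)  # ['zyads_integral', 'zyads_integral_tmp_1'] 2
```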
