Removing duplicate rows from a MySQL table


I have a script that finds duplicate rows in my MySQL table, which contains 40,000,000 rows, but it is very slow. Is there an easier way to find the duplicate records without going in and out of PHP?

This is the script I currently use:

$find = mysql_query("SELECT * FROM pst_nw WHERE ID < '1000'");
while ($row = mysql_fetch_assoc($find))
{
    // Look for another row with the same address fields, excluding the row itself.
    $find_1 = mysql_query("SELECT * FROM pst_nw WHERE add1 = '$row[add1]' AND add2 = '$row[add2]' AND add3 = '$row[add3]' AND add4 = '$row[add4]' AND ID != '$row[ID]'");
    if (mysql_num_rows($find_1) > 0) {
        mysql_query("DELETE FROM pst_nw WHERE ID = '$row[ID]'");
    }
}

Solution

You have a number of options.

Let the DB do the work

Create a copy of your table with a unique index - and then insert the data into it from your source table:

CREATE TABLE clean LIKE pst_nw;
-- clean is still empty here, so a plain ALTER TABLE (no IGNORE) is enough.
ALTER TABLE clean ADD UNIQUE INDEX (add1, add2, add3, add4);
INSERT IGNORE INTO clean SELECT * FROM pst_nw;
DROP TABLE pst_nw;
RENAME TABLE clean TO pst_nw;

The advantage of doing things this way is you can verify that your new table is correct before dropping your source table. The disadvantage is it takes up twice as much space and is (relatively) slow to execute.
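As a rough sketch of that verification step, the two counts below can be compared after the INSERT IGNORE and before the DROP TABLE. The aliases `clean_rows`, `distinct_groups` and `g` are illustrative names, not part of the original answer. The sketch assumes the four address columns contain no NULLs: GROUP BY puts NULLs in one group, while a unique index lets rows with NULLs coexist, so the numbers can legitimately differ if NULLs are present.

```sql
-- Rows that survived in the de-duplicated copy.
SELECT COUNT(*) AS clean_rows FROM clean;

-- Distinct address combinations in the source table.
-- With no NULLs in add1..add4, this should equal clean_rows.
SELECT COUNT(*) AS distinct_groups
FROM (
    SELECT add1, add2, add3, add4
    FROM pst_nw
    GROUP BY add1, add2, add3, add4
) AS g;
```

If the two numbers match, each address combination is present exactly once in `clean`, and dropping the source table is reasonably safe.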

Let the DB do the work #2

You can also achieve the result you want by doing:

set session old_alter_table=1;

ALTER IGNORE TABLE pst_nw ADD UNIQUE INDEX (add1, add2, add3, add4);

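The first command is needed as a workaround on versions where the IGNORE flag is itself ignored and the ALTER fails with a duplicate-key error instead of dropping the duplicates. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this option only applies to older servers.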

The advantage here is there's no messing about with a temporary table - the disadvantage is you don't get to check that your update does exactly what you expect before you run it.

Example:

CREATE TABLE `foo` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `one` int(10) DEFAULT NULL,
  `two` int(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

insert into foo values (null, 1, 1);
insert into foo values (null, 1, 1);
insert into foo values (null, 1, 1);

select * from foo;
+----+------+------+
| id | one  | two  |
+----+------+------+
|  1 |    1 |    1 |
|  2 |    1 |    1 |
|  3 |    1 |    1 |
+----+------+------+
3 rows in set (0.00 sec)

set session old_alter_table=1;
ALTER IGNORE TABLE foo ADD UNIQUE INDEX (one, two);

select * from foo;
+----+------+------+
| id | one  | two  |
+----+------+------+
|  1 |    1 |    1 |
+----+------+------+
1 row in set (0.00 sec)
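Since the in-place ALTER gives no chance to inspect the result first, a read-only query run beforehand can show what it is about to discard. The following is a minimal sketch against the `foo` table from the example; the aliases `copies`, `lowest_id`, `rows_to_remove` and `dup_groups` are illustrative. Note that GROUP BY treats NULLs as equal while a unique index does not, so rows containing NULLs in `one` or `two` may be listed here yet survive the ALTER.

```sql
-- Read-only preview: which (one, two) combinations occur more than once,
-- how many copies each has, and the lowest id in each group.
SELECT one, two, COUNT(*) AS copies, MIN(id) AS lowest_id
FROM foo
GROUP BY one, two
HAVING COUNT(*) > 1;

-- Total number of rows the unique index would have to discard.
SELECT COALESCE(SUM(copies - 1), 0) AS rows_to_remove
FROM (
    SELECT COUNT(*) AS copies
    FROM foo
    GROUP BY one, two
    HAVING COUNT(*) > 1
) AS dup_groups;
```

Run against the three-row `foo` table before the ALTER, the first query lists the single group (1, 1) with three copies and the second reports two rows to remove. The same shape applies to `pst_nw` by grouping on add1, add2, add3, add4, which also answers the original question of finding duplicates without a PHP loop.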

Don't do this kind of thing outside the DB

Especially with 40 million rows, doing something like this outside the DB is likely to take a huge amount of time and may not complete at all. Any solution that stays in the DB will be faster and more robust.
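For servers where ALTER IGNORE is unavailable, one further in-database possibility is a self-join DELETE. This is a different technique from the two options above, shown only as a sketch: the aliases `dup` and `keep` and the index name `idx_addr` are assumptions, and on 40 million rows the join is only practical with an index on the compared columns. Rows with NULL in any of the four columns never match under `=` and are left untouched.

```sql
-- A non-unique index on the compared columns, so the join does not scan the table per row.
ALTER TABLE pst_nw ADD INDEX idx_addr (add1, add2, add3, add4);

-- Delete every row that has a twin with a lower ID; the lowest ID in each group survives.
DELETE dup
FROM pst_nw AS dup
JOIN pst_nw AS keep
  ON  keep.add1 = dup.add1
  AND keep.add2 = dup.add2
  AND keep.add3 = dup.add3
  AND keep.add4 = dup.add4
  AND keep.ID   < dup.ID;
```

Taking a backup first, or trying this on a copy of the table, is advisable since the DELETE cannot be previewed once it runs.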
