MySQL千万级数据去重删除,保留主键最小一条数据,响应时间在1秒以内
1年前
阅读 2870
评论 0
喜欢 0
最近有个徒弟问我,MySQL中千万级的数据表要如何快速删除重复值,并只保留一条结果。
他原来的方案是新建一个临时表,然后将重发条件设置为唯一主键,然后导入数据并关闭重复提示继续向下执行。
这个方案虽然有用,但是速度太慢,千万级数据过滤起码得40分钟左右。
所以我就想到了用SQL做SELECT查询DELETE,测试表结构如下:
```sql
CREATE TABLE `channlid` (
`i` int(11) NOT NULL AUTO_INCREMENT,
`id` varchar(48) NOT NULL,
PRIMARY KEY (`i`),
KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4000138 DEFAULT CHARSET=utf8;
```
### 步骤如下:
#### 一、先查询所有重复值:
```sql
SELECT
B.*
FROM
(
SELECT
*
FROM
channlid
GROUP BY
id
HAVING
count(*) > 1
) AS A
LEFT JOIN channlid AS B ON A.id = B.id;
```
#### 二、查询所有重复值的最小ID一条记录:
```sql
SELECT
B.*, min(B.i) AS delete_id
FROM
(
SELECT
*
FROM
channlid
GROUP BY
id
HAVING
count(*) > 1
) AS A
LEFT JOIN channlid AS B ON A.id = B.id
GROUP BY
B.id;
```
#### 三、删除除了最小一条ID记录外的其余重复记录,方案一,开销大,时间慢
```sql
DELETE
FROM
channlid
WHERE
i IN (
SELECT
C.i
FROM
(
SELECT
D.i
FROM
(
SELECT
B.*, min(B.i) AS delete_id
FROM
(
SELECT
*
FROM
channlid
GROUP BY
id
HAVING
count(*) > 1
) AS A
LEFT JOIN channlid AS B ON A.id = B.id
GROUP BY
B.id
) AS C
LEFT JOIN channlid AS D ON C.id = D.id
WHERE
C.delete_id != D.i
AND C.id = D.id
) C
);
```
#### 四、删除除了最小一条ID记录外的其余重复记录,方案二,开销小,时间快
```sql
DELETE channlid
FROM
channlid,
(
SELECT
B.id,
min(B.i) AS delete_id
FROM
(
SELECT
*
FROM
channlid
GROUP BY
id
HAVING
count(*) > 1
) AS A
LEFT JOIN channlid AS B ON A.id = B.id
GROUP BY
B.id
) AS delete_from
WHERE
(
channlid.id = delete_from.id
AND channlid.i != delete_from.delete_id
);
```
推荐使用方案二,同时注意,在MySQL中,delete语句要是想连着select语句,就必须做个临时表声明,不然会报错,这点特别坑。
© 著作权归作者所有
本文介绍了如何在MySQL中高效地删除千万级数据表的重复记录,通过查询找到重复值并保留主键最小的记录,提供了两种删除方案,强调了方案二的高效性。
6582

被折叠的 条评论
为什么被折叠?



