缘起
由于起初mysql数据表设计考虑不周,导致后续表里出现重复数据,这里的重复指的是多个列组合起来数据一样。现期望多个列组合数据唯一,首先删除重复,然后加多列组合唯一索引。
本文以
col1
,col2
和col3
三列组合为唯一索引。数据表名为table_name
这时如果使用:
alter table table_name add unique "uk_index" (col1, col2, col3)
mysql会提示重复,报错。
原因在于数据表里已经有重复数据。
如何去重呢?
- 首先找出重复行:
select
col1, col2, col3
from
table_name
group by
col1, col2, col3
having
count(*) > 1
- 使用
col1
,col2
和col3
将table_name
和上面的sql结果连接:
select
t1.*
from
table_name t1
join
(select col1, col2, col3 from table_name group by col1, col2, col3 having count(*) > 1) t2
on
t1.col1 = t2.col1 and
t1.col2 = t2.col2 and
t1.col3 = t2.col3
这里就筛选出所有重复的记录。假设重复数都为2,则需要将其中一条删除。这里使用group by
取重复中都一条ID。
3. 取ID
select
t1.id
from
table_name t1
join
(select col1, col2, col3 from table_name group by col1, col2, col3 having count(*) > 1) t2
on
t1.col1 = t2.col1 and
t1.col2 = t2.col2 and
t1.col3 = t2.col3
group by
col1, col2, col3
这里取出了需要删除的ID,如何使用这些ID进行删除呢?
4. 删除
使用delete from where id in ()
删除。
delete from
table_name
where
id
in (
select
t1.id
from
table_name t1
join
(select col1, col2, col3 from table_name group by col1, col2, col3 having count(*) > 1) t2
on
t1.col1 = t2.col1 and
t1.col2 = t2.col2 and
t1.col3 = t2.col3
group by
col1, col2, col3
)
这条sql会报错,原因在于mysql禁止在一个表里同时子查询与修改。我们可以分两步做,分开执行select
和delete
操作。下面偷懒一下,用一个比较hack的方法:在子查询中在查询一次并将结果做一个alias。
delete from
table_name
where
id
in (
select
*
from (
select
t1.id
from
table_name t1
join
(select col1, col2, col3 from table_name group by col1, col2, col3 having count(*) > 1) t2
on
t1.col1 = t2.col1 and
t1.col2 = t2.col2 and
t1.col3 = t2.col3
group by
col1, col2, col3
)as p
)
添加组合唯一索引
alter table table_name add unique "uk_index" (col1, col2, col3)