mysql 清除链接,MySQL从重复的条目清除表,并在依赖表中重新链接FK

Here is my situation: I have 2 tables, patient and study.

Each table has its own PK using autoincrement.

In my case, the pat_id should be unique. It's not declared as unique at database level since it could be non unique is some uses (it's not a home made system). I found out how to configure the system to consider the pat_id as unique, but I need now to cleanup the database for duplicated patients AND relink duplicated patients in study table to remaining unique patient, before deleting the duplicated patients.

Patient table:

CREATE TABLE `patient` (

`pk` BIGINT(20) NOT NULL AUTO_INCREMENT,

`pat_id` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,

...

`pat_name` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,

...

`pat_custom1` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL

....

PRIMARY KEY (`pk`)

)ENGINE=InnoDB;

Study table:

CREATE TABLE `study` (

`pk` BIGINT(20) NOT NULL AUTO_INCREMENT,

`patient_fk` BIGINT(20) DEFAULT NULL,

...

PRIMARY KEY (`pk`),

...

CONSTRAINT `patient_fk` FOREIGN KEY (`patient_fk`) REFERENCES `patient` (`pk`)

)ENGINE=InnoDB;

I found some similar questions, but not exactly the same issue, especially it was missing the link of the foreign keys to the remaining unique patient.

解决方案

This is how I did.

I reused an unused field in patient table to mark non duplicated (N), 1st of duplicated (X), and other duplicated patients (Y). You could also add a column for this (and drop it after use).

Here are the steps I followed to cleanup my database:

/*1: List duplicated */

select pk,pat_id, t.`pat_id_issuer`, t.`pat_name`, t.pat_custom1

from patient t

where pat_id in (

select pat_id from (

select pat_id, count(*)

from patient

group by 1

having count(*)>1

) xxx);

/*2: Delete orphan patients */

delete from patient where pk not in (select patient_fk from study);

/*3: Reset flag for duplicated (or not) patients*/

update patient t set t.`pat_custom1`='N';

/*4: Mark all duplicated */

update patient t set t.`pat_custom1`='Y'

where pat_id in (

select pat_id from (

select pat_id, count(*)

from patient

group by 1

having count(*)>1

) xxx) ;

/*5: Unmark the 1st of the duplicated*/

update patient t

join (select pk from (

select min(pk) as pk, pat_id from patient

where pat_custom1='Y'

group by pat_id

) xxx ) x

on (x.pk=t.pk)

set t.`pat_custom1`='X'

where pat_custom1='Y'

;

/*6: Verify update is correct*/

select pk, pat_id,pat_custom1

from `patient`

where pat_custom1!='N'

order by pat_id, pat_custom1;

/*7: Verify studies linked to duplicated patient */

select p.* from study s

join patient p on (p.pk=s.patient_fk)

where p.pat_custom1='Y';

/*8: Relink duplicated patients */

update study s

join patient p on (p.pk=s.patient_fk)

set patient_fk = (select pk from patient pp

where pp.pat_id=p.pat_id and pp.pat_custom1='X')

where p.pat_custom1='Y';

/*9: Delete newly orphan patients */

delete from patient where pk not in (select patient_fk from study);

/* 10: reset flag */

update patient t set t.`pat_custom1`=null;

/* 11: Commit changes */

commit;

There is certainly a shorter way, with a some smarter (complicated?) SQL, but I personally prefer the simple way. This also allows me to check each step is doing what I expect.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值