Here is my situation: I have 2 tables, patient and study.
Each table has its own PK using autoincrement.
In my case, the pat_id should be unique. It's not declared as unique at database level since it could be non unique is some uses (it's not a home made system). I found out how to configure the system to consider the pat_id as unique, but I need now to cleanup the database for duplicated patients AND relink duplicated patients in study table to remaining unique patient, before deleting the duplicated patients.
Patient table:
CREATE TABLE `patient` (
`pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
`pat_id` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
`pat_name` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
`pat_custom1` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL
....
PRIMARY KEY (`pk`)
)ENGINE=InnoDB;
Study table:
CREATE TABLE `study` (
`pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
`patient_fk` BIGINT(20) DEFAULT NULL,
...
PRIMARY KEY (`pk`),
...
CONSTRAINT `patient_fk` FOREIGN KEY (`patient_fk`) REFERENCES `patient` (`pk`)
)ENGINE=InnoDB;
I found some similar questions, but not exactly the same issue, especially it was missing the link of the foreign keys to the remaining unique patient.
解决方案
This is how I did.
I reused an unused field in patient table to mark non duplicated (N), 1st of duplicated (X), and other duplicated patients (Y). You could also add a column for this (and drop it after use).
Here are the steps I followed to cleanup my database:
/*1: List duplicated */
select pk,pat_id, t.`pat_id_issuer`, t.`pat_name`, t.pat_custom1
from patient t
where pat_id in (
select pat_id from (
select pat_id, count(*)
from patient
group by 1
having count(*)>1
) xxx);
/*2: Delete orphan patients */
delete from patient where pk not in (select patient_fk from study);
/*3: Reset flag for duplicated (or not) patients*/
update patient t set t.`pat_custom1`='N';
/*4: Mark all duplicated */
update patient t set t.`pat_custom1`='Y'
where pat_id in (
select pat_id from (
select pat_id, count(*)
from patient
group by 1
having count(*)>1
) xxx) ;
/*5: Unmark the 1st of the duplicated*/
update patient t
join (select pk from (
select min(pk) as pk, pat_id from patient
where pat_custom1='Y'
group by pat_id
) xxx ) x
on (x.pk=t.pk)
set t.`pat_custom1`='X'
where pat_custom1='Y'
;
/*6: Verify update is correct*/
select pk, pat_id,pat_custom1
from `patient`
where pat_custom1!='N'
order by pat_id, pat_custom1;
/*7: Verify studies linked to duplicated patient */
select p.* from study s
join patient p on (p.pk=s.patient_fk)
where p.pat_custom1='Y';
/*8: Relink duplicated patients */
update study s
join patient p on (p.pk=s.patient_fk)
set patient_fk = (select pk from patient pp
where pp.pat_id=p.pat_id and pp.pat_custom1='X')
where p.pat_custom1='Y';
/*9: Delete newly orphan patients */
delete from patient where pk not in (select patient_fk from study);
/* 10: reset flag */
update patient t set t.`pat_custom1`=null;
/* 11: Commit changes */
commit;
There is certainly a shorter way, with a some smarter (complicated?) SQL, but I personally prefer the simple way. This also allows me to check each step is doing what I expect.