I'm facing a very poorly designed database with a non-normalized table X.
This table X should have a N:M relationship with another table Y.
The problem is that this relationship is currently 1:N and the jerry-rigged solution until now was to duplicate the entries when there was various registries to be related.
Simplifying, I have this:
| ID | TEXT | LOCATION_ID |
| 1 | foo | 1 |
| 2 | foo | 2 |
| 3 | bar | 1 |
| 4 | bar | 4 |
| 5 | bar | 3 |
I have to normalize this table. So, my first idea was try to obtain pairs of similar registries. Something like this:
| a.ID | b.ID |
| 1 | 2 |
| 3 | 4 |
| 3 | 5 |
Experimenting a little bit:
SELECT a.id, b.id
FROM mytable AS a
INNER JOIN mytable AS b
ON a.text = b.text AND a.id != b.id
GROUP BY a.id, b.id
This lead to a problem like this:
| a.ID | b.ID |
| 1 | 2 |
| 2 | 1 |
| 3 | 4 |
| 3 | 5 |
| 4 | 3 |
| 4 | 5 |
| 5 | 3 |
| 5 | 4 |
The pairs were duplicated.
After some digging, I realized that this was more efficient:
SELECT a.id, b.id
FROM mytable AS a
INNER JOIN mytable AS b
ON a.text = b.text AND a.id < b.id
GROUP BY a.id, b.id
So, I got this:
| a.ID | b.ID |
| 1 | 2 |
| 3 | 4 |
| 3 | 5 |
| 4 | 5 |
But I still need to get rid of that last register.
解决方案
Group on only one side and take the MIN() of the other:
SELECT MIN(a.ID) a, b.ID b
FROM mytable a JOIN mytable b ON b.text = a.text AND b.ID > a.ID
GROUP BY b.ID
See it on sqlfiddle.