AO表无法创建主键约束避免重复数据,GP6官方手册描述如下:
由于无法轻易识别源数据的重复项,可以通过加载完所有内容,然后利用唯一标识行的primary key id删除重复项的方式,来去除AO表中的重复项。
数据表分布在不同的节点上,每个节点的ctid是唯一的,但是不同的节点就有ctid重复的可能,因此GP必须借助gp_segment_id来进行去重复处理。
SQL1:不适用于同一条记录的重复项分布到不同的segment
DELETE FROM weather WHERE (gp_segment_id, ctid) NOT IN (SELECT gp_segment_id, min(ctid) FROM weather GROUP BY city, gp_segment_id);
注:weather表示表名, city表示唯一标识行的primary key id(pkid),通过pkid去除重复字段
postgres=# select ctid, gp_segment_id,* from weather;
ctid | gp_segment_id | city | temp_lo | temp_hi | prcp | date
----------------+---------------+---------------+---------+---------+------+------------
(33554432,2) | 2 | San Francisco | 46 | 50 | 0.25 | 1994-11-27
(33554432,202) | 2 | San Francisco | 46 | 50 | 0.25 | 1994-11-27
(33554432,2) | 1 | Hayward | 37 | 54 | | 1994-11-29
(33554432,103) | 1 | Hayward | 48 | 54 | | 1994-11-29
(33554432,202) | 1 | Hayward | 37 | 54 | | 1994-11-29
(33554432,203) | 1 | Hayward | 48 | 54 | | 1994-11-29
postgres=# CREATE TABLE weather (
postgres(# city varchar(80),
postgres(# temp_lo int,
postgres(# temp_hi int,
postgres(# prcp real,
postgres(# date date
postgres(# )WITH(appendonly=true)distributed by(city);
CREATE TABLE
postgres=# INSERT INTO weather VALUES
postgres-# ('San Francisco', 46, 50, 0.25, '1994-11-27'),
postgres-# ('San Francisco', 46, 50, 0.25, '1994-11-27'),
postgres-# ('Hayward', 37, 54, NULL, '1994-11-29'),
postgres-# ('Hayward', 37, 54, NULL, '1994-11-29');
INSERT 0 4
postgres=# SELECT * FROM weather;
city | temp_lo | temp_hi | prcp | date
---------------+---------+---------+------+------------
Hayward | 37 | 54 | | 1994-11-29
Hayward | 37 | 54 | | 1994-11-29
San Francisco | 46 | 50 | 0.25 | 1994-11-27
San Francisco | 46 | 50 | 0.25 | 1994-11-27
(4 rows)
postgres=# DELETE FROM weather WHERE (gp_segment_id, ctid) NOT IN (SELECT gp_segment_id, min(ctid) FROM weather GROUP BY city, gp_segment_id);
DELETE 2
postgres=# SELECT * FROM weather;
city | temp_lo | temp_hi | prcp | date
---------------+---------+---------+------+------------
Hayward | 37 | 54 | | 1994-11-29
San Francisco | 46 | 50 | 0.25 | 1994-11-27
(2 rows)
SQL2:适用于同一条记录的重复项分布到不同的segment
DELETE FROM userserverbind USING
(select gp_segment_id, ctid from
(select gp_segment_id, ctid, rank() over (partition by account, bindtime order by gp_segment_id, ctid) as rk from userserverbind ) as foo
WHERE rk <> 1) rows_to_delete
WHERE userserverbind.gp_segment_id=rows_to_delete.gp_segment_id
AND userserverbind.ctid=rows_to_delete.ctid;
zyq=# select ctid,gp_segment_id, * from userserverbind order by 3,4;
ctid | gp_segment_id | account | bindtime | serviceid | binditemid | value
--------+---------------+---------+--------------+-----------+------------+-------
(0,1) | 0 | 周周1 | BindTime 1 | 1 | |
(0,24) | 3 | 周周1 | BindTime 1 | 1 | |
(0,3) | 1 | 周周10 | BindTime 10 | 10 | |
(0,50) | 1 | 周周100 | BindTime 100 | 100 | |
(0,25) | 0 | 周周100 | BindTime 100 | 100 | |
(0,3) | 0 | 周周11 | BindTime 11 | 11 | |
(0,28) | 1 | 周周11 | BindTime 11 | 11 | |
(0,4) | 1 | 周周12 | BindTime 12 | 12 | |
(0,30) | 2 | 周周12 | BindTime 12 | 12 | |
....
(175 rows)
zyq=# DELETE FROM userserverbind USING
zyq-# (select gp_segment_id, ctid from
zyq(# (select gp_segment_id, ctid, rank() over (partition by account, bindtime order by gp_segment_id, ctid) as rk from userserverbind ) as foo
zyq(# WHERE rk <> 1) rows_to_delete
zyq-# WHERE userserverbind.gp_segment_id=rows_to_delete.gp_segment_id
zyq-# AND userserverbind.ctid=rows_to_delete.ctid;
DELETE 75
zyq=# select * from userserverbind order by 1,2;
account | bindtime | serviceid | binditemid | value
---------+--------------+-----------+------------+-------
周周1 | BindTime 1 | 1 | |
周周10 | BindTime 10 | 10 | |
周周100 | BindTime 100 | 100 | |
周周11 | BindTime 11 | 11 | |
周周12 | BindTime 12 | 12 | |
周周13 | BindTime 13 | 13 | |
周周14 | BindTime 14 | 14 | |
周周15 | BindTime 15 | 15 | |
周周16 | BindTime 16 | 16 | |
周周17 | BindTime 17 | 17 | |
周周18 | BindTime 18 | 18 | |
周周19 | BindTime 19 | 19 | |
周周2 | BindTime 2 | 2 | |
周周20 | BindTime 20 | 20 | |
周周21 | BindTime 21 | 21 | |
周周22 | BindTime 22 | 22 | |
周周23 | BindTime 23 | 23 | |
周周24 | BindTime 24 | 24 | |
周周25 | BindTime 25 | 25 | |
周周26 | BindTime 26 | 26 | |
周周27 | BindTime 27 | 27 | |
周周28 | BindTime 28 | 28 | |
周周29 | BindTime 29 | 29 | |
周周3 | BindTime 3 | 3 | |
周周30 | BindTime 30 | 30 | |
周周31 | BindTime 31 | 31 | |
周周32 | BindTime 32 | 32 | |
周周33 | BindTime 33 | 33 | |
周周34 | BindTime 34 | 34 | |
周周35 | BindTime 35 | 35 | |
周周36 | BindTime 36 | 36 | |
周周37 | BindTime 37 | 37 | |
周周38 | BindTime 38 | 38 | |
周周39 | BindTime 39 | 39 | |
周周4 | BindTime 4 | 4 | |
周周40 | BindTime 40 | 40 | |
周周41 | BindTime 41 | 41 | |
周周42 | BindTime 42 | 42 | |
周周43 | BindTime 43 | 43 | |
周周44 | BindTime 44 | 44 | |
周周45 | BindTime 45 | 45 | |
周周46 | BindTime 46 | 46 | |
周周47 | BindTime 47 | 47 | |
周周48 | BindTime 48 | 48 | |
周周49 | BindTime 49 | 49 | |
周周5 | BindTime 5 | 5 | |
周周50 | BindTime 50 | 50 | |
周周51 | BindTime 51 | 51 | |
周周52 | BindTime 52 | 52 | |
周周53 | BindTime 53 | 53 | |
周周54 | BindTime 54 | 54 | |
周周55 | BindTime 55 | 55 | |
周周56 | BindTime 56 | 56 | |
周周57 | BindTime 57 | 57 | |
周周58 | BindTime 58 | 58 | |
周周59 | BindTime 59 | 59 | |
周周6 | BindTime 6 | 6 | |
周周60 | BindTime 60 | 60 | |
周周61 | BindTime 61 | 61 | |
周周62 | BindTime 62 | 62 | |
周周63 | BindTime 63 | 63 | |
周周64 | BindTime 64 | 64 | |
周周65 | BindTime 65 | 65 | |
周周66 | BindTime 66 | 66 | |
周周67 | BindTime 67 | 67 | |
周周68 | BindTime 68 | 68 | |
周周69 | BindTime 69 | 69 | |
周周7 | BindTime 7 | 7 | |
周周70 | BindTime 70 | 70 | |
周周71 | BindTime 71 | 71 | |
周周72 | BindTime 72 | 72 | |
周周73 | BindTime 73 | 73 | |
周周74 | BindTime 74 | 74 | |
周周75 | BindTime 75 | 75 | |
周周76 | BindTime 76 | 76 | |
周周77 | BindTime 77 | 77 | |
周周78 | BindTime 78 | 78 | |
周周79 | BindTime 79 | 79 | |
周周8 | BindTime 8 | 8 | |
周周80 | BindTime 80 | 80 | |
周周81 | BindTime 81 | 81 | |
周周82 | BindTime 82 | 82 | |
周周83 | BindTime 83 | 83 | |
周周84 | BindTime 84 | 84 | |
周周85 | BindTime 85 | 85 | |
周周86 | BindTime 86 | 86 | |
周周87 | BindTime 87 | 87 | |
周周88 | BindTime 88 | 88 | |
周周89 | BindTime 89 | 89 | |
周周9 | BindTime 9 | 9 | |
周周90 | BindTime 90 | 90 | |
周周91 | BindTime 91 | 91 | |
周周92 | BindTime 92 | 92 | |
周周93 | BindTime 93 | 93 | |
周周94 | BindTime 94 | 94 | |
周周95 | BindTime 95 | 95 | |
周周96 | BindTime 96 | 96 | |
周周97 | BindTime 97 | 97 | |
周周98 | BindTime 98 | 98 | |
周周99 | BindTime 99 | 99 | |
(100 rows)