刚才登陆上来看到以前写的一篇文章扩展greenplum集群 下面有人回帖问如何裁剪SEGMENT。因为并没有看到GREENPLUM提供什么工具可以去除某些已经在使用的SEGMENT的。刚刚去试了一下,好像可以,只能说好像可以。下面把操作步骤贴出来,大家也可以一起看看有没有什么问题。
下面是现有集群的一个大致情况
testutf8=# select * from gp_configuration;
content | definedprimary | dbid | isprimary | valid | hostname | port | datadir
———+—————-+——+———–+——-+———-+——-+———————————-
-1 | t | 1 | t | t | hadoop7 | 5345 | /home/gpadmin1/gp3master/aligp-1
0 | f | 8 | f | t | hadoop8 | 30001 | /home/gpadmin1/gp3datam1/aligp0
1 | f | 9 | f | t | hadoop8 | 30002 | /home/gpadmin1/gp3datam2/aligp1
0 | t | 2 | t | t | hadoop7 | 20001 | /home/gpadmin1/gp3datap1/aligp0
1 | t | 3 | t | t | hadoop7 | 20002 | /home/gpadmin1/gp3datap2/aligp1
3 | t | 5 | t | t | hadoop8 | 20002 | /home/gpadmin1/gp3datap2/aligp3
-1 | f | 14 | f | t | hadoop8 | 5345 | /home/gpadmin1/gp3master/aligp-1
2 | t | 4 | t | t | hadoop8 | 20001 | /home/gpadmin1/gp3datap1/aligp2
2 | f | 10 | f | t | hadoop9 | 30001 | /home/gpadmin1/gp3datam1/aligp2
3 | f | 11 | f | t | hadoop9 | 30002 | /home/gpadmin1/gp3datam2/aligp3
4 | f | 12 | f | t | hadoop7 | 30001 | /home/gpadmin1/gp3datam1/aligp4
5 | f | 13 | f | t | hadoop7 | 30002 | /home/gpadmin1/gp3datam2/aligp5
4 | t | 6 | t | t | hadoop9 | 20001 | /home/gpadmin1/gp3datap1/aligp4
5 | t | 7 | t | t | hadoop9 | 20002 | /home/gpadmin1/gp3datap2/aligp5 –标为红色的部分即为要删除的SEGMENT
(14 rows)
–创建两张表t1,t2.在后面的操作中我不会对t1进行特殊处理,因此理论上删除某个SEGMENT以后它上面的数据会有所减少。而对于t2,我会在修改集群配置以前把在要被删除的那些SEGMENT上的数据手动迁移到其他保留的SEGMENT上。
testutf8=# create table t1 as select nextval(’seq1′) from pg_class;
NOTICE: Table doesn’t have ‘DISTRIBUTED BY’ clause — Using column(s) named ‘nextval’ as the Greenplum Database data distribution key for this table.
HINT: The ‘DISTRIBUTED BY’ clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 267
testutf8=# select gp_segment_id,count(1) from t1 group by 1;
gp_segment_id | count
—————+——-
1 | 43
4 | 48
2 | 42
0 | 44
3 | 41
5 | 49
(6 rows)
testutf8=# create table t2 as select nextval(’seq1′) from pg_class;
NOTICE: Table doesn’t have ‘DISTRIBUTED BY’ clause — Using column(s) named ‘nextval’ as the Greenplum Database data distribution key for this table.
HINT: The ‘DISTRIBUTED BY’ clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 268
testutf8=# select gp_segment_id,count(1) from t2 group by 1;
gp_segment_id | count
—————+——-
1 | 44
2 | 43
0 | 41
4 | 49
3 | 42
5 | 49
(6 rows)
下面几步所做的就是对t2表的在将被删除的SEGMENT上的数据进行迁移
#connect to hadoop9:20001 from hadoop9
[gpadmin1@hadoop9 ~]$ PGOPTIONS=”-c gp_session_role=utility” psql -h hadoop9 -p 20001 -U r1
psql (8.2.13)
Type “help” for help.
testutf8=#
testutf8=# select count(*) from t2;
count
——-
49
(1 row)
testutf8=# select * from t2 limit 1; –这里随便查出一条数据,在集群相应的SEGMENT被去除以后,可以