GREENPLUM的分布与分区

刚开始看GREENPLUM文档的时候,有点搞不清楚对于一个分区表来说,分布(DISTRIBUTE)与分区(PARTITION)的关系究竟是怎样的。下面摘录文档里的一段原话

Partitioned tables are also distributed across Greenplum Database segments as is any non-partitioned table. Table distribution in Greenplum Database physically divides a table across the Greenplum segments to enable parallel query processing. Table partitioning is a tool to logically divide big tables to improve query performance and facilitate data warehouse maintenance tasks. Partitioning does not change the physical distribution of the table data across the segments.

从上面这段话可以得到如下几条信息

  • 无论是分区表还是非分区表,通过GP的DISTRIBUTE功能,都会把数据分散到各个SEGMENT上去。
  • 分布,是从物理上把数据分散到各个SEGMENT上,这样更有利于并行查询。
  • 分区,是从逻辑上把一个大表分开,这样可以优化查询性能。分区是不会影响数据在各个SEGMENT上的分布情况的。

下面做点实际操作看一下就更加清楚了

[gpadmin1@hadoop7 ~]$ psql
psql (8.2.13)
Type "help" for help.
创建一个分区表
testutf8=# drop table stu;
DROP TABLE
testutf8=# CREATE TABLE stu (id int, gender char(1))
testutf8-# DISTRIBUTED BY (id)
testutf8-# PARTITION BY LIST (gender)
testutf8-# ( PARTITION girls VALUES ('F'),
testutf8(# PARTITION boys VALUES ('M'),
testutf8(# DEFAULT PARTITION other );
NOTICE:  CREATE TABLE will create partition "stu_1_prt_girls" for table "stu"
NOTICE:  CREATE TABLE will create partition "stu_1_prt_boys" for table "stu"
NOTICE:  CREATE TABLE will create partition "stu_1_prt_other" for table "stu"
CREATE TABLE

使用下面语句随便插入几条测试数据

insert into stu values(1,'F');
insert into stu values(2,'F');
insert into stu values(3,'F');
insert into stu values(4,'M');
insert into stu values(5,'M');
insert into stu values(6,'M');

查询一下数据在各个segment上的分布情况

testutf8=# select gp_segment_id,* from stu;
 gp_segment_id | id | gender
---------------+----+--------
             4 |  3 | F
             4 |  5 | M
             0 |  1 | F
             3 |  2 | F
             1 |  4 | M
             1 |  6 | M
(6 rows)

为了验证分区不影响分布,我们现在插入一条id已经存在,但是gender列不同的值

testutf8=# insert into stu values(1,'M');
INSERT 0 1
testutf8=# select gp_segment_id,* from stu;
 gp_segment_id | id | gender
---------------+----+--------
             0 |  1 | F              --这里可以看到,虽然分区列gender的值不同,但分布列id值都是1,都分布在了0号segment上
             0 |  1 | M

             1 |  4 | M
             1 |  6 | M
             4 |  3 | F
             4 |  5 | M
             3 |  2 | F
(7 rows)

也就是说,数据首先会根据分布列id分发到每个SEGMENT上面去。然后在每一个SEGMENT上面,对于stu表都会存在三个不同的分区,毕竟每个SEGMENT也是一个单独的POSTGRESQL嘛。可以直接连到某个SEGMENT上去看一看

[gpadmin1@hadoop7 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=#

testutf8=# select oid,* from pg_class where relname='stu';
  oid  | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
-------+---------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
 43106 | stu     |         2200 |   43107 |       10 |     0 |       43106 |             0 |        0 |         0 |             0 |             0 |             0 |             0 | f           | f           | r       | h          |        2 |         0 |           0 |        0 |        0 |       0 | f          | f          | f           | t              |        60967 |        |
(1 row)

testutf8=# select oid,* from pg_class where relname='stu_1_prt_girls';
  oid  |     relname     | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
-------+-----------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
 43126 | stu_1_prt_girls |         2200 |   43127 |       10 |     0 |       43126 |             0 |        0 |         0 |             0 |             0 |             0 |             0 | f           | f           | r       | h          |        2 |         1 |           0 |        0 |        0 |       0 | f          | f          | f           | f              |        60967 |        |
(1 row)
换一下,连到另外一个SEGMENT上去看看
testutf8=# /q
[gpadmin1@hadoop7 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20002 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# select oid,* from pg_class where relname='stu';
  oid  | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
-------+---------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
 43106 | stu     |         2200 |   43107 |       10 |     0 |       43106 |             0 |        0 |         0 |             0 |             0 |             0 |             0 | f           | f           | r       | h          |        2 |         0 |           0 |        0 |        0 |       0 | f          | f          | f           | t              |        60543 |        |
(1 row)

testutf8=# select oid,* from pg_class where relname='stu_1_prt_boys';
  oid  |    relname     | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
-------+----------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
 43147 | stu_1_prt_boys |         2200 |   43148 |       10 |     0 |       43147 |             0 |        0 |         0 |             0 |             0 |             0 |             0 | f           | f           | r       | h          |        2 |         1 |           0 |        0 |        0 |       0 | f          | f          | f           | f              |        60543 |        |
(1 row)

 

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值