刚开始看GREENPLUM文档的时候,有点搞不清楚对于一个分区表来说,分布(DISTRIBUTE)与分区(PARTITION)的关系究竟是怎样的。下面摘录文档里的一段原话
Partitioned tables are also distributed across Greenplum Database segments as is any non-partitioned table. Table distribution in Greenplum Database physically divides a table across the Greenplum segments to enable parallel query processing. Table partitioning is a tool to logically divide big tables to improve query performance and facilitate data warehouse maintenance tasks. Partitioning does not change the physical distribution of the table data across the segments.
从上面这段话可以得到如下几条信息
- 无论是分区表还是非分区表,通过GP的DISTRIBUTE功能,都会把数据分散到各个SEGMENT上去。
- 分布,是从物理上把数据分散到各个SEGMENT上,这样更有利于并行查询。
- 分区,是从逻辑上把一个大表分开,这样可以优化查询性能。分区是不会影响数据在各个SEGMENT上的分布情况的。
下面做点实际操作看一下就更加清楚了
[gpadmin1@hadoop7 ~]$ psql
psql (8.2.13)
Type "help" for help.
创建一个分区表
testutf8=# drop table stu;
DROP TABLE
testutf8=# CREATE TABLE stu (id int, gender char(1))
testutf8-# DISTRIBUTED BY (id)
testutf8-# PARTITION BY LIST (gender)
testutf8-# ( PARTITION girls VALUES ('F'),
testutf8(# PARTITION boys VALUES ('M'),
testutf8(# DEFAULT PARTITION other );
NOTICE: CREATE TABLE will create partition "stu_1_prt_girls" for table "stu"
NOTICE: CREATE TABLE will create partition "stu_1_prt_boys" for table "stu"
NOTICE: CREATE TABLE will create partition "stu_1_prt_other" for table "stu"
CREATE TABLE
使用下面语句随便插入几条测试数据
insert into stu values(1,'F');
insert into stu values(2,'F');
insert into stu values(3,'F');
insert into stu values(4,'M');
insert into stu values(5,'M');
insert into stu values(6,'M');
查询一下数据在各个segment上的分布情况
testutf8=# select gp_segment_id,* from stu;
gp_segment_id | id | gender
---------------+----+--------
4 | 3 | F
4 | 5 | M
0 | 1 | F
3 | 2 | F
1 | 4 | M
1 | 6 | M
(6 rows)
为了验证分区不影响分布,我们现在插入一条id已经存在,但是gender列不同的值
testutf8=# insert into stu values(1,'M');
INSERT 0 1
testutf8=# select gp_segment_id,* from stu;
gp_segment_id | id | gender
---------------+----+--------
0 | 1 | F --这里可以看到,虽然分区列gender的值不同,但分布列id值都是1,都分布在了0号segment上
0 | 1 | M
1 | 4 | M
1 | 6 | M
4 | 3 | F
4 | 5 | M
3 | 2 | F
(7 rows)
也就是说,数据首先会根据分布列id分发到每个SEGMENT上面去。然后在每一个SEGMENT上面,对于stu表都会存在三个不同的分区,毕竟每个SEGMENT也是一个单独的POSTGRESQL嘛。可以直接连到某个SEGMENT上去看一看
[gpadmin1@hadoop7 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=#
testutf8=# select oid,* from pg_class where relname='stu';
oid | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
-------+---------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
43106 | stu | 2200 | 43107 | 10 | 0 | 43106 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | f | f | r | h | 2 | 0 | 0 | 0 | 0 | 0 | f | f | f | t | 60967 | |
(1 row)
testutf8=# select oid,* from pg_class where relname='stu_1_prt_girls';
oid | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
-------+-----------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
43126 | stu_1_prt_girls | 2200 | 43127 | 10 | 0 | 43126 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | f | f | r | h | 2 | 1 | 0 | 0 | 0 | 0 | f | f | f | f | 60967 | |
(1 row)
换一下,连到另外一个SEGMENT上去看看
testutf8=# /q
[gpadmin1@hadoop7 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20002 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# select oid,* from pg_class where relname='stu';
oid | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
-------+---------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
43106 | stu | 2200 | 43107 | 10 | 0 | 43106 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | f | f | r | h | 2 | 0 | 0 | 0 | 0 | 0 | f | f | f | t | 60543 | |
(1 row)
testutf8=# select oid,* from pg_class where relname='stu_1_prt_boys';
oid | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
-------+----------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
43147 | stu_1_prt_boys | 2200 | 43148 | 10 | 0 | 43147 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | f | f | r | h | 2 | 1 | 0 | 0 | 0 | 0 | f | f | f | f | 60543 | |
(1 row)