虽说GREENPLUM不是开源的,可是自带的一些脚本还是可以看一看的。最近在学习其中的几个,期间经常看到使用gp_dist_random()函数,奇怪自己在ADMIN文档里并没有看到过这个函数啊,去搜了一下也没搜到。试了一试,大概知道了是个什么功能。
首先连到MASTER上
[gpadmin1@hadoop7 ~]$ psql
psql (8.2.13)
Type "help" for help.
testutf8=# select count(1) from gp_dist_random('pg_class');
count
-------
1633
(1 row)
testutf8=# select count(1) from pg_class;
count
-------
272
(1 row)
testutf8=# select * from gp_id;
gpname | numsegments | dbid | content
--------+-------------+------+---------
GP | 6 | 1 | -1
(1 row)
testutf8=# select 272*6;
?column?
----------
1632
(1 row)
看出些端倪来了吧,单节点*节点个数=gp_dist_random()函数的结果 。
因为我们在日常使用的过程中MASTER以及各个SEGMENT的PG_CLASS正常情况下应该都是一致的,所以这样乘是合理的。我这里差了1(1633-1632=1)是因为我之前有直接连到某个SEGMENT上直接做操作,没有通过MASTER,因此可能会有一些不一致。
也就是说,gp_dist_random()函数的作用就是从MASTER可以查到某一个表在各个SEGMENT上的情况(不包含MASTER)。
当然,还是有一些限制条件的,整个函数不是对每个表都有效的。
1、这类表必须是单独连到各个POSTGRES进程上创建的,不能是直接连上MASTER在整个集群内进行创建。
2、在各个SEGMENT上表的OID要是一致的。这一点可能就比较难控制了,不过刚才我在实验环境中由于没其他人操作,各个SEGMENT都很一致,所以还是实验成功了。
下面贴一下结果看看吧
连接到MASTER所在端口号5345的那个POSTGRES进程
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 5345 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# insert into t1 values(1);
INSERT 0 1
testutf8=# insert into t1 values(12);
INSERT 0 1
testutf8=# select * from t1;
id
----
1
12
(2 rows)
现在正常通过MASTER连接到集群上,查询一下试试
testutf8=# select * from t1;
id
----
1
12
(2 rows)
testutf8=# select * from gp_dist_random('t1');
ERROR: relation with OID 43268 does not exist (seg0 slice1 hadoop7:20001 pid=1511)
这里报错了,因为t1表这个对象现在只存在MASTER的那个POSTGRES进程上,在其他各个SEGMENT上并不存在。因此我们需要单独连接到各个SEGMENT上创建这个对象。
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20002 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop8 -p 20002 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop8 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop9 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop9 -p 20002 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# create table t1(id int);
CREATE TABLE
在这个SEGMENT上插入两条数据试试
testutf8=# insert into t1 values(2);
INSERT 0 1
testutf8=# insert into t1 values(22);
INSERT 0 1
testutf8=# select * from t1;
id
----
2
22
(2 rows)
testutf8=# select * from gp_dist_random('t1');
id
----
2
22
(2 rows)
可以看到,在某个SEGMENT上也可以使用gp_dist_random()函数,只不过仍然只能看到本SEGMENT上的数据而已。
这时我们切换回到正常通过MASTER连接到集群上的窗口,查询一下看看
testutf8=# select * from gp_dist_random('t1');
id
----
2
22
(2 rows)
这时就通过函数gp_dist_random()就查看到了在SEGMENT上插入的那两条数据,注意这里并不包含在MASTER上的1,12两条数据。
再连接到某个SEGMENT上插入两条数据试试
[gpadmin1@hadoop7 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop9 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# insert into t1 values(33);
INSERT 0 1
testutf8=# insert into t1 values(3);
INSERT 0 1
然后再回到正常通过MASTER连接到集群上的窗口,查询一下看看
testutf8=# select * from gp_dist_random('t1');
id
----
2
22
33
3
(4 rows)
新增的两条数据也看到了。
在GREENPLUM的源程序中使用这个函数来进行MASTER和各个SEGMENT之间的数据比对。
因为正常情况下都会通过MASTER来进行操作,因此不会存在这种特殊类型的表,所以暂时还没想到这个函数能够应用在其他什么地方。
不过对于有些系统表,我们还是可以使用这个函数来做一些事情的。比如说我在MASTER上发起了一个查询,虽然通过pg_stat_activity可以看到当前执行的查询,不过如果一个查询好久好久都没有结束,那么我可能会想它是不是死掉了,想查看一下这个这个查询究竟有没有在各个SEGMENT上执行。
在MASTER上创建一个test表,插入多点数据,然后发起一个JOIN多次的查询
testutf8=# drop table test;
DROP TABLE
testutf8=# create table test as select * from pg_class;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'relname' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 275
testutf8=# insert into test select * from test;
INSERT 0 275
testutf8=# insert into test select * from test;
INSERT 0 550
testutf8=# insert into test select * from test;
INSERT 0 1100
testutf8=# insert into test select * from test;
INSERT 0 2200
testutf8=# insert into test select * from test;
INSERT 0 4400
testutf8=# insert into test select * from test;
INSERT 0 8800
testutf8=# insert into test select * from test;
INSERT 0 17600
testutf8=# insert into test select * from test;
INSERT 0 35200
testutf8=# insert into test select * from test;
INSERT 0 70400
testutf8=# insert into test select * from test;
INSERT 0 140800
testutf8=#
testutf8=# select * from test a cross join test b cross join test c;
另外开启一个窗口连接
通过函数gp_dist_random('pg_stat_activity')可以看到pg_stat_activity在各个SEGMENT上的情况,这里是不包含MASTER的
testutf8=# select * from gp_dist_random('pg_stat_activity');
datid | datname | procpid | sess_id | usesysid | usename | current_query | waiting | query_start | backend_start | client_addr | client_port
-------+----------+---------+---------+----------+----------+-----------------------------------------------------------+---------+-------------------------------+-------------------------------+--------------+-------------
25691 | testutf8 | 20837 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.597638+08 | 10.20.151.11 | 37041
25691 | testutf8 | 20741 | 29 | 10 | gpadmin1 | <IDLE> | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:44.967478+08 | 10.20.151.11 | 37023
25691 | testutf8 | 20658 | 30 | 10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity'); | f | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.918739+08 | 10.20.151.11 | 36992
25691 | testutf8 | 18629 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.113912+08 | 10.20.151.11 | 46839
25691 | testutf8 | 20705 | -1 | 10 | gpadmin1 | <IDLE> | f | | 2010-11-30 11:01:23.691262+08 | 10.20.151.93 | 51429
25691 | testutf8 | 20839 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.601519+08 | 10.20.151.11 | 50706
25691 | testutf8 | 20743 | 29 | 10 | gpadmin1 | <IDLE> | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:44.970788+08 | 10.20.151.11 | 50688
25691 | testutf8 | 20660 | 30 | 10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity'); | f | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.922336+08 | 10.20.151.11 | 50657
25691 | testutf8 | 18631 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.134902+08 | 10.20.151.11 | 36656
25691 | testutf8 | 20706 | -1 | 10 | gpadmin1 | <IDLE> | f | | 2010-11-30 11:01:23.693834+08 | 10.20.151.93 | 59344
25691 | testutf8 | 14200 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.683182+08 | 10.20.151.11 | 46190
25691 | testutf8 | 14104 | 29 | 10 | gpadmin1 | <IDLE> | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:45.044488+08 | 10.20.151.11 | 46172
25691 | testutf8 | 14049 | 30 | 10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity'); | f | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.997564+08 | 10.20.151.11 | 46141
25691 | testutf8 | 12203 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.361576+08 | 10.20.151.11 | 41952
25691 | testutf8 | 14092 | -1 | 10 | gpadmin1 | <IDLE> | f | | 2010-11-30 11:01:23.75539+08 | 10.20.151.93 | 55913
25691 | testutf8 | 14202 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.686193+08 | 10.20.151.11 | 41996
25691 | testutf8 | 14106 | 29 | 10 | gpadmin1 | <IDLE> | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:45.047272+08 | 10.20.151.11 | 41978
25691 | testutf8 | 14051 | 30 | 10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity'); | f | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:01.004602+08 | 10.20.151.11 | 41947
25691 | testutf8 | 12205 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.365583+08 | 10.20.151.11 | 57566
25691 | testutf8 | 13836 | -1 | 10 | gpadmin1 | <IDLE> | f | | 2010-11-30 10:53:35.973226+08 | 10.20.151.93 | 44622
25691 | testutf8 | 6246 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.678602+08 | 10.20.151.11 | 47749
25691 | testutf8 | 6229 | 29 | 10 | gpadmin1 | <IDLE> | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:45.042156+08 | 10.20.151.11 | 47731
25691 | testutf8 | 6188 | 30 | 10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity'); | f | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.992359+08 | 10.20.151.11 | 47700
25691 | testutf8 | 4638 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.315053+08 | 10.20.151.11 | 42496
25691 | testutf8 | 6057 | -1 | 10 | gpadmin1 | <IDLE> | f | | 2010-11-30 10:56:10.860697+08 | 10.20.151.93 | 47089
25691 | testutf8 | 6244 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.667617+08 | 10.20.151.11 | 48144
25691 | testutf8 | 6227 | 29 | 10 | gpadmin1 | <IDLE> | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:45.038651+08 | 10.20.151.11 | 48126
25691 | testutf8 | 6186 | 30 | 10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity'); | f | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.984353+08 | 10.20.151.11 | 48095
25691 | testutf8 | 4636 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.211114+08 | 10.20.151.11 | 37456
25691 | testutf8 | 6214 | -1 | 10 | gpadmin1 | <IDLE> | f | | 2010-11-30 11:01:23.758726+08 | 10.20.151.93 | 42670
(30 rows)
下面这样查询,只能看到MASTER上的pg_stat_activity 内容
testutf8=# select * from pg_stat_activity;
datid | datname | procpid | sess_id | usesysid | usename | current_query | waiting | query_start | backend_start | client_addr | client_port
-------+----------+---------+---------+----------+----------+-----------------------------------------------------------+---------+-------------------------------+-------------------------------+--------------+-------------
25691 | testutf8 | 20728 | 29 | 10 | gpadmin1 | <IDLE> | f | 2010-11-30 11:01:32.185858+08 | 2010-11-30 11:01:32.495914+08 | 10.20.151.11 | 38653
25691 | testutf8 | 20655 | 30 | 10 | gpadmin1 | select * from pg_stat_activity; | f | 2010-11-30 11:02:48.213233+08 | 2010-11-30 10:59:59.05798+08 | | -1
25691 | testutf8 | 18621 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:40.245568+08 | | -1
(3 rows)
testutf8=#
连到某个特定的SEGMETN上看一下
testutf8=# /q
[gpadmin1@hadoop7 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.
testutf8=# select * from pg_stat_activity;
datid | datname | procpid | sess_id | usesysid | usename | current_query | waiting | query_start | backend_start | client_addr | client_port
-------+----------+---------+---------+----------+----------+-----------------------------------------------------------+---------+-------------------------------+-------------------------------+--------------+-------------
25691 | testutf8 | 20837 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.597638+08 | 10.20.151.11 | 37041
25691 | testutf8 | 20741 | 29 | 10 | gpadmin1 | <IDLE> | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:44.967478+08 | 10.20.151.11 | 37023
25691 | testutf8 | 20889 | -1 | 26271 | r1 | select * from pg_stat_activity; | f | 2010-11-30 11:04:16.063552+08 | 2010-11-30 11:04:13.766148+08 | 10.20.151.11 | 55612
25691 | testutf8 | 18629 | 29 | 10 | gpadmin1 | select * from test a cross join test b cross join test c; | f | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.113912+08 | 10.20.151.11 | 46839
25691 | testutf8 | 20705 | -1 | 10 | gpadmin1 | <IDLE> | f | | 2010-11-30 11:01:23.691262+08 | 10.20.151.93 | 51429
(5 rows)