Exploring Read/Write Separation in AntDB
In this test, the datanode slave (standby) nodes of an AntDB cluster serve as readable nodes. The main goal is to make better use of resources and increase overall system throughput.
Applicable scenario: workloads with a large volume of reads and relatively few writes, where reads do not require strictly up-to-date data. Under that premise, the design below is a viable option.
1. Environment Preparation
This test only covers the one-master-one-slave case; a one-master-two-slave datanode setup works on the same principle, and interested readers can try it themselves.
Set up a 2C2D AntDB cluster (each datanode with one synchronous slave):
postgres=# monitor all;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.175 | 14000
db1 | datanode slave | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
postgres=# list node;
name | host | type | mastername | port | sync_state | path | initialized | incluster
--------+------------+-----------------+------------+-------+------------+----------------------------+-------------+-----------
coord1 | localhost1 | coordinator | | 6603 | | /data/zgy/pgxc_data/coord1 | t | t
coord2 | localhost2 | coordinator | | 6603 | | /data/zgy/pgxc_data/coord2 | t | t
db1 | localhost1 | datanode master | | 14000 | | /data/zgy/pgxc_data/db1 | t | t
db1 | localhost2 | datanode slave | db1 | 14001 | sync | /data/zgy/pgxc_data/db1s | t | t
db2 | localhost2 | datanode master | | 15000 | | /data/zgy/pgxc_data/db2 | t | t
db2 | localhost1 | datanode slave | db2 | 15001 | sync | /data/zgy/pgxc_data/db2s | t | t
gtm | localhost1 | gtm master | | 7663 | | /data/zgy/pgxc_data/gtm | t | t
(7 rows)
2. Data Preparation
Connect to coord1, create a table, and load some data:
postgres=# create table a (id int ,name varchar(22),score int) distribute by hash(id);
CREATE TABLE
postgres=# insert into a select generate_series(1,100),'jay'||generate_series(1,100),ceil(random()*100);
INSERT 0 100
postgres=# select count(*) from a;
count
-------
100
(1 row)
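The `distribute by hash(id)` clause means each row is placed on a datanode by hashing its distribution key. As a purely illustrative sketch (AntDB's actual hash function is internal and differs from this), the routing idea looks like:

```python
# Illustrative sketch of hash distribution. AntDB's real hash function
# is internal; here we use key-modulo-node-count purely to show how
# rows of table "a" end up spread across both datanodes.

DATANODES = ["db1", "db2"]  # the two datanode groups in this cluster

def route_row(distribution_key: int) -> str:
    """Pick the datanode that stores the row with this key."""
    return DATANODES[distribution_key % len(DATANODES)]

# Rows 1..100 spread across both datanodes:
placement = {}
for row_id in range(1, 101):
    placement.setdefault(route_row(row_id), []).append(row_id)

# Each datanode holds roughly half the rows, which is why a count(*)
# has to visit both nodes ("Node/s: db1, db2" in the plans below).
```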
3. Update the datanode entries in coord2's pgxc_node catalog
Alternatively, adbmgr could be used to add a coord3 deployed on the servers hosting the datanode slaves. Since this experiment only has two servers, we modify the existing cluster instead.
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14000 | 10.21.20.175 | t | f | -2885965
db2 | D | 15000 | 10.21.20.176 | f | t | -79866771
(4 rows)
postgres=# update pgxc_node set node_port=14001,node_host='10.21.20.176' where node_name='db1';
UPDATE 1
postgres=# update pgxc_node set node_port=15001,node_host='10.21.20.175' where node_name='db2';
UPDATE 1
postgres=# select pgxc_pool_reload();
pgxc_pool_reload
------------------
t
(1 row)
At this point, the read/write separation configuration is complete: coord1 reads and writes through the datanode master nodes, while coord2 reads from the datanode slave nodes.
Let's check the effect.
Connect to coord2 and run a simple query:
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
count
-------
100
(1 row)
postgres=# explain verbose select count(*) from a;
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (cost=2.50..2.51 rows=1 width=0)
Output: pg_catalog.count(*)
-> Data Node Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=0)
Output: (count(*))
Node/s: db1, db2
Remote query: SELECT count(*) FROM ONLY public.a a WHERE true
(6 rows)
Checking processes now shows that the datanode slave has spawned a backend process:
424707 ? S 0:00 /data/zgy/app/bin/postgres --datanode -D /data/zgy/pgxc_data/db1s -i
424709 ? Ss 0:00 \_ postgres: logger process
424710 ? Ss 0:00 \_ postgres: startup process recovering 000000010000000000000003
424711 ? Ss 0:00 \_ postgres: checkpointer process
424712 ? Ss 0:00 \_ postgres: writer process
424713 ? Ss 0:00 \_ postgres: stats collector process
424715 ? Ss 0:00 \_ postgres: wal receiver process streaming 0/3045C90
425072 ? Ss 0:00 \_ postgres: zgy postgres 10.21.20.176(20063) idle
Connect to coord1 for read and write operations:
[zgy@INTEL175 highavailable]$ coord1
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# explain verbose select count(*) from a;
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (cost=2.50..2.51 rows=1 width=0)
Output: pg_catalog.count(*)
-> Data Node Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=0)
Output: (count(*))
Node/s: db1, db2
Remote query: SELECT count(*) FROM ONLY public.a a WHERE true
(6 rows)
postgres=# insert into a select generate_series(101,110),'jay'||generate_series(101,110),ceil(random()*100);
INSERT 0 10
postgres=# select count(*) from a;
count
-------
110
(1 row)
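With coord1 handling writes and coord2 handling reads, the split has to happen on the application side. A minimal sketch of such a router (the DSN strings and the read-only classification are illustrative assumptions; a real application would use a connection pool and proper SQL parsing):

```python
# Minimal application-side read/write router (illustrative sketch).
# Read-only statements go to coord2 (backed by datanode slaves);
# everything else goes to coord1 (backed by datanode masters).

WRITE_DSN = "host=10.21.20.175 port=6603 dbname=postgres"  # coord1
READ_DSN = "host=10.21.20.176 port=6603 dbname=postgres"   # coord2

# Naive classification by leading keyword; note that e.g. a writable
# CTE ("with ... insert ...") would be misrouted by this sketch.
READ_ONLY_PREFIXES = ("select", "explain", "show")

def pick_dsn(sql: str) -> str:
    """Route read-only statements to coord2, everything else to coord1."""
    first_word = sql.lstrip().split(None, 1)[0].lower() if sql.strip() else ""
    return READ_DSN if first_word in READ_ONLY_PREFIXES else WRITE_DSN
```

A production router would also need to send SELECTs inside explicit write transactions, and SELECT ... FOR UPDATE, to coord1.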
Because in AntDB 2.2 a datanode master and its slave must share the same nodename, the separation is hard to see from the plans alone. We can verify it as follows:
- Using adbmgr, manually stop both datanode master nodes and check whether coord2 can still read:
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.175 | 14000
db1 | datanode slave | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
postgres=# stop datanode master db1 MODE F;
NOTICE: 10.21.20.175, pg_ctl stop -D /data/zgy/pgxc_data/db1 -Z datanode -m fast -o -i -w -c
nodename | status | description
----------+--------+-------------
db1 | t | success
(1 row)
postgres=# stop datanode master db2 MODE F;
NOTICE: 10.21.20.176, pg_ctl stop -D /data/zgy/pgxc_data/db2 -Z datanode -m fast -o -i -w -c
nodename | status | description
----------+--------+-------------
db2 | t | success
(1 row)
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | f | not running | 10.21.20.175 | 14000
db1 | datanode slave | t | running | 10.21.20.176 | 14001
db2 | datanode master | f | not running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
Connect to coord1 and coord2 in turn and read table a:
[zgy@INTEL175 highavailable]$ coord1
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
ERROR: error message from poolmgr:reconnect three thimes , could not connect to server: Connection refused
Is the server running on host "10.21.20.175" and accepting
TCP/IP connections on port 14000?
[zgy@INTEL175 ~]$ coord2
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
count
-------
110
(1 row)
4. Failure Scenarios
Several scenarios need to be considered:
- 1. If a datanode slave node fails (server crash, network failure, etc.), read queries will fail.
- 2. After a datanode master failover, manual intervention is required: a new slave must be appended, and its address must be pushed to the pgxc_node list of the read coordinator.
- 3. Streaming-replication lag on the datanode slaves (delayed WAL shipping in asynchronous mode, or recovery delay on a synchronous standby) can cause queries to return stale data.
- 4. A coordinator that serves reads cannot also serve writes.
4.1 After a datanode master fails and failover is triggered:
Simulation:
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.175 | 14000
db1 | datanode slave | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
postgres=# failover datanode db1;
NOTICE: lock cluster: SELECT PG_PAUSE_CLUSTER();
NOTICE: 10.21.20.175, pg_ctl stop -D /data/zgy/pgxc_data/db1 -Z datanode -m immediate -o -i -w -c
NOTICE: 10.21.20.176, pg_ctl promote -w -D /data/zgy/pgxc_data/db1s
NOTICE: unlock cluster: SELECT PG_UNPAUSE_CLUSTER();
WARNING: the datanode master "db1" has no slave or extra, it is better to append a new datanode slave node
nodename | status | description
----------+--------+-------------
db1 | t | success
(1 row)
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(6 rows)
Connect to coord1 and coord2 respectively and run:
[zgy@INTEL175 highavailable]$ coord1
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14001 | 10.21.20.176 | t | f | -2885965
db2 | D | 15000 | 10.21.20.176 | f | t | -79866771
(4 rows)
postgres=# select count(*) from a;
count
-------
110
(1 row)
postgres=# insert into a select generate_series(111,120),'jay'||generate_series(111,120),ceil(random()*100)
postgres-# ;
INSERT 0 10
postgres=# explain verbose select * from a;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: a.id, a.name, a.score
Node/s: db1, db2
Remote query: SELECT id, name, score FROM public.a a
(4 rows)
postgres=#
[zgy@INTEL175 highavailable]$ coord2
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14001 | 10.21.20.176 | t | t | -2885965
db2 | D | 15000 | 10.21.20.176 | f | f | -79866771
(4 rows)
postgres=# select count(*) from a;
count
-------
120
(1 row)
postgres=# explain verbose select * from a;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: a.id, a.name, a.score
Node/s: db1, db2
Remote query: SELECT id, name, score FROM public.a a
(4 rows)
Use adbmgr rewind to re-attach the downed former master to the cluster as the new slave of db1:
Note: in AntDB 2.2, rewind only works if the cluster was initialized with --data-checksums; otherwise it fails with: ERROR: rewind datanode slave "db1" fail, target master need to use data checksums. If the cluster was initialized without that option, the slave can instead be added with append datanode slave nodename.
[zgy@INTEL175 highavailable]$ adbmgr
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# list node;
name | host | type | mastername | port | sync_state | path | initialized | incluster
--------+------------+-----------------+------------+-------+------------+----------------------------+-------------+-----------
coord1 | localhost1 | coordinator | | 6603 | | /data/zgy/pgxc_data/coord1 | t | t
coord2 | localhost2 | coordinator | | 6603 | | /data/zgy/pgxc_data/coord2 | t | t
db1 | localhost2 | datanode master | | 14001 | | /data/zgy/pgxc_data/db1s | t | t
db2 | localhost2 | datanode master | | 15000 | | /data/zgy/pgxc_data/db2 | t | t
db2 | localhost1 | datanode slave | db2 | 15001 | sync | /data/zgy/pgxc_data/db2s | t | t
gtm | localhost1 | gtm master | | 7663 | | /data/zgy/pgxc_data/gtm | t | t
(6 rows)
postgres=# add datanode slave db1(host='localhost1',port=14000,path='/data/zgy/pgxc_data/db1');
ADD NODE
postgres=# rewind datanode slave db1;
NOTICE: pg_ctl restart datanode slave "db1"
NOTICE: 10.21.20.175, pg_ctl restart -D /data/zgy/pgxc_data/db1 -Z datanode -m fast -o -i -w -c -l /data/zgy/pgxc_data/db1/logfile
NOTICE: wait max 90 seconds to check datanode slave "db1" running normal
NOTICE: pg_ctl stop datanode slave "db1" with fast mode
NOTICE: 10.21.20.175, pg_ctl stop -D /data/zgy/pgxc_data/db1 -Z datanode -m fast -o -i -w -c
NOTICE: wait max 90 seconds to check datanode slave "db1" stop complete
NOTICE: update datanode master "db1" pg_hba.conf for the rewind node db1
NOTICE: update gtm master "gtm" pg_hba.conf for the rewind node db1
NOTICE: on datanode master "db1" execute "checkpoint"
NOTICE: 10.21.20.176, /data/zgy/app/bin/pg_controldata '/data/zgy/pgxc_data/db1s' | grep 'Minimum recovery ending location:' |awk '{print $5}'
NOTICE: receive msg: {"result":"0/0"}
NOTICE: 10.21.20.176, /data/zgy/app/bin/pg_controldata '/data/zgy/pgxc_data/db1s' |grep 'Min recovery ending loc' |awk '{print $6}'
NOTICE: receive msg: {"result":"0"}
NOTICE: 10.21.20.175, pg_rewind --target-pgdata /data/zgy/pgxc_data/db1 --source-server='host=10.21.20.176 port=14001 user=zgy dbname=postgres' -N db1
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db1 | datanode slave | t | running | 10.21.20.175 | 14000
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
After adding db1 back, coord2's pgxc_node datanode entries must again be updated manually:
[zgy@INTEL175 highavailable]$ coord2
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=#
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14001 | 10.21.20.176 | t | t | -2885965
db2 | D | 15000 | 10.21.20.176 | f | f | -79866771
(4 rows)
postgres=# update pgxc_node set node_port=14000,node_host='10.21.20.175' where node_name='db1';
UPDATE 1
postgres=# update pgxc_node set node_port=15001,node_host='10.21.20.175' where node_name='db2';
UPDATE 1
postgres=# select pgxc_pool_reload();
pgxc_pool_reload
------------------
t
(1 row)
Now coord1 again reads and writes through the datanode masters, and coord2 once more reads from the datanode slaves.
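Since this repointing of the read coordinator has to be repeated after every failover and every slave change, generating the statements from a single node-to-address map reduces the chance of a typo. A sketch (the addresses are this cluster's; the helper is a hypothetical convenience, not part of AntDB):

```python
# Generate the pgxc_node repointing statements for the read
# coordinator from one node -> (host, port) map, so the same
# script can be reused after each failover or slave change.

def repoint_statements(node_map: dict) -> list:
    """Build the UPDATE statements plus the pool reload call."""
    stmts = [
        "update pgxc_node set node_port=%d,node_host='%s' where node_name='%s';"
        % (port, host, name)
        for name, (host, port) in sorted(node_map.items())
    ]
    stmts.append("select pgxc_pool_reload();")
    return stmts

# Point coord2 back at the datanode slaves after db1 was re-added:
slave_map = {"db1": ("10.21.20.175", 14000), "db2": ("10.21.20.175", 15001)}
for stmt in repoint_statements(slave_map):
    print(stmt)
```

The generated statements are exactly the ones run by hand above; they could be piped into psql against coord2.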
4.2 When a datanode slave node fails:
AntDB's monitoring script tries to restart a datanode slave whose status becomes abnormal; if the configured retry interval and retry count are exhausted without bringing it back up, the slave must be removed from the cluster, and the datanode entries in the read coordinator's pgxc_node must be updated accordingly.
Simulation:
Connect to adbmgr and manually stop datanode slave db1:
postgres=# stop datanode slave db1 mode f;
NOTICE: 10.21.20.175, pg_ctl stop -D /data/zgy/pgxc_data/db1 -Z datanode -m fast -o -i -w -c
nodename | status | description
----------+--------+-------------
db1 | t | success
(1 row)
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db1 | datanode slave | f | not running | 10.21.20.175 | 14000
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
Now a query on table a through coord2 fails:
[zgy@INTEL175 highavailable]$ coord2
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
ERROR: error message from poolmgr:reconnect three thimes , could not connect to server: Connection refused
Is the server running on host "10.21.20.175" and accepting
TCP/IP connections on port 14000?
postgres=#
Update the db1 datanode entry in coord2's pgxc_node:
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14000 | 10.21.20.175 | t | t | -2885965
db2 | D | 15001 | 10.21.20.175 | f | f | -79866771
(4 rows)
postgres=# update pgxc_node set node_port=14001,node_host='10.21.20.176' where node_name='db1';
UPDATE 1
postgres=# select pgxc_pool_reload();
pgxc_pool_reload
------------------
t
(1 row)
postgres=# select count(*) from a;
count
-------
120
(1 row)
Because the slaves were added with synchronous streaming replication, writes through coord1 now hang, waiting indefinitely for db1's synchronous standby to acknowledge:
[zgy@INTEL175 ~]$ coord1
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
count
-------
120
(1 row)
postgres=# insert into a select generate_series(121,130),'jay'||generate_series(121,130),ceil(random()*100)
;
For this situation, connect to adbmgr and change db1's replication mode:
[zgy@INTEL175 highavailable]$ adbmgr
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# alter datanode slave db1 (sync_state='async');
ALTER NODE
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db1 | datanode slave | f | not running | 10.21.20.175 | 14000
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
The downed datanode slave can then be removed from the cluster via adbmgr and dropped:
postgres=# remove datanode slave db1;
WARNING: the datanode master "db1" has no synchronous slave or extra node
REMOVE NODE
postgres=# drop datanode slave db1;
DROP NODE
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(6 rows)
4.3 Reads from slave nodes may return stale data
AntDB datanodes replicate using standard PostgreSQL streaming replication. Delayed WAL shipping on an asynchronous slave, or recovery delay on a synchronous standby, can in extreme cases cause queries to return stale data. We have not yet found a good solution for this and welcome suggestions from the community.
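One partial mitigation is to measure replication lag before trusting a slave read. PostgreSQL reports WAL positions as LSN strings such as 0/3045C90 (visible in the wal receiver process output above), and the byte distance between the master's current position and the slave's replayed position gives the lag. A sketch of that computation (the 16 MB threshold is an assumption; server-side, functions like pg_xlog_location_diff on PG 9.x or pg_wal_lsn_diff on newer versions serve the same purpose):

```python
# Parse PostgreSQL LSNs ("high/low" in hex) and compute the byte lag
# between the master's write position and a slave's replay position.

def parse_lsn(lsn: str) -> int:
    """Convert an LSN like '0/3045C90' to an absolute byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lag_bytes(master_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay."""
    return parse_lsn(master_lsn) - parse_lsn(replica_lsn)

def replica_is_fresh(master_lsn: str, replica_lsn: str,
                     max_lag_bytes: int = 16 * 1024 * 1024) -> bool:
    """Allow slave reads only while the replica is within the lag budget."""
    return lag_bytes(master_lsn, replica_lsn) <= max_lag_bytes

# e.g. master at 0/3045D00, slave replayed up to 0/3045C90:
# lag is 0x3045D00 - 0x3045C90 = 112 bytes -> fresh enough.
```

An application could poll the two positions periodically and fall back to coord1 for reads whenever the lag budget is exceeded; this bounds staleness but does not eliminate it.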
AntDB:
Open-source repository: https://github.com/ADBSQL/AntDB
QQ discussion group: 496464280