PostgreSQL 9.5 在foreign data wrapper这方面有了几块非常好的功能增强:
- 支持创建外部表约束
- 优化器支持外部表查询下推
- 外部表支持继承
- 支持import foreign schema一键创建外部表
使用前3点增强的技术点可以支持数据库的水平分片,可以把PostgreSQL 9.5作为master,在其他版本作为数据节点。从而实现水平分库的目的。
这种分片技术相比中间件分片技术的好处:
- 支持跨库JOIN
- 支持绑定变量
- 支持ACID
- 支持分布式事务(不需要用户干预2PC)
- 支持master节点水平扩展
- 支持segment节点水平扩展
- 支持函数和存储过程
架构如图:
架构1:
维度表存储在上游节点,好处是JOIN时效率更高,缺点是当上层节点需要部署多个时,需要在上层节点之间同步维度表(可以使用前端同步或后端同步的方法),同时如果维度数据量大或者DML频繁的话master库会比较重。
前端同步可以使用外部表的方式,使用触发器实现。insert,update,delete通过父表触发在所有子表执行。SELECT时查询本地表不走触发器。
后端同步可以使用LOGICAL DECODE。
架构2:
维度表存储在分片节点,可以选择任一分片节点存储任一维度表。
缺点是JOIN时数据需要传到上层节点,效率低。
还有一个缺点是数据节点可能不平衡。
当维度表使用多副本时,不存在不平横的问题。这种情况下查询指定任意节点,增删改则指定所有节点。
HA架构:
上层节点使用9.5以上的版本,用于存放维度表(全局表),以及数据分片的定义,数据路由算法。
数据分片的定义包括继承关系,分布列约束,外部表定义,插入触发器。
下层节点使用的数据库版本没有要求,用于存放数据分片。
下面是一组测试,创建5个数据库,1个master库用于存放全局数据和数据分片的定义,数据路由算法;4个下层节点数据库,用于存放数据分片;
master=# create database db0;
master=# create database db1;
master=# create database db2;
master=# create database db3;
master=# create database master;
连接到master库,创建外部server:
d3=# \c master
master=# create extension postgres_fdw;
master=# create server db0 foreign data wrapper postgres_fdw options (hostaddr '127.0.0.1', port '1923', dbname 'db0');
master=# create server db1 foreign data wrapper postgres_fdw options (hostaddr '127.0.0.1', port '1923', dbname 'db1');
master=# create server db2 foreign data wrapper postgres_fdw options (hostaddr '127.0.0.1', port '1923', dbname 'db2');
master=# create server db3 foreign data wrapper postgres_fdw options (hostaddr '127.0.0.1', port '1923', dbname 'db3');
创建user mapping:
master=# create user mapping for postgres server db0 options (user 'postgres', password 'postgres');
master=# create user mapping for postgres server db1 options (user 'postgres', password 'postgres');
master=# create user mapping for postgres server db2 options (user 'postgres', password 'postgres');
master=# create user mapping for postgres server db3 options (user 'postgres', password 'postgres');
连接到分片节点,创建分片表(表名请随意):
master=# \c db0
db0=# create table tbl0(id int primary key, info text, crt_time timestamp);
alter table tbl0 add constraint ck1 check (abs(mod(id,4))=0);
master=# \c db1
db1=# create table tbl1(id int primary key, info text, crt_time timestamp);
alter table tbl1 add constraint ck1 check (abs(mod(id,4))=1);
master=# \c db2
db2=# create table tbl2(id int primary key, info text, crt_time timestamp);
alter table tbl2 add constraint ck1 check (abs(mod(id,4))=2);
master=# \c db3
db3=# create table tbl3(id int primary key, info text, crt_time timestamp);
alter table tbl3 add constraint ck1 check (abs(mod(id,4))=3);
连接到主节点,创建外部表,这里使用了import foreign schema语法,一键创建:
db3=# \c master
You are now connected to database "master" as user "postgres".
master=# import FOREIGN SCHEMA public from server db0 into public;
IMPORT FOREIGN SCHEMA
master=# import FOREIGN SCHEMA public from server db1 into public;
IMPORT FOREIGN SCHEMA
master=# import FOREIGN SCHEMA public from server db2 into public;
IMPORT FOREIGN SCHEMA
master=# import FOREIGN SCHEMA public from server db3 into public;
IMPORT FOREIGN SCHEMA
master=# \det
List of foreign tables
Schema | Table | Server
--------+-------+--------
public | tbl0 | db0
public | tbl1 | db1
public | tbl2 | db2
public | tbl3 | db3
(4 rows)
创建主表,用户操作主表即可。(当然用户也可以直接操作子表,PostgreSQL不拦你)
master=# create table tbl(id int, info text, crt_time timestamp);
CREATE TABLE
设置外部表继承关系,继承到主表下面。
master=# alter foreign table tbl0 inherit tbl;
ALTER FOREIGN TABLE
master=# alter foreign table tbl1 inherit tbl;
ALTER FOREIGN TABLE
master=# alter foreign table tbl2 inherit tbl;
ALTER FOREIGN TABLE
master=# alter foreign table tbl3 inherit tbl;
ALTER FOREIGN TABLE
创建外部表的约束,约束即路由算法的一部分。
注意,带约束条件的SQL,数据库会自动选择对应的外部表进行操作。
不带约束条件的SQL,数据库会选择所有节点操作。
所以建议每条SQL都带上约束条件。
master=# alter foreign table tbl0 add constraint ck_tbl0 check (abs(mod(id,4))=0);
ALTER FOREIGN TABLE
master=# alter foreign table tbl1 add constraint ck_tbl1 check (abs(mod(id,4))=1);
ALTER FOREIGN TABLE
master=# alter foreign table tbl2 add constraint ck_tbl2 check (abs(mod(id,4))=2);
ALTER FOREIGN TABLE
master=# alter foreign table tbl3 add constraint ck_tbl3 check (abs(mod(id,4))=3);
ALTER FOREIGN TABLE
带约束条件abs(mod(id, 4)) = (abs(mod(100, 4)))的SQL,选择了对应的外部表进行操作。
master=# explain select * from tbl where id=100 and abs(mod(id, 4)) = (abs(mod(100, 4)));
QUERY PLAN
-------------------------------------------------------------------
Append (cost=0.00..134.10 rows=2 width=44)
-> Seq Scan on tbl (cost=0.00..0.00 rows=1 width=44)
Filter: ((id = 100) AND (abs(mod(id, 4)) = 0))
-> Foreign Scan on tbl0 (cost=100.00..134.10 rows=1 width=44)
(4 rows)
master=# explain select * from tbl where id=101 and abs(mod(id, 4)) = (abs(mod(101, 4)));
QUERY PLAN
-------------------------------------------------------------------
Append (cost=0.00..134.10 rows=2 width=44)
-> Seq Scan on tbl (cost=0.00..0.00 rows=1 width=44)
Filter: ((id = 101) AND (abs(mod(id, 4)) = 1))
-> Foreign Scan on tbl1 (cost=100.00..134.10 rows=1 width=44)
(4 rows)
不带约束条件abs(mod(id, 4)) = (abs(mod(100, 4)))的SQL,选择了所有外部表进行操作。
master=# explain select * from tbl where id=100;
QUERY PLAN
-------------------------------------------------------------------
Append (cost=0.00..500.68 rows=25 width=44)
-> Seq Scan on tbl (cost=0.00..0.00 rows=1 width=44)
Filter: (id = 100)
-> Foreign Scan on tbl0 (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on tbl1 (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on tbl2 (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on tbl3 (cost=100.00..125.17 rows=6 width=44)
(7 rows)
master=# explain select count(*),sum(id),avg(id+id) from tbl;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=908.01..908.02 rows=1 width=4)
-> Append (cost=0.00..791.00 rows=11701 width=4)
-> Seq Scan on tbl (cost=0.00..0.00 rows=1 width=4)
-> Foreign Scan on tbl0 (cost=100.00..197.75 rows=2925 width=4)
-> Foreign Scan on tbl1 (cost=100.00..197.75 rows=2925 width=4)
-> Foreign Scan on tbl2 (cost=100.00..197.75 rows=2925 width=4)
-> Foreign Scan on tbl3 (cost=100.00..197.75 rows=2925 width=4)
(7 rows)
创建插入路由触发器函数:
master=# create or replace function f_tbl_ins() returns trigger as
$$
declare
begin
case abs(mod(NEW.id, 4))
when 0 then
insert into tbl0 (id, info, crt_time) values (NEW.*);
when 1 then
insert into tbl1 (id, info, crt_time) values (NEW.*);
when 2 then
insert into tbl2 (id, info, crt_time) values (NEW.*);
when 3 then
insert into tbl3 (id, info, crt_time) values (NEW.*);
else
return null;
end case;
return null;
end;
$$
language plpgsql;
创建插入触发器:
master=# create trigger tg1 before insert on tbl for each row execute procedure f_tbl_ins();
CREATE TRIGGER
测试插入路由是否正确:
master=# insert into tbl values (1,'abc',now());
INSERT 0 0
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
1 | abc | 2016-02-23 09:17:43.054333
(1 row)
master=# insert into tbl values (2,'abc',now());
INSERT 0 0
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
1 | abc | 2016-02-23 09:17:43.054333
2 | abc | 2016-02-23 09:17:55.065332
(2 rows)
master=# select * from tbl where id=2;
id | info | crt_time
----+------+----------------------------
2 | abc | 2016-02-23 09:17:55.065332
(1 row)
master=# select * from tbl where id=2 and abs(mod(id, 4))=abs(mod(2, 4));
id | info | crt_time
----+------+----------------------------
2 | abc | 2016-02-23 09:17:55.065332
(1 row)
带约束条件abs(mod(id, 4)) = (abs(mod(100, 4)))的SQL,选择了对应的外部表进行操作。
master=# explain select * from tbl where id=2 and abs(mod(id, 4))=abs(mod(2, 4));
QUERY PLAN
-------------------------------------------------------------------
Append (cost=0.00..134.10 rows=2 width=44)
-> Seq Scan on tbl (cost=0.00..0.00 rows=1 width=44)
Filter: ((id = 2) AND (abs(mod(id, 4)) = 2))
-> Foreign Scan on tbl2 (cost=100.00..134.10 rows=1 width=44)
(4 rows)
空值操作,在触发器中已经绕过:
master=# insert into tbl values (null,'abc',now());
INSERT 0 0
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
1 | abc | 2016-02-23 09:17:43.054333
2 | abc | 2016-02-23 09:17:55.065332
(2 rows)
支持分布式事务,看执行计划,远程用了for update,然后在上层先提交远程节点,再提交本地节点:
master=# explain (verbose) update tbl set info='new' where id=2 and abs(mod(id, 4))=abs(mod(2, 4));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Update on public.tbl (cost=0.00..149.02 rows=2 width=18)
Update on public.tbl
Foreign Update on public.tbl2
Remote SQL: UPDATE public.tbl2 SET info = $2 WHERE ctid = $1
-> Seq Scan on public.tbl (cost=0.00..0.00 rows=1 width=18)
Output: tbl.id, 'new'::text, tbl.crt_time, tbl.ctid
Filter: ((tbl.id = 2) AND (abs(mod(tbl.id, 4)) = 2))
-> Foreign Scan on public.tbl2 (cost=100.00..149.02 rows=1 width=18)
Output: tbl2.id, 'new'::text, tbl2.crt_time, tbl2.ctid
Remote SQL: SELECT id, crt_time, ctid FROM public.tbl2 WHERE ((id = 2)) AND ((abs(mod(id, 4)) = 2)) FOR UPDATE
(10 rows)
master=# explain (verbose) delete from tbl where id=2 and abs(mod(id, 4))=abs(mod(2, 4));
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Delete on public.tbl (cost=0.00..164.62 rows=2 width=6)
Delete on public.tbl
Foreign Delete on public.tbl2
Remote SQL: DELETE FROM public.tbl2 WHERE ctid = $1
-> Seq Scan on public.tbl (cost=0.00..0.00 rows=1 width=6)
Output: tbl.ctid
Filter: ((tbl.id = 2) AND (abs(mod(tbl.id, 4)) = 2))
-> Foreign Scan on public.tbl2 (cost=100.00..164.62 rows=1 width=6)
Output: tbl2.ctid
Remote SQL: SELECT ctid FROM public.tbl2 WHERE ((id = 2)) AND ((abs(mod(id, 4)) = 2)) FOR UPDATE
(10 rows)
可以回退外部表SQL:
master=# begin ;
BEGIN
master=# insert into tbl0 values (0,'abc',now());
INSERT 0 1
master=# rollback;
ROLLBACK
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
1 | abc | 2016-02-23 09:17:43.054333
2 | abc | 2016-02-23 09:17:55.065332
(2 rows)
有跨库事务时,支持全局一致性:
master=# begin;
BEGIN
master=# insert into tbl values(3,'new',now());
INSERT 0 0
master=# insert into tbl values(1,'new',now());
ERROR: duplicate key value violates unique constraint "pk"
DETAIL: Key (id)=(1) already exists.
CONTEXT: Remote SQL command: INSERT INTO public.tbl1(id, info, crt_time) VALUES ($1, $2, $3)
SQL statement "insert into tbl1 (id, info, crt_time) values (NEW.*)"
PL/pgSQL function f_tbl_ins() line 8 at SQL statement
master=# end;
ROLLBACK
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
1 | abc | 2016-02-23 09:17:43.054333
2 | abc | 2016-02-23 09:17:55.065332
(2 rows)
有跨库事务和本地事务时,支持全局一致性:
master=# create table test(id int primary key, info text);
CREATE TABLE
master=# begin;
BEGIN
master=# insert into test values (1,'abc');
INSERT 0 1
master=# insert into tbl values(3,'new',now());
INSERT 0 0
master=# insert into test values (1,'abc');
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(1) already exists.
master=# end;
ROLLBACK
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
1 | abc | 2016-02-23 09:17:43.054333
2 | abc | 2016-02-23 09:17:55.065332
(2 rows)
master=# select * from test;
id | info
----+------
(0 rows)
支持绑定变量:
master=# prepare p1 (int,text,timestamp) as insert into tbl values ($1,$2,$3);
PREPARE
master=# prepare p2 (int,int) as select * from tbl where id=$1 and abs(mod($1,4))=$2;
PREPARE
master=# prepare p3 (int,int,text,timestamp) as update tbl set info=$3,crt_time=$4 where id=$1 and abs(mod($1,4))=$2;
PREPARE
master=# execute p1(1,'abc',now());
ERROR: duplicate key value violates unique constraint "pk"
DETAIL: Key (id)=(1) already exists.
CONTEXT: Remote SQL command: INSERT INTO public.tbl1(id, info, crt_time) VALUES ($1, $2, $3)
SQL statement "insert into tbl1 (id, info, crt_time) values (NEW.*)"
PL/pgSQL function f_tbl_ins() line 8 at SQL statement
master=# execute p1(3,'abc',now());
INSERT 0 0
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
1 | abc | 2016-02-23 09:17:43.054333
2 | abc | 2016-02-23 09:17:55.065332
3 | abc | 2016-02-23 09:56:00.835324
(3 rows)
master=# execute p1(4,'abc',now());
INSERT 0 0
master=# execute p1(5,'abc',now());
INSERT 0 0
master=# execute p1(6,'abc',now());
INSERT 0 0
master=# execute p1(7,'abc',now());
INSERT 0 0
master=# execute p1(8,'abc',now());
INSERT 0 0
master=# execute p1(9,'abc',now());
INSERT 0 0
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
4 | abc | 2016-02-23 09:56:20.159337
8 | abc | 2016-02-23 09:56:31.034317
1 | abc | 2016-02-23 09:17:43.054333
5 | abc | 2016-02-23 09:56:24.392312
9 | abc | 2016-02-23 09:56:33.303365
2 | abc | 2016-02-23 09:17:55.065332
6 | abc | 2016-02-23 09:56:26.560318
3 | abc | 2016-02-23 09:56:00.835324
7 | abc | 2016-02-23 09:56:28.740312
(9 rows)
master=# execute p2(1,1);
id | info | crt_time
----+------+----------------------------
1 | abc | 2016-02-23 09:17:43.054333
(1 row)
master=# execute p2(10,2);
id | info | crt_time
----+------+----------
(0 rows)
master=# execute p2(1,2);
id | info | crt_time
----+------+----------
(0 rows)
master=# execute p2(2,2);
id | info | crt_time
----+------+----------------------------
2 | abc | 2016-02-23 09:17:55.065332
(1 row)
master=# execute p3(1,1,'test',now());
UPDATE 1
master=# select * from tbl;
id | info | crt_time
----+------+----------------------------
4 | abc | 2016-02-23 09:56:20.159337
8 | abc | 2016-02-23 09:56:31.034317
5 | abc | 2016-02-23 09:56:24.392312
9 | abc | 2016-02-23 09:56:33.303365
1 | test | 2016-02-23 09:57:12.126359
2 | abc | 2016-02-23 09:17:55.065332
6 | abc | 2016-02-23 09:56:26.560318
3 | abc | 2016-02-23 09:56:00.835324
7 | abc | 2016-02-23 09:56:28.740312
(9 rows)
使用xmin, xmax验证外部表因为事务中其他SQL或其他节点执行失败导致的事务回滚的情况,确保全局事务一致:
db0=# select tableoid,ctid,xmin,xmax,* from tbl0;
tableoid | ctid | xmin | xmax | id | info | crt_time
----------+--------+----------+------+----+------+----------------------------
20280 | (0,11) | 38232587 | 0 | 4 | test | 2016-02-23 11:25:20.440349
20280 | (0,12) | 38232587 | 0 | 8 | test | 2016-02-23 11:25:20.440349
(2 rows)
注意master节点查看到的外部表的xmin, xmax是不准确的,已经提交了这个BUG给社区。
db0=# \c master
You are now connected to database "master" as user "postgres".
master=# select tableoid,ctid,xmin,xmax,* from tbl0;
tableoid | ctid | xmin | xmax | id | info | crt_time
----------+--------+------+------------+----+------+----------------------------
20304 | (0,11) | 192 | 4294967295 | 4 | test | 2016-02-23 11:25:20.440349
20304 | (0,12) | 192 | 4294967295 | 8 | test | 2016-02-23 11:25:20.440349
(2 rows)
使用以下SQL,在一条SQL中跨库更新多条记录,其中一条涉及到ID从7变成6,会违反数据库分片的表约束,导致SQL失败,看看这种情况下是否能保证全局事务一致性。
master=# update tbl set id=(case id when 4 then 4 when 3 then 3 when 7 then 6 else id end) ;
ERROR: new row for relation "tbl3" violates check constraint "ck1"
DETAIL: Failing row contains (6, test, 2016-02-23 11:25:20.440349).
CONTEXT: Remote SQL command: UPDATE public.tbl3 SET id = $2 WHERE ctid = $1
master=# select tableoid,ctid,xmin,xmax,* from tbl4;
ERROR: relation "tbl4" does not exist
LINE 1: select tableoid,ctid,xmin,xmax,* from tbl4;
^
查看XID:
master=# select tableoid,ctid,xmin,xmax,* from tbl0;
tableoid | ctid | xmin | xmax | id | info | crt_time
----------+--------+------+------------+----+------+----------------------------
20304 | (0,11) | 192 | 4294967295 | 4 | test | 2016-02-23 11:25:20.440349
20304 | (0,12) | 192 | 4294967295 | 8 | test | 2016-02-23 11:25:20.440349
(2 rows)
可以看到xmax值已经变更了,说明发生了回退,从而确保了全局事务的一致性。
master=# \c db0
You are now connected to database "db0" as user "postgres".
db0=# select tableoid,ctid,xmin,xmax,* from tbl0;
tableoid | ctid | xmin | xmax | id | info | crt_time
----------+--------+----------+----------+----+------+----------------------------
20280 | (0,11) | 38232587 | 38232588 | 4 | test | 2016-02-23 11:25:20.440349
20280 | (0,12) | 38232587 | 38232588 | 8 | test | 2016-02-23 11:25:20.440349
(2 rows)
外部表和外部表的跨库JOIN
master=# explain select * from tbl, t where t.id=tbl.id and tbl.id=1 and abs(mod(tbl.id,4))=1;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (cost=0.00..635.41 rows=50 width=88)
-> Append (cost=0.00..500.68 rows=25 width=44)
-> Seq Scan on t (cost=0.00..0.00 rows=1 width=44)
Filter: (id = 1)
-> Foreign Scan on db0_t (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on db1_t (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on db2_t (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on db3_t (cost=100.00..125.17 rows=6 width=44)
-> Materialize (cost=0.00..134.11 rows=2 width=44)
-> Append (cost=0.00..134.10 rows=2 width=44)
-> Seq Scan on tbl (cost=0.00..0.00 rows=1 width=44)
Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))
-> Foreign Scan on tbl1 (cost=100.00..134.10 rows=1 width=44)
(13 rows)
外部表,全局表的JOIN
master=# explain select * from tbl, t, test where t.id=tbl.id and test.id=t.id and tbl.id=1 and abs(mod(tbl.id,4))=1;
QUERY PLAN
-----------------------------------------------------------------------------------------
Nested Loop (cost=0.15..643.60 rows=50 width=124)
-> Append (cost=0.00..500.68 rows=25 width=44)
-> Seq Scan on t (cost=0.00..0.00 rows=1 width=44)
Filter: (id = 1)
-> Foreign Scan on db0_t (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on db1_t (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on db2_t (cost=100.00..125.17 rows=6 width=44)
-> Foreign Scan on db3_t (cost=100.00..125.17 rows=6 width=44)
-> Materialize (cost=0.15..142.30 rows=2 width=80)
-> Nested Loop (cost=0.15..142.29 rows=2 width=80)
-> Index Scan using test_pkey on test (cost=0.15..8.17 rows=1 width=36)
Index Cond: (id = 1)
-> Append (cost=0.00..134.10 rows=2 width=44)
-> Seq Scan on tbl (cost=0.00..0.00 rows=1 width=44)
Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))
-> Foreign Scan on tbl1 (cost=100.00..134.10 rows=1 width=44)
(16 rows)
扩展能力和性能:
master和数据分片都可以水平扩展。
性能可以随着节点数的增加线性提升。
更适合OLTP。OLAP目前还是MPP做得比较好。
如果要下推JOIN,目前的方法是使用数据节点视图,将JOIN封装在视图中,在上层节点建立视图的外部表,通过访问这类外部表来实现JOIN的下推。未来PG会提供JOIN下推的功能,不需要这么麻烦。
其他
truncate 目前不支持外部表
JOIN目前不能下推到数据节点执行
条件可以下推到数据节点
不要走开,马上下一篇会介绍利用PostgreSQL特性实现 Oracle 水平分库的方案和技术实现。
没错,是 Oracle 水平分库,而且不限 Oracle 的版本。
文章来咯:
https://yq.aliyun.com/articles/6637