PostgreSQL 9.6 水平分库,还差一点点啦

PostgreSQL 持续在基于fdw的sharding技术上深耕,9.6开始,在符合条件的前提下,支持JOIN和SORT下推到数据节点执行。
下面是一个测试
创建几个shard库

for subfix in 0 1 2 3 
do
psql -c "create database db$subfix"
done

创建master库

psql -c "create database master;"
psql master -c "create extension postgres_fdw;"

在master库创建foreign server和user mapping

for subfix in 0 1 2 3 
do
psql master -c "create server db$subfix foreign data wrapper postgres_fdw options (hostaddr 'xxx.xxx.xxx.xxx', dbname 'db$subfix', port '1923');"
psql master -c "create user mapping for postgres server db$subfix options (user 'postgres', password 'postgres');"
done

在shard库创建分片表

for subfix in 0 1 2 3 
do
psql db$subfix -c "drop table if exists tbl; create table tbl(id int primary key, info text)"
psql db$subfix -c "drop table if exists tab; create table tab(id int primary key, info text)"
done

在master库创建foreign 表,并设置约束

for subfix in 0 1 2 3 
do
psql master -c "drop foreign table if exists tbl$subfix ; create foreign table tbl$subfix (id int not null, info text) server db$subfix options (schema_name 'public', table_name 'tbl');"
psql master -c "alter foreign table tbl$subfix add constraint ck1 check (mod(id,4) = $subfix );"

psql master -c "drop foreign table if exists tab$subfix ; create foreign table tab$subfix (id int not null, info text) server db$subfix options (schema_name 'public', table_name 'tab');"
psql master -c "alter foreign table tab$subfix add constraint ck1 check (mod(id,4) = $subfix );"
done

查看

psql master <<EOF
\det
EOF

结果

 List of foreign tables
 Schema | Table | Server 
--------+-------+--------
 public | tab0  | db0
 public | tab1  | db1
 public | tab2  | db2
 public | tab3  | db3
 public | tbl0  | db0
 public | tbl1  | db1
 public | tbl2  | db2
 public | tbl3  | db3
(8 rows)

在master库创建父表

psql master -c "create table tbl(id int primary key, info text);"
psql master -c "create table tab(id int primary key, info text);"

在master库创建foreign表和父表的继承关系

for subfix in 0 1 2 3 
do
psql master -c "alter foreign table tbl$subfix inherit tbl;"
psql master -c "alter foreign table tab$subfix inherit tab;"
done

测试JOIN的下推

master=# explain verbose select * from tbl1,tab1 where tab1.id=tbl1.id and mod(tbl1.id,4)=1;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..226.75 rows=48 width=72)
   Output: tbl1.id, tbl1.info, tab1.id, tab1.info
   Relations: (public.tbl1) INNER JOIN (public.tab1)
   Remote SQL: SELECT r1.id, r1.info, r2.id, r2.info FROM (public.tbl r1 INNER JOIN public.tab r2 ON (((r1.id = r2.id)) AND ((mod(r1.id, 4) = 1))))
(4 rows)

目前sort下推需要关闭优化器enable_sort开关才会下推,也是值得改进的地方

master=# set enable_sort=off;
SET
master=# explain verbose select * from tbl1 where mod(id,4)=mod(100,4) order by id;
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Foreign Scan on public.tbl1  (cost=100.00..136.71 rows=7 width=36)
   Output: id, info
   Remote SQL: SELECT id, info FROM public.tbl WHERE ((mod(id, 4) = 0)) ORDER BY id ASC NULLS LAST
(3 rows)



还需要改进的地方
这样的查询优化器能优化什么?

  1. 如果要更纯粹的sharding,父表不应该参与计算,只是一个别名而已。因此JOIN 可以根据前提条件下推。
  2. tab.id=tbl.id and mod(tbl.id,4)=1 可以推演出 and mod(tab.id,4)=1 。 因此tab表只需要扫描tab1。
master=# explain verbose select * from tbl,tab where tab.id=tbl.id and mod(tbl.id,4)=1;
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..0.00 rows=0 width=0)
   Output: tbl.id, tbl.info, tab.id, tab.info
   Workers Planned: 1
   Single Copy: true
   ->  Hash Join  (cost=130.71..757.17 rows=218 width=72)
         Output: tbl.id, tbl.info, tab.id, tab.info
         Hash Cond: (tab.id = tbl.id)
         ->  Append  (cost=0.00..603.80 rows=5461 width=36)
               ->  Seq Scan on public.tab  (cost=0.00..0.00 rows=1 width=36)
                     Output: tab.id, tab.info
               ->  Foreign Scan on public.tab0  (cost=100.00..150.95 rows=1365 width=36)
                     Output: tab0.id, tab0.info
                     Remote SQL: SELECT id, info FROM public.tab
               ->  Foreign Scan on public.tab1  (cost=100.00..150.95 rows=1365 width=36)
                     Output: tab1.id, tab1.info
                     Remote SQL: SELECT id, info FROM public.tab
               ->  Foreign Scan on public.tab2  (cost=100.00..150.95 rows=1365 width=36)
                     Output: tab2.id, tab2.info
                     Remote SQL: SELECT id, info FROM public.tab
               ->  Foreign Scan on public.tab3  (cost=100.00..150.95 rows=1365 width=36)
                     Output: tab3.id, tab3.info
                     Remote SQL: SELECT id, info FROM public.tab
         ->  Hash  (cost=130.61..130.61 rows=8 width=36)
               Output: tbl.id, tbl.info
               ->  Append  (cost=0.00..130.61 rows=8 width=36)
                     ->  Seq Scan on public.tbl  (cost=0.00..0.00 rows=1 width=36)
                           Output: tbl.id, tbl.info
                           Filter: (mod(tbl.id, 4) = 1)
                     ->  Foreign Scan on public.tbl1  (cost=100.00..130.61 rows=7 width=36)
                           Output: tbl1.id, tbl1.info
                           Remote SQL: SELECT id, info FROM public.tbl WHERE ((mod(id, 4) = 1))
(31 rows)

修改源码,允许删除继承的约束
vi src/backend/commands/tablecmds.c

                /* Don't drop inherited constraints */
//              if (con->coninhcount > 0 && !recursing)
//                      ereport(ERROR,
//                                      (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
//                                       errmsg("cannot drop inherited constraint \"%s\" of relation \"%s\"",
//                                                      constrName, RelationGetRelationName(rel))));

重新编译,重启

make && make install
pg_ctl restart -m fast

添加主表的约束,删除子表的约束,造成主表不被访问的假象。

alter table tab add constraint tab_ck check (mod(id, 4) is null);
alter table tbl add constraint tbl_ck check (mod(id, 4) is null);

alter table only tab0 drop constraint tab_ck;
alter table only tab1 drop constraint tab_ck;
alter table only tab2 drop constraint tab_ck;
alter table only tab3 drop constraint tab_ck;

alter table only tbl0 drop constraint tbl_ck;
alter table only tbl1 drop constraint tbl_ck;
alter table only tbl2 drop constraint tbl_ck;
alter table only tbl3 drop constraint tbl_ck;

我的目标是这样能做到下推JOIN,但实际上没有下推,这个是非常痛苦的。
一个真正的分库中间件应该解决这样的问题。

master=# explain verbose select * from tbl ,tab where tab.id=tbl.id and mod(tbl.id,4)=1 and mod(tab.id,4)=1;
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Nested Loop  (cost=200.00..261.98 rows=7 width=72)
   Output: tbl1.id, tbl1.info, tab1.id, tab1.info
   Join Filter: (tbl1.id = tab1.id)
   ->  Append  (cost=100.00..130.61 rows=7 width=36)
         ->  Foreign Scan on public.tbl1  (cost=100.00..130.61 rows=7 width=36)
               Output: tbl1.id, tbl1.info
               Remote SQL: SELECT id, info FROM public.tbl WHERE ((mod(id, 4) = 1))
   ->  Materialize  (cost=100.00..130.65 rows=7 width=36)
         Output: tab1.id, tab1.info
         ->  Append  (cost=100.00..130.61 rows=7 width=36)
               ->  Foreign Scan on public.tab1  (cost=100.00..130.61 rows=7 width=36)
                     Output: tab1.id, tab1.info
                     Remote SQL: SELECT id, info FROM public.tab WHERE ((mod(id, 4) = 1))
(13 rows)

这才是我要的结果

                                                                                  QUERY PLAN                                                                                  
-------------------------
 Foreign Scan  (cost=100.00..161.58 rows=1 width=72)
   Output: tbl.id, tbl.info, tab.id, tab.info
   Relations: (public.tbl1 tbl) INNER JOIN (public.tab1 tab)
   Remote SQL: SELECT r1.id, r1.info, r2.id, r2.info FROM (public.tbl r1 INNER JOIN public.tab r2 ON (((r1.id = r2.id)) AND ((mod(r2.id, 4) = 1)) AND ((mod(r1.id, 4) = 1))))
(4 rows)

小结

  1. 通常应用会以父表作为常用的表,而不是直接访问子表,但是可以要求用户带上分区约束的条件,从而满足下推的排他性。
  2. PG应该允许父表和子表有不一样的约束,从而可以利用排他约束把父表的访问过滤掉。
  3. 父表访问过滤掉之后,子表的JOIN应该可以下推使用。
  4. 复制表的实现和下推?
  5. 在PostgreSQL的shard完美之前,用户应该尽量避免JOIN。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值