PostgreSQL fdw详解

最新推荐文章于 2024-08-05 14:26:28 发布

foucus、

最新推荐文章于 2024-08-05 14:26:28 发布

阅读量1.7w

点赞数 3

分类专栏： PostgreSQL 文章标签： sql 数据库 postgresql

本文链接：https://blog.csdn.net/weixin_39540651/article/details/105968786

版权

PostgreSQL 专栏收录该内容

282 篇文章 169 订阅

订阅专栏

postgres fdw是一种外部访问接口，它可以被用来访问存储在外部的数据，这些数据可以是外部的pg数据库，也可以oracle、mysql等数据库，甚至可以是文件。

目前支持的fdw外部数据源：
https://wiki.postgresql.org/wiki/Foreign_data_wrappers

FDW一般用于哪些场景呢？例如：
sharding，例如pg_sharedman插件，就是使用postgres_fdw和pg_pathman的插件来实现数据的分片。
同步数据、etl、数据迁移等等。

postgres_fdw使用举例：
这里创建2个数据库db01,db02,2个用户user01,user02分别用来作为本地和远端的数据库和用户。

bill=# create user user01 superuser  password 'bill';    
CREATE ROLE
bill=# create database db01 owner=user01 TEMPLATE=template0 LC_CTYPE='zh_CN.UTF-8';
CREATE DATABASE
bill=# create user user02 superuser password 'bill';
CREATE ROLE
bill=# create database db02 with owner=user02 TEMPLATE=template0 LC_CTYPE='zh_CN.UTF-8';
CREATE DATABASE

接下来在远端的db02下面创建表：

bill=# \c db02 user02 
You are now connected to database "db02" as user "user02".
db02=# create table table1 (id int, crt_Time timestamp, info text, c1 int);
CREATE TABLE
db02=#  create table table2 (id int, crt_Time timestamp, info text, c1 int); 
CREATE TABLE
db02=# insert into table1 select generate_series(1,1000000), clock_timestamp(), md5(random()::text), random()*1000;  
INSERT 0 1000000
db02=# insert into table2 select generate_series(1,1000000), clock_timestamp(), md5(random()::text), random()*1000;  
INSERT 0 1000000

需要注意，想要使用fdw访问数据需要先确保：网络通，数据库防火墙(pg_hba,conf)正常，远端数据库的用户必须有表的相关权限。

然后在本地db01创建server：

db01=# CREATE SERVER db02  
db01-#         FOREIGN DATA WRAPPER postgres_fdw  
db01-#         OPTIONS (host '192.168.7.xxx', port '1921', dbname 'db02'); 
CREATE SERVER
db01=# select * from pg_foreign_server ;  
  oid  | srvname | srvowner | srvfdw | srvtype | srvversion | srvacl |                 srvoptions                 
-------+---------+----------+--------+---------+------------+--------+--------------------------------------------
 50361 | db02    |    50345 |  50350 |         |            |        | {host=192.168.7.xxx,port=1921,dbname=db02}
(1 row)

配置user mapping：

db01=# CREATE USER MAPPING FOR user01  
db01-# SERVER db02  
db01-# OPTIONS (user 'user02', password 'bill'); 
CREATE USER MAPPING

然后就可以创建forein table了：
–方法一：批量导入，这种比较常见，可以一次导入一个模式下的所有表

db01=# import foreign schema public from server db02 into sch1;
IMPORT FOREIGN SCHEMA
db01=# \det sch1.*
  List of foreign tables
 Schema | Table  | Server 
--------+--------+--------
 sch1   | table1 | db02
 sch1   | table2 | db02
(2 rows)

–方法二：单个创建

db01=#  CREATE FOREIGN TABLE sch1.table1 (  
db01(#    id int, crt_Time timestamp, info text, c1 int)   
db01-#          SERVER db02  
db01-#          OPTIONS (schema_name 'public', table_name 'table1');  
CREATE FOREIGN TABLE

查询：

db01=#  select count(*) from sch1.table1; 
  count  
---------
 1000000
(1 row)

db01=#  select count(*) from sch1.table2;
  count  
---------
 1000000
(1 row)

我们可以explain verbose来查看sql在远端怎么执行的：

db01=# explain verbose  select count(*) from sch1.table1; 
                     QUERY PLAN                     
----------------------------------------------------
 Foreign Scan  (cost=108.53..152.69 rows=1 width=8)
   Output: (count(*))
   Relations: Aggregate on (sch1.table1)
   Remote SQL: SELECT count(*) FROM public.table1
(4 rows)

pushdown：
我们在本地执行的语句并不是所有的都能pushdown到远端执行，目前只支持：
内置数据类型、immutable操作符、immutable函数。
这也比较好理解，因为pg允许用户自己定义数据类型、操作符这些，如果我们在本地定义了一种新的操作符但是在远端却没有，自然无法push到远端。另外如果本地和远端都创建了同样的extension，那么这个extension自带的操作符和函数是可以pushdown的。

–projection

db01=# explain verbose select id from sch1.table2;   
                              QUERY PLAN                              
----------------------------------------------------------------------
 Foreign Scan on sch1.table2  (cost=100.00..197.75 rows=2925 width=4)
   Output: id
   Remote SQL: SELECT id FROM public.table2
(3 rows)

–where

db01=# explain verbose select * from sch1.table1 where id=1;  
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Foreign Scan on sch1.table1  (cost=100.00..124.33 rows=6 width=48)
   Output: id, crt_time, info, c1
   Remote SQL: SELECT id, crt_time, info, c1 FROM public.table1 WHERE ((id = 1))
(3 rows)

–agg

db01=# explain verbose select count(*) from sch1.table1;   
                     QUERY PLAN                     
----------------------------------------------------
 Foreign Scan  (cost=108.53..152.69 rows=1 width=8)
   Output: (count(*))
   Relations: Aggregate on (sch1.table1)
   Remote SQL: SELECT count(*) FROM public.table1
(4 rows)

–join

db01=# explain verbose select t1.* from sch1.table1 t1 inner join sch1.table2 using (id) limit 2;  
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..100.99 rows=2 width=48)
   Output: t1.id, t1.crt_time, t1.info, t1.c1
   Relations: (sch1.table1 t1) INNER JOIN (sch1.table2)
   Remote SQL: SELECT r1.id, r1.crt_time, r1.info, r1.c1 FROM (public.table1 r1 INNER JOIN public.table2 r2 ON (((r1.id = r2.id)))) LIMIT 2::bigint
(4 rows)

–limit
pg12之前limit操作不在远端执行。

db01=# explain verbose select * from sch1.table2 limit 10;   
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Foreign Scan on sch1.table2  (cost=100.00..100.37 rows=10 width=48)
   Output: id, crt_time, info, c1
   Remote SQL: SELECT id, crt_time, info, c1 FROM public.table2 LIMIT 10::bigint
(3 rows)

–sort

db01=# explain verbose select * from sch1.table2 order by id desc limit 10;    
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Foreign Scan on sch1.table2  (cost=100.00..100.40 rows=10 width=48)
   Output: id, crt_time, info, c1
   Remote SQL: SELECT id, crt_time, info, c1 FROM public.table2 ORDER BY id DESC NULLS FIRST LIMIT 10::bigint
(3 rows)

控制参数：
例如前面提到的可以在本地和远端都创建了同样的extension，那么这个extension自带的操作符和函数是可以pushdown的，我们需要通过控制参数extensions声明。

db01=#  alter server db02 options (add extensions 'dblink');
ALTER SERVER
db01=#  alter server db02 options (set extensions 'dblink');   
ALTER SERVER
db01=# alter server db02 options (drop extensions ); 
ALTER SERVER

pull
有些时候需要将远端的数据pull到本地来进行操作。

db01=# create table t as select * from sch1.table1;   
SELECT 1000000
db01=#  explain verbose select count(*) from t join sch1.table1 t1 on (t.id=t1.id and t1.c1=1); 
                                       QUERY PLAN                                        
-----------------------------------------------------------------------------------------
 Aggregate  (cost=28038.83..28038.83 rows=1 width=8)
   Output: count(*)
   ->  Hash Join  (cost=147.05..27831.98 rows=82738 width=0)
         Hash Cond: (t.id = t1.id)
         ->  Seq Scan on public.t  (cost=0.00..21341.70 rows=1103170 width=4)
               Output: t.id, t.crt_time, t.info, t.c1
         ->  Hash  (cost=146.86..146.86 rows=15 width=4)
               Output: t1.id
               ->  Foreign Scan on sch1.table1 t1  (cost=100.00..146.86 rows=15 width=4)
                     Output: t1.id
                     Remote SQL: SELECT id FROM public.table1 WHERE ((c1 = 1))
(11 rows)

DML

db01=# explain verbose update sch1.table1 set crt_time=now() where id=1;  
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Update on sch1.table1  (cost=100.00..124.78 rows=6 width=54)
   Remote SQL: UPDATE public.table1 SET crt_time = $2 WHERE ctid = $1
   ->  Foreign Scan on sch1.table1  (cost=100.00..124.78 rows=6 width=54)
         Output: id, now(), info, c1, ctid
         Remote SQL: SELECT id, info, c1, ctid FROM public.table1 WHERE ((id = 1)) FOR UPDATE
(5 rows)

连接池
因为pg是多进程的结构，对于每一个连接都要fork一个新的进程，那么通过fdw访问远端数据是不是会建立很多个进程呢？使用postgres_fdw在一个session中如果使用的是同样的foreign server那么是可以重用同一个进程的。

并行
postgres_fdw是允许并行操作的，比如对大表进行操作是可以用到并行的。
https://www.postgresql.org/docs/12/fdwhandler.html

事务
使用fdw访问数据是怎么保证数据的一致性呢？在这一方面并没有使用两阶段提交的方式，而是规定：
1、如果本地事务的隔离级别是SERIALIZABLE，那么远端事务的隔离级别也是SERIALIZABLE；
2、如果本地事务的隔离级别是其它的，那么远端事务的隔离级别都是REPEATABLE READ。