FDW连接
preparations
- 两个在不同主机(满足TCP/IP连接是分属于不同IP地址下)
- (如果是安装PG源码)进入源码目录下面的
/contrib/postgres_fdw/
,然后用make & make install
编译和安装
[root@localhost /]# cd 源码存放的目录/postgresql-15.3/contrib/postgres_fdw/
[root@localhost postgres_fdw]# make & make install
- (如果是yum安装)
Connection
[userb@localhost ~]$ psql -d my_second_pgdb
psql (15.3)
Type "help" for help.
# 1.创建拓展
my_second_pgdb=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION
my_first_pgdb=# \x
Expanded display is on.
my_first_pgdb=# select * from pg_foreign_data_wrapper;
-[ RECORD 1 ]+-------------
oid | 16392
fdwname | postgres_fdw
fdwowner | 10
fdwhandler | 16390
fdwvalidator | 16391
fdwacl |
fdwoptions |
my_first_pgdb=# select * from pg_extension;
-[ RECORD 1 ]--+-------------
oid | 13673
extname | plpgsql
extowner | 10
extnamespace | 11
extrelocatable | f
extversion | 1.0
extconfig |
extcondition |
-[ RECORD 2 ]--+-------------
oid | 16389
extname | postgres_fdw
extowner | 10
extnamespace | 2200
extrelocatable | t
extversion | 1.1
extconfig |
extcondition |
# 2. 创建目标库连接串,包含欲访问的外部表所在主机的IP地址,欲访问的数据库进程的端口号,数据库名称
# 关系型数据库常用默认端口号
# 1.Oracle数据库默认端口号为,1521;
# 2.MySQL数据库默认端口号为,3306;
# 3.SQLServer数据库默认端口号为,1433;
# 4.PostgreSQL数据库默认端口号为,5432;
my_second_pgdb=# CREATE SERVER server_a_my_first_pgdb FOREIGN DATA WRAPPER postgres_fdw OPTIONS(host 'xxx.xxx.xxx.xxx', port '5432', dbname 'my_first_pgdb');
CREATE SERVER
my_second_pgdb=# select * from pg_foreign_server;
-[ RECORD 1 ]----------------------------------------------------
oid | 16396
srvname | server_a_my_first_pgdb
srvowner | 10
srvfdw | 16392
srvtype |
srvversion |
srvacl |
srvoptions | {host=xxx.xxx.xxx.xxx,port=5432,dbname=my_first_pgdb}
# 3. 用户映射:欲访问的外部表所属用户名、密码(OPTIONS中),将其映射到当前库的某一个用户(FOR关键字后面)上来
# 这里假设当前库的被映射的用户名为usera,也就是欲访问的外部表所属的用户;映射的用户名为userb,也就是当前库的其中一个用户。
# 查看数据库中的用户和(已MS5加密后的)密码:SELECT rolname,rolpassword FROM pg_authid;
# 修改用户密码:ALTER USER usera WITH PASSWORD 'useraaa';
# 这里被映射的用户一定要有密码,不然会报错
my_second_pgdb=# CREATE USER MAPPING FOR userb SERVER server_a_my_first_pgdb OPTIONS (user 'usera', password 'useraaa');
CREATE USER MAPPING
my_secondt_pgdb=# select * from pg_user_mappings;
-[ RECORD 1 ]----------------------------
umid | 16401
srvid | 16396
srvname | server_a_my_first_pgdb
umuser | 10
usename | userb
umoptions | {user=usera,password=aaa}
# 4. 创建外部表(相当于外部表在本地数据库的映射),欲访问外部表或外部视图的字段(列)信息,字段可以少于目标表或视图,取自己需要的
# 这里有一个schema(模式)的概念,默认所有的表都放在名为public的模式下,如果指定了模式的就放在指定模式下
# 字段可以选取内容的意思是,例如我的外部表本来有五列字段,但可以仅访问其中两列,将欲访问的写在下方即可
my_second_pgdb=# CREATE FOREIGN TABLE fdw_a_weather(city character varying(80), temp_hi integer) SERVER server_a_my_first_pgdb OPTIONS(schema_name 'public', table_name 'weather');
CREATE FOREIGN TABLE
my_second_pgdb=# select * from pg_foreign_table;
-[ RECORD 1 ]--------------------------------------
ftrelid | 16402
ftserver | 16396
ftoptions | {schema_name=public,table_name=weather}
Access Foreign Data
- 注意查询时,外部表所在主机的数据库服务器也要开启,才能成功查询到。
my_second_pgdb=# select * from fdw_a_weather;
city | temp_hi
-------------------+---------
San Francisco | 50
Changsha | 35
Hainan Ledong | 31
Hainan Sanya | 31
Hainan Baoting | 31
Hainan Qiongzhong | 31
Hainan Dongfang | 31
Hainan Lingshui | 31
Changsha | 28
(9 rows)
连接FDW中可能遇到的问题
- failed: 没有到主机的路由
my_first_pgdb=# select * from fdw_a_weather;
2023-07-05 11:10:18.061 CST [3225] ERROR: could not connect to server "server_a_my_first_pgdb"
2023-07-05 11:10:18.061 CST [3225] DETAIL: connection to server at "xxx.xxx.xxx.xxx", port 5432 failed: 没有到主机的路由
Is the server running on that host and accepting TCP/IP connections?
2023-07-05 11:10:18.061 CST [3225] STATEMENT: select * from fdw_a_weather;
ERROR: could not connect to server "server_a_my_first_pgdb"
DETAIL: connection to server at "xxx.xxx.xxx.xxx", port 5432 failed: 没有到主机的路由
Is the server running on that host and accepting TCP/IP connections?
- 解决:可能防火墙没有放行数据库端口,这里以5432为例
# 查看已放行端口号
[root@localhost ~]# firewall-cmd --list-port
# 添加放行端口号5432
[root@localhost ~]# firewall-cmd --zone=public --add-port=5432/tcp --permanent
success
# 重新加载防火墙
[root@localhost ~]# firewall-cmd --reload
success
# 再次查看,添加成功
[root@localhost ~]# firewall-cmd --list-port
5432/tcp
- failed: 拒绝连接
my_first_pgdb=# select * from fdw_a_weather;
2023-07-05 11:13:52.810 CST [3225] ERROR: could not connect to server "server_a_my_first_pgdb"
2023-07-05 11:13:52.810 CST [3225] DETAIL: connection to server at "xxx.xxx.xxx.xxx", port 5432 failed: 拒绝连接
Is the server running on that host and accepting TCP/IP connections?
2023-07-05 11:13:52.810 CST [3225] STATEMENT: select * from fdw_a_weather;
ERROR: could not connect to server "server_a_my_first_pgdb"
DETAIL: connection to server at "xxx.xxx.xxx.xxx", port 5432 failed: 拒绝连接
Is the server running on that host and accepting TCP/IP connections?
- 解决方法:进入主机a外部表所属数据簇(Database Cluster)目录下,修改
postgresql.conf
文件,添加listen_addresses = '*'
;修改pg_hba.conf
,添加host all all 0.0.0.0/0 trust
,修改完记得重启数据库的服务器(如果是别的数据库,也是找到其配置文件的监听地址项进行修改)
[usera@localhost data]$ vim postgresql.conf
[usera@localhost data]$ vim pg_hba.conf
[usera@localhost ~]$ pg_ctl -D /pgdata/15.3/poc/data/ restart
- 修改完如下图:
- ERROR: relation “my_first_pgdb.weather” does not exist
my_first_pgdb=# select * from fdw_a_weather;
2023-07-05 11:42:22.321 CST [3225] ERROR: relation "my_first_pgdb.weather" does not exist
2023-07-05 11:42:22.321 CST [3225] CONTEXT: remote SQL command: SELECT city, temp_hi FROM my_first_pgdb.weather
2023-07-05 11:42:22.321 CST [3225] STATEMENT: select * from fdw_a_weather;
ERROR: relation "my_first_pgdb.weather" does not exist
CONTEXT: remote SQL command: SELECT city, temp_hi FROM my_first_pgdb.weather
- 错误原因:
CREATE FOREIGN TABLE
时,OPTIONS
后面没有正确填写外部表所属的schema模式名称,PostgreSQL中默认模式名就是public,如果没有新建过别的模式名,就填public就行了,注意:模式这一概念在db之下
Chapter 4 Foreign Data Wrappers 外部数据封装)
4.1 Foreign Data Wrapper(FDW)
- What is Foreign Data?——Firstly, a table on a remote server is called a foreign table in SQL/MED.(运行在远程服务器上的表称之为外部表) PostgreSQL’s Foreign Data Wrappers are that use SQL/MES to manage foreign tables which are similar to local tables.(外部数据封装器会用SQL/MES来管理类似本地表的外部表)
- After installing the necessary extension and making the appropriate settings, you can access the foreign tables on the remote servers.(安装必须的拓展和进行合适的设置后,就可以访问远程服务器的外部关系表了) For example, suppose there are two remote servers, namaly, postgresql and mysql, which have foreign_pg_tbl table and foreign_my_tbl table, respectively. In this exmaple, you can access the foreign tables from the local server by issuing the SELECT queries as shown below.(假设有两个远程服务器,名字分别为postgresql和mysql,分别有两张关系表foreign_pg_tbl和foreign_my_tbl,下例中可以用SELECT查询在本地服务器上访问外部表)
- Moreover, you can execute the join operation with the foreign tables stored in different servers which are similar to the local tables.(此外,和连接本地服务器上的关系表类似,还可以对存储在不同服务器上的外部表进行连接操作)
- Many FDW extensions have been developed and listed in Postgres wiki. However, almost all extensions are not properly maintained except for postgres_fdw, which is officially developed and maintained by the PostgreSQL Global Development Group as an extension to access a remote PostgreSQL server.(目前已经开发了很多外部数据封装器拓展,都列在了Postgreswiki中,除了postgres_fdw几乎所有其他的fdw拓展都没有得到良好维护,postgres_fdw 作为提供给其他服务器的访问PG的拓展程序,是由PG全球开发小组开发并维护的)
- PostgreSQL’s FDW is described in detail in the following sections. Section 4.1.1 presents an overview of the FDW in PostgreSQL. Section 4.1.2 describes how the postgres_fdw extension performs.
4.1.1 Overview
- To use the FDW feature, you need to install the appropriate extension and execute setup commands, such as CREATE FOREIGN TABLE, CREATE SERVER and CREATE USER MAPPING (for details, refer to the official document).(使用FDW之前,需要在该数据簇正确安装次拓展和执行设置命令,例如:
CREATE EXTENSION
,CREATE FOREIGN TABLE
,CREATE SERVER
和CREATE USER MAPPING
) - After providing the appropriate setting, the functions defined in the extension are invoked during query processing to access the foreign tables.(提供了合适的配置之后,定义在拓展里的函数将会在对外部表的查询过程中被调用)
- Briefly describes how FDW performs in PostgreSQL:
- ① The analyzer/analyser creates the query tree of the input SQL.(Analyzer创建输入的SQL的查询树)
- ②The planner (or executor) connects to the remote server.(计划器Planner或者executor连接远程服务器)
- ③ If the use_remote_estimate option is on (the default is off), the planner executes EXPLAIN commands for estimating the cost of each plan path.(如果
use_remote_estimate
选项是开启的话,planner会执行EXPAIN
命令去估算每一条计划路径的成本。) - ④The planner creates the plain text SQL statement from the plan tree which is internally called deparesing.(计划器Planner根据计划师创建纯文本SQL语句,这个文本在内部被称之为desparesing,详见4.1.1.4小节)
- ⑤The executor sends the plain text SQL statement to the remote server and receives the result.(executor将纯文本SQL语句发送给远程服务器,并接受查询的结果)
- The executor then processes the received data if necessary. For example, if the multi-table query is executed, the executor perfoms the join processing of the received data and other tables.(如有需要的话executor会处理接收到的数据,例如,如果是多表查询,executor会执行收到的表数据与其他表的连接操作)
4.1.1.1. Creating a Query Tree
- The analyzer/analyser creates the query tree of the input SQL using the definitions of the foreign tables, which are stored in the
pg_catalog.pg_class
andpg_catalog.pg_foreign_table
catalogs using theCREATE FOREIGN TABLE
orIMPORT FOREIGN SCHEMA
command.(Analyzer用外部表中的定义来创建输入的SQL语句的查询树,生成的查询树用CREATE FOREIGN TABLE
和IMPORT FOREIGN SCHEMA
两个命令被存储这两个目录中)
4.1.1.2 Connecting to the Remote Server
-
To connect to the remote server, the planner (or executor) uses the specific library to connect to the remote database server.(为了连接远程服务器,计划器planner用特定的库连接远程的数据库服务器) For example, to connect to the remote PostgreSQL server,
postgres_fdw
uses thelibpq
. To connect to the mysql server,mysql_fdw
, which is developed by EnterpriseDB, uses thelibmysqlclient
.(举例,要连接远程的pg服务器,postgres_fdw
会用libpq
库;要连接远程的mysql服务器,mysql_fdw
会使用libmysqlclient
库) -
The connection parameters, such as
username
, server’s IP address and port number, are stored in thepg_catalog.pg_user_mapping
andpg_catalog.pg_foreign_server
catalogs using theCREATE USER MAPPING
andCREATE SERVER
commands.(连接参数,例如username
,服务器的IP
地址,端口号等,都用CREATE USER MAPPING
和CREATE SERVER
命令被存储在pg_catalog.pg_user_mapping
和pg_catalog.pg_foreign_server
目录下面)
4.1.1.3 Creating a Plan Tree Using EXPLAIN Commands (Optional)
-
PostgreSQL’s FDW supports the feature to obtain statistics of the foreign tables to estimate the plan tree of a query, which are used by some FDW extensions, such as
postgres_fdw
,mysql_fdw
,tds_fdw
andjdbc2_fdw
.(pg的FDW支持用获取到的外部表的统计信息去估算查询计划树的功能,具体是用一些FDW的拓展实现的,比如postgres_fdw
,mysql_fdw
,tds_fdw
和jdbc2_fdw
) -
If the use_remote_estimate option is set to on using the ALTER SERVER command, the planner queries the cost of plans to the remote server by executing the EXPLAIN command; otherwise, the embedded constant values are used by default.(如果用命令
ALTER SERVER
将use_remote_estimate
选项开启,计划器将用EXPLAIN
命令向远程服务器查询计划的成本,否则,默认情况下会使用嵌入的常数值)
localdb=# ALTER SERVER remote_server_name OPTIONS (use_remote_estimate 'on');
- Although, some extensions use the values of the EXPLAIN command, only
postgres_fdw
can reflect the results of the EXPLAIN commands because the PostgreSQL’s EXPLAIN command returns both the start-up and total costs.(虽然一些拓展会用到EXPLAIN
命令的值,但只有postgres_fdw
可以反映该命令执行后的结果,因为pg的EXPLAIN
命令会将启动成本和总成本的值全部返回) - The results of the EXPLAIN command could not be used by other DBMS fdw extensions for planning. For example, mysql’s EXPLAIN command only returns the estimated number of rows; however, PostgreSQL’s planner needs more information to estimate the cost as described in Chapter 3.(
EXPLAIN
命令的结果不能被用于其他DBMS的FDW拓展的生成计划树阶段。例如,mysql的EXPLAIN
命令只返回表的估计行数,然而,pg的计划器需要更多信息去估算Chapter 3 中提到的那些成本,所以不同DBMS的EXPLAIN
命令不能共用)
4.1.1.4 Deparesting
- To generate the plan tree, the planner creates the plain text SQL statement from the plan tree’s scan paths of the foreign tables. For example, the following picture shows the plan tree of the following SELECT statement.(为了生成计划树,计划器会根据外部表的计划树扫描路径创建纯文本SQL语句,例如:下图展示了
SELECT * FROM tbl_a AS a WHERE a.id < 10
这一外部表查询语句的计划树)
4.1.1.5 Sending SQL Statements and Receiving Result
- After deparsing, the executor sends the deparsed SQL statements to the remote server and receives the result.(上一步完成后,executor会发送一个deparsed SQL语句给远程服务器并等待接收结果)
- The method for sending the SQL statements to the remote server depends on the developer of each extension.(将SQL语句发送给远程服务器的具体方式取决于每一种FDW拓展的开发者) For example,
mysql_fdw
sends the SQL statements without using a transaction. The typical sequence of SQL statements to execute a SELECT query inmysql_fdw
is shown below.mysql_fdw
converts the result to readable data by PostgreSQL.(mysql_fdw
在下图的例子中转化返回结果为pg可读的形式)
- In postgres_fdw, the sequence of SQL commands is complicated. The typical sequence of SQL statements to execute a SELECT query in postgres_fdw is shown below.
- (5-1) Start the remote transaction.(启动远程事务) The default remote transaction isolation level is REPEATABLE READ; if the isolation level of the local transaction is set to SERIALIZABLE, the remote transaction is also set to SERIALIZABLE.(默认远程事务隔离级别是
REPEATABLE READ
,如果本地事务的隔离级别被设置为SERIALIZABLE
,远程事务也会被设置为SERIALIZABLE
.) - (5-2)-(5-4) Declare a cursor. The SQL statement is basically executed as a cursor.(声明一个游标,SQL语句基本上都被当成一个游标来执行)
- (5-5) Execute FETCH commands to obtain the result. By default, 100 rows are fetched by the FETCH command.(执行取指令以获得结果,默认情况下,取指令将取100行指令)
- (5-6) Receive the result from the remote server.(从远程服务器接收结果)
- (5-7) Close the cursor.(关闭游标)
- (5-8) Commit the remote transaction.(提交远程事务)
- (5-1) Start the remote transaction.(启动远程事务) The default remote transaction isolation level is REPEATABLE READ; if the isolation level of the local transaction is set to SERIALIZABLE, the remote transaction is also set to SERIALIZABLE.(默认远程事务隔离级别是
4.1.2 How the Postgres_fdw Extension Performs
postgres_fdw
extension is a special module that is officially maintained by the PostgreSQL Global Development Group and its source code is included in the PostgreSQL source code tree.(postgres_fdw
拓展是由PostgreSQL全球开发小组正式维护一个特殊的模块,pg源码树同时包含了它的源码。)
postgres_fdw
is gradually improved. The following table presents the release notes related to postgres_fdw
from the official document.(postgres_fdw
拓展在不断地改进中,下表展示了官方文件中关于该拓展不同版本的说明)
Version | Description |
---|---|
9.3 | postgres_fdw module is released.(首次发布) |
9.6 | Consider performing sorts on the remote server. Consider performing joins on the remote server. When feasible, perform UPDATE or DELETE entirely on the remote server. Allow the fetch size to be set as a server or table option.(添加排序,连接功能;可行时可对远程表进行删改;允许将提取大小设置为服务器或表选项。) |
10 | Push aggregate functions to the remote server, when possible.(聚集函数) |
11 | Allow to push down aggregates to foreign tables that are partitions. Allow to push UPDATEs and DELETEs using joins to foreign servers.(多表连接的删改) |
12 | Allow ORDER BY sorts and LIMIT clauses to be pushed in more cases.(排序和LIMIT) |
14 | Allow TRUNCATE to operate on foreign tables. Allow INSERT rows in bulk. Add function postgres_fdw_get_connections() to report open foreign server connections.(截断,允许批量插入行,增加了一个函数可以打开的外部服务器连接。) |
Given that the previous section describes how postgres_fdw
processes a single-table query the following subsection how postgres_fdw
processes a multi-table query, sort operation and aggregate functions.(下一小节主要描述postgres_fdw
怎么处理多表查询,排序操作和聚集函数)
This subsection focuses on the SELECT statement; however, postgres_fdw can also process other DML (INSERT, UPDATE, and DELETE) statements as shown below.
4.1.2.1 Multi-Table Query
-
To execute a multi-table query,
postgres_fdw
fetches each foreign table using a single-table SELECT statement and then join them on the local server.(执行多表查询时,postgres_fdw
会对每一个外部表都先执行单表查询,然后将他们与本地服务器进行连接) -
In version 9.5 or earlier, even if the foreign tables are stored in the same remote server,
postgres_fdw
fetches them individually and joins them.(9.5及其更早的版本中,即使外部表也是存在相同的远程服务器中,postgres_fdw
也是单独获取和连接它们) -
In version 9.6 or later,
postgres_fdw
has been improved and can execute the remote join operation on the remote server when the foreign tables are on the same server and the use_remote_estimate option is on.(在9.6及其后来的版本中,当外部表在同一个服务器上且use_remote_estimate
参数状态为on
的时候,postgres_fdw
被改进后能够在远程服务器上执行远程连接操作) -
The execution details are described as follows.(外部表的多表查询操作细节具体如下:分为9.6版本之前和9.6版本及其以后)
-
Version 9.5 or earlier:
-
EXPLAIN SELECT * FROM tbl_a AS a,tbl_b AS b WHERE a.id=b.id AND a.id<200; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=532.31..700.34 rows=10918 width=16) Merge Cond: (a.id = b.id) -> Sort (cost=200.59..202.72 rows=853 width=8) Sort Key: a.id -> Foreign Scan on tbl_a a (cost=100.00..159.06 rows=853 width=8) -> Sort (cost=331.72..338.12 rows=2560 width=8) Sort Key: b.id -> Foreign Scan on tbl_b b (cost=100.00..186.80 rows=2560 width=8) (8 rows)
-
the executor selects the merge join.(executor选择使用了merge连接,两个都是外部表)
-
How the executor fetch the rows??——
-
①Start the remote transaction.(启动远程事务)
-
②Declare the cursor c 1 c_1 c1, the SELECT statement of which is shown below:
SELECT id,data FROM public.tbl_a WHERE id<200;
.(声明一个游标 c 1 c_1 c1 ,也就是一句SELECT
语句) -
③Execute FETCH commands to obtain the result of the cursor 1.(执行
FETCH
指令获得游标 c 1 c_1 c1 的执行结果) -
④Declare the cursor c 2 c_2 c2, whose SELECT statement is shown below:
SELECT id,data FROM public.tbl_b;
.(声明一个游标 c 2 c_2 c2,也就是另一句SELECT语句) -
Note that the WHERE clause of the original double-table query is “
tbl_a.id = tbl_b.id AND tbl_a.id < 200
”; therefore, a WHERE clause “tbl_b.id < 200
” can be logically added to theSELECT
statement as shown previously. However,postgres_fdw
cannot perform this inference; therefore, the executor has to execute theSELECT
statement that does not contain anyWHERE
clauses and has to fetch all rows of the foreign tabletbl_b
.(按照最原始的SELECT
语句中的语义,其实WHERE tbl_b<200
可以被逻辑添加到这里游标 c 2 c_2 c2 的SELECT
语句中来,但是postgres_fdw
不能执行这个判断,因此executor声明的游标 c 2 c_2 c2 中没有包含任何WHERE
限制子句,而是会将外部表tbl_b
中的所有行全部读取一遍) -
This process is inefficient because unnecessary rows must be read from the remote server via the network. Furthermore, the received rows must be sorted to execute the merge join.(这个过程很低效,因为很多没必要的行也必须通过网络从远程服务器被读取过来,再者,这些多余的行也会被executor执行合并连接的时候被排序)
-
⑤Execute
FETCH
commands to obtain the result of the cursor 2.(执行取指令获得以上游标的执行结果) -
⑥Close the cursor c 1 c_1 c1.(关闭游标 c 1 c_1 c1)
-
⑦Close the cursor c 2 c_2 c2.(关闭游标 c 2 c_2 c2)
-
⑧Commit the transaction.(提交事务)
-
-
Summarize: After receiving the rows, the executor sorts both received rows of
tbl_a
andtbl_b
, and then executes a merge join operation with the sorted rows.(接收到外部表的行之后,executor将对接收到的两个表都进行排序,然后执行一次merge连接,将排序完成的两表进行连接)
-
-
Version 9.6 or later:
-
If the use_remote_estimate option is on (the default is off),
postgres_fdw
sends severalEXPLAIN
commands to obtain the costs of all plans related to the foreign tables.(如果选项use_remote_estimate
状态是on
,postgres_fdw
将会发送几个EXPLAIN
命令以获取外部表相关的所有计划的成本信息)To send the
EXPLAIN
commands,postgres_fdw
sends both theEXPLAIN
command of each single-table query and theEXPLAIN
commands of theSELECT
statements to execute remote join operations.(发送EXPLAIN
命令时,postgres_fdw
会把单表查询的EXPLAIN
和执行远程连接操作的SELECT
语句的EXPLANE
都一并发送) In this example, the following sevenEXPLAIN
commands are sent to the remote server to obtain the estimated costs of eachSELECT
statement; the planner then selects the cheapest plan.(在本例中下列七个EXPLAIN
命令都被发送给了远程服务器以获取每一个SELECT的估算成本) -
(1) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((id < 200)) (2) EXPLAIN SELECT id, data FROM public.tbl_b (3) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST (4) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((((SELECT null::integer)::integer) = id)) AND ((id < 200)) (5) EXPLAIN SELECT id, data FROM public.tbl_b ORDER BY id ASC NULLS LAST (6) EXPLAIN SELECT id, data FROM public.tbl_b WHERE ((((SELECT null::integer)::integer) = id)) (7) EXPLAIN SELECT r1.id, r1.data, r2.id, r2.data FROM (public.tbl_a r1 INNER JOIN public.tbl_b r2 ON (((r1.id = r2.id)) AND ((r1.id < 200)))) localdb=# EXPLAIN SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id AND a.id < 200; QUERY PLAN ----------------------------------------------------------- Foreign Scan (cost=134.35..244.45 rows=80 width=16) Relations: (public.tbl_a a) INNER JOIN (public.tbl_b b) (2 rows)
-
The result shows that the planner selects the inner join query that is processed on the remote server, which is very efficient.(执行结果显示计划器选择了在远程服务器上处理内部链接循环,非常有效)
-
How the postgrs_fdw performed??——
-
①Start the remote transaction.(启动远程事务)
-
②Execute the EXPLAIN commands for estimating the cost of each plan path.(执行EXPLAIN命令去预测每一条计划路径的成本)
-
In this example, seven EXPLAIN commands are executed. Then, the planner selects the cheapest cost of the SELECT queries using the results of the executed EXPLAIN commands.(planner计划器会选择最划算的一句SELECT语句最后执行)
-
③Declare the cursor c 1 c_1 c1, whose SELECT statement is shown below:
SELECT r1.id, r1.data, r2.id, r2.data FROM (public.tbl_a r1 INNER JOIN public.tbl_b r2 ON (((r1.id = r2.id)) AND ((r1.id < 200))))
. (声明游标 c 1 c_1 c1) -
④Receive the result from the remote server.(接受远程服务器执行游标的结果)
-
⑤Close the cursor c 1 c_1 c1.(结束游标 c 1 c_1 c1)
-
⑥Commit the transaction.(提交事务)
-
-
Note that if the use_remote_estimate option is off (by default), a remote-join query is rarely selected because the costs are estimated using a very large embedded value.()
-
-
4.1.2.2 Sort Operations
- In version 9.5 or earlier, the sort operation, such as ORDER BY, is processed on the local server, i.e. the local server fetches all the target rows from the remote server prior to the sort operation. Let us explore how a simple query that includes an ORDER BY clause is processed using the EXPLAIN command.(在9.5以及更早的版本中,排序ORDER BY是运行在本地服务器上的,本地服务器执行排序操作之前会从远程服务器取所有目标外部表的行,下面将用EXPLAIN命令介绍包含ORDER BY的简单查询语句是怎样运行的)
localdb=# EXPLAIN SELECT * FROM tbl_a AS a WHERE a.id < 200 ORDER BY a.id;
QUERY PLAN
-----------------------------------------------------------------------
Sort (cost=200.59..202.72 rows=853 width=8)
Sort Key: id
-> Foreign Scan on tbl_a a (cost=100.00..159.06 rows=853 width=8)
(3 rows)
-
Line 6: The executor sends the following query to the remote server, and then fetches the query result.(executor发送以下查询语句给远程服务器,然后取回查询结果)
-
SELECT id, data FROM public.tbl_a WHERE ((id < 200))
-
Line 4: The executor sorts the fetched rows of
tbl_a
on the local server.(executor对外部表tbl_a
取回的行在本地服务器上进行排序) -
In version 9.6 or later,
postgres_fdw
can execute theSELECT
statements with anORDER BY
clause on the remote server if possible.(在9.6版本及其后来的版本中,如果可以的话postgres_fdw
直接在远程服务器上执行带ORDER BY
子句的查询)
localdb=# EXPLAIN SELECT * FROM tbl_a AS a WHERE a.id < 200 ORDER BY a.id;
QUERY PLAN
-----------------------------------------------------------------
Foreign Scan on tbl_a a (cost=100.00..167.46 rows=853 width=8)
(1 row)
-
Line 4: The executor sends the following query that has an ORDER BY clause to the remote server and then fetches the query result, which is already sorted.(executor发送以下带有
ORDER BY
子句的查询给远程服务器,然后取查询结果) -
SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST
-
The workload of the local server has been reduced by this improvement.(这中改进可以给本地服务器减轻工作负载)
4.1.2.3 Aggregate Functions
- In version 9.6 or earlier, similar to the sort operation mentioned in the previous subsection, the aggregate functions such as
AVG()
andcont()
are processed on the local server as the following steps.(在9.6以及更早的版本中,类似于前面小节中提到的的排序操作,按以下步骤处理AVG()
和cont()
之类的聚合函数,也是在本地服务器上)
localdb=# EXPLAIN SELECT AVG(data) FROM tbl_a AS a WHERE a.id < 200;
QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=168.50..168.51 rows=1 width=4)
-> Foreign Scan on tbl_a a (cost=100.00..166.06 rows=975 width=4)
(2 rows)
-
Line 5: The executor sends the following query to the remote server, and then fetches the query result.(executor发送以下查询语句给远程服务器,然后取查询结果)
-
SELECT id, data FROM public.tbl_a WHERE ((id < 200))
-
Line 4: The executor calculates the average of the fetched rows of
tbl_a
on the local server.(executor在本地服务器上计算上一步取回的tbl_a
中的行的平均值) -
This process is costly because sending a large number of rows consumes heavy network traffic and takes a long time.(这个过程的成本很高,因为发送大量的行会消耗大量的网络流量并花费很长时间)
-
In version 10 or later,
postgres_fdw
executes the SELECT statement with the aggregate function on the remote server if possible.(在10以及后面的版本,postgres_fdw
用SELECT
查询语句直接在远程服务器上执行聚集函数)
localdb=# EXPLAIN SELECT AVG(data) FROM tbl_a AS a WHERE a.id < 200;
QUERY PLAN
-----------------------------------------------------
Foreign Scan (cost=102.44..149.03 rows=1 width=32)
Relations: Aggregate on (public.tbl_a a)
(2 rows)
-
Line 4: The executor sends the following query that contains an
AVG()
function to the remote server, and then fetches the query result.(executor发送以下包含聚集函数AVG()
的查询语句给远程服务器,然后取回查询结果) -
SELECT avg(data) FROM public.tbl_a WHERE ((id < 200))
-
This process is obviously efficient because the remote server calculates the average and sends only one row as the result.(这个流程明显要更高效,因为远程服务器负责计算平均值并且只需要返回一行结果即可)