PostpreSQL内核学习-Chapter4 Foreign Data Wrapper(FDW)

FDW连接

preparations

  • 两个在不同主机(满足TCP/IP连接是分属于不同IP地址下)
  • (如果是安装PG源码)进入源码目录下面的/contrib/postgres_fdw/,然后用make & make install编译和安装
[root@localhost /]# cd 源码存放的目录/postgresql-15.3/contrib/postgres_fdw/
[root@localhost postgres_fdw]#  make & make install
  • (如果是yum安装)

Connection

[userb@localhost ~]$ psql -d my_second_pgdb
psql (15.3)
Type "help" for help.

# 1.创建拓展
my_second_pgdb=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

my_first_pgdb=# \x
Expanded display is on.

my_first_pgdb=# select * from pg_foreign_data_wrapper;
-[ RECORD 1 ]+-------------
oid          | 16392
fdwname      | postgres_fdw
fdwowner     | 10
fdwhandler   | 16390
fdwvalidator | 16391
fdwacl       | 
fdwoptions   | 

my_first_pgdb=# select * from pg_extension;
-[ RECORD 1 ]--+-------------
oid            | 13673
extname        | plpgsql
extowner       | 10
extnamespace   | 11
extrelocatable | f
extversion     | 1.0
extconfig      | 
extcondition   | 
-[ RECORD 2 ]--+-------------
oid            | 16389
extname        | postgres_fdw
extowner       | 10
extnamespace   | 2200
extrelocatable | t
extversion     | 1.1
extconfig      | 
extcondition   | 



# 2. 创建目标库连接串,包含欲访问的外部表所在主机的IP地址,欲访问的数据库进程的端口号,数据库名称
# 关系型数据库常用默认端口号
# 1.Oracle数据库默认端口号为,1521; 
# 2.MySQL数据库默认端口号为,3306; 
# 3.SQLServer数据库默认端口号为,1433;
# 4.PostgreSQL数据库默认端口号为,5432;

my_second_pgdb=# CREATE SERVER server_a_my_first_pgdb FOREIGN DATA WRAPPER postgres_fdw OPTIONS(host 'xxx.xxx.xxx.xxx', port '5432', dbname 'my_first_pgdb');
CREATE SERVER
my_second_pgdb=# select * from pg_foreign_server;
-[ RECORD 1 ]----------------------------------------------------
oid        | 16396
srvname    | server_a_my_first_pgdb
srvowner   | 10
srvfdw     | 16392
srvtype    | 
srvversion | 
srvacl     | 
srvoptions | {host=xxx.xxx.xxx.xxx,port=5432,dbname=my_first_pgdb}




# 3. 用户映射:欲访问的外部表所属用户名、密码(OPTIONS中),将其映射到当前库的某一个用户(FOR关键字后面)上来
# 这里假设当前库的被映射的用户名为usera,也就是欲访问的外部表所属的用户;映射的用户名为userb,也就是当前库的其中一个用户。
# 查看数据库中的用户和(已MS5加密后的)密码:SELECT rolname,rolpassword FROM pg_authid;
# 修改用户密码:ALTER USER usera WITH PASSWORD 'useraaa';
# 这里被映射的用户一定要有密码,不然会报错

my_second_pgdb=# CREATE USER MAPPING FOR userb SERVER server_a_my_first_pgdb OPTIONS (user 'usera', password 'useraaa');
CREATE USER MAPPING
my_secondt_pgdb=# select * from pg_user_mappings;
-[ RECORD 1 ]----------------------------
umid      | 16401
srvid     | 16396
srvname   | server_a_my_first_pgdb
umuser    | 10
usename   | userb
umoptions | {user=usera,password=aaa}




# 4. 创建外部表(相当于外部表在本地数据库的映射),欲访问外部表或外部视图的字段(列)信息,字段可以少于目标表或视图,取自己需要的
# 这里有一个schema(模式)的概念,默认所有的表都放在名为public的模式下,如果指定了模式的就放在指定模式下
# 字段可以选取内容的意思是,例如我的外部表本来有五列字段,但可以仅访问其中两列,将欲访问的写在下方即可

my_second_pgdb=# CREATE FOREIGN TABLE fdw_a_weather(city character varying(80), temp_hi integer) SERVER server_a_my_first_pgdb OPTIONS(schema_name 'public', table_name 'weather');
CREATE FOREIGN TABLE
my_second_pgdb=# select * from pg_foreign_table;
-[ RECORD 1 ]--------------------------------------
ftrelid   | 16402
ftserver  | 16396
ftoptions | {schema_name=public,table_name=weather}

Access Foreign Data

  • 注意查询时,外部表所在主机的数据库服务器也要开启,才能成功查询到。
my_second_pgdb=# select * from fdw_a_weather;
       city        | temp_hi 
-------------------+---------
 San Francisco     |      50
 Changsha          |      35
 Hainan Ledong     |      31
 Hainan Sanya      |      31
 Hainan Baoting    |      31
 Hainan Qiongzhong |      31
 Hainan Dongfang   |      31
 Hainan Lingshui   |      31
 Changsha          |      28
(9 rows)

连接FDW中可能遇到的问题

  1. failed: 没有到主机的路由
my_first_pgdb=# select * from fdw_a_weather;
2023-07-05 11:10:18.061 CST [3225] ERROR:  could not connect to server "server_a_my_first_pgdb"
2023-07-05 11:10:18.061 CST [3225] DETAIL:  connection to server at "xxx.xxx.xxx.xxx", port 5432 failed: 没有到主机的路由
		Is the server running on that host and accepting TCP/IP connections?
2023-07-05 11:10:18.061 CST [3225] STATEMENT:  select * from fdw_a_weather;
ERROR:  could not connect to server "server_a_my_first_pgdb"
DETAIL:  connection to server at "xxx.xxx.xxx.xxx", port 5432 failed: 没有到主机的路由
	Is the server running on that host and accepting TCP/IP connections?
  • 解决:可能防火墙没有放行数据库端口,这里以5432为例
# 查看已放行端口号
[root@localhost ~]# firewall-cmd --list-port
# 添加放行端口号5432
[root@localhost ~]# firewall-cmd --zone=public --add-port=5432/tcp --permanent
success
# 重新加载防火墙
[root@localhost ~]# firewall-cmd --reload
success
# 再次查看,添加成功
[root@localhost ~]# firewall-cmd --list-port
5432/tcp
  1. failed: 拒绝连接
my_first_pgdb=# select * from fdw_a_weather;
2023-07-05 11:13:52.810 CST [3225] ERROR:  could not connect to server "server_a_my_first_pgdb"
2023-07-05 11:13:52.810 CST [3225] DETAIL:  connection to server at "xxx.xxx.xxx.xxx", port 5432 failed: 拒绝连接
		Is the server running on that host and accepting TCP/IP connections?
2023-07-05 11:13:52.810 CST [3225] STATEMENT:  select * from fdw_a_weather;
ERROR:  could not connect to server "server_a_my_first_pgdb"
DETAIL:  connection to server at "xxx.xxx.xxx.xxx", port 5432 failed: 拒绝连接
	Is the server running on that host and accepting TCP/IP connections?
  • 解决方法:进入主机a外部表所属数据簇(Database Cluster)目录下,修改postgresql.conf文件,添加listen_addresses = '*';修改pg_hba.conf,添加host all all 0.0.0.0/0 trust,修改完记得重启数据库的服务器(如果是别的数据库,也是找到其配置文件的监听地址项进行修改)
[usera@localhost data]$ vim postgresql.conf 
[usera@localhost data]$ vim pg_hba.conf 
[usera@localhost ~]$ pg_ctl -D /pgdata/15.3/poc/data/ restart
  • 修改完如下图:

    在这里插入图片描述
  1. ERROR: relation “my_first_pgdb.weather” does not exist
my_first_pgdb=# select * from fdw_a_weather;
2023-07-05 11:42:22.321 CST [3225] ERROR:  relation "my_first_pgdb.weather" does not exist
2023-07-05 11:42:22.321 CST [3225] CONTEXT:  remote SQL command: SELECT city, temp_hi FROM my_first_pgdb.weather
2023-07-05 11:42:22.321 CST [3225] STATEMENT:  select * from fdw_a_weather;
ERROR:  relation "my_first_pgdb.weather" does not exist
CONTEXT:  remote SQL command: SELECT city, temp_hi FROM my_first_pgdb.weather

  • 错误原因:CREATE FOREIGN TABLE时,OPTIONS后面没有正确填写外部表所属的schema模式名称,PostgreSQL中默认模式名就是public,如果没有新建过别的模式名,就填public就行了,注意:模式这一概念在db之下

Chapter 4 Foreign Data Wrappers 外部数据封装)

4.1 Foreign Data Wrapper(FDW)

  • What is Foreign Data?——Firstly, a table on a remote server is called a foreign table in SQL/MED.(运行在远程服务器上的表称之为外部表) PostgreSQL’s Foreign Data Wrappers are that use SQL/MES to manage foreign tables which are similar to local tables.(外部数据封装器会用SQL/MES来管理类似本地表的外部表)
  • After installing the necessary extension and making the appropriate settings, you can access the foreign tables on the remote servers.(安装必须的拓展和进行合适的设置后,就可以访问远程服务器的外部关系表了) For example, suppose there are two remote servers, namaly, postgresql and mysql, which have foreign_pg_tbl table and foreign_my_tbl table, respectively. In this exmaple, you can access the foreign tables from the local server by issuing the SELECT queries as shown below.(假设有两个远程服务器,名字分别为postgresql和mysql,分别有两张关系表foreign_pg_tblforeign_my_tbl,下例中可以用SELECT查询在本地服务器上访问外部表)
  • Moreover, you can execute the join operation with the foreign tables stored in different servers which are similar to the local tables.(此外,和连接本地服务器上的关系表类似,还可以对存储在不同服务器上的外部表进行连接操作)
  • Many FDW extensions have been developed and listed in Postgres wiki. However, almost all extensions are not properly maintained except for postgres_fdw, which is officially developed and maintained by the PostgreSQL Global Development Group as an extension to access a remote PostgreSQL server.(目前已经开发了很多外部数据封装器拓展,都列在了Postgreswiki中,除了postgres_fdw几乎所有其他的fdw拓展都没有得到良好维护,postgres_fdw 作为提供给其他服务器的访问PG的拓展程序,是由PG全球开发小组开发并维护的)
  • PostgreSQL’s FDW is described in detail in the following sections. Section 4.1.1 presents an overview of the FDW in PostgreSQL. Section 4.1.2 describes how the postgres_fdw extension performs.
4.1.1 Overview
  • To use the FDW feature, you need to install the appropriate extension and execute setup commands, such as CREATE FOREIGN TABLE, CREATE SERVER and CREATE USER MAPPING (for details, refer to the official document).(使用FDW之前,需要在该数据簇正确安装次拓展和执行设置命令,例如:CREATE EXTENSION, CREATE FOREIGN TABLE, CREATE SERVERCREATE USER MAPPING
  • After providing the appropriate setting, the functions defined in the extension are invoked during query processing to access the foreign tables.(提供了合适的配置之后,定义在拓展里的函数将会在对外部表的查询过程中被调用)
  • Briefly describes how FDW performs in PostgreSQL:
    • ① The analyzer/analyser creates the query tree of the input SQL.(Analyzer创建输入的SQL的查询树)
    • ②The planner (or executor) connects to the remote server.(计划器Planner或者executor连接远程服务器)
    • ③ If the use_remote_estimate option is on (the default is off), the planner executes EXPLAIN commands for estimating the cost of each plan path.(如果use_remote_estimate选项是开启的话,planner会执行EXPAIN命令去估算每一条计划路径的成本。)
    • ④The planner creates the plain text SQL statement from the plan tree which is internally called deparesing.(计划器Planner根据计划师创建纯文本SQL语句,这个文本在内部被称之为desparesing,详见4.1.1.4小节)
    • ⑤The executor sends the plain text SQL statement to the remote server and receives the result.(executor将纯文本SQL语句发送给远程服务器,并接受查询的结果)

  • The executor then processes the received data if necessary. For example, if the multi-table query is executed, the executor perfoms the join processing of the received data and other tables.(如有需要的话executor会处理接收到的数据,例如,如果是多表查询,executor会执行收到的表数据与其他表的连接操作)
4.1.1.1. Creating a Query Tree
  • The analyzer/analyser creates the query tree of the input SQL using the definitions of the foreign tables, which are stored in the pg_catalog.pg_class and pg_catalog.pg_foreign_table catalogs using the CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA command.(Analyzer用外部表中的定义来创建输入的SQL语句的查询树,生成的查询树用CREATE FOREIGN TABLEIMPORT FOREIGN SCHEMA两个命令被存储这两个目录中)
4.1.1.2 Connecting to the Remote Server
  • To connect to the remote server, the planner (or executor) uses the specific library to connect to the remote database server.(为了连接远程服务器,计划器planner用特定的库连接远程的数据库服务器) For example, to connect to the remote PostgreSQL server, postgres_fdw uses the libpq. To connect to the mysql server, mysql_fdw, which is developed by EnterpriseDB, uses the libmysqlclient.(举例,要连接远程的pg服务器,postgres_fdw会用libpq库;要连接远程的mysql服务器,mysql_fdw会使用libmysqlclient库)

  • The connection parameters, such as username, server’s IP address and port number, are stored in the pg_catalog.pg_user_mapping and pg_catalog.pg_foreign_server catalogs using the CREATE USER MAPPING and CREATE SERVER commands.(连接参数,例如username,服务器的IP地址,端口号等,都用CREATE USER MAPPINGCREATE SERVER命令被存储在pg_catalog.pg_user_mappingpg_catalog.pg_foreign_server目录下面)

4.1.1.3 Creating a Plan Tree Using EXPLAIN Commands (Optional)
  • PostgreSQL’s FDW supports the feature to obtain statistics of the foreign tables to estimate the plan tree of a query, which are used by some FDW extensions, such as postgres_fdw, mysql_fdw, tds_fdw and jdbc2_fdw.(pg的FDW支持用获取到的外部表的统计信息去估算查询计划树的功能,具体是用一些FDW的拓展实现的,比如postgres_fdwmysql_fdwtds_fdwjdbc2_fdw

  • If the use_remote_estimate option is set to on using the ALTER SERVER command, the planner queries the cost of plans to the remote server by executing the EXPLAIN command; otherwise, the embedded constant values are used by default.(如果用命令ALTER SERVERuse_remote_estimate选项开启,计划器将用EXPLAIN命令向远程服务器查询计划的成本,否则,默认情况下会使用嵌入的常数值)

localdb=# ALTER SERVER remote_server_name OPTIONS (use_remote_estimate 'on');
  • Although, some extensions use the values of the EXPLAIN command, only postgres_fdw can reflect the results of the EXPLAIN commands because the PostgreSQL’s EXPLAIN command returns both the start-up and total costs.(虽然一些拓展会用到EXPLAIN命令的值,但只有postgres_fdw可以反映该命令执行后的结果,因为pg的EXPLAIN命令会将启动成本和总成本的值全部返回)
  • The results of the EXPLAIN command could not be used by other DBMS fdw extensions for planning. For example, mysql’s EXPLAIN command only returns the estimated number of rows; however, PostgreSQL’s planner needs more information to estimate the cost as described in Chapter 3.(EXPLAIN命令的结果不能被用于其他DBMS的FDW拓展的生成计划树阶段。例如,mysql的EXPLAIN命令只返回表的估计行数,然而,pg的计划器需要更多信息去估算Chapter 3 中提到的那些成本,所以不同DBMS的EXPLAIN命令不能共用)
4.1.1.4 Deparesting
  • To generate the plan tree, the planner creates the plain text SQL statement from the plan tree’s scan paths of the foreign tables. For example, the following picture shows the plan tree of the following SELECT statement.(为了生成计划树,计划器会根据外部表的计划树扫描路径创建纯文本SQL语句,例如:下图展示了SELECT * FROM tbl_a AS a WHERE a.id < 10这一外部表查询语句的计划树)

4.1.1.5 Sending SQL Statements and Receiving Result
  • After deparsing, the executor sends the deparsed SQL statements to the remote server and receives the result.(上一步完成后,executor会发送一个deparsed SQL语句给远程服务器并等待接收结果)
  • The method for sending the SQL statements to the remote server depends on the developer of each extension.(将SQL语句发送给远程服务器的具体方式取决于每一种FDW拓展的开发者) For example, mysql_fdw sends the SQL statements without using a transaction. The typical sequence of SQL statements to execute a SELECT query in mysql_fdw is shown below. mysql_fdw converts the result to readable data by PostgreSQL.(mysql_fdw在下图的例子中转化返回结果为pg可读的形式)

  • In postgres_fdw, the sequence of SQL commands is complicated. The typical sequence of SQL statements to execute a SELECT query in postgres_fdw is shown below.
    • (5-1) Start the remote transaction.(启动远程事务) The default remote transaction isolation level is REPEATABLE READ; if the isolation level of the local transaction is set to SERIALIZABLE, the remote transaction is also set to SERIALIZABLE.(默认远程事务隔离级别是REPEATABLE READ,如果本地事务的隔离级别被设置为SERIALIZABLE,远程事务也会被设置为SERIALIZABLE.)
    • (5-2)-(5-4) Declare a cursor. The SQL statement is basically executed as a cursor.(声明一个游标,SQL语句基本上都被当成一个游标来执行)
    • (5-5) Execute FETCH commands to obtain the result. By default, 100 rows are fetched by the FETCH command.(执行取指令以获得结果,默认情况下,取指令将取100行指令)
    • (5-6) Receive the result from the remote server.(从远程服务器接收结果)
    • (5-7) Close the cursor.(关闭游标)
    • (5-8) Commit the remote transaction.(提交远程事务)

4.1.2 How the Postgres_fdw Extension Performs

postgres_fdw extension is a special module that is officially maintained by the PostgreSQL Global Development Group and its source code is included in the PostgreSQL source code tree.(postgres_fdw拓展是由PostgreSQL全球开发小组正式维护一个特殊的模块,pg源码树同时包含了它的源码。)

postgres_fdw is gradually improved. The following table presents the release notes related to postgres_fdw from the official document.(postgres_fdw拓展在不断地改进中,下表展示了官方文件中关于该拓展不同版本的说明)

VersionDescription
9.3postgres_fdw module is released.(首次发布)
9.6Consider performing sorts on the remote server. Consider performing joins on the remote server. When feasible, perform UPDATE or DELETE entirely on the remote server. Allow the fetch size to be set as a server or table option.(添加排序,连接功能;可行时可对远程表进行删改;允许将提取大小设置为服务器或表选项。)
10Push aggregate functions to the remote server, when possible.(聚集函数)
11Allow to push down aggregates to foreign tables that are partitions. Allow to push UPDATEs and DELETEs using joins to foreign servers.(多表连接的删改)
12Allow ORDER BY sorts and LIMIT clauses to be pushed in more cases.(排序和LIMIT)
14Allow TRUNCATE to operate on foreign tables. Allow INSERT rows in bulk. Add function postgres_fdw_get_connections() to report open foreign server connections.(截断,允许批量插入行,增加了一个函数可以打开的外部服务器连接。)

Given that the previous section describes how postgres_fdw processes a single-table query the following subsection how postgres_fdw processes a multi-table query, sort operation and aggregate functions.(下一小节主要描述postgres_fdw怎么处理多表查询,排序操作和聚集函数)

This subsection focuses on the SELECT statement; however, postgres_fdw can also process other DML (INSERT, UPDATE, and DELETE) statements as shown below.

4.1.2.1 Multi-Table Query
  • To execute a multi-table query, postgres_fdw fetches each foreign table using a single-table SELECT statement and then join them on the local server.(执行多表查询时,postgres_fdw会对每一个外部表都先执行单表查询,然后将他们与本地服务器进行连接)

  • In version 9.5 or earlier, even if the foreign tables are stored in the same remote server, postgres_fdw fetches them individually and joins them.(9.5及其更早的版本中,即使外部表也是存在相同的远程服务器中,postgres_fdw也是单独获取和连接它们)

  • In version 9.6 or later, postgres_fdw has been improved and can execute the remote join operation on the remote server when the foreign tables are on the same server and the use_remote_estimate option is on.(在9.6及其后来的版本中,当外部表在同一个服务器上且use_remote_estimate参数状态为on的时候,postgres_fdw被改进后能够在远程服务器上执行远程连接操作)

  • The execution details are described as follows.(外部表的多表查询操作细节具体如下:分为9.6版本之前和9.6版本及其以后)

    • Version 9.5 or earlier:

      • EXPLAIN SELECT * FROM tbl_a AS a,tbl_b AS b WHERE a.id=b.id AND a.id<200;
                                          QUERY PLAN                                  
        ------------------------------------------------------------------------------
         Merge Join  (cost=532.31..700.34 rows=10918 width=16)
           Merge Cond: (a.id = b.id)
           ->  Sort  (cost=200.59..202.72 rows=853 width=8)
                 Sort Key: a.id
                 ->  Foreign Scan on tbl_a a  (cost=100.00..159.06 rows=853 width=8)
           ->  Sort  (cost=331.72..338.12 rows=2560 width=8)
                 Sort Key: b.id
                 ->  Foreign Scan on tbl_b b  (cost=100.00..186.80 rows=2560 width=8)
        (8 rows)
        
      • the executor selects the merge join.(executor选择使用了merge连接,两个都是外部表)

      • How the executor fetch the rows??——

        • ①Start the remote transaction.(启动远程事务)

        • ②Declare the cursor c 1 c_1 c1, the SELECT statement of which is shown below: SELECT id,data FROM public.tbl_a WHERE id<200;.(声明一个游标 c 1 c_1 c1 ,也就是一句SELECT语句)

        • ③Execute FETCH commands to obtain the result of the cursor 1.(执行FETCH指令获得游标 c 1 c_1 c1 的执行结果)

        • ④Declare the cursor c 2 c_2 c2, whose SELECT statement is shown below: SELECT id,data FROM public.tbl_b;.(声明一个游标 c 2 c_2 c2​,也就是另一句SELECT语句)

        • Note that the WHERE clause of the original double-table query is “tbl_a.id = tbl_b.id AND tbl_a.id < 200”; therefore, a WHERE clause “tbl_b.id < 200” can be logically added to the SELECT statement as shown previously. However, postgres_fdw cannot perform this inference; therefore, the executor has to execute the SELECT statement that does not contain any WHERE clauses and has to fetch all rows of the foreign table tbl_b.(按照最原始的SELECT语句中的语义,其实WHERE tbl_b<200可以被逻辑添加到这里游标 c 2 c_2 c2SELECT语句中来,但是postgres_fdw不能执行这个判断,因此executor声明的游标 c 2 c_2 c2 中没有包含任何WHERE限制子句,而是会将外部表tbl_b中的所有行全部读取一遍)

        • This process is inefficient because unnecessary rows must be read from the remote server via the network. Furthermore, the received rows must be sorted to execute the merge join.(这个过程很低效,因为很多没必要的行也必须通过网络从远程服务器被读取过来,再者,这些多余的行也会被executor执行合并连接的时候被排序)

        • ⑤Execute FETCH commands to obtain the result of the cursor 2.(执行取指令获得以上游标的执行结果)

        • ⑥Close the cursor c 1 c_1 c1.(关闭游标 c 1 c_1 c1

        • ⑦Close the cursor c 2 c_2 c2.(关闭游标 c 2 c_2 c2

        • ⑧Commit the transaction.(提交事务)

      • Summarize: After receiving the rows, the executor sorts both received rows of tbl_a and tbl_b, and then executes a merge join operation with the sorted rows.(接收到外部表的行之后,executor将对接收到的两个表都进行排序,然后执行一次merge连接,将排序完成的两表进行连接)

    • Version 9.6 or later:

      • If the use_remote_estimate option is on (the default is off), postgres_fdw sends several EXPLAIN commands to obtain the costs of all plans related to the foreign tables.(如果选项use_remote_estimate状态是onpostgres_fdw将会发送几个EXPLAIN命令以获取外部表相关的所有计划的成本信息)

        To send the EXPLAIN commands, postgres_fdw sends both the EXPLAIN command of each single-table query and the EXPLAIN commands of the SELECT statements to execute remote join operations.(发送EXPLAIN命令时,postgres_fdw会把单表查询的EXPLAIN和执行远程连接操作的SELECT语句的EXPLANE都一并发送) In this example, the following seven EXPLAIN commands are sent to the remote server to obtain the estimated costs of each SELECT statement; the planner then selects the cheapest plan.(在本例中下列七个EXPLAIN命令都被发送给了远程服务器以获取每一个SELECT的估算成本)

      • (1) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((id < 200))
        (2) EXPLAIN SELECT id, data FROM public.tbl_b
        (3) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST
        (4) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((((SELECT null::integer)::integer) = id)) AND ((id < 200))
        (5) EXPLAIN SELECT id, data FROM public.tbl_b ORDER BY id ASC NULLS LAST
        (6) EXPLAIN SELECT id, data FROM public.tbl_b WHERE ((((SELECT null::integer)::integer) = id))
        (7) EXPLAIN SELECT r1.id, r1.data, r2.id, r2.data FROM (public.tbl_a r1 INNER JOIN public.tbl_b r2 ON (((r1.id = r2.id)) AND ((r1.id < 200))))
        
        
        localdb=# EXPLAIN SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id AND a.id < 200;
                                QUERY PLAN                         
        -----------------------------------------------------------
         Foreign Scan  (cost=134.35..244.45 rows=80 width=16)
           Relations: (public.tbl_a a) INNER JOIN (public.tbl_b b)
        (2 rows)
        
      • The result shows that the planner selects the inner join query that is processed on the remote server, which is very efficient.(执行结果显示计划器选择了在远程服务器上处理内部链接循环,非常有效)

      • How the postgrs_fdw performed??——

        • ①Start the remote transaction.(启动远程事务)

        • ②Execute the EXPLAIN commands for estimating the cost of each plan path.(执行EXPLAIN命令去预测每一条计划路径的成本)

        • In this example, seven EXPLAIN commands are executed. Then, the planner selects the cheapest cost of the SELECT queries using the results of the executed EXPLAIN commands.(planner计划器会选择最划算的一句SELECT语句最后执行)

        • ③Declare the cursor c 1 c_1 c1, whose SELECT statement is shown below: SELECT r1.id, r1.data, r2.id, r2.data FROM (public.tbl_a r1 INNER JOIN public.tbl_b r2 ON (((r1.id = r2.id)) AND ((r1.id < 200)))). (声明游标 c 1 c_1 c1

        • ④Receive the result from the remote server.(接受远程服务器执行游标的结果)

        • ⑤Close the cursor c 1 c_1 c1.(结束游标 c 1 c_1 c1

        • ⑥Commit the transaction.(提交事务)

      • Note that if the use_remote_estimate option is off (by default), a remote-join query is rarely selected because the costs are estimated using a very large embedded value.()

4.1.2.2 Sort Operations
  • In version 9.5 or earlier, the sort operation, such as ORDER BY, is processed on the local server, i.e. the local server fetches all the target rows from the remote server prior to the sort operation. Let us explore how a simple query that includes an ORDER BY clause is processed using the EXPLAIN command.(在9.5以及更早的版本中,排序ORDER BY是运行在本地服务器上的,本地服务器执行排序操作之前会从远程服务器取所有目标外部表的行,下面将用EXPLAIN命令介绍包含ORDER BY的简单查询语句是怎样运行的)
localdb=# EXPLAIN SELECT * FROM tbl_a AS a WHERE a.id < 200 ORDER BY a.id;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Sort  (cost=200.59..202.72 rows=853 width=8)
   Sort Key: id
   ->  Foreign Scan on tbl_a a  (cost=100.00..159.06 rows=853 width=8)
(3 rows)
  • Line 6: The executor sends the following query to the remote server, and then fetches the query result.(executor发送以下查询语句给远程服务器,然后取回查询结果)

  • SELECT id, data FROM public.tbl_a WHERE ((id < 200))
    
  • Line 4: The executor sorts the fetched rows of tbl_a on the local server.(executor对外部表tbl_a取回的行在本地服务器上进行排序)

  • In version 9.6 or later, postgres_fdw can execute the SELECT statements with an ORDER BY clause on the remote server if possible.(在9.6版本及其后来的版本中,如果可以的话postgres_fdw直接在远程服务器上执行带ORDER BY子句的查询)

localdb=# EXPLAIN SELECT * FROM tbl_a AS a WHERE a.id < 200 ORDER BY a.id;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Foreign Scan on tbl_a a  (cost=100.00..167.46 rows=853 width=8)
(1 row)
  • Line 4: The executor sends the following query that has an ORDER BY clause to the remote server and then fetches the query result, which is already sorted.(executor发送以下带有ORDER BY子句的查询给远程服务器,然后取查询结果)

  • SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST
    
  • The workload of the local server has been reduced by this improvement.(这中改进可以给本地服务器减轻工作负载)

4.1.2.3 Aggregate Functions
  • In version 9.6 or earlier, similar to the sort operation mentioned in the previous subsection, the aggregate functions such as AVG() and cont() are processed on the local server as the following steps.(在9.6以及更早的版本中,类似于前面小节中提到的的排序操作,按以下步骤处理AVG()cont()之类的聚合函数,也是在本地服务器上)
localdb=# EXPLAIN SELECT AVG(data) FROM tbl_a AS a WHERE a.id < 200;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Aggregate  (cost=168.50..168.51 rows=1 width=4)
   ->  Foreign Scan on tbl_a a  (cost=100.00..166.06 rows=975 width=4)
(2 rows)
  • Line 5: The executor sends the following query to the remote server, and then fetches the query result.(executor发送以下查询语句给远程服务器,然后取查询结果)

  • SELECT id, data FROM public.tbl_a WHERE ((id < 200))
    
  • Line 4: The executor calculates the average of the fetched rows of tbl_a on the local server.(executor在本地服务器上计算上一步取回的tbl_a中的行的平均值)

  • This process is costly because sending a large number of rows consumes heavy network traffic and takes a long time.(这个过程的成本很高,因为发送大量的行会消耗大量的网络流量并花费很长时间)

  • In version 10 or later, postgres_fdw executes the SELECT statement with the aggregate function on the remote server if possible.(在10以及后面的版本,postgres_fdwSELECT查询语句直接在远程服务器上执行聚集函数)

localdb=# EXPLAIN SELECT AVG(data) FROM tbl_a AS a WHERE a.id < 200;
                     QUERY PLAN                      
-----------------------------------------------------
 Foreign Scan  (cost=102.44..149.03 rows=1 width=32)
   Relations: Aggregate on (public.tbl_a a)
(2 rows)
  • Line 4: The executor sends the following query that contains an AVG() function to the remote server, and then fetches the query result.(executor发送以下包含聚集函数AVG()的查询语句给远程服务器,然后取回查询结果)

  • SELECT avg(data) FROM public.tbl_a WHERE ((id < 200))
    
  • This process is obviously efficient because the remote server calculates the average and sends only one row as the result.(这个流程明显要更高效,因为远程服务器负责计算平均值并且只需要返回一行结果即可)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值