Note: PostgreSQL 14 has been released. Please visit PostgreSQL 14 – Performance, Security, Usability, and Observability for additional information.
It’s exciting times in the PostgreSQL world with the version 14 beta released a few days ago. It’s now time to look under the hood and find out what has changed from version 13, see what has improved, and what behaviors are changed that we should be aware of. Putting it all in a single blog would take weeks to write and days to read, so this one will focus solely on the changes expected in the Foreign Data Wrapper in the GA release of PostgreSQL version 14.
Foreign Data Wrappers (FDWs) provide a mechanism by which regular SQL queries can be used to access data that resides outside PostgreSQL. There are many different FDWs available, however, and PostgreSQL comes with a “File FDW” and a “PostgreSQL FDW”. PostgreSQL FDW may seem counterintuitive, but, it is an extremely useful feature. And there have been some very useful updates to this FDW.
So, let’s start understanding what has changed.
Performance Feature
If you are already using PostgreSQL FDW for any use case, take note of the performance improvements.
1 – Parallel /Async Foreign Scans
(Allow a query referencing multiple foreign tables to perform foreign table scans in parallel)
Remote aggregations and remote joins might have been a performance nightmare when performed across multiple servers. The performance benefit comes from the parallelization of ForeignScan which can now be executed in parallel asynchronously. The sequential execution previously was very slow, and in some cases, too slow. For this, a new server option is added “async_capable” which allows for parallel planning and execution of ForeignScan.
Create Servers and User-Mappings
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | -- Create foreign server 1. CREATE SERVER postgres_svr1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', async_capable "true"); -- Create foreign server 2. CREATE SERVER postgres_svr2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', async_capable "true"); CREATE USER MAPPING FOR vagrant SERVER postgres_svr1 OPTIONS (user 'postgres', password 'pass'); CREATE USER MAPPING FOR vagrant SERVER postgres_svr2 OPTIONS (user 'postgres', password 'pass'); |
Create Local Tables
1 2 3 4 5 | CREATE TABLE parent_local (a INTEGER, b CHAR, c TEXT, d VARCHAR(255)) PARTITION BY RANGE (a); CREATE TABLE child_local1 (a INTEGER, b CHAR, c TEXT, d VARCHAR(255)); CREATE TABLE child_local2 (a int, b CHAR, c text, d VARCHAR(255)); GRANT ALL ON child_local1 to postgres; GRANT ALL ON child_local2 to postgres; |
Create Foreign TABLES
1 2 3 4 5 6 7 8 | CREATE FOREIGN TABLE parent_remote1 PARTITION OF parent_local VALUES FROM 1000 TO 2000 SERVER postgres_svr1 OPTIONS table_name 'child_local1'); CREATE FOREIGN TABLE parent_remote2 PARTITION OF parent_local FOR VALUES FROM 2000 TO 3000 SERVER postgres_svr2 OPTIONS table_name 'child_local2'); |
Now try that, and see the plan tree, now you can see two Async Foreign plans in the tree.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | CREATE TABLE sample_table (a INTEGER, b CHAR, c TEXT, d VARCHAR(255)); EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO sample_table SELECT * FROM parent_local WHERE a % 100 = 0; QUERY PLAN ---------------------------------------------------------------------------------------------- Insert on public.sample_table -> Append -> Async Foreign Scan on public.parent_remote1 parent_local_1 Output: parent_local_1.a, parent_local_1.b, parent_local_1.c, parent_local_1.d Remote SQL: SELECT a, b, c, d FROM public.child_local1 WHERE (((a % 100) = 0)) -> Async Foreign Scan on public.parent_remote2 parent_local_2 Output: parent_local_2.a, parent_local_2.b, parent_local_2.c, parent_local_2.d Remote SQL: SELECT a, b, c, d FROM public.child_local2 WHERE (((a % 100) = 0)) (8 rows) |
2 – Bulk Insert
(Allow postgres_fdw to INSERT rows in bulk.)
Now bulk insert functionality has been added to Foreign Data Wrapper, and postgres_fdw is now supporting that function. There is an opportunity for other foreign data wrappers to implement bulk insert. A complete blog can be seen here.
Functional Features
1 – TRUNCATE Command
Allow TRUNCATE to operate on foreign tables.
Foreign Data Wrapper is enhanced to support TRUNCATE command on the foreign table as the target. That means it issues a TRUNCATE command shipped to the foreign server and executes on the table. Fortunately, this functionality is implemented in postgres_fdw. Here is an example of that.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | CREATE SERVER postgres_svr FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1'); CREATE USER MAPPING FOR vagrant SERVER postgres_svr OPTIONS (user 'postgres', password 'pass'); CREATE FOREIGN TABLE foo_remote (a INTEGER, b CHAR, c TEXT, d VARCHAR(255)) SERVER postgres_svr OPTIONS(table_name 'foo_local'); |
Now a foreign table can be truncated using the TRUNCATE command.
1 2 | postgres=# TRUNCATE foo_remote; TRUNCATE TABLE |
2 – LIMIT TO Child Partitioning
(Allow postgres_fdw to import table partitions if specified by IMPORT FOREIGN SCHEMA … LIMIT TO.)
The postgres_fdw does not allow the import of table partitions, because data can be accessed using the root partition. But in case the user wants to import the partitioned table partitioning, PostgreSQL 14 added a new option call “LIMIT TO”. Create a new schema on a remote machine and add a parent table “foo_schema.foo_table_parent” and one child table “foo_schema.foo_table_child”.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | postgres=# \d+ foo_schema.* Table "foo_schema.foo_table_child" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- a | integer | | | | plain | | | Partition of: foo_schema.foo_table_parent FOR VALUES FROM (0) TO (10) Partition constraint: ((a IS NOT NULL) AND (a >= 0) AND (a < 10)) Access method: heap Partitioned table "foo_schema.foo_table_parent" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- a | integer | | | | plain | | | Partition key: RANGE (a) Partitions: foo_schema.foo_table_child FOR VALUES FROM (0) TO (10) |
Import schema without specifying the LIMIT TO, you can see only the parent table is imported.
1 2 3 4 5 6 7 8 | IMPORT FOREIGN SCHEMA foo_schema FROM SERVER postgres_svr INTO bar_schema; postgres=# \d+ bar_schema.* Foreign table "bar_schema.foo_table_parent" Column | Type | Collation | Nullable | Default | FDW options | Storage | Stats target | Description --------+---------+-----------+----------+---------+-------------------+---------+--------------+------------- a | integer | | | | (column_name 'a') | plain | | Server: postgres_svr FDW options: (schema_name 'foo_schema', table_name 'foo_table_parent') |
If you explicitly specified partitioned table into LIMIT TO clause then it will import that table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | postgres=# IMPORT FOREIGN SCHEMA foo_schema LIMIT TO (foo_table_parent, foo_table_child) FROM SERVER loopback INTO bar_schema; IMPORT FOREIGN SCHEMA postgres=# \d+ bar_schema.* Foreign table "bar_schema.foo_table_child" Column | Type | Collation | Nullable | Default | FDW options | Storage | Stats target | Description --------+---------+-----------+----------+---------+-------------------+---------+--------------+------------- a | integer | | | | (column_name 'a') | plain | | Server: loopback FDW options: (schema_name 'foo_schema', table_name 'foo_table_child') Foreign table "bar_schema.foo_table_parent" Column | Type | Collation | Nullable | Default | FDW options | Storage | Stats target | Description --------+---------+-----------+----------+---------+-------------------+---------+--------------+------------- a | integer | | | | (column_name 'a') | plain | | Server: loopback FDW options: (schema_name 'foo_schema', table_name 'foo_table_parent') |
3 – Active and Valid Connection List
(Add postgres_fdw function postgres_fdw_get_connections to report open foreign server connections)
A new function postgres_fdw_get_connections() is added. The function returns the open connection names local session to the foreign servers of postgres_fdw. It also outputs the validity of the connection.
1 2 3 4 5 6 | postgres=# SELECT * FROM postgres_fdw_get_connections() ORDER BY 1; server_name | valid -------------------+------- postgres_svr | t postgres_svr_bulk | t (2 rows) |
Now disconnect all connections connection and try the query again.
1 2 3 4 5 6 7 8 9 10 | postgres=# SELECT 1 FROM postgres_fdw_disconnect_all(); ?column? ---------- 1 (1 row) postgres=# SELECT * FROM postgres_fdw_get_connections() ORDER BY 1; server_name | valid -------------+------- (0 rows) |
4 – Keep Connections
(Add postgres_fdw functions to discard cached connections)
A new option keep_connections is added to keep the connections alive so that subsequent queries can reuse them. By default, this option is on, but when turned off, the connections will be discarded as the transactions end.
Set the option off
1 | ALTER SERVER loopback OPTIONS (keep_connections 'off'); |
Establish the connection using the remote query.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | postgres=# BEGIN; BEGIN postgres=*# select * from foo_remote; a | b | c | d ---+---+---+--- (0 rows) postgres=*# END; COMMIT postgres=# SELECT * FROM postgres_fdw_get_connections() ORDER BY 1; server_name | valid -------------+------- (0 rows) |
Set the keep_connections option on
1 | ALTER SERVER postgres_svr options (set keep_connections 'on'); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | postgres=# BEGIN; BEGIN postgres=*# select * from foo_remote; a | b | c | d ---+---+---+--- (0 rows) -- Establish the connection using the remote query. postgres=*# END; COMMIT postgres=# SELECT * FROM postgres_fdw_get_connections() ORDER BY 1; server_name | valid --------------+------- postgres_svr | t (1 row) |
5 – Reestablish Brocken Connection
(Allow postgres_fdw to reestablish foreign server connections if necessary)
Previously when the remote server restarted and the postgres_fdw connection was broken, then the error was thrown because the cached connection is no longer available. This is fixed in PostgreSQL, and, in any case, the connection is broken and no longer exists in the cache, and postgres_fdw will establish the connection.
Conclusion
It is quite promising that the Foreign Data Wrapper API is expanding over every release, but PostgreSQL 14 provides some user-centric new features. The performance-related improvements give another reason to use FDWs for many relevant use cases. The functionality will surely be added in the next few versions, making these more performant and easy to use.
I have 1,000 stores. Each has it’s own Postgres server and database. The tables are the same at each store. Can I use postgres_fdw to create a single db that has access to the cumulative data of all stores? Ideally I want each table to have an extra field added that will identify which store provided the data.
That’s amazing. I am surprised by the small number of companies using FDW today, as it is such an advanced and powerful tool. One question though, let’s say I have 35 microservices, each one running a remote database and one of these databases changes its schema. Is there a way of syncing this to the MASTER PostgreSQL so that it can refresh the schema on its side? Thanks!!