Database management systems are meant to house data but, occasionally, they may need to talk with another DBMS. For example, to access an external server which may be hosting a different DBMS. With heterogeneous environments becoming more and more common, a bridge between the servers is established. We call this bridge a “Foreign Data Wrapper” (FDW). PostgreSQL completed its support of SQL/MED (SQL Management of External Data) with release 9.3 in 2013. A foreign data wrapper is a shared library that is loaded by a PostgreSQL server. It enables the creation of foreign tables in PostgreSQL that act as proxies for another data source.

When you query a foreign table, Postgres passes the request to the associated foreign data wrapper. The FDW creates the connection and retrieves or updates the data in the external data store. Since PostgreSQL planner is involved in all of this process as well, it may perform certain operations like aggregate or joins on the data when retrieved from the data source. I cover some of these later in this post.

ClickHouse Database

ClickHouse is an open source column based database management system which claims to be 100–1,000x faster than traditional approaches, capable of processing of more than a billion rows in less than a second.

clickhousedb_fdw

clickhousedb_fdw is an open source project – Apache licensed – from Percona. Here’s the link for GitHub project repository:

https://github.com/Percona-Lab/clickhousedb_fdw

It is an FDW for ClickHouse that allows you to SELECT from, and INSERT INTO, a ClickHouse database from within a PostgreSQL v11 server.

The FDW supports advanced features like aggregate pushdown and joins pushdown. These significantly improve performance by utilizing the remote server’s resources for these resource intensive operations.

If you would like to follow this post and try the FDW between Postgres and ClickHouse, you can download and set up the ontime dataset for ClickHouse.  After following the instructions, the test that you have the desired data. The ClickHouse client is a client CLI for the ClickHouse Database.

Prepare Data for ClickHouse

Now the data is ready in ClickHouse, the next step is to set up PostgreSQL. We need to create a ClickHouse foreign server, user mapping, and foreign tables.

Install the clickhousedb_fdw extension

There are manual ways to install the clickhousedb_fdw, but clickhousedb_fdw uses PostgreSQL’s coolest extension install feature. By just entering a SQL command you can use the extension:

Performance Features

PostgreSQL has improved foreign data wrapper processing by added the pushdown feature. Push down improves performance significantly, as the processing of data takes place earlier in the processing chain. Push down abilities include:

  • Operator and function Pushdown
  • Predicate Pushdown
  • Aggregate Pushdown
  • Join Pushdown

Operator and function Pushdown

The function and operators send to Clickhouse instead of calculating and filtering at the PostgreSQL end.

Predicate Pushdown

Instead of filtering the data at PostgreSQL, clickhousedb_fdw send the predicate to Clikhouse Database.

Aggregate Pushdown

Aggregate push down is a new feature of PostgreSQL FDW. There are currently very few foreign data wrappers that support aggregate push down – clickhousedb_fdw is one of them. Planner decides which aggregates are pushed down and which aren’t. Here is an example for both cases.

Join Pushdown

Again, this is a new feature in PostgreSQL FDW, and our clickhousedb_fdw also supports join push down. Here’s an example of that.

Percona’s support for PostgreSQL

As part of our commitment to being unbiased champions of the open source database eco-system, Percona offers support for PostgreSQL – you can read more about that here. And as you can see, as part of our support commitment, we’re now developing our own open source PostgreSQL projects such as the clickhousedb_fdw. Subscribe to the blog to be amongst the first to know of PostgreSQL and other open source projects from Percona.

As an author of the new clickhousdb_fdw – as well as other  FDWs – I’d be really happy to hear of your use cases and your experience of using this feature.


Photo by Hidde Rensink on Unsplash

15 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Andrey

it would be great if someone made a similar extension for MySQL

Steven Lee

please google Using ClickHouse like MySQL by ProxySQL

faibistes

What functions are supported in function pushdown? IIUC, it means that it translates postgresql functions and operators to clickhouse equivalents, if there is one. Am I right?

faibistes

Does it mean that it will fail if it uses a non-translatable function in a join or aggregate, or that it will do it on the postgresql side?

Magnus Falch

CREATE EXTENSION clickhousedb_fdw;
Which distros/repos will this work with currently?

Magnus Brun Falch

Could you please provide some more details?
I’ve tried Ubuntu 18.10/19.04 and Centos7. Couldn’t get it to run out of the box.
Tried setting up a dev environment but no files matching the name of the .so file in the documentation were produced by the make install step.
I’m working on a more exhaustive dependency list for Centos7 that would make getting started a lot easier for someone who hasn’t tinkered with building extensions for PostgreSQL before.

Steven Lee

Hi,
I translated it into Chinese, and want to publish on my blog, hope you can allow it.
and I have successfully compiled, installed and tested it, it is great.
and I wrote a compiling method in details, http://blog.rubypdf.com/2019/04/22/how-to-compile-and-install-clickhousedb_fdw/

xiatian

why my test cannot push down,the follwing is my case:
PostgreSQL 11
OS: Centos

postgres# explain (analyze,verbose) select count(year) from ontime;
QUERY PLAN

Aggregate (cost=0.00..0.01 rows=1 width=8) (actual time=31441.737..31441.737 rows=1 loops=1)
Output: count(year)
-> Foreign Scan on public.ontime (cost=0.00..0.00 rows=0 width=4) (actual time=13.313..30108.858 rows=17189046 loops=1)
Output: year, quarter, month, dayofmonth, dayofweek, flightdate, uniquecarrier, airlineid, carrier, tailnum, flightnum,
originairportid, originairportseqid, origincitymarketid, origin, origincityname, originstate, originstatefips, originstatename, o
riginwac, destairportid, destairportseqid, destcitymarketid, dest, destcityname, deststate, deststatefips, deststatename, destwac
, crsdeptime, deptime, depdelay, depdelayminutes, depdel15, departuredelaygroups, deptimeblk, taxiout, wheelsoff, wheelson, taxii
n, crsarrtime, arrtime, arrdelay, arrdelayminutes, arrdel15, arrivaldelaygroups, arrtimeblk, cancelled, cancellationcode, diverte
d, crselapsedtime, actualelapsedtime, airtime, flights, distance, distancegroup, carrierdelay, weatherdelay, nasdelay, securityde
lay, lateaircraftdelay, firstdeptime, totaladdgtime, longestaddgtime, divairportlandings, divreacheddest, divactualelapsedtime, d
ivarrdelay, divdistance, div1airport, div1airportid, div1airportseqid, div1wheelson, div1totalgtime, div1longestgtime, div1wheels
off, div1tailnum, div2airport, div2airportid, div2airportseqid, div2wheelson, div2totalgtime, div2longestgtime, div2wheelsoff, di
v2tailnum, div3airport, div3airportid, div3airportseqid, div3wheelson, div3totalgtime, div3longestgtime, div3wheelsoff, div3tailn
um, div4airport, div4airportid, div4airportseqid, div4wheelson, div4totalgtime, div4longestgtime, div4wheelsoff, div4tailnum, div
5airport, div5airportid, div5airportseqid, div5wheelson, div5totalgtime, div5longestgtime, div5wheelsoff, div5tailnum
Remote SQL: SELECT year FROM “default”.ontime
Planning Time: 0.111 ms
Execution Time: 31441.813 ms
(7 rows)

Time: 31442.579 ms (00:31.443)

Ibrar Ahmed

@xiatian, Please create a issue on the Github, I will respond on that.

Konstantin

is it possible to see datatype array(string) via clickhouse_fdw?

nishitshah

Does it support Postgresql 12 and 13? If not, then expected time line to make it work with Postgresql 12 and 13.