Comments on: How PostgreSQL Pipeline Mode Works

By: Charly Batista

Charly Batista — Fri, 04 Mar 2022 04:25:53 +0000

In reply to Jeremy. > For some reason I was thinking that only the simple query protocol supported multiple statements This is partially accurate. The "Parse" message only supports one SQL statement but the extended protocol supports multiple parse messages and this is the "trick" used by libpq. Every SQL statement is enclosed in its own "Parse, Bind, portal Describe, Execute, portal Close" but without the "Sync". Libpq will concatenate all the SQL statements sent via "PQsendQuery" and will add the "Sync" message in the end of the string as the last message/command. It's quite a smart trick I would say.

By: Charly Batista

Charly Batista — Fri, 04 Mar 2022 04:07:46 +0000

In reply to D. Vérité. Not sure I understood what you mean here when you say: "But pipelining is a different way of sending multiple statements efficiently, without stringing them together in the same query. Each statement is distinct but they're grouped as a series at the network level". Pipeline does exactly the oposite if we don't call "PQpipelineSync" at every query execution. What it does is to use the PGconn->outBuffer to concatenate all the queries we send through "PQsendQuery" and then when we call "PQpipelineSync" it will invoke "PQflush" which will send all the concatenated queries inside "PGconn->outBuffer". Also, not sure what exactly the "grouped as a series at the network level" means. Do you mean packets? I can't really see how that would be possible as the libpq isn't only the origin of the communication but operates in a higher level protocol-wise than the packets, and if we want to do something at the network level we need to intercept and work with the packets at the transmission protocol level. But again, I don't think I understood what you said and I'm just speculating here.

By: Charly Batista

Charly Batista — Fri, 04 Mar 2022 03:24:09 +0000

That’s not totally accurate. The client protocol always send the SYNC (“S”) message. What happens is that the “PQpipelineSync” command from libpq dispatches the queries in memory. For example, if I have the below code it will send the whole message with every single time:

char fmt[40] = "INSERT INTO t2 VALUES(%d)"; char sql[40]; for (int i = 0; i < 5; i++) { sprintf(sql, fmt, i); printf("Query: %s\n", sql); fflush(stdout);


    if (!PQsendQuery(conn, sql))

    {

        printf("Failed to send statement \"%s\" to database", sql);

        continue;

    }

    if (!PQpipelineSync(conn))

    {

        printf("Failed while running PQpipelineSync");

        continue;

    }

}

If I remove the "PQpipelineSync" from the loop and put it outside then it will behave like you described but with one detail, the libpq will send all the queries in one single call with a big SQL, something like:

INSERT INTO t2 VALUES(0);INSERT INTO t2 VALUES(1);INSERT INTO t2 VALUES(2);INSERT INTO t2 VALUES(3);INSERT INTO t2 VALUES(4 );

Which should be something like below at the protocol level:

Well it shouldn’t be a problem and as we are sending multiple statements at the same time we are saving network round trip so always preferable right? Not really. When we aggregate those statements we are sending them all in a single transaction, and if for example the ID=3 is duplicated they will all fail causing a full rollback. This is not a problem in itself, but it can be very harmful to the application if the application was expecting them to work as individual isolated transaction. The former option, sending them individually, will not have this issue and only the failed one will be rolled back. Note that it’s not the case of one be better than another, they are just different and have different implications, and it’s very important for the developers/designers take them into consideration while using either option.

In time, the ability to send queries asynchronously has been around for a long time and one can easily use them and completely bypass pipeline mode with similar results.

By: Matteo

Matteo — Wed, 16 Feb 2022 13:18:54 +0000

In reply to Matteo. Sorry, I meant to respond to your answer of my previous comment

By: Matteo

Matteo — Wed, 16 Feb 2022 13:17:25 +0000

In reply to D. Vérité. Got it. So it will only work with the extended query protocol, right? Thanks

By: D. Vérité

D. Vérité — Tue, 15 Feb 2022 15:57:34 +0000

In reply to Jeremy. Pipelining works only with the extended query protocol. You're right that only the simple query protocol allows multiple statements in the same query (separated by semi-colons). But pipelining is a different way of sending multiple statements efficiently, without stringing them together in the same query. Each statement is distinct but they're grouped as a series at the network level.

By: D. Vérité

D. Vérité — Tue, 15 Feb 2022 15:51:00 +0000

In reply to Matteo. Not really. Switching to pipeline mode is mostly about the client *not* sending SYNC messages (at the protocol level) after each SQL statement. See https://www.postgresql.org/docs/current/protocol-flow.html

By: D. Vérité

D. Vérité — Tue, 15 Feb 2022 15:47:19 +0000

Please note that the pipeline diagram showing and saying that “SQL statements are returned out of order” is incorrect. The doc says: “The server executes statements, and returns results, in the order the client sends them”.

By: Matteo

Matteo — Tue, 15 Feb 2022 09:08:56 +0000

Switching the connection to PIPELINE mode is something that the backend is aware of (For example sending an additional command) or completely handled in the client side? I hope my question is clear

By: Jeremy

Jeremy — Mon, 14 Feb 2022 18:14:16 +0000

Does this work with the extended query protocol or only with the simple query protocol? For some reason I was thinking that only the simple query protocol supported multiple statements, which – if accurate – would mean there are a bunch of additional caveats (eg. it would be incompatible with bind var usage of that requires extended protocol).

https://www.postgresql.org/docs/current/protocol.html