Pipelining versus Parallel Query ExecutionIn this blog post, we’ll look at pipelining versus parallel query execution when using X Plugin for MySQL 5.7.

In my previous blog post, I showed how to use X Plugin for MySQL 5.7 for parallel query execution. The tricks I used to make it work:

  • Partitioning by hash
  • Open N connections to MySQL, where N = number of CPU cores

I had to do it manually (as well as to sort the result at the end) as X Plugin only supports “pipelining” (which only saves the round trip time) and does not “multiplex” connections to MySQL (MySQL does not use multiple CPU cores for a single query).

TL:DR; version

In this (long) post I’m playing with MySQL 5.7 X Plugin / X Protocol and document store. Here is the summary:

  1. X Plugin does not “multiplex” connections/sessions to MySQL. Similar to the original protocol, one connection to X Plugin will result in one session open to MySQL
  2. An X Plugin query (if the library supports it) returns immediately and does not wait until the query is finished (async call). MySQL works like a queue.
  3. X Plugin does not have any additional server-level durability settings. Unless you check or wait for the acknowledgement (which is asynchronous) from the server, the data might or might not be written into MySQL (“fire and forget”).

At the same time, X Protocol can be helpful if:

  • We want to implement an asynchronous client (i.e., we do not want to block the network communication such as downloading or API calls) when the MySQL table is locked.
  • We want to use MySQL as a queue and save the round-trip time.
Benchmark results: “pipelining” versus “parallelizing” versus a single query

I’ve done a couple of tests comparing the results between “pipelining” versus “parallelizing” versus a single query. Here are the results:

      1. Parallel queries with NodeJS:
      2. Pipeline with NojeJS:

        In the pipeline with NojeJS, I’m reusing the same connection (and do not open a new one for each thread).
      3. Direct query – partitioned table:
      4. Direct query – non-partitioned table.
Advantages of pipelines with X Plugin 

Although pipelining with X Plugin does not significantly increase query response time (it can reduce the total latency), it might be helpful in some cases. For example, let’s say we are downloading something from the Internet and need to save the progress of the download as well as the metadata for the document. In this example, I use youtube-dl to search and download the metadata about YouTube videos, then save the metadata JSON into MySQL 5.7 Document Store. Here is the code:

In the above example, I execute the youtube-dl binary (you need to have it installed first) to search for “MySQL 5.7” videos. Instead of downloading the videos, I only grab the video’s metadata in JSON format  (“-j” flag). Because it is JSON, I can save it into MySQL document store. The table has the following structure:

Here is the execution example:

Now, here is what make this example interesting: as NodeJS + X Plugin = Asynchronous + Pipelining, the program execution will not stop if the table is locked. I’ve opened two sessions:

  • session 1: $ node yt.js > test_lock_table.log
  • session 2:

Results:

As we can see, the first two writes were immediate. Then I’ve locked the table, and no MySQL queries went through. At the same time the download process (which is the slowest part here) proceeded and was not blocked (we can see the titles above, which are not followed by lines “… => wrote to MySQL:”). When the table was unlocked, a pile of waiting queries succeeded.

This can be very helpful when running a “download” process, and the network is a bottleneck. In a traditional synchronous query execution, when we lock a table the application gets blocked (including the network communication). With NodeJS and X Plugin, the download part will proceed with MySQL acting as a queue.

Pipeline Durability

How “durable” this pipeline, you might ask. In other words, what will happen if I will kill the connection? To test it out, I have (once again) locked the table (but now before starting the nodejs), killed the connection and finally unlocked the table. Here are the results:

Please note: in the above, there isn’t a single acknowledgement from the MySQL server. When code receives a response from MySQL it prints “Wrote to MySQL: {“_state”:{“rows_affected”:1,”doc_ids”:[“…”]}}“. Also, note that when the connection was killed the MySQL process is still there, waiting on the table lock.

What is interesting here is is that only two rows have been inserted into the document store. Is there a “history length” here or some other buffer that we can increase? I’ve asked Jan Kneschke, one of the authors of the X Protocol, and the answers were:

  • Q: Is there any history length or any buffer and can we tune it?
    • A: There is no “history” or “buffer” at all, it is all at the connector level.
  • Q: Then why is 2 rows were finally inserted?
    • To answer this question I’ve collected tcpdump to port 33060 (X Protocol), see below

This is very important information! Keep in mind that the asynchronous pipeline has no durability settings: if the application fails and there are some pending writes, those writes can be lost (or could be written).

To fully understand how the protocol works, I’ve captured tcpdump (Jan Kneschke helped me to analyze it):

(see update below for the tcpdump visualization)

This is what is happening:

  • When I hit CTRL+C, nodejs closes the connection. As the table is still locked, MySQL can’t write to it and will not send the result of the insert back.
  • When the table is unlocked, it starts the first statement despite the fact that the connection has been closed. It then acknowledges the first insert and starts the second one.
  • However, at this point the script (client) has already closed the connection and the final packet (write done, here is the id) gets denied. The X Plugin then finds out that the client closed the connection and stops executing the pipeline.

Actually, this is very similar to how the original MySQL protocol worked. If we kill the script/application, it doesn’t automatically kill the MySQL connection (unless you hit CTRL+C in the MySQL client, sends the kill signal) and the connection waits for the table to get unlocked. When the table is unlocked, it inserts the first statement from a file.

Enforcing unique checks

If I restart my script, it finds the same videos again. We will probably need to enforce the consistency of our data. By default the plugin generates the unique key (_id) for the document, so it prevents inserting the duplicates.

Another way to enforce the unique checks is to create a unique key for youtube id. Here is the updated table structure:

I’ve changed the default “_id” column to the YouTube’s unique ID. Now when I restart the script it shows:

…as this document has already been loaded.

Conclusion

Although X Plugin pipelining does not necessarily significantly increase query response (it might save the roundtrip time) it can be helpful for some applications.We might not want to block the network communication (i.e., downloading or API calls) when the MySQL table is locked, for example. At the same time, unless you check/wait for the acknowledgement from the server, the data might or might not be written into MySQL.

Bonus: data analysis

Now we can see what we have downloaded. There are a number of interesting fields in the result:

We can see the most popular videos. To do that I’ve added one more virtual field on view_count, and created an index on it:

We can run the queries like:

Or if we want to find out the most popular resolutions:

Special thanks to Jan Kneschke and Morgan Tocker from Oracle for helping with the X Protocol internals.

Update: Jan Kneschke also generated the visualization for the tcpdump I’ve collected (when connection was killed):

node-js-pipelining-early-close