PostgreSQL for MySQL DBAs ExplainThe differences between MySQL and PostgreSQL are often trivial but occasionally there are stark differences.  A MySQL DBA wanting to optimize a query on a PostgreSQL server will hopefully have some experience with using EXPLAIN.  For the uninitiated, the keyword EXPLAIN is pre-pended to a query to reveal what the server wants to do to return the data requested in that query.  The implementations of the two versions of EXPLAIN are very different.  Episode six of the PostgreSQL for MySQL DBA series covers EXPLAIN.

So what is different? PostgreSQL adds XML and YAML output formats options past the traditional and JSON found in MySQL.  The PostgreSQL output looks, at least to me, like the TREE output of MySQL. But since the two databases are ‘mechanically’ different you need to learn how to interpret the output PostgreSQL provides.

The following example provides details such as the mechanism the server will use to get the data, the start-up cost, the overall cost, the number of rows to be returned, and the name of the key (if any) used. Refer to the video for details.

Please refer to video to see the differences and a quick introduction to PostgreSQL’s EXPLAIN.

Quiz Answer

I added a ‘bonus quiz question’ to the presentation and video.

Postgresql for mysql

The bonus quiz question from the video

And the first person to respond was Jack T:

You said earlier that a ‘Seq Scan’ is a full-table scan and it is taking 15.54ms to execute just that scan. In theory, if you add an index on postal_code, then that changes to an ‘Index Scan’ and the execution time should decrease. In MySQL, this sub-query pattern is recognized as a ‘semi-join’ and is executed as a JOIN. Does PGSQL have similar optimizations for rewriting?

This is where experience with one database helps you master another.  Generally adding an index will speed up a query.  But one of the big things to learn about PostgreSQL to remember is that has different ways of doing things.

Let’s rerun the EXPLAIN and thankfully the numbers match from my test machine.

Then we can create an index in the postal_code column.

So, we rerun explain and peek at the results.

The results are interesting. Note the costs for the index scan on idx_fx_address_id stay the same as the new index does not work on the customer table. But the new index does bring down the scan on the address from 15.54 to 9.32. And the nested look cost drop from 32.14 to 25.92. The optimization is a bitmap scan.

From the PostgreSQL manual – Here the planner has decided to use a two-step plan: the bottom plan node visits an index to find the locations of rows matching the index condition, and then the upper plan node actually fetches those rows from the table itself. Fetching the rows separately is much more expensive than sequentially reading them, but because not all the pages of the table have to be visited, this is still cheaper than a sequential scan. (The reason for using two levels of plan is that the upper plan node sorts the row locations identified by the index into physical order before reading them, so as to minimize the costs of the separate fetches. The “bitmap” mentioned in the node names is the mechanism that does the sorting.)

So the index does speed the query up but with a much different optimization than what MySQL would use.

Next episode — Vacuuming Tables

Stay tuned!

The past videos for PostgreSQL for MySQL Database Administrators (DBA) can be found here: episode oneepisode two, episode three, episode four, and episode five.

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments