PostgreSQL for MySQL DBAs Episode 6 - Explaining EXPLAIN (And an Answer to a Bonus Question)

The differences between MySQL and PostgreSQL are often trivial, but occasionally they are stark. A MySQL DBA wanting to optimize a query on a PostgreSQL server will hopefully have some experience with EXPLAIN. For the uninitiated, the keyword EXPLAIN is prepended to a query to reveal how the server plans to return the data requested in that query. The two implementations of EXPLAIN are very different. Episode six of the PostgreSQL for MySQL DBAs series covers EXPLAIN.

So what is different? PostgreSQL adds XML and YAML output format options beyond the traditional and JSON formats found in MySQL. The default PostgreSQL output looks, at least to me, like the TREE output of MySQL. But since the two databases are ‘mechanically’ different, you need to learn how to interpret the output PostgreSQL provides.
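As a minimal sketch of how the alternate formats are requested, the FORMAT option goes in parentheses after EXPLAIN. The table t1 is the one used in this episode's example; output is omitted because it varies by server and version.

```sql
-- FORMAT accepts TEXT (the default), XML, JSON, or YAML.
EXPLAIN (FORMAT YAML) SELECT 1 FROM t1 WHERE id = 101;
EXPLAIN (FORMAT JSON) SELECT 1 FROM t1 WHERE id = 101;
```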

The following example provides details such as the mechanism the server will use to get the data, the start-up cost, the overall cost, the number of rows to be returned, and the name of the key (if any) used. Refer to the video for details.

test=# EXPLAIN SELECT 1 FROM t1 WHERE ID=101;
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using t1_pkey on t1 (cost=0.29..4.31 rows=1 width=4)
  Index Cond: (id = 101)
(2 rows)

Please refer to the video to see the differences and for a quick introduction to PostgreSQL’s EXPLAIN.

Quiz Answer

I added a ‘bonus quiz question’ to the presentation and video.

The bonus quiz question from the video

And the first person to respond was Jack T:

You said earlier that a ‘Seq Scan’ is a full-table scan and it is taking 15.54ms to execute just that scan. In theory, if you add an index on postal_code, then that changes to an ‘Index Scan’ and the execution time should decrease. In MySQL, this sub-query pattern is recognized as a ‘semi-join’ and is executed as a JOIN. Does PGSQL have similar optimizations for rewriting?

This is where experience with one database helps you master another. Generally, adding an index will speed up a query. But one of the big things to remember about PostgreSQL is that it has different ways of doing things.

Let’s rerun the EXPLAIN. Thankfully, the numbers on my test machine match those in the quiz question.

dvdrental=# EXPLAIN SELECT * FROM customer WHERE address_id IN (SELECT address_id FROM address WHERE postal_code = '52137');
QUERY PLAN
-----------------------------------------------------------------------------------------
Nested Loop (cost=0.28..32.14 rows=2 width=70)
  -> Seq Scan on address (cost=0.00..15.54 rows=2 width=4)
        Filter: ((postal_code)::text = '52137'::text)
  -> Index Scan using idx_fk_address_id on customer (cost=0.28..8.29 rows=1 width=70)
        Index Cond: (address_id = address.address_id)
(5 rows)

Then we can create an index on the postal_code column.

dvdrental=# CREATE INDEX quiz_answer_1 ON address (postal_code);
CREATE INDEX

So, we rerun explain and peek at the results.

dvdrental=# EXPLAIN SELECT * FROM customer WHERE address_id IN (SELECT address_id FROM address WHERE postal_code = '52137');
QUERY PLAN
-----------------------------------------------------------------------------------------
Nested Loop (cost=4.57..25.92 rows=2 width=70)
  -> Bitmap Heap Scan on address (cost=4.29..9.32 rows=2 width=4)
        Recheck Cond: ((postal_code)::text = '52137'::text)
        -> Bitmap Index Scan on quiz_answer_1 (cost=0.00..4.29 rows=2 width=0)
              Index Cond: ((postal_code)::text = '52137'::text)
  -> Index Scan using idx_fk_address_id on customer (cost=0.28..8.29 rows=1 width=70)
        Index Cond: (address_id = address.address_id)
(7 rows)

The results are interesting. Note that the costs for the index scan using idx_fk_address_id stay the same, because the new index is on the address table and has no effect on the customer table. But the new index does bring the cost of scanning the address table down from 15.54 to 9.32, and the nested loop cost drops from 32.14 to 25.92. The optimization is a bitmap scan.
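
A quick check of the arithmetic, using only the cost figures from the two plans, suggests the whole improvement comes from the address side of the join:

```sql
-- Cost figures copied from the two EXPLAIN outputs; these are planner
-- estimates in arbitrary units, not milliseconds.
SELECT 15.54 - 9.32  AS address_scan_saving,   -- 6.22
       32.14 - 25.92 AS nested_loop_saving;    -- 6.22
```

Both differences are 6.22, so the drop in the nested loop's total cost is accounted for entirely by the cheaper scan of address.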

From the PostgreSQL manual – Here the planner has decided to use a two-step plan: the bottom plan node visits an index to find the locations of rows matching the index condition, and then the upper plan node actually fetches those rows from the table itself. Fetching the rows separately is much more expensive than sequentially reading them, but because not all the pages of the table have to be visited, this is still cheaper than a sequential scan. (The reason for using two levels of plan is that the upper plan node sorts the row locations identified by the index into physical order before reading them, so as to minimize the costs of the separate fetches. The “bitmap” mentioned in the node names is the mechanism that does the sorting.)
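
One way to see what the bitmap scan is buying is to ask the planner to avoid it and compare the resulting plan. This is a sketch only: enable_bitmapscan is a standard planner setting, but which alternative plan the planner picks instead, and at what cost, depends on your data and version, so no output is shown.

```sql
BEGIN;
-- Discourage bitmap scans for this transaction only.
SET LOCAL enable_bitmapscan = off;
EXPLAIN SELECT * FROM customer WHERE address_id IN (SELECT address_id FROM address WHERE postal_code = '52137');
-- Ending the transaction discards the SET LOCAL change.
ROLLBACK;
```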

So the index does speed the query up but with a much different optimization than what MySQL would use.
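
Keep in mind that the numbers in these plans are planner cost estimates in arbitrary units, not measured times. To check the actual effect, a minimal sketch is to add the ANALYZE option, which really executes the query and reports actual row counts and timings alongside the estimates. Timings differ from machine to machine, so none are shown here.

```sql
-- ANALYZE runs the statement for real; that is harmless for a SELECT,
-- but wrap data-modifying statements in BEGIN ... ROLLBACK.
EXPLAIN (ANALYZE) SELECT * FROM customer WHERE address_id IN (SELECT address_id FROM address WHERE postal_code = '52137');
```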

Next episode — Vacuuming Tables

Stay tuned!

The past videos for PostgreSQL for MySQL Database Administrators (DBA) can be found here: episode one, episode two, episode three, episode four, and episode five.
