MySQL Query Tuning for Dev[Op]s webinar

First, I want to thank everyone who attended my November 9, 2021 webinar “Introduction to MySQL Query Tuning for Dev[Op]s”. Recording and slides are available on our webinars page.

Here are answers to the questions from participants which I was not able to provide during the webinar due to a technical issue we experienced at the end.

Q: If a large/complex SELECT is run from inside a Stored Procedure, will it use/not use the same optimizations as it would if run as a direct query? Is there anything to bear in mind when writing Stored Procedures that need to run large/complex SELECTs from within them?

Just wanted to clarify, using a stored procedure does not utilize any of the indexes on the corresponding tables?

A: Any query inside a stored procedure is optimized the same way as if it were called outside of the routine. We cannot run EXPLAIN on stored procedures, but we can prove this with the help of Performance Schema.

For example, let’s take the standard test database employees and run a quite ineffective query against it:
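The query shown in the webinar is not reproduced in this transcript; as an illustrative stand-in, here is a similarly ineffective query against the employees test database (the condition and columns are my own choice, not necessarily the webinar’s):

```sql
-- No index covers this condition, so MySQL performs a full table
-- scan plus a filesort to resolve the query.
SELECT emp_no, first_name, last_name
FROM employees
WHERE hire_date > '1995-01-01'
ORDER BY last_name;
```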

Then let’s create a stored routine that uses this query:
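The original routine is not shown here; a sketch of such a wrapper (the procedure name slow_query and the query body are illustrative):

```sql
DELIMITER $$
CREATE PROCEDURE slow_query()
BEGIN
  -- The same statement as the standalone version: the optimizer
  -- compiles it identically inside the routine.
  SELECT emp_no, first_name, last_name
  FROM employees
  WHERE hire_date > '1995-01-01'
  ORDER BY last_name;
END$$
DELIMITER ;
```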

Calling this routine takes about the same time on my laptop as running the query separately:

If I then enable statement instrumentation in Performance Schema and run the query again, I get the following output:
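The exact commands are not shown in this transcript; a sketch, assuming an account with privileges on performance_schema:

```sql
-- Enable statement instrumentation and the statement-history consumers.
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'statement/%';

UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE 'events_statements%';

-- After running the query, inspect the metrics it produced.
SELECT EVENT_NAME, SQL_TEXT, ROWS_EXAMINED, ROWS_SENT,
       NO_INDEX_USED, CREATED_TMP_TABLES, SORT_ROWS
FROM performance_schema.events_statements_history
ORDER BY EVENT_ID DESC
LIMIT 1;
```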

The interesting part of the output is the performance fields with non-zero values, namely:

Now let’s call the stored procedure and compare these values.
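Assuming the procedure was created under the illustrative name slow_query, the comparison looks like this:

```sql
CALL slow_query();

-- The history now contains a row for the CALL command and a row
-- for the SELECT executed inside it.
SELECT EVENT_NAME, SQL_TEXT, ROWS_EXAMINED, SORT_ROWS, NO_INDEX_USED
FROM performance_schema.events_statements_history
ORDER BY EVENT_ID DESC
LIMIT 2;
```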

This time, the select from the performance_schema.events_statements_history returned two rows: one for the SELECT  statement and one for the CALL  command. We are interested in the data for the SELECT  statement. It is:

So the performance metrics are exactly the same as for the original query.

OK, this was the slow, ineffective query. But what about one that uses indexes?

Let’s craft another example. Consider a query:
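The original query is not included in this write-up; as an illustration, here is a query resolved with a full index scan on the employees primary key (my own stand-in):

```sql
-- EXPLAIN reports access type "index": the whole PRIMARY KEY is
-- scanned, but no additional filtering or sorting is needed.
SELECT emp_no FROM employees ORDER BY emp_no;
```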

And the stored procedure:
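A sketch of the wrapper routine (the name indexed_query and the body are illustrative):

```sql
DELIMITER $$
CREATE PROCEDURE indexed_query()
BEGIN
  SELECT emp_no FROM employees ORDER BY emp_no;
END$$
DELIMITER ;
```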

Now let’s call the query and examine data in the Performance Schema:

In this case, the picture is different:

This is still not a very efficient query: it performs a full index scan, but it certainly uses the index.

Let’s check data for the stored procedure:

Counters again have exactly the same values as for the standalone statement:

So we have proof that the optimizer creates the same query plan regardless of whether the query is called inside a stored procedure or not.

Q: A few times when I was building a query for a table with multiple indexes, the EXPLAIN command shows me an index that is not clearly the best choice and I had to use FORCE INDEX in the query. I never understand why sometimes this happens, is it possible that sometimes the engine makes that mistake?

A: The engine can certainly make mistakes. If you want to understand more about such mistakes, I recommend trying EXPLAIN FORMAT=JSON as described in this series, as well as studying the Optimizer Trace. While the different formats of EXPLAIN do not change the query plan, EXPLAIN FORMAT=JSON provides more information on the optimizations used to resolve the query.

Q: I have a question about statuses. It’s about the Handler_read you mentioned. It keeps on increasing when queries are being executed. If the server has an uptime of more than 14 days, having 1000qps, the handler will be in the range of millions. Once we do a query tuning workshop, I would like to reset the Handlers counter. How to perform that? (just to see: for the same time range, whether the Handler_read decreases)

A: To reset counters for the session, use the FLUSH STATUS statement. Then you can run the original statement and examine how much work the engine has to do to resolve it.

Here is the pseudo-code, showing how to do it:
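A sketch of the sequence (the SELECT is a placeholder for the statement being tuned):

```sql
FLUSH STATUS;                               -- reset session status counters
SELECT ...;                                 -- the statement you are tuning
SHOW SESSION STATUS LIKE 'Handler_read%';   -- counters for this statement only
```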

Q: Can you talk a bit on explain extended? How can we use that to further help us in tuning queries? Does it show the full query that the server/optimizer will execute in the correct format?

A: EXPLAIN EXTENDED has been part of the standard output of the EXPLAIN command since version 5.7. If you started with an earlier version of MySQL and used the plain EXPLAIN command, you will now notice two differences.

First, there is the “filtered” column. It shows the percentage of examined rows that the optimizer expects to remain in the result after filtering. For example, for the query:

It expects that the engine will read 2838426 rows, but that only 11.11% of them will be used to produce the final result. This usually indicates that the query is not efficient.

Conversely, the query:

Would use all 442189 retrieved rows to create the final result set (filtered: 100.00).

Another feature of EXPLAIN EXTENDED before version 5.7, and of the regular EXPLAIN since version 5.7, is that it stores the query as rewritten by the optimizer in its diagnostic area, accessible via the SHOW WARNINGS command.

For example, let’s take a query:
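The webinar’s query is not reproduced here; as an illustrative stand-in, an IN-subquery on the employees schema, which is the shape the optimizer converts into a semi-join:

```sql
-- Find employees who manage (or have managed) a department.
SELECT first_name, last_name
FROM employees
WHERE emp_no IN (SELECT emp_no FROM dept_manager);
```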

And then run EXPLAIN  on it, followed by the SHOW WARNINGS :
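Using an illustrative IN-subquery against dept_manager as a stand-in for the webinar’s query, that looks like:

```sql
EXPLAIN SELECT first_name, last_name
FROM employees
WHERE emp_no IN (SELECT emp_no FROM dept_manager)\G

SHOW WARNINGS\G
```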

In the SHOW WARNINGS output, you can see that the optimizer used the semi-join optimization for the query: it effectively converted one of the subqueries into a JOIN. The same query could be written as:
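As an illustration, an IN-subquery such as `WHERE emp_no IN (SELECT emp_no FROM dept_manager)` rewritten as an explicit join:

```sql
-- DISTINCT preserves the semi-join semantics in case an employee
-- appears more than once in dept_manager.
SELECT DISTINCT e.first_name, e.last_name
FROM employees e
JOIN dept_manager dm ON dm.emp_no = e.emp_no;
```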

If we turn semi-join optimization off we will see a different warning:
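Semi-join conversion is controlled by the optimizer_switch system variable; after disabling it, rerunning EXPLAIN and SHOW WARNINGS shows the subquery resolved differently (the query itself is an illustrative stand-in):

```sql
SET optimizer_switch = 'semijoin=off';

EXPLAIN SELECT first_name, last_name
FROM employees
WHERE emp_no IN (SELECT emp_no FROM dept_manager);

SHOW WARNINGS;
```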

This feature can help you understand why a particular optimization was or was not used.

Q: Have you ever used the mysqltuner perl script and if so, would you suggest it as a short-term option?

A: Do you mean https://github.com/major/MySQLTuner-perl ?

I just ran it on my laptop, and here are the recommendations I got:

——– Recommendations —————————————————————————

General recommendations:

Reduce or eliminate unclosed connections and network issues

Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1

This is probably OK, unless you want to provide access to your MySQL server from the outside.

    We will suggest raising the ‘join_buffer_size’ until JOINs not using indexes are found.

See https://dev.mysql.com/doc/internals/en/join-buffer-size.html

(specially the conclusions at the bottom of the page).

I do not understand why I should raise join_buffer_size if the queries I run do not require it. It could also be dangerous if the number of connections increases.

    Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: https://bit.ly/2TcGgtU

It is always good to read the user manual before changing options. I recommend doing so every time you follow performance tuning advice. Even mine.

Variables to adjust:

join_buffer_size (> 256.0K, or always use indexes with JOINs)

Again, this cannot reasonably be suggested without examining the queries.

    innodb_buffer_pool_size (>= 713.2M) if possible.

This conclusion is based on my data size, and it is sane for my laptop with 32G of RAM. But if the amount of data is larger than the amount of RAM on the machine, this advice will not help you identify the ideal InnoDB buffer pool size. In that case, I recommend starting from this blog post and following the links at the end of it.

innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.

The InnoDB redo log should be large enough to hold the data written between checkpoints, so that InnoDB can flush data from the buffer pool, reuse log space, and avoid aggressive flushing.

If you have a small active data set but write a lot, your total InnoDB log file size may need to be greater than 25% of the buffer pool size. Or, if you have a large dataset but an almost read-only workload, very small redo log files may be enough. This advice does not make sense on its own.

In conclusion, I can say that MySQL Tuner is a product that performs static analysis of your MySQL or MariaDB instance and makes suggestions based on what its authors consider best practices.

Unfortunately, it is not possible to tune MySQL the same way for all use cases. InnoDB redo log file size above is just one example. There are other options that should be tuned differently depending on which workload you have.

I suggest you study the product you use and understand what you are changing and why. And it is better to start from the question “Which problem do I need to solve?” instead of modifying random options.

For a general-purpose server running on a dedicated machine, you can use the option --innodb-dedicated-server. Though, in my opinion, it is far from ideal too.

Q: Mixed based?

A: Is this a question about the slide “Asynchronous Replica”, explaining how the binary logging format affects replication performance? The mixed binary log format instructs MySQL to log everything in the STATEMENT format by default and to switch automatically to the ROW format when a user issues commands that are not safe for the STATEMENT format. This means that safe queries that were slow on the source server and replicated in the STATEMENT format will experience the same performance issues on the replica. Unsafe queries are not affected by this behavior, but the tables they modify need to have a PRIMARY KEY defined. Otherwise, the replica would have to do a full table scan for each row updated.
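For reference, the format is controlled by the binlog_format variable, and non-deterministic functions are the classic case that forces a row-based event (the table and columns below are illustrative):

```sql
SET GLOBAL binlog_format = 'MIXED';

-- Logged as STATEMENT: deterministic, replays identically on the replica.
UPDATE t SET qty = qty + 1 WHERE id = 10;

-- Logged as ROW: UUID() is non-deterministic, so the statement is unsafe
-- for statement-based replication.
UPDATE t SET token = UUID() WHERE id = 10;
```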
