First, I want to thank everyone who attended my November 9, 2021 webinar “Introduction to MySQL Query Tuning for Dev[Op]s“. Recording and slides are available on our webinars page.
Here are answers to the questions from participants which I was not able to provide during the webinar due to a technical issue we experienced at the end.
Q: If a large/complex SELECT is run from inside a Stored Procedure, will it use/not use the same optimizations as it would if run as a direct query? Is there anything to bear in mind when writing Stored Procedures that need to run large/complex SELECTs from within them?
Just wanted to clarify, using a stored procedure does not utilize any of the indexes on the corresponding tables?
A: Any query inside a stored procedure will be optimized the same way as if it was called outside of the routine. We cannot run EXPLAIN on stored procedures but we can prove it with the help of Performance Schema.
For example, let’s take a standard test database employees and perform a quite ineffective query on it:
1 2 3 4 5 6 7 8 9 10 11 12 | mysql> SELECT emp_no, first_name, last_name, title, salary FROM employees -> JOIN titles USING(emp_no) JOIN salaries USING(emp_no) -> WHERE salary = (SELECT MAX(salary) FROM salaries) -> OR salary = (SELECT MIN(salary) FROM salaries); +--------+------------+-----------+------------------+--------+ | emp_no | first_name | last_name | title | salary | +--------+------------+-----------+------------------+--------+ | 43624 | Tokuyasu | Pesch | Senior Staff | 158220 | | 43624 | Tokuyasu | Pesch | Staff | 158220 | | 253406 | Olivera | Baek | Technique Leader | 38623 | +--------+------------+-----------+------------------+--------+ 3 rows in set (4,38 sec) |
Then let’s create a stored routine that uses this query:
1 2 3 4 5 6 7 8 9 10 11 12 | mysql> \d | mysql> CREATE PROCEDURE my_test() -> BEGIN -> SELECT emp_no, first_name, last_name, title, salary -> FROM employees JOIN titles USING(emp_no) -> JOIN salaries USING(emp_no) -> WHERE salary = (SELECT MAX(salary) FROM salaries) -> OR salary = (SELECT MIN(salary) FROM salaries); -> END -> | Query OK, 0 rows affected (0,02 sec) |
This routine call takes about the same time on my laptop as if I run the query separately:
1 2 3 4 5 6 7 8 9 10 | mysql> CALL my_test(); +--------+------------+-----------+------------------+--------+ | emp_no | first_name | last_name | title | salary | +--------+------------+-----------+------------------+--------+ | 43624 | Tokuyasu | Pesch | Senior Staff | 158220 | | 43624 | Tokuyasu | Pesch | Staff | 158220 | | 253406 | Olivera | Baek | Technique Leader | 38623 | +--------+------------+-----------+------------------+--------+ 3 rows in set (4,35 sec) Query OK, 0 rows affected (4,35 sec) |
If I then enable statement instrumentation in Performance Schema and run the query again I would have the following output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 | mysql> USE performance_schema Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> UPDATE setup_instruments SET enabled='yes' WHERE name LIKE 'statement/%'; Query OK, 0 rows affected (0,00 sec) Rows matched: 213 Changed: 0 Warnings: 0 mysql> UPDATE setup_consumers SET enabled='yes' WHERE name LIKE '%statement%'; Query OK, 1 row affected (0,00 sec) Rows matched: 4 Changed: 1 Warnings: 0 mysql> USE employees; Database changed mysql> TRUNCATE performance_schema.events_statements_history; Query OK, 0 rows affected (0,01 sec) mysql> SELECT emp_no, first_name, last_name, title, salary FROM employees -> JOIN titles USING(emp_no) JOIN salaries USING(emp_no) -> WHERE salary = (SELECT MAX(salary) FROM salaries) -> OR salary = (SELECT MIN(salary) FROM salaries); +--------+------------+-----------+------------------+--------+ | emp_no | first_name | last_name | title | salary | +--------+------------+-----------+------------------+--------+ | 43624 | Tokuyasu | Pesch | Senior Staff | 158220 | | 43624 | Tokuyasu | Pesch | Staff | 158220 | | 253406 | Olivera | Baek | Technique Leader | 38623 | +--------+------------+-----------+------------------+--------+ 3 rows in set (4,41 sec) mysql> SELECT * FROM performance_schema.events_statements_history -> WHERE thread_id=(SELECT thread_id FROM performance_schema.threads WHERE processlist_id=@@pseudo_thread_id) -> AND event_name != 'statement/sql/truncate'\G *************************** 1. row *************************** THREAD_ID: 100414 EVENT_ID: 24392721 END_EVENT_ID: 26360924 EVENT_NAME: statement/sql/select SOURCE: init_net_server_extension.cc:93 TIMER_START: 271361154464092000 TIMER_END: 271365562450546000 TIMER_WAIT: 4407986454000 LOCK_TIME: 325000000 SQL_TEXT: SELECT emp_no, first_name, last_name, title, salary FROM employees JOIN titles USING(emp_no) JOIN salaries USING(emp_no) WHERE salary = (SELECT MAX(salary) FROM salaries) OR salary = (SELECT MIN(salary) FROM salaries) DIGEST: 4fe5ea4a8a96539aa9e837031c7c4ea59c1b772272a83c1db5e2833170c2d563 DIGEST_TEXT: SELECT `emp_no` , `first_name` , `last_name` , `title` , `salary` FROM `employees` JOIN `titles` USING ( `emp_no` ) JOIN `salaries` USING ( `emp_no` ) WHERE `salary` = ( SELECT MAX ( `salary` ) FROM `salaries` ) OR `salary` = ( SELECT MIN ( `salary` ) FROM `salaries` ) CURRENT_SCHEMA: employees OBJECT_TYPE: NULL OBJECT_SCHEMA: NULL OBJECT_NAME: NULL OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 0 RETURNED_SQLSTATE: NULL MESSAGE_TEXT: NULL ERRORS: 0 WARNINGS: 0 ROWS_AFFECTED: 0 ROWS_SENT: 3 ROWS_EXAMINED: 11069933 CREATED_TMP_DISK_TABLES: 0 CREATED_TMP_TABLES: 0 SELECT_FULL_JOIN: 0 SELECT_FULL_RANGE_JOIN: 0 SELECT_RANGE: 0 SELECT_RANGE_CHECK: 0 SELECT_SCAN: 3 SORT_MERGE_PASSES: 0 SORT_RANGE: 0 SORT_ROWS: 0 SORT_SCAN: 0 NO_INDEX_USED: 1 NO_GOOD_INDEX_USED: 0 NESTING_EVENT_ID: NULL NESTING_EVENT_TYPE: NULL NESTING_EVENT_LEVEL: 0 STATEMENT_ID: 428707 1 row in set (0,00 sec) |
The interesting part of the output is the performance fields with non-zero values. Namely,
1 2 3 4 | ROWS_SENT: 3 ROWS_EXAMINED: 11069933 SELECT_SCAN: 3 NO_INDEX_USED: 1 |
Now let’s call the stored procedure and compare these values.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 | mysql> TRUNCATE performance_schema.events_statements_history; Query OK, 0 rows affected (0,00 sec) mysql> CALL my_test(); +--------+------------+-----------+------------------+--------+ | emp_no | first_name | last_name | title | salary | +--------+------------+-----------+------------------+--------+ | 43624 | Tokuyasu | Pesch | Senior Staff | 158220 | | 43624 | Tokuyasu | Pesch | Staff | 158220 | | 253406 | Olivera | Baek | Technique Leader | 38623 | +--------+------------+-----------+------------------+--------+ 3 rows in set (4,55 sec) Query OK, 0 rows affected (4,55 sec) mysql> SELECT * FROM performance_schema.events_statements_history -> WHERE thread_id=(SELECT thread_id FROM performance_schema.threads WHERE processlist_id=@@pseudo_thread_id) -> AND event_name != 'statement/sql/truncate'\G *************************** 1. row *************************** THREAD_ID: 100414 EVENT_ID: 28336202 END_EVENT_ID: 30292242 EVENT_NAME: statement/sp/stmt SOURCE: sp_head.cc:2210 TIMER_START: 271549571279270000 TIMER_END: 271554128099300000 TIMER_WAIT: 4556820030000 LOCK_TIME: 139000000 SQL_TEXT: SELECT emp_no, first_name, last_name, title, salary FROM employees JOIN titles USING(emp_no) JOIN salaries USING(emp_no) WHERE salary = (SELECT MAX(salary) FROM salaries) OR salary = (SELECT MIN(salary) FROM salaries) DIGEST: NULL DIGEST_TEXT: NULL CURRENT_SCHEMA: employees OBJECT_TYPE: PROCEDURE OBJECT_SCHEMA: employees OBJECT_NAME: my_test OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 0 RETURNED_SQLSTATE: NULL MESSAGE_TEXT: NULL ERRORS: 0 WARNINGS: 0 ROWS_AFFECTED: 0 ROWS_SENT: 3 ROWS_EXAMINED: 11069933 CREATED_TMP_DISK_TABLES: 0 CREATED_TMP_TABLES: 0 SELECT_FULL_JOIN: 0 SELECT_FULL_RANGE_JOIN: 0 SELECT_RANGE: 0 SELECT_RANGE_CHECK: 0 SELECT_SCAN: 3 SORT_MERGE_PASSES: 0 SORT_RANGE: 0 SORT_ROWS: 0 SORT_SCAN: 0 NO_INDEX_USED: 1 NO_GOOD_INDEX_USED: 0 NESTING_EVENT_ID: 28336201 NESTING_EVENT_TYPE: STATEMENT NESTING_EVENT_LEVEL: 1 STATEMENT_ID: 429597 *************************** 2. row *************************** THREAD_ID: 100414 EVENT_ID: 28336201 END_EVENT_ID: 30292243 EVENT_NAME: statement/sql/call_procedure SOURCE: init_net_server_extension.cc:93 TIMER_START: 271549571185646000 TIMER_END: 271554128135521000 TIMER_WAIT: 4556949875000 LOCK_TIME: 0 SQL_TEXT: CALL my_test() DIGEST: 070cb7d4d2eba9690a3a993ec61700fb6d6c3e46166e329408d7284e45fd58c2 DIGEST_TEXT: CALL `my_test` ( ) CURRENT_SCHEMA: employees OBJECT_TYPE: NULL OBJECT_SCHEMA: NULL OBJECT_NAME: NULL OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 0 RETURNED_SQLSTATE: 00000 MESSAGE_TEXT: NULL ERRORS: 0 WARNINGS: 0 ROWS_AFFECTED: 0 ROWS_SENT: 0 ROWS_EXAMINED: 0 CREATED_TMP_DISK_TABLES: 0 CREATED_TMP_TABLES: 0 SELECT_FULL_JOIN: 0 SELECT_FULL_RANGE_JOIN: 0 SELECT_RANGE: 0 SELECT_RANGE_CHECK: 0 SELECT_SCAN: 0 SORT_MERGE_PASSES: 0 SORT_RANGE: 0 SORT_ROWS: 0 SORT_SCAN: 0 NO_INDEX_USED: 0 NO_GOOD_INDEX_USED: 0 NESTING_EVENT_ID: NULL NESTING_EVENT_TYPE: NULL NESTING_EVENT_LEVEL: 0 STATEMENT_ID: 429596 2 rows in set (0,00 sec) |
This time, the select from the performance_schema.events_statements_history returned two rows: one for the SELECT statement and one for the CALL command. We are interested in the data for the SELECT statement. It is:
1 2 3 4 | ROWS_SENT: 3 ROWS_EXAMINED: 11069933 SELECT_SCAN: 3 NO_INDEX_USED: 1 |
So performance metrics are absolutely the same as for the original query.
OK, this was for the slow, not effective query. But what about the one that uses indexes?
Let’s craft another example. Consider a query:
1 | SELECT COUNT(*), title FROM titles GROUP BY title; |
And the stored procedure:
1 2 3 4 5 6 | mysql> CREATE PROCEDURE my_second_test() -> BEGIN -> SELECT COUNT(*), title FROM titles GROUP BY title; -> END -> | Query OK, 0 rows affected (0,02 sec) |
Now let’s call the query and examine data in the Performance Schema:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | mysql> TRUNCATE performance_schema.events_statements_history; Query OK, 0 rows affected (0,00 sec) mysql> SELECT COUNT(*), title FROM titles GROUP BY title; +----------+--------------------+ | COUNT(*) | title | +----------+--------------------+ | 97750 | Senior Engineer | | 107391 | Staff | | 115003 | Engineer | | 92853 | Senior Staff | | 15128 | Assistant Engineer | | 15159 | Technique Leader | | 24 | Manager | +----------+--------------------+ 7 rows in set (0,28 sec) mysql> SELECT * FROM performance_schema.events_statements_history -> WHERE thread_id=(SELECT thread_id FROM performance_schema.threads WHERE processlist_id=@@pseudo_thread_id) -> AND event_name != 'statement/sql/truncate'\G *************************** 1. row *************************** THREAD_ID: 100414 EVENT_ID: 30340666 END_EVENT_ID: 30346280 EVENT_NAME: statement/sql/select SOURCE: init_net_server_extension.cc:93 TIMER_START: 271987423077801000 TIMER_END: 271987697673114000 TIMER_WAIT: 274595313000 LOCK_TIME: 137000000 SQL_TEXT: SELECT COUNT(*), title FROM titles GROUP BY title DIGEST: 3b63daf99fa8586784dcbc73f82b79e777e4fc074c37de3ebd25753f0d684a46 DIGEST_TEXT: SELECT COUNT ( * ) , `title` FROM `titles` GROUP BY `title` CURRENT_SCHEMA: employees OBJECT_TYPE: NULL OBJECT_SCHEMA: NULL OBJECT_NAME: NULL OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 0 RETURNED_SQLSTATE: NULL MESSAGE_TEXT: NULL ERRORS: 0 WARNINGS: 0 ROWS_AFFECTED: 0 ROWS_SENT: 7 ROWS_EXAMINED: 443308 CREATED_TMP_DISK_TABLES: 0 CREATED_TMP_TABLES: 1 SELECT_FULL_JOIN: 0 SELECT_FULL_RANGE_JOIN: 0 SELECT_RANGE: 0 SELECT_RANGE_CHECK: 0 SELECT_SCAN: 1 SORT_MERGE_PASSES: 0 SORT_RANGE: 0 SORT_ROWS: 0 SORT_SCAN: 0 NO_INDEX_USED: 0 NO_GOOD_INDEX_USED: 0 NESTING_EVENT_ID: NULL NESTING_EVENT_TYPE: NULL NESTING_EVENT_LEVEL: 0 STATEMENT_ID: 431634 1 row in set (0,00 sec) |
In this case, the picture is different:
1 2 3 4 5 6 | ROWS_SENT: 7 ROWS_EXAMINED: 443308 CREATED_TMP_TABLES: 1 SELECT_SCAN: 1 NO_INDEX_USED: 0 NO_GOOD_INDEX_USED: 0 |
This is still not a very effective query: it uses an index scan but it certainly uses the index.
Let’s check data for the stored procedure:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | mysql> TRUNCATE performance_schema.events_statements_history; Query OK, 0 rows affected (0,00 sec) mysql> CALL my_second_test(); +----------+--------------------+ | COUNT(*) | title | +----------+--------------------+ | 97750 | Senior Engineer | | 107391 | Staff | | 115003 | Engineer | | 92853 | Senior Staff | | 15128 | Assistant Engineer | | 15159 | Technique Leader | | 24 | Manager | +----------+--------------------+ 7 rows in set (0,28 sec) Query OK, 0 rows affected (0,28 sec) mysql> SELECT * FROM performance_schema.events_statements_history -> WHERE thread_id=(SELECT thread_id FROM performance_schema.threads WHERE processlist_id=@@pseudo_thread_id) -> AND event_name != 'statement/sql/truncate'\G *************************** 1. row *************************** THREAD_ID: 100414 EVENT_ID: 30354940 END_EVENT_ID: 30360553 EVENT_NAME: statement/sp/stmt SOURCE: sp_head.cc:2210 TIMER_START: 272124909174908000 TIMER_END: 272125181823179000 TIMER_WAIT: 272648271000 LOCK_TIME: 222000000 SQL_TEXT: SELECT COUNT(*), title FROM titles GROUP BY title DIGEST: NULL DIGEST_TEXT: NULL CURRENT_SCHEMA: employees OBJECT_TYPE: PROCEDURE OBJECT_SCHEMA: employees OBJECT_NAME: my_second_test OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 0 RETURNED_SQLSTATE: NULL MESSAGE_TEXT: NULL ERRORS: 0 WARNINGS: 0 ROWS_AFFECTED: 0 ROWS_SENT: 7 ROWS_EXAMINED: 443308 CREATED_TMP_DISK_TABLES: 0 CREATED_TMP_TABLES: 1 SELECT_FULL_JOIN: 0 SELECT_FULL_RANGE_JOIN: 0 SELECT_RANGE: 0 SELECT_RANGE_CHECK: 0 SELECT_SCAN: 1 SORT_MERGE_PASSES: 0 SORT_RANGE: 0 SORT_ROWS: 0 SORT_SCAN: 0 NO_INDEX_USED: 0 NO_GOOD_INDEX_USED: 0 NESTING_EVENT_ID: 30354939 NESTING_EVENT_TYPE: STATEMENT NESTING_EVENT_LEVEL: 1 STATEMENT_ID: 432318 *************************** 2. row *************************** THREAD_ID: 100414 EVENT_ID: 30354939 END_EVENT_ID: 30360554 EVENT_NAME: statement/sql/call_procedure <I omitted data for the CALL statement> |
Counters again have exactly the same values as for the standalone statement:
1 2 3 4 5 6 | ROWS_SENT: 7 ROWS_EXAMINED: 443308 CREATED_TMP_TABLES: 1 SELECT_SCAN: 1 NO_INDEX_USED: 0 NO_GOOD_INDEX_USED: 0 |
So we have proof that the optimizer creates the same query plan no matter if the query was called inside the stored procedure or not.
Q: A few times when I was building a query for a table with multiple indexes, the EXPLAIN command shows me an index that is not clearly the best choice and I had to use FORCE INDEX in the query. I never understand why sometimes this happens, is it possible that sometimes the engine makes that mistake?
A: The engine can make a mistake for sure. If you want to understand more about such mistakes I recommend you to try EXPLAIN FORMAT=JSON as described in these series as well as studying the Optimizer Trace. While the different formats of the EXPLAIN do not change the query plan, EXPLAIN FORMAT=JSON provides more information on the optimizations that are used to resolve the query.
Q: I have a question about statuses. It’s about the Handler_read you mentioned. It keeps on increasing when queries are being executed. If the server has an uptime of more than 14 days, having 1000qps, the handler will be in the range of millions. Once we do a query tuning workshop, I would like to reset the Handlers counter. How to perform that? (just to see: for the same time range, whether the Handler_read decreases)
A: To reset counters for the session use the FLUSH STATUS statement. Then you can run the original statement and examine which job needs to be done inside the engine to resolve it.
Here is the pseudo-code, showing how to do it:
1 2 3 | FLUSH STATUS; <RUN YOUR QUERY> SHOW STATUS LIKE ‘Handler%’; |
Q: Can you talk a bit on explain extended? How can we use that to further help us in tuning queries? Does it show the full query that the server/optimizer will execute in the correct format?
A: EXPLAIN EXTENDED is included in the standard output of the EXPLAIN command since version 5.7. If you started using MySQL earlier and preferred to run EXPLAIN command you would now notice two differences.
First, this is the “Filtered” column. This column shows how many rows the optimizer expects to be discarded from the output. For example, for the query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | mysql> EXPLAIN SELECT salary FROM salaries WHERE salary BETWEEN 158000 AND 159000\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2838426 filtered: 11.11 Extra: Using where 1 row in set, 1 warning (0,00 sec) |
It expects that the engine will read 2838426 rows but only 11.11% of them would be used to get the final result. This usually indicates that the query is not effective.
On the other way around, the query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | mysql> EXPLAIN SELECT DISTINCT(title) FROM titles\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: titles partitions: NULL type: index possible_keys: PRIMARY key: PRIMARY key_len: 209 ref: NULL rows: 442189 filtered: 100.00 Extra: Using index; Using temporary 1 row in set, 1 warning (0,00 sec) |
Would use all 442189 retrieved rows to create the final result set (filtered: 100.00).
Another feature of the EXPLAIN EXTENDED before version 5.7 and regular EXPLAIN since version 5.7 is that it returns a query as it was rewritten by the optimizer in its diagnostic area, accessible by the SHOW WARNINGS command.
For example, let’s take a query:
1 2 | SELECT emp_no, first_name, last_name FROM employees WHERE emp_no IN (SELECT emp_no FROM salaries WHERE salary = (SELECT MAX(salary) FROM salaries)); |
And then run EXPLAIN on it, followed by the SHOW WARNINGS :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 | mysql> EXPLAIN SELECT emp_no, first_name, last_name FROM employees -> WHERE emp_no IN (SELECT emp_no FROM salaries WHERE salary = (SELECT MAX(salary) FROM salaries))\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: employees partitions: NULL type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 299113 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: PRIMARY table: <subquery2> partitions: NULL type: eq_ref possible_keys: <auto_distinct_key> key: <auto_distinct_key> key_len: 4 ref: employees.employees.emp_no rows: 1 filtered: 100.00 Extra: NULL *************************** 3. row *************************** id: 2 select_type: MATERIALIZED table: salaries partitions: NULL type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 2838426 filtered: 10.00 Extra: Using where *************************** 4. row *************************** id: 3 select_type: SUBQUERY table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2838426 filtered: 100.00 Extra: NULL 4 rows in set, 1 warning (0,00 sec) mysql> SHOW WARNINGS\G *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`first_name` AS `first_name`, `employees`.`employees`.`last_name` AS `last_name` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`<subquery2>`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` = (/* select#3 */ select max(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) 1 row in set (0,00 sec) |
In the SHOW WARNINGS output, you see that the optimizer used semi-join optimization for the query: it practically converted one of the subqueries into a JOIN. The same query could be written as:
1 | SELECT emp_no, first_name, last_name FROM employees JOIN salaries USING(emp_no) WHERE salary = (SELECT MAX(salary) FROM salaries); |
If we turn semi-join optimization off we will see a different warning:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 | mysql> SET optimizer_switch='semijoin=off'; Query OK, 0 rows affected (0,00 sec) mysql> EXPLAIN SELECT emp_no, first_name, last_name FROM employees WHERE emp_no IN (SELECT emp_no FROM salaries WHERE salary = (SELECT MAX(salary) FROM salaries))\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: employees partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 299113 filtered: 100.00 Extra: Using where *************************** 2. row *************************** id: 2 select_type: SUBQUERY table: salaries partitions: NULL type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 2838426 filtered: 10.00 Extra: Using where *************************** 3. row *************************** id: 3 select_type: SUBQUERY table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2838426 filtered: 100.00 Extra: NULL 3 rows in set, 1 warning (0,00 sec) mysql> SHOW WARNINGS\G *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`first_name` AS `first_name`, `employees`.`employees`.`last_name` AS `last_name` from `employees`.`employees` where <in_optimizer>(`employees`.`employees`.`emp_no`,`employees`.`employees`.`emp_no` in ( <materialize> (/* select#2 */ select `employees`.`salaries`.`emp_no` from `employees`.`salaries` where (`employees`.`salaries`.`salary` = (/* select#3 */ select max(`employees`.`salaries`.`salary`) from `employees`.`salaries`)) ), <primary_index_lookup>(`employees`.`employees`.`emp_no` in <temporary table> on <auto_distinct_key> where ((`employees`.`employees`.`emp_no` = `<materialized_subquery>`.`emp_no`))))) 1 row in set (0,00 sec) |
This feature could help to understand why one or particular optimization was used.
Q: Have you ever used the mysqltuner perl script and if so, would you suggest it as a short-term option?
A: Do you mean https://github.com/major/MySQLTuner-perl ?
I just run it on my laptop and here are the recommendations I got:
——– Recommendations —————————————————————————
General recommendations:
Reduce or eliminate unclosed connections and network issues
Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
This is, probably, OK, unless you want to provide access to your MySQL server from the outside.
We will suggest raising the ‘join_buffer_size’ until JOINs not using indexes are found.
See https://dev.mysql.com/doc/internals/en/join-buffer-size.html
(specially the conclusions at the bottom of the page).
I do not understand why to raise join_buffer_size if it is not required by queries I use. It also could be dangerous if the number of connections increases.
Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: https://bit.ly/2TcGgtU
It is always good to read the user manual before changing options. I recommend you to do it every time you follow performance tuning advice. Even mine.
Variables to adjust:
join_buffer_size (> 256.0K, or always use indexes with JOINs)
Again, this could not be suggested without examining the queries.
innodb_buffer_pool_size (>= 713.2M) if possible.
This conclusion is based on my data size and this is sane for my laptop with 32G RAM. But if the amount of data is larger than the amount of RAM on the machine this advice would not help you to identify the ideal InnoDB buffer pool size. In this case, I recommend you to start from this blog post and follow the links at the end of it.
innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
InnoDB redo log file size should hold as much data, so InnoDB can flush the data from the buffer pool and reuse log space and does not have to perform aggressive flushing.
If you have a small active data set but you write a lot you may have InnoDB’s total log files size greater than 25% of the buffer pool size. Or, if you have a large dataset, but your workload is almost read, you may have very small redo log files. This advice does not make any sense by itself.
In conclusion, I can say that MySQL Tuner is a product that performs static analysis of your MySQL or MariaDB instance and makes suggestions, based on what its authors think the best practices are.
Unfortunately, it is not possible to tune MySQL the same way for all use cases. InnoDB redo log file size above is just one example. There are other options that should be tuned differently depending on which workload you have.
I suggest you study the product you use and understand what and why you are changing. And it is better to start from the question: “Which problem do I need to solve?” instead of modifying random options.
For a general-purpose server, running on the dedicated machine, you can use option –innodb-dedicated-server. Though, in my opinion, it is far from the ideal too.
Q: Mixed based?
A: Is this the question for the slide “Asynchronous Replica”, explaining how binary logging format affects replication performance? Mixed-based binary log format instructs MySQL to log everything in the STATEMENT format by default and automatically switch to the ROW format when a user issues commands that are not safe for the STATEMENT format. This means that safe queries that were slow on the source server and replicated in the STATEMENT format, will experience the same performance issues on the replica too. Unsafe queries are not affected by this behavior, but tables that they modify have to have PRIMARY KEY defined. Otherwise, the replica would have to do a full table scan for each row updated.