Abstract:
By diving into the details of our case study, we will explain how incorrect table statistics can lead the optimizer to choose a suboptimal execution plan. We will also cover how MySQL calculates table statistics and how to correct them to prevent the problem from recurring.
Case study: Incorrect table statistics lead the optimizer to choose a poor execution plan.
A customer reported a drastic performance degradation of a query while there were no code changes and no configuration changes made. The data in this article has been edited for brevity and modified to mitigate the exposure of confidential information. The case has also been approved for publication by the customer.
We obtained the query execution plan and got the results shown below (execution plan #1):
mysql> explain
    -> SELECT count(con.id) ,
    ->        MAX(DAYNAME(con.date)) ,
    ->        now() ,
    ->        pcz.type,
    ->        pcz.c_c
    -> FROM con AS con
    -> join orders o on con.o_id = o.id
    -> JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id
    -> left join c c on con.c_id = c.id
    -> WHERE con.date = current_date() and pcz.type = "T_D"
    -> GROUP BY con.date, pcz.c_c, pcz.type;
+----+-------------+-------+------------+--------+-------------------+----------+---------+-----------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type   | possible_keys     | key      | key_len | ref             | rows | filtered | Extra                                        |
+----+-------------+-------+------------+--------+-------------------+----------+---------+-----------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | pcz   | NULL       | ALL    | PRIMARY           | NULL     | NULL    | NULL            |  194 |    10.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | o     | NULL       | ref    | PRIMARY,dpcz_FK   | dpcz_FK  | 9       | custom.pcz.id   | 1642 |   100.00 | Using index                                  |
|  1 | SIMPLE      | con   | NULL       | ref    | FK_order,IDX_date | FK_order | 8       | custom.o.id     |    1 |     4.23 | Using where                                  |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY           | PRIMARY  | 8       | custom.con.c_id |    1 |   100.00 | Using index                                  |
+----+-------------+-------+------------+--------+-------------------+----------+---------+-----------------+------+----------+----------------------------------------------+
The estimated number of rows to be examined = (194 * 10%) * 1642 * (1 * 4.23%) ≈ 1347.
Looking more closely at the query, we see the condition con.date = current_date(). This condition looks like a much better filter, so why did the MySQL optimizer skip the index on it? Let's look at the execution plan when we force the use of the index on the con.date field. The EXPLAIN output (execution plan #2) is:
mysql> explain
    -> SELECT count(con.id) ,
    ->        MAX(DAYNAME(con.date)) ,
    ->        now() ,
    ->        pcz.type,
    ->        pcz.c_c
    -> FROM con AS con USE INDEX(IDX_date)
    -> join orders o on con.o_id = o.id
    -> JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id
    -> left join c c on con.c_id = c.id
    -> WHERE con.date = current_date() and pcz.type = "T_D"
    -> GROUP BY con.date, pcz.c_c, pcz.type;
+----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------+--------+----------+---------------------------------+
| id | select_type | table | partitions | type   | possible_keys   | key      | key_len | ref                 | rows   | filtered | Extra                           |
+----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------+--------+----------+---------------------------------+
|  1 | SIMPLE      | con   | NULL       | ref    | IDX_date        | IDX_date | 3       | const               | 110446 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY         | PRIMARY  | 8       | custom.con.c_id     |      1 |   100.00 | Using index                     |
|  1 | SIMPLE      | o     | NULL       | eq_ref | PRIMARY,dpcz_FK | PRIMARY  | 8       | custom.con.o_id     |      1 |   100.00 | Using where                     |
|  1 | SIMPLE      | pcz   | NULL       | eq_ref | PRIMARY         | PRIMARY  | 8       | custom.o.d_p_c_z_id |      1 |    10.00 | Using where                     |
+----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------+--------+----------+---------------------------------+
The estimated number of rows to be examined = 110446 * (1 * 10%) ≈ 11045 rows.
Based on these estimates, 1347 is around one-tenth of 11045, so it is reasonable for MySQL to choose execution plan #1: on paper, it appears to be the cheaper plan.
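A quick way to sanity-check these numbers is to multiply out the rows and filtered columns ourselves. The sketch below (plain Python, with the figures copied from the two EXPLAIN outputs above) reproduces both estimates:

```python
# Rough model of the optimizer's "rows to examine" estimate: the product
# of rows * filtered (as a fraction) across the tables in join order.
def estimated_rows_examined(plan):
    est = 1.0
    for rows, filtered_pct in plan:
        est *= rows * filtered_pct / 100.0
    return round(est)

# (rows, filtered) per table, taken from the EXPLAIN outputs above:
plan1 = [(194, 10.00), (1642, 100.00), (1, 4.23), (1, 100.00)]    # pcz, o, con, c
plan2 = [(110446, 100.00), (1, 100.00), (1, 100.00), (1, 10.00)]  # con, c, o, pcz

print(estimated_rows_examined(plan1))  # 1347
print(estimated_rows_examined(plan2))  # 11045
```

This is only the row-count side of the comparison, of course; the real optimizer weighs full costs, but the ordering of the two plans comes out the same.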
However, when we compared the expected response times with the actual results, something was clearly off. The second execution plan returned results within the expected time (the time observed before the performance degradation), while execution plan #1 far exceeded it.
Taking a closer look at the structure of the table orders and at execution plan #1, we found that the table pcz actually contains 194 rows, and that each lookup on the index orders.dpcz_FK (the index backing the FOREIGN KEY constraint orders_ibfk_10 shown below) is expected to return 1642 rows. That implies the table orders should hold about 194 * 1642 = 318548 rows, yet it actually holds 32508150 rows, more than 100 times the implied amount.
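This mismatch is easy to quantify with a back-of-the-envelope check (the figures come from the plan and the innodb_table_stats output):

```python
# If each of the 194 pcz rows matches ~1642 orders rows via dpcz_FK,
# the orders table should hold roughly this many rows:
rows_per_key = 1642          # "rows" estimate for orders in plan #1
distinct_keys = 194          # rows in pcz
implied_rows = rows_per_key * distinct_keys

actual_rows = 32_508_150     # n_rows from mysql.innodb_table_stats

print(implied_rows)                # 318548
print(actual_rows / implied_rows)  # ~102, i.e. over 100x off
```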
CREATE TABLE `orders` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  ...
  `d_p_c_z_id` bigint(20) DEFAULT NULL,
  ...,
  PRIMARY KEY (`id`),
  ...
  KEY `dpcz_FK` (`d_p_c_z_id`),
  ...
  CONSTRAINT `orders_ibfk_10` FOREIGN KEY (`d_p_c_z_id`) REFERENCES `p_c_z` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  ...
) ENGINE=InnoDB ....

mysql> select * from mysql.innodb_table_stats where database_name='custom' and table_name='orders';
+---------------+------------+---------------------+----------+----------------------+--------------------------+
| database_name | table_name | last_update         | n_rows   | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+----------+----------------------+--------------------------+
| custom        | orders     | 2022-03-03 21:58:18 | 32508150 | 349120               | 697618                   |
+---------------+------------+---------------------+----------+----------------------+--------------------------+
As such, we suspected that the table statistics of orders.dpcz_FK are not accurate. We verified it by running the test below:
mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| custom        | orders     | dpcz_FK    | 2022-02-28 12:35:30 | n_diff_pfx01 | 19498      | 50          | d_p_c_z_id                        |
| custom        | orders     | dpcz_FK    | 2022-02-28 12:35:30 | n_diff_pfx02 | 32283087   | 128         | d_p_c_z_id,id                     |
| custom        | orders     | dpcz_FK    | 2022-02-28 12:35:30 | n_leaf_pages | 55653      | NULL        | Number of leaf pages in the index |
| custom        | orders     | dpcz_FK    | 2022-02-28 12:35:30 | size         | 63864      | NULL        | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+

mysql> select count(distinct d_p_c_z_id) from orders;
+----------------------------+
| count(distinct d_p_c_z_id) |
+----------------------------+
|                        195 |
+----------------------------+
Bingo! The actual cardinality of the column d_p_c_z_id (the column the index dpcz_FK is built on) in the table orders is 195, while in the statistics table mysql.innodb_index_stats the stat_value for the index dpcz_FK is 19498. Since the stat_value of an index is supposed to be the cardinality of its column, the stored value is off by roughly a factor of 100.
Plugging in the correct stat_value of 195 for the index dpcz_FK, the number of rows returned in line 2 of execution plan #1 becomes 32508150 / 195 ≈ 166708, and the estimated rows to be examined becomes (194 * 10%) * 166708 * (1 * 4.23%) ≈ 136804. Since this value is over 10 times larger than 11045, the estimate for execution plan #2, MySQL would now correctly select execution plan #2 without us having to force the index.
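The corrected arithmetic can be checked in the same way as before (plain Python, reusing the figures above):

```python
table_rows = 32_508_150
true_cardinality = 195                   # COUNT(DISTINCT d_p_c_z_id)

# Rows a single dpcz_FK lookup should really return:
rows_per_key = table_rows // true_cardinality
print(rows_per_key)                      # 166708

# Corrected estimate for plan #1: (194 * 10%) * rows_per_key * (1 * 4.23%)
corrected_estimate = (194 * 0.10) * rows_per_key * (1 * 0.0423)
print(round(corrected_estimate))         # 136804
print(corrected_estimate > 10 * 11045)   # True: plan #2 now wins
```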
But why did MySQL calculate the table statistics incorrectly, and how can we fix it?
To answer that question, we first need to understand how MySQL calculates table statistics and which parameters control the process; from there, the path to a solution follows naturally.
How InnoDB Calculates the Table Statistics
Table statistics can be collected explicitly or automatically. With innodb_stats_auto_recalc enabled (the default), InnoDB automatically recalculates persistent statistics after the data in a table changes substantially, specifically, when more than 10% of the rows in the table have been changed. Alternatively, we can run ANALYZE TABLE to recalculate statistics explicitly.
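The auto-recalc trigger can be pictured as a simple counter. The toy model below is not the actual InnoDB implementation, just an illustration of the 10% threshold:

```python
# Toy model of innodb_stats_auto_recalc: count modified rows and
# recalculate statistics once more than 10% of the table has changed.
class StatsTracker:
    def __init__(self, table_rows):
        self.table_rows = table_rows
        self.changed_rows = 0
        self.recalc_count = 0

    def on_rows_changed(self, n):
        self.changed_rows += n
        if self.changed_rows > self.table_rows // 10:
            self.recalc_count += 1    # stats recalculated in the background
            self.changed_rows = 0     # counter resets after the recalc

t = StatsTracker(table_rows=1000)
t.on_rows_changed(50)    # 5% changed: no recalculation yet
t.on_rows_changed(60)    # cumulative 11% changed: triggers a recalculation
print(t.recalc_count)    # 1
```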
Under the hood, InnoDB uses a sampling technique known as a random dive, which samples random pages from each index on a table to estimate the cardinality of the index. The parameter innodb_stats_persistent_sample_pages controls the number of sampled pages.
As the code and its comments make clear, the sampling is not fully random: the sampled pages are selected by a deterministic algorithm. Ultimately, the total number of different key values, i.e., the stat_value of the index, is calculated by the formula stat_value = N * R * N_DIFF_AVG_LEAF, where:
- N: the number of leaf pages in the index
- R: the ratio of the number of different key values on level LA to the total number of records on level LA
- N_DIFF_AVG_LEAF: the average number of different key values found in the A sampled leaf pages (A = innodb_stats_persistent_sample_pages)
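The formula itself is straightforward arithmetic. Here is a sketch with made-up inputs (in reality, N, R, and N_DIFF_AVG_LEAF come from InnoDB's sampled pages, not from user code):

```python
# stat_value = N * R * N_DIFF_AVG_LEAF, per the description above.
def estimate_cardinality(n_leaf_pages, r, n_diff_avg_leaf):
    """n_leaf_pages: N; r: distinct/total ratio on level LA;
    n_diff_avg_leaf: avg distinct values per sampled leaf page."""
    return round(n_leaf_pages * r * n_diff_avg_leaf)

# Hypothetical index: 1000 leaf pages, 5% of level-LA records are
# distinct, and each sampled leaf page holds ~2 distinct values:
print(estimate_cardinality(1000, 0.05, 2))   # 100
```

Note that N and R are both byproducts of the physical tree layout, which is why fragmentation can skew the result.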
The details of the sampling algorithm can be found in the MySQL source code, in storage/innobase/dict/dict0stats.cc.
With the above understanding, we can see that when the index of a table becomes fragmented, both the number of leaf pages (N) and the ratio of distinct key values to total records on level LA (R) become increasingly inaccurate, so the calculated stat_value may be incorrect. Once that happens, explicitly recalculating the statistics (manually running ANALYZE TABLE) will not produce a correct stat_value unless the parameter innodb_stats_persistent_sample_pages is increased or the index is rebuilt.
Solution: How can we correct the table statistics and prevent it from happening again?
From the sampling algorithm discussed above, we now know there are only two factors that impact the calculation: the parameter innodb_stats_persistent_sample_pages (the A in the algorithm) and how the index is physically organized.
To allow InnoDB to compute correct table statistics, we have to either increase innodb_stats_persistent_sample_pages or rebuild the index. The most direct way to rebuild the index is to rebuild the whole table, for example by executing a no-op ALTER against it.
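In SQL terms, the two paths come down to a handful of statements. The helper below is just a convenience sketch (the table name and sample-page count are illustrative) that assembles them:

```python
def remediation_sql(table, sample_pages=None):
    """Build the statements for the two fixes discussed above: either
    raise the sampling rate and re-analyze, or rebuild the table."""
    stmts = []
    if sample_pages is not None:
        # Path 1: sample more pages, then recalculate the statistics.
        stmts.append(
            f"SET GLOBAL innodb_stats_persistent_sample_pages = {sample_pages};")
        stmts.append(f"ANALYZE TABLE {table};")
    else:
        # Path 2: rebuild (defragment) the table with a no-op ALTER,
        # which also forces a full statistics recalculation.
        stmts.append(f"ALTER TABLE {table} ENGINE=InnoDB;")
    return stmts

print(remediation_sql("orders", sample_pages=512))
print(remediation_sql("orders"))
```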
Let’s take a look at the following three examples:
- ANALYZE TABLE without rebuilding, keeping innodb_stats_persistent_sample_pages as it is (128): the stat_value only changed slightly, to 19582, close to the original incorrect 19498 and still far off. The number of leaf pages in the index changed slightly from 55653 to 55891, and the number of pages in the index also changed slightly, from 63864 to 64248:

mysql> show variables like 'innodb_stats_persistent_sample_pages';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| innodb_stats_persistent_sample_pages | 128   |
+--------------------------------------+-------+

mysql> analyze table orders;
+---------------+---------+----------+----------+
| Table         | Op      | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| custom.orders | analyze | status   | OK       |
+---------------+---------+----------+----------+

mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| custom        | orders     | dpcz_FK    | 2022-03-03 21:58:18 | n_diff_pfx01 | 19582      | 50          | d_p_c_z_id                        |
| custom        | orders     | dpcz_FK    | 2022-03-03 21:58:18 | n_diff_pfx02 | 32425512   | 128         | d_p_c_z_id,id                     |
| custom        | orders     | dpcz_FK    | 2022-03-03 21:58:18 | n_leaf_pages | 55891      | NULL        | Number of leaf pages in the index |
| custom        | orders     | dpcz_FK    | 2022-03-03 21:58:18 | size         | 64248      | NULL        | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
- ANALYZE TABLE without rebuilding, but with innodb_stats_persistent_sample_pages increased from 128 to 512: the stat_value became 192, very close to the real cardinality of 195. There was a big change in the number of leaf pages in the index, from 55653 to 44188, and the number of pages in the index also changed drastically, from 63864 to 50304:

mysql> show variables like '%persistent_sample%';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| innodb_stats_persistent_sample_pages | 512   |
+--------------------------------------+-------+

mysql> analyze table orders;
+---------------+---------+----------+----------+
| Table         | Op      | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| custom.orders | analyze | status   | OK       |
+---------------+---------+----------+----------+

mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| custom        | orders     | dpcz_FK    | 2022-03-09 06:54:29 | n_diff_pfx01 | 192        | 179         | d_p_c_z_id                        |
| custom        | orders     | dpcz_FK    | 2022-03-09 06:54:29 | n_diff_pfx02 | 31751321   | 512         | d_p_c_z_id,id                     |
| custom        | orders     | dpcz_FK    | 2022-03-09 06:54:29 | n_leaf_pages | 44188      | NULL        | Number of leaf pages in the index |
| custom        | orders     | dpcz_FK    | 2022-03-09 06:54:29 | size         | 50304      | NULL        | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
- Rebuilding the table, keeping innodb_stats_persistent_sample_pages at 128: this also produced a much more accurate stat_value of 187, close to the real cardinality of 195. The number of leaf pages in the index changed substantially, from 55653 to 43733, and the number of pages in the index also changed, from 63864 to 50111:
mysql> show variables like 'innodb_stats_persistent_sample_pages';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| innodb_stats_persistent_sample_pages | 128   |
+--------------------------------------+-------+

mysql> alter table orders engine=innodb;
Query OK, 0 rows affected (11 min 16.37 sec)

mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| custom        | orders     | dpcz_FK    | 2022-03-07 18:44:43 | n_diff_pfx01 | 187        | 128         | d_p_c_z_id                        |
| custom        | orders     | dpcz_FK    | 2022-03-07 18:44:43 | n_diff_pfx02 | 31531493   | 128         | d_p_c_z_id,id                     |
| custom        | orders     | dpcz_FK    | 2022-03-07 18:44:43 | n_leaf_pages | 43733      | NULL        | Number of leaf pages in the index |
| custom        | orders     | dpcz_FK    | 2022-03-07 18:44:43 | size         | 50111      | NULL        | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
After the table statistics are corrected, the MySQL optimizer chooses the correct execution plan as well:
mysql> explain SELECT count(con.id) , MAX(DAYNAME(con.date)) , now() , pcz.type, pcz.c_c FROM con AS con join orders o on con.order_id = o.id JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id left join c c on con.c_id = c.id WHERE con.date = current_date() and pcz.type = "T_D" GROUP BY con.date, pcz.c_c, pcz.type;
+----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type   | possible_keys     | key      | key_len | ref                 | rows | filtered | Extra                           |
+----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------+------+----------+---------------------------------+
|  1 | SIMPLE      | con   | NULL       | ref    | FK_order,IDX_date | IDX_date | 3       | const               | 3074 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY           | PRIMARY  | 8       | custom.con.c_id     |    1 |   100.00 | Using index                     |
|  1 | SIMPLE      | o     | NULL       | eq_ref | PRIMARY,dpcz_FK   | PRIMARY  | 8       | custom.con.order_id |    1 |   100.00 | Using where                     |
|  1 | SIMPLE      | pcz   | NULL       | eq_ref | PRIMARY           | PRIMARY  | 8       | custom.o.d_p_c_z_id |    1 |    10.00 | Using where                     |
+----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------+------+----------+---------------------------------+
4 rows in set, 1 warning (0.01 sec)
Conclusion
The MySQL optimizer depends on the accuracy of table statistics to select the optimal execution plan. We can control the accuracy of the statistics by tuning the parameter innodb_stats_persistent_sample_pages. We can also force a full, accurate recalculation of the statistics by rebuilding the table, which defragments the indexes at the same time. To rebuild the table, we can run a no-op ALTER against it directly or use pt-online-schema-change to achieve the same effect.
Is there a tool to automatically determine which stats are off?
I realize that doing a count() on a big table could be expensive, but maybe it could be done during off-peak hours or on a replica.
Hey Ivan, AFAIK there is no such tool to automatically determine which stats are off. However, a count(distinct ...) on an indexed column only reads the index, so it usually shouldn't take too long.
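A rough sketch of the comparison logic for such a check (the two inputs would come from mysql.innodb_index_stats and a COUNT(DISTINCT ...) run off-peak or on a replica; the tolerance is an arbitrary choice):

```python
def stats_drift(stat_value, actual_cardinality, tolerance=2.0):
    """Flag an index whose stored stat_value differs from the actual
    cardinality by more than `tolerance` times in either direction."""
    if actual_cardinality == 0:
        return stat_value != 0
    ratio = stat_value / actual_cardinality
    return ratio > tolerance or ratio < 1.0 / tolerance

print(stats_drift(19498, 195))  # True: the dpcz_FK case above, ~100x off
print(stats_drift(192, 195))    # False: within tolerance after the fix
```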
What does level LA mean?
Hello ss,
As mentioned, the details can be found in the code:
https://github.com/mysql/mysql-server/blob/6846e6b2f72931991cc9fd589dc9946ea2ab58c9/storage/innobase/dict/dict0stats.cc
“Definition: N-prefix-boring record is a record on a non-leaf page that equals
the next (to the right, cross page boundaries, skipping the supremum and
infimum) record on the same level when looking at the first n-prefix columns.
The last (user) record on a level is not boring (it does not match the
non-existent user record to the right). We call the records boring because all
the records on the page below a boring record are equal to that boring record.
We avoid diving below boring records when searching for a leaf page to
estimate the number of distinct records because we know that such a leaf
page will have number of distinct records == 1.
For each n-prefix: start from the root level and full scan subsequent lower
levels until a level that contains at least A*10 distinct records is found.
Lets call this level LA.”
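In other words, LA is the shallowest level, scanning down from the root, that contains at least A*10 distinct records (A being innodb_stats_persistent_sample_pages). A toy illustration of that selection, with made-up per-level distinct counts:

```python
def find_level_LA(distinct_per_level, sample_pages_A):
    """Return the index of the first level, from the root down,
    holding at least A*10 distinct records."""
    threshold = sample_pages_A * 10
    for level, n_distinct in enumerate(distinct_per_level):
        if n_distinct >= threshold:
            return level
    return len(distinct_per_level) - 1  # fall back to the leaf level

# With A=128 sample pages the threshold is 1280 distinct records;
# level 0 is the root, the last entry is the leaf level:
print(find_level_LA([1, 40, 900, 15000, 300000], sample_pages_A=128))  # 3
```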
Thanks for this writeup! Very helpful.
We’re encountering the opposite issue, wondering if you have any insight.
My understanding is that if the server setting innodb_stats_persistent is turned ON, then cardinality (from SHOW INDEXES for example) should be reading from the table mysql.innodb_index_stats.
That doesn’t seem to be the case from what we’re experiencing.
What we’re seeing is that the result stored in mysql.innodb_index_stats is completely accurate, however SHOW INDEXES is returning something completely wrong. (This is after running ANALYZE and FLUSH on the table).
Further, if we change innodb_stats_persistent to OFF, we still get an incorrect cardinality report from SHOW INDEXES, however it’s slightly less wrong.
Have you seen anything like this before?
We’re on MariaDB 10.6.10, and there seems to be a bug report on something similar but not the exact issue: https://jira.mariadb.org/browse/MDEV-28327
Hey Brandon,
innodb_stats_persistent specifies "whether InnoDB index statistics are persisted to disk". When it is turned off, "statistics may be recalculated frequently", so it is possible that the result is "slightly less wrong", but in the end the setting itself shouldn't change how the statistics are calculated.

With innodb_stats_persistent=OFF, innodb_stats_transient_sample_pages is the parameter to adjust for the statistics calculation; with innodb_stats_persistent=ON, use innodb_stats_persistent_sample_pages instead.

BTW, "as of MySQL 5.6.6, optimizer statistics are persisted to disk by default".

Hope the info helps!
Best
Thanks for the reply Edwin!
The one thing we’re missing is: Why would mysql.innodb_index_stats not match the result of SHOW INDEXES when innodb_stats_persistent=ON?
We have that turned on, and we’ve increased innodb_stats_persistent_sample_pages significantly, and see the correct result in mysql.innodb_index_stats, however SHOW INDEXES still returns a wildly different number for cardinality.
Are we missing something?