In this post, I’ll discuss the fix for lp:1704195 (migrated to PS-2503), which prevents ANALYZE TABLE from blocking all subsequent queries on the same table.

In November 2017, Percona released a fix for lp:1704195 (migrated to PS-2503), created by Laurynas Biveinis. The fix, included with Percona Server for MySQL since versions 5.6.38-83.0 and 5.7.20-18, stops ANALYZE TABLE from invalidating query and table definition cache content for supported storage engines (InnoDB, TokuDB and MyRocks).

Why is this important?

In short, it is now safe to run ANALYZE TABLE in production environments because it won’t trigger a situation where all queries on the same table stack up in the "Waiting for table flush" state. Check this blog post for details on how this situation can happen.
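
If you want to check whether a server is currently in this state, the PROCESSLIST table makes the stuck sessions easy to spot (a minimal sketch; filter further as needed):

  -- Sessions blocked behind a pending table flush
  SELECT ID, USER, DB, TIME, STATE, INFO
  FROM information_schema.PROCESSLIST
  WHERE STATE = 'Waiting for table flush';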

Why do we need to run ANALYZE TABLE?

When the optimizer decides which index to use to resolve a query, it relies on the statistics the storage engine stores for the table. If those statistics are not up to date, the optimizer might choose the wrong index when it creates the query execution plan, and performance suffers.
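
For example, you can compare the plan the optimizer chose with the index statistics that choice was based on (the table and query below are hypothetical):

  -- Plan chosen for a hypothetical query
  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

  -- Cardinality estimates the optimizer used for its choice
  SHOW INDEX FROM orders;

  -- Raw InnoDB persistent statistics behind those estimates
  SELECT index_name, stat_name, stat_value
  FROM mysql.innodb_index_stats
  WHERE database_name = DATABASE() AND table_name = 'orders';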

To prevent this, storage engines support automatic and manual statistics updates. While automatic statistics updates usually work fine, there are cases when they do not do their job properly.

For example, InnoDB samples 20 16K pages when it updates persistent statistics, and eight 16K pages when it updates transient statistics. If your data distribution is even, it does not matter how big the table is: even for a 1TB table, a 320K sample is enough. But if your data is unevenly distributed, the statistics may be built incorrectly. The solution is to increase either the innodb_stats_transient_sample_pages or innodb_stats_persistent_sample_pages variable. However, examining more pages while collecting statistics leads to longer update runs and higher I/O activity, which is probably not something you want to happen often.
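
Both variables are dynamic and global, so they can be raised without a restart; a minimal sketch (the values are only illustrative):

  -- Sample more pages for persistent statistics (default is 20)
  SET GLOBAL innodb_stats_persistent_sample_pages = 256;

  -- Sample more pages for transient statistics (default is 8)
  SET GLOBAL innodb_stats_transient_sample_pages = 64;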

To work around this, you can disable automatic statistics updates for such tables and schedule a job that periodically runs ANALYZE TABLE.
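
A minimal sketch of that approach, using the per-table STATS_AUTO_RECALC option and a hypothetical table name:

  -- Stop InnoDB from recalculating persistent statistics for this table automatically
  ALTER TABLE orders STATS_AUTO_RECALC = 0;

  -- Refresh the statistics from a scheduled job (cron, the event scheduler, and so on)
  ANALYZE TABLE orders;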

Was this safe before the fix for lp:1704195 (migrated to PS-2503)?

Theoretically yes, but we could easily hit the situation described in this blog post by Miguel Angel Nieto. The article describes what happens when a long-running query starts before ANALYZE TABLE and does not finish before ANALYZE TABLE completes: at some point, all queries on the analyzed table get stuck in the "Waiting for table flush" state.

This happens because, before the fix, ANALYZE TABLE worked as follows:

  1. Opens table statistics: concurrent DML operations (INSERT/UPDATE/DELETE/SELECT) are allowed
  2. Updates table statistics: concurrent DML operations are allowed
  3. Update finished
  4. Invalidates table entry in the table definition cache: concurrent DML operations are forbidden
    1. What happens here is that ANALYZE TABLE marks the currently open table share instances as invalid. This does not affect running queries: they complete as usual. But incoming queries cannot start until they can re-open the table share instance, and that will not happen until all currently running queries complete.
  5. Invalidates query cache: concurrent DML operations are forbidden

The last two operations are usually fast, but they cannot finish while another query is still using the table share instance or holding the query cache mutex. In turn, this prevents incoming queries from starting.
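
A rough sketch of how this pile-up could be reproduced on a server without the fix, using three sessions and a hypothetical table t:

  -- Session 1: a long-running query keeps the old table share instance in use
  SELECT COUNT(*) FROM t AS a JOIN t AS b;

  -- Session 2: completes quickly, but marks the open table share as stale
  ANALYZE TABLE t;

  -- Session 3: even a trivial query now shows "Waiting for table flush"
  -- until the query in session 1 finishes
  SELECT * FROM t LIMIT 1;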

However, ANALYZE TABLE modifies table statistics, not the table definition!

In practice, it cannot affect already running queries in any way. A query that started before ANALYZE TABLE finished updating statistics simply uses the old statistics. ANALYZE TABLE does not change the data in the table, so old entries in the query cache remain correct. It does not change the table definition either, so there is no need to remove the table from the table definition cache. As a result, operations 4 and 5 above can be skipped entirely.

The fix for lp:1704195 (migrated to PS-2503) removes these additional updates and locks required for them, and makes ANALYZE TABLE always safe to run in busy production environments.

Comments
Vojtech Kurka

This is good news, thank you

Luke

good job

Jean-François Gagné

Hi Sveta,

I have a somewhat related question about the innodb_stats_transient_sample_pages and innodb_stats_persistent_sample_pages variables. You write that if the data distribution is even, the defaults are good, but if not, those variables should be increased. However, those variables have a global scope, so increasing them will have an effect on all the tables, even for the tables that have even distribution, right?

A solution to that could be to change the variable just before running ANALYZE TABLE and set it back to the original value afterwards, but this is prone to error. Maybe having a session scope for this variable would be useful…

But even a session scope is only a small improvement. The right solution would be to set those at the “table” scope. This way, we would not need to “remember” which value is good and simply running ANALYZE TABLE would do the right thing.

Could you share your thoughts on this subject? Does Percona have plans to improve InnoDB there?

Many thanks and best regards,

Jean-François Gagné

Thanks for the pointer Sveta.