Unstable Query Digests in MySQLIf you’re using MySQL’s Performance Schema you may use “query digests” as IDs to identify specific query patterns in the events_statements_summary_by_digest Performance Schema Table.

You might assume these hashes are stable between different versions, so, for example, when upgrading from MySQL 5.7 to MySQL 8, you can compare the query response time and other execution details for the same hashes as part of your upgrade process and have confidence that queries run the same (or better) after the upgrade.   Unfortunately, you can’t.

For some reason, the hashing algorithm is different between MySQL 5.6,  MySQL 5.7, and MySQL 8 even for the most trivial queries:

MySQL 5.6

 

MySQL 5.7

 

MySQL 8.0

 

MySQL 8.0 is employing a twice as long digest (in my opinion needlessly so, as 128bit already had a very low chance of collision) where MySQL 5.6 and MySQL 5.7 just have different hashes of the same length.

As a side effect of this behavior, you will see duplicate queries in Percona Monitoring and Management (PMM) if running multiple major MySQL versions and Performance Schema is used as a data source:

Unstable Query Digests in MySQL

You can clearly see three “commit” queries in this list, as well as some other likely duplicates.

If you want to avoid this problem in Percona Monitoring and Management, you can use Slow Query Log as your Query Data Source, which is best used with Query Sampling available in Percona Server for MySQL.

10 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
sjmudd

Is the reason for this change documented?

I too think it is not ideal and it would simply be better to have a configuration option ps_digest_method or similar so if needed you can configure all servers to use the same mechanism. Probably too late to do that now. We should have paid more attention during 8.0 DMR rollouts and flagged this then. Maybe something to check when 9.0 DMR gets released…

Federico Razzoli

Fort the future, maybe Percona can port the most recent algorithm into PMM and use it to avoid duplicating queries?

Mark Denial

Very detailed information on MySQL’s Performance Schema. I am looking for more information on MySQL 8.0 and hoping to read more post amazing posts from you.

lefred

Hi Peter,

I can understand your surprise, but the digest is not created from the normalized query, it’s done much earlier. The digest is created during parsing tree fragments and of course this differs across versions, and nowhere have we advertised it to be stable across MySQL version
In your case, maybe having a common digest of the normalized query will help you identifying your queries between different version, a new view or a feature request ? 😉

Now about the length of this hash that has been increased, this is because we moved away from MD5 (to sha256, https://dev.mysql.com/doc/dev/mysql-server/latest/structsql__digest__storage.html#ab295f51157fe09403347b5a329c9becd) in MySQL 8.0.

I would recommend for PMM to implement something similar to this then:

mysql> select sha2(statement_digest_text(‘select foo from bar where baz = \’foobar\”), 224);
+——————————————————————————–+
| sha2(statement_digest_text(‘select foo from bar where baz = \’foobar\”), 224) |
+——————————————————————————–+
| 698bb2e6e310db6cb4de647b0105de5a074593a3a7680cb9e38113aa |
+——————————————————————————–+
1 row in set (0.00 sec)

This will provide you a stable and smaller digest.

Diego

Hi, just a question about your screenshot showing those 2 commits at the top of the digest. Why would they appear there, on top of over all other queries? That means that everything is optimized to their max? Is it possible to optimize Commits as well? Thanks!

Øystein Grøvlen

To me it seems like a good thing that the queries has different digests in different versions. For example, a query may use different query plans in 5.7 and 8.0. Lumping the statistics from different query plans together, may hide issues with one of the query plans. Another example is that commit processing may differ between versions, e.g., due to changes to redo logging. Hence, I think the real problem here is not that the digests differ, but that PMM does present version numbers for the queries.