If you’re using MySQL’s Performance Schema you may use “query digests” as IDs to identify specific query patterns in the events_statements_summary_by_digest Performance Schema Table.
You might assume these hashes are stable between different versions, so, for example, when upgrading from MySQL 5.7 to MySQL 8, you can compare the query response time and other execution details for the same hashes as part of your upgrade process and have confidence that queries run the same (or better) after the upgrade. Unfortunately, you can’t.
For some reason, the hashing algorithm is different between MySQL 5.6, MySQL 5.7, and MySQL 8 even for the most trivial queries:
MySQL 5.6
1 2 3 4 5 6 | mysql> select digest,digest_text from events_statements_summary_by_digest where digest_text='commit'; +----------------------------------+-------------+ | digest | digest_text | +----------------------------------+-------------+ | 83e158a7b21d1f3f1579455e446139ec | COMMIT | +----------------------------------+-------------+ |
MySQL 5.7
1 2 3 4 5 6 7 | mysql> select digest,digest_text from events_statements_summary_by_digest where digest_text='commit'; +----------------------------------+-------------+ | digest | digest_text | +----------------------------------+-------------+ | 20381e49cdd9c4f151494275cec46e08 | COMMIT | +----------------------------------+-------------+ 1 row in set (0.00 sec) |
MySQL 8.0
1 2 3 4 5 6 7 | mysql> select digest,digest_text from events_statements_summary_by_digest where digest_text='commit'; +------------------------------------------------------------------+-------------+ | digest | digest_text | +------------------------------------------------------------------+-------------+ | f2b9be5f0f32ca7507e4ddf133cd2fad0993a1c26fde9caa6f067b1af2b7603c | COMMIT | +------------------------------------------------------------------+-------------+ 1 row in set (0.00 sec) |
MySQL 8.0 is employing a twice as long digest (in my opinion needlessly so, as 128bit already had a very low chance of collision) where MySQL 5.6 and MySQL 5.7 just have different hashes of the same length.
As a side effect of this behavior, you will see duplicate queries in Percona Monitoring and Management (PMM) if running multiple major MySQL versions and Performance Schema is used as a data source:
You can clearly see three “commit” queries in this list, as well as some other likely duplicates.
If you want to avoid this problem in Percona Monitoring and Management, you can use Slow Query Log as your Query Data Source, which is best used with Query Sampling available in Percona Server for MySQL.
Is the reason for this change documented?
I too think it is not ideal and it would simply be better to have a configuration option ps_digest_method or similar so if needed you can configure all servers to use the same mechanism. Probably too late to do that now. We should have paid more attention during 8.0 DMR rollouts and flagged this then. Maybe something to check when 9.0 DMR gets released…
Hi, I think this is a great idea to have a different hashes so you cans specify one
I’m not sure about 5.6 to 5.7 change but I think 8.0 was getting rid of SHA-1 which is considered insecure… though I’m not sure why this would be critical for digests.
Fort the future, maybe Percona can port the most recent algorithm into PMM and use it to avoid duplicating queries?
Very detailed information on MySQL’s Performance Schema. I am looking for more information on MySQL 8.0 and hoping to read more post amazing posts from you.
Hi Peter,
I can understand your surprise, but the digest is not created from the normalized query, it’s done much earlier. The digest is created during parsing tree fragments and of course this differs across versions, and nowhere have we advertised it to be stable across MySQL version
In your case, maybe having a common digest of the normalized query will help you identifying your queries between different version, a new view or a feature request ? 😉
Now about the length of this hash that has been increased, this is because we moved away from MD5 (to sha256, https://dev.mysql.com/doc/dev/mysql-server/latest/structsql__digest__storage.html#ab295f51157fe09403347b5a329c9becd) in MySQL 8.0.
I would recommend for PMM to implement something similar to this then:
mysql> select sha2(statement_digest_text(‘select foo from bar where baz = \’foobar\”), 224);
+——————————————————————————–+
| sha2(statement_digest_text(‘select foo from bar where baz = \’foobar\”), 224) |
+——————————————————————————–+
| 698bb2e6e310db6cb4de647b0105de5a074593a3a7680cb9e38113aa |
+——————————————————————————–+
1 row in set (0.00 sec)
This will provide you a stable and smaller digest.
Thank you for explanation and feature suggestion. This particular function is MySQL 8 only, no ?
We’ll see to how much this is going to be problem for users in PMM and if this is something which needs solving.
Hi, just a question about your screenshot showing those 2 commits at the top of the digest. Why would they appear there, on top of over all other queries? That means that everything is optimized to their max? Is it possible to optimize Commits as well? Thanks!
Hi Diego,
For this workload COMMIT just took most of the time (slow disk) For some workloads when transactions are short and writes happen to data in memory this situation can be expected
To me it seems like a good thing that the queries has different digests in different versions. For example, a query may use different query plans in 5.7 and 8.0. Lumping the statistics from different query plans together, may hide issues with one of the query plans. Another example is that commit processing may differ between versions, e.g., due to changes to redo logging. Hence, I think the real problem here is not that the digests differ, but that PMM does present version numbers for the queries.
Oystein,
Same query may have different plans on the same version as well, for example due to different Index structure or just different stats. I’m all for having some sort of “Plan Hash” to identify that but It is not the same as the Query Hash