If you have enough experience with MySQL, you have very likely stumbled upon an unusually slow SELECT COUNT(*) FROM table; query execution, at least occasionally.
Recently, I had a chance to investigate some of these cases more closely, and I was stunned by how huge the differences can be for the very same table, depending on the circumstances. As the problem turned out to be much more complex than I expected, I decided to write a post to share some facts and observations.
So, why can a simple table row count query be excessively slow in some cases, while pretty fast in others? Let’s split the possibly related circumstances into two categories.
Well-known/obvious reasons
- Table size
- Storage engine (e.g., MyISAM returns the count immediately by design)
- Concurrent workload
- Memory cached vs. disk reads
Less obvious reasons
- MySQL variant and version
- Transactional context
- Table fragmentation
- Parallel reads
- Table optimization
- Bugs
Let’s take a closer look at some of the less obvious reasons!
MySQL ecosystem dynamics
Both MySQL and MariaDB are very active projects, and new versions have been released regularly for many years, to this day. This has both good and bad sides: for the price of cool new features, we often observe various regressions in basic performance.
I found it pretty interesting to see how different the results are depending on the database variant, for the very same transactional contexts and the same simple count query.
MVCC challenge
Let me show two slow log examples of the same count query but executed in two different circumstances:
# Time: 240318  0:09:07
# User@Host: msandbox[msandbox] @ localhost []
# Thread_id: 9  Schema: db1  QC_hit: No
# Query_time: 2.048288  Lock_time: 0.000098  Rows_sent: 1  Rows_examined: 10000000
# Rows_affected: 0  Bytes_sent: 72
# Pages_accessed: 12050  Pages_read: 3582  Pages_updated: 0  Old_rows_read: 0
# Pages_read_time: 398.2334  Engine_time: 1635.2910
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: No
#
# explain: id  select_type  table  type  possible_keys  key  key_len  ref  rows  r_rows  filtered  r_filtered  Extra
# explain: 1  SIMPLE  sbtest1  index  NULL  k_1  4  NULL  9863064  10000000.00  100.00  100.00  Using index
#
SET timestamp=1710716947;
select count(*) from sbtest1;
vs.
# Time: 240318  0:11:24
# User@Host: msandbox[msandbox] @ localhost []
# Thread_id: 9  Schema: db1  QC_hit: No
# Query_time: 688.817720  Lock_time: 0.000089  Rows_sent: 1  Rows_examined: 10000000
# Rows_affected: 0  Bytes_sent: 72
# Pages_accessed: 11510893  Pages_read: 3697864  Pages_updated: 0  Old_rows_read: 4000
# Pages_read_time: 669656.9344  Engine_time: 688246.7927
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: No
#
# explain: id  select_type  table  type  possible_keys  key  key_len  ref  rows  r_rows  filtered  r_filtered  Extra
# explain: 1  SIMPLE  sbtest1  index  NULL  k_1  4  NULL  9859064  10000000.00  100.00  100.00  Using index
#
SET timestamp=1710717084;
select count(*) from sbtest1;
Both query instances were executed on the same server, against the same, completely idle MariaDB instance. The only difference was that before the second run, another session had started a transaction in which 4k rows were deleted, and that transaction was still open.
The select count(*) query, running under the REPEATABLE READ isolation level, did not see the deleted rows, as the other transaction had not yet committed. Thanks to the MariaDB verbose slow log, we can nicely identify the situation via the Old_rows_read counter, and the cost on the InnoDB engine side via the number of pages accessed. The cost of providing an isolated data view turned out to be surprisingly huge in this case: the query was ~340 times slower (688.8 sec vs. 2.05 sec), much more than I would have expected!
The problem is completely reproducible, especially when the buffer pool is too small to fit the whole table (128MB buffer pool vs. 2.3GB table). I asked myself what was wrong here and how other MySQL variants and versions behave in this regard. It turns out there are huge differences!
Below you will find a quick table from my tests on various versions. I distinguish between three different query contexts:
- Executed alone (no other active transactions)
- Executed in the same session/transaction that has an open transaction with some rows deleted
- Executed in a separate client session, while the delete transaction remains open in another session
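The three contexts can be sketched as follows (assuming a sysbench-style sbtest1 table with an integer id primary key; the exact WHERE clause is my illustration, not the original test script):

```sql
-- Context 1: executed alone, no other active transactions
SELECT COUNT(*) FROM sbtest1;

-- Context 2: delete and count within the same open transaction
START TRANSACTION;
DELETE FROM sbtest1 WHERE id <= 4000;  -- creates ~4k undo log entries
SELECT COUNT(*) FROM sbtest1;          -- sees its own uncommitted deletes
ROLLBACK;

-- Context 3: delete stays open in session A, count runs in session B
-- session A:
START TRANSACTION;
DELETE FROM sbtest1 WHERE id <= 4000;  -- not committed yet
-- session B (separate connection):
SELECT COUNT(*) FROM sbtest1;          -- must reconstruct old row versions
```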
When there is a transactional context impact, in InnoDB status, we can see:
---TRANSACTION 64015, ACTIVE 89 sec
57 lock struct(s), heap size 8312, 4055 row lock(s), undo log entries 4000
MySQL thread id 8, OS thread handle 140189750068992, query id 12 localhost msandbox
TABLE LOCK table `db1`.`sbtest1` trx id 64015 lock mode IX
RECORD LOCKS space id 3400 page no 5 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 64015 lock_mode X
The difference in query execution time when it’s run within the same active transaction or as a new connection is visible by comparing the last two columns.
Database variant | Alone | Trx/Same session | Trx/Separate session |
MySQL 5.1.73 | 3.45 sec | 4.43 sec | 11 min 8.14 sec |
MySQL 5.5.62 | 1.34 sec | 1.67 sec | 13 min 43.93 sec |
MySQL 5.6.34 | 1.42 sec | 1.49 sec | 2 min 39.68 sec |
MySQL 5.6.46 | 3.69 sec | 7.13 sec | 14 min 12.96 sec |
MySQL 5.6.51 | 1.52 sec | 4.06 sec | 15 min 28.07 sec |
MySQL 5.7.17 | 2.68 sec | 2.62 sec | 2.73 sec |
MySQL 5.7.18 | 2.00 sec | 2.35 sec | 12 min 44.85 sec |
MySQL 5.7.44 | 2.15 sec | 2.18 sec | 13 min 46.51 sec |
MySQL 8.0.12 | 2.67 sec | 2.73 sec | 2.77 sec |
MySQL 8.0.13 | 1.53 sec | 2.73 sec | 14 min 5.74 sec |
MySQL 8.0.16 | 1.26 sec | 3.30 sec | 12 min 38.26 sec |
MySQL 8.0.17 | 1.93 sec | 1.92 sec | 1.74 sec |
MySQL 8.0.19 | 1.79 sec | 3.60 sec | 1.71 sec |
MySQL 8.0.20 | 8.06 sec | 9.21 sec | 15.88 sec |
MySQL 8.0.36 | 6.86 sec | 6.88 sec | 6.88 sec |
MySQL 8.2.0 | 8.24 sec | 9.94 sec | 8.09 sec |
MySQL 8.3.0 | 8.44 sec | 9.98 sec | 8.06 sec |
MariaDB 10.0.38 | 1.43 sec | 1.76 sec | 13 min 31.66 sec |
MariaDB 10.1.41 | 1.24 sec | 1.31 sec | 2 min 36.65 sec |
MariaDB 10.2.14 | 1.65 sec | 3.40 sec | 45.96 sec |
MariaDB 10.2.44 | 1.83 sec | 1.93 sec | 11 min 58.72 sec |
MariaDB 10.3.39 | 1.947 sec | 11 min 47.183 sec | 13 min 30.429 sec |
MariaDB 10.4.31 | 1.959 sec | 11 min 41.352 sec | 13 min 37.713 sec |
MariaDB 10.5.23 | 1.940 sec | 13 min 14.588 sec | 12 min 49.191 sec |
MariaDB 10.11.7 | 2.591 sec | 13 min 10.440 sec | 13 min 0.717 sec |
MariaDB 11.3.2 | 1.697 sec | 11 min 48.214 sec | 11 min 21.800 sec |
I am not listing Percona Server for MySQL variants here, as the results were very close to MySQL Community, at least in a couple of versions I checked.
Based on the above results, we can see roughly three types of behavior:
- The table count is fast regardless of transactional context (i.e., the presence of undo log entries).
- The query is fast when executed in the same transaction that deleted the rows, but very slow when executed from another transaction.
- The query is very slow in the presence of deleted-but-uncommitted rows, regardless of whether it runs in the same or another transaction.
Of course, the transactional context may be far more complicated, but testing all possible cases would take ages, so I covered only the most basic ones. For example, the more numerous and the bigger the separate transactions with undo entries, the slower the select query is expected to be.
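As a quick diagnostic (my addition, not part of the tests above), you can check whether some open transaction is holding undo entries that may be slowing down your count query:

```sql
-- Open transactions ordered by how many rows they have modified;
-- trx_rows_modified roughly corresponds to undo log entries.
SELECT trx_id, trx_started, trx_rows_modified
FROM information_schema.innodb_trx
ORDER BY trx_rows_modified DESC;
```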
But my point is – the same test can yield completely different results depending only on the MySQL or MariaDB version!
It is very interesting to see that MySQL 5.7 was fast through version 5.7.17; something got broken in 5.7.18 and remained broken up to the last available minor version. Ironically, the release notes mention an improvement in that area:
- InnoDB: SELECT COUNT(*) performance regressed in some cases due to a modification introduced in MySQL 5.7.2 that caused InnoDB to count rows by traversing the clustered index instead of a smaller secondary index. The modification was reverted. (Bug #23046302, Bug #80580)
In the 8.0 series, it was initially fast, then slow in the third context from 8.0.13 through 8.0.16, and fast again in all contexts since version 8.0.17. Funny enough, the regression started right after this change was introduced in 8.0.13:
- Performance of SELECT COUNT(*) FROM tbl_name queries for InnoDB tables was improved for single-threaded workloads and when no extra clauses such as WHERE or GROUP BY are used. (WL #10398)
Versions 8.0.20 and later were a bit slower in all contexts, but I’ll get to this later.
The worst results were with MariaDB; the very old versions were at least fast in the first two contexts, but all the recent ones are very slow in the presence of undo log entries. I reported this issue accordingly:
https://jira.mariadb.org/browse/MDEV-33099
Parallel InnoDB read threads in MySQL 8.0
In MySQL 8.0.14, we can find an important new functionality: for the first time, InnoDB is able to utilize multiple parallel threads for certain queries. In the initial implementation, just for the table count:
https://dev.mysql.com/worklog/task/?id=11720
This feature is enabled via the default value of innodb_parallel_read_threads set to 4.
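The variable can be inspected and adjusted per session; a quick illustration (not from the original post, using the same sbtest1 table as in the earlier examples):

```sql
-- Check the current value (defaults to 4 since MySQL 8.0.14)
SELECT @@innodb_parallel_read_threads;

-- Raise it for this session only and re-run the count
SET SESSION innodb_parallel_read_threads = 8;
SELECT COUNT(*) FROM sbtest1;
```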
And this default behavior seemed to work well… until version 8.0.19. There was again quite a surprising regression introduced when this “improvement” was made in version 8.0.20:
- InnoDB: Changes to parallel read threads functionality introduced in MySQL 8.0.17 caused a degradation in SELECT COUNT(*) performance. Pages were read from disk unnecessarily. (Bug #30766089)
The problem is that when the table pages are not in memory, or the table is simply bigger than the buffer pool, using parallel threads results in far more disk reads than when a single thread is used. The same operation is now much more expensive! Unfortunately, this problem applies to all later releases, including the latest innovation release, 8.3.0.
I reported this problem here: https://bugs.mysql.com/bug.php?id=113482
Why is there such a big negative impact in multi-threaded mode? Unfortunately, the verbose slow log adds even more confusion to the problem, as the faster execution reports far more IOPS than the slow one, suggesting the counters may not account for reads performed by the parallel worker threads.
With innodb_parallel_read_threads = 4:
# Time: 2024-03-22T22:13:29.898885Z
# User@Host: msandbox[msandbox] @ localhost []  Id: 9
# Schema: db1  Last_errno: 0  Killed: 0
# Query_time: 8.028897  Lock_time: 0.000005  Rows_sent: 1  Rows_examined: 0  Rows_affected: 0  Bytes_sent: 63
# Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
# InnoDB_IO_r_ops: 84  InnoDB_IO_r_bytes: 1376256  InnoDB_IO_r_wait: 0.097099
# InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 229
use db1;
SET timestamp=1711145601;
select count(*) from sbtest1;
vs. with innodb_parallel_read_threads = 1:
# Time: 2024-03-22T22:14:06.364565Z
# User@Host: msandbox[msandbox] @ localhost []  Id: 9
# Schema: db1  Last_errno: 0  Killed: 0
# Query_time: 3.408407  Lock_time: 0.000005  Rows_sent: 1  Rows_examined: 0  Rows_affected: 0  Bytes_sent: 63
# Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
# InnoDB_IO_r_ops: 137049  InnoDB_IO_r_bytes: 2245410816  InnoDB_IO_r_wait: 0.992006
# InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 8191
SET timestamp=1711145642;
select count(*) from sbtest1;
Let’s test various parallel threads settings:
innodb_parallel_read_threads | execution time BP 128MB | execution time BP 4GB |
1 | 3.32 sec | 1.74 sec |
2 | 15.62 sec | 0.42 sec |
3 | 10.59 sec | 0.28 sec |
4 | 8.17 sec | 0.21 sec |
5 | 6.51 sec | 0.17 sec |
6 | 5.61 sec | 0.17 sec |
7 | 5.19 sec | 0.13 sec |
8 | 4.72 sec | 0.12 sec |
These tests were made on MySQL 8.3.0, table size 2.3 GB, on a system with 16 CPU cores and an NVMe disk.
For clarity: in all tests, the parallel threads fully utilized the CPU cores, meaning that with innodb_parallel_read_threads=8, the server was using ~800% CPU (compared to ~100% with one thread), yet still needed more time to run the query (with the small BP)!
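Given the results above, a practical workaround when the table does not fit in the buffer pool is to disable the parallel read just for the session running the count (a sketch based on the tests above):

```sql
-- A single read thread avoids the extra disk reads in the small-BP case
SET SESSION innodb_parallel_read_threads = 1;
SELECT COUNT(*) FROM sbtest1;

-- Restore the server default afterwards
SET SESSION innodb_parallel_read_threads = DEFAULT;
```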
New table populated with INSERTs vs. Optimized one
And that is not all of the surprises! I found out that the very same table can yield quite different results depending on how its data file was created. Here is a quick test to illustrate:
$ sysbench /usr/share/sysbench/oltp_common.lua --mysql_storage_engine=innodb --table-size=10000000 --tables=1 --mysql-db=db1 \
  --mysql-user=msandbox --mysql-password=msandbox --db-driver=mysql --mysql-socket=/tmp/mysql_sandbox8300.sock prepare
sysbench 1.0.20 (using system LuaJIT 2.0.4)
Creating table 'sbtest1'...
Inserting 10000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
-rw-r-----. 1 mysql mysql 2.3G Mar 24 06:38 sbtest1.ibd
mysql > select @@version,@@innodb_parallel_read_threads;
+-----------+--------------------------------+
| @@version | @@innodb_parallel_read_threads |
+-----------+--------------------------------+
| 8.3.0     | 1                              |
+-----------+--------------------------------+
1 row in set (0.00 sec)

mysql > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (17.09 sec)
1 row in set (17.31 sec)
1 row in set (17.30 sec)
vs.
mysql > optimize table sbtest1\G
*************************** 1. row ***************************
   Table: db1.sbtest1
      Op: optimize
Msg_type: note
Msg_text: Table does not support optimize, doing recreate + analyze instead
*************************** 2. row ***************************
   Table: db1.sbtest1
      Op: optimize
Msg_type: status
Msg_text: OK
2 rows in set (2 min 30.81 sec)
-rw-r-----. 1 mysql mysql 2.6G Mar 24 07:09 sbtest1.ibd
mysql > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (36.68 sec)
1 row in set (34.51 sec)
1 row in set (36.06 sec)
Moreover, it is not only the select count that behaves differently in the two cases; other reads do as well. You will find more details in my report: https://bugs.mysql.com/bug.php?id=114468
Other factors/bugs
In this blog post, I am showing only three example factors that I found particularly interesting. However, the number of variations and different factors impacting a basic select count(*) from table query may be much bigger. Let me name a few other bug report examples:
https://bugs.mysql.com/bug.php?id=114154 – “Parallel read is slow after partition table is read in parallel”
https://bugs.mysql.com/bug.php?id=99717 – “Performance regression of parallel count”
https://bugs.mysql.com/bug.php?id=97709 – “MySQL 8 Select Count(*) is very slow”
https://bugs.mysql.com/bug.php?id=100597 – “INDEX hint does not affect count(*) execution”
https://bugs.mysql.com/bug.php?id=100966 – “select count(*) works very slow for uncommitted data”
https://bugs.mysql.com/bug.php?id=112767 – “SELECT COUNT(*) degraded performance on 8.0 compared to 5.7”
Summary
From these tests, I think we can come to the following conclusions:
- None of the MySQL Community versions demonstrated a problem when the count query was executed within the same transaction as the delete.
- MySQL 5.1, 5.5, and 5.6 are slow with the presence of delete, when executed in a separate transaction (however, due to their long EOL status, I did not test all the oldest minor releases).
- MySQL 5.7 was fast in all contexts up to version 5.7.17 but is slow in the separate transaction context from 5.7.18 till the last one, 5.7.44.
- MySQL 8.0, 8.2, and 8.3 are fast in all contexts except minor versions 8.0.13 – 8.0.16.
- All the recent minor MariaDB versions are slow in the separate transaction context, while the impact is lower in some very old minor versions, like 10.2.14.
- Unfortunately, since major version 10.3, all MariaDB versions are also slow when the count query is executed within the same transaction, making the MVCC impact very high; for example, MariaDB 10.3+ is ~37000% slower than 10.2 and earlier in that context!
- MySQL 8.0.20+ introduced another problem related to parallel reads: when the data does not fit in memory, the count query is more expensive in terms of disk reads; using a single thread allows one to work around that issue.
The overall impact of transactional context depends on the number of concurrent write transactions, rows changed, and other factors, but these very simple tests I presented hopefully illustrate how potentially important it can be in investigating slow queries.
Nice finding, Przemysław. About MVCC, I think the variations are caused by a combination of the plan chosen by the optimizer (you imply this by mentioning Bug #80580 and WL #10398) and the way InnoDB implements MVCC on secondary indexes (which you do not mention at all). I am giving more info about InnoDB MVCC and secondary indexes below.
My recollection on the subject is vague and I have not had much time to find references, but from what I understood (probably from reading content from Mark Callaghan or Marko Mäkelä), secondary indexes do not contain “row-level MVCC information”, only “page-level MVCC information”. So when this page-level information tells InnoDB that something is “outdated” in the secondary index, a clustered-index lookup is needed to get the “good” version of the row. The slow-log entry for the slower query matches this, as the number of Pages_accessed is close to Rows_examined (11,510,893 vs. 10,000,000). So with DELETE, if the optimizer uses the clustered index, things will be fast (it might scan more data), but if it uses a secondary index, the best case is better (scanning less data) and the worst case is much worse (needing a PK lookup for each row).
I actually found a reference to this in Marko’s talk at the MariaDB 2023 (Un)Conference: How InnoDB Undo Logs and the Purge of History Work (go to 5:38 in the YouTube video).