The combination of the information obtained from the “pt-summaries” discussed in the previous posts of this series (Part 1: The Percona Support Way, Part 2: Knowing the Server, Part 3: What MySQL?) helps us form a first impression of a MySQL server. However, apart from the quick glance we get at two samples of a selective group of MySQL status variables, they provide what I call a “static” view of the server, akin to looking at a still picture of an engine. We get the chance to spot some major discrepancies in the MySQL configuration in view of the available resources on the host, and to adjust those following some best practices, but the real tuning starts by observing how the engine performs in motion.

pt-stalk is the final tool in our Percona Toolkit trifecta: we use it to collect data from a running server, from both the OS side and MySQL’s, so we can then analyze how it is operating. For best results, you should run pt-stalk during a period of high activity (ideally, peak time). It can also be used to collect data while the server is undergoing a performance issue, for troubleshooting.

If you have the Percona Toolkit installed on the server, you can run pt-stalk as follows:
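
Here is a sketch of such an invocation, with the options explained below and the MySQL root password left as a placeholder:

    pt-stalk --no-stalk --iterations=2 --sleep=30 -- --user=root --password=<mysql-root-pass>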

A few observations:

  • Options --no-stalk --iterations=2 --sleep=30 are used to collect two sets of diagnostic data with a 30-second interval between them; since --run-time is not defined, the default value of 30 seconds is employed, thus the command above provides one minute’s worth of data.
  • By default, pt-stalk will save the captured data under /var/lib/pt-stalk. You can choose a different destination directory with the --dest option.
  • If your system is configured in such a way that you don’t need to provide credentials to access MySQL, then you can remove the -- --user=root --password=<mysql-root-pass> section.
  • Many of the data collection tools used by pt-stalk are part of the sysstat package, so make sure it is installed on the server.

Let’s have a look at the data collected.

A Mix of OS and MySQL Diagnostics Data

Part of the data collected by pt-stalk is purely OS-related, which makes the tool partially useful for inspecting other kinds of Linux servers as well. The other part is purely MySQL. Here’s the list of files generated by the command above in one of my test servers – note there are two groups of files below, prefixed with timestamps set 30 seconds apart:

We won’t be examining all of these in detail here; I’ll focus on the most important ones (in general), following the order (and logic) I usually employ. As was the case in the previous posts, the excerpts of data used in the following sections do not all come from the same pt-stalk collection, nor even from the same server; they have been assembled from different sources to better illustrate the case at hand.

Was the Server Under Moderate or High Load When the Data was Captured?

My first stop tends to be the first top sample; there we can see reasonably clearly whether the server was under high load when pt-stalk was run, and for how long:

In the sample above, we can see the server was undergoing an increase in load, “averaging” close to 15 over the past 15 minutes, 18 over the past 5 minutes, and finally 20 in the last minute. Just remember that the load metric is dependent on the number of cores in a server. This other sample below may look scarier, but it comes from a box with 192 cores; while this is an oversimplification of the metric, we could say the server was operating at (76.9/192=) 40% of its maximum processing capacity in the last minute:

It is not a problem if the data was captured under low or moderate load, it just changes the way we should look at and analyze the rest of it.

We can also get a good idea of overall memory usage, and how much of it was allocated to MySQL. From there, I like to have a sneak peek at meminfo as well; sometimes we find a server with more than a small part of its memory converted into transparent huge pages (THP):
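
A quick way to spot this is to pull the relevant counters from the collected meminfo file (the timestamped path is illustrative); THP usage shows up under AnonHugePages:

    grep -E 'MemTotal|SwapTotal|AnonHugePages' /var/lib/pt-stalk/<timestamp>-meminfo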

No swap space and 55% of the total memory converted into THP doesn’t look that good.

Depicting CPU Usage and I/O

My next stop is the mpstat sample, here’s one from a 16-core server under high load:

MySQL should be run by a regular, non-superuser account (named mysql by default), thus all database work should be accounted for in the usr column above (which, on a dedicated server, is all about MySQL). In the sample above we see 2 cores pegged at 100% utilization; unless this sample came from an exceptional period of high load, this is a fine example of a server we should probably not be running in powersave mode (the CPU scaling governor setting, as seen in the 2nd post of this series), but with all cores running at full speed instead.

In the sample above we find a very good balance of load: all cores are working, and working hard. But picture a similar sample where you only see one of several cores at 100% usr load: on a replica, this is a sign that replication is operating in single-threaded mode. Back in the days of MySQL 5.5, and even 5.6, this was mostly the norm: we sometimes saw servers with one core running at 100% all the time and replication lag continuously increasing. This was a time when replicas with fewer, but faster, cores excelled. Multi-threaded replication appeared in MySQL 5.6 but it was limited to one database per (replication) thread; if your whole schema was organized around a single database, or if only one of the databases received most of the writes, it didn’t help much. Multi-threaded replication really blossomed in MySQL 5.7, with the introduction of the LOGICAL_CLOCK mode. The variables file contains all MySQL settings and you can use it to check how replication is configured on a replica:
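
A sketch of that check against the collected variables file (path illustrative; the variable names below are from MySQL 5.7):

    # LOGICAL_CLOCK with several workers means parallel replication is in effect
    grep -E 'slave_parallel_type|slave_parallel_workers' /var/lib/pt-stalk/<timestamp>-variables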

Another important column to monitor in mpstat is iowait: if you see values repeatedly above 1.0, particularly under moderate load, it may be a sign that the storage subsystem is a limiting factor. Double-check in the iostat sample whether the average queue size (avgqu-sz) is often bigger than 1:

In the sample above, we can see that sdb is already saturated at around 1000 write IOPS. Sometimes the I/O capacity of the system is overestimated and the related settings are set too high, making InnoDB believe it has more IOPS at its disposal than there actually are:
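
You can compare the configured values against what the disks can actually deliver; a sketch of checking them in the variables file (path illustrative):

    grep -E 'innodb_io_capacity' /var/lib/pt-stalk/<timestamp>-variables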

It is a good practice to measure the practical limit of IOPS that can be provided by your disks. Overestimating innodb_io_capacity may actually lead to worse performance and stalls in response times. For more information on this, see Tuning MySQL/InnoDB Flushing for a Write-Intensive Workload.

The last stop in my usual tour around the OS data collected by pt-stalk is the vmstat sample, which I mostly use to check if there was some swapping taking place when the data was collected:

While swapping is detrimental to performance, as discussed in the 2nd post, I’m one to advocate in favor of having swap space configured on a server in most cases – just make sure to adjust swappiness to a minimum, as explained there. You can check that this is the case by looking at the sysctl sample:
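
For example, a sketch of verifying it in the collected sysctl sample (path illustrative):

    grep vm.swappiness /var/lib/pt-stalk/<timestamp>-sysctl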

MySQL in Motion

This is my favorite part: the mysqladmin files contain samples of SHOW GLOBAL STATUS and, while it is difficult to make sense of the raw data, pt-mext, another tool from the Percona Toolkit, can be used to process them and provide a clear view of how the status counters vary over time, using the first sample in the series as the base reference. Here’s how you can use it:
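
A sketch of invoking it against one of the collected files (path illustrative); the -r option makes each column show the difference from the previous sample:

    pt-mext -r -- cat /var/lib/pt-stalk/<timestamp>-mysqladmin | less -S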

Let the fun begin!

Does My Hot Data Fit in Memory?

This is important information to have: is the InnoDB Buffer Pool big enough to store the data that is accessed most often? If it is, it means very little data is fetched from the tablespace files on disk (which is slow) and that most of it is found in memory already. We get that ratio from two InnoDB status counters:

  • Innodb_buffer_pool_read_requests indicates the number of (logical) read requests
  • Innodb_buffer_pool_reads indicates the number of (logical) read requests that could not be satisfied from data already present in the Buffer Pool; the data had to first be read from disk and then loaded into the Buffer Pool to be served.

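To focus on just these two counters, you can filter the pt-mext output; a minimal sketch (path illustrative):

    pt-mext -r -- cat /var/lib/pt-stalk/<timestamp>-mysqladmin \
      | grep -E 'Innodb_buffer_pool_read_requests|Innodb_buffer_pool_reads'
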
The “magic” of pt-mext is shown above: the first value presented for each metric comes from the first sample collected, which provides a base reference, while the values for all consecutive samples, taken 1 second apart, are the difference between them and the previous sample. This way, we get a clear idea of how the metric is changing over time.

Here are the values for the first four samples of Innodb_buffer_pool_reads used in the excerpt above, to better illustrate this function:

Raw value    pt-mext
9420027      9420027
9420039      +12
9420056      +17
9420063      +7

Looking at Temporary Tables

Another important aspect we discussed in the last post that is key to MySQL tuning is the use of temporary tables. If there are too many of them being created every second:

  1. Your queries may be needing some optimization;
  2. They may be taking too much of the memory available and the situation may become critical (as in running out of memory) during peaks of high activity/concurrency.

There’s another element to it: if they grow bigger than the threshold for temporary tables to remain in memory (the lower of tmp_table_size and max_heap_table_size), they will be moved to disk (becoming on-disk temporary tables) and your queries will take more time to complete, which may also create a vicious cycle during moments of high concurrency. Here’s a somewhat extreme example from a server with only 126G of RAM and a Buffer Pool of 60G:

  • Created_tmp_tables indicates the total number of temporary tables created by the server, be it in-memory or on-disk.
  • Created_tmp_disk_tables only accounts for temporary tables created on-disk
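
A sketch of filtering these counters from the pt-mext output, together with the in-memory thresholds from the variables file (paths illustrative):

    # How many temporary tables are created, and how many end up on disk
    pt-mext -r -- cat /var/lib/pt-stalk/<timestamp>-mysqladmin \
      | grep -E 'Created_tmp_tables|Created_tmp_disk_tables'
    # The in-memory limit is the lower of these two variables
    grep -E 'tmp_table_size|max_heap_table_size' /var/lib/pt-stalk/<timestamp>-variables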

In the excerpt above, we see a high number of temporary tables created every second, with most of them being created as or converted to on-disk temporary tables. If we consider the latter option, that they are being created in memory until reaching the threshold set by tmp_table_size, the amount of memory allocated for temporary tables during peak time may surpass what remains available on the server. That is what was happening with this server until they downsized the buffer pool to the current value of 60G as a short-term solution; despite having swappiness set to 1 and theoretically available memory in the OS cache, swap space was almost depleted due to memory pressure:

The long-term solution for them was to work on optimizing their queries to decrease the number and size of the temporary tables they use.

Contention at the Table Cache Level

When a connection runs a query that accesses a given table, the table is opened and the Opened_tables counter is increased. If there are ten connections running the same query concurrently, the table will be opened ten times; the reference for a given opened table is not shared by different client sessions because each may have a different state set for the target table. Note this impacts the number of file descriptors used by MySQL.

For performance reasons, the reference for each opened table is kept in a cache. When that cache is full and someone tries to open a table that is not in the cache, MySQL needs to make space for it by either finding a table that is no longer in use by any thread and recycling its entry in the table cache, or by temporarily expanding the table cache.

The general advice suggests that if the value for Opened_tables increases constantly, it’s a sign that the table cache might be too small:

I’d rather look at the number of cache misses and overflows, which provide a more direct indication of contention at the table cache level:
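
A sketch of pulling those counters from the pt-mext output (path illustrative; these status variables are available in recent MySQL versions):

    pt-mext -r -- cat /var/lib/pt-stalk/<timestamp>-mysqladmin \
      | grep -E 'Opened_tables|Table_open_cache_misses|Table_open_cache_overflows'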

As discussed in the previous post, pay attention to the relative size of each table cache instance:

Ever since the table cache was partitioned to perform better in highly concurrent environments, it is no longer one big cache with table_open_cache entries: this value is now divided by the number of table_open_cache_instances. If the number of entries per instance is too low, it won’t scale well; you may need to increase table_open_cache or decrease the number of instances.
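
As a rough sketch, assuming the variables file is the tab-separated output of SHOW GLOBAL VARIABLES (as collected by pt-stalk), you can compute the entries per instance like this:

    awk '$1 == "table_open_cache" {cache=$2}
         $1 == "table_open_cache_instances" {inst=$2}
         END {if (inst) print cache/inst, "entries per instance"}' \
        /var/lib/pt-stalk/<timestamp>-variables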

There are a lot of other status variables we could cover here, but I hope you have gotten the general idea of how we can use pt-mext to process the mysqladmin samples from the examples above. We shall keep going; there are two essential pieces of data we cannot leave uncovered.

Looking Inside the (Real) Engine

And that would be InnoDB. Yes, we get a lot of InnoDB-related metrics covered in the status variables, but there is precious information about the activity of the storage engine that we can only find in the output of SHOW ENGINE INNODB STATUS (“SEIS”), which is captured in the innodbstatus files.

There is a lot of information in there; I reckon it is common to feel a bit lost when trying to make sense of it all. The trick is to break it down into sections; you will usually focus on a subset of them, according to what you are looking for.

Transactions Running

InnoDB is a transactional storage engine and SEIS is one of the best places to look for information about transactions. Open one of the innodbstatus files (two are collected for each of the iterations executed) and have a general look. Before I do that myself, I like to get a general idea of the number of transactions inside InnoDB; I use grep for this:
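
A minimal sketch (file name illustrative); each transaction entry in SEIS starts with a line prefixed by ---TRANSACTION:

    grep -c "^---TRANSACTION" /var/lib/pt-stalk/<timestamp>-innodbstatus1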

That’s a lot of transactions, but it doesn’t mean they are all running:
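
A sketch of counting only those in ACTIVE state:

    grep "^---TRANSACTION" /var/lib/pt-stalk/<timestamp>-innodbstatus1 | grep -c "ACTIVE"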

If you have a lot of active transactions running inside InnoDB and they are taking too much time to complete, each getting just a bit of CPU cycles to process and then going back to waiting for more, generating a lot of context switches, you may benefit from limiting the maximum number of transactions allowed inside InnoDB at the same time – the rest of them will have to wait at the door, a concept not much different from using a connection pooler. The variable that can be used for this is innodb_thread_concurrency: the default value of zero indicates there is no limit in place.

This is one of the best examples I have in mind when I think about “tuning the engine for the race track”: the workload dictates how MySQL should be tuned. In a high concurrency system, if there are queries that compete for the same resources blocking each other, it may be better to allow only a fraction of them to run at a time so they complete faster and leave. In cases like this, how many threads should be allowed to run concurrently is a matter of experimentation: try with values that are multiples of the number of cores in the server.
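
As a hedged example, on a 16-core server you might start experimenting with something like the following (the value is illustrative; the setting is dynamic):

    mysql -e "SET GLOBAL innodb_thread_concurrency = 32"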

Long-Running Transactions

You will find the oldest, longest-running transactions at the bottom of the list of transactions: if you are troubleshooting a performance problem, there is a good chance that those transactions are involved – either as culprits or as victims.

Long-running transactions can be very detrimental to general server performance: they drive the InnoDB history list length high and can hold locks needed by other queries to complete. Here’s an example of such:

You can investigate those further by looking for them in the respective transactions file:

processlist file:

and lock-waits file:
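
A hedged sketch of chasing one suspicious transaction across these files, assuming you noted its transaction id and MySQL thread id in SEIS (ids and paths are illustrative):

    # Find the transaction by its id in the transactions sample
    grep "1234567" /var/lib/pt-stalk/<timestamp>-transactions
    # Locate the owning connection in the processlist sample (vertical \G format)
    grep -A 8 "Id: 4321" /var/lib/pt-stalk/<timestamp>-processlist
    # Check whether that thread shows up as blocker or blocked
    grep "4321" /var/lib/pt-stalk/<timestamp>-lock-waits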

I reckon I’m stopping short of providing a complete procedure for investigating locking issues; this is another subject that warrants more attention.

Checkpointing and Redo Log Space

We briefly discussed the importance of tuning the redo log space (innodb_log_file_size x innodb_log_files_in_group) in the previous post, and how it should be big enough to optimize the checkpointing mechanism. There is an old formula that can still be used for estimation, and the data for this can be found in SEIS. If we used the first innodbstatus file of each of the two samples:

We can obtain the number of bytes written in this 30-second span and make a rough projection for one hour:
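
A sketch of the arithmetic, using hypothetical Log sequence number values taken 30 seconds apart:

    # Extract the LSN from the first innodbstatus file of each iteration
    grep "Log sequence number" /var/lib/pt-stalk/<timestamp1>-innodbstatus1   # e.g. 8246131444
    grep "Log sequence number" /var/lib/pt-stalk/<timestamp2>-innodbstatus1   # e.g. 8256131444
    # Bytes written in 30 seconds, projected over one hour (x120), in GB
    echo "scale=2; (8256131444 - 8246131444) * 120 / 1024^3" | bc   # ~1.11 GB/hour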

Such a projection is very dependent on the timing of when the SEIS samples were captured; you may want to track Log sequence number over an extended period of time, but do note it is important that the redo log be sized for the most demanding write periods, which usually take place during peak time. You certainly don’t need to have it covered for a full hour; 15-30 minutes should be enough for checkpointing to operate in an optimized way:

Adaptive Hash Index

InnoDB can make use of a special feature to index values that are being accessed frequently, called the Adaptive Hash Index, or AHI for short. As the name suggests, these are built as hash indexes in memory, on top of the B-Tree table indexes we usually talk about. They allow for very fast hashed lookups. But it won’t be helpful in workloads where most of the searches are non-hash searches:
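
The relevant line lives in the INSERT BUFFER AND ADAPTIVE HASH INDEX section of SEIS; a sketch of pulling it out (path illustrative):

    grep "hash searches/s" /var/lib/pt-stalk/<timestamp>-innodbstatus1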

In the excerpt above, we find that only 1% of the searches are “hashable”: in cases like this, disabling the AHI feature (innodb_adaptive_hash_index) can improve performance. AHI will have a bigger impact (be it positive or negative) on read-only workloads so it may make sense to have it enabled on your replicas but not on the master – or vice-versa. This highlights another interesting point when inspecting database servers: it is very likely you will find different variations of your workload if you split traffic into writes (master) and reads (replicas); you should analyze (and configure) your servers independently.

In other cases, the rate of hashable searches can be predominant but the system may still suffer contention from this mechanism. This used to be the case in MySQL 5.6 and earlier versions, when “the adaptive hash index search system was protected by a single latch (btr_search_latch) which could become a point of contention”. This is apparent when we see many threads waiting on an RW-latch created in btr0sea.c in the upper part of the SEIS output, for example. This has been mitigated in MySQL 5.7 with the possibility of partitioning the AHI search mechanism (innodb_adaptive_hash_index_parts): “each index is bound to a specific partition, with each partition protected by a separate latch”.  

If you are in doubt about the effectiveness of the AHI for your workload and can test operating with it disabled for comparison, that would be a simple path to pursue.

A Quick Note on Processlist

We mentioned the processlist samples briefly above: they contain basic information about ongoing connections to MySQL, such as what their source is and what they are currently doing, although not always in exact terms. There are moments when we need to inspect this file in search of a specific thread, as we did earlier, but we can also extract more high-level stats from these samples. The following command will sort connections by Command:
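
A sketch of such a command, assuming the processlist samples are captured in vertical (\G) format, as pt-stalk does (path illustrative):

    grep "Command:" /var/lib/pt-stalk/<timestamp>-processlist | sort | uniq -c | sort -rn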

Please note you cannot simply look at the overall raw numbers above: each processlist file contains 29 samples of SHOW FULL PROCESSLIST; averaging these values does not provide an accurate view of the system at the time, but it does provide a pretty good general representation of it. Sorting by State is even better in this sense:
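
The same idea applied to the State field:

    grep "State:" /var/lib/pt-stalk/<timestamp>-processlist | sort | uniq -c | sort -rn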

By looking at the excerpt above, I would be inclined to further examine the efficiency of the table cache and that of the redo log space.

One last point that almost didn’t make this document: the MyISAM storage engine’s days are numbered. In MySQL 8.0, even the system tables are no longer MyISAM. In certain situations, you may still resort to MyISAM for on-disk temporary tables (default_tmp_storage_engine). Just keep the following in mind: for obvious reasons, MyISAM tables aren’t featured in the output of SHOW ENGINE INNODB STATUS; if you have a performance issue involving a query and you can’t find anything about it in SEIS, chances are that the underlying tables are not InnoDB; look for clues about the query in SHOW PROCESSLIST instead.

Daemon Mode

You can have pt-stalk running in the background as a daemon, ready to trigger when a predefined condition is met. By default, a threshold for Threads_running is used, but you can also employ your own trigger conditions through a custom function!
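
A hedged example of running it that way, triggering a collection when Threads_running stays above a chosen threshold for a few consecutive checks (the values are illustrative):

    pt-stalk --daemonize --variable Threads_running --threshold 100 --cycles 5 \
      --dest /var/lib/pt-stalk -- --user=root --password=<mysql-root-pass>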

In the Next Post …

The best complement to our beloved Percona Toolkit trifecta is none other than Percona Monitoring and Management (PMM). It continually collects MySQL status variables for your servers and plots the metrics in Grafana panels that are easy to interpret. Plus, it includes a very rich Query Analytics dashboard that can help you identify your top slow queries. In the final post of this series, we will have a look at how we can obtain much of the same information we got from the Percona Toolkit tools (and sometimes more) from PMM.
