MySQL 101: Monitor Disk I/O with pt-diskstats

Here on the Percona Support team we often ask customers to retrieve disk stats to monitor disk IO and to measure block device IOPS and latency. There are a number of tools available to monitor IO on Linux. iostat is one of the popular ones, and Percona Toolkit, which is free, includes the pt-diskstats tool for this purpose. pt-diskstats is similar to iostat, but it is more interactive and shows extended information. It reports current disk activity, printing statistics for the most recent interval (1 second by default) until interrupted. pt-diskstats collects its samples from /proc/diskstats.

In this post, I will share some examples of how to monitor the IO subsystem and check whether it is performing properly or whether any disks are a limiting factor – all using the pt-diskstats tool.

pt-diskstats output consists of a number of columns, and in order to interpret the output we need to know what each column represents.

  • rd_s is the number of reads per second, while wr_s is the number of writes per second.
  • rd_rt and wr_rt show the average response time in milliseconds for reads and writes respectively. This is similar to the await column in iostat output, except that pt-diskstats reports separate response times for reads and writes at the disk level. Just a note: modern iostat does split read and write latency out, but most distros don't ship the latest iostat in their sysstat (or equivalent) package.
  • rd_mrg and wr_mrg are two other important columns in pt-diskstats output. The *_mrg columns tell us how many of the original operations the IO elevator (disk scheduler) was able to merge to reduce IOPS, so they tell us something quite important: whether the IO scheduler was able to consolidate many or few operations. If rd_mrg/wr_mrg is a high percentage, the IO workload is sequential; if rd_mrg/wr_mrg is a low percentage, the IO workload is random. Binary logs, redo logs (aka ib_logfile*), the undo log and the doublewrite buffer are all written sequentially.
  • qtime and stime are the last two columns in pt-diskstats output. qtime is the time spent in the disk scheduler queue, i.e. the average queue time before the request is sent to the physical device, while stime is the average service time, i.e. the time the physical device takes to process the request. Note that qtime is not broken down between reads and writes; if response time is dominated by qtime, that points to the disk scheduler (queueing) rather than to the device itself. Also note that service time (the stime field in pt-diskstats and the svctm field in iostat) is not reliable on Linux; if you read the iostat manual you will see it is deprecated.

Along with these, there are many other parameters for pt-diskstats – you can find the full documentation here. Below is an example of pt-diskstats in action. I used the --devices-regex option, which prints information only for devices matching the given Perl regex.
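As a rough sketch of that invocation (the sd[a-d] pattern below is only a placeholder for the device names on this kind of server, not the exact command used here; adjust the regex to match your own disks):

    # Show per-disk statistics only for devices matching the Perl regex;
    # 'sd[a-d]' is a placeholder pattern for four SATA disks.
    pt-diskstats --devices-regex='sd[a-d]'

    # The header line of the output carries the columns described above,
    # e.g. rd_s, rd_mrg, rd_rt, wr_s, wr_mrg, wr_rt, busy, qtime and stime.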

These are the stats from 7200 RPM SATA disks. As you can see, the write response time is very high, and most of it is IO queue time. That shows the problem exactly: the IO subsystem cannot keep up with the write workload, because the number of writes being performed is far beyond what it can handle, and the disks cannot service every request concurrently. The workload also depends a lot on where the hot data is stored, and in this particular case it hits only a single disk out of the four. A single 7.2K RPM disk can only do about 100 random writes per second, which is not a lot under a heavy workload.
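As a back-of-the-envelope check on that 100-writes-per-second figure (the seek and rotational numbers below are typical ballpark values for a 7.2K RPM SATA drive, not measurements from this server):

    # Why a 7,200 RPM drive tops out around 100 random IOs per second:
    #   one rotation        = 60,000 ms / 7,200 rpm ~ 8.3 ms
    #   avg rotational wait ~ half a rotation       ~ 4.2 ms
    #   avg seek time       ~ 8-9 ms (typical for a SATA drive)
    #   time per random IO  ~ 4.2 ms + 8.5 ms       ~ 12-13 ms
    #   => roughly 1,000 ms / 12.5 ms ~ 80-100 random IOs per second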

This is not so much a hardware failure as a hardware capacity issue: the workload, which is mostly writes as the disk stats show, generates more writes per second than the IO subsystem can handle in an efficient manner.

Let me show you a second example. Here you can see read latency: rd_rt is consistently between 10 ms and 30 ms, depending on how fast the disks spin and how many of them there are. Possible ways to deal with this are to optimize queries to avoid table scans, use memcached where possible, and use SSDs, which provide good I/O performance at high concurrency. You will find this post on SSDs from our CEO, Peter Zaitsev, useful.

From the diskstats output below it seems that IO is saturated by both reads and writes, which you can see from the high values in the rd_s and wr_s columns. In this particular case, putting the disks in a RAID 5 array (better for a read-only workload) or a RAID 10 array, along with a battery-backed write cache (BBWC), is a good option, because a single disk can really hurt performance when you are IO bound.

The following example reflects write-heavy activity, but write response time is very good (under 1 ms), which shows the disks are healthy and capable of handling a high number of IOPS.

Let me show you a final example. I used the --interval and --iterations parameters for pt-diskstats, which tell it how many seconds to wait before printing the next disk stats and how many samples to collect, respectively. If you look at the third iteration you will notice high latency (rd_rt, wr_rt), mostly for reads, along with high values for queue time (qtime) and service time (stime); qtime is related to the disk IO scheduler settings. For MySQL database servers we usually recommend noop or deadline instead of the default cfq.
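The commands behind such a run, plus a quick way to check and change the scheduler, look roughly like this (sda and the 5-second/3-sample settings are placeholders; changing the scheduler requires root and does not persist across reboots):

    # Print a new set of disk stats every 5 seconds, 3 samples in total:
    pt-diskstats --interval=5 --iterations=3

    # See which IO scheduler a device is using; the entry in [brackets] is active:
    cat /sys/block/sda/queue/scheduler

    # Switch to deadline at runtime (make it permanent via your distro's
    # boot or udev configuration):
    echo deadline > /sys/block/sda/queue/scheduler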

You can see the busy column in pt-diskstats output, which is the same as the %util column in iostat. pt-diskstats is quite similar to iostat, but it is more interactive and carries more information. The busy percentage only tells us how long the IO subsystem was busy; it does not indicate capacity. So the only time you really care about %busy is when it reaches 100% and, at the same time, latency (await in iostat, rd_rt/wr_rt in the diskstats output) rises above, say, 5 ms. You can estimate the capacity of your IO subsystem and then look at the IOPS being consumed (the rd_s + wr_s columns, or r/s + w/s in iostat). Also, the system can process more than one request in parallel (in the case of RAID), so %busy can go beyond 100% in pt-diskstats output.
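For comparison, the iostat counterparts of busy and rd_rt/wr_rt come from an extended-statistics run, for example:

    # Extended per-device statistics refreshed every second; -x adds the
    # await and %util columns that correspond to rd_rt/wr_rt and busy:
    iostat -dx 1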

If you need to check disk throughput and block device IOPS, run pt-diskstats to capture metrics from your IO subsystem and see whether utilization matches other worrisome symptoms. I would suggest capturing disk stats during peak load. Output can be grouped by sample or by disk using the --group-by option. You can use the sysbench benchmark tool to generate load and measure database server performance for this purpose. You will find this link useful for sysbench tool details.
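A sketch of that kind of capture, using sysbench's file IO test to generate load (option names vary a little between sysbench versions; this uses the classic --test=fileio syntax, and the 8G file size and 5-minute runtime are arbitrary examples):

    # Terminal 1: prepare test files, run a mixed random read/write workload,
    # then clean up afterwards:
    sysbench --test=fileio --file-total-size=8G prepare
    sysbench --test=fileio --file-total-size=8G --file-test-mode=rndrw --max-time=300 run
    sysbench --test=fileio --file-total-size=8G cleanup

    # Terminal 2: capture disk stats while the benchmark runs,
    # one output line per sample instead of per disk:
    pt-diskstats --group-by=sample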


Conclusion:

pt-diskstats is one of the finest tools in Percona Toolkit. By using it you can easily spot disk bottlenecks, measure your IO subsystem and identify how many IOPS your drives can handle (i.e. disk capacity).

6 Comments
gggeek

Pardon my ignorance, but how does this work for NFS mounts? Are they ignored, treated as physical disks, or something else? And DRBD volumes?

One of the things I often find hard to grok when looking at the output of disk-statistics tools is how to map the data to the underlying physical disks – especially when there are network filesystems and LVM at play.

Ernie Souhrada

NFS mounts are ignored; pt-diskstats collects its information from /proc/diskstats, which does not contain any information about NFS volumes. For DRBD volumes, both the actual physical disk and the DRBD volume will appear in the pt-diskstats output, often with very different numbers, since the DRBD metrics will include network and DRBD overhead but the physical device metrics will just be from the raw disk. The same is true for LVM, since you might have multiple LVs on a given PV or a single LV spanning multiple PVs.

Valerie

Doesn’t work on FreeBSD. Sad face.

Cannot open /proc/diskstats: No such file or directory at /usr/local/bin/pt-diskstats line 3355.

Masroor Farooqi

I like that this is a read-only tool. Combine that with the fact that it polls /proc/diskstats, and one does not have to worry about adding overhead to the disk subsystem. As may be obvious to most readers of this post, I/O access is the most expensive operation a database does, and DBAs spend (or should spend) a lot of time fine-tuning it. Careful use of a tool like this, or say iostat, should be a “must use” tool in any DBA’s toolkit.

One point I’d like to make to gggeek in the context of databases: please do not put database datafiles on NFS volumes. In fact, something like Oracle actually checks for NFS, and if it finds its tablespaces are on NFS it will refuse to go online. This is because NFS is a no-guarantees filesystem. Couple that with the fact that it uses TCP as a transport mechanism, and the result is that database access is (a) slow and (b) not fully guaranteed.

gggeek

@Masroor I pretty much agree.

But as a web consultant, I often come across servers which are not dedicated to database serving, and they can host the webserver as well as memcache, solr, queueing services and whatnot. It is thus important to be able to easily tell apart IO coming from the different services, and which disks/partitions are being hit.

And while NFS is not so good a choice for a db storage, using DRBD is quite a common practice to achieve high-availability setups.

Masroor Farooqi

gggeek Agreed. I was a database consultant for a long time, so I have seen my fair share of, let's say, architectural abuses 🙂 and it's invaluable to know what the disk is spinning for. I have not personally used DRBD for database replication, but I have for offsite backups (we used a proprietary block-replication solution called Compellent). Anyway, good to know for the future.

Have a good one