pmp-check-pt-table-checksumRecently, I worked on a customer case where the customer needed to monitor the checksum via Nagios monitoring. The pmp-check-pt-table-checksum plugin from Percona Monitoring Plugins for MySQL achieves this goal. I thought it was worth a blogpost.

pmp-check-pt-table-checksum alerts you when the pt-table-checksum tool from Percona Toolkit finds data drifts on a replication slave. pmp-checksum-pt-table-checksum monitors data differences on the slave from the checksum table as per information in the last checksum performed by the pt-table-checksum tool. By default, the plugin queries the percona.checksum table to fetch information about data discrepancies. You can override this behavior with the “-T” option. You can check the pmp-check-pt-table-checksum documentation for details.

Let’s demonstrate checksum monitoring via Nagios. My setup contains a master with two slave(s) connected, as follows:

  • Host 10.0.3.131 is master.
  • Host 10.0.3.83 is slave1
  • Host 10.0.3.36 is slave2

I intentionally generated more data on the master so pt-table-checksum can catch the differences on the slave(s). Here’s what it looks like:

As you can see, slave1 and slave2 are different from the master: the master has ten rows while the slave(s) have five rows each (table t1).

Then, I executed pt-table-checksum from the master to check for data discrepancies:

pt-table-checksum correctly identifies the differences for the test.t1 table on slave1 and slave2. Now, you can use the  pmp-check-pt-table-checksum  Percona checksum monitoring plugin. Let’s try to run it locally (via CLI) from the Nagios host.

NOTE: The checksum_user database user needs SELECT privileges on both the checksum table (Percona.checksums) and the slave(s) in order for SQL to alert for checksum differences on slave(s).

On the Nagios monitoring server, you need to add the  pmp-check-pt-table-checksum command to the commands.cfg file:

NOTE: I used “-c” option for pmp-check-pt-table-checksum, which raises a critical error instead of a warning.

And, on the existing hosts.cfg file (i.e., slave1.cfg and slave2.cfg), you need to add a monitoring command accordingly as below:

In this command “1” is an argument to command “-c $ARG1$” so pmp-check-pt-table-checksum will raise a critical error when one or more chunks on the slave(s) are different from the master.

Last but not least, restart the Nagios daemon on the monitoring host to make the change.

Below is how it looks like on the Nagios monitoring on the web:

pmp-check-pt-table-checksum
pmp-check-pt-table-checksum

I also think the “INTERVAL” option is useful:

It makes sure that chunks are recent on the checksum table. Used the other way around, it checks on how old your chunks are. This option ensures the checksum cron executes at a defined number of days. Let’s say you have pt-table-checksum cron running once per week. In that case, setting INTERVAL 14 or 21 alerts you if chunks are older then defined number of days (i.e., the INTERVAL number).

Conclusion:

Percona Monitoring plugins for MySQL are very useful and easy to embed in your centralize monitoring dashboard. You can schedule pt-table-checksum via a cron job, and get reports regarding master/slave(s) data drifts (if any) from one global dashboard on the monitoring host. There are various plugins available from Percona, e.g. processlist plugin, replication delay plugin, etc. Along with that, Percona offers Cacti and Zabbix templates to graph various MySQL activities.