As of Percona Toolkit v2.1.5, pt-table-checksum works correctly with Percona XtraDB Cluster, but it doesn’t work quite like a traditional replication setup because cluster nodes are not like traditional replicas. In this post I demonstrate how to use pt-table-checksum with Percona XtraDB Cluster.
First, you’ll need Percona Toolkit v2.1.5 or newer and Percona XtraDB Cluster 5.5.27-23.6 or newer (these are the current versions at the time of writing). Second, I presume that you already know how to use pt-table-checksum and that your cluster is already setup. If the later presumption is false, then read the Percona XtraDB Cluster documentation or watch the Percona XtraDB Cluster – Installation and Setup webinar. My setup has 3 nodes called “node300”, “node4000”, and “node6000” on ports 4000, 5000, and 6000 respectively.
The first thing you must do is create a “DSNs table” because pt-table-checksum cannot currently auto-detect cluster nodes. My DSNs table contains:
1 2 3 4 5 6 7 8 | node4000> SELECT * FROM percona.dsns; +----+-----------+-----------------------+ | id | parent_id | dsn | +----+-----------+-----------------------+ | 1 | NULL | h=127.1,P=4000,u=root | | 2 | NULL | h=127.1,P=5000,u=root | | 3 | NULL | h=127.1,P=6000,u=root | +----+-----------+-----------------------+ |
There is a row for each node in the cluster, specifying the node’s DSN. (The id and parent_id columns are not used by pt-table-chekcusm yet, so their values don’t matter.) Since all these nodes are in a cluster, this table should exist on all nodes. This means you can run pt-table-checksum on any node. I’ll run the tool on node4000.
The second thing you must do is specify “–recursion-method dsn=DSN-TABLE” when running pt-table-checksum, where “DSN-TABLE” is a DSN specifying the above DSNs table. For example:
1 | --recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns |
That makes pt-table-checksum connect to the given host (127.0.0.1:4000), and select all rows from the given table (percona.dsns), and check those nodes for differences. Without this option, the tool may throw an error like:
1 2 | node4000 is a cluster node but no other nodes or regular replicas were found. Use --recursion-method=dsn to specify the other nodes in the cluster. |
If the tool detects that the master host (the main DSN specified on the command line) is a cluster node but no other cluster nodes were specified and no regular replicas were detected, then it could checksum but it could not detect any differences, so it throws that error rather than doing only half its job. Furthermore, not being able to detect differences might be misinterpreted as there being no differences, so it’s better to error than potentially mislead users.
The aforementioned hints at something else you should know: pt-table-checksum can work with cluster nodes and regular replicas at the same time, but since cluster nodes require –recursion-method=dsn, you must also specify any regular replicas in the DSN table, else pt-table-checksum will not check them for differences.
Here’s the abbreviated output of a real run:
1 2 3 4 5 6 7 8 9 10 11 | $ ./pt-table-checksum h=127.1,P=4000,u=root \ --recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns Not checking replica lag on node4000 because it is a cluster node. Not checking replica lag on node5000 because it is a cluster node. Not checking replica lag on node6000 because it is a cluster node. TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 10-05T14:02:17 0 0 0 1 0 0.025 mysql.columns_priv 10-05T14:02:17 0 0 0 1 0 0.019 mysql.db ... 10-05T14:02:18 0 0 3 1 0 0.023 percona.dsns 10-05T14:02:18 0 0 10 1 0 0.032 test.t |
The tool works as usual: checking every database and table. So far there are no differences, but we’ll run it again with a difference on one node to prove that it works. Right now, though, notice the three warnings at the beginning of the output: pt-table-checksum cannot and will not check the replica lag on cluster nodes because SHOW SLAVE STATUS on a cluster node doesn’t work (because a node isn’t a slave) and cluster nodes should not be out of sync.
Now let’s make a difference on node5000 and verify that pt-table-checksum detects it when running on node4000:
1 2 3 4 5 6 7 8 9 10 11 12 13 | node5000> SET SQL_LOG_BIN=0; Query OK, 0 rows affected (0.00 sec) node5000> UPDATE test.t SET val=val+1 WHERE id=30; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 node5000> SELECT * FROM test.t WHERE id=30; +----+------+ | id | val | +----+------+ | 30 | 11 | +----+------+ |
Ok, so row 30 has value 11 on node5000. Let’s double-check that it’s different (the original value) on the other nodes:
1 2 3 4 5 6 7 8 9 10 11 12 13 | node4000> SELECT * FROM test.t WHERE id=30; +----+------+ | id | val | +----+------+ | 30 | 10 | +----+------+ node6000> SELECT * FROM test.t WHERE id=30; +----+------+ | id | val | +----+------+ | 30 | 10 | +----+------+ |
Ok, so we know there’s a difference on node5000, now let’s see if pt-table-checksum detects it:
1 2 3 4 5 6 7 | $ ./pt-table-checksum h=127.1,P=4000,u=root -d test \ --recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns Not checking replica lag on node4000 because it is a cluster node. Not checking replica lag on node5000 because it is a cluster node. Not checking replica lag on node6000 because it is a cluster node. TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 10-05T14:47:52 0 1 10 1 0 0.023 test.t |
Success! The difference was detected. Since we’re still “PXC-approving” pt-table-sync, it’s best that a DBA manually investigate any differences.
In summary:
- pt-table-checksum v2.1.5 and newer works with Percona XtraDB Cluster 5.5.27-23.6 and newer
- You must run pt-table-checksum with –recrusion-method=dsn
- In the DSN table you must specify all the cluster node that you want to check
- Any differences should be investigated manually until pt-table-sync is “PXC-approved”
- There are other limitations when using pt-table-checksum with Percona XtraDB Cluster
Please submit bugs if found.
You skipped the most interesting part: How does it work?
Daniel –
Can you provide a refresh of this article, based on new recursion-method=cluster?
Thanks,
Scott
PXC required binlog_format=ROW on all nodes. But pt-table-checksum required running on binlog_format=STATEMENT so this node should not be a cluster node. How it works?
No longer works properly.
https://bugs.launchpad.net/percona-toolkit/+bug/1316863
Joel, It’s fixed with PXC 5.6.20. Horray!
I have two entries in the dsns table but it does only one of the slaves.