Checking Data Consistency for RDS for MySQL

RDS for MySQL, and DBaaS offerings in general, are environments tightly controlled by the vendor, which means some things are missing, such as the SUPER grant for the root user (or any other user). This has implications for operations, one of them being that you cannot run pt-table-checksum in the usual way to verify data consistency between a primary and its replicas.

However, there’s a workaround that might overcome this situation and involves three things:

The pt-table-checksum itself
A way to collect executed queries
And the last one, which can be controversial: disabling read-only on the replica and using a maintenance window to stop traffic to the database while pt-table-checksum runs.

The problem with RDS is that you cannot change binlog_format to STATEMENT, which is one of the requirements for pt-table-checksum to run.

The workaround consists of capturing the executed queries and replaying them on the replica. There are several ways to collect the queries. One is to use the Performance Schema, in a similar way to what is explained in this blog post. Another is simply to use the slow log with long_query_time = 0. By default on RDS the log output is set to TABLE, so a simple query against mysql.slow_log gets you the queries. A further option, which we prefer to avoid, is the pt-query-digest --processlist feature, since it may miss some queries.
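As a rough illustration of the slow log route, the sketch below builds a parameterized query against mysql.slow_log that keeps only statements touching the checksums table. The function name build_capture_query and the %s placeholder style (as used by common Python MySQL drivers) are assumptions for this example, and it assumes pt-table-checksum writes to its default percona.checksums table.

```python
from datetime import datetime
from typing import Tuple

# Assumption: pt-table-checksum uses its default --replicate table.
CHECKSUM_TABLE_PATTERN = "%`percona`.`checksums`%"


def build_capture_query(since: datetime) -> Tuple[str, tuple]:
    """Return (sql, params) selecting checksum statements logged since `since`."""
    sql = (
        "SELECT start_time, db, CONVERT(sql_text USING utf8mb4) AS sql_text "
        "FROM mysql.slow_log "
        "WHERE start_time >= %s "      # only entries from this checksum run
        "AND sql_text LIKE %s "        # only statements touching the checksums table
        "ORDER BY start_time"          # keep the original execution order
    )
    return sql, (since, CHECKSUM_TABLE_PATTERN)
```

With a DB-API cursor connected to the primary, this would be run as cursor.execute(sql, params), and each returned row carries the database and the statement text to replay.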

Queries look like this:

# Time: 2020-09-01T15:20:34
# User@Host: percona[percona] @ 192.168.1.200:59646 []
# Query_time: 0.007615  Lock_time: 0.000000  Rows_sent: 0  Rows_examined: 0
use dani;
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'dani', 'inconsistency', '5', 'PRIMARY', '9', NULL, COUNT(*), '0' FROM `dani`.`inconsistency` FORCE INDEX(`PRIMARY`) WHERE ((`id` > '9')) ORDER BY `id` /*past upper chunk*/;
# Time: 2020-09-01T15:20:34
# User@Host: percona[percona] @ 192.168.1.200:59646 []
# Query_time: 0.009266  Lock_time: 0.000000  Rows_sent: 0  Rows_examined: 0
use dani;
UPDATE `percona`.`checksums` SET chunk_time = '0.008633', master_crc = '0', master_cnt = '0' WHERE db = 'dani' AND tbl = 'inconsistency' AND chunk = '5';
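
If the queries are collected as slow log text like the listing above, they first have to be separated from the header lines. The following is a minimal sketch of such a parser; extract_statements is a hypothetical helper written for this example, and it naively assumes that a statement ends on a line finishing with a semicolon.

```python
from typing import List, Optional, Tuple


def extract_statements(log_text: str) -> List[Tuple[Optional[str], str]]:
    """Return (database, statement) pairs in the order they were logged."""
    statements = []
    current_db = None
    buffer = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or "# Time", "# User@Host", "# Query_time" header
        lowered = line.lower()
        if not buffer and lowered.startswith("use ") and line.endswith(";"):
            current_db = line[4:-1].strip().strip("`")  # remember the default schema
            continue
        if not buffer and lowered.startswith("set timestamp="):
            continue  # bookkeeping line found in slow log files, not part of the query
        buffer.append(line)
        if line.endswith(";"):
            statements.append((current_db, " ".join(buffer)))
            buffer = []
    return statements
```

Applied to the two entries shown above, this would yield the REPLACE statement followed by the UPDATE statement, both tagged with the dani schema.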

The next step is to send those queries to the replicas as soon as possible, so that the point in time used for the comparison is as close as possible on both primary and secondary. That is why read-only has to be set to 0 on the replicas: the replayed statements write to the checksums table there. The change can be rolled back as soon as the pt-table-checksum process ends.
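
Replaying the captured statements could look roughly like the sketch below. It assumes a DB-API 2.0 connection to the replica (from whichever MySQL driver you use) and statements given as (database, statement) pairs; replay_on_replica is a hypothetical name, not part of Percona Toolkit.

```python
from typing import Iterable, Optional, Tuple


def replay_on_replica(connection, statements: Iterable[Tuple[Optional[str], str]]) -> int:
    """Execute captured statements on the replica in their original order."""
    executed = 0
    current_db = None
    cursor = connection.cursor()
    try:
        for db, statement in statements:
            if db and db != current_db:
                # Switch the default schema, as the logged "use <db>;" line did.
                cursor.execute("USE `{}`".format(db.replace("`", "``")))
                current_db = db
            # Drop the trailing semicolon; drivers send one statement per call.
            cursor.execute(statement.rstrip().rstrip(";"))
            executed += 1
        connection.commit()
    finally:
        cursor.close()
    return executed
```

The function only works while read-only is disabled on the replica, so it belongs inside the same maintenance window as the checksum run.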

The Proof of Concept

I have created an RDS primary/secondary environment and have added a table with inconsistency on purpose.

Primary values:

mysql> select * from inconsistency;
+----+--------------+---------------+
| id | string_field | numeric_field |
+----+--------------+---------------+
|  1 | casa         |             1 |
|  2 | caza         |             2 |
|  3 | auto         |             3 |
|  4 | auto         |             3 |
|  5 | auto         |             4 |
|  6 | auto         |             5 |
|  7 | autos        |             5 |
|  8 | autos        |             6 |
|  9 | pepe         |             1 |
+----+--------------+---------------+
9 rows in set (0.09 sec)

And replica values:

mysql> select * from inconsistency;
+----+--------------+---------------+
| id | string_field | numeric_field |
+----+--------------+---------------+
|  1 | casa         |             1 |
|  2 | caza         |             2 |
|  3 | auto         |             3 |
|  4 | auto         |             3 |
|  5 | auto         |             4 |
|  6 | auto         |             5 |
|  7 | autos        |             5 |
|  8 | autos        |             6 |
|  9 | papa         |             1 |
+----+--------------+---------------+
9 rows in set (0.08 sec)

Can you spot the difference :)? It’s the last row: on the primary the string_field says “pepe”, while on the replica it says “papa”.

So are we ready to run pt-table-checksum? Not quite. The tool will complain that it cannot change binlog_format and will stop. Unfortunately, there is currently no way to avoid that other than modifying the code. The change is to add a return to the following conditional:

      if ( VersionParser->new($dbh) >= '5.1.5' ) {
         $sql = 'SELECT @@binlog_format';

With the return:

      if ( VersionParser->new($dbh) >= '5.1.5' ) {
         return;
         $sql = 'SELECT @@binlog_format';

In pt-table-checksum version 3.2.1, that conditional is at line 10181:
https://github.com/percona/percona-toolkit/blob/release-3.2.1/bin/pt-table-checksum#L10181

Now we are ready! Let’s see if we can find out that difference using the tools. To send the queries to the replicas, execute the queries previously captured.

And finally, the actual pt-table-checksum command:

pt-table-checksum --host=dgb-primary --user=percona --password=xxxxx --no-check-binlog-format --no-check-slave-tables --databases=dani --recursion-method=none --chunk-size=3

The output won’t report any differences, and that is expected, so don’t panic: with --recursion-method=none the tool does not look for replicas, so it has nothing to compare against. So how do we check the real state? By querying the checksums table on the replica:

mysql> select * from percona.checksums;
+------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db   | tbl           | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts                  |
+------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| dani | inconsistency |     1 |    0.00877 | PRIMARY     | 1              | 3              | ae8eafc4 |        3 | ae8eafc4   |          3 | 2020-09-01 16:48:04 |
| dani | inconsistency |     2 |   0.008754 | PRIMARY     | 4              | 6              | 374d887b |        3 | 374d887b   |          3 | 2020-09-01 16:48:04 |
| dani | inconsistency |     3 |   0.008737 | PRIMARY     | 7              | 9              | 25680fb9 |        3 | d7e101a5   |          3 | 2020-09-01 16:48:04 |
| dani | inconsistency |     4 |   0.008944 | PRIMARY     | NULL           | 1              | 0        |        0 | 0          |          0 | 2020-09-01 16:48:04 |
| dani | inconsistency |     5 |   0.008905 | PRIMARY     | 9              | NULL           | 0        |        0 | 0          |          0 | 2020-09-01 16:48:04 |
+------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+

See the difference? It is in chunk number 3, where “this_crc” and “master_crc” are different. It’s hard to spot, right? Let’s add some filters to the query:

mysql> SELECT * FROM percona.checksums WHERE (  master_cnt <> this_cnt  OR master_crc <> this_crc  OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db   | tbl           | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts                  |
+------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| dani | inconsistency |     3 |   0.008602 | PRIMARY     | 7              | 9              | 25680fb9 |        3 | d7e101a5   |          3 | 2020-09-01 16:56:16 |
+------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
1 row in set (0.08 sec)

There you go, data inconsistency detected.
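
The same comparison can also be done client-side once the rows of percona.checksums have been fetched from the replica. This is a sketch in the spirit of the WHERE clause above, using the values from the listing; differing_chunks is a hypothetical helper, and unlike the SQL it treats a NULL on only one side of any comparison as a difference.

```python
from typing import Dict, List, Tuple

# Rows as fetched from percona.checksums on the replica (values from the listing above).
SAMPLE_ROWS = [
    {"db": "dani", "tbl": "inconsistency", "chunk": 1, "this_crc": "ae8eafc4", "this_cnt": 3, "master_crc": "ae8eafc4", "master_cnt": 3},
    {"db": "dani", "tbl": "inconsistency", "chunk": 2, "this_crc": "374d887b", "this_cnt": 3, "master_crc": "374d887b", "master_cnt": 3},
    {"db": "dani", "tbl": "inconsistency", "chunk": 3, "this_crc": "25680fb9", "this_cnt": 3, "master_crc": "d7e101a5", "master_cnt": 3},
    {"db": "dani", "tbl": "inconsistency", "chunk": 4, "this_crc": "0", "this_cnt": 0, "master_crc": "0", "master_cnt": 0},
    {"db": "dani", "tbl": "inconsistency", "chunk": 5, "this_crc": "0", "this_cnt": 0, "master_crc": "0", "master_cnt": 0},
]


def differing_chunks(rows: List[Dict]) -> List[Tuple[str, str, int]]:
    """Return (db, tbl, chunk) for every chunk whose count or CRC differs."""
    diffs = []
    for row in rows:
        count_differs = row["master_cnt"] != row["this_cnt"]
        crc_differs = row["master_crc"] != row["this_crc"]  # None vs. value also counts
        if count_differs or crc_differs:
            diffs.append((row["db"], row["tbl"], row["chunk"]))
    return diffs


if __name__ == "__main__":
    print(differing_chunks(SAMPLE_ROWS))  # [('dani', 'inconsistency', 3)]
```

Returning the chunk number along with the table makes it easy to look up the lower_boundary and upper_boundary of the affected range afterwards.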

Working with MySQL 8.0? There’s an even easier way to check data consistency!

Fine print

Some things to consider:

Replicas should be up to date – If there’s a lag between primary and secondary, you would get false positives: chunks that differ only because the replica has not caught up yet.
The read-only itself: it’s kind of ironic that to check data consistency you have to disable the one thing that guarantees data consistency. However, it is temporary, and it is very important to revert to read-only=on once the process is done.
Traffic to the database must be stopped in order to guarantee 100% that the data we are checking is at the same point in time on both sides, meaning that no changes happened in between.
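
The first point can be checked before starting. Below is a minimal sketch that inspects one row of SHOW SLAVE STATUS output, passed in as a dictionary; replica_is_caught_up is a hypothetical helper, and a Seconds_Behind_Master of 0 is only an approximation of "no lag", which is one more reason to stop traffic first.

```python
from typing import Dict


def replica_is_caught_up(status: Dict) -> bool:
    """Return True when both replication threads run and no lag is reported."""
    io_running = status.get("Slave_IO_Running") == "Yes"
    sql_running = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")  # None when replication is not running
    return io_running and sql_running and lag == 0
```

A dictionary cursor from your MySQL driver returns the SHOW SLAVE STATUS row in this shape, so the result of fetchone() can be passed in directly.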

2 Comments

greenlion (3 years ago):
Can pt-q-d read from P_S.threads? SHOW PROCESSLIST holds a mutex…

Daniel Guzmán Burgos (Author, reply to greenlion, 3 years ago):
It could. That’s a good feature request.
