How easy is it to identify and debug Percona XtraDB Cluster replication problems?

If you are using PXC, you may have already seen several log files in your data directory whose names start with GRA_.

Those files correspond to a replication failure: the slave thread was not able to apply a transaction. For each of those files, a corresponding warning or error message is present in the MySQL error log file.

Those errors can also be false positives, such as a bad DDL statement (dropping a table that doesn’t exist, for example), and are therefore nothing to worry about. However, it’s always recommended to understand what is happening.

The GRA files contain binlog events in ROW format representing the failed transaction. This post explains how to analyze them.

The first step in analyzing your GRA files is to add a binlog header to the file.
You can download one here: GRA-header

We can verify it easily:
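One possible check, as a minimal sketch: every MySQL binary log starts with the four magic bytes 0xfe 'b' 'i' 'n', so a usable header file should start with them too. The function name `is_binlog_header` is hypothetical and not part of any Percona or MySQL tooling.

```shell
# Hypothetical helper: succeeds if FILE starts with the binlog magic bytes
# 0xfe 0x62 0x69 0x6e ("\xfe" followed by "bin").
is_binlog_header() {
    # Dump the first 4 bytes as hex and strip all whitespace from od's output.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d '[:space:]')
    [ "$magic" = "fe62696e" ]
}
```

For example, `is_binlog_header GRA-header && echo "looks like a binlog header"` prints the message only when the magic bytes match. This confirms the file type only; it does not tell you whether the header matches your server version.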

Now we need to select one GRA log file:
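A small sketch for finding candidates, newest first. The helper name `list_gra_files` is hypothetical, and the default path `/var/lib/mysql` is an assumption; use the `datadir` value of your own server.

```shell
# Hypothetical helper: list GRA_*.log files in DATADIR, most recently
# modified first. DATADIR defaults to /var/lib/mysql (an assumption).
list_gra_files() {
    datadir=${1:-/var/lib/mysql}
    # ls -t sorts by modification time; errors are hidden when nothing matches.
    ls -1t "$datadir"/GRA_*.log 2>/dev/null
}
```

Running `list_gra_files /path/to/datadir | head -n 1` would then give the most recent GRA file, which is usually the one tied to the latest error in the log.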

We add the header and we can then use mysqlbinlog to see its content:
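These two steps can be sketched as follows. The function names and the idea of writing to a separate output file are illustrative assumptions; the only external tool involved is `mysqlbinlog` with its `--verbose` (`-v`) option.

```shell
# Hypothetical helper: write HEADER followed by GRA_FILE into OUT_FILE,
# leaving the original GRA file untouched.
gra_add_header() {
    header=$1
    gra_file=$2
    out_file=$3
    cat "$header" "$gra_file" > "$out_file"
}

# Hypothetical helper: decode the combined file. -vv asks mysqlbinlog to
# print row events as commented pseudo-SQL, including column data types.
gra_decode() {
    mysqlbinlog -vv "$1"
}
```

A possible invocation, with placeholder file names: `gra_add_header GRA-header GRA_X_Y.log GRA_X_Y-bin.log` followed by `gra_decode GRA_X_Y-bin.log`. Writing to a new file keeps the original GRA file intact in case the header turns out not to fit.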

So it’s clear that the problem occurred when inserting a record into the sakila.actor table.
And if we check the error log for the corresponding error message (we know at what time to check):
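A rough sketch for narrowing down the error log. The helper name is hypothetical, and both search patterns (`WSREP` and `Slave SQL`) are assumptions about how the relevant messages are worded; adjust them to what your own log contains.

```shell
# Hypothetical helper: print error-log lines, prefixed with their line
# numbers, that mention WSREP or the applier ("Slave SQL").
# Both patterns are assumptions, not an exhaustive list.
find_apply_errors() {
    grep -n -E 'WSREP|Slave SQL' "$1"
}
```

The output can be filtered further with the timestamp shown by mysqlbinlog, for example by piping it through `grep -F` with the date and hour of the failed event.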

In this case it’s obvious why it failed, but that’s not always so. Now you know how to find the cause of these replication problems.

Also, those files (GRA_*.log) are not cleaned up automatically and exist only for troubleshooting purposes, so once you have determined whether they represent a real problem, you can delete them manually.
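A cautious cleanup could look like the sketch below. The helper name, the age threshold, and the dry-run default are all illustrative choices rather than anything PXC provides.

```shell
# Hypothetical helper: list GRA_*.log files in DATADIR last modified more
# than DAYS days ago. Pass "delete" as the third argument to remove them
# instead of listing them.
purge_old_gra() {
    datadir=$1
    days=$2
    if [ "${3:-}" = "delete" ]; then
        find "$datadir" -maxdepth 1 -type f -name 'GRA_*.log' \
            -mtime +"$days" -exec rm -f {} +
    else
        find "$datadir" -maxdepth 1 -type f -name 'GRA_*.log' \
            -mtime +"$days" -print
    fi
}
```

Calling `purge_old_gra /path/to/datadir 7` only prints the files older than a week, so the list can be reviewed before running the same command with `delete` appended.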

This was also discussed on the galera-codership mailing list.

4 Comments
Marcus

Is there a header for 5.7?

Marc Castrovinci

Great info! Is there a way to point the GRA_* files to another folder or location?

Kasi

I am able to see the exact replicated statement by using cat GRA_2_146.log.