DDLs Percona XtraDB Cluster Inconsistencies

Apologies for the silly title, but the issue is a real one, even though it is not a new thing. Schema upgrades are not an ordinary operation in Galera. For the subject at hand, the bottom line is: under the default Total Order Isolation (TOI) method, “the cluster replicates the schema change query as a statement before its execution.” What this means in practice is that a DDL issued in one node is replicated to the other nodes in the cluster before it is even executed in the source node, let alone completed successfully.
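As a quick aside, you can confirm which schema upgrade method a node is running with before issuing a DDL. The sketch below uses the standard wsrep_OSU_method setting; the switch to RSU is shown for illustration only, not as a recommendation:

```sql
-- Check which online schema upgrade method this node is using;
-- TOI (Total Order Isolation) is the default.
SHOW GLOBAL VARIABLES LIKE 'wsrep_OSU_method';

-- For comparison, RSU (Rolling Schema Upgrade) keeps the DDL local to
-- the node where it is issued; it then has to be repeated on each node.
SET SESSION wsrep_OSU_method = 'RSU';
```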

As a result, a DDL may fail on one node and succeed on another, and this happens without raising loud alerts or stopping nodes to protect against data inconsistency. This is not a bug in itself but rather a design compromise. With the recent support for atomic DDLs added in MySQL 8.0, we may see improvements in this area in Galera over time as well. For now, the warning stands: DDLs are not atomic in Galera, even in 8.0.

How Big of an Issue Is This?

To be fair, it’s rare – we do not see this happening often.

How Does the Problem Manifest Itself in Practice? Give Us an Example!

We recently investigated a case where a DBA issued a TRUNCATE TABLE statement in one node, which was taking some time to complete. His reaction at the time was to hit Ctrl+C in the session where the DDL was issued. That session seemed to hang, so he ended up opening a new session and issuing a full DELETE on that same table. It also took time to complete; this time, however, he was more patient. The problem was that once the statement completed, he noticed that the two other nodes had disappeared from the cluster. Upon closer inspection of the logs, both nodes had reported the following error before leaving the cluster:

The DELETE was successful on the source node but failed to complete on the other two nodes. Why? Because the target table was already empty there: the previous TRUNCATE TABLE had effectively been executed on those nodes.
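To make the sequence of events easier to follow, here is a simplified reconstruction using a hypothetical table t1; the statements mirror what was issued, but the table name and timing are illustrative only:

```sql
-- Node 1 (source), session A: under TOI the statement is replicated to
-- the other nodes before it runs locally.
TRUNCATE TABLE t1;   -- hangs locally and is interrupted with Ctrl+C

-- Node 1 (source), session B: issued afterwards, while t1 still has rows
-- on this node because the TRUNCATE never completed here.
DELETE FROM t1;      -- succeeds locally, row events are replicated

-- Nodes 2 and 3: the TRUNCATE was applied, so t1 is already empty;
-- the replicated DELETE row events find nothing to remove and fail,
-- and both nodes end up leaving the cluster.
```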

Wasn’t the TRUNCATE TABLE also successful on the source node? It turned out it wasn’t. In fact, the TRUNCATE was recorded in the slow query log of the source node like this:

Note the Last_errno indicated above:

Our colleague Przemek reproduced a similar problem, which he described in bug PXC-3536, “Modify processing to not allow threads/queries to be killed if the thread is in TOI”. The Percona Engineering team has since fixed the issue, leaving the following note on the bug tracker:

Fixed in 5.7[.33-31.49] and 8.0[.22-13.1].  Any attempt to kill the query or connection while in TOI will fail.  Note that due to timing, the “kill query” may sometimes return success, but the operation will still have succeeded.

In the case we investigated, the MySQL error log of the source node had the answer; there was some other higher-priority transaction already in place on that node that prevented it from executing the TRUNCATE:

To clarify, this cluster was receiving writes from multiple nodes, and the higher-priority transaction was applied by a replication thread, which usually has precedence over local events when a conflict is detected.
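If you suspect this kind of conflict, one place to look besides the error log is the brute-force abort counter that Galera exposes; an increase around the time of the failed DDL is a strong hint. A minimal check, using the standard status variable:

```sql
-- Number of local transactions or statements on this node that were
-- aborted by replicated, higher-priority transactions.
SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';
```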

As you may have suspected, the TRUNCATE was not found in the binary log of the source node; since the statement did not complete, it was not logged. Only the subsequent DELETE statement was there.

And as for the binary logs of the other two nodes, they included both the TRUNCATE and the DELETEs covering all rows in the target table. Those were rows these nodes no longer had in place and thus were unable to remove, and that is what caused the data inconsistency that followed.
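To verify what each node actually logged, you can inspect the binary logs directly on every node and compare them; the file name below is hypothetical, so pick one from the SHOW BINARY LOGS output:

```sql
-- List the binary logs available on this node.
SHOW BINARY LOGS;

-- Inspect the events of a given file (hypothetical name) to confirm
-- whether the TRUNCATE and the DELETE row events were recorded.
SHOW BINLOG EVENTS IN 'binlog.000123' LIMIT 200;
```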