Replay AnomalyIn this blog post, we’ll look at a transaction replay anomaly in Percona XtraDB Cluster.

Introduction

Percona XtraDB Cluster/Galera replays a transaction if the data is non-conflicting but, the transaction happens to have conflicting locks.

Anomaly

Let’s understand this with an example:

  • Let’s assume a two-node cluster (node-1 and node-2)
  • Base table “t” is created as follows:

  • node-2 starts runs a transaction (trx-2):

  • node-2 creates a write-set and is just about to replicate it. At the same time, node-1 executes the following transaction (trx-1), and is first to add it to the group-channel (before node-2 adds transaction (trx-2))

  • trx-1 is replicated on node-2, and it proceeds with the apply action. Since there is a lock conflict (no certification conflict), node-2 local transaction (trx-2) is aborted and scheduled for replay.
  • trx-1 causes addition of (3, ‘a’) and then node-2 transaction is REPLAYed.
  • REPLAY is done using the pre-created write-set that only modifies existing entries (1,2,4).

End-result:

  • At first, nothing looks wrong. If you look closely, however, the REPLAYed transaction “UPDATE t set c= ‘pqr'” is last to commit. But the effect of it is not seen as there is still a row (3, ‘a’) that has ‘a’ instead of ‘pqr’.

  • We have used a simple char string, but if there is a constraint here, like c should have X after UPDATE is complete, than the CONSTRAINT will be violated even though the application reports UPDATE as a success.
  • Is it interesting to note what happens on node-1:
    • node-1 applies the local transaction (trx-1) and then gets the replicated write-set from node-2 (trx-2) that has changes only for (1,2,4). Thereby data consistency is not compromised.