In this blog post, we’ll look at a transaction replay anomaly in Percona XtraDB Cluster.
Introduction
Percona XtraDB Cluster/Galera replays a transaction if the data is non-conflicting but, the transaction happens to have conflicting locks.
Anomaly
Let’s understand this with an example:
- Let’s assume a two-node cluster (node-1 and node-2)
- Base table “t” is created as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | create database test; use test; create table t (i int, c char(20), primary key pk(i)) engine=innodb; insert into t values (1, 'abc'), (2, 'abc'), (4, 'abc'); select * from t; mysql> select * from t; +---+------+ | i | c | +---+------+ | 1 | abc | | 2 | abc | | 4 | abc | +---+------+ |
- node-2 starts runs a transaction (trx-2):
1 | trx-2: update t set c = 'pqr'; |
- node-2 creates a write-set and is just about to replicate it. At the same time, node-1 executes the following transaction (trx-1), and is first to add it to the group-channel (before node-2 adds transaction (trx-2))
1 | trx-1: insert into t values (3, 'a'); |
- trx-1 is replicated on node-2, and it proceeds with the apply action. Since there is a lock conflict (no certification conflict), node-2 local transaction (trx-2) is aborted and scheduled for replay.
- trx-1 causes addition of (3, ‘a’) and then node-2 transaction is REPLAYed.
- REPLAY is done using the pre-created write-set that only modifies existing entries (1,2,4).
End-result:
1 2 3 4 5 6 7 8 9 | mysql> select * from t; +---+------+ | i | c | +---+------+ | 1 | pqr | | 2 | pqr | | 3 | a | | 4 | pqr | +---+------+ |
- At first, nothing looks wrong. If you look closely, however, the REPLAYed transaction “UPDATE t set c= ‘pqr'” is last to commit. But the effect of it is not seen as there is still a row (3, ‘a’) that has ‘a’ instead of ‘pqr’.
1 2 3 4 5 6 7 8 9 10 11 12 | | mysql-bin.000003 | 792 | Gtid | 2 | 857 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:4' | | mysql-bin.000003 | 857 | Query | 2 | 925 | BEGIN | | mysql-bin.000003 | 925 | Table_map | 2 | 972 | table_id: 219 (test.t) | | mysql-bin.000003 | 972 | Write_rows | 2 | 1014 | table_id: 219 flags: STMT_END_F existing| | mysql-bin.000003 | 1014 | Xid | 2 | 1045 | COMMIT /* xid=4 */ | | mysql-bin.000003 | 1045 | Gtid | 3 | 1110 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:5' | | mysql-bin.000003 | 1110 | Query | 3 | 1187 | BEGIN | | mysql-bin.000003 | 1187 | Table_map | 3 | 1234 | table_id: 219 (test.t) | | mysql-bin.000003 | 1234 | Update_rows | 3 | 1324 | table_id: 219 flags: STMT_END_F | | mysql-bin.000003 | 1324 | Xid | 3 | 1355 | COMMIT /* xid=5 */ | +------------------+------+----------------+-----------+-------------+---------------------------------------------------------------------------------+ 21 rows in set (0.00 sec) |
- We have used a simple char string, but if there is a constraint here, like c should have X after UPDATE is complete, than the CONSTRAINT will be violated even though the application reports UPDATE as a success.
- Is it interesting to note what happens on node-1:
- node-1 applies the local transaction (trx-1) and then gets the replicated write-set from node-2 (trx-2) that has changes only for (1,2,4). Thereby data consistency is not compromised.